this post was submitted on 26 Jan 2025


What do y’all use to monitor many linux servers? (lemm.ee)

submitted 1 month ago* (last edited 2 weeks ago) by shootwhatsmyname@lemm.ee to c/linux@lemmy.ml


I’m hoping to find something that:

has a nice dashboard
is quick and simple to install
is very lightweight and unobtrusive
can send alerts via http request

Edit: Thanks everyone, love this community! I went with Beszel, lots of other good recommendations too


[–] reisub@discuss.tchncs.de 41 points 1 month ago (2 children)

Node exporter, Prometheus and grafana

[–] dann@hexbear.net 2 points 1 month ago

This

[–] MrPoopyButthole@lemmy.dbzer0.com 1 points 1 month ago

this is the way

[–] DarkDarkHouse@lemmy.sdf.org 33 points 1 month ago (3 children)

I use my family. It has a simple volume based alert for when services are offline.

[–] vfsh@lemmy.blahaj.zone 5 points 1 month ago

It'll even automatically configure variable alert volumes corresponding to the importance of the service!

[–] fmstrat@lemmy.nowsci.com 2 points 4 weeks ago

Until the UPS battery gets low and it beeps, and they look for a way to turn it off vs calling you. Yup.


[–] Mora@pawb.social 22 points 1 month ago (3 children)

Beszel. Probably the easiest tool of all the mentioned in this thread.

https://github.com/henrygd/beszel

[–] JustARegularNerd@aussie.zone 3 points 1 month ago (1 children)

Seconded. My only complaint (though this might already be a feature I haven't found yet) is that it doesn't seem to support multiple drives. But yes, it is shit easy to set up and has a beautiful UI.

[–] Mora@pawb.social 5 points 1 month ago (1 children)

Totally possible:

https://beszel.dev/guide/additional-disks

[–] JustARegularNerd@aussie.zone 5 points 1 month ago

I no longer have any complaints about Beszel. Thank you!

[–] dan@upvote.au 2 points 3 weeks ago

I'm working on making it easier to install on Debian systems by creating a Debian package (and eventually a repo): https://github.com/henrygd/beszel/pull/497


[–] iii@mander.xyz 13 points 1 month ago* (last edited 1 month ago)

uptime-kuma is what I use

[–] loganb@lemmy.world 11 points 1 month ago (2 children)

I personally use CheckMK.

Offers a free "Raw" version.
Can be deployed with docker.
OSS

One thing is that it can be a lot to take in at first and took me a while to get used to it.

[–] corsicanguppy@lemmy.ca 2 points 1 month ago

CheckMk user here via omd.

I'm looking for something else after the upgrade.

Black interface isn't pretty for me and the old interface was "meh too hard so we ditched it".
One half of the project split has a shit supply chain and just doesn't meet the bar for upgrade requirements.
The other half of the project split is a mess to config in an automated desired-state setup. It's all edge-triggered manual bullshit. NO. ENOUGH.

I miss 1.2 .

[–] hobbsc@lemmy.sdf.org 1 points 1 month ago

checkmk user here. i can second the adjustment phase. i tend to ignore my servers but when something goes sideways it's awesome to have checkmk's structure in place.

[–] RegalPotoo@lemmy.world 10 points 1 month ago (2 children)

Base ansible role installs Prometheus node exporter, configured with the text file collector
VM automations push DNS records so that the Prometheus dns-sd automatically discovers them
Ansible roles add cron jobs that generate metrics for specific systems and dump them for the text file collector
Grafana for dashboards
Karma as a UI in front of Prometheus alert manager

[–] tetris11@lemmy.ml 1 points 1 month ago (1 children)

Cron jobs that generate metrics for specific systems and dump them for the text file collector

Details please

[–] RegalPotoo@lemmy.world 2 points 1 month ago

https://github.com/prometheus/node_exporter?tab=readme-ov-file#textfile-collector - which makes node exporter watch a specific directory for files that contain metrics, then re-export them back to the central Prometheus server
Some systems have their own metrics endpoints - instead of getting Prometheus to scrape these directly I set up a Cron job to curl these into files for node exporter - this means I don't need extra config in Prometheus to find the endpoints, and don't need to mess with firewall rules
Other systems don't directly expose metrics in a format Prometheus can use - in this case I will write/find a script that can do the conversion, then either set it up to write the metrics file directly and run it from cron, or run it as a service with another cron job to do the scrape

[–] Toribor@corndog.social 1 points 1 month ago

Any chance you'd be willing to share playbooks or point me toward any resources you used?

I use Ansible to manage config across all my workstations/servers but I haven't gotten around to automating log shipping yet or aggregating system metrics.

[–] phoenixz@lemmy.ca 8 points 1 month ago* (last edited 1 month ago)

We just recently started using zabbix. Open source and has a web interface to get a central view that can be accessed from wherever we allow it.

So far it's been great, but we have had little time and so far have used only 1% of what it can do.

Still, I'd recommend it. Super easy to install, seems lightweight, has clients for any OS you'd need, and can send out alerts (we currently use Pushover for that).

[–] tath@social.tath.link 8 points 1 month ago (1 children)

Zabbix is pretty quick and easy. Many different services are built in for sending notifications, and you can add your own custom ones (including webhooks). The dashboard is fully customizable as well, so you can add whatever you want/need at a glance.

[–] Impromptu2599@lemmy.world 1 points 1 month ago

I stopped by to say the same thing. I use Zabbix to monitor everything

[–] LainTrain@lemmy.dbzer0.com 7 points 1 month ago (3 children)

Cockpit.

[–] dkc@lemmy.world 4 points 1 month ago

I’ve been really enjoying Cockpit as well.

[–] hobbsc@lemmy.sdf.org 2 points 1 month ago (1 children)

is cockpit on a server by server basis or can you monitor multiple servers with it?

[–] cmc@discuss.tchncs.de 3 points 1 month ago

You can monitor multiple machines via the host switcher menu at the top-left of the screen: Multiple Machines

[–] corsicanguppy@lemmy.ca 2 points 1 month ago (1 children)

My cockpit experience has been unilaterally dreadful. I'm glad you're getting value out of it.

[–] LainTrain@lemmy.dbzer0.com 1 points 1 month ago

How come?

[–] Andromxda@lemmy.dbzer0.com 5 points 1 month ago* (last edited 1 month ago) (2 children)

Netdata is exactly what you're looking for. It's basically an all-in-one monitoring and alerting suite that collects and analyzes data, and provides a gorgeous web dashboard for you to view.

You can also manually replicate this using Prometheus, Grafana and other tools, but that requires a much bigger effort to set up.

Edit: There's a public demo instance where you can try everything out: https://frankfurt.netdata.rocks/

[–] ikidd@lemmy.world 2 points 1 month ago (3 children)

I think they went to 5 nodes max on the free version as of the last patch. That's damn near useless.

[–] Andromxda@lemmy.dbzer0.com 1 points 1 month ago

Oh that sucks. I haven't used it personally in quite a while, since I switched to the Grafana stack

[–] ipkpjersi@lemmy.ml 1 points 1 month ago (1 children)

Is that just for the centralized dashboard portion? I tend to use each instance of it standalone, and primarily for the email alerts.

[–] ikidd@lemmy.world 2 points 1 month ago (1 children)

I believe so. I imagine the next stage of the enshittification will be to force those standalones to register with a portal account.


[–] Toribor@corndog.social 1 points 1 month ago* (last edited 1 month ago)

The five node limit is a dealbreaker for me too. I'm also annoyed the free version doesn't have any real built-in options to secure data by default. I followed a TechnoTim tutorial to get the NetData/Prometheus/Grafana stuff set up but it was too limited and required too much manual effort.

[–] ipkpjersi@lemmy.ml 2 points 1 month ago

Seconding Netdata, I've been using it for years. It's pretty great.

[–] static09@lemmy.world 5 points 1 month ago

Check out Netdata or Zabbix.

[–] notabot@lemm.ee 4 points 1 month ago

Nagios. It does depend on what you mean by monitor though. Nagios is good at telling you that "service A on host B is down" but less useful for looking at things like performance trends. I particularly like being able to set up dependencies between services, so I get the alert for the root cause, and not for all of the services that have gone down because of it.
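
The dependency idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Nagios implements or configures it, and the service names are made up: a failed service is reported only if none of its direct dependencies has also failed.

```python
def root_cause_alerts(down, depends_on):
    """Return failed services that are not explained by a failed dependency.

    down: iterable of service names currently failing.
    depends_on: dict mapping a service to the services it directly depends on.
    """
    down = set(down)
    return sorted(
        service
        for service in down
        if not any(dep in down for dep in depends_on.get(service, ()))
    )


# Hypothetical topology: the website needs the database, which needs the router.
DEPENDENCIES = {
    "website": ["database"],
    "database": ["router"],
    "router": [],
}

if __name__ == "__main__":
    print(root_cause_alerts({"website", "database", "router"}, DEPENDENCIES))
```

With everything down, only the router is reported; if the website fails on its own, the website itself is the alert.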

[–] utopiah@lemmy.ml 4 points 1 month ago (1 children)

send alerts via http request

On this specifically, you might want to check out ntfy, as it's quite easy to set up and can give you notifications on pretty much any device (including iOS) via your own infrastructure, all the way down to basics, e.g. SSE. That means you can subscribe to a topic, e.g. servers per physical location, alert level, etc., and only get the ones you need.
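
As a rough sketch of what an alert via HTTP request looks like with ntfy: publishing is a plain POST of the message body to the topic URL, with optional `Title`, `Priority`, and `Tags` headers. The server URL and topic name below are placeholders for your own instance, and the helper names are mine, not part of ntfy.

```python
import urllib.request


def build_ntfy_request(base_url, topic, message, title=None, priority=None, tags=None):
    """Build (but do not send) a POST request that publishes to an ntfy topic."""
    url = f"{base_url.rstrip('/')}/{topic}"
    headers = {}
    if title:
        headers["Title"] = title
    if priority:
        headers["Priority"] = priority  # e.g. "low", "default", "high", "urgent"
    if tags:
        headers["Tags"] = ",".join(tags)
    return urllib.request.Request(
        url, data=message.encode("utf-8"), headers=headers, method="POST"
    )


def send_alert(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder server and topic; nothing is sent unless send_alert is called.
    req = build_ntfy_request(
        "https://ntfy.example.com",
        "servers-rack1",
        "Disk usage above 90% on web01",
        title="Disk alert",
        priority="high",
        tags=["warning"],
    )
    print(req.get_method(), req.full_url)
```

Splitting the request construction from the sending keeps the alert easy to inspect or log before anything goes over the network; call `send_alert(req)` once the URL points at a real server.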

[–] utopiah@lemmy.ml 5 points 1 month ago (1 children)

Node exporter, Prometheus and grafana

Otherwise much heavier but that's also what I use.

[–] MrPoopyButthole@lemmy.dbzer0.com 1 points 1 month ago

same

[–] eldereko@lemmy.dbzer0.com 3 points 1 month ago

telegraf, influxdb, grafana, and gatus

[–] ocean@lemmy.selfhostcat.com 3 points 1 month ago

I just see if it works when I need it. If I’m at home it works. If I’m at work it may work. If I’ve left to travel it’s 95% definitely down and cannot be fixed. This works well!

[–] sgh@lemmy.ml 2 points 1 month ago

While I use LibreNMS as it uses SNMP for monitoring (which is pretty much available everywhere), I don't believe it has http alerts, but I know for a fact that it can send Telegram messages.

[–] Cysioland@lemmygrad.ml 2 points 1 month ago

Zabbix

[–] Ozymandias1688@feddit.org 2 points 1 month ago

Serverbox. https://github.com/lollipopkit/flutter_server_box

[–] protokaiser@lemmy.world 1 points 1 month ago

I remember liking Sensu. We used it a little bit at my previous job, but I didn't get a chance to work with it much. I can't remember what we specifically used it for though. Sorry, wish I had more info for you.

[–] maniel@lemmy.ml 1 points 1 month ago

Telegraf+influxdb+grafana is what I use at work. It is a multi-purpose tool though, and can be used to monitor EVERYTHING.

[–] elucubra@sopuli.xyz 1 points 1 month ago

Ages ago I used to use Webmin. I have no clue as to how it stacks up to others nowadays.

[–] hindy@mbin.lovetux.net 1 points 1 month ago

Hello,

I'm still using Nagios here. And for the availability of the services I'm using uptime-kuma (in a docker).

[–] spicehoarder@lemm.ee 1 points 1 month ago

Not exactly what you're looking for, but I like using proxmox
