Experience has taught me that everything that can be monitored with Nagios should be monitored.
My SatNOGS client (an RPi 3a) has malfunctioned a few times lately. Luckily the SatNOGS system notified me by email, so I could quickly remedy the situation. However, I would like my Nagios to notify me as soon as something bad occurs.
In all the recent cases the problem was in the client software. The RPi itself was still running, so I need a Nagios plugin that can monitor the operation of the client software.
As far as I can judge there is no Nagios plugin for SatNOGS, so I’ll write my own. What I now need - and that is the real reason why I write this post - are some hints on how best to monitor the operation of the client software.
The client is idle about half of the time and doing a pass the remainder of the time. But doesn’t it poll the database server for new work every few minutes? If that is the case, then monitoring the poll would be an obvious candidate for my plugin.
But any other suggestion would be most appreciated.
Yes, the client does poll the database server frequently (every 60 seconds?) so you could opt to use that as one trigger. That may produce a bunch of false positives if your network connection/DNS are not rock solid 24/7.
I do use Nagios at work but would have a hard time envisioning it working well with SatNOGS for things other than ping, available file system storage, temps, and process status of the system the client is running on. All these system indicators can be normal but an observation might still be classified as ‘failed’ for other reasons (e.g., missing ogg/waterfall files because the SDR device is in an error state, or an excessive delay in delivery of valid observation results for whatever reason).
That said, I would be interested in any Nagios plugins that you would be willing to share if you decide to go this route!
My approach at this point, because I also want to better monitor my SatNOGS ground station status, is to download my station’s recent results using the network API, then search them for a ‘failed’ status and send out an email alert if a ‘failed’ observation exists. I have not scripted this yet, but it’s on my to-do list.
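A minimal sketch of that idea in Python, not a finished script. The endpoint URL, the `ground_station` query parameter, and the `status`, `id`, and `start` field names are assumptions about the SatNOGS Network API and should be checked against its documentation. The station ID is whatever your own station uses.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the SatNOGS Network observations endpoint.
API_URL = "https://network.satnogs.org/api/observations/"


def fetch_observations(station_id, timeout=30):
    """Fetch one page of observations for a station (performs a network call).

    Assumption: the API accepts 'ground_station' and 'format' query parameters
    and returns a JSON list of observation objects.
    """
    query = urllib.parse.urlencode({"ground_station": station_id, "format": "json"})
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def failed_observations(observations):
    """Return the observations whose 'status' field equals 'failed'.

    Assumption: each observation is a dict with a 'status' key.
    """
    return [obs for obs in observations if obs.get("status") == "failed"]


def format_alert(failed):
    """Build the body of an alert message, or return None if nothing failed."""
    if not failed:
        return None
    lines = [f"{len(failed)} failed observation(s):"]
    for obs in failed:
        # 'id' and 'start' are assumed field names; fall back to '?' if absent.
        lines.append(f"  observation {obs.get('id', '?')} at {obs.get('start', '?')}")
    return "\n".join(lines)
```

From cron, one would call `format_alert(failed_observations(fetch_observations(station_id)))` and mail the result when it is not `None`. Keeping the filter separate from the fetch makes it possible to test the logic without touching the network.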
Thanks for your post. You present several interesting ideas and options. I will (try to) pursue most if not all of them.
To me the main purpose is to ascertain that my ground station is capable of doing a pass, i.e. receive on the desired frequency and upload a record of what has been received to the database. The quality of what has been received is irrelevant in this context.
Monitoring the polling will be one of the options. It is viable for me because my Internet connection is rock solid. I’m on a fiber connection and it has worked flawlessly since I got it several years back.
It also looks like your API approach may be useful. I haven’t used the API before, so I’m in for learning something new.
If/when I get something workable I’ll publish it here.
I have made a first whack at writing a plugin for Nagios that monitors the SatNOGS client’s polls of the database server. It is uploaded here as check_satnogs_polls.txt (2.4 KB),
with a .txt extension because otherwise I wasn’t allowed to upload it. It must be renamed to ‘check_satnogs_polls’ and placed in /usr/lib/nagios/plugins if you use my ‘sudoers’ lines below.
There are a few caveats. In order to use it you must do a couple of things.
First, the plugin uses a rule in the firewall of the SatNOGS client to count the number of datagrams sent to the database server. This rule must be added by you. The rule is mentioned in the comments of the plugin. I used this command to create the rule (note the absence of a -j target!):
ip6tables -A OUTPUT -p tcp --dest srv01.libre.space
Second, the plugin must be run as root, since only root is allowed to access the firewall rules. For Nagios this is most easily achieved by using sudo, so you should add these lines to ‘sudoers’ using ‘visudo’:
# User alias specification
User_Alias NAGIOS = nagios

# Cmnd alias specification
Cmnd_Alias NAGIOS_COMMANDS = /usr/lib/nagios/plugins/check_satnogs_polls

# User privilege specification
NAGIOS ALL = NOPASSWD: NAGIOS_COMMANDS
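For anyone who wants to see the counting idea without downloading the attachment, here is a rough Python sketch. It is not the attached plugin, only an illustration under a few assumptions: the counters are read with `ip6tables -L OUTPUT -v -n -x`, the packet count is the first column and the destination the last column of the rule line, and the destination is given as the numeric IPv6 address of the server (with `-n` the host name is not shown). The function names are made up for this example.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def read_rule_listing():
    """Return the OUTPUT chain listing with exact counters (requires root)."""
    result = subprocess.run(
        ["ip6tables", "-L", "OUTPUT", "-v", "-n", "-x"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def packet_count(listing, dest_address):
    """Return the packet counter of the first rule matching dest_address.

    Assumption: 'pkts' is the first field and the destination the last field
    of a rule line. Returns None if no matching rule is found.
    """
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip the chain header and the column titles
        if fields[-1].split("/")[0] == dest_address:
            return int(fields[0])
    return None


def evaluate(previous, current):
    """Compare two counter readings and return (exit_code, message)."""
    if current is None:
        return UNKNOWN, "UNKNOWN - counting rule not found"
    if previous is None:
        return OK, f"OK - first run, counter at {current}"
    if current > previous:
        return OK, f"OK - {current - previous} packets since last check"
    if current < previous:
        return WARNING, "WARNING - counter went backwards (reboot or rule reset?)"
    return CRITICAL, "CRITICAL - no packets to the server since last check"
```

A complete plugin would also have to store the previous reading between runs (for example in a small state file), print the message, and exit with the returned code. The check interval in Nagios should be longer than the client's polling interval, otherwise an idle but healthy client would be reported as CRITICAL.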