Experience has taught me that everything that can be monitored with Nagios should be monitored.
My SatNOGS client (which is a RPi 3a) has malfunctioned a few times lately. Luckily, the SatNOGS system notified me by email, so I could quickly remedy the situation. However, I would like my Nagios to notify me as soon as something goes wrong.
In all of the recent cases, the problem was in the client software. The RPi itself was still running, so I need a Nagios plugin that can monitor the operation of the client software.
As far as I can judge there is no Nagios plugin for SatNOGS, so I’ll write my own. What I now need - and that is the real reason why I write this post - are some hints on how best to monitor the operation of the client software.
The client is idle about half of the time and doing a pass the remainder of the time. But doesn’t it poll the database server for new work every few minutes? If that is the case then monitoring the poll would be an obvious candidate for my plugin.
But any other suggestion would be most appreciated.
Yes, the client does poll the database server frequently (every 60 seconds?) so you could opt to use that as one trigger. That may produce a bunch of false positives if your network connection/DNS are not rock solid 24/7.
I do use Nagios at work but would have a hard time envisioning it working well with SatNOGS for things other than ping, available filesystem storage, temperatures, and process status of the system the client is running on. All these system indicators can be normal but an observation might still be classified as ‘failed’ for other reasons (e.g., missing ogg/waterfall files because the SDR device is in an error state, or an excessive delay in delivery of valid observation results for whatever reason).
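For the process-status part, a minimal sketch of a Nagios-style check could look like the following. It assumes a Linux system with /proc, and it assumes that the client's command line contains the string "satnogs-client"; both the name and the helper functions are illustrative, not something taken from the client itself. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
import os
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: the client's command line contains this string.
PROCESS_NAME = "satnogs-client"


def list_command_lines(proc_root="/proc"):
    """Return the command line of every process visible under /proc (Linux only)."""
    lines = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or is not readable
        # Arguments are NUL-separated; join them with spaces.
        lines.append(raw.replace(b"\0", b" ").decode(errors="replace").strip())
    return lines


def check_process(command_lines, name=PROCESS_NAME):
    """Map a list of command lines to an (exit code, message) pair."""
    matches = [c for c in command_lines if name in c]
    if matches:
        return OK, f"OK - {len(matches)} process(es) matching '{name}'"
    return CRITICAL, f"CRITICAL - no process matching '{name}'"


def main():
    """Print one status line and return the Nagios exit code."""
    try:
        code, message = check_process(list_command_lines())
    except Exception as exc:  # anything unexpected maps to UNKNOWN
        code, message = UNKNOWN, f"UNKNOWN - {exc}"
    print(message)
    return code

# To use this as a plugin, end the script with: sys.exit(main())
```

Note that the script matches its own command line if its file name contains the process name, so it is safer to save it under a name that does not. As said above, an OK result here only shows that the process exists, not that observations succeed.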
That said, I would be interested in any Nagios plugins that you would be willing to share if you decide to go this route!
My approach at this point – because I also want to better monitor my satnogs ground station status – is to download my station’s recent results using the network API, then search them for a ‘failed’ status and send out an email alert if a ‘failed’ observation exists. I have not scripted this yet but it’s on my to-do list.
Thanks for your post. You present several interesting ideas and options. I will (try to) pursue most if not all of them.
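As a rough, untested sketch of that idea: the endpoint URL, the ground_station query parameter, and the "status" field with the value "failed" are all assumptions about the network API and should be checked against its documentation. The function names are made up for illustration. Instead of sending email directly, the sketch returns a Nagios-style exit code and message, so Nagios could do the alerting.

```python
import json
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: URL and query parameter of the network API; verify before use.
API_URL = "https://network.satnogs.org/api/observations/?ground_station={station_id}"


def fetch_observations(station_id, timeout=30):
    """Download the first page of observations for one station as a list of dicts."""
    url = API_URL.format(station_id=station_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def failed_observations(observations, limit=10):
    """Return the failed ones among the first `limit` observations.

    Assumption: each observation has a "status" field and failures are
    marked with the string "failed".
    """
    return [o for o in observations[:limit] if o.get("status") == "failed"]


def evaluate(observations, limit=10):
    """Map a list of observations to an (exit code, message) pair."""
    failed = failed_observations(observations, limit)
    if failed:
        ids = ", ".join(str(o.get("id")) for o in failed)
        return CRITICAL, f"CRITICAL - {len(failed)} failed observation(s): {ids}"
    checked = min(limit, len(observations))
    return OK, f"OK - no failed observations among the {checked} checked"
```

A wrapper would call evaluate(fetch_observations(station_id)) with the station's own ID, print the message, and exit with the code. A network or parsing error should probably be reported as UNKNOWN rather than CRITICAL, since it says nothing about the station itself.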
To me the main purpose is to ascertain that my ground station is capable of doing a pass, i.e. receive on the desired frequency and upload a record of what has been received to the database. The quality of what has been received is irrelevant in this context.
Monitoring the polling will be one of the options. It is viable for me because my Internet connection is rock solid. I’m on a fiber connection and it has worked flawlessly since I got it several years back.
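To make the polling idea a bit more concrete, here is a small sketch of the threshold logic only. It assumes the time of the last successful poll is available as a Unix timestamp; how to obtain that from the client is an open question and is not shown. The 60-second interval is the guess from earlier in the thread, and the warning and critical multipliers are arbitrary choices.

```python
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: the client polls roughly every 60 seconds.
POLL_INTERVAL = 60


def check_poll_age(last_poll, now=None,
                   warn_after=3 * POLL_INTERVAL,
                   crit_after=10 * POLL_INTERVAL):
    """Map the age of the last poll (Unix timestamps, seconds) to a Nagios state."""
    if now is None:
        now = time.time()
    age = now - last_poll
    if age < 0:
        return UNKNOWN, "UNKNOWN - last poll timestamp is in the future"
    if age >= crit_after:
        return CRITICAL, f"CRITICAL - last poll {age:.0f}s ago"
    if age >= warn_after:
        return WARNING, f"WARNING - last poll {age:.0f}s ago"
    return OK, f"OK - last poll {age:.0f}s ago"
```

Allowing a few missed intervals before warning gives some tolerance for a single slow or failed poll, in addition to whatever retry settings Nagios itself applies.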
It also looks like taking your API approach may be useful. I haven’t used the API before so I’m in for learning something new.
If/when I get something workable I’ll publish it here.