I love Nagios because the toolset is so simple and powerful that we can monitor almost anything that has an IP address. We have an automated process that delivers data once a day via secure FTP using WinSCP. The data is delivered over a private link to our parent company, and once or twice per year the SSH host key changes. We never find out until we detect that the job is failing. When it happened today, we set out to find a way to test for this condition so that we would know ahead of time what is happening.
The solution uses the following:
- WinSCP console mode
- Nagios' NSClient++ and an NRPE check
- Windows shell file
The first issue is getting WinSCP to report back in an automated way that the host key is changing. I did that using this WinSCP command script:
option batch on
open firstname.lastname@example.org
pwd
exit
and this WinSCP command line:
winscp.exe /console /script=cmdocc.txt /log=tt.log
The option batch on command tells WinSCP to immediately cancel any input prompt. The open command tells WinSCP to open a connection using the specified stored session.
When the open command succeeds and nothing has changed, the script prints the current directory and exits. If the host key has changed, WinSCP prompts to accept it, but option batch on answers no, so the connection fails and the script exits, though not before logging the condition to the specified log file.
The final piece is this Windows command shell script:
@echo off
:: TestHostKey - tests the stored SSH host key and reports if it has changed
:: Uses a WinSCP batch script to connect to the appropriate host. If the host key is different, WinSCP logs it to a file and exits
:: Script tests for "Host key wasn't verified!" in the output log
set WINSCPEXE=\netadmin\winscp\winscp.exe
set WORK=\netadmin\nrpe
cd %WORK%
:: Remove the previous log so an old result cannot be matched
del tt.log /q
%WINSCPEXE% /console /script=hostchk.txt /log=tt.log
:: findstr sets errorlevel 0 when the string is found and 1 when it is not
findstr /i /c:"Host key wasn't verified!" tt.log >nul
if %errorlevel% NEQ 1 (
    echo Host Key does not match
    exit 1
) ELSE (
    echo Host key OK
    exit 0
)
The script orchestrates the call of WinSCP and after it exits uses findstr to look for "Host key wasn't verified!" in the log file. Based on the results, it sets the exit code and sends an output string to stdout.
This is where Nagios comes in. Nagios uses two pieces of information to monitor a host: the exit code of the check command and the output string. The exit code tells Nagios whether the service is healthy (0 is OK, 1 is WARNING, 2 is CRITICAL, 3 is UNKNOWN), and the output string is usually some clear text for the human. The script above exits with 1 on a mismatch, so Nagios reports the change as a WARNING.
We use NSClient++. To make this work I had to make the following changes to NSC.ini:
- enable NRPEListener.dll by uncommenting the entry in the modules section
- set the NRPE port by uncommenting the port line in the NRPE section
- set use_ssl=1 in the same section
- add a check_hostkey entry to the NRPE Handlers section that runs the command shell script above
Note: make sure all the other samples are commented out in that section unless you are using them.
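As a rough sketch of how those NSC.ini changes fit together: the fragment below assumes the older NSC.ini layout with its modules, NRPE and NRPE Handlers sections, the default NRPE port 5666, and a hypothetical script name TestHostKey.cmd stored in \netadmin\nrpe. None of these specifics come from our actual file, so adjust the port and the path to your own installation.

```ini
[modules]
; uncommented so NSClient++ loads the NRPE listener
NRPEListener.dll

[NRPE]
; assumed default NRPE port; use whatever your Nagios server expects
port=5666
use_ssl=1

[NRPE Handlers]
; hypothetical path and file name for the command shell script
check_hostkey=\netadmin\nrpe\TestHostKey.cmd
```

The handler name on the left (check_hostkey) is what the Nagios server asks for over NRPE. The script's exit code and its one line of output are passed back to Nagios as the check result.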
On the Nagios server, use the check_nrpe command to run the check_hostkey command against this server on a schedule and notify someone when there is a problem.
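A minimal sketch of the server side might look like the following object definitions. The host name sftp-client01, the service description and the once-a-day interval are hypothetical. The sketch also assumes the check_nrpe plugin lives in the directory referenced by $USER1$, that a generic-service template exists as in the sample Nagios configuration, and that the interval unit is the default of one minute.

```cfg
# Generic wrapper so any NRPE handler can be called by name
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

# Runs the check_hostkey handler on the Windows host (hypothetical host name)
define service {
    use                     generic-service
    host_name               sftp-client01
    service_description     SFTP host key
    check_command           check_nrpe!check_hostkey
    check_interval          1440    ; minutes, assuming the default interval_length of 60 seconds
}
```

Because the underlying job only runs once a day, there is little value in checking much more often than that. Notifications follow whatever contacts the service template already defines.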
This is a bit of a house of cards that took me about 90 minutes to piece together, but the point of all this is to show that with a bit of ingenuity you can put together a solution to test anything with Nagios.