Greg's Tech blog

Using Nagios to monitor for WinSCP & SSH Host key check

Friday 27 of January, 2012

I love Nagios because the toolset is so simple and powerful that we can monitor almost anything that has an IP address. We have an automated process that delivers data once a day via secure FTP using WinSCP. The data is delivered over a private link to our parent company, and once or twice a year the SSH host key changes. We never find out until we notice that the job is failing. When it happened today, we set out to find a way to test for this condition so that we would know about a changed key before the job fails.

The solution uses the following:

  • WinSCP console mode
  • Nagios' NSClient++ and an NRPE check
  • Windows shell file

The first issue is getting WinSCP to report back, in an automated way, that the host key has changed. I did that using this WinSCP command script:

option batch on
open auser@theirhost.company.com
pwd
exit

and this WinSCP command line:

winscp.exe /console /script=hostchk.txt /log=tt.log

The "option batch on" command tells WinSCP to immediately cancel any input prompt. The "open" command tells WinSCP to open a connection using the specified stored session.

When the "open" succeeds and nothing has changed, the script checks the current directory and exits. If the host key has changed, WinSCP prompts to accept the new key, but "option batch on" answers no, so the connection fails and the script exits, though not before logging the condition to the specified log file.

The final piece is this Windows command shell script:

@echo off
:: TestHostKey - tests the stored SSH host key and reports if it has changed
:: Uses a WinSCP batch script to connect to the appropriate host. If the host key
:: is different, WinSCP logs the failure to a file and exits.
:: This script then searches the output log for "Host key wasn't verified!"
set WINSCPEXE=\netadmin\winscp\winscp.exe
set WORK=\netadmin\nrpe

cd %WORK%
:: Remove the log from the previous run so a stale result is never matched
if exist tt.log del /q tt.log

%WINSCPEXE% /console /script=hostchk.txt /log=tt.log
:: findstr sets errorlevel 0 when the string is found and 1 when it is not
findstr /i /c:"Host key wasn't verified!" tt.log >nul

if %errorlevel% NEQ 1 (
	echo Host Key does not match
	exit 1
) ELSE (
	echo Host key OK
	exit 0
)

The script orchestrates the call to WinSCP and, after it exits, uses findstr to look for "Host key wasn't verified!" in the log file. Based on the result, it sets the exit code and writes an output string to stdout.

This is where Nagios comes in. Nagios uses two pieces of information to monitor a service: the exit code of the check command and its output string. The exit code tells Nagios whether the service is healthy (0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN), and the output string is usually some clear text for the human. With the script above, a changed host key shows up as a WARNING.

We use NSClient++. To make this work, I had to make the following changes to NSC.ini:

  • enable NRPEListener.dll by uncommenting the entry in the modules section
  • set the NRPE port by uncommenting the port line in the NRPE section
  • set use_ssl=1 in the same section
  • Add the following entry to the NRPE Handlers section

Note: make sure all the other samples are commented out in that section unless you are using them.
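
For illustration only, the relevant parts of NSC.ini might look something like the sketch below. The section and key names follow the sample NSC.ini that ships with NSClient++ 0.3.x, but the port value (5666 is the usual NRPE default) and the script file name testhostkey.bat are assumptions, so substitute whatever you actually use.

```ini
[modules]
; Uncommented so that NSClient++ answers NRPE requests
NRPEListener.dll

[NRPE]
; Assumed value: 5666 is the conventional NRPE port
port=5666
use_ssl=1

[NRPE Handlers]
; check_hostkey is the command name Nagios will ask for.
; The script name and path are hypothetical; point this at your own batch file.
check_hostkey=\netadmin\nrpe\testhostkey.bat
```

Restart the NSClient++ service after editing the file so the changes are picked up.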

On the Nagios server, use the check_nrpe command to run check_hostkey against this server on a schedule and notify someone when there is a problem.
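
As a rough sketch of the Nagios side, object definitions along these lines should do the job. The host name winscp-server and the generic-service template are assumptions (the template comes from the stock Nagios sample configuration), and your installation may already define a check_nrpe command, in which case only the service definition is needed.

```cfg
# Generic NRPE command: $ARG1$ is the name of the remote handler to run
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }

# Hypothetical service that runs the check_hostkey handler on the Windows box
define service{
        use                     generic-service
        host_name               winscp-server
        service_description     SSH Host Key
        check_command           check_nrpe!check_hostkey
        }
```

Before scheduling it, it is worth running check_nrpe by hand from the Nagios server with the same -H and -c arguments to confirm that the output string and exit code come back as expected.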

This is a bit of a house of cards that took me about 90 minutes to piece together, but the point of all this is to show that, with a bit of ingenuity, you can put together a solution to test almost anything with Nagios.