Nagios stops checking hosts

lichtnet
Joined: 17.10.2013 - 08:57

Dear community,

I have Nagwin installed on Windows Server 2008 R2 Standard with Service Pack 1.

Sporadically, the web interface shows that the last check happened several hours ago. When I reschedule a check, nothing happens. I tried to restart the "nagwin_nagios" service, but the system cannot stop it. The only way to stop it is to kill it by its process ID (Console -> kill PID).

After that I can start the service again and Nagios works fine.
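A stall like this can be detected automatically rather than by noticing it in the web interface. The following is a minimal sketch, assuming the standard Nagios status file format, in which each `hoststatus { ... }` block carries `host_name` and `last_check` (epoch seconds). The function names are hypothetical, and the location of the status file on a Nagwin install is whatever `status_file` in nagios.cfg points to.

```python
import re
import time

# Matches one "hoststatus { ... }" block in the Nagios status file.
_HOST_BLOCK = re.compile(r"hoststatus\s*\{(.*?)\n\s*\}", re.DOTALL)


def parse_host_last_checks(status_text):
    """Return {host_name: last_check_epoch} for every hoststatus block."""
    result = {}
    for block in _HOST_BLOCK.findall(status_text):
        fields = {}
        for line in block.splitlines():
            # Each line looks like "key=value"; split on the first "=" only.
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        if "host_name" in fields and "last_check" in fields:
            result[fields["host_name"]] = int(fields["last_check"])
    return result


def find_stale_hosts(status_text, max_age_seconds, now=None):
    """Return a sorted list of hosts whose last check is older than max_age_seconds."""
    if now is None:
        now = time.time()
    checks = parse_host_last_checks(status_text)
    return sorted(h for h, t in checks.items() if now - t > max_age_seconds)


if __name__ == "__main__":
    sample = (
        "hoststatus {\n\thost_name=web1\n\tlast_check=1000\n\t}\n\n"
        "hoststatus {\n\thost_name=web2\n\tlast_check=9000\n\t}\n"
    )
    # With "now" fixed at 10000, only web1 is older than one hour.
    print(find_stale_hosts(sample, max_age_seconds=3600, now=10000))
```

If every host shows up as stale at the same time, the Nagios process itself has most likely stopped scheduling checks, which matches the symptom described above. A scheduled task could run such a script and raise an alert.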

I already checked the Windows Event Logs and didn't find anything special.

The Nagios error log is full of the message: "/bin/sh: /usr/bin/perl: No such file or directory"

Sometimes I also get these messages:

Use of uninitialized value $message in pattern match (m//) at check_winping.pl line 96, <PINGOUT> line 3.

Use of uninitialized value $message in concatenation (.) or string at check_winping.pl line 136, <PINGOUT> line 3.

Obviously Perl is not installed, but I don't think that is the problem, because the server worked fine for months.

I hope you can help me!

Thanks, Philipp

itefix
Joined: 01.05.2008 - 21:33

You did not say which version you are running or what your setup looks like. It seems that you are running a version without Perl (/usr/bin/perl). Try installing the free edition 1.2.0, which includes Perl. You can also install the latest check_winping to avoid messages like "uninitialized .....".

Also, Nagwin is not necessarily the cause of the problem. It may be the result of another change on your system.

lichtnet
Joined: 17.10.2013 - 08:57

I am using Nagwin version 1.3.0 free, and my configuration checks about 100 hosts and 200 services. Some hosts have winrpe services configured; others are only checked by check-host-alive.

I have cloned the machine and installed version 1.2.0, and it seems to work. The question is: will it keep working in the future?
We need the availability reports at the end of the year, and reinstalling Nagwin every time the problem occurs is not an option.

As far as I know, nothing was changed on the system, so I think the problem is caused by Nagwin.

itefix
Joined: 01.05.2008 - 21:33

Version 1.2.0 is a full-featured distribution that includes Pnp4Nagios, which runs a Perl script for each service and host check. As you have lots of hosts and services in your configuration, this can put too much load on the Nagios process. You can try running Pnp4Nagios in bulk mode to reduce the load. See a related thread for a recipe.
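For orientation, bulk mode as described in the PNP4Nagios documentation looks roughly like the fragment below. This is a sketch, not the recipe from the related thread: the spool-file paths under /var/opt/pnp4nagios/var are assumptions for a Nagwin layout, and the name of any existing per-check command in your configuration may differ. Nagios appends performance data to a file and hands the whole file to process_perfdata.pl every 15 seconds, instead of starting Perl once per check.

```cfg
# nagios.cfg (sketch; spool-file paths are ASSUMPTIONS, adjust to your install)
process_performance_data=1

# Disable the per-check commands if they are set:
#service_perfdata_command=process-service-perfdata
#host_perfdata_command=process-host-perfdata

service_perfdata_file=/var/opt/pnp4nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file

host_perfdata_file=/var/opt/pnp4nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file

# Command definitions (in your commands file)
define command{
       command_name    process-service-perfdata-file
       command_line    /usr/bin/perl /var/opt/pnp4nagios/libexec/process_perfdata.pl --bulk=/var/opt/pnp4nagios/var/service-perfdata
}

define command{
       command_name    process-host-perfdata-file
       command_line    /usr/bin/perl /var/opt/pnp4nagios/libexec/process_perfdata.pl --bulk=/var/opt/pnp4nagios/var/host-perfdata
}
```

Restart the Nagios service after changing the configuration.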

If you get lots of messages like "No such signal: SIGIOT at /var/opt/pnp4nagios/libexec/process_perfdata.pl line 1183.", you can just comment out line 1183 of that script.

A last alternative is to turn off the collection of performance data.
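In a standard Nagios configuration this is controlled by a single directive in nagios.cfg. A minimal sketch, assuming Nagwin uses the stock directive name:

```cfg
# nagios.cfg: stop Nagios from processing performance data at all
process_performance_data=0
```

With this setting Pnp4Nagios no longer receives data, so its graphs stop updating. Restart the Nagios service for the change to take effect.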