Thursday, December 12, 2013

Nagios solution on Fedora 18/19


Abstract:

  • Install Nagios core V3.5.1 with web support on any Fedora box [Monitoring Server]
  • Install Nagios core V3.5.1 on any other supporting server [Nagios Client]
  • Install all standard plugins on the Nagios client
  • Install the NRPE plugin on the Monitoring Server and the Nagios client

Nagios installation and deployment

Nagios is industry-standard host monitoring software with a very flexible architecture. Thanks to its wide variety of plugins, almost anything on the local machine or on a Nagios client can be monitored. Users can also set thresholds so that users or administrators are alerted when an event occurs.

Installation steps


1 - Install the Apache web server with PHP on the Monitoring Server
yum install httpd php

2 - Create a new nagios user account and give it a password (Monitoring Server).
/usr/sbin/useradd -m nagios
passwd nagios
3 - Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd apache
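Steps 1 to 3 can be collected into one small script for the Monitoring Server. This is a minimal sketch, assuming it is run as root on a Fedora box with yum; the run/DRY_RUN helper and the function name are illustrative additions, not part of Nagios or Fedora. With DRY_RUN=1 it only prints the commands it would execute, and it skips creating the user or group if they already exist.

```shell
# Sketch of steps 1-3 on the Monitoring Server (run as root).
# DRY_RUN=1 prints each command instead of executing it; DRY_RUN and the
# function names below are hypothetical helpers for this example.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

prepare_monitoring_server() {
  run yum -y install httpd php                     # step 1: Apache + PHP
  if ! id nagios >/dev/null 2>&1; then             # step 2: only if missing
    run /usr/sbin/useradd -m nagios
    echo "Now set a password with: passwd nagios"  # passwd is interactive
  fi
  if ! getent group nagcmd >/dev/null 2>&1; then   # step 3: command group
    run /usr/sbin/groupadd nagcmd
  fi
  run /usr/sbin/usermod -a -G nagcmd nagios
  run /usr/sbin/usermod -a -G nagcmd apache
}
```

For example, source the file and call `DRY_RUN=1 prepare_monitoring_server` to review the plan, then call `prepare_monitoring_server` to apply it. The password itself is still set interactively with passwd nagios.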

4 - Then install Nagios core and its plugins on both the Monitoring Server and the client.
yum install nagios
yum install nagios-plugins-all.x86_64
yum install nagios-plugins-nrpe.x86_64
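5 - Verify the sample Nagios configuration files.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
(These are the paths of a source install under /usr/local/nagios. With the Fedora nagios package installed in step 4, the equivalent is usually /usr/sbin/nagios -v /etc/nagios/nagios.cfg.)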
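Steps 4 and 5 can be scripted in the same way. A minimal sketch, assuming the Fedora package installs the binary as /usr/sbin/nagios with the main configuration in /etc/nagios/nagios.cfg; both can be overridden through NAGIOS_BIN and NAGIOS_CFG (hypothetical variable names), e.g. for the /usr/local/nagios paths of a source install. nagios -v exits with a non-zero status when it finds configuration errors, and the function passes that status on.

```shell
# Sketch of steps 4-5 (run as root on both hosts).
# DRY_RUN=1 prints commands instead of running them (hypothetical helper).

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_nagios() {
  run yum -y install nagios nagios-plugins-all.x86_64 nagios-plugins-nrpe.x86_64
}

verify_nagios_config() {
  local bin="${NAGIOS_BIN:-/usr/sbin/nagios}"
  local cfg="${NAGIOS_CFG:-/etc/nagios/nagios.cfg}"
  local status
  if run "$bin" -v "$cfg"; then
    echo "Configuration OK: $cfg"
  else
    status=$?                 # exit status of the failed nagios -v run
    echo "Configuration check failed for $cfg" >&2
    return "$status"
  fi
}
```

Running `verify_nagios_config` after every configuration change catches syntax errors before the service is restarted.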
6 - Fedora ships with SELinux (Security-Enhanced Linux) installed and in Enforcing mode by default. This may cause permission errors when using IPMI plugins and system log viewer plugins, so execute the steps below on the Monitoring Server.
See if SELinux is in Enforcing mode:
getenforce
Put SELinux into Permissive mode (this setting lasts until the next reboot):
setenforce 0
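The check and the switch can be combined so that setenforce only runs when it is needed. A minimal sketch: it reads the mode from getenforce unless a mode is passed as the first argument (an illustrative hook, handy for testing), and DRY_RUN=1 prints instead of executing. The function name is hypothetical. Because setenforce 0 does not survive a reboot, the persistent setting is the SELINUX= line in /etc/selinux/config.

```shell
# Put SELinux into Permissive mode only if it is currently Enforcing.
# $1 optionally overrides the detected mode (hypothetical test hook).

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

ensure_selinux_permissive() {
  local mode
  mode=${1:-$(getenforce 2>/dev/null || echo Unknown)}
  case "$mode" in
    Enforcing)
      run setenforce 0            # temporary: reverts at the next reboot
      ;;
    Permissive|Disabled)
      echo "SELinux is already $mode; nothing to do"
      ;;
    *)
      echo "Could not determine SELinux mode (got: $mode)" >&2
      return 1
      ;;
  esac
}
```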

Start the Nagios services

After Nagios is installed and configured, execute the commands below to start the httpd server and the Nagios service on the Monitoring Server.
systemctl start nagios.service
systemctl start httpd.service
Then open a browser and go to http://localhost/nagios. This opens the Nagios main page, which links to the different monitoring entities, e.g. services and hosts. Clicking these links displays the current system status and parameters in the browser window.
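Starting the services and checking that the web UI answers can also be scripted. A minimal sketch, assuming systemd and curl are available; `systemctl enable` is an addition to the steps above that makes both services start at boot, and the function names are hypothetical. The URL probe only prints the HTTP status code: the Nagios pages are normally password-protected, so a 401 still means Apache is serving them, while a 404 suggests the Nagios web configuration is not loaded.

```shell
# Start (and enable at boot) nagios and httpd, then probe the web UI.
# DRY_RUN=1 prints commands instead of running them (hypothetical helper).

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

start_nagios_services() {
  local svc
  for svc in nagios.service httpd.service; do
    run systemctl start "$svc" || return 1
    run systemctl enable "$svc" || return 1
  done
}

# Print the HTTP status code returned by the Nagios web UI.
nagios_ui_status() {
  curl -s -o /dev/null -w '%{http_code}' "${1:-http://localhost/nagios/}"
}
```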

Plugins and verification

The power of Nagios lies in its plugins. A few standard plugins that monitor load, hard disk, ping and more are installed on the Nagios client box with the command yum install nagios-plugins-all.x86_64. All of these plugins were tested successfully.
The installed plugins include:
check_disk - monitors the mounted file systems
check_procs - monitors the running processes
check_swap - checks the swap space of the local system
check_users - reports the number of users currently logged in
check_nrpe - runs on the Nagios monitoring machine and contacts the nrpe daemon on the remote server, which in turn executes the commands defined in nrpe.cfg (explained in more detail later in this document)

In the browser window, this information is available with a mouse click. To run a plugin from the command line, first find out where the plugins are installed; on the Nagios client box they are in /usr/lib64/nagios/plugins/. Each plugin is then executed with the appropriate parameters. Sample commands are listed below; they can be run on either the Monitoring Server or the client.
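All of these plugins follow the same simple contract, which is why they can be run by hand: a plugin prints one line of status text (optionally followed by `|` and performance data) and reports its result through the exit code, 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. After running one, `echo $?` shows that code. The sketch below is a hypothetical plugin in the spirit of check_users; it is not the real check_users, and the optional third argument that overrides the user count exists only to make it testable.

```shell
# Hypothetical check_users-like plugin illustrating the plugin contract.
# Usage: check_users_sketch WARN CRIT [COUNT]
# Alerts when the number of logged-in users is greater than WARN / CRIT.
check_users_sketch() {
  local warn="$1" crit="$2" count n perf
  count=${3:-$(who | wc -l | tr -d ' ')}
  for n in "$warn" "$crit" "$count"; do
    case $n in
      ''|*[!0-9]*)                      # missing or non-integer argument
        echo "USERS UNKNOWN - usage: check_users_sketch WARN CRIT [COUNT]"
        return 3 ;;
    esac
  done
  perf="users=$count;$warn;$crit;0"     # label=value;warn;crit;min
  if [ "$count" -gt "$crit" ]; then
    echo "USERS CRITICAL - $count users currently logged in | $perf"
    return 2
  elif [ "$count" -gt "$warn" ]; then
    echo "USERS WARNING - $count users currently logged in | $perf"
    return 1
  fi
  echo "USERS OK - $count users currently logged in | $perf"
  return 0
}
```

Saved as a standalone plugin script, the `return` statements would become `exit` statements so that Nagios sees the code.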

Note: here -w sets the warning threshold and -c sets the critical threshold.
/usr/lib64/nagios/plugins/check_users -w 5 -c 10
/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
/usr/lib64/nagios/plugins/check_disk -w 20% -c 10%
/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
/usr/lib64/nagios/plugins/check_procs -w 2:2 -c 2:1024 -C portsentry (Warning if there are not exactly two processes with command name portsentry; critical if < 2 or > 1024 processes)
/usr/lib64/nagios/plugins/check_procs -w 10 -a '/usr/local/bin/perl' -u root (Warning if > 10 processes with command arguments containing '/usr/local/bin/perl' and owned by root)
/usr/lib64/nagios/plugins/check_procs -w 50000 -c 100000 --metric=VSZ (Alert if the VSZ of any process is over 50K or 100K)
/usr/lib64/nagios/plugins/check_procs -w 10 -c 20 --metric=CPU (Alert if the CPU usage of any process is over 10% or 20%)
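Thresholds such as -w 2:2 -c 2:1024 use the range syntax from the Nagios plugin development guidelines: a plain number N means alert if the value is below 0 or above N, `start:end` means alert if the value is outside start..end, an empty end means no upper limit, `~` as start means no lower limit, and a leading `@` inverts the range (alert when inside). The sketch below re-implements that rule for integers only so the check_procs examples can be reasoned about; it is an illustration with hypothetical function names, not code taken from the plugins.

```shell
# Integer-only sketch of the Nagios threshold range rules.
# in_alert_range VALUE RANGE -> exit status 0 if VALUE should raise an alert.
in_alert_range() {
  local value="$1" range="$2" invert=0 start end inside=1
  case $range in
    @*) invert=1; range=${range#@} ;;      # "@" flips the meaning
  esac
  case $range in
    *:*) start=${range%%:*}; end=${range#*:} ;;
    *)   start=0; end=$range ;;            # "N" is shorthand for "0:N"
  esac
  # "~" (or an empty start) means no lower bound; empty end means no upper bound.
  if [ -n "$start" ] && [ "$start" != "~" ] && [ "$value" -lt "$start" ]; then
    inside=0
  fi
  if [ -n "$end" ] && [ "$value" -gt "$end" ]; then
    inside=0
  fi
  if [ "$invert" = 1 ]; then
    [ "$inside" = 1 ]                      # "@" range: alert when inside
  else
    [ "$inside" = 0 ]                      # normal range: alert when outside
  fi
}

# Status for a value against warning and critical ranges, checking
# critical before warning.
range_status() {
  local value="$1" warn="$2" crit="$3"
  if in_alert_range "$value" "$crit"; then
    echo CRITICAL; return 2
  elif in_alert_range "$value" "$warn"; then
    echo WARNING; return 1
  fi
  echo OK
}
```

For the portsentry example, `range_status 3 2:2 2:1024` prints WARNING (three processes is not exactly two, but still within 2..1024), while `range_status 1 2:2 2:1024` prints CRITICAL.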

The Power KVM box was verified with all the installed plugins, both through the browser UI and on the command line.

Monitoring a remote server using Nagios

There are different methods for monitoring remote servers with Nagios; NRPE (Nagios Remote Plugin Executor) is the most widely used. NRPE is an add-on that is installed on both the monitoring host and the Nagios client after Nagios itself is installed. It has two main components: check_nrpe, which resides on the Monitoring Server, and the nrpe daemon, which resides on the Nagios client.

check_log plugin

The check_log plugin can be configured to monitor any log file residing on either the Nagios client or the server. To monitor the system logs, the user that runs the check needs sufficient permissions to read them. Modify both commands.cfg and localhost.cfg to define the command and the service. The command definition can look like this:
define command{
command_name check_sys_log
command_line $USER1$/check_log -F /var/log/messages -O /tmp/oldlog -q 'Error - Could not complete SSL handshake'
}
The check_log plugin saves a copy of the monitored log file to a temporary location (the -O file) as part of its initialization. On later runs it compares the current log with the previously saved copy, and if the difference contains the specified error string (-q), it reports it.
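The copy-and-compare mechanism can be sketched in a few lines of shell. This is a simplified re-implementation for illustration, taking the same three inputs as the command definition above (log file, old-copy file, query); it is not the actual check_log source, and its function name and messages are made up. New lines are those that diff reports as present only in the current log; if any of them match the query the sketch reports CRITICAL, and the saved copy is refreshed on every run so each match is reported once.

```shell
# Simplified sketch of check_log's copy-and-compare logic (not the real script).
# Usage: check_log_sketch LOGFILE OLDLOG QUERY   (QUERY is a grep regex)
check_log_sketch() {
  local logfile="$1" oldlog="$2" query="$3" newlines count last
  if [ ! -e "$oldlog" ]; then
    # First run: only take the baseline copy.
    cp "$logfile" "$oldlog" || return 3
    echo "OK - baseline copy of $logfile created"
    return 0
  fi
  # Lines present in the current log but not in the saved copy.
  newlines=$(diff "$logfile" "$oldlog" | sed -n 's/^< //p')
  cp "$logfile" "$oldlog" || return 3         # refresh the baseline
  count=$(printf '%s\n' "$newlines" | grep -c -e "$query" || true)
  if [ "$count" -gt 0 ]; then
    last=$(printf '%s\n' "$newlines" | grep -e "$query" | tail -n 1)
    echo "CRITICAL - ($count) $last"
    return 2
  fi
  echo "OK - no new matches for '$query'"
}
```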

NRPE Configuration

NRPE needs to be installed on both the server and the client side. yum install nagios-plugins-nrpe.x86_64 is the command used to install this plugin (on Fedora it provides check_nrpe; the nrpe daemon for the client is packaged separately as nrpe). Below is an outline of the Monitoring Server and client configuration:
  1. Since nrpe runs under xinetd, install xinetd on the Nagios client.
  2. Open /etc/xinetd.d/nrpe on the Nagios client and add the Monitoring Server IP to the only_from line. This allows check_nrpe on the Monitoring Server to communicate with the Nagios client.
  3. After Nagios is configured on the monitoring host, add the remote execution commands to /usr/local/nagios/etc/nrpe.cfg on the Nagios client. Whenever a user requests information from a Nagios client, check_nrpe asks the client's nrpe daemon to run one of these commands and collects the result.

Some sample commands in the Nagios client's nrpe.cfg:
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_total_procs]=/usr/lib64/nagios/plugins/check_procs -w 150 -c 200
command[check_hda1]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10%
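On the client side, each check_nrpe -c NAME request is answered by looking up command[NAME] in nrpe.cfg, running that command line and sending back its output and exit code. The sketch below imitates that lookup locally, which can help when debugging a definition before testing it over the network. It is a simplification (no networking, no argument handling), the function name nrpe_run_local is hypothetical, and reporting UNKNOWN for an undefined name is this sketch's own choice.

```shell
# Local imitation of how the nrpe daemon resolves a check_nrpe -c NAME request.
# Usage: nrpe_run_local NRPE_CFG NAME   (NAME: letters, digits, underscores)
nrpe_run_local() {
  local cfg="$1" name="$2" cmdline
  # Take the first "command[NAME]=..." line; everything after "=" is the command.
  cmdline=$(sed -n "s/^command\[$name]=//p" "$cfg" | head -n 1)
  if [ -z "$cmdline" ]; then
    echo "UNKNOWN - command '$name' is not defined in $cfg"
    return 3
  fi
  sh -c "$cmdline"     # plugin output goes to stdout; its exit code is returned
}
```

For example, `nrpe_run_local /usr/local/nagios/etc/nrpe.cfg check_total_procs` on the client should print the same line that check_nrpe later shows on the Monitoring Server.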

Then execute the commands below from the Nagios server, replacing <client_IP> with the address of the Nagios client:
  • check_nrpe -H <client_IP> -c check_total_procs
  • check_nrpe -H <client_IP> -c check_hda1
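To run several remote checks in one go and see each result by name, a small loop around check_nrpe is enough. A minimal sketch, assuming check_nrpe is installed in /usr/lib64/nagios/plugins on the Monitoring Server (the plugin directory used on this box) and that the command names exist in the client's nrpe.cfg; CHECK_NRPE and the function names are hypothetical, and CHECK_NRPE can be overridden.

```shell
# Run several NRPE commands against one client and label each result.
# Usage: run_remote_checks HOST COMMAND...
CHECK_NRPE=${CHECK_NRPE:-/usr/lib64/nagios/plugins/check_nrpe}

# Map a plugin exit code to its Nagios state name.
status_name() {
  case $1 in
    0) echo OK ;; 1) echo WARNING ;; 2) echo CRITICAL ;; *) echo UNKNOWN ;;
  esac
}

run_remote_checks() {
  local host="$1" cmd output rc worst=0
  shift
  for cmd in "$@"; do
    rc=0
    output=$("$CHECK_NRPE" -H "$host" -c "$cmd") || rc=$?
    echo "$cmd: $(status_name "$rc") ($output)"
    if [ "$rc" -gt "$worst" ]; then worst=$rc; fi
  done
  return "$worst"              # highest exit code seen
}
```

For example, `run_remote_checks <client_IP> check_users check_total_procs check_hda1` prints one labelled line per command.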
Note:
/etc/hosts.allow, /etc/services and /etc/xinetd.d/nrpe need to be configured appropriately for NRPE to work.
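Two of those files can be checked quickly from the shell. A minimal sketch, assuming the conventional NRPE port 5666/tcp and the whitespace-separated `only_from = ...` layout of a typical /etc/xinetd.d/nrpe; it only looks for an exact IP address (no subnets or host names), and the function names are hypothetical.

```shell
# Sanity checks for the NRPE-under-xinetd setup on the Nagios client.

# Succeeds if FILE (default /etc/services) maps "nrpe" to 5666/tcp.
services_has_nrpe() {
  grep -Eq '^nrpe[[:space:]]+5666/tcp' "${1:-/etc/services}"
}

# Succeeds if an only_from line in the xinetd file lists IP exactly.
xinetd_allows() {
  local file="$1" ip="$2"
  awk -v ip="$ip" '
    $1 == "only_from" {
      for (i = 3; i <= NF; i++)      # fields after "only_from" and "=" / "+="
        if ($i == ip) found = 1
    }
    END { exit (found ? 0 : 1) }' "$file"
}
```

For example, `xinetd_allows /etc/xinetd.d/nrpe <Monitoring Server IP> && echo allowed` on the client confirms step 2 of the outline.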

SNMP plugin configuration

Add the command below to commands.cfg on the Monitoring Server.
check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o .1.3.6.1.4.1.2021.2.1.5.1 -w 100 -c 200
Here the object ID 1.3.6.1.4.1.2021.2.1.5.1 needs to be enabled on the Nagios client machine with the proper community string and permissions.
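Before wiring this into Nagios, it helps to query the OID by hand from the Monitoring Server. The OID .1.3.6.1.4.1.2021.2.1.5.1 is prCount.1 in the UCD-SNMP-MIB, i.e. the number of running processes matched by the first `proc` directive in the client's snmpd.conf, which also needs an `rocommunity` line for the chosen community string. A minimal sketch, assuming net-snmp's snmpget and SNMP v2c; the function names and the SNMPGET override are hypothetical. The classification mirrors -w 100 -c 200 (alert when the count exceeds 100 or 200).

```shell
# Query prCount.1 (UCD-SNMP-MIB) by hand and classify it like -w 100 -c 200.
SNMPGET=${SNMPGET:-snmpget}
PR_COUNT_OID=.1.3.6.1.4.1.2021.2.1.5.1

# Prints the bare value (-Oqv: quick print, value only).
# Usage: snmp_proc_count HOST COMMUNITY
snmp_proc_count() {
  "$SNMPGET" -v 2c -c "$2" -Oqv "$1" "$PR_COUNT_OID"
}

# Usage: classify_count VALUE [WARN] [CRIT]
classify_count() {
  local value="$1" warn="${2:-100}" crit="${3:-200}"
  case $value in
    ''|*[!0-9]*) echo UNKNOWN; return 3 ;;   # e.g. "No Such Instance ..."
  esac
  if [ "$value" -gt "$crit" ]; then echo CRITICAL; return 2; fi
  if [ "$value" -gt "$warn" ]; then echo WARNING; return 1; fi
  echo OK
}
```

For example, `classify_count "$(snmp_proc_count <client_IP> <community>)"` shows what check_snmp would report for the same thresholds.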

Other configurations at Monitoring server for SNMP

../objects/.cfg: Add 'check_command check_snmp_proc! '
../objects/commands.cfg: Add 'check_snmp_proc' command definition
/etc/nagios/objects.cfg: Add service_description SNMPD_PROC

Note: another utility, snmptt, needs to be installed to translate the useful SNMP traps into Nagios events.

If anybody needs more info, ping me here with a comment.