[racktables-users] RackTables + Nagios + Bacula (was: How to handle virtual machines)

  • From: Frank Altpeter <frank-racktables@xxxxxxxxxxx>
  • To: racktables-users@xxxxxxxxxxxxx
  • Date: Thu, 5 Feb 2009 18:46:29 +0100

Hi there,

Since there were some questions about how I combined RackTables with
Nagios and Bacula, here is a short overview of what I did.
I should mention that it is customized to our internal structure and cannot
be used in another environment without modification :)


- check_back.pl
This is a small script, added as a Nagios service check, which reads the
host names from the local Nagios configuration and compares them with the
MySQL table(s) of one or more systems (Bacula and RackTables in my case).
Any host found in the database that is not found in the Nagios config is
reported, so IT can see whether every system recorded in the RackTables DB
or configured to be backed up is also monitored.

- check_bacula_clientstatus.pl
This is a Nagios check that queries the Bacula database for the last
backup made and raises a warning and/or critical alarm if the backup is
overdue or failed. It sets a link in the description field which leads
directly to the report page of the bacula-webconsole (bweb).

- check_racktable.pl
This is my check to see whether the monitored system is correctly entered
in RackTables. It checks that the object exists and queries its
information, so Nagios can show where the object is located (e.g. rack
location, switch port, uplink switch name, power connection, etc.). So, if
a system goes red, you can directly see where to look if you need to go to
the machine console. Of course it also defines a link in the description
field to jump directly to the corresponding RackTables asset entry.

- generate_configparts.pl
My first attempt at auto-generating Nagios configuration parts based on
MySQL database queries. This project is quite new, but more will follow
since I find it interesting :)
Based on an array of database configurations, this script creates
configuration parts for Nagios which can then be used in service checks.
It more or less just creates hostgroups with n members each, based on
SQL queries. My current implementation asks Bacula for all configured
clients and puts them in a hostgroup backuphosts, which is used in the
service checks bacula_clientstatus and bacula_fd to ensure that all
Bacula clients are monitored. So I can throw away the mail functions of
Bacula. The second entry fetches hosts (object names) from the RackTables
DB which have a custom field set. This field is named "Service" and marks
hosts for which we manage software updates. So, when I set the "Service"
field for a specific host to "Premium Updates" (which is sales speak for
"we cover software updates within 24 hours after announcement"), Nagios
automatically puts this host in a specific check, which goes via NRPE to
the host and queries the available updates (via zypper, portsnap, apt-get,
etc.). This could of course easily be expanded to n queries to build a
complex auto-generated Nagios config based on n custom fields from the
RackTables DB. Oh, and don't forget that this needs a
"cfg_dir=/etc/nagios/config" in nagios.cfg to work :)
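
To illustrate the hostgroup generation described for generate_configparts.pl,
here is a minimal sketch that renders one hostgroup from a list of names. It
is not the script itself: the sub name render_hostgroup and the sample host
names are made up for illustration, and the names are passed in directly
instead of coming from a database query. Unlike the script, which prints one
members line per host, this sketch joins all hosts into a single
comma-separated members line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Render one Nagios hostgroup definition from a list of host names.
# In the real script the names come from a SQL query; here they are passed in.
sub render_hostgroup {
    my ( $group, $descr, @hosts ) = @_;
    my $out = "define hostgroup {\n";
    $out .= "    hostgroup_name   GENERATED_$group\n";
    $out .= "    alias            $descr\n";
    # One comma-separated members line, omitted when there are no hosts
    $out .= "    members          " . join( ",", sort @hosts ) . "\n" if @hosts;
    $out .= "}\n";
    return $out;
}

# Hypothetical sample data standing in for the bacula Client table
print render_hostgroup( 'backuphosts', 'Hosts that are backed up by bacula',
    'web2.example.com', 'db1.example.com' );
```

A service definition can then refer to the hostgroup by its generated name
(GENERATED_backuphosts in this example).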

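The comparison that check_back.pl performs can be sketched without a
database like this. The sub name missing_hosts and all host names are
hypothetical; the filter rules (placeholder pattern, skip list, lookup in
the Nagios host table) follow the script further down.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the database names that Nagios does not know about.
# $nagios and $skip are hash references keyed by host name;
# $db_names is an array reference (in the real check it comes from SQL).
sub missing_hosts {
    my ( $nagios, $skip, $db_names ) = @_;
    my @missing = grep {
        !/(platzhalter|\.DUPE)/i      # drop placeholder or duplicate entries
            && !$skip->{$_}           # drop explicitly skipped hosts
            && !exists $nagios->{$_}  # drop hosts Nagios already monitors
    } @$db_names;
    @missing = sort @missing;
    return @missing;
}

# Hypothetical sample data
my %nagios = map { $_ => 1 } qw( web1.example.com db1.example.com );
my %skip   = ( 'notebook1.example.com' => 1 );
my @db     = qw( web1.example.com db1.example.com mail1.example.com
                 notebook1.example.com platzhalter-17 );

my @missing = missing_hosts( \%nagios, \%skip, \@db );
if( @missing ) {
    print "WARNING: Missing " . scalar(@missing) . " hosts at nagios: "
        . join( ", ", @missing ) . "\n";
} else {
    print "OK: All hosts are monitored\n";
}
```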

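The state logic of check_bacula_clientstatus.pl can be tried out in
isolation as well. This is only a sketch: the sub name backup_state and the
sample rows are invented, while the rules (OK only for a recent job with
status T or R, CRITICAL for an old job or status f) and the default
thresholds of 26 and 50 hours are taken from the script further down. The
status is matched as a single character here, which is an assumption about
the JobStatus column.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %CODE = ( 0 => 'OK', 1 => 'WARNING', 2 => 'CRITICAL', 3 => 'UNKNOWN' );

# Map backup age (seconds) and Bacula job status to a Nagios state.
sub backup_state {
    my ( $age, $status, $warn_hours, $crit_hours ) = @_;
    # OK only if the job is recent AND completed (T) or is running (R)
    my $state = ( $age < $warn_hours * 3600 && $status =~ /^[TR]$/ ) ? 0 : 1;
    # Too old, or a fatal error (f): advance to CRITICAL
    $state = 2 if $age > $crit_hours * 3600;
    $state = 2 if $status eq 'f';
    return $state;
}

# Hypothetical sample rows: [ age in seconds, JobStatus ]
for my $row ( [ 3600, 'T' ], [ 30 * 3600, 'T' ], [ 60 * 3600, 'T' ], [ 3600, 'f' ] ) {
    my ( $age, $status ) = @$row;
    my $state = backup_state( $age, $status, 26, 50 );
    printf "%-8s age=%dh status=%s\n", $CODE{$state}, $age / 3600, $status;
}
```
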
Well, that's it from me. Hope this helps someone :)
But, as always, no warranty - and it's no great magic... just a little
Perl scripting from an admin who has automation as a hobby :)


 
Le deagh dhùraghd,

        Frank Altpeter

-- 
FA-RIPE *** http://www.altpeter.de/
|   If you fall and break your legs, don't come running to me. -Samuel
|   Goldwyn

#!/usr/bin/env perl
# $Id: check_back.pl 2499 2009-01-27 12:11:42Z freddy $
#
# WARNING:
# This script is managed by /etc/cron.daily/autoupdate - DO NOT MODIFY HERE
#

use POSIX;
use DBI;
use DBD::mysql;

%options = (
        '-verbose'      => "Give verbose output about what I'm doing",
        '-rt'           => "Check against the RackTables database",
        '-bacula'       => "Check against the bacula database",
);

# Enter hosts to be skipped in this check (mostly notebooks)
%skip = (
        'hostname1.domain.tld' => 1,
        'hostname2.domain.tld' => 1,
        );

use Getopt::Long;
GetOptions( keys %options );

if( $opt_rt ) {
    $checktype = 'RackTables assets';
    $dbuser = 'USERNAME';
    $dbpass = 'PASSWORD';
    $dbhost = 'HOSTNAME';
    $dbname = 'DATABASENAME';
    $dbquery = "SELECT name from RackObject WHERE objtype_id in ( 4, 8 )";

} elsif( $opt_bacula ) {
    $checktype = 'Bacula clients';
    $dbuser = 'USERNAME';
    $dbpass = 'PASSWORD';
    $dbhost = 'HOSTNAME';
    $dbname = 'DATABASENAME';
    $dbquery = "SELECT Name AS name from Client";
} else {
    print "ERROR: One check option must be given\n";
    exit 3;
}

%EXIT = (
        '0'     => 'OK',
        '1'     => 'WARNING',
        '2'     => 'CRITICAL',
        '3'     => 'UNKNOWN',
);

# Simple but effective...
my $dsn = "DBI:mysql:database=$dbname;host=$dbhost;port=3306";
my $nagiosdir = '/etc/nagios/config';
my $nagioscmd = "fgrep host_name ${nagiosdir}/hosts.*.cfg";
open( NAGIOS, "$nagioscmd |" );
while( <NAGIOS> ) {
        chomp;
        ($file, $key, $value ) = split /\s+/;

        warn "*** Found nagios object '$value'\n" if $opt_verbose;
        $nagios{$value} = $value;
}
close( NAGIOS );

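my $exit = 1;

warn "*** Connecting to database ${dbname} on ${dbhost} ...\n" if $opt_verbose;
my $dbh = DBI->connect( $dsn, $dbuser, $dbpass );
if( ! $dbh ) {
    # Report UNKNOWN instead of failing later on an undefined handle
    print "UNKNOWN: Cannot connect to database: $DBI::errstr\n";
    exit 3;
}

# The query takes no parameters, so it is prepared as-is
my $sth = $dbh->prepare( $dbquery );
$sth->execute();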

#
# Now, for every item found at the db, check if it's found at nagios config
#
while( my $ref = $sth->fetchrow_hashref() ) {
    if( ( $ref->{name} =~ /(platzhalter|\.DUPE)/i ) || $skip{$ref->{name}} ) {
        warn "*** Ignoring object '" . $ref->{name} . "'\n" if $opt_verbose;
        next;
    }
    if( defined( $nagios{$ref->{name}} ) ) {
            warn "*** Object ok: " . $ref->{name} . "\n" if $opt_verbose;
    } else {
            warn "*** MISSING object: " . $ref->{name} . "\n" if $opt_verbose;
            push( @missing, $ref->{name} );

    }
}

if( @missing ) {
        $exit = 1;
        $exit = 2 if( $#missing > 10 );
        $message = "Missing " . ( $#missing + 1 ) . " $checktype at nagios: " .
                join( ", ", sort @missing ) . "\n";
} else {
        $message = "All $checktype are monitored\n";
        $exit = 0;
}

print $EXIT{$exit} . ": " . $message;
exit $exit;
#!/usr/bin/env perl
# $Id: check_bacula_clientstatus.pl 2426 2009-01-13 10:14:09Z freddy $
#

use Getopt::Long;


$opt_warning = 26;      # Default warning after one day + 2 hours
$opt_critical = 50;     # Default critical after two days + 2 hours

%options = (
        "-hostname=s"   => 'Check hostname <s> for backup status',
        "-warning=i"    => "Advance to WARNING status after <i> hours [$opt_warning]",
        "-critical=i"   => "Advance to CRITICAL status after <i> hours [$opt_critical]",
        "-help"         => "Show this help text",
    );

GetOptions( keys %options );

if( ! $opt_hostname ) {
    warn "Error: Value for option -hostname must not be empty\n";
}   
if( $opt_critical <= $opt_warning ) {
    warn "Error: Value for CRITICAL ($opt_critical) must be larger than WARNING ($opt_warning)\n";
}

if( $opt_help || ! $opt_hostname || $opt_critical <= $opt_warning ) {
    warn "\nUsage: $0 [" . join( "] [", keys %options ) . "]\n";
    warn "\nAvailable options:\n";
    warn map( sprintf( "  %-15s %s\n", $_, $options{$_} ), keys %options );
    exit 3;
}


%CODE = (
        '0'     => "OK",
        '1'     => "WARNING",
        '2'     => "CRITICAL",
        '3'     => "UNKNOWN",
);

$warnSeconds = $opt_warning * 3600;
$critSeconds = $opt_critical * 3600;

use DBI;
my $dbh = DBI->connect('DBI:mysql:database=DATABASENAME;host=HOSTNAME', 
'USERNAME', 'PASSWORD');
die("Cannot open mysql conn") if(!$dbh);

# Use a placeholder for the hostname instead of interpolating it into the SQL
my $sth = $dbh->prepare( "SELECT UNIX_TIMESTAMP(NOW()) - Job.JobTDate AS JobTDate, "
    . "Job.Name AS Name, Job.JobStatus AS JobStatus, "
    . "Status.JobStatusLong AS JobStatusLong "
    . "FROM Job LEFT JOIN Status ON Job.JobStatus = Status.JobStatus "
    . "WHERE Name = ? ORDER BY JobTDate LIMIT 1" );

$sth->execute( $opt_hostname );
$exit = 1;
$return = 'Client ' . $opt_hostname . ' not found in database or not yet backed up';

while( $row = $sth->fetchrow_hashref ) {
    $JobTDate = sprintf( "%.2d:%.2dh", $row->{JobTDate} / 3600,
        $row->{JobTDate} / 60 % 60 );
    $return = sprintf( "%s since %s (%s seconds)", $row->{JobStatusLong},
        $JobTDate, $row->{JobTDate} );
    $link = "<a href='http://HOSTNAME_OF_BWEB/report.php?default=1&server=$opt_hostname'>(Details)</a>";

    # We want OK only if latest backup status is new
    # AND return status is "T" (Completed Successfully) or "R" (Running)
    # Everything else must at least select a WARNING state

    if( ( $row->{JobTDate} < $warnSeconds ) && $row->{JobStatus} =~ /(T|R)/ ) {
        $exit = 0;
    } else {
        $exit = 1;
    }

    # If the last status is older than critSeconds, advance to CRITICAL 
    $exit = 2 if $row->{JobTDate} > $critSeconds;

    # Advance to CRITICAL if last job was a fatal error
    $exit = 2 if $row->{JobStatus} eq "f";

}

print "$CODE{$exit}: $return $link\n";
exit $exit;
#!/usr/bin/env perl
# $Id: check_racktable.pl 2251 2008-11-27 13:13:30Z freddy $
#
# Script for checking nagios -> racktables dependency
#
# WARNING:
# This script is managed by /etc/cron.daily/autoupdate - DO NOT MODIFY HERE
#

use POSIX;
use DBI;
use DBD::mysql;
use Getopt::Long;

%options = (
        '-verbose'      => "Give verbose output about what I'm doing",
        '-hostname=s'   => "Use <s> as hostname (mandatory)",
        '-textmode'     => "Don't output HTML code",
        '-nosleep'      => "Prevent the sleeping while testing",
);

GetOptions( keys %options );

if( ! $opt_hostname ) {
    exception( 3, "Error: Option -hostname is mandatory" );
}

# RackTables configuration
my $dbuser = 'USERNAME';
my $dbpass = 'PASSWORD';
my $dbhost = 'HOSTNAME';
my $dbname = 'DATABASENAME';

# Nagios ignores STDERR, so we redirect it to STDOUT - we won't see errors
# otherwise
$| = 1;
open STDERR, ">&STDOUT";

# TODO: TESTING We need some sleep() for preventing mysql to block us
$sleep = int( rand(600) );

# How to query the oid, name, etc. from a hostname - contains 1 %s for the
# hostname
my $dbquery = "SELECT id,name,label,barcode,asset_no from RackObject where name = '%s';";

# How to find out at which rack the host is located - contains 1 %s for the oid
my $rtquery = "SELECT id,name FROM Rack WHERE id = ( SELECT DISTINCT(rack_id) 
FROM RackSpace WHERE object_id = %s );";

# How to query the uplink data from an oid - contains 1 %s for the oid
my $uplinkquery = "SELECT Port.name AS Port_name, RemotePort.name AS 
RemotePort_name, RackObject.name AS RackObject_name FROM ( ( ( Port INNER JOIN 
Dictionary ON Port.type = dict_key NATURAL JOIN Chapter ) LEFT JOIN Link ON 
Port.id=Link.porta OR Port.id=Link.portb ) LEFT JOIN Port AS RemotePort ON 
Link.portb=RemotePort.id OR Link.porta=RemotePort.id ) left JOIN RackObject ON 
RemotePort.object_id=RackObject.id where chapter_name = 'PortType' AND 
Port.object_id=%s AND (Port.id != RemotePort.id OR RemotePort.id is null) AND 
RemotePort.name IS NOT NULL AND ( Port.type = 24 OR RemotePort.type = 24 ) 
ORDER BY Port_name;";

my $details = " (<a href='https://HOSTNAME_TO_RACKTABLES/RackTables/?page=object&object_id=%s'>Details</a>)";

my $oid;
my $location;
my $dsn = "DBI:mysql:database=$dbname;host=$dbhost;port=3306";
my $return = "Object not found at RackTables database";
my $exit = 1;

warn "*** Sleeping $sleep seconds to prevent SQL flooding...\n" if $opt_verbose;
sleep $sleep if ! $opt_nosleep;

warn "*** Connecting to database ${dbname} on ${dbhost} ...\n" if $opt_verbose;
my $dbh = DBI->connect( $dsn, $dbuser, $dbpass, { PrintError => 0, RaiseError 
=> 0 } ) || exception( 3, "Cannot connect to database" );

my $sth = $dbh->prepare( sprintf( $dbquery, $opt_hostname ) ) || exception( 3, 
"Cannot prepare statement dbquery" );
$sth->execute() || exception( 3, "Cannot execute statement dbquery" );

while( my $ref = $sth->fetchrow_hashref() ) {
    warn "*** Found object id " . $ref->{id} . "\n" if $opt_verbose;
    if( $ref->{name} eq $opt_hostname ) {
        $oid = $ref->{id};
        warn "*** Verified object id " . $ref->{id} . " to be our host ...\n" 
if $opt_verbose;
        $return = "Found at RackTables without rack allocation";
        $exit = 0;
        last;
    }

}

if( $oid ) {
    $sth = $dbh->prepare( sprintf( $rtquery, $oid ) )
        || exception( 3, "Cannot prepare statement rtquery" );
    $sth->execute || exception( 3, "Cannot execute statement rtquery" );

    while( my $rtref = $sth->fetchrow_hashref() ) {
        $return = "Found at rack " . ( $opt_textmode ? $rtref->{name} : 
"<b><u>" . $rtref->{name} . "</u></b>" );
        $exit = 0;
    }
}

### Try to get uplink description
# Returns: Port_name, RemotePort_name, RackObject_name

if( $oid ) {
    $sth = $dbh->prepare( sprintf( $uplinkquery, $oid ) ) || exception( 3, 
"Cannot prepare statement uplinkquery" );
    warn "*** Executing query: " . sprintf( $uplinkquery, $oid ) . "\n" if 
$opt_verbose;
    $sth->execute || exception( 3, "Cannot execute statement uplinkquery" );

    my $tmpl;

    warn "*** Found " . $sth->rows() . " connections\n" if $opt_verbose;

    if( $sth->rows() == 1 ) {
        $tmpl = " connected via %s to %s Port %s";
    } elsif( $sth->rows() > 1 ) {
        $tmpl = "%s:%s:%s";
        $return .= " with " . $sth->rows() . " links: ";
    } else {
        $return .=", missing uplink documentation";
    }


    while( my $rtref = $sth->fetchrow_hashref() ) {

        warn sprintf( "*** Found connection: $tmpl\n", $rtref->{Port_name}, 
$rtref->{RackObject_name}, $rtref->{RemotePort_name} ) if $opt_verbose;

        $rtref->{Port_name} =~ s/ //g;
        $rtref->{RemotePort_name} =~ s/ //g;
        push( @returns, sprintf( $tmpl, $rtref->{Port_name}, 
$rtref->{RackObject_name}, $rtref->{RemotePort_name} ) );
        $exit = 0;
    }
    $return .= join( ", ", @returns );
}

warn "*** Disconnecting from database ...\n" if $opt_verbose;
$sth->finish;
$dbh->disconnect();

warn "*** Generating output for nagios ...\n" if $opt_verbose;
print "$return";
print sprintf( $details, $oid ) if $oid && ! $opt_textmode;
print "\n";

exit $exit;


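sub exception {
    my ( $code, @text ) = @_;
    # $sth may not exist yet (e.g. missing -hostname or failed connect),
    # so report the DBI-wide error string instead of $sth->errstr
    my $err = defined $DBI::errstr ? " ($DBI::errstr)" : "";
    print join( " ", @text ) . $err . "\n";
    exit $code;
}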
#!/usr/bin/env perl
# $Id: generate_configparts.pl 2525 2009-01-30 13:13:53Z freddy $
#
# Generic configuration generator for nagios
#

use DBI;
use DBD::mysql;
use File::Copy;
use warnings;

$confname = "/etc/nagios/config/GENERATED_%s.cfg";
%config = (
        'servicehosts'  => {
                'descr' => 'Hosts that are in service',
                'dbuser' => 'USERNAME',
                'dbpass' => 'PASSWORD',
                'dbhost' => 'RACKTABLES_HOSTNAME',
                'dbname' => 'DATABASE',
                'dbquery' => 'SELECT name FROM RackObject where id in ( SELECT 
object_id FROM AttributeValue WHERE attr_id = 10001 AND uint_value in ( 50024, 
50025, 50026, 50027 ) );',
        },
        'backuphosts' => {
                'descr' => 'Hosts that are backed up by bacula',
                'dbuser' => 'USERNAME',
                'dbpass' => 'PASSWORD',
                'dbhost' => 'HOSTNAME',
                'dbname' => 'BACULA_DATABASE',
                'dbquery' => 'SELECT name FROM Client;',
        },
);

foreach $loop ( keys %config ) {
    my @hosts;
    my $configfile = sprintf( "$confname", $loop );
    my $dsn = 
"DBI:mysql:database=$config{$loop}{'dbname'};host=$config{$loop}{'dbhost'};port=3306";
    my $dbh = DBI->connect( $dsn, $config{$loop}{'dbuser'},
        $config{$loop}{'dbpass'}, { PrintError => 0, RaiseError => 0 } )
        || die "Cannot connect to database";
    my $sth = $dbh->prepare( $config{$loop}{'dbquery'} )
        || die "Cannot prepare statement dbquery";
    $sth->execute() || die "Error: Cannot execute statement dbquery";

    while( my $ref = $sth->fetchrow_hashref() ) {
        push( @hosts, $ref->{name} );
    }


    copy( "$configfile", "$configfile.bak" ) if -r "$configfile";

    open( CONF, "> $configfile" ) && select CONF;

    print '# $Id: generate_configparts.pl 2525 2009-01-30 13:13:53Z freddy $' . 
"\n";
    print "# WARNING: THIS FILE HAS BEEN CREATED AUTOMATICALLY\n";
    print "# DO NOT MODIFY MANUALLY, YOUR CHANGES WILL BE OVERWRITTEN\n";
    print "# Data source: $config{$loop}{'dbuser'}\@$config{$loop}{'dbhost'}/$config{$loop}{'dbname'}\n";
    print "\n";
    print "define hostgroup {\n";
    print "    hostgroup_name   GENERATED_$loop\n";
    print "    alias            $config{$loop}{'descr'}\n";

    foreach my $list ( sort @hosts ) {
        print "    members              $list\n";
    }
    print "}\n";

    close( CONF );
    select STDOUT;

    $diff = `diff -bc $configfile.bak $configfile`;
    if( $diff ) {
        $reload++;
        $alldiffs .= $diff;
        # Remember the file so it can be restored if the reload fails
        push( @changed, $configfile );
    }
}

if( $reload ) {
    print $alldiffs;
    print "Differences detected in $reload files, reloading nagios\n";
    system( "rcnagios reload" );
    if( $? != 0 ) {
        print "Reload failed, restoring original config files\n";
        # $configfile is scoped to the loop above, so restore from the saved list
        foreach my $file ( @changed ) {
            copy( "$file.bak", "$file" ) if -r "$file.bak";
        }
        system( "cat /var/log/nagios/config.err" );
    }
}
