Re: monitor rac database services

  • From: "Radoulov, Dimitre" <cichomitiko@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Wed, 15 Jun 2011 18:25:19 +0200


The attachment was blocked, so I renamed it to .txt.


Dimitre

On 15/06/2011 18:20, Radoulov, Dimitre wrote:
Hi,
I don't think there is a straightforward way for retrieving this info.

I wrote a small Perl script, tested against a two-node 11.2 test RAC running on my PC.
The script is definitely not production ready yet :)

When you have time you could try it and let me know if it's working for you.



Regards
Dimitre


On 14/06/2011 16:10, D'Hooge Freek wrote:
Hi,

I'm currently busy writing a Perl check to verify the status of a RAC database service. When the service is down while its target status is "online", a critical alert has to be raised; when the service is running on an "available" instance instead of a preferred instance, a warning has to be raised.

I wrote such a check in the past for 10gR2, but now I need one for 11gR2. As the output formats have changed a lot between 10.2 and 11.2, a large part of the script has to be rewritten (instances also seem to be less tightly coupled to nodes in 11.2). So I was wondering if there is a (supported) way to query the cluster registry without depending on the output formats of srvctl / crsctl, which do not seem to be built for easy use in scripts.

The information I need to retrieve is: which instances and services exist for a database, what the available / preferred instances of a service are, the current state and target of a service, and on which instance a service is currently running.
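
The alert rules described above can be sketched as a small function, assuming the target, the running instances and the preferred instances of a service have already been collected. The name classify_service and its hash keys are hypothetical and not part of srvctl, crsctl or any Oracle module.

```perl
use strict;
use warnings;

# Hypothetical helper: map the collected facts for one service to an alert level.
# The keys (target, running, preferred) are illustrative names only.
sub classify_service {
    my (%svc) = @_;
    my @running   = @{ $svc{running}   || [] };
    my %preferred = map { $_ => 1 } @{ $svc{preferred} || [] };

    # Down while the target says it should be up: critical.
    return 'critical' if $svc{target} eq 'ONLINE' && !@running;

    # Up, but on at least one instance that is not preferred: warning.
    return 'warning' if grep { !$preferred{$_} } @running;

    return 'ok';
}

print classify_service(
    target    => 'ONLINE',
    running   => ['orcl2'],
    preferred => ['orcl1'],
), "\n";
```

Keeping the decision separate from the parsing means only the parsing has to change when the srvctl / crsctl output format changes again.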


Kind regards,

Freek D'Hooge
Uptime
Oracle Database Administrator
email:freek.dhooge@xxxxxxxxx
tel +32(0)3 451 23 82
http://www.uptime.be
disclaimer:www.uptime.be/disclaimer

--
//www.freelists.org/webpage/oracle-l




#!/usr/bin/perl

use warnings;
use strict;
use Getopt::Std;
use File::Basename;
use Sys::Hostname;

my %opts;

getopts('h:d:m:', \%opts);

my $sname = basename $0;
my $host = hostname;
my $level = "warning";


sub usage() {
  print STDERR <<EOU;

        usage:

        $sname -h <grid_home> -d <database_name> [ -m <address1> ]

                -h   grid_home
                -d   database
                -m   send mail alert

EOU
  exit 1;
  }

usage unless defined $opts{h} && defined $opts{d};

my (
  $key,
  $target,
  $flag,
  @instances,
  @services,
  @targets,
  @states,
  $state,
  $prefered,
  $current,
  $mailto,
  $send_mail,
  $message,
  );

if ( defined $opts{m} ) {
  $send_mail = 1;
  $mailto    = $opts{m};
  }

my ( $gh, $db ) = ( $opts{h}, $opts{d} );


open my $srvctl, "$gh/bin/srvctl config database -d $opts{d} |"
  or die "Error executing $gh/bin/srvctl config database -d $db\n";

$|++;

while ( <$srvctl> ) {
  /^Database instances:\s+(.*)/ and @instances = split /,/, $1;
  /^Services:\s+(.*)/ and @services = split /,/, $1;
  }
close $srvctl
  or warn "Error executing $gh/bin/srvctl config database -d $db:  $!\n";

# get instances status

for ( @instances ) {
  my $stat = qx|$gh/bin/srvctl status instance -d $db -i $_|;
  if ( $stat =~ /not running/ ) {
    if ( $send_mail ) {
      $message .= "\n$stat\n";
      $level   = "critical";
      }
    else {
      warn "\n$stat\n";
      }
    }
  }

for my $service ( @services ) {

  # get the target
  my $oraservicename = "ora.$db.\L$service.svc";

  open my $resstate, "$gh/bin/crsctl status resource $oraservicename |"
    or warn "Cannot execute $gh/bin/crsctl status resource $oraservicename\n";

  while ( <$resstate> ) {
    /^TARGET=(.+)/ and $target  = $1;
    /^STATE=(.+)/  and $state   = $1;
    }
   
  close $resstate
    or warn "Close $gh/bin/crsctl status resource $oraservicename\n";

  @targets = $target =~ /(ONLINE|OFFLINE)/g;
  @states = $state =~ /(ONLINE|OFFLINE)/g;

        
  for ( 0 .. $#targets ) {
    if ( $targets[$_] ne $states[$_] ) {
      if ( $send_mail ) {
        $message .= "$service\n\ttarget: $targets[$_]\n\tstate: $states[$_]\n\n";
        }
      else {
        warn "$service\n\ttarget: $targets[$_]\n\tstate: $states[$_]\n\n";
        }
      }
    }



  open my $serviceconf, "$gh/bin/srvctl config service -d $db -s $service |"
    or warn "Cannot execute $gh/bin/srvctl config service -d $db -s $service\n";

  /Preferred instances:\s+(\S.+)/ and $prefered = $1
    while ( <$serviceconf> );

  close $serviceconf
    or warn "Close $gh/bin/srvctl config service -d $db -s $service\n";

  $current = qx|$gh/bin/srvctl status service -d $db -s $service|;

  chomp $current;

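  # TARGET may hold one value per instance (it is split into @targets above),
  # so match ONLINE anywhere instead of comparing the whole string.
  if ( $current =~ /not running/ and $target =~ /\bONLINE\b/ ) {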
    if ( $send_mail ) {
      $message .= $current;
      $level   = "critical";
      }
    else {
      warn "$current\n";
      }
    next;
    }

  $current =~ s/.*is running on instance\S+\s+(\S.+).*/$1/;

  if ( $current ne $prefered ) {
    if ( $send_mail ) {
      $message .= "$service is running on non-preferred instance: $current\n";
      }
    else {
      warn "$service is running on non-preferred instance: $current\n";
      }
    }
  }

# send mail only when at least one problem was recorded
if ( $send_mail and defined $message ) {
  open my $sendmail, '| /usr/lib/sendmail -oi -t -odq'
    or die "Can't fork for sendmail: $!\n";

  print $sendmail <<EOM;
From: $ENV{'USER'}
To: $mailto
Subject: $host [$level]

$message

EOM

  close $sendmail
    or warn "sendmail didn't close nicely\n";
  }
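
The TARGET/STATE comparison in the script above can also be isolated into a function that takes the crsctl output as a string, which makes it testable without a cluster. This is only a sketch: the name mismatches is hypothetical, and the sample text is an assumed 11.2-style output of crsctl status resource, not captured from a real system.

```perl
use strict;
use warnings;

# Hypothetical helper: return [target, state] pairs that differ, one pair
# per position in the comma-separated TARGET= and STATE= lines.
sub mismatches {
    my ($text) = @_;
    my ($target) = $text =~ /^TARGET=(.+)$/m;
    my ($state)  = $text =~ /^STATE=(.+)$/m;
    return unless defined $target && defined $state;

    my @targets = $target =~ /(ONLINE|OFFLINE)/g;
    my @states  = $state  =~ /(ONLINE|OFFLINE)/g;

    # Compare only positions present in both lists.
    my $last = $#targets < $#states ? $#targets : $#states;
    return map  { [ $targets[$_], $states[$_] ] }
           grep { $targets[$_] ne $states[$_] } 0 .. $last;
}

# Assumed sample output, for illustration only.
my $sample = <<'EOT';
NAME=ora.orcl.app.svc
TYPE=ora.service.type
TARGET=ONLINE, ONLINE
STATE=ONLINE on node1, OFFLINE
EOT

printf "target=%s state=%s\n", @$_ for mismatches($sample);
```

States other than ONLINE and OFFLINE are ignored by this sketch, just as in the script, so the positions can shift if one of them appears.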
