RE: How to get notified for a frozen oracle database - NEED HELP

  • From: "Taft, David" <TaftD@xxxxxxxxxxx>
  • To: "'Raj.Pal@xxxxxxxxxxxx'" <Raj.Pal@xxxxxxxxxxxx>, myneel@xxxxxxxxx, oracle-l@xxxxxxxxxxxxx
  • Date: Fri, 13 Apr 2007 12:56:11 -0400

Raj,
 
This is pretty much what we do in our shop, but I found that select against
v$database would sometimes succeed even though users were calling the help
desk saying everything was hung.  I found doing a truncate against an empty
dummy table to be more effective.  Here is a snippet of code that gives a
pretty good idea of how I handle it.  
 
Cheers,
 
David Taft
 
P.S. Our monitoring runs from cron on each server, which doesn't satisfy
Keerthi's original request (paging via EM/Grid Control), but you'll notice
at the bottom of the snippet, that I do make a call to Big Brother
monitoring.  I am guessing the same could be done with EM.

<SNIP>
 
# Execute in background so that script doesn't hang if the database is hung.
# SqlPlus function writes output to a temporary file.
SqlPlus "SET TIMING ON
         TRUNCATE TABLE dbmon.status;" &
 
# Give truncate a chance to complete before entering the loop (especially
for databases with heavy resource contention).
sleep 5
 
# Minute counter.
count=0
 
# Comparison loop counter.
count2=`expr $timeout + 1`
#
# NOTE: count2 must be one greater, otherwise the '$count -eq $timeout'
#       statement below will never execute.
 
# Initialize background job check to zero.
stat=0
 
# Check results of above SQL*Plus session for database status.
while [ $count -lt $count2 ];
do
   # Check if background job is still running.
   stat=`kill -0 $! 2>/dev/null;echo $?`
 
   # If exit status of kill command is greater than zero, then job is
finished.
   if [ "$stat" != "0" ]; then
 
      case `cat $tmpfile` in
         *"Table truncated"* )
            # If truncate executed, then everything is OK.
            break
            ;;
         *"table or view does not exist"*)
            # Table never created or dropped?
            message="Unable to access the dbstatus table DBMON.STATUS."
            break
            ;;
         *"ORACLE not available"*)
            # Verify database didn't go down while the script was sleeping.
            message="ORACLE not available."
            break
            ;;
         *"ORACLE initialization or shutdown in progress"*)
            # Verify database is not in a CLOSED state.
            message="ORACLE initialization or shutdown in progress.
                     Database not open."
            break
            ;;
         *)
            # Query is not still running and $count has not exceeded
$timeout,
            # so there must be a problem connecting to the database.
            message="Problem connecting to ORACLE or the DBMON schema."
            break
            ;;
      esac

   elif [ $count -eq $timeout ]; then
      # Database is hung if query still running after specified timeout, so
      # kill the truncate statement running in the background.
 
      kill -9 $! >/dev/null 2>&1
      message="Database is hung."
      break
 
   else
      # Query is still running and the timeout has not been exceeded, so
      # increment counter and wait one minute before looping again.
 
      count=`expr $count + 1`
      sleep 60
   fi
done
 
# If $message is not null, then there is a problem.
if [ "$message" != "" ]; then
 
    PageDBA      "$message"
    MakeLogEntry "$message"
    WriteCSV     "$message"
    NotifyBigBrother "red"
 
  # Exit with status of 1.
  Cleanup 1
 
<SNIP>


From: Pal, Raj [mailto:Raj.Pal@xxxxxxxxxxxx] 




Have your script log in to extract a known statement (ie. Select name from
v$database).

Run this in the background (ie. &) and send it to a dummy/temporary file
(eg. /tmp/monitor.tmp).

Give your script a configurable number of seconds to find the file.

If it doesn't find it after 'x' seconds, then send an alert.

Just make sure you delete the potentially existing dummy file before
repopulating it.



 




  _____  


From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Keerthi Kumar


 

Is there a way to get notified for a frozen oracle database to a pager via
EM 10g GRID. We have Database Up/Down event but could not find anything in
the list for this type of problem.



Other related posts: