Re: Measure database availability beyond 99.9%

  • From: Daniel Fink <daniel.fink@xxxxxxxxxxxxxx>
  • Date: Fri, 29 Aug 2008 13:27:14 -0600

Options - You can use the database to monitor itself (or another database), but that will not be 100% accurate. What happens in the case of instance failure? Will any shutdown triggers fire? The alert log may not contain an 'instance terminated' entry, so you have to guess when the instance went down. If you are running a health check script, what happens if the script itself fails with an untrapped error? Alternatively, you can use host (Unix, Windows) tools/scripts, but these may only tell you whether the SMON process is running or whether a privileged user can log in. They may not be able to tell you that the database is up but the network is having problems, in which case the database is 'down' as far as the application is concerned.
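
As a rough illustration of the external-script option, here is a minimal Python sketch that estimates outages from a list of successful probe timestamps. Everything in it is an assumption for illustration (the names find_gaps and downtime_bounds, the fixed probe interval, the one-second slack); it is not a feature of any Oracle tool. It also shows the limitation described above: a gap only says the probe did not succeed, which could mean the database, the network, or the probe script itself was down.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Gap = Tuple[datetime, datetime]


def find_gaps(heartbeats: List[datetime], interval: timedelta,
              slack: timedelta = timedelta(seconds=1)) -> List[Gap]:
    """Return (last_success, next_success) pairs that are further apart
    than one probe interval plus a small scheduling slack."""
    ordered = sorted(heartbeats)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > interval + slack:
            gaps.append((prev, cur))
    return gaps


def downtime_bounds(gap: Gap, interval: timedelta) -> Tuple[timedelta, timedelta]:
    """Bound the outage length for one gap.

    The failure happened somewhere between the last success and the first
    missed probe, and the recovery somewhere between the last missed probe
    and the next success, so the true duration is only known to within
    two probe intervals.
    """
    prev, cur = gap
    longest = cur - prev
    shortest = max(longest - 2 * interval, timedelta(0))
    return shortest, longest


if __name__ == "__main__":
    t0 = datetime(2008, 8, 29, 13, 0, 0)
    interval = timedelta(seconds=10)
    # Hypothetical probe log: successes every 10 s, with a hole after t0+20s.
    beats = [t0 + timedelta(seconds=s) for s in (0, 10, 20, 70, 80)]
    for gap in find_gaps(beats, interval):
        low, high = downtime_bounds(gap, interval)
        print(f"no successful probe between {gap[0]} and {gap[1]}: "
              f"outage between {low} and {high}")
```

The probe interval therefore limits the precision: with a 10-second probe, no single outage can be timed to better than about 20 seconds.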


Opinion - 99.9% tracking would reveal any cumulative downtime in excess of about 8.8 hours in a year. Why would this not be sufficient precision? If you want it down to the second, you are talking about roughly 99.999997% precision (annually), since one second is about 0.000003% of a year. If outages were recorded in minutes, you could publish a 99.999% figure with an uncertainty of about +-0.0002% per recorded outage (or something to that effect).
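
The arithmetic behind those figures can be checked with a short Python sketch. It assumes a 365-day year (leap years ignored), and the function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored


def downtime_budget_seconds(availability_pct: float) -> float:
    """Cumulative downtime per year allowed by an availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)


def resolution_pct(granularity_seconds: float) -> float:
    """Share of a year, in percentage points, covered by one recording unit.

    This is the smallest availability difference that a measurement with
    the given time granularity can express over one year.
    """
    return 100 * granularity_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        budget = downtime_budget_seconds(pct)
        print(f"{pct}% allows {budget / 3600:.2f} hours "
              f"({budget / 60:.1f} minutes) of downtime per year")
    for label, seconds in (("second", 1), ("minute", 60)):
        print(f"one {label} is {resolution_pct(seconds):.7f}% of a year")
```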

Bottom line - Seconds precision would be difficult to monitor and provide no real meaning. Of course...this is purely a technical perspective and there is no doubt someone in management/marketing who wants to brag about a 99.99999999999999% uptime or include it in some contract with no real clue as to what that really means or entails.

Regards,
Daniel Fink


Niall Litchfield wrote:
Aaaarrrrgh! I'm sure there's a purpose that isn't lying to justify
expensive investments. I just cannot see it. Real HA must do service
level monitoring (i.e. can the users work?). What you seem to propose
has no clear benefit. Please tell me I'm wrong.

On 28/08/2008, Ingrid Voigt <GiantPanda@xxxxxxx> wrote:
Hi,

we are looking for a tool to measure and report the availability of our
databases in the HA range, i.e. with high precision. At this time we are
only interested in the database state, not whether the customers can work.

The database versions involved are 9.2 - 10.2, 11 coming next year. All
editions: SE1, SE and EE.

So far, we have been using EM Grid Control, but beyond 99.9% this is not
precise enough. Too many failures of the agent/the Grid Control system
rather than the database and too much time between "database back up"
and "agent notices that database is back up". A switch in the failsafe
clusters takes less than a minute and should be reported to the second,
if possible.

We can get startup time easily from a database trigger or the alert log,
but have no good way to measure shutdown time so far. Is there
something good available (free would be nice) or do we have to build it
ourselves?


Thanks for your help.


Regards
Ingrid Voigt