Re: Measure database availability beyond 99.9%
- From: Daniel Fink <daniel.fink@xxxxxxxxxxxxxx>
- Date: Fri, 29 Aug 2008 13:27:14 -0600
Options - You can use the database to monitor itself (or another
database), but that is not going to provide 100% accuracy. What happens
in the case of instance failure? Will any shutdown triggers fire? If you
read the alert log, you may not find an instance terminated entry, so
you have to guess when it went down. If you are running a health check
script, what happens if the script itself fails and the failure is not trapped?
Or you can use host (unix, windows) tools/scripts. But these may only
tell you whether the SMON process is running or whether a privileged
user can log in. They may not be able to tell you that the database is
up but the network is experiencing problems, so the database is 'down'
as far as the application is concerned.
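To illustrate why any polling approach forces you to "guess when it went
down", here is a minimal Python sketch. It assumes a hypothetical external
probe that records (timestamp, is_up) samples; the function name
outage_bounds and the sample format are illustrative assumptions, not part
of any Oracle tool. It only brackets the downtime between the shortest and
longest outage consistent with the samples, and an outage that falls
entirely between two probes is never seen at all.

```python
from datetime import datetime, timedelta


def outage_bounds(samples):
    """Bracket total downtime from polled (timestamp, is_up) samples.

    samples must be sorted by timestamp. Returns (min_down, max_down):
    min_down counts only the time between the first and last failed probe
    of each outage; max_down stretches each outage back to the last
    successful probe before it and forward to the first successful probe
    after it.
    """
    min_down = timedelta(0)
    max_down = timedelta(0)
    last_up = None      # most recent successful probe
    first_down = None   # first failed probe of the current outage
    last_down = None    # latest failed probe of the current outage

    for ts, is_up in samples:
        if is_up:
            if first_down is not None:
                # The outage ended somewhere between last_down and ts.
                min_down += last_down - first_down
                start = last_up if last_up is not None else first_down
                max_down += ts - start
                first_down = last_down = None
            last_up = ts
        else:
            if first_down is None:
                first_down = ts
            last_down = ts

    if first_down is not None:
        # Outage still open at the end of the data: count what was observed.
        min_down += last_down - first_down
        start = last_up if last_up is not None else first_down
        max_down += last_down - start

    return min_down, max_down
```

With a 60-second probe interval and two consecutive failed probes, the
observed outage could have lasted anywhere from 60 to 180 seconds, which is
why a probe interval much shorter than the precision you want to report
would be needed.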
Opinion - 99.9% tracking would reveal any cumulative downtime in excess
of about 8.8 hours in a year. Why would this not be sufficient precision?
If you want it down to the second, then you are talking about roughly
99.999997% precision (annually), since one second is about 0.000003% of
a year. If an outage were recorded in minutes, you could publish a
99.999% figure with an error margin of about +/-0.0002% (or something to
that effect).
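The arithmetic behind these figures can be checked with a short Python
sketch. It assumes a 365-day year (leap years ignored), and the function
names are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; assumes a non-leap year


def downtime_budget_seconds(availability_pct):
    """Downtime per year (in seconds) allowed by an availability percentage."""
    return SECONDS_PER_YEAR * (100.0 - availability_pct) / 100.0


def resolution_pct(resolution_seconds):
    """Share of a year (in percent) represented by one timing step.

    This is the smallest availability difference that a measurement with
    the given time resolution can distinguish.
    """
    return 100.0 * resolution_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        hours = downtime_budget_seconds(pct) / 3600
        print(f"{pct}% allows about {hours:.3f} hours of downtime per year")
    print(f"1 second is about {resolution_pct(1):.7f}% of a year")
    print(f"1 minute is about {resolution_pct(60):.5f}% of a year")
```

At 99.9% the budget is about 8.76 hours per year, and at 99.999% it is
a little over 5 minutes, so per-minute outage records are already close to
the limit of what a five-nines figure can express.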
Bottom line - Seconds precision would be difficult to monitor and
provide no real meaning. Of course...this is purely a technical
perspective and there is no doubt someone in management/marketing who
wants to brag about a 99.99999999999999% uptime or include it in some
contract with no real clue as to what that really means or entails.
Regards,
Daniel Fink
Niall Litchfield wrote:
Aaaarrrrgh! I'm sure there's a purpose that isn't lying to justify
expensive investments. I just cannot see it. Real HA must do service
level monitoring (aka can the users work). What you seem to propose
has no clear benefit; please tell me I'm wrong.
On 28/08/2008, Ingrid Voigt <GiantPanda@xxxxxxx> wrote:
Hi,
we are looking for a tool to measure and report the availability of our
databases in the HA range, i.e. with high precision. At this time we are
only interested in the database state, not whether the customers can work.
The database versions involved are 9.2 - 10.2, 11 coming next year. All
editions: SE1, SE and EE.
So far, we have been using EM Grid Control, but beyond 99.9% this is not
precise enough. Too many failures of the agent/the Grid Control system
rather than the database and too much time between "database back up"
and "agent notices that database is back up". A switch in the failsafe
clusters takes less than a minute and should be reported to the second,
if possible.
We can get startup time easily from a database trigger or the alert log,
but have no good way to measure shutdown time so far. Is there
something good available (free would be nice) or do we have to build it
ourselves?
Thanks for your help.
Regards
Ingrid Voigt
--
http://www.freelists.org/webpage/oracle-l