RE: Database Outages - Best Practices

  • From: "Michael Fontana" <mfontana@xxxxxxxxx>
  • To: <>, <ryan_gaffuri@xxxxxxxxxxx>
  • Date: Mon, 14 Feb 2005 15:25:08 -0600

This is interesting, but it seems to overlook the fact that this very same
management is going to have to do quite a bit of coordination to accommodate
you.  For example, customers must be notified, and application support
personnel may be needed to shut down and restart services and applications.

Right now, the only kind of outages we really have are unplanned ones - when
failures occur.  Some managers have actually said we're better off just
"having them happen" than raise the ire of the business by asking for them!

Ridiculous logic, I know, but it sure does get the resources lined up and
the work gets accomplished!

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of stephen booth
Sent: Monday, February 14, 2005 3:16 PM
To: ryan_gaffuri@xxxxxxxxxxx
Cc: mfontana@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: Re: Database Outages - Best Practices

On Mon, 14 Feb 2005 20:38:11 +0000, ryan_gaffuri@xxxxxxxxxxx
<ryan_gaffuri@xxxxxxxxxxx> wrote:
> most outages are typically related to builds. you're adding tables, dropping
> tables, data migration, or the application is deploying a new version.
> standard is outside of regular business hours. negotiate notice in
> advance. Negotiated frequency. Negotiated downtime. Negotiated notification
> process if there is a delay in bringing the system back up.
> you want it all written down in advance, with appropriate phone numbers
> and you want a client signature.

Ditto on all of the above.

One useful way I've found of getting management to agree outages is to
not call them outages.  Call them 'Power Possession'.  A power
possession is a period when you or any of the other techs and
engineers associated with a system  (database, OS, hardware, network,
power, building management &c) have the option of downing the system,
the datacentre or even the network for maintenance.

Schedule them well in advance (try for a year), as frequently as
possible, for times when the system isn't busy and for a good long
period of time.  Most times you won't need them and will be able to
tell the business "It's OK, we don't need to take the service down
this time".  When work needs to be done, try to get as much
non-interfering work into that window as possible.  Obviously don't try
to upgrade the OS and database software at the same time, but you
should be OK working on the database whilst a network switch is being
upgraded or after a test firing of the backup generators, for example.
(Always make sure that your backup generators are test fired on the
manufacturer's recommended schedule; this is the voice of bitter
experience speaking.)

It's better to ask a silly question than to make a silly assumption.

