Re: 1 minutes: best downtime story

  • From: Maureen English <maureen.english@xxxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Thu, 28 Mar 2013 11:42:33 -0800

We have a scheduled power outage during our annual Fire Alarm and Safety Test.
We start shutting down non-production servers Friday evening and less critical
production servers on Saturday, and by midnight Saturday night everything has
been shut down cleanly.

We start bringing things back up in a very orderly fashion around 2am on Sunday
and typically have everything up by about 4pm...14 hours later.

Yesterday, at about 3:30pm, we were notified that we had lost power and the UPS
systems had about 30 minutes left. I'm not sure what caused the power to
go out, or why there was only a 30-minute window, but we got almost every
production system shut down cleanly before everything crashed at about 4:10pm.

An hour later, the power was back and stable, so we started bringing things up.
We had a few problems with some critical machines, but by 11pm, only *6* hours
after the power came back, all of the problems with the critical machines had
been resolved and just about everything was back up and functioning!

So, this isn't really a 'best downtime' story, it's a 'best teamwork' story. The
preparation we did last fall for our scheduled outage saved us so much work
and allowed us to bring everything up quickly and in order last night...even
when we were all pretty tired!

- Maureen
