RE: When control files go bad

  • From: "Mercadante, Thomas F (LABOR)" <Thomas.Mercadante@xxxxxxxxxxxxxxxxx>
  • To: <rjoralist@xxxxxxxxxxxxxxxxxxxxx>, "Oracle L" <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 1 Jun 2009 15:11:11 -0400

Rich,

From your log file, it looks like CKPT attempted to write to control
files 1 and 2, failed to do so, and then simply shut the instance down
(CKPT: terminating instance).

So your database was "ok" in that your control files don't seem to have
been corrupted.

I'm sure your restart log file showed recovery occurring.

Oracle is getting smarter and smarter!

Tom

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Rich Jesse
Sent: Monday, June 01, 2009 3:02 PM
To: Oracle L
Subject: When control files go bad

Hey all,

Our 10.1.0.5.0 DBs on AIX had some "issues" this weekend after the A/C
suffered multiple failures in the server room.  The DB server itself was
OK, but the SAN did an emergency shutdown from a temperature alarm.

Our SAN houses all datafiles, redo logs, archived logs, FRA, and 2/3 of
the control files (remember that last part!).

The alert.log shows something very close to this:

Sat May 30 18:10:57 2009
Errors in file /oracle/admin/db/bdump/oprd_ckpt_324056.trc:
ORA-00221: error on write to controlfile
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/oracle/data/db/control02.ctl'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 9
Additional information: 3
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/oracle/data/db/control01.ctl'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 9
Additional information: 3
Sat May 30 18:10:57 2009
CKPT: terminating instance due to error 221
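
For anyone who wants to pull the affected files out of an excerpt like
the one above, here is a minimal Python sketch. It assumes each
ORA-00206 line is followed (not necessarily immediately) by the
ORA-00202 line naming the file, as in this excerpt. The function name
and the SAMPLE_ALERT_LOG constant are made up for illustration; this is
not an Oracle-supplied tool.

```python
import re

# Patterns based only on the message shapes seen in the excerpt above.
WRITE_ERROR_RE = re.compile(
    r"^ORA-00206: error in writing \(block (\d+), # blocks (\d+)\)"
)
CONTROLFILE_RE = re.compile(r"^ORA-00202: controlfile: '([^']+)'")

# Hypothetical constant: the relevant lines copied from the alert.log excerpt.
SAMPLE_ALERT_LOG = """\
ORA-00221: error on write to controlfile
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/oracle/data/db/control02.ctl'
ORA-27072: File I/O error
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/oracle/data/db/control01.ctl'
ORA-27072: File I/O error
CKPT: terminating instance due to error 221
"""


def failed_controlfile_writes(alert_text):
    """Return (path, block, block_count) for each failed controlfile write."""
    failures = []
    pending = None  # (block, block_count) from the most recent ORA-00206
    for raw_line in alert_text.splitlines():
        line = raw_line.strip()
        write_error = WRITE_ERROR_RE.match(line)
        if write_error:
            pending = (int(write_error.group(1)), int(write_error.group(2)))
            continue
        controlfile = CONTROLFILE_RE.match(line)
        if controlfile and pending is not None:
            failures.append((controlfile.group(1), pending[0], pending[1]))
            pending = None
    return failures


if __name__ == "__main__":
    for path, block, count in failed_controlfile_writes(SAMPLE_ALERT_LOG):
        print(f"write failed: {path} (block {block}, {count} block(s))")
```

Run against the excerpt, this lists only control02.ctl and
control01.ctl, which matches the point that the third copy never shows
up in the errors.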

After the A/C was back online and the ambient temp in operating range
again, the SAN was restarted and had its cache flushed to disk.  The DB
server was halted (not shut down) and restarted.  I started the DB
manually with nomount, mount, and finally open, all successfully.

My question -- why???  I fully expected to have to rebuild the
controlfile or at least copy controlfile 3 back to 1 and 2, but all were
apparently consistent prior to startup (in hindsight, I should have
copied them to another place before attempting a restart!).  And this
same scenario was for three DBs across two physical servers.
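
The "copy them to another place first" step could look something like
the Python sketch below: copy each controlfile aside and record a
SHA-256 digest before any restart, so there is evidence of whether the
copies already matched. The helper names and the destination directory
are hypothetical, and it assumes the instance is down so nothing is
writing to the files. Differing digests would only suggest that a copy
missed writes; this is not an Oracle-documented procedure.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large controlfiles need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_controlfiles(paths, dest_dir):
    """Copy each file into dest_dir and return {source_path: sha256}.

    Copies are prefixed with an index because multiplexed controlfiles
    in different directories may share a basename.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    digests = {}
    for index, source in enumerate(paths, start=1):
        source = Path(source)
        target = dest / f"{index:02d}_{source.name}"
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        digests[str(source)] = sha256_of(target)
    return digests


def all_identical(digests):
    """True when every snapshotted copy has the same digest."""
    return len(set(digests.values())) <= 1
```

Usage would be along the lines of calling snapshot_controlfiles() with
the paths from the control_files parameter and a scratch directory on
local disk, then checking all_identical() on the result before issuing
startup.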

The current working theory is that Oracle had nothing to do with the
controlfiles being up-to-date, but that it was the SAN flush to disk.
Or is it possible that Oracle determined that controlfile 3 was the
up-to-date one and did the copy back to 1 and 2 for me?  I didn't think
that functionality existed since there's nothing in the alert.log about
that and scanning the docs didn't turn up anything either.

The last time I had this happen to me, there was no local controlfile
and the SAN got disconnected.  I ended up rebuilding the controlfile
from the daily trace.

Thoughts?

Rich


--
//www.freelists.org/webpage/oracle-l



