RE: When control files go bad

  • From: "Goulet, Richard" <Richard.Goulet@xxxxxxxxxxx>
  • To: <rjoralist@xxxxxxxxxxxxxxxxxxxxx>, "Oracle L" <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 1 Jun 2009 15:18:52 -0400

Rich,

        I'd believe that flushing the SAN cache to disk did the job.  Most
SANs mark a write complete once their cache has captured the
write.  The cached data then gets flushed to disk on SAN shutdown (normally)
or at predetermined times.


Dick Goulet
Senior Oracle DBA
PAREXEL International

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Rich Jesse
Sent: Monday, June 01, 2009 3:02 PM
To: Oracle L
Subject: When control files go bad

Hey all,

Our 10.1.0.5.0 DBs on AIX had some "issues" this weekend after the A/C
suffered multiple failures in the server room.  The DB server itself was
OK, but the SAN did an emergency shutdown from a temperature alarm.

Our SAN houses all datafiles, redo logs, archived logs, FRA, and 2/3 of
the control files (remember that last part!).

The alert.log shows something very close to this:

Sat May 30 18:10:57 2009
Errors in file /oracle/admin/db/bdump/oprd_ckpt_324056.trc:
ORA-00221: error on write to controlfile
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/oracle/data/db/control02.ctl'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 9
Additional information: 3
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/oracle/data/db/control01.ctl'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 9
Additional information: 3
Sat May 30 18:10:57 2009
CKPT: terminating instance due to error 221
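
For anyone wanting to check quickly which controlfile copies an excerpt
like this names, here is a minimal Python sketch.  It only matches the
ORA-00202 line format shown above; the function name and the SAMPLE
string are illustrative assumptions, not part of any Oracle tooling.

```python
import re

# Matches lines such as: ORA-00202: controlfile: '/oracle/data/db/control02.ctl'
CONTROLFILE_RE = re.compile(r"ORA-00202: controlfile: '([^']+)'")


def failed_controlfiles(alert_text):
    """Return controlfile paths named in ORA-00202 lines, in first-seen order."""
    seen = []
    for match in CONTROLFILE_RE.finditer(alert_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


# Trimmed copy of the alert.log excerpt quoted above (illustration only).
SAMPLE = """\
ORA-00221: error on write to controlfile
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/oracle/data/db/control02.ctl'
ORA-27072: File I/O error
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/oracle/data/db/control01.ctl'
ORA-27072: File I/O error
"""

if __name__ == "__main__":
    for path in failed_controlfiles(SAMPLE):
        print(path)
```

Only the two SAN-hosted copies show up in the excerpt, which fits the
third copy living on storage that stayed available.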

After the A/C was back online and the ambient temp in operating range
again, the SAN was restarted and had its cache flushed to disk.  The DB
server was halted (not shut down) and restarted.  I started the DB
manually with nomount, mount, and finally open, all successfully.

My question -- why???  I fully expected to have to rebuild the
controlfile or at least copy controlfile 3 back to 1 and 2, but all were
apparently consistent prior to startup (in hindsight, I should have
copied them to another place before attempting a restart!).  And this
same scenario was for three DBs across two physical servers.
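
The "copy them somewhere first" step could be sketched roughly as
below, in Python.  It copies each controlfile to a holding directory
while the instance is down and records a SHA-256 digest of each, so the
copies can be compared before startup.  This assumes the multiplexed
copies are expected to be byte-identical while the instance is down;
the function names, the holding directory, and any paths passed in are
hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_controlfiles(paths, dest_dir):
    """Copy each controlfile into dest_dir and return {source path: digest}."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    digests = {}
    for i, src in enumerate(paths, start=1):
        src = Path(src)
        # Prefix with an index so same-named files from different
        # directories do not overwrite each other.
        shutil.copy2(src, dest / f"copy{i}_{src.name}")
        digests[str(src)] = sha256_of(src)
    return digests


def all_identical(digests):
    """True when every recorded digest is the same (or there are none)."""
    return len(set(digests.values())) <= 1
```

Calling snapshot_controlfiles() with the three controlfile paths and
then all_identical() on the result would have shown whether copies 1
and 2 already matched copy 3 before the restart, which is the evidence
that was missing here.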

The current working theory is that Oracle had nothing to do with the
controlfiles being up-to-date, but that it was the SAN flush to disk.
Or is it possible that Oracle determined that controlfile 3 was the
up-to-date one and did the copy back to 1 and 2 for me?  I didn't think
that functionality existed since there's nothing in the alert.log about
that and scanning the docs didn't turn up anything either.

The last time I had this happen to me, there was no local controlfile
and the SAN got disconnected.  I ended up rebuilding the controlfile
from the daily trace.

Thoughts?

Rich


--
//www.freelists.org/webpage/oracle-l


