RE: db corruption

  • From: "Bobak, Mark" <Mark.Bobak@xxxxxxxxxxxxxxx>
  • To: "Kevin Closson" <kevinc@xxxxxxxxxxxxx>, "ORACLE-L" <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 15 Aug 2006 13:17:35 -0400

Yep, I'm saying that when I see the error, it always occurs on the last
block of the datafile, and that it's precisely one block that's zeroed
out, to the byte.  In 8.1.7.4, there was the nasty side effect, due to a
bug, that RMAN would try to read the disk, error out, read the mirror,
error out, and then bounce back to the disk, back to the mirror, and get
caught in an endless loop, and spew *lots* of errors to the alert.log.
We opened a TAR and got a patch for that, and that allowed us to back up
the problem file(s).  At that point, doing a restore re-formatted the
problem block, and the problem disappeared.
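
For anyone wanting to check a file for the same symptom (final block zeroed out, to the byte), here is a minimal sketch in Python. It is not something used in this thread. The function name and the 8192-byte default block size are assumptions for illustration; use the actual block size of the tablespace. It also assumes the path is a regular file, such as an image copy of the datafile, whose last bytes are the datafile's last block. On a raw volume the device may be larger than the datafile, so the end of the device is not necessarily the last Oracle block.

```python
import os


def last_block_is_zeroed(path, block_size=8192):
    """Return True if the final block_size bytes of the file are all zero.

    block_size=8192 is an assumed default; pass the real block size.
    """
    with open(path, "rb") as f:
        # Find the file size by seeking to the end.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size < block_size:
            raise ValueError("file is smaller than one block")
        # Read exactly the last block.
        f.seek(size - block_size)
        block = f.read(block_size)
    # bytes iterate as integers, so any() is False only if every byte is 0.
    return len(block) == block_size and not any(block)
```

Called as last_block_is_zeroed("/path/to/copy.dbf", block_size=8192), where the path is a placeholder, it returns True only when every byte of the final block is zero. A block with even one non-zero byte returns False.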

Just recently, I saw the same occurrence again in 9.2.0.6, this time on
just a single datafile, and 9.2.0.6 fortunately doesn't have the bug
that causes RMAN to get stuck in the endless loop.  I'm convinced this
is some obscure bug, but without a neat, tidy, reproducible test case, I
hate the idea of even thinking about opening an SR.


This reminds me of the days when we were on Dynix/ptx and found a really
obscure bug in the kernel's filesystem layer that caused archive log
corruption.  That was a fun one....days on end on the phone w/ Oracle
kernel developers and Dynix/ptx kernel engineers.  Oh yeah, that was
fun.....

-Mark


--
Mark J. Bobak
Senior Oracle Architect
ProQuest Information & Learning

Ours is the age that is proud of machines that can think and suspicious
of men who try to.  --H. Mumford Jones, 1892-1980


-----Original Message-----
From: Kevin Closson [mailto:kevinc@xxxxxxxxxxxxx] 
Sent: Tuesday, August 15, 2006 1:07 PM
To: Bobak, Mark; ORACLE-L
Subject: RE: db corruption


>>>Oracle 8.1.7.4 and 9.2.0.6, Solaris Sparc, raw volumes, served up 
>>>from an EMC DMX, via VxVM.

>>>I've had one database where this occurred once.  I've had another 
>>>where this happened to 60-70 datafiles.  Exact same type of 
>>>corruption, always the last block in the datafile.
>>>

Are you saying it is the last block precisely that is zeroed out? And
that is 100% of the block?  Given the config you've described, I think
only Oracle or VxVM could be to blame for that. Since the DMX is
virtualized by VxVM and the problem happened on 60-70 datafiles, I think
the odds are too astronomical that the DMX did it since those magic last
blocks are strewn all about.
--
//www.freelists.org/webpage/oracle-l

