RE: Harry Houdini Corruptions?

  • From: "Nabil Jamaleddin" <nmjamaleddin@xxxxxxxxxxxxxxxx>
  • To: <knecht.stefan@xxxxxxxxx>, "'oracle-l-freelists'" <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 10 Sep 2015 07:49:25 -0500

I doubt this is the case, but about 15 years ago I saw this same thing when
the SA had two different volume managers managing the same disk subsystem.



Given the info below, I would think the encryption has to be triggering this
somehow. Maybe the blocks were never corrupted at all. Maybe there is a
scheduled job that fixes them in the afternoon. Maybe there is a job that runs
and corrupts them in the morning. Have you talked to your SA? Have you looked
into all your logfiles?



You are going to have to somehow get more information to solve this problem.
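
One rough way to gather that information is to fingerprint the reported block
range straight from the datafile on a tight schedule, so you can see exactly
when the bytes change. The sketch below is only an illustration: it assumes an
8 KB db_block_size and that Oracle block N starts at byte offset N * block size
in the file, and the function names are made up for this example.

```python
import hashlib

BLOCK_SIZE = 8192  # assumption: db_block_size is 8 KB; adjust to the real value


def block_range_offset(first_block, last_block, block_size=BLOCK_SIZE):
    """Return (byte_offset, byte_length) covering first_block..last_block inclusive.

    Assumes block N begins at byte N * block_size within the datafile.
    """
    if last_block < first_block:
        raise ValueError("last_block must be >= first_block")
    return first_block * block_size, (last_block - first_block + 1) * block_size


def snapshot_blocks(path, first_block, last_block, block_size=BLOCK_SIZE):
    """Read the block range from the file and return one SHA-256 digest per block.

    Note: this is an ordinary buffered read, so it goes through the OS page
    cache and may not match what a process using direct I/O reads from disk.
    """
    offset, length = block_range_offset(first_block, last_block, block_size)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]
```

For example, calling snapshot_blocks() on the file 7 datafile for blocks
2200555 to 2200559 every few minutes, and logging the digests with a timestamp,
should show whether the contents really flip around 3AM and again around noon,
or whether the bytes on disk never change at all.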











From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On
Behalf Of Stefan Knecht
Sent: Thursday, September 10, 2015 3:40 AM
To: oracle-l-freelists
Subject: Harry Houdini Corruptions?



Hi all



Got a scenario on a client system that has me puzzled.



RHEL box, running on top of a local-disk hardware RAID 1+0, with Linux kernel
(LUKS) encryption on top of that.



Database files are on an ext3 filesystem created on top of that encrypted raid
device.



Daily, but only from Monday to Friday, in the wee hours of the morning, a
handful (between 3 and 7) of consecutive blocks get corrupted. They're garbage
data, not Oracle formatted blocks. They're in different files and in different
places. The pattern is totally random: sometimes it's table blocks, sometimes
indexes or LOBs. But every day, it's a bunch of them in consecutive order.



We detect those by running RMAN validate on all the files every 3 hours.



Then, around noon the same day, the validation runs again, and the corrupt
blocks are now valid.



So basically it follows this pattern:



Monday, 3:30AM - file 7 blocks 2200555 to 2200559 corrupt

Monday, 6:30AM - same blocks reported as corrupt

Monday, 9:30AM - same blocks reported as corrupt

Monday, just after noon - no corrupt blocks found.

Nothing until the next day.

Tuesday, 3:30AM - file 4 blocks 3101220 to 3101224 corrupt

Tuesday, 6:30AM - same blocks reported as corrupt

Tuesday, 9:30AM - same blocks reported as corrupt

Tuesday, just after noon - no corrupt blocks found.



And on and on it goes.



What on earth could be doing this?



There's no anti-virus or something crazy like that running. No OS jobs found
that would touch anything like this.



The only non-standard thing about that setup is the encryption, which I have not
encountered on a database server before. But I have a hard time understanding
(and particularly proving) that the encryption could be doing that.



Does anyone have any wild ideas? :)



Cheers



Stefan












