RE: My worst nightmare - ORA-8103

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <maureen.english@xxxxxxxxxx>
  • Date: Wed, 11 Jun 2014 13:16:12 -0400

Gerwin's post about a possible undo vector bug seems promising.
With a bit of dancing you should be able to identify the keys of the missing
rows from the indexes (except maybe if they are also fubar due to an undo
vector bug).

Blocks needing some undo are indeed not corrupt per se. Trying to do delayed
block cleanout or somesuch operation on them in the face of a missing
relevant undo vector should go splat.

If this is a lost i/o only to the table, then the indexes should be good.

If you select just the rowid and columns present in an index the plan should
just read the index, in which case the count of the index should be greater
than the count of the table.

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Maureen English
Sent: Tuesday, June 10, 2014 6:55 PM
To: Kenny Payton
Cc: oracle-l@xxxxxxxxxxxxx; jonathan@xxxxxxxxxxxxxxxxxx
Subject: Re: My worst nightmare - ORA-8103

This is really an odd problem.  We ran the rman validate commands on
datafiles and the whole database.
We also ran dbverify on all of the datafiles.  Nothing comes back marked as
corrupt.  So, maybe the word corrupt isn't quite right.  The data is truly
inaccessible, though.  Oracle says it's lost I/O:

> The ora-8103 was raising accessing rdba: 0x30c76bb9 (195/486329), 
> because the type=0x00= unkown
>
> scn: 0x0000.00000000 seq: 0x01 flg: 0x05 tail: 0x00000001
> frmt: 0x02 chkval: 0x5ad8 type: 0x00=unknown
>
> This block looks as a new block, I don't think had any data in.
> Means you had lost IO.

But when I copy all of the rows from the table that don't give me an
ORA-8103 error, using an Oracle provided script, I'm missing 480 rows....
Whether or not those rows have needed data in them or not is a good
question....

- Maureen

On 6/10/2014 1:23 PM, Kenny Payton wrote:
>
> You might want to run a "validate database" in RMAN and check
v$database_block_corruption.  You could also start at a datafile level if
you can narrow down where you think the corruption exists with a "validate
datafile" command.
>
> Once you identify the blocks you you can then look to RMAN for block
recovery as Jonathan suggested.  If you have had successful backups, and you
haven't set MAXCORRUPT to a non-zero value, then you should be able to use
those backups to recover the blocks.  Once the blocks are recorded in
v$database_block_corruption you could use "RECOVER CORRUPTION LIST" to
restore/recover these blocks.
>
>
> Kenny
>
>
>
> On Jun 10, 2014, at 5:05 PM, Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
wrote:
>
>>
>> The error number is still a puzzle - I would have expected 1578 or 1410
(corrupt block, or invalid rowid), but I guess if a block has got corrupted
so that it seems to belong to an object with a higher data object id than
expected then you might get an 8103.  (That's speculation - there may be
other more obvious reasons why you'd get an 8103 that I've overlooked).
>>
>> Recover from last know is the "obvious" strategy - but if you've been
doing rman backups then there's a "block recover" option that could tell
rman to pick specific blocks from the file backup and roll forward through
archived redo logs from there.  I guess this is why Oracle support have been
trying to find the rowids of the lost data. Traditionally you do this by
forcing a tablescan (e.g. select count(*) where non-indexed, nullable column
= value you don't expect to see) and checking what gets into the dump file.
Recover the guilty block, and repeat until no more corrupt blocks. Perhaps
you've already got the list of blocks since you know there are 480 rows
missing, though.
>>
>>
>>
>> Regards
>> Jonathan Lewis
>> http://jonathanlewis.wordpress.com
>> @jloracle
>>
>> ________________________________________
>> From: Maureen English [maureen.english@xxxxxxxxxx]
>> Sent: 10 June 2014 21:39
>> To: Jonathan Lewis; oracle-l@xxxxxxxxxxxxx
>> Subject: Re: My worst nightmare - ORA-8103
>>
>> No partitioned objects.
>>
>> Per Oracle Support, it's due to lost I/O.  The blocks are apparently 
>> empty, not formatted, data is probably not recoverable.  The Support 
>> Analyst suggested restore the datafiles for the tablespace from the 
>> last good backup then doing a restore/recover to roll forward to the
current time.
>>
>> - Maureen
>>
>> On 6/10/2014 11:07 AM, Jonathan Lewis wrote:
>>>
>>>
>>> Ora-8103 is "object no longer exists" - how does this tie in with a
corrupted block ?  Is this a partitioned object with a missing partition ?
>>>
>>>
>>> Regards
>>> Jonathan Lewis
>>> http://jonathanlewis.wordpress.com
>>> @jloracle
>>>
>>> ________________________________________
>>> From: oracle-l-bounce@xxxxxxxxxxxxx [oracle-l-bounce@xxxxxxxxxxxxx] 
>>> on behalf of Maureen English [maureen.english@xxxxxxxxxx]
>>> Sent: 10 June 2014 19:28
>>> To: oracle-l@xxxxxxxxxxxxx
>>> Subject: My worst nightmare - ORA-8103
>>>
>>> Hi,
>>>
>>> Original post to BOracle list...apologies to those who are seeing this
again.
>>>
>>> On 5/31, the refresh of a materialized view in our reporting 
>>> instance failed with an ORA-8103 error, as did a gather stats job 
>>> for the same table in the production database.  We had an 
>>> application upgrade done on 6/1 and copied our production database 
>>> to a preprod version on 6/2, so the errors weren't caught immediately.
>>>
>>> Users complained that the current data wasn't available in the 
>>> reporting instance but my attempts to refresh and recreate all failed
with the same ORA-8103 error.
>>>
>>> I've been working with Oracle Support since early last week and keep 
>>> hitting brick walls.  At the moment, we've managed to copy 
>>> 'uncorrupted' rows out of the table in our preproduction database 
>>> and are 480 rows short in a table with 550M rows in it.  Oracle is 
>>> currently working to identify the rowids of the corrupt blocks based on
the output in a trace file generated by a failing query.
>>>
>>> I'm looking at Document 336133.1 while I wait for more info from Oracle.
>>>
>>> Anyone have any comments/suggestions/other info that might help 
>>> identify and fix the problem?  We really can't afford to lose that 
>>> much data.  We're working on recovering our database to a different 
>>> location to try to get back any data that we lose, but since I don't
have any idea what caused the corruption, I'm lost.
>>>
>>> - Maureen
>>> --
>>> //www.freelists.org/webpage/oracle-l
>>>
>>>
>>> --
>>> //www.freelists.org/webpage/oracle-l
>>>
>>>
>>
>> --
>> //www.freelists.org/webpage/oracle-l
>>
>>
>

--
Maureen English
Lead Database Administrator
University of Alaska
Fairbanks, AK
(907) 450-8329
--
//www.freelists.org/webpage/oracle-l


--
//www.freelists.org/webpage/oracle-l


Other related posts: