ASSM tablespaces, HWM synchronization issues and a plethora of 10g bugs

From: "Don Granaman" <DonGranaman@xxxxxxxxxxxxxxx>
To: <oracle-l@xxxxxxxxxxxxx>
Date: Thu, 5 Feb 2009 11:14:02 -0600

Does anyone have any experience with or knowledge of the problems with
ASSM tablespaces in 10g - especially those related to HWM
synchronization for objects in ASSM tablespaces?  The signature of those
errors is:

 

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
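
For anyone wanting to see how often this signature shows up, here is a minimal Python sketch (not part of the original post) that counts ORA-00600 entries whose first argument is kdsgrp1 in a chunk of alert log or trace text. The function name count_ora600 and the sample text, including the [12345] argument, are made up for illustration. The sketch assumes the first argument sits on the same line as the error code, as in the signature above.

```python
import re

# Matches an ORA-00600 line and captures its first bracketed argument.
ORA600 = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")


def count_ora600(log_text, first_arg="kdsgrp1"):
    """Count ORA-00600 occurrences whose first argument equals first_arg."""
    return sum(1 for m in ORA600.finditer(log_text) if m.group(1) == first_arg)


# Hypothetical sample text; the second entry is a placeholder for an unrelated ORA-00600.
sample = (
    "ksedmp: internal or fatal error\n"
    "ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [],\n"
    "[], [], []\n"
    "ORA-00600: internal error code, arguments: [12345], [], [], [], [],\n"
    "[], [], []\n"
)

print(count_ora600(sample))
```

To use it on a real system you would read the alert log into a string and pass it in. The log location is site-specific and is not assumed here.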

 

We have been getting these errors since moving from 9.2.0.4 to 10.2.0.4
in July 2008.  (We had long-standing issues with ASSM in 9.2 and were
told that they were "fixed in 10g".  Well, sort of...more like "bug
enhancement".  The 9i ASSM bugs were replaced by more pervasive 10g ASSM
bugs!)

 

It seems that these are usually transient errors - a query that fails
with this error once can almost always be immediately run again with
success.  I filed a service request and spent weeks dealing with support
on the issue, but there was never a satisfactory resolution so I simply
gave up.  The errors were never show-stoppers and subsided after the
first month, but for the last few weeks they have been coming up again.
I filed yet another service request and this time they are going off in
another (probably wrong) direction - wanting me to "fix the corruption"
by exporting, dropping and recreating the affected objects, then
importing the data back in.  There are several problems with this
pseudo-solution:

 

1) In testing, the supposed "corruption" reappears shortly (hours or
less) after the rebuild.

2) Some of the affected objects in production are between 100 GB and
1 TB - and are core to an ultra-critical 24xForever system.

 

There is (supposedly) a patch (6474009) that when combined with
event="43809 trace name context forever, level 1" sort of "masks" the
problem, but there are evidently no patches to actually fix the core
problem (support admits this, but says "It's fixed in 11.1.0.7".  I heard
the same thing when we were on 9i: "It's fixed in 10g".).

 

You can check for the HWM discrepancy with the (undocumented)
DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM procedure.  After some haranguing,
support did publish Doc_ID: 726653.1, which gives its definition.  The
procedure was developed to help diagnose and repair the 6474009 bug, but
it has its own little jewel of a bug - 6493013
(DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM with check_only=0 can corrupt
blocks).

 

It seems that this is the "groundhog day" bug.  No matter what, it seems
to start over again with the same symptoms.  So far, everything I've
tried (in a test system) either only very temporarily "fixes" the
problem - or has its own little galaxy of bugs.

 

Perhaps the only *real* solution is to move *everything* out of ASSM
tablespaces, but that isn't really a great option either as this is a
RAC system.

 

Don Granaman (nee: OraSaurus)

 

