Re: be aware before thinking about deploy 11gR2 RAC

  • From: amonte <ax.mount@xxxxxxxxx>
  • To: Martin Bach <development@xxxxxxxxxxxxxxxxx>
  • Date: Wed, 24 Feb 2010 15:28:25 +0100

Hi

These are the errors, a bit tricky to read! Right now this is a showstopper
for us to go with 11gR2 RAC.



*ASM alert when pmon was killed:*
Mon Feb 01 11:25:02 2010
Shutting down instance (abort)
License high water mark = 6
USER (ospid: 279022): terminating the instance
Mon Feb 01 11:25:03 2010
ORA-1092 : opitsk aborting process
Mon Feb 01 11:25:03 2010
ORA-1092 : opitsk aborting process
Instance terminated by USER, pid = 279022
Mon Feb 01 11:25:04 2010
Instance shutdown complete
Mon Feb 01 11:25:05 2010
....................
Starting ORACLE instance (normal)
....................
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
NOTE: diskgroup resource ora.OCR.dg is online
NOTE: diskgroup resource ora.DATA01.dg is online
Mon Feb 01 11:25:24 2010
Starting background process ASMB



*Database alert after restarted by crsd*
Mon Feb 01 11:25:03 2010
NOTE: ASMB terminating
Errors in file
/u01/app/oracle/diag/rdbms/cpmdga/cpmdga1/trace/cpmdga1_asmb_332148.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 48 Serial number: 107
ASMB (ospid: 332148): terminating the instance due to error 15064
Mon Feb 01 11:25:03 2010
System state dump is made for local instance
System State dumped to trace file
/u01/app/oracle/diag/rdbms/cpmdga/cpmdga1/trace/cpmdga1_diag_413936.trc
Mon Feb 01 11:25:03 2010
ORA-1092 : opitsk aborting process
Mon Feb 01 11:25:03 2010
License high water mark = 15
Trace dumping is performing id=[cdmp_20100201112503]
Instance terminated by ASMB, pid = 332148
USER (ospid: 368740): terminating the instance
Instance terminated by USER, pid = 368740
......................
Mon Feb 01 11:25:25 2010
PMON started with pid=2, OS id=360536
......................
Mon Feb 01 11:25:32 2010
ALTER DATABASE MOUNT
NOTE: Loaded library: System
SUCCESS: diskgroup DATA01 was mounted
*ERROR: failed to establish dependency between database racdb and diskgroup
resource ora.DATA01.dg*
Successful mount of redo thread 1, with mount id 1487734179
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT



*crsd log*
2010-02-01 11:25:02.472: [UiServer][13629] Done for ctx=11a880230
2010-02-01 11:25:02.683: [UiServer][13886] S(11a87ff10): set Properties (
grid,11a87c9b0)
2010-02-01 11:25:02.693: [UiServer][13629] processMessage called
2010-02-01 11:25:02.693: [UiServer][13629] Sending message to PE. ctx=
11a8802b0
2010-02-01 11:25:02.694: [   CRSPE][13115] Processing PE command id=1090.
Description: [Stat Resource : 11a8821b0]
2010-02-01 11:25:02.696: [   CRSPE][13115] PE Command [ Stat Resource :
11a8821b0 ] has completed
2010-02-01 11:25:02.697: [   CRSPE][13115] UI Command [Stat Resource :
11a8821b0] is replying to sender.
2010-02-01 11:25:02.702: [UiServer][13629] Done for ctx=11a8802b0
2010-02-01 11:25:02.705: [    AGFW][11830] Agfw Proxy Server received the
message: RESOURCE_STATUS[Proxy] ID 20481:195684
2010-02-01 11:25:02.705: [    AGFW][11830] Received state change for ora.asm
racsrv1 1 [old state = ONLINE, new state = OFFLINE]
2010-02-01 11:25:02.705: [    AGFW][11830] Received state LABEL change for
ora.asm racsrv1 1 [old label  = Started, new label = Abnormal Termination]
2010-02-01 11:25:02.705: [    AGFW][11830] Agfw Proxy Server sending message
to PE, Contents =
[MIDTo:2|OpID:3|FromA:{Invalid|Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:120619]
2010-02-01 11:25:02.706: [    AGFW][11830] Agfw Proxy Server replying to the
message: RESOURCE_STATUS[Proxy] ID 20481:195684
2010-02-01 11:25:02.706: [   CRSPE][13115] State change received from
racsrv1 for ora.asm racsrv1 1
................
2010-02-01 11:25:02.708: [   CRSPE][13115] Resource Resource Instance
ID[ora.asm racsrv1 1]. Values:
STATE=OFFLINE
TARGET=ONLINE
LAST_SERVER=racsrv1
CURRENT_RCOUNT=0
LAST_RESTART=0
FAILURE_COUNT=0
FAILURE_HISTORY=
STATE_DETAILS=Abnormal Termination
INCARNATION=0
STATE_CHANGE_VERS=0
LAST_FAULT=0
DEGREE_ID=1
ID=ora.asm racsrv1 1
Lock Info:
Write Locks:none
ReadLocks:|STATE INITED||ONLINE STATERECOVERED| has failed!
2010-02-01 11:25:02.708: [  CRSRPT][13372] Publishing event: Cluster
Resource State Change Event for ora.asm:racsrv1 : 11a881fd0
2010-02-01 11:25:02.708: [  CRSRPT][13372] Publish to eons buffered event :
11a881fd0
2010-02-01 11:25:02.709: [   CRSPE][13115] Processing unplanned state change
for [ora.asm racsrv1 1]
...............
2010-02-01 11:25:02.720: [   CRSPE][13115] CRS-2672: Attempting to start
'ora.asm' on 'bd1c
.................
2010-02-01 11:25:02.797: [    AGFW][11830] Received state change for
ora.OCR.dg racsrv1 1 [old state = ONLINE, new state = OFFLINE]
2010-02-01 11:25:02.798: [    AGFW][11830] Agfw Proxy Server sending message
to PE, Contents =
[MIDTo:2|OpID:3|FromA:{Invalid|Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:120653]
2010-02-01 11:25:02.798: [    AGFW][11830] Agfw Proxy Server replying to the
message: RESOURCE_STATUS[Proxy] ID 20481:195696
2010-02-01 11:25:02.798: [    AGFW][11830] Received the reply to the
message: RESOURCE_PROBE[ora.OCR.dg racsrv1 1] ID 4097:120626 from the agent
/u01/grid/11.2.0/bin/oraagent_grid
2010-02-01 11:25:02.798: [   CRSPE][13115] State change received from
racsrv1 for ora.OCR.dg racsrv1 1
2010-02-01 11:25:02.799: [    AGFW][11830] ora.OCR.dg racsrv1 1 received
state from probe request. Old state = OFFLINE, New state = OFFLINE
2010-02-01 11:25:02.799: [   CRSPE][13115] Processing PE command id=1093.
Description: [Resource State Change (ora.OCR.dg racsrv1 1) : 11a87e2d0]
2010-02-01 11:25:02.799: [    AGFW][11830] Agfw Proxy Server sending the
last reply to PE for message:RESOURCE_PROBE[ora.OCR.dg racsrv1 1] ID
4097:120623
2010-02-01 11:25:02.799: [   CRSPE][13115] RI [ora.OCR.dg racsrv1 1] new
external state [OFFLINE] old value: [ONLINE] on racsrv1 label = []
2010-02-01 11:25:02.800: [   CRSPE][13115] Resource Resource Instance
ID[ora.OCR.dg racsrv1 1]. Values:
................
2010-02-01 11:25:03.177: [   CRSPE][13115] RI [ora.racdb.db 1 1] new
external state [UNKNOWN] old value: [ONLINE] on racsrv1 label = [Abnormal
Termination] override:OFFLINE
2010-02-01 11:25:03.177: [   CRSPE][13115] Set State Details to [Abnormal
Termination] from [ Open] for [ora.racdb.db 1 1]
..................
2010-02-01 11:25:03.335: [   CRSPE][13115] Got agent-specific msg:
ORA-01092: ORACLE instance terminated. Disconnection forced
Process ID: 0
Session ID: 0 Serial number: 0

2010-02-01 11:25:03.449: [    AGFW][11830] Received the reply to the
message: RESOURCE_START[ora.OCR.dg racsrv1 1] ID 4098:120660 from the agent
/u01/grid/11.2.0/bin/oraagent_grid
2010-02-01 11:25:03.451: [    AGFW][11830] Agfw Proxy Server sending the
reply to PE for message:RESOURCE_START[ora.OCR.dg racsrv1 1] ID 4098:120659
2010-02-01 11:25:03.452: [    AGFW][11830] Received the reply to the
message: RESOURCE_START[ora.OCR.dg racsrv1 1] ID 4098:120660 from the agent
/u01/grid/11.2.0/bin/oraagent_grid
2010-02-01 11:25:03.452: [   CRSPE][13115] Received reply to action [Start]
message ID: 120659
2010-02-01 11:25:03.452: [   CRSPE][13115] Got agent-specific msg:
ORA-01092: ORACLE instance terminated. Disconnection forced
....................
2010-02-01 11:25:19.019: [   CRSPE][13115] ICE has queued an operation.
Details: Operation [START of [ora.racdb.db 1 1] on [racsrv1] : 11a8e1f50]
cannot run cause it needs W lock for: WO for Placement Path RI:[ora.racdb.db
1 1] server [racsrv1] target states [ONLINE INTERMEDIATE ], locked by op
[START of [ora.racdb.db 1 1] on [racsrv1] : 11a893b90]. Owner: CRS-2683: It
is locked by 'SYSTEM' for command 'Unplanned Resource State Change :
ora.racdb.db'
....................
2010-02-01 11:25:32.061: [UiServer][13629] Container [ Name: ORDER
    MESSAGE:
    TextMessage[CRS-2545: Cannot operate on 'ora.racdb.db with
CARDINALITY_ID=1'. It is locked by 'SYSTEM' for command 'Unplanned Resource
State Change : ora.racdb.db']
    MSGTYPE:
    TextMessage[1]
    OBJID:
    TextMessage[ora.racdb.db 1 1]
    WAIT:
    TextMessage[0]
]




2010/2/24 Martin Bach <development@xxxxxxxxxxxxxxxxx>

> Hi Alex,
>
> On 02/24/2010 08:09 AM, amonte wrote:
> > Hi
> >
> [...]
> >
> > We have been doing extensive destructive tests to check for these issues.
> >
>
> Would you mind sharing these with the rest of the list? I'd also be
> interested to see alert.log contents more detail from the ohasd logs :)
>
> Thanks and regards,
>
> Martin
>
> --
> Martin Bach
> OCM 10g
> http://martincarstenbach.wordpress.com
> http://www.linkedin.com/in/martincarstenbach
>

Other related posts: