Standby hung following a network disconnect

  • From: Vasu Rajagopal <vrajagopal@xxxxxxxxxxxxx>
  • To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 2 Jul 2010 23:15:00 -0400

Hi,

I have a Data Guard issue. Production is a RAC database on 10.2.0.4 using LGWR 
ASYNC redo transport to a DR site, which is also a RAC configuration (physical 
standby).
One of the standby databases (in real-time apply mode) hangs after a NETWORK 
DISCONNECT error, generating a few Gb/sec of read I/O on the standby redo logs 
(SRLs), and it seems to stay stuck forever.
This has occurred almost twice a week over the last 2 months.

As a temporary workaround, I cancelled the managed recovery process (MRP) and 
put recovery into archived-log (ARCH) apply mode. That seems to be working, 
though we would like to have the DR site running in real-time apply mode.

I have received an update from Oracle saying that most issues relating to 
ORA-03135 have turned out to be a router / switch / firewall / http protocol 
issue, and suggesting that, if a Cisco router is used, we disable the fixup 
protocol for the SQL*Net port, etc.
However, I am not sure why the LNS/MRP processes are unable to recover and 
return to normal operation after detecting the timeout.

I am looking for input on ways to diagnose or resolve this.
Thanks,
Vasu

Here are excerpts from the log files showing the disconnect:

Primary DB (mydb) alert log
----------------------------------------
Errors in file /u001/app/oracle/admin/mydb/bdump/mydb1_lns1_22694.trc:
ORA-03135: connection lost contact
Fri Jun 25 16:00:42 2010
LGWR: I/O error 3135 archiving log 2 to 
'(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=racsrvr11.mycomp.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=dgmydb_XPT.mycomp.com)(INSTANCE_NAME=dgmydb2)(SERVER=dedicated)))'

Primary LNS Trace file --- mydb1_lns1_22694.trc
===========================
Sending online log thread 1 seq 14857 [logfile 2] to standby
Archiving to destination 
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=racsrvr11.mycomp.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=dgmydb_XPT.mycomp.com)(INSTANCE_NAME=dgmydb2)(SERVER=dedicated)))
 ASYNC blocks=20480
Log file opened [logno 2]
*** 2010-06-25 16:00:42.157
RFS network connection lost at host 
'(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=racsrvr11.mycomp.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=dgmydb_XPT.mycomp.com)(INSTANCE_NAME=dgmydb2)(SERVER=dedicated)))'
Error 3135 writing standby archive log file at host 
'(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=racsrvr11.mycomp.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=dgmydb_XPT.mycomp.com)(INSTANCE_NAME=dgmydb2)(SERVER=dedicated)))'
ORA-03135: connection lost contact
*** 2010-06-25 16:00:42.170 64208 kcrr.c

Standby  alert log
-----------------------------------------------
Mem# 0: /netapp/oracle/dr/redologs11/dgmydb/group_32.1296.718556259
Fri Jun 25 16:00:42 2010
RFS[2]: Possible network disconnect with primary database
Fri Jun 25 18:14:17 2010
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[6]: Assigned to RFS process 10755
RFS[6]: Identified database type as 'physical standby'
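
One way to quantify how long redo shipping stayed down after each disconnect 
is to pair every "Possible network disconnect" message in the standby alert 
log with the next "Redo Shipping Client Connected" message. The following is 
only a minimal Python sketch, assuming the 10.2-style alert log layout shown 
above (a timestamp line followed by message lines) and an English locale for 
the day and month names. The function names and the embedded sample are 
illustrative, not part of any Oracle tool.

```python
from datetime import datetime

# Alert-log timestamp layout, e.g. "Fri Jun 25 16:00:42 2010"
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_timestamp(line):
    """Return a datetime if the line is an alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def disconnect_gaps(lines):
    """Return (disconnect_time, reconnect_time) pairs found in the log lines."""
    current_ts = None   # most recent timestamp line seen
    pending = None      # time of a disconnect not yet followed by a reconnect
    gaps = []
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current_ts = ts
            continue
        if "Possible network disconnect" in line:
            if pending is None:
                pending = current_ts
        elif "Redo Shipping Client Connected" in line and pending is not None:
            gaps.append((pending, current_ts))
            pending = None
    return gaps


# Sample taken from the standby alert log excerpt in this post.
SAMPLE = """\
Mem# 0: /netapp/oracle/dr/redologs11/dgmydb/group_32.1296.718556259
Fri Jun 25 16:00:42 2010
RFS[2]: Possible network disconnect with primary database
Fri Jun 25 18:14:17 2010
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[6]: Assigned to RFS process 10755
RFS[6]: Identified database type as 'physical standby'
"""

gaps = disconnect_gaps(SAMPLE.splitlines())
for start, end in gaps:
    print(start, "->", end, "gap:", end - start)
```

For the excerpt above this reports a single gap of 2 hours 13 minutes 35 
seconds. It only measures the time until the next RFS connection; it says 
nothing about whether MRP had resumed applying redo.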


