Hi Charles, Did you find the good explanation on what happened during the time period reflected in allert.log bit you published? My version is as following: -A- 10:45:02 2009 both RFS processes lost their connections to the processes those has been created by on Primary database. To my understanding RFS processes are LOCAL=NO processes on Standby side, therefore those are created from Primary side. Just to confirm that you can do the following: ========================================================================================== RAN ON STANDBY SQL> SELECT PID, STATUS,PROCESS,CLIENT_PROCESS,thread#, sequence#, block# FROM V$MANAGED_STANDBY where PROCESS='RFS'; PID STATUS PROCESS CLIENT_P THREAD# SEQUENCE# BLOCK# ---------- ------------ --------- -------- ---------- ---------- ---------- 6319 IDLE RFS LGWR 1 127 14 6419 IDLE RFS UNKNOWN 0 0 0 SQL> $ ps -fp 6319 oracle 6319 1 0 07:27 ? 00:00:00 oracleGTST (LOCAL=NO) $ /usr/sbin/lsof -p 6319 | grep TCP oracle 6319 oracle 18u IPv4 519196 TCP rhel4_1.testing.com:1528->rhel4_1.testing.com:8248 (ESTABLISHED) # As both Standby and Primary are on the same host in my cfg you see both records. In real world case you would see just one on primary side. $ netstat -tp | grep 8248 tcp 0 0 rhel4_1.testing.com:1528 rhel4_1.testing.com:8248 ESTABLISHED 6319/oracleGTST tcp 0 0 rhel4_1.testing.com:8248 rhel4_1.testing.com:1528 ESTABLISHED 6301/ora_lns1_STST ===== CONCLUSION ===== > 6301/ora_lns1_STST is the farther (on primary) of RFS LGWR process (on secondary) oracle 6419 1 4 07:27 ? 00:00:11 oracleGTST (LOCAL=NO) oracle 6419 oracle 18u IPv4 519550 TCP rhel4_1.testing.com:1528->rhel4_1.testing.com:8256 (ESTABLISHED) tcp 0 0 rhel4_1.testing.com:1528 rhel4_1.testing.com:8256 ESTABLISHED 6419/oracleGTST tcp 0 0 rhel4_1.testing.com:8256 rhel4_1.testing.com:1528 ESTABLISHED 17964/ora_arc1_STST ===== CONCLUSION ===== > 17964/ora_arc1_STST is the farther (on primary) of RFS UNKNOWN process (on secondary) ========================================================================================== It might happen for 2 reasons: 1. network interrupted (you said it is unlikely) 2. both processes diapered on Primary side. If this is the case you would find some messages in Primary alert.log file - B - 10:45:15 2009 Aster redolog switch Primary database successfully tried to reestablish RFS processes. Those started to ship redolog stream. Standby DB recognized that it missing a bit of REDO stream to continue real time redo processing and contacted Primary for gap resolution using FAL_SERVER parameter. Those just my 0.02$, Regards, Yury On Sat, Jun 6, 2009 at 4:09 AM, Charles Schultz<sacrophyte@xxxxxxxxx> wrote: > Very interesting, but we are not receiving any ora-600 errors. As shown in > the example, the issue is self-correcting; I am merely trying to understand > the mechanics at a deeper level. > > On Fri, Jun 5, 2009 at 12:20, Goulet, Richard <Richard.Goulet@xxxxxxxxxxx> > wrote: >> >> Check out Metalink >> Doc ID: 386417.1 >> understanding >> >> Dick Goulet >> Senior Oracle DBA >> PAREXEL International >> >> >> ________________________________ >> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] >> On Behalf Of Charles Schultz >> Sent: Friday, June 05, 2009 12:57 PM >> To: ORACLE-L >> Subject: Requesting a RFS walkthrough >> >> Good day, list, >> I am curious if someone would be able to divulge some of the under >> pinnings of RFS. The snippet below is from our alert.log on the standby; >> this is one of two 10.2.0.2 physical standby databases with no DB Broker >> using real-time apply, and we have been seeing this situation happen more >> and more often. Our networking group tells us there are no issues with the >> infrastructure, so I am struggling to understand why this happening. >> My basic understanding of RFS explains the situation like so: >> >> At 10:45:02, the heartbeat process attempts to talk to Primary but is >> unable to. I assume there are two similar messages because we have two >> FAL_SERVERS, but I could be way off in that. >> The next few verbose messages show a connection being reestablished with >> the primary. >> At 10:45:20, FAL attempts to get logseq 6045. Why does it fail? The log >> was archived on the Primary at 10:45:15 (v$archived_log.completion_time). >> It fails again at 10:45:55, but succeeds at 10:45:56. Why? >> >> Fri Jun 5 10:45:02 2009 >> RFS[143]: Possible network disconnect with primary database >> Fri Jun 5 10:45:05 2009 >> RFS[144]: Possible network disconnect with primary database >> Fri Jun 5 10:45:15 2009 >> RFS[142]: Successfully opened standby log 22: >> '+DATA/gcprod/onlinelog/group_22.352.633713051' >> Fri Jun 5 10:45:15 2009 >> RFS[139]: Successfully opened standby log 25: >> '+DATA/gcprod/onlinelog/group_25.262.633713253' >> Fri Jun 5 10:45:16 2009 >> Redo Shipping Client Connected as PUBLIC >> -- Connected User is Valid >> RFS[147]: Assigned to RFS process 18916 >> RFS[147]: Identified database type as 'physical standby' >> Primary database is in MAXIMUM PERFORMANCE mode >> Primary database is in MAXIMUM PERFORMANCE mode >> Fri Jun 5 10:45:16 2009 >> Redo Shipping Client Connected as PUBLIC >> -- Connected User is Valid >> RFS[148]: Assigned to RFS process 18918 >> RFS[148]: Identified database type as 'physical standby' >> Primary database is in MAXIMUM PERFORMANCE mode >> Fri Jun 5 10:45:16 2009 >> RFS[147]: Successfully opened standby log 21: >> '+DATA/gcprod/onlinelog/group_21.351.633713041' >> Fri Jun 5 10:45:16 2009 >> Primary database is in MAXIMUM PERFORMANCE mode >> RFS[148]: Successfully opened standby log 24: >> '+DATA/gcprod/onlinelog/group_24.282.633713233' >> Fri Jun 5 10:45:20 2009 >> Fetching gap sequence in thread 1, gap sequence 6045-6045 >> FAL[client]: Error fetching gap sequence, no FAL server specified >> Fri Jun 5 10:45:55 2009 >> FAL[client]: Failed to request gap sequence >> GAP - thread 1 sequence 6045-6045 >> DBID 1222165961 branch 631792716 >> FAL[client]: All defined FAL servers have been attempted. >> ------------------------------------------------------------- >> Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization >> parameter is defined to a value that is sufficiently large >> enough to maintain adequate log switch information to resolve >> archivelog gaps. >> ------------------------------------------------------------- >> Fri Jun 5 10:45:56 2009 >> RFS[145]: Archived Log: '+FRA/gcprod/archivelog/1_6045_631792716.dbf' >> Fri Jun 5 10:46:30 2009 >> Media Recovery Log +FRA/gcprod/archivelog/1_6045_631792716.dbf >> >> -- >> Charles Schultz > > > > -- > Charles Schultz > -- Jurijs +371 29268222 (+2 GMT) ============================================ http://otn.oracle.com/ocm/jvelikanovs.html -- //www.freelists.org/webpage/oracle-l