Re: Requesting a RFS walkthrough

  • From: Jurijs Velikanovs <j.velikanovs@xxxxxxxxx>
  • To: sacrophyte@xxxxxxxxx
  • Date: Mon, 15 Jun 2009 15:41:48 +1000

Hi Charles,

Did you find a good explanation of what happened during the time
period reflected in the alert.log snippet you published?
My version is as follows:
-A- 10:45:02 2009: both RFS processes lost their connections to the
processes on the Primary database that created them.
To my understanding RFS processes are LOCAL=NO processes on the Standby
side, therefore they are created from the Primary side.

Just to confirm this, you can do the following:
==========================================================================================
RAN ON STANDBY
SQL> SELECT PID, STATUS,PROCESS,CLIENT_PROCESS,thread#, sequence#,
block# FROM V$MANAGED_STANDBY where PROCESS='RFS';

       PID STATUS       PROCESS   CLIENT_P    THREAD#  SEQUENCE#     BLOCK#
---------- ------------ --------- -------- ---------- ---------- ----------
      6319 IDLE         RFS       LGWR              1        127         14
      6419 IDLE         RFS       UNKNOWN           0          0          0

SQL>

$ ps -fp 6319
oracle    6319     1  0 07:27 ?        00:00:00 oracleGTST (LOCAL=NO)
$ /usr/sbin/lsof -p 6319 | grep TCP
oracle  6319 oracle   18u  IPv4 519196              TCP
rhel4_1.testing.com:1528->rhel4_1.testing.com:8248 (ESTABLISHED)
# As both Standby and Primary are on the same host in my config you see
# both records. In a real-world case you would see just one record here;
# the other would be visible only on the primary side.
$ netstat -tp | grep 8248
tcp        0      0 rhel4_1.testing.com:1528
rhel4_1.testing.com:8248    ESTABLISHED 6319/oracleGTST
tcp        0      0 rhel4_1.testing.com:8248
rhel4_1.testing.com:1528    ESTABLISHED 6301/ora_lns1_STST
===== CONCLUSION ===== > 6301/ora_lns1_STST (on the primary) is the parent
of the RFS LGWR process (on the standby)

# The same ps, lsof and netstat checks for the second RFS process (PID 6419):
oracle    6419     1  4 07:27 ?        00:00:11 oracleGTST (LOCAL=NO)
oracle  6419 oracle   18u  IPv4 519550              TCP
rhel4_1.testing.com:1528->rhel4_1.testing.com:8256 (ESTABLISHED)
tcp        0      0 rhel4_1.testing.com:1528
rhel4_1.testing.com:8256    ESTABLISHED 6419/oracleGTST
tcp        0      0 rhel4_1.testing.com:8256
rhel4_1.testing.com:1528    ESTABLISHED 17964/ora_arc1_STST
===== CONCLUSION ===== > 17964/ora_arc1_STST (on the primary) is the parent
of the RFS UNKNOWN process (on the standby)
==========================================================================================
It might happen for 2 reasons:
1. the network was interrupted (you said that is unlikely)
2. both processes disappeared on the Primary side. If that is the case you
would find some messages in the Primary alert.log file (a query for this is
sketched below).
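A minimal sketch of what I mean (the time window is just the one from your
alert.log snippet, adjust it for your case): on 10.2 the Data Guard related
messages are also recorded in V$DATAGUARD_STATUS, so you can query them
instead of grepping the alert.log by hand.
==========================================================================================
RAN ON PRIMARY
-- Data Guard messages around the disconnect window
SQL> SELECT timestamp, facility, severity, message
     FROM v$dataguard_status
     WHERE timestamp BETWEEN TO_DATE('05-JUN-2009 10:40','DD-MON-YYYY HH24:MI')
                         AND TO_DATE('05-JUN-2009 10:50','DD-MON-YYYY HH24:MI')
     ORDER BY timestamp;
==========================================================================================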

-B- 10:45:15 2009: After a redo log switch the Primary database successfully
reestablished the RFS processes, and those started to ship the redo stream
again. The Standby DB recognized that it was missing a bit of the redo stream
needed to continue real-time redo apply, and contacted the Primary for gap
resolution using the FAL_SERVER parameter.
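To watch the gap resolution itself you could run something like this on the
Standby (just a sketch; the views are standard, and sequence 6045 is taken
from your snippet for illustration):
==========================================================================================
RAN ON STANDBY
-- FAL configuration used for gap resolution
SQL> SHOW PARAMETER fal

-- Any redo gap the standby currently knows about
SQL> SELECT thread#, low_sequence#, high_sequence# FROM v$archive_gap;

-- Confirm the missing sequence arrived and was applied
SQL> SELECT sequence#, applied, completion_time FROM v$archived_log
     WHERE thread# = 1 AND sequence# = 6045;
==========================================================================================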

That's just my $0.02,
Regards,
Yury


On Sat, Jun 6, 2009 at 4:09 AM, Charles Schultz<sacrophyte@xxxxxxxxx> wrote:
> Very interesting, but we are not receiving any ora-600 errors. As shown in
> the example, the issue is self-correcting; I am merely trying to understand
> the mechanics at a deeper level.
>
> On Fri, Jun 5, 2009 at 12:20, Goulet, Richard <Richard.Goulet@xxxxxxxxxxx>
> wrote:
>>
>> Check out Metalink
>> Doc ID: 386417.1
>>
>> Dick Goulet
>> Senior Oracle DBA
>> PAREXEL International
>>
>>
>> ________________________________
>> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
>> On Behalf Of Charles Schultz
>> Sent: Friday, June 05, 2009 12:57 PM
>> To: ORACLE-L
>> Subject: Requesting a RFS walkthrough
>>
>> Good day, list,
>> I am curious if someone would be able to divulge some of the
>> underpinnings of RFS. The snippet below is from our alert.log on the standby;
>> this is one of two 10.2.0.2 physical standby databases with no DB Broker
>> using real-time apply, and we have been seeing this situation happen more
>> and more often. Our networking group tells us there are no issues with the
>> infrastructure, so I am struggling to understand why this is happening.
>> My basic understanding of RFS explains the situation like so:
>>
>> At 10:45:02, the heartbeat process attempts to talk to Primary but is
>> unable to. I assume there are two similar messages because we have two
>> FAL_SERVERS, but I could be way off in that.
>> The next few verbose messages show a connection being reestablished with
>> the primary.
>> At 10:45:20, FAL attempts to get logseq 6045. Why does it fail? The log
>> was archived on the Primary at 10:45:15 (v$archived_log.completion_time).
>> It fails again at 10:45:55, but succeeds at 10:45:56. Why?
>>
>> Fri Jun  5 10:45:02 2009
>> RFS[143]: Possible network disconnect with primary database
>> Fri Jun  5 10:45:05 2009
>> RFS[144]: Possible network disconnect with primary database
>> Fri Jun  5 10:45:15 2009
>> RFS[142]: Successfully opened standby log 22:
>> '+DATA/gcprod/onlinelog/group_22.352.633713051'
>> Fri Jun  5 10:45:15 2009
>> RFS[139]: Successfully opened standby log 25:
>> '+DATA/gcprod/onlinelog/group_25.262.633713253'
>> Fri Jun  5 10:45:16 2009
>> Redo Shipping Client Connected as PUBLIC
>> -- Connected User is Valid
>> RFS[147]: Assigned to RFS process 18916
>> RFS[147]: Identified database type as 'physical standby'
>> Primary database is in MAXIMUM PERFORMANCE mode
>> Primary database is in MAXIMUM PERFORMANCE mode
>> Fri Jun  5 10:45:16 2009
>> Redo Shipping Client Connected as PUBLIC
>> -- Connected User is Valid
>> RFS[148]: Assigned to RFS process 18918
>> RFS[148]: Identified database type as 'physical standby'
>> Primary database is in MAXIMUM PERFORMANCE mode
>> Fri Jun  5 10:45:16 2009
>> RFS[147]: Successfully opened standby log 21:
>> '+DATA/gcprod/onlinelog/group_21.351.633713041'
>> Fri Jun  5 10:45:16 2009
>> Primary database is in MAXIMUM PERFORMANCE mode
>> RFS[148]: Successfully opened standby log 24:
>> '+DATA/gcprod/onlinelog/group_24.282.633713233'
>> Fri Jun  5 10:45:20 2009
>> Fetching gap sequence in thread 1, gap sequence 6045-6045
>> FAL[client]: Error fetching gap sequence, no FAL server specified
>> Fri Jun  5 10:45:55 2009
>> FAL[client]: Failed to request gap sequence
>>  GAP - thread 1 sequence 6045-6045
>>  DBID 1222165961 branch 631792716
>> FAL[client]: All defined FAL servers have been attempted.
>> -------------------------------------------------------------
>> Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
>> parameter is defined to a value that is sufficiently large
>> enough to maintain adequate log switch information to resolve
>> archivelog gaps.
>> -------------------------------------------------------------
>> Fri Jun  5 10:45:56 2009
>> RFS[145]: Archived Log: '+FRA/gcprod/archivelog/1_6045_631792716.dbf'
>> Fri Jun  5 10:46:30 2009
>> Media Recovery Log +FRA/gcprod/archivelog/1_6045_631792716.dbf
>>
>> --
>> Charles Schultz
>
>
>
> --
> Charles Schultz
>



-- 
Jurijs
+371 29268222 (+2 GMT)
============================================
http://otn.oracle.com/ocm/jvelikanovs.html
--
//www.freelists.org/webpage/oracle-l

