Re: Application of archived logs on physical standby very slow

  • From: Niall Litchfield <niall.litchfield@xxxxxxxxx>
  • To: Martin Bach <development@xxxxxxxxxxxxxxxxx>
  • Date: Thu, 28 Jan 2010 12:58:23 +0000

What does the 10046 trace for the MRP process show? You may well also
end up doing OS-level tracing with strace or similar.
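
If it helps, a minimal sketch of both (assuming MRP0's OS pid is the 21647
that top showed; adjust as needed):

  SQL> oradebug setospid 21647
  SQL> oradebug unlimit
  SQL> oradebug event 10046 trace name context forever, level 8
  SQL> -- let it chew through a log or two, then:
  SQL> oradebug event 10046 trace name context off
  SQL> oradebug tracefile_name

  $ strace -c -p 21647      # summary of syscall counts and times
  $ strace -T -tt -p 21647  # individual calls with wall-clock timings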



On Thu, Jan 28, 2010 at 9:15 AM, Martin Bach
<development@xxxxxxxxxxxxxxxxx> wrote:

> Hi Niall!
>
> couldn't agree more. I seem to have the dilemma that my storage array is in
> good working order and the OS doesn't report any unusually high I/O service
> times. I'm wondering where else I could look for clues as to why the apply
> is so slow, trying to follow a methodical approach to troubleshooting.
>
> The load is clearly caused by Oracle processes; shutting the standby down
> reduces the iowaits to 0 (but that doesn't really solve my problem :)
>
> Regards,
>
> Martin
>
>
> On 01/27/2010 07:06 PM, Niall Litchfield wrote:
>
>> I'd have thought this problem was eminently suited to Method R.
>>
>> On 1/27/10, Johnson, William L (TEIS) <WLJohnson@xxxxxxxxxxxxxxxxxxx>
>> wrote:
>>
>>> What do you have your parallel_execution_message_size set to on the
>>> standby database?
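>>>
>>> A quick way to check on the standby, for reference:
>>>
>>> SQL> show parameter parallel_execution_message_size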
>>>
>>> -----Original Message-----
>>> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
>>> On Behalf Of Martin Bach
>>> Sent: Wednesday, January 27, 2010 12:13 PM
>>> To: ORACLE-L
>>> Subject: Application of archived logs on physical standby very slow
>>>
>>> Dear listers,
>>>
>>> I'd like to add another "interesting" problem to today's interesting
>>> RMAN problem. I have quite a bit of output from diagnostic tools;
>>> hopefully it doesn't get too badly mangled...
>>>
>>> Our alerting system fired when my physical standby database went out of
>>> sync with production. DR and production share the same storage (an
>>> entry-level MSA) and are multipathed. I could observe this on the hosts
>>> (both RHEL 5.2 32bit, don't ask):
>>>
>>> Wed Jan 27 10:42:42 2010
>>> RFS[2]: Archived Log: '/u03/oradata/dr/arch/1_49264_596779339.dbf'
>>> Primary database is in MAXIMUM PERFORMANCE mode
>>> Wed Jan 27 10:47:55 2010
>>> Media Recovery Log /u03/oradata/dr/arch/1_49252_596779339.dbf
>>> Wed Jan 27 10:55:27 2010
>>> Media Recovery Log /u03/oradata/dr/arch/1_49253_596779339.dbf
>>> Wed Jan 27 11:01:46 2010
>>>
>>> Online redo logs are 500M in size, and so are most archived logs; as you
>>> can see, it consistently takes 5-10 minutes to apply just one.
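>>>
>>> (Back of the envelope: 500 MB in 5-10 minutes works out to only about
>>> 0.8-1.7 MB/s of sustained redo apply.)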
>>>
>>> top - 11:25:37 up 57 days, 17:34, 13 users,  load average: 0.55, 1.05, 1.19
>>> Tasks: 209 total,   1 running, 208 sleeping,   0 stopped,   0 zombie
>>> Cpu0  :  2.8%us,  1.9%sy,  0.0%ni, 51.9%id, 43.5%wa,  0.0%hi,  0.0%si,  0.0%st
>>> Cpu1  :  1.9%us,  0.9%sy,  0.0%ni, 42.6%id, 54.6%wa,  0.0%hi,  0.0%si,  0.0%st
>>> Cpu2  :  0.9%us,  0.0%sy,  0.0%ni, 98.1%id,  0.9%wa,  0.0%hi,  0.0%si,  0.0%st
>>> Cpu3  :  0.0%us,  0.9%sy,  0.0%ni, 99.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>>> Mem:   8310064k total,  8019968k used,   290096k free,    36456k buffers
>>> Swap:  8152948k total,    70124k used,  8082824k free,  7568912k cached
>>>
>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 21647 oracle    15   0 2178m  48m  31m S    3  0.6   0:00.41 ora_mrp0_dr
>>>  7496 oracle    15   0 2145m 1.5g 1.5g S    2 18.4 190:47.10 ora_p000_dr
>>>  7500 oracle    15   0 2145m 1.5g 1.5g S    2 18.4 199:38.12 ora_p002_dr
>>>  2580 root      10  -5     0    0    0 S    1  0.0  17:42.07 [kjournald]
>>>  7498 oracle    16   0 2145m 1.5g 1.5g D    1 18.4 189:40.17 ora_p001_dr
>>>
>>> The only processes on CPU are MRP0 and its parallel slaves. Funnily
>>> enough, iostat doesn't seem to see a problem:
>>>
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>            0.00    0.00    0.25   24.75    0.00   75.00
>>>
>>> Device:         rrqm/s   wrqm/s   r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
>>> cciss/c0d0        0.00     4.00  0.00   8.00     0.00    48.00    12.00     0.01    0.88   0.50   0.40
>>> cciss/c0d1        0.00     5.00  0.00  18.00     0.00    92.00    10.22     0.02    0.89   0.06   0.10
>>> sda               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>> sdb               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>> sdc               0.00   211.00  0.00 189.00     0.00  1600.00    16.93     0.49    2.57   2.57  48.60
>>> sdd               0.00   203.00  0.00 183.00     0.00  1544.00    16.87     0.50    2.71   2.74  50.10
>>> sde               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>> sdf               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>> dm-0              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>> dm-1              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>> dm-2              0.00     0.00  0.00 786.00     0.00  3144.00     8.00     2.09    2.64   1.26  98.70
>>> dm-3              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>> dm-4              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>> dm-5              0.00     0.00  0.00 786.00     0.00  3144.00     8.00     2.09    2.65   1.26  98.80
>>> dm-6              0.00     0.00  0.00  23.00     0.00    92.00     8.00     0.02    0.83   0.04   0.10
>>> dm-7              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>> dm-8              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>>
>>> Service times are pretty decent, I'd say, and the SAN administrators tell
>>> me that everything on the array is fine. I checked for AIO, and that's
>>> also in use:
>>>
>>> [oracle]$ cat /proc/slabinfo | grep ^k
>>> kioctx                25     60    192   20    1 : tunables  120   60    8 : slabdata      3      3      0
>>> kiocb                 64     90    128   30    1 : tunables  120   60    8 : slabdata      3      3      0
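>>>
>>> The non-zero kioctx/kiocb counts show AIO contexts are allocated; for
>>> completeness, the instance-side setting can be confirmed with a quick
>>> check:
>>>
>>> SQL> show parameter filesystemio_options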
>>>
>>> Data files are scattered over mount points /u02-/u06, and there are no
>>> multipathing (or other I/O-related) errors reported in dmesg or
>>> /var/log/messages. Archive logs are received on the DR box in time, and
>>> there are no gaps. Our sysadmin tells me that device-mapper multipath is
>>> happily distributing I/O round-robin. Oh yes, and it's Oracle 10.2.0.4
>>> 32bit.
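>>>
>>> For reference, a quick sketch for watching the apply progress
>>> (v$managed_standby is standard in 10.2):
>>>
>>> SQL> select process, status, thread#, sequence#, block#, blocks
>>>   2  from v$managed_standby;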
>>>
>>> Now does anyone have an idea on how to proceed?
>>>
>>> Regards,
>>>
>>> Martin
>>> --
>>> Martin Bach
>>> OCM 10g
>>> http://martincarstenbach.wordpress.com
>>> http://www.linkedin.com/in/martincarstenbach
>>> --
>>> //www.freelists.org/webpage/oracle-l
>>
> --
> Martin Bach
> OCM 10g
> http://martincarstenbach.wordpress.com
> http://www.linkedin.com/in/martincarstenbach
>



-- 
Niall Litchfield
Oracle DBA
http://www.orawin.info
