RE: Application of archived logs on physical standby very slow

  • From: "Johnson, William L (TEIS)" <WLJohnson@xxxxxxxxxxxxxxxxxxx>
  • To: "development@xxxxxxxxxxxxxxxxx" <development@xxxxxxxxxxxxxxxxx>, ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 27 Jan 2010 12:18:58 -0500

What do you have your parallel_execution_message_size set to on the standby 
database?
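
For reference, a quick way to check it on the standby (just a sketch, assuming
SQL*Plus access as SYSDBA):

SQL> show parameter parallel_execution_message_size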

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Martin Bach
Sent: Wednesday, January 27, 2010 12:13 PM
To: ORACLE-L
Subject: Application of archived logs on physical standby very slow

Dear listers,

I'd like to add another "interesting" problem to today's interesting 
RMAN problem. I have quite a bit of output from diagnostic tools; 
hopefully this doesn't get too badly mangled...

Our alerting system fired when my physical standby database went out of 
sync with production. DR and production share the same storage (an 
entry-level MSA) and are multipathed. This is what I see in the standby's 
alert log (both hosts are RHEL 5.2 32-bit, don't ask):

Wed Jan 27 10:42:42 2010
RFS[2]: Archived Log: '/u03/oradata/dr/arch/1_49264_596779339.dbf'
Primary database is in MAXIMUM PERFORMANCE mode
Wed Jan 27 10:47:55 2010
Media Recovery Log /u03/oradata/dr/arch/1_49252_596779339.dbf
Wed Jan 27 10:55:27 2010
Media Recovery Log /u03/oradata/dr/arch/1_49253_596779339.dbf
Wed Jan 27 11:01:46 2010

Online redo logs are 500M in size and so are most archived logs, and as 
you can see it consistently takes 5-10 minutes to apply just one (note 
that RFS has already received sequence 49264 while media recovery is 
still on 49252/49253, so we keep falling behind).
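
As a cross-check of the apply rate itself, v$recovery_progress can be
queried on the standby while managed recovery is running (a sketch; the
view is available in 10.2):

SQL> select item, units, sofar, total
  2  from   v$recovery_progress
  3  where  item like '%Apply Rate%';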

top - 11:25:37 up 57 days, 17:34, 13 users,  load average: 0.55, 1.05, 1.19
Tasks: 209 total,   1 running, 208 sleeping,   0 stopped,   0 zombie
Cpu0  :  2.8%us,  1.9%sy,  0.0%ni, 51.9%id, 43.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  1.9%us,  0.9%sy,  0.0%ni, 42.6%id, 54.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.9%us,  0.0%sy,  0.0%ni, 98.1%id,  0.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.9%sy,  0.0%ni, 99.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8310064k total,  8019968k used,   290096k free,    36456k buffers
Swap:  8152948k total,    70124k used,  8082824k free,  7568912k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 21647 oracle    15   0 2178m  48m  31m S    3  0.6   0:00.41 ora_mrp0_dr
  7496 oracle    15   0 2145m 1.5g 1.5g S    2 18.4 190:47.10 ora_p000_dr
  7500 oracle    15   0 2145m 1.5g 1.5g S    2 18.4 199:38.12 ora_p002_dr
  2580 root      10  -5     0    0    0 S    1  0.0  17:42.07 [kjournald]
  7498 oracle    16   0 2145m 1.5g 1.5g D    1 18.4 189:40.17 ora_p001_dr

The only processes on CPU are MRP0 and its parallel slaves. Funnily 
enough, iostat doesn't seem to see a problem:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0.00    0.00    0.25   24.75    0.00   75.00

Device:         rrqm/s   wrqm/s   r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
cciss/c0d0        0.00     4.00  0.00   8.00     0.00    48.00    12.00     0.01    0.88   0.50   0.40
cciss/c0d1        0.00     5.00  0.00  18.00     0.00    92.00    10.22     0.02    0.89   0.06   0.10
sda               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00   211.00  0.00 189.00     0.00  1600.00    16.93     0.49    2.57   2.57  48.60
sdd               0.00   203.00  0.00 183.00     0.00  1544.00    16.87     0.50    2.71   2.74  50.10
sde               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00  0.00 786.00     0.00  3144.00     8.00     2.09    2.64   1.26  98.70
dm-3              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-4              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00  0.00 786.00     0.00  3144.00     8.00     2.09    2.65   1.26  98.80
dm-6              0.00     0.00  0.00  23.00     0.00    92.00     8.00     0.02    0.83   0.04   0.10
dm-7              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-8              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Service times are pretty decent, I'd say, and the SAN administrators tell 
me that everything on the array is fine. I also checked for AIO, and 
that's in use:

[oracle]$ cat /proc/slabinfo | grep ^k
kioctx                25     60    192   20    1 : tunables  120   60    8 : slabdata      3      3      0
kiocb                 64     90    128   30    1 : tunables  120   60    8 : slabdata      3      3      0
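
The slab counts only show that kernel AIO contexts exist, of course; the
Oracle side of the same check is the init parameters (a sketch, 10.2
parameter names):

SQL> show parameter filesystemio_options
SQL> show parameter disk_asynch_io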

Data files are scattered over mount points /u02-/u06, and there are no 
multipathing (or other I/O-related) errors reported in dmesg or 
/var/log/messages. Archived logs are received on the DR box in time, and 
there are no gaps. Our sysadmin tells me that device-mapper multipath is 
happily distributing I/O round-robin. Oh yes, and it's Oracle 10.2.0.4 
32-bit.
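
One thing I haven't looked at yet is what MRP0 and its slaves are
actually waiting on inside the database; a sketch of that check, using
the PIDs from the top output above:

SQL> select p.spid, s.sid, s.event, s.seconds_in_wait
  2  from   v$process p, v$session s
  3  where  s.paddr = p.addr
  4  and    p.spid in ('21647', '7496', '7498', '7500');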

Now, does anyone have an idea how to proceed?

Regards,

Martin
-- 
Martin Bach
OCM 10g
http://martincarstenbach.wordpress.com
http://www.linkedin.com/in/martincarstenbach
--
//www.freelists.org/webpage/oracle-l

