RE: Advice, archive redo log apply too slow

  • From: "George Leonard" <george@xxxxxxxxxxxx>
  • To: "'Mark W. Farnham'" <mwf@xxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 7 Dec 2006 20:15:57 +0200

Hi

OK, wow, a lot of info there.

Some more information.

 We're talking 9.2.0.7

Old school physical standby.

The archive location from which the files are read for apply is built
from 20 LUNs, each LUN 5 disks in RAID 5. Ten LUNs run over 2 HBAs and
the second ten over a 2nd set of 2; the lot is then combined into a
metadevice to produce /oracle_arch.


 
George Leonard
________________________________________________________________________
 
Email: george@xxxxxxxxxxxx
 
Coding is easy. All you do is sit staring at a terminal until the drops
of blood form on your forehead.
-----Original Message-----
From: Mark W. Farnham [mailto:mwf@xxxxxxxx] 
Sent: 07 December 2006 16:06 PM
To: george@xxxxxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: RE: Advice, archive redo log apply too slow

Okay, so you have 32 CPUs writing to database files, plus the
single-threaded sequential write from memory to the redo log and then
from redo log to archived redo log on the primary.

On recovery you might well have 1 CPU reading the log and trying to
write to all those files on all those independently running HBAs.
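One way to confirm that, assuming you're running managed recovery, is
to watch the lone MRP process crawl through v$managed_standby (with
manual recovery you'd watch the recovery session's waits in
v$session_wait instead). A rough sketch from the shell:

    sqlplus -s "/ as sysdba" <<'EOF'
    -- Expect a single MRP0 row slowly advancing through SEQUENCE#/BLOCK#
    -- if apply really is one process doing all the work.
    select process, status, sequence#, block#, blocks
    from   v$managed_standby;
    EOF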

If this was old school user-managed standby, like before they made it a
product, you could just feed multiple streams of recover tablespace to
fan out the writes to the tablespace files. For Data Guard you do it
with the parallelism settings. See chapter 8.6 of b14239. (Main points
are: recover standby database parallel 64 (they recommend 2*CPUs; I
might shoot a little low with 32 myself), db_block_checking=false
(faster, but do you really want to?), parallel_execution_message_size
as big as you can, and disk_asynch_io set to true.)
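Roughly, on the standby, that could look like the following (the degree
of 32 and the message size are illustrative values, not gospel; check
what your port actually supports):

    # init.ora entries on the standby (illustrative values):
    #   db_block_checking               = false   # faster, weigh the risk
    #   parallel_execution_message_size = 65536   # as big as you can
    #   disk_asynch_io                  = true
    sqlplus -s "/ as sysdba" <<'EOF'
    -- AUTOMATIC so it does not prompt for every single log file
    recover automatic standby database parallel 32;
    EOF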

If those archived redo logs are file system files, you can eliminate
the read-from-disk speed issue by running ahead a ways, copying the
files to /dev/null (use something that does *NOT* bypass the file
system cache, and reserve a couple of CPUs by lowering the 64 some).
Then you should get the file from the file system cache, and at least
you'll see if you have a flat out "I'm just one process, how y'all
expect me to keep up with the work that 32 did on the other system"
problem.
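For example (the path and log name pattern here are made up; substitute
whatever your log_archive_format produces):

    #!/bin/sh
    # Warm the file system cache ahead of the apply. Plain cat reads
    # through the page cache (no direct I/O) and throws the bytes away.
    for f in /oracle_arch/arch_1_*.arc; do
        cat "$f" > /dev/null
    done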

If you're able to determine that the read speed of the archived logs
you're trying to apply remains part of the problem (let's say you're
already using high parallelism, which I didn't credit you with at the
outset; plus you probably don't want parallel that high on your
primary, and you wrote they match, so make a note to reset that if you
have to fail over, by the way, or else you might get some surprises),
then you might need to stage the archived redo on solid state disk.
(Same idea as the read-ahead into file system cache, but it also works
for raw.) That might get a bit complicated to manage, but nothing that
a bit of tailing the alert log and perl or shell programming can't
handle; it's more tedious to get absolutely correct than anything else.
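Purely illustrative, with the alert log path, log name format, and SSD
mount point all invented:

    #!/bin/sh
    # Watch the alert log; each time recovery touches a log, pre-stage
    # the next few sequences onto the solid state mount.
    ALERT=/oracle/admin/PROD/bdump/alert_PROD.log
    tail -f "$ALERT" | while read line; do
        case "$line" in
        *"Media Recovery Log"*)
            seq=`echo "$line" | sed 's/.*_1_\([0-9]*\)\.arc.*/\1/'`
            for n in 1 2 3; do
                next=`expr $seq + $n`
                cp "/oracle_arch/arch_1_${next}.arc" /ssd_arch/ 2>/dev/null
            done
            ;;
        esac
    done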

Maybe Carel-Jan will chime in; he's probably done more "make Data Guard
faster" stuff than anyone else I know, and I'm pretty confident he
won't tell you to do anything risky.

If none of that works out for you, we can try configuring it old
school. It is just recovery, after all, once the super fancy tools that
help you avoid making a catastrophic mistake and make configuration
easier are taken away. That might mean reconfiguring the disk farm
BORING, so you can be certain that the tablespace recoveries do not
compete with each other for disk access, and setting up manual archive
log name feeding to the multiple recovery sessions in sets that write
to one stripe set at a time... but I've gone on too long already.
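Just to make the fan-out shape concrete (tablespace names invented, and
the mount/offline choreography deliberately left out):

    #!/bin/sh
    # One recovery session per stripe set, each fed its own set of
    # tablespaces so the writes never fight over the same disks.
    sqlplus -s "/ as sysdba" <<'EOF' &
    recover automatic tablespace USERS, INDX;
    EOF
    sqlplus -s "/ as sysdba" <<'EOF' &
    recover automatic tablespace DATA1, DATA2;
    EOF
    wait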

Hope this helps.

mwf



-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of George
Sent: Thursday, December 07, 2006 8:01 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Advice, archive redo log apply too slow

Hi all.

OK, this is a difficult one.

Production is an IBM p595 (POWER5) with 32 CPUs.
DR is an IBM p690 (POWER4) with 32 CPUs.

The hardware configuration is exactly the same, other than the p595
vs. p690 difference.

Both machines have 32 HBAs (2 Gb ports) connecting to exactly the same
LUN design, down to naming, number of spindles, spindle speeds, what's
located on the spindles, and kernel parameters.

At the moment production can produce 4000 archived redo log files per
day; DR can only apply 3000.

The weird thing is that the DR I/O layers are all idling. From the
spindle performance view, the HBAs, everything is idling; max CPU usage
is around 25%.

How can I dig deeper, and how can I get DR to apply the log files faster?

Oh, and each log file is 250 MB.

While applying log files, nmon shows DR reading about 10 MB/sec. We
have shown we can do 10 times that easily.
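(Quick arithmetic: 3000 logs/day x 250 MB = 750 GB/day, and 750 GB /
86400 sec is roughly 8.7 MB/sec, which matches what nmon shows. So the
apply process itself seems to set the pace, not the storage.)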

thanks.


------------------------------------------------
George

george@xxxxxxxxxxxx

You Have The Obligation to Inform One Honestly of the risk, And As a
Person
You Are Committed to Educate Yourself to the Total Risk In Any Activity!
Once Informed & Totally Aware of the Risk,
Every Fool Has the Right to Kill or Injure Themselves as They See Fit!









