RE: Redo Log improvement - how to analyze and attack?

  • From: <Christopher.Taylor2@xxxxxxxxxxxx>
  • To: <lcarapinha@xxxxxxxxx>, <Mark.Bobak@xxxxxxxxxxxx>
  • Date: Thu, 15 Nov 2012 15:29:02 -0600

Luis,
Any idea how much its slowing down my system?  How can we (as DBAs) quantify 
the "how much" is "too much"?

This is kind of turned into a thought exercise I guess..I'm not trying to be an 
a$$.

Looks to me like my wait time over 5 hours was 998s - that's about 16 minutes.

I would "assume" that is high - but is it really?  How do we know?

Is anyone waiting on it?  No - it's a batch process that runs from 00:00 - 
05:00 in the morning.

This goes back to my first question I guess which was:
"Are high numbers of log file switches (during certain hours) enough validation 
for that an improvement is needed?"

                                                                  Avg
                                       %Time       Total Wait    wait     Waits
Wait Class                      Waits  -outs         Time (s)    (ms)      /txn
-------------------- ---------------- ------ ---------------- ------- ---------
Configuration                  63,949     .6              998      16       0.7


                                                                   Avg

                                             %Time  Total Wait    wait     Waits

Event                                 Waits  -outs    Time (s)    (ms)      /txn

---------------------------- -------------- ------ ----------- ------- ---------

log file sequential read             32,177     .0         648      20       0.3

log file switch(ckpt incomp          10,330    1.0         572      55       0.1

log file sync                        59,236     .0         183       3       0.6

log file switch completion            1,706    3.5         138      81       0.0

log file switch (archiving n          2,927     .2         121      41       0.0

Log archive I/O                      27,030     .0          22       1       0.3





Statistic                                     Total     per Second     per Trans

-------------------------------- ------------------ -------------- -------------

redo log space requests                     114,499            6.4           1.2

redo log space wait time                     85,008            4.7           0.9

redo ordering marks                       1,597,131           88.7          16.7

redo size                            26,127,511,776    1,451,121.6     272,624.5

redo subscn max counts                      163,257            9.1           1.7

redo synch time                              18,839            1.1           0.2

redo synch writes                            58,819            3.3           0.6

redo wastage                            151,718,608        8,426.5       1,583.1

redo write time                             131,085            7.3           1.4

redo writer latching time                       136            0.0           0.0

redo writes                                 499,869           27.8           5.2





Instance Activity Stats - Thread ActivityDB/Inst: CCMNASP1/CCMNASP1N1  Snaps:

-> Statistics identified by '(derived)' come from sources other than SYSSTAT



Statistic                                     Total  per Hour

-------------------------------- ------------------ ---------

log switches (derived)                          711    142.16

          -------------------------------------------------------------



From: Luis [mailto:lcarapinha@xxxxxxxxx]
Sent: Thursday, November 15, 2012 2:11 PM
To: Taylor Christopher - Nashville
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Redo Log improvement - how to analyze and attack?

From my humble experience and for a OLTP system (batch jobs?), 300 redo log 
switchs per hour is _VERY_ high. Some say that rule of thumb is 2 to 5 redo log 
switches per hour/max, so you are getting it too high. Please consider increase 
you log file size.

Looking at your AWR, the system is producing too much redo before LGWR can 
actually do its work, so users need to wait for it (wait:  log file switch 
completion). As expected, you also have a Incomplete Checkpoint wait, that 
means that your DBWR is not able to complete the checkpoint as of course, it is 
slowing down your operations..






On Wed, Nov 14, 2012 at 9:49 PM, 
<Christopher.Taylor2@xxxxxxxxxxxx<mailto:Christopher.Taylor2@xxxxxxxxxxxx>> 
wrote:
I have this 3-Node RAC (10.2.0.4) that I've inherited (as I've mentioned) and 
after putting out a few issues I'm ready to move onto the redo logs (and 
perhaps I should have started here earlier).
First question:
Are high numbers of log file switches (during certain hours) enough validation 
for that an improvement is needed?

In my case, during heavy batch processing, this RAC cluster may experience 300 
log switches during an hour (I know, I know - sounds bad doesn't it)

(Caveat: The log file switches were generated from within Toad - I'm assuming 
they are valid)

My first step was to enable FAST_START_MTTR_TARGET and it is set for 3 minutes 
(180 seconds).

Here's some of my numbers for the time interval on Node1:

Redo Size:
1,451,121.63 per second

AWR redo/log stats (formatted) here:
https://gist.github.com/4075050

Second question:
V$INSTANCE_RECOVERY OPTIMAL_LOGFILE_SIZE = 5421 (~5 GB)

Is OPTIMAL_LOGFILE_SIZE calculated by the *EXISTING LOGFILE GROUPS*?  So if I 
had more groups, would this number go down?

Third Question:
I've seen various mentions of " archive_lag_target" - what do I need to think 
about regarding this parameter in relation to check points?


Any suggestions/thoughts are appreciated.

Regards,

Chris Taylor
Oracle DBA
Parallon IT&S
christopher.taylor2@xxxxxxxxxxxx<mailto:christopher.taylor2@xxxxxxxxxxxx><mailto:christopher.taylor2@xxxxxxxxxxxx<mailto:christopher.taylor2@xxxxxxxxxxxx>>
www.parallon.net<http://www.parallon.net>



--
//www.freelists.org/webpage/oracle-l




--
Cumprimentos,
Luís Marques


--
//www.freelists.org/webpage/oracle-l


Other related posts: