RE: log buffer size and log file syncs

  • From: "CRISLER, JON A" <JC1706@xxxxxxx>
  • To: Tanel Poder <tanel@xxxxxxxxxxxxxx>
  • Date: Mon, 30 Apr 2012 14:56:22 +0000

Interesting thoughts Tanel: in this case of this specific app, the majority of 
the work is made of up small commits to a handful of tables on a 6 node RAC 
cluster.  I/O times are generally quite good, and with 32 cores per node the 
CPU and load average is very low.  Its 11gR1 - I was wondering if some of the 
tweaks to put LGWR at "real time" priority that are mentioned for 10g also 
apply to 11g.
From: tanel@xxxxxxxxxx [mailto:tanel@xxxxxxxxxx] On Behalf Of Tanel Poder
Sent: Friday, April 27, 2012 6:09 PM
To: CRISLER, JON A
Cc: oracle-l
Subject: Re: log buffer size and log file syncs

Yep, it's possible in theory and practice too - if you have some large 
(parallel) transactions generating redo on multiple CPUs - which flood the log 
buffer. This can drive the log file sync times up, as there's sometimes tens or 
100+ MB of redo waiting to be flushed before the commit can return.

With lower log buffer size, your large transaction would run slower as it would 
wait for log buffer space wait much more. Your smaller transactions wait for 
log buffer space as well, but as there's less redo to be flushed in the "queue" 
then they'll get their redo in there faster and also get the write done sooner 
too.

However - I would first look into what the LGWR itself is doing (how much CPU 
it's consuming vs waiting for IO, what's the average IO size (and redo size per 
message received by LGWR etc). A common cause for such symptoms is that LGWR 
just doesn't get enough CPU.

No need to guess, you can run snapper on LGWR too (snapper just queries v$ 
views so it doesn't matter whether it's a background or foreground session):

-- Session Snapper v3.54 by Tanel Poder ( http://blog.tanelpoder.com )

----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                            
     ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
     98, (LGWR)    , STAT, non-idle wait count                                  
     ,             2,         .4,
     98, (LGWR)    , STAT, messages sent                                        
     ,             2,         .4,
     98, (LGWR)    , STAT, messages received                                    
     ,             2,         .4,
     98, (LGWR)    , STAT, background timeouts                                  
     ,             2,         .4,
     98, (LGWR)    , STAT, physical write total IO requests                     
     ,             3,         .6,
     98, (LGWR)    , STAT, physical write total multi block requests            
     ,             1,         .2,
     98, (LGWR)    , STAT, physical write total bytes                           
     ,        232448,     46.49k,
     98, (LGWR)    , STAT, cell physical IO interconnect bytes                  
     ,        232448,     46.49k,
     98, (LGWR)    , STAT, redo wastage                                         
     ,            28,        5.6,
     98, (LGWR)    , STAT, redo writes                                          
     ,             1,         .2,
     98, (LGWR)    , STAT, redo blocks written                                  
     ,           274,       54.8,
     98, (LGWR)    , STAT, redo blocks checksummed by FG (exclusive)            
     ,           180,         36,
     98, (LGWR)    , TIME, background elapsed time                              
     ,           861,    172.2us,      .0%, |          |
     98, (LGWR)    , WAIT, rdbms ipc message                                    
     ,       4981797,   996.36ms,    99.6%, |@@@@@@@@@@|
     98, (LGWR)    , WAIT, log file parallel write                              
     ,           752,    150.4us,      .0%, |          |
--  End of Stats snap 1, end 12-04-28 00:40:59, seconds=5


If you happen to run on Solaris, then you can also run prstat -mLp PID on 
LGWR's process to see how much time it waits in the CPU runqueue  to get onto 
CPU (the LAT column - scheduling latency).

Another thing to definitely check is to see whether async IO is properly 
enabled, especially in the case of multiplexed redo logs...

--
Tanel Poder
Enkitec Europe
http://www.enkitec.com/
Advanced Oracle Troubleshooting v2.0 Seminars in May/June 2012!
http://blog.tanelpoder.com/seminar/



On Fri, Apr 27, 2012 at 11:36 PM, CRISLER, JON A 
<JC1706@xxxxxxx<mailto:JC1706@xxxxxxx>> wrote:
Has anybody run into a case where the log buffer size was too big, and 
negatively affected log file writes and log file sync waits ?

I have a system that ignores my parameter for 32mb log file size and sets it to 
128mb, due to the granule size.  The workaround is to lower the granule size so 
that I can manually set the redo log buffer lower.  But I am not sure if this 
will have any positive affect on log file sync times.  Normal I/O for log file 
writes is fast at 2-4 ms, but log file syncs are bad at 50 to 150 ms.
--
//www.freelists.org/webpage/oracle-l



--
//www.freelists.org/webpage/oracle-l


Other related posts: