RAC Re: CBC Latch

  • From: "John Kanagaraj" <john.kanagaraj@xxxxxxx>
  • To: <racdba@xxxxxxxxxxxxx>
  • Date: Wed, 30 Nov 2005 11:36:18 -0800

Ravi,
 
If hot *data* blocks are an issue and the application contends while
updating different rows that all happen to fall in the same block, then
you could artificially split out the table to hold *one* row per block
by padding each row with blank CHAR (not VARCHAR!!) columns. Of course,
if the sessions update the *same* row in a block, this does not help.
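As a rough sketch of the padding trick (all names and sizes here are illustrative, not from the thread): with an 8K block and PCTFREE 0, three CHAR(2000) pad columns make each row roughly 6000 bytes, so no two rows fit in one block.

```sql
-- Hypothetical sketch: force one row per block by padding.
-- CHAR (not VARCHAR2) always occupies its declared width.
CREATE TABLE hot_table_padded (
  id   NUMBER PRIMARY KEY,
  val  NUMBER,
  pad1 CHAR(2000) DEFAULT ' ',  -- ~2000 bytes each, so one
  pad2 CHAR(2000) DEFAULT ' ',  -- ~6000-byte row fills most
  pad3 CHAR(2000) DEFAULT ' '   -- of an 8K block
) PCTFREE 0;
```

The trade-off is obvious: the table grows by roughly a block per row, and full scans get proportionally slower.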
 
A very sarcastic person once wrote thus about RAC (paraphrasing): "RAC
is like an audio amplifier. If the application is well designed, it
enables good scaling. If not, you will end up with RAC amplifying the
design flaws".
 
John Kanagaraj <><
DB Soft Inc
Phone: 408-970-7002 (W)
 
Co-Author: Oracle Database 10g Insider Solutions
http://www.samspublishing.com/title/0672327910
 
** The opinions and facts contained in this message are entirely mine
and do not reflect those of my employer or customers **

________________________________

From: racdba-bounce@xxxxxxxxxxxxx [mailto:racdba-bounce@xxxxxxxxxxxxx]
On Behalf Of Ravi_Kulkarni@xxxxxxxx
Sent: Wednesday, November 30, 2005 5:53 AM
To: racdba@xxxxxxxxxxxxx
Subject: RAC Re: CBC Latch


John , Anand , Harish ,
 
Thank you all for the responses. 
 
We have recreated both tables and indexes with high concurrency and
increased pctfree (identified the objects using a script from KG's
tuning book; contention is mostly on the tables, not the indexes). The
good news is that the focus is on only two tiny tables. But these
happen to be used in multiple, concurrent queries which are most likely
to hit the same buffer chain. That said, I sometimes see many sessions
waiting on *different* chains (hladdrs) at the same time.
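One way to tie the waited-on hladdrs back to segments is to join the latch address from the current waits to x$bh (a sketch only; it needs SYS access, the dba_extents join can be slow, and in 10g the wait event is 'latch: cache buffers chains'):

```sql
-- Sketch: which segments own the buffers on the chains currently
-- being waited on? P1 of the CBC latch wait is the latch address.
SELECT e.owner, e.segment_name, COUNT(*) AS buffers
FROM   x$bh b, dba_extents e
WHERE  b.hladdr IN (SELECT p1raw
                    FROM   v$session_wait
                    WHERE  event = 'latch: cache buffers chains')
AND    e.file_id = b.file#
AND    b.dbablk BETWEEN e.block_id AND e.block_id + e.blocks - 1
GROUP  BY e.owner, e.segment_name
ORDER  BY buffers DESC;
```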
 
Is there any optimization we can do, given that the CBC contention is
concentrated at kcbgtcr?
Thoughts about reverse-key indexes on the PK? Caveats when using
multiple blocksizes?
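For reference, converting the PK index is a one-liner (index name illustrative); the usual caveat is that a reverse-key index spreads sequential key values across leaf blocks but makes index range scans on the key impossible:

```sql
-- Sketch: spread contiguous key values over many leaf blocks.
-- Caveat: range scans (key BETWEEN x AND y) can no longer use it.
ALTER INDEX hot_table_pk REBUILD REVERSE;
```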
 
There is a parallel effort to redesign the app to reduce concurrency,
but that is a longer-term fix.
We are at around 75% CPU during peak load.
We are forcing each app component to connect to a specific instance to
reduce global CR requests and mitigate the 'RAC' overhead.
We are on 10.1.0.4 and most likely not hitting the bug, since I am
seeing evidence of 'almost' even chain lengths, unlike what the bug
describes.
 
Thanks again ,
 
Regards,
Ravi.

________________________________

From: racdba-bounce@xxxxxxxxxxxxx [mailto:racdba-bounce@xxxxxxxxxxxxx]
On Behalf Of Anand Rao
Sent: Wednesday, November 30, 2005 12:19 AM
To: racdba@xxxxxxxxxxxxx
Subject: RAC Re: CBC Latch


Hi Ravi,

Looks like you have already got the best answers to the question.

Can you check the value of _spin_count? If you are using the default
value, try increasing it, provided you have enough free CPU cycles. I
know this goes against the 'general tuning recommendation', but there
are cases where it can help. You need a good amount of free CPU
headroom; if you are already running at 80% or more CPU load, then
don't increase it.
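Since _spin_count is a hidden parameter, V$PARAMETER will not show it; a common way to read it is the x$ksppi / x$ksppcv join (SYS only, shown here as a sketch):

```sql
-- Sketch: current value of the hidden _spin_count parameter.
SELECT n.ksppinm  AS name,
       v.ksppstvl AS value,
       v.ksppstdf AS is_default
FROM   x$ksppi n, x$ksppcv v
WHERE  n.indx = v.indx
AND    n.ksppinm = '_spin_count';
```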

Increasing the buckets and/or hash latches is really a last resort, for
when you just cannot do anything about the SQLs that are primarily
causing the LIO.

You really need to check the SQL which is chasing these 'hot blocks or
rows' and figure out the data access pattern. That is the exact fix you
are looking for from a long term perspective.

As a workaround, you could move the table/index or re-organise the data
(delete the data and reinsert in ROWID order). You might get lucky: the
'hot blocks' can get hashed to another cache buffers chain (and
therefore be protected by some other latch), which could provide some
relief. But again, if the application starts chasing this new latch,
you can't do much more than fix the SQL!

You should rebuild the indexes too with a higher pctfree value, not
just the tables. Once you have identified the objects, rebuild them
with a large pctfree (at the expense of slower full table scans and
more storage if the objects are huge). That should help. Smaller block
sizes can also help.
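The rebuild itself is straightforward (object names illustrative); note that a table MOVE leaves its indexes UNUSABLE, so they have to be rebuilt afterwards in any case:

```sql
-- Sketch: spread rows/keys over more blocks via a high PCTFREE.
ALTER TABLE hot_table MOVE PCTFREE 80;
-- MOVE changes ROWIDs, so dependent indexes go UNUSABLE:
ALTER INDEX hot_table_pk REBUILD PCTFREE 80;
```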

Lastly, I really wouldn't discount bug 3611471. It was fixed in 9.2.0.4
as a backport and probably made it into the 9.2.0.5 and 10.1.0.3
patchsets.

cheers
anand


On 29/11/05, John Kanagaraj <john.kanagaraj@xxxxxxx> wrote: 

        Ravi,
         
        I am by no means a RAC expert, but CBC latch contention is due
to excessive, concurrent LIOs against a set of objects. This condition
is a problem on non-RAC instances, but gets magnified on RAC due to
cross-instance block consistency issues. Most of the time (and I hope I
am not generalizing this too much), this is due to tight loops on
indexed reads. While the application team should tune and reduce LIOs,
can you also, in parallel, look at which SQL is causing this and trace
it? Please be aware that some 'data pattern' issues can be handled
outside the code via the judicious use of histograms. (Look at my paper
at OAUG CP 2005 or in SELECT for an example of this; I have had
multiple people come back to let me know that this fixed a number of
vexing issues...)
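Such a histogram is typically gathered with DBMS_STATS (schema, table, and column names here are purely illustrative); with per-value frequency information the optimizer can choose a different plan for skewed values instead of one plan for all:

```sql
-- Sketch: histogram on a skewed column so the optimizer sees
-- the data pattern. 254 is the maximum bucket count.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'APP',
    tabname    => 'HOT_TABLE',
    method_opt => 'FOR COLUMNS status SIZE 254');
END;
/
```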
         
        Along with moving tables to smaller blocksizes, you might also
want to look at moving *indexes* to smaller blocksizes to spread out
'hot' data onto multiple blocks. If indexed reads are the majority of
your LIOs, this might not help, as the root and specific branch blocks
may still be 'hot'....
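Moving an index into a smaller-blocksize tablespace takes three steps (tablespace name, file path, and sizes are illustrative): a matching buffer cache must exist before the tablespace can be created.

```sql
-- Sketch: a 2K buffer cache, a 2K-blocksize tablespace, and the
-- index rebuilt into it.
ALTER SYSTEM SET db_2k_cache_size = 64M;

CREATE TABLESPACE idx_2k
  DATAFILE '/u01/oradata/idx_2k01.dbf' SIZE 512M
  BLOCKSIZE 2K;

ALTER INDEX hot_table_pk REBUILD TABLESPACE idx_2k;
```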
         
        Regards,
        
        John Kanagaraj <><
        DB Soft Inc
        Phone: 408-970-7002 (W)
         
        Co-Author: Oracle Database 10g Insider Solutions
http://www.samspublishing.com/title/0672327910
         
        ** The opinions and facts contained in this message are entirely
mine and do not reflect those of my employer or customers **
         

________________________________

        From: racdba-bounce@xxxxxxxxxxxxx
[mailto:racdba-bounce@xxxxxxxxxxxxx] On Behalf Of Ravi_Kulkarni@xxxxxxxx
        Sent: Tuesday, November 29, 2005 6:04 AM
        To: racdba@xxxxxxxxxxxxx
        Subject: RAC CBC Latch
        
        

        List, 

        Any pointers on how to reduce contention on the CBC latch? 
        I am seeing sessions waiting on multiple hladdrs too. 
        Recreating the tables with a high pctfree helped, but we need
much more fine-tuning. (The app team has been doing their bit on
reducing logical reads.)

        How can I tell for sure what the precise fix is (from where the
latch is waited on): increasing hash buckets, or moving tables to
smaller blocksizes?

        10.1.0.4 (2Node) / RH 3.0. 


        
                                                          NoWait            Waiter
        Latch Name            Where                       Misses    Sleeps  Sleeps
        --------------------  --------------------------  ------  --------  ------
        cache buffer handles  kcbzfs                           0         8       6
        cache buffers chains  kcbgtcr: kslbegin excl           0    91,563  67,822
        cache buffers chains  kcbgtcr: fast path               0    57,384  76,694
        cache buffers chains  kcbrls: kslbegin                 0    25,944  31,763
        cache buffers chains  kcbchg: kslbegin: bufs not       0       942      35
        cache buffers chains  kcbxbh                           0       391      32
        cache buffers chains  kcbzgb: scan from tail. no       0       101       0




        Thanks,
        Ravi. 

