RE: Index Contention / Sequence Caching

  • From: "Uzzell, Stephan" <SUzzell@xxxxxxxxxx>
  • To: 'Jonathan Lewis' <jonathan@xxxxxxxxxxxxxxxxxx>, "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Sat, 15 Mar 2014 12:09:31 +0000

Jonathan,

Thank you for clarifying. I apparently owe my developer an apology.

We do have partitioning, so the cost of licensing that is not a factor. That 
being said, is there a way of modeling or predicting the impact of either of 
these changes other than trying them?

Changing the sequence cache value seems a smaller change - less likelihood of 
unintended consequences - to me, but I've been wrong once already on this...

Thanks,
stephan

Stephan Uzzell

From: Jonathan Lewis [mailto:jonathan@xxxxxxxxxxxxxxxxxx]
Sent: Saturday, 15 March, 2014 05:42
To: Uzzell, Stephan; oracle-l@xxxxxxxxxxxxx
Subject: RE: Index Contention / Sequence Caching


The "enq: TX - Index Contention" can be a sign of an earlier large-scale delete 
that results in excessive failure (statistic: "recursive aborts on index block 
reclamation") on index block splts; but you're if seeing a regular stream of 
small waits then you and your developers may both be on the right track.

Unless the sequence is declared with the ORDER option, then each RAC instance 
will be operating in a different range of values of size "CACHE".  With a cache 
size of 1,000 instance 1 would be inserting values (e.g.) 1,001 to 2,000 while 
instance 2 would be inserting values  2,001 and 3000, and instance 3 would be 
inserting values of 3,001 to 4000.  Apart from a brief collision as each 
instance exhausted its cache and bounced the sequence highwater mark the 
instances would probably be using 3 independent leaf blocks most of the time.  
With a cache size of 20 all three instances would probably be inserting into 
the same block constantly.  If I were to adopt their solution, I'd make the 
cache size more like 10,000.

Your solution, as you say, is another way to reduce a hot spot by introducing 
multiple insertion points in the index - this would mean that all the "high 
value" blocks would be flying around the interconnect all the time, but the 
rate of collision would drop in proportion to the rate at which you were using 
sequence numbers.  (You'd still want to have a larger cache size, though, if 
the sequence was busy as the seq$ block holding that sequence definition would 
also have to travel around the interconnect each time an instance exhausted its 
cache).

Whichever solution you adopt there are side effects on the space utilisation in 
the index. If you set a very large cache size then N-1 out of N instances would 
do 50/50 leaf node splits while the "current top" one would do 90/10;  with the 
partitioning choice the you would probably get a slightly more randomised 
effect across the partitions leading to a better use of space for the same 
cache size.  It's quite hard to predict, since it depends on choice of cache 
size, number of partitions, and pattern of usage.  Of course, a key factor may 
be the cost of the partitioning option if you haven't already paid for it.



Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx<mailto:oracle-l-bounce@xxxxxxxxxxxxx> 
[oracle-l-bounce@xxxxxxxxxxxxx] on behalf of Uzzell, Stephan 
[SUzzell@xxxxxxxxxx]
Sent: 15 March 2014 02:10
To: oracle-l@xxxxxxxxxxxxx<mailto:oracle-l@xxxxxxxxxxxxx>
Cc: Uzzell, Stephan
Subject: Index Contention / Sequence Caching
Hi all,

I'm having a slight disagreement with my developers, and hoping you guys can 
help me out (either confirm what I think or tell me I'm flat-out wrong, I can 
take either, I just need to get this fixed).

One of our applications is suffering pretty severe waits on a particular 
insert. From querying the waits from gv$active_session_history for *one* second:

14-MAR-14 11.11.00.041 AM      826usupm5pkwu gc buffer busy                   
13557923         13
14-MAR-14 11.11.00.041 AM      826usupm5pkwu enq: TX - index contention       
13557923         19
14-MAR-14 11.11.00.142 AM      826usupm5pkwu gc buffer busy                   
13557923         49
14-MAR-14 11.11.00.142 AM      826usupm5pkwu enq: TX - index contention       
13557923         53
14-MAR-14 11.11.00.041 AM      826usupm5pkwu enq: TX - index contention         
    -1         67
14-MAR-14 11.11.00.483 AM      826usupm5pkwu gc buffer busy                   
13557923        109

To me, this spells classic hot block on object 13557923 - which happens to be 
the PK for a table where we had a lot of waits happening. My idea to resolve is 
to partition that index to split out the contention - instead of all sessions 
trying to grab that last block, the sessions will have X blocks to work with, 
depending on the degree of partitioning.

Frankly, I didn't quite understand development's response (which may indicate a 
lack in my knowledge, or may indicate that they're not talking Oracle terms). 
They want to increase the caching for the sequence that they use to populate 
the PK from 20 to 1000, because:

"The fact that the biggest waits were index contention and gc buffer busy waits 
indicates that the contention is mostly between RAC nodes trying to lock the 
same index page. Increasing cache from 20 to 1000 (assuming that the default 
page is 8K and a PK column size is 8 bytes) will cause multiple nodes to work 
with their own index pages. There still will be contention between sessions 
running on the same node due to high number of CPUs (48) on each node. But in a 
short term increasing sequence cache might decrease contention up to three 
times (it's a 3 node RAC)"
I'm fairly certain that the gc buffer busy waits were because sessions trying 
to grab that index block were spread among multiple nodes, but I really don't 
understand how increasing the sequence cache value should help that - no matter 
what's cached in memory, the same block is going to be the hot spot, no?
Short version: 1) am I crazy? 2) is the developer who said that crazy? 3) what 
the heck is object "-1" 4) if you were confronted with waits like that - any 
immediate thoughts for resolving other than partitioning that PK index?
Oracle 10.2.0.4 on Windows 2003 x64, in case that matters.
Thanks!
stephan


Stephan Uzzell

Other related posts: