Re: Hot index block split on a very busy table--what is the impact

  • From: "Zhu,Chao" <zhuchao@xxxxxxxxx>
  • To: "Tanel Poder" <tanel.poder.003@xxxxxxx>, jonathan@xxxxxxxxxxxxxxxxxx
  • Date: Wed, 4 Oct 2006 21:51:21 +0800

Thanks very much to Tanel, Jonathan, and the others who reviewed the issue.

To give an update on this issue: it turned out to be an Oracle bug in the
implementation of _db_block_hash_latches, where spin_count is set to
spin_count*cpu_count internally by Oracle. There was also another problem
with scanning the index structure, which I am not yet clear on.

We applied a patch and the issue is gone now.
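
For anyone who wants to check the parameters involved on their own system,
the hidden parameters can be inspected with a query along these lines. This
is a minimal sketch using the undocumented x$ fixed tables, so it must be
run as SYSDBA and the column layout may differ between versions:

    -- Show the current values of the hidden parameters mentioned above.
    -- x$ksppi holds parameter names, x$ksppcv the current values;
    -- both are undocumented and require a SYSDBA connection.
    select p.ksppinm name, v.ksppstvl value
      from x$ksppi p, x$ksppcv v
     where p.indx = v.indx
       and p.ksppinm in ('_db_block_hash_latches', '_spin_count');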

On 9/2/06, Zhu,Chao <zhuchao@xxxxxxxxx> wrote:



On 8/31/06, Tanel Poder <tanel.poder.003@xxxxxxx> wrote:
>
> You're on 9i or higher, right?
>
> One possibility is that this is caused by shared CBC latching
> peculiarities:
>
> 1) During normal selects, your index root block can be examined under a
> shared cache buffers chains latch. So as long as everybody is only
> reading the index root block, everybody can do it concurrently (without
> pinning the block). The "current holder count" in the CBC latch
> structure is just increased by one for every read-only latch get and
> decreased by one on every release; a value of 0 means that nobody
> currently holds this latch.
>
> Nobody has to wait for others when reading the index root block in the
> all-read-only case. That greatly helps to combat hot index root issues.
>
> 2) Now, if a branch block split happens a level below the root block,
> the root block has to be pinned in exclusive mode to reflect this
> change in it. In order to pin a block you need to get the corresponding
> CBC latch in exclusive mode. If there are already a bunch of readers on
> the latch, the exclusive latch getter will just flip a bit in the CBC
> latch structure, stating its interest in an exclusive get.
>
> Every read-only latch get checks for this bit; if it is set, the
> getters will just spin instead, waiting for this bit to be cleared
> (they may yield or sleep immediately as well, I haven't checked). Now
> the exclusive getter has to spin/wait until all the shared getters have
> released the latch and the "current holder count" drops to zero. Once
> it is zero (and the getter manages to get onto a CPU), it can get the
> latch, do its work, and release the latch.
>
> During all that time, starting from when the "exclusive interest" bit
> was set, nobody could access this index's root block except the
> processes which already had the latch in shared mode. Depending on the
> latch spin/sleep strategy for this particular case and the OSD
> implementation, this could mean that all those "4000 readers per
> second" just start spinning on that latch, causing a heavy spike in CPU
> usage, and they all queue up.
>
> How to diagnose that:
>
> You could sample v$latch_misses to see whether the "kcbgtcr: kslbegin
> shared" nowait-fails/sleeps counter takes an exceptional jump once you
> observe this hiccup.
>
> How to fix that once diagnosed:
>
> The usual stuff, like partitioning if possible, or creating a single
> table hash cluster instead.
>
> If you see that the problem comes from excessive spinning, think about
> reducing the spinning overhead (by reducing _spin_count, for example).
> This could affect your other database functions, though.
>
> If you can't do the above, and you have off-peak time, then analyse the
> indexes (using treedump for a start), and if you see a block split
> coming in a branch below the root block, force the branch block to
> split during off-peak time by inserting carefully picked values into
> the index tree, values which fall exactly in the range that causes the
> proper block to split. Then you can just roll back your transaction;
> the block splits are neither rolled back nor coalesced, as the split is
> done in a separate recursive transaction.
>
> Tanel.
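
For reference, the sampling Tanel describes could look something like this.
A minimal sketch, assuming 9i/10g dictionary views; "WHERE" is a real
(quoted) column name in v$latch_misses, and the exact location strings vary
by version. Run it twice a few seconds apart and compare the deltas:

    -- Sleep / nowait-miss counters for the CBC latch locations;
    -- a jump in the kcbgtcr rows during the hiccup supports the
    -- shared-latch theory described above.
    select parent_name, "WHERE", nwfail_count, sleep_count
      from v$latch_misses
     where parent_name = 'cache buffers chains'
       and "WHERE" like 'kcbgtcr%'
     order by sleep_count desc;

The treedump Tanel mentions is written to a trace file like this, where
<object_id> is a placeholder for the index's OBJECT_ID from dba_objects:

    alter session set events 'immediate trace name treedump level <object_id>';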
>
> ------------------------------
> *From:* oracle-l-bounce@xxxxxxxxxxxxx
> [mailto:oracle-l-bounce@xxxxxxxxxxxxx] *On Behalf Of* Zhu,Chao
> *Sent:* Wednesday, August 30, 2006 21:53
> *To:* oracle-l@xxxxxxxxxxxxx
> *Subject:* Hot index block split on a very busy table--what is the impact
>
> hi, all,
> We have a few databases with very busy, highly concurrently accessed
> tables, where some indexes of those busy tables can be accessed 1.5K-4K
> times per second (PK lookups).
> I am wondering: when such an index gets a block split (at the root or
> branch level), what will the impact on the system be?
> An index block split is said to be an expensive operation. During a
> block split on a branch/root block, what will the other sessions that
> are doing selects based on this index be waiting on? There is a wait
> event named "Index Block Split", with p1, p2, p3 pointing to block
> address and level. The document "Description of oracle 7 Wait event and
> enqueue" says it will yield the CPU.
> We have had a few production incidents where the load suddenly jumped
> from 10 to several hundred, and during that time nothing could be done.
> From our monitoring tools, it is mainly "latch free" contention inside
> Oracle, and it is cache buffers chains. The load spike matches Oracle's
> description of a block split, but the wait event does not match. And
> because the load spike is very short, we can't capture the actual wait
> event/p1,p2,p3 during the exact time of the spike.
> Does anyone have a similar problem/insight into this issue?
> Is there some v$ view to track which index goes through how many block
> splits? (v$segstat does not have a log of this.) Or any other effective
> way?
>
> --
> Regards
> Zhu Chao
> www.cnoug.org
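
On the last question: there is no per-index split counter in v$segstat, but
instance-wide split counts are exposed in v$sysstat. A minimal sketch; the
exact statistic names ("leaf node splits", "branch node splits", etc.)
depend on the Oracle version:

    -- Instance-wide index block split counters; sample twice and
    -- diff the values to see split activity over an interval.
    -- Per-index detail is not available from these statistics.
    select name, value
      from v$sysstat
     where name like '%node%splits%';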



--
Regards
Zhu Chao
www.cnoug.org




