RE: Session With oper EXCL is also waiting - where to now? (systemstate dump)

  • From: <Christopher.Taylor2@xxxxxxxxxxxx>
  • To: <tanel@xxxxxxxxxxxxxx>
  • Date: Wed, 19 Sep 2012 16:16:12 -0500

Yessir - I was already reading that note id 843638.1 (LCK stuck) and it is 
indeed waiting as well as MMNL, CJQ etc etc.
Can you clarify this for me:
"But in this case it may be just MMAN (still) shrinking shared pool. Check the 
session state object just under that process (search for "(session)") and see 
which session/user it is."

(Thanks for working with me on this - learning a LOT today)

For PROCESS 152 (the holder holding those 3 latches), if I search for session 
from there I get:

    (session) sid: 5403 trans: (nil), creator: 0xa7aba8578, flag: (40000045) 
USR/- BSY/-/-/-/-/-
              DID: 0001-0098-00000E58, short-term DID: 0001-0098-00000E59
              txn branch: (nil)
              oct: 3, prv: 0, sql: 0xa11f65618, psql: (nil), user: 92/CONCUITY

    service name: /* redacted */
    O/S info: user: /* redacted */, term: unknown, ospid: , machine: /* 
redacted */
              program: JDBC Thin Client
    application name: JDBC Thin Client, hash value%46894660
    last wait for 'latch: shared pool' blocking sess=0x(nil) seq318 wait_timeW 
seconds since wait started=3
                address`0fab30, numberÖ, tries=0
    Dumping Session Wait History
     for 'latch: shared pool' count=1 wait_timeW
                address`0fab30, numberÖ, tries=0
     for 'latch: shared pool' count=1 wait_time&
                address`0fab30, numberÖ, tries=0
     for 'latch: row cache objects' count=1 wait_time7
                address¥b6bd2a0, numberÈ, tries=1
     for 'latch: row cache objects' count=1 wait_timeY5844
                address¥b6bd2a0, numberÈ, tries=0
     for 'latch: shared pool' count=1 wait_time?
                address`0fab30, numberÖ, tries=0
     for 'latch: shared pool' count=1 wait_timee
                address`0fab30, numberÖ, tries=0
     for 'latch: row cache objects' count=1 wait_time16
                address¥b6bd2a0, numberÈ, tries=1
     for 'latch: row cache objects' count=1 wait_timeE7111
                address¥b6bd2a0, numberÈ, tries=0
     for 'latch: row cache objects' count=1 wait_timeq
                address¥b6bd2a0, numberÈ, tries=1
     for 'latch: row cache objects' count=1 wait_timeR7243
                address¥b6bd2a0, numberÈ, tries=0
    temporary object counter: 0
      ----------------------------------------

Chris


From: tanel@xxxxxxxxxx [mailto:tanel@xxxxxxxxxx] On Behalf Of Tanel Poder
Sent: Wednesday, September 19, 2012 4:02 PM
To: Taylor Christopher - Nashville
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Session With oper EXCL is also waiting - where to now? 
(systemstate dump)

This process was holding 3 different latches at the time of dumping (shared 
pool simulator, shared pool, row cache objects):


      holding    (efd) a5b747220 Child shared pool simulator level=8 child# 
<---
        Location from where latch is held: kglsim_scan_lru: scan:
        Context saved from call: 0
        state=free, wlstate=free
      holding    (efd) 600fab30 Child shared pool level=7 child#=1
        Location from where latch is held: kghfrunp: alloc: wait:
        Context saved from call: 0
        state=busy, wlstate=free
      holding    (efd) a5b6bd2a0 Child row cache objects level=4 child#
        Location from where latch is held: kghfrunp: clatch: wait:

And the reason why this process was holding the latch is "kghfrunp" function - 
Kernel Generic Heap-management FRee UNPinned chunks. It shows that the process 
was flushing out stuff from shared pool. This can happen for a few reasons - 
the normal reason is when just allocating space in shared pool and the suitable 
freelists are empty. But in this case it may be just MMAN (still) shrinking 
shared pool. Check the session state object just under that process (search for 
"(session)") and see which session/user it is.

Note that when searching for "kghfrunp row cache objects" in MOS, this came up:

LCK temporarily stuck waiting for latch 'Child row cache objects' [ID 843638.1]

This may be somewhat related too, worth checking out:

Bug 6368621  I/O slave creation time-outs, under extreme memory pressure

Tanel.


On Wed, Sep 19, 2012 at 11:34 PM, 
<Christopher.Taylor2@xxxxxxxxxxxx<mailto:Christopher.Taylor2@xxxxxxxxxxxx>> 
wrote:

Ah hah (see what it's waiting on...also any additional guidance from here?)

Here's the HOLDER

PROCESS 152:
  ----------------------------------------
  SO: 0xa7aba8578, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid2, calls cur/top: 0x6f9156810/0x6f8710648, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 163
              last post received-location: kqrbtm
              last process to post me: a7ab873c0 1 6
              last post sent: 0 0 24
              last post sent-location: ksasnd
              last process posted by me: a7ab873c0 1 6
    (latch info) wait_event=0 bits?
-->      holding    (efd) a5b747220 Child shared pool simulator level=8 
child# <---
        Location from where latch is held: kglsim_scan_lru: scan:
        Context saved from call: 0
        state=free, wlstate=free
      holding    (efd) 600fab30 Child shared pool level=7 child#=1
        Location from where latch is held: kghfrunp: alloc: wait:
        Context saved from call: 0
        state=busy, wlstate=free
      holding    (efd) a5b6bd2a0 Child row cache objects level=4 child#
        Location from where latch is held: kghfrunp: clatch: wait:


Here's the WAITS for the HOLDER
  Dumping Session Wait History
     for 'latch: shared pool' count=1 wait_timeW
                address`0fab30, numberÖ, tries=0
     for 'latch: shared pool' count=1 wait_time&
                address`0fab30, numberÖ, tries=0
     for 'latch: row cache objects' count=1 wait_time7
                address¥b6bd2a0, numberÈ, tries=1
     for 'latch: row cache objects' count=1 wait_timeY5844
                address¥b6bd2a0, numberÈ, tries=0
     for 'latch: shared pool' count=1 wait_time?
                address`0fab30, numberÖ, tries=0
     for 'latch: shared pool' count=1 wait_timee
                address`0fab30, numberÖ, tries=0
     for 'latch: row cache objects' count=1 wait_time16
                address¥b6bd2a0, numberÈ, tries=1
     for 'latch: row cache objects' count=1 wait_timeE7111
                address¥b6bd2a0, numberÈ, tries=0
     for 'latch: row cache objects' count=1 wait_timeq
                address¥b6bd2a0, numberÈ, tries=1
     for 'latch: row cache objects' count=1 wait_timeR7243
                address¥b6bd2a0, numberÈ, tries=0
    temporary object counter: 0

From: tanel@xxxxxxxxxx<mailto:tanel@xxxxxxxxxx> 
[mailto:tanel@xxxxxxxxxx<mailto:tanel@xxxxxxxxxx>] On Behalf Of Tanel Poder
Sent: Wednesday, September 19, 2012 3:26 PM
To: Taylor Christopher - Nashville
Cc: oracle-l@xxxxxxxxxxxxx<mailto:oracle-l@xxxxxxxxxxxxx>
Subject: Re: Session With oper EXCL is also waiting - where to now? 
(systemstate dump)


Search for the address a5b6bd2a0 (case insensitive) in the systemstate dump 
file. If some process still happened to be holding that latch while the dump 
read its state object, it should dump out which one it was (and which session 
state object was under it).


But allowing shared pool to automatically shrink is a bad idea in 
super-critical latency-sensitive (OLTP) systems ... better "waste" some extra 
memory for shared pool instead of having to live with CPU usage and share pool 
latch/library cache mutex contention spikes...

Of course in your case the symptom was even more extreme (a hang), which may be 
due to a bug ...

--

Tanel Poder

Blog - http://blog.tanelpoder.com

App  - http://voic.ee

On Wed, Sep 19, 2012 at 11:01 PM, 
<Christopher.Taylor2@xxxxxxxxxxxx<mailto:Christopher.Taylor2@xxxxxxxxxxxx>> 
wrote:

(apologies for bad formatting - I'm trying to keep it simple tho)
(pastebin is blocked, but used gist https://gist.github.com/3751878)
3-node RAC 10.2.0.4

We had a "hang" last night from 2:50 to ~3:48 and Oracle was kind enough to 
provide me with an automated system state dump.
(the bottom of this email is where I'm confused - where do I go from here?)

Looking at the ASH and AWR reports for the 2 hour window:

I see:
1.) An SGA Resize occurred:
Buffer Cache 30,608M increases to 30,656M
Shared Pool Size 5,136M *decreases* to 5,088M



--
//www.freelists.org/webpage/oracle-l


Other related posts: