Re: systemstate dump

I only now saw this post. Sorry for being late. There is an Oracle
bug in 8i with the DLM. To make a long story short, a global enqueue
"loses" its owner and, consequently, cannot be released. That causes all
kinds of beautiful events. Here is what I'm talking about, and what Mogens
would call CTNOM:

Article-ID:         <Note:157766.1> 
Circulation:        PUBLISHED (EXTERNAL) 
Folder:             server.HA.RAC 
Topic:              Performance, Tuning and Hanging Issues 
Title:              Sessions Wait Forever for 'global cache cr request' Wait 
                    Event in OPS or RAC 
Document-Type:      PROBLEM 
Impact:             MEDIUM 
Skill-Level:        NOVICE 
Server-Version:     08.01 to 09.00 
Updated-Date:       18-FEB-2003 11:35:45 
References:         <BUG:1916409> 
Shared-Refs: 
Authors:            MPOLASKI.US 
Attachments:        NONE 
Content-Type:       TEXT/PLAIN 
Products:           5/RDBMS; 
Platforms:          GENERIC; 


Problem Description 
------------------- 


Sessions hang indefinitely in an OPS environment.  Session(s) wait forever for 
the 'global cache cr request' wait event.  Other sessions could end up waiting 
for the 'buffer busy due to global cache' or 'global cache lock busy' wait 
events. 
  


Solution Description 
-------------------- 


This is a known issue (Bug 1916409).  To resolve the hang immediately, all 
instances will have to be restarted. 


To correct the problem permanently, upgrade to a fixed release or apply the 
necessary patches available on Metalink. 


Fixed Versions: 


        - 8.1.6.3 with patch for bug 1916409 
        - 8.1.7.2 with patch for bug 1916409 
        - 8.1.7.3 
        - 9.0.1.1 with patch for bug 1916409 
        - 9.0.1.2 
        - 9.2 or higher 
  


Explanation 
----------- 


Due to Bug 1916409, a PCM lock with an expired timeout might never be 
cancelled. 


Possible Symptoms: 


- The 'WAITING SESSIONS' section of an opsdiag.sql output (Note 135714.1) shows 
  a session waiting for 'global cache cr request' with a high value for 
  'seconds'. 


- The 'PCM LOCK WAITERS' section of an opsdiag.sql output (Note 135714.1) shows 
  a lock upgrading (for example: granted in 'Null' mode and requesting 'Share' 
  mode) with a high value for 'sec' (seconds). 


- 'global cache cr request' wait event never finishes.  Seq# and 
seconds_in_wait continue to increase. 


        select sid, event, seq#, seconds_in_wait 
        from gv$session_wait 
        where event like 'global cache%'; 


- The lock dump (from an lkdebug dump or systemstate dump) for the offending 
  PCM lock has On_timer_q? : Y and a negative timeout value.  Example: 


      ----------lock-------------------------- 
lock version     : 10165 
Owner node       : 0 
grant_level      : KJUSERNL            <-- Granted in Null Mode 
req_level        : KJUSERPR            <-- Requesting Protected Read 
bast_level       : KJUSERNL 
notify_func      : 0 
resp             : 0x99dd5174 
procp            : 0x9866e644 
remote_lockp     : 0x9916d114 
pid              : 5012 
proc version     : 418 
oprocp           : 0x9866e644 
opid             : 5012 
gid              : 1 
xid              : 0 0 
crstate          : 5 
dd_time          : 0.0 secs 
dd_count         : 0 
timeout          : -26786.0 secs       <-- Negative timeout value 
On_timer_q?      : Y                   <-- On Timer Queue 
On_dd_q?         : N 
lock_state       : KJUSEROPENING 
Open Options     : KJUSERPERSISTENT 
Convert options  : KJUSERGETVALUE 
History          : 0x231726cd 
Msg_Seq          : 0x0 


- 'buffer busy due to global cache' or 'global cache lock busy' wait events 
appearing on the system (although this is more of a side effect). 
  

References 
---------- 


Bug 1916409 
  


Additional Search Words 
----------------------- 


OPS RAC Hang Global Cache cr request buffer busy timeout 


So, please, check the note on Metalink.
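If you want to see whether you are hitting this on a live system before tearing the systemstate apart, something along these lines should work in 8.1.7. This is just a sketch: the 300-second cutoff is my own arbitrary threshold, and the second query is the quick way at Henry's "who holds the latch" question (both views and their columns are standard 8i dictionary):

```sql
-- Waits that never finish: seconds_in_wait just keeps climbing.
-- Same idea as the query in the note, with an arbitrary 300s cutoff added.
select inst_id, sid, event, seq#, seconds_in_wait
  from gv$session_wait
 where event like 'global cache%'
   and seconds_in_wait > 300;

-- Which process currently holds which latch; laddr should match the
-- latch address you see in the dump (e.g. 8000ebec for the shared pool).
select pid, sid, laddr, name
  from v$latchholder;
```

If the first query keeps returning the same SID with a growing seconds_in_wait across runs, that is the symptom the note describes.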
 


On 04/28/2004 12:32:01 AM, "Mohan, Ross" wrote:
> here is a bleary-eyed and largely uninformed comment. 
>  
> I once saw an OPS cluster screw up badly due to inter machine networking 
> considerations. 
>  
> I would look at the DLM latching, intermachine networking word size,
> latencies, 
> and collisions. 
>  
> I think you are digging a deep hole, when what you need is an effective
> tunnel. 
>  
> Bon Chance
> -----Original Message----- 
> From: Henry Poras [mailto:hporas@xxxxxxxxxxx] 
> Sent: Wed 4/28/2004 2:08 AM 
> To: oracle-l@xxxxxxxxxxxxx 
> Cc: 
> Subject: systemstate dump
> 
> 
> 
> We have an Oracle/PeopleSoft implementation (8.1.7 on Solaris) which on
> occasion stops any new logons from happening. Performance for existing
> sessions is so slow as to be non-existent. This is a two instance OPS system
> with the second instance existing solely for failover. The data I have
> gathered comes from three sources: statspack snapshots at 15-minute
> intervals, some 10046 traces (level 12) which have since been deleted,
> systemstate dump. 
> 
> From statspack, I compared the 15 (closer to 20) minute snapshot where the
> issue arose to the 15 minute snapshot just prior to that. What I found was:
> 
> -CPU time almost unchanged 
> -a huge change in the Total Wait Time (cs) for the following events (the
> number of waits were all within an order of 2 of each other except for latch
> free which jumped from 200 to 150,000).
> 
>      NAME                       TIME-initial (cs)   TIME-final (cs) 
>      latch free                               200         2,800,000 
>      enqueue                                  900           280,000 
>      global cache lock busy                 8,200            30,000 
>      library cache pin                        500            18,000 
>      row cache lock                           600            15,000 
>      library cache lock                       600             8,000 
>      DFS lock handle                           30             5,000 
>      PX Deq: Execution Msg                     --             5,000 
> 
> -latch sleeps skyrocketed for library cache, shared pool, and row cache
> objects with many having at least 4 sleeps. The 'dlm resource hash list'
> latch had about the same number of sleeps, but the number of gets increased
> by ~x10.
> 
> For the 10046 traces (taken a few weeks ago) I had a chance to look through
> a few of the multitude before they were deleted. 
> 
> -Lots of latch free:library cache, shared pool, and row cache object waits.
> The library cache wait always pointed to the same address (the same 1 of 5
> children).
> 
> -Increase in library cache lock and library cache pin. These are pointing to
> the same handle address. Also an increase in the library cache pin instance
> lock enqueue.
> 
> My working model is that the latch free (library cache) waits are a symptom.
> Something is holding a library cache pin/lock and another process wants it.
> Get a latch. Someone else wants it. Latch contention. Latch contention
> builds. Spin, sleep, another process, more latch contention, ... Take a
> systemstate dump to see what is happening. 
> 
> From the systemstate dump. (I thought I once saw a document on Metalink to help
> read these things, but I can't find it. Is that FTMN??) I searched on the
> addresses of all 5 library cache latches. All Processes which were holding a
> library cache latch were waiting on a shared pool latch (address 8000ebec).
> Does a process State Object hold a latch independent from a transaction?
> Don't know where to go from here. The beginning of the systemstate dump for
> the process which holds the shared pool latch follows:
> 
> PROCESS 23: 
>   ---------------------------------------- 
>   SO: d9523624, type: 1, owner: 0, pt: 0, flag: INIT/-/-/0x00 
>   (process) Oracle pid=23, calls cur/top: d960d7d4/d960d7d4, flag: (0) - 
>             int error: 0, call error: 0, sess error: 0, txn error 0 
>   (post info) last post received: 3648316968 0 1 
>               last post received-location: kslfre 
>               last process to post me: d9522cd0 4 0 
>               last post sent: 2147544044 0 1 
>               last post sent-location: kslfre 
>               last process posted by me: d9525220 41 0 
>     (latch info) wait_event=0 bits=90 
>       holding     8000ebec shared pool level=7 
>         Location from where latch is held: kghfrunp: alloc: clatch nowait: 
>         Context saved from call: 0 
>         state=busy 
>       holding     8000c114 row cache objects level=4 
>         Location from where latch is held: kghfrunp: clatch: nowait: 
>         Context saved from call: 0 
>         state=busy 
>     Process Group: DEFAULT, pseudo proc: d955952c 
>     O/S info: user: oracle, term: UNKNOWN, ospid: 5345 
>     OSD pid info: 5345 
>     ---------------------------------------- 
>     SO: d95991a0, type: 3, owner: d9523624, pt: 0, flag: INIT/-/-/0x00 
>     (session) trans: 0, creator: d9523624, flag: (41) USR/- BSY/-/-/-/-/- 
>               DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000 
>               txn branch: 0 
>               oct: 47, prv: 0, user: 20/SYSADM 
>     O/S info: user: dp943, term: PHSPSPAPP8, ospid: 2028:3424, machine:
> PARTNERS\PHSPSPAPP8 
>               program: PSAPPSRV.exe 
>     client info: ,dp943,PHSPSPAPP8,,PSAPPSRV.exe, 
>     last wait for 'latch free' blocking sess=0x0 seq=14093 wait_time=-1 
>                 address=d974e628, number=6a, tries=1 
>       ---------------------------------------- 
>       SO: d97c6674, type: 36, owner: d95991a0, flag: INIT/-/-/0x00 
>       LIBRARY OBJECT PIN: pin=d97c6674 handle=da02bb1c mode=X lock=d97d5c84 
>       user=d95991a0 session=d95991a0 count=0 mask=007d savepoint=68
> flags=[00] 
>       ---------------------------------------- 
>       SO: d97d5c84, type: 35, owner: d95991a0, flag: INIT/-/-/0x00 
>       LIBRARY OBJECT LOCK: lock=d97d5c84 handle=da02bb1c mode=N 
>       call pin=0 session pin=d97c6674 
>       user=d95991a0 session=d95991a0 count=1 flags=PNS/[08] savepoint=68 
>       LIBRARY OBJECT HANDLE: handle=da02bb1c 
>       namespace=CRSR flags=RON/KGHP/PN0/[10010000] 
>       kkkk-dddd-llll=0000-0000-0000 lock=N pin=X latch=2 
>       lwt=da02bb34[da02bb34,da02bb34] ltm=da02bb3c[da02bb3c,da02bb3c] 
>       pwt=da02bb4c[da02bb4c,da02bb4c] ptm=da02bba4[da02bba4,da02bba4] 
>       ref=da02bb24[cac6f6d8,cac6f6d8] lnd=da02bbb0[da02bbb0,da02bbb0] 
>         LIBRARY OBJECT: object=dadfc344 
>         type=CRSR flags=EXS/BCM[0001] pflags=NST [101] status=VALD load=0 
>         DEPENDENCIES: count=4 size=16 
>         ACCESSES: count=1 size=16 
>         TRANSLATIONS: count=1 size=16 
>         DATA BLOCKS: 
>         data#     heap  pointer status pins change 
>         ----- -------- -------- ------ ---- ------ 
>             0 c9cee16c dadfc1b4 I/P/A     0 NONE  
>             2 dadfc3d8 c9c4bb3c I/P/A     1 NONE  
>             3 dadfc438        0 -/P/-     1 NONE  
>             4 dadfc498 cf49aef0 I/P/A     1 NONE  
>             5 dadfc104        0 -/P/-     1 NONE  
>             6 dadfc164 cb43e3e4 I/P/A     1 NONE  
>       ---------------------------------------- 
>       SO: d97cc9cc, type: 35, owner: d95991a0, flag: INIT/-/-/0x00 
>       LIBRARY OBJECT LOCK: lock=d97cc9cc handle=cac6f768 mode=N 
> 
> 
> Any ideas are welcome. If you made it this far, thanks for the perseverance
> and time you already invested. 
> 
> Henry 
> 
> 
> ---------------------------------------------------------------- 
> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> <http://www.orafaq.com>  
> ---------------------------------------------------------------- 
> To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx 
> put 'unsubscribe' in the subject line. 
> -- 
> Archives are at http://www.freelists.org/archives/oracle-l/
> <http://www.freelists.org/archives/oracle-l/>  
> FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
> <http://www.freelists.org/help/fom-serve/cache/1.html>  
> ----------------------------------------------------------------- 
> 
> 
> 
> 
> 
> 

-- 
Mladen Gogala
Oracle DBA
