RE: systemstate dump

  • From: "Mohan, Ross" <RMohan@xxxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 28 Apr 2004 00:32:01 -0400

here is a bleary-eyed and largely uninformed comment. 
 
I once saw an OPS cluster screw up badly due to inter machine networking 
considerations. 
 
I would look at the DLM latching, intermachine networking word size,
latencies, 
and collisions. 
 
I think you are digging a deep hole, when what you need is an effective
tunnel. 
 
Bon Chance
-----Original Message----- 
From: Henry Poras [mailto:hporas@xxxxxxxxxxx] 
Sent: Wed 4/28/2004 2:08 AM 
To: oracle-l@xxxxxxxxxxxxx 
Cc: 
Subject: systemstate dump



We have an Oracle/PeopleSoft implementation (8.1.7 on Solaris) which on
occasion stops any new logons from happening. Performance for existing
sessions is so slow as to be non-existent. This is a two instance OPS system
with the second instance existing solely for failover. The data I have
gathered comes from three sources:statspack snapshots at 15 minute
intervals, some 10046 traces (level 12) which have since been deleted,
systemstate dump. 

From statspack, I compared the 15 (closer to 20) minute snapshot where the
issue arose to the 15 minute snapshot just prior to that. What I found was:

-CPU time almost unchanged 
-a huge change in the Total Wait Time (cs) for the following events (the
number of waits were all within an order of 2 of each other except for latch
free which jumped from 200 to 150,000).

     NAME                              TIME-initial (cs)          TIME-final
(cs) 
     latch free                           200
2,800,000 
     enqueue                            900
280,000 
     global cache lock busy       8,200                        30,000 
     library cache pin                500                           18,000 
     row cache lock                  600                           15,000 
     library cache lock              600                            8,000 
     DFS lock handle                30                             5,000 
     PX Deq: Execution Msg      --                              5,000 

-latch sleeps skyrocketed for library cache, shared pool, and row cache
objects with many having at least 4 sleeps. The 'dlm resource hash list'
latch had about the same number of sleeps, but the number of gets increased
by ~x10.

For the 10046 traces (taken a few weeks ago) I had a chance to look through
a few of the multitude before they were deleted. 

-Lots of latch free:library cache, shared pool, and row cache object waits.
The library cache wait always pointed to the same address (the same 1 of 5
children).

-Increase in library cache lock and library cache pin. These are pointing to
the same handle address. Also an increase in the library cache pin instance
lock enqueue.

My working model is that the latch free (library cache) waits are a symptom.
Something is holding a library cache pin/lock and another process wants it.
Get a latch. Someone else wants it. Latch contention. Latch contention
builds. Spin, sleep, another process, more latch contention, ... Take a
systemstate dump to see what is happening. 

From systemstate dump. (I thought I once saw a document on metalink to help
read these things, but I can't find it. Is that FTMN??) I searched on the
addresses of all 5 library cache latches. All Processes which were holding a
library cache latch were waiting on a shared pool latch (address 8000ebec).
Does a process State Object hold a latch independent from a transaction?
Don't know where to go from here. The beginning of the systemstate dump for
the process which holds the shared pool latch follows:

PROCESS 23: 
  ---------------------------------------- 
  SO: d9523624, type: 1, owner: 0, pt: 0, flag: INIT/-/-/0x00 
  (process) Oracle pid=23, calls cur/top: d960d7d4/d960d7d4, flag: (0) - 
            int error: 0, call error: 0, sess error: 0, txn error 0 
  (post info) last post received: 3648316968 0 1 
              last post received-location: kslfre 
              last process to post me: d9522cd0 4 0 
              last post sent: 2147544044 0 1 
              last post sent-location: kslfre 
              last process posted by me: d9525220 41 0 
    (latch info) wait_event=0 bits=90 
      holding     8000ebec shared pool level=7 
        Location from where latch is held: kghfrunp: alloc: clatch nowait: 
        Context saved from call: 0 
        state=busy 
      holding     8000c114 row cache objects level=4 
        Location from where latch is held: kghfrunp: clatch: nowait: 
        Context saved from call: 0 
        state=busy 
    Process Group: DEFAULT, pseudo proc: d955952c 
    O/S info: user: oracle, term: UNKNOWN, ospid: 5345 
    OSD pid info: 5345 
    ---------------------------------------- 
    SO: d95991a0, type: 3, owner: d9523624, pt: 0, flag: INIT/-/-/0x00 
    (session) trans: 0, creator: d9523624, flag: (41) USR/- BSY/-/-/-/-/- 
              DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000 
              txn branch: 0 
              oct: 47, prv: 0, user: 20/SYSADM 
    O/S info: user: dp943, term: PHSPSPAPP8, ospid: 2028:3424, machine:
PARTNERS\PHSPSPAPP8 
              program: PSAPPSRV.exe 
    client info: ,dp943,PHSPSPAPP8,,PSAPPSRV.exe, 
    last wait for 'latch free' blocking sess=0x0 seq=14093 wait_time=-1 
                address=d974e628, number=6a, tries=1 
      ---------------------------------------- 
      SO: d97c6674, type: 36, owner: d95991a0, flag: INIT/-/-/0x00 
      LIBRARY OBJECT PIN: pin=d97c6674 handle=da02bb1c mode=X lock=d97d5c84 
      user=d95991a0 session=d95991a0 count=0 mask=007d savepoint=68
flags=[00] 
      ---------------------------------------- 
      SO: d97d5c84, type: 35, owner: d95991a0, flag: INIT/-/-/0x00 
      LIBRARY OBJECT LOCK: lock=d97d5c84 handle=da02bb1c mode=N 
      call pin=0 session pin=d97c6674 
      user=d95991a0 session=d95991a0 count=1 flags=PNS/[08] savepoint=68 
      LIBRARY OBJECT HANDLE: handle=da02bb1c 
      namespace=CRSR flags=RON/KGHP/PN0/[10010000] 
      kkkk-dddd-llll=0000-0000-0000 lock=N pin=X latch=2 
      lwt=da02bb34[da02bb34,da02bb34] ltm=da02bb3c[da02bb3c,da02bb3c] 
      pwt=da02bb4c[da02bb4c,da02bb4c] ptm=da02bba4[da02bba4,da02bba4] 
      ref=da02bb24[cac6f6d8,cac6f6d8] lnd=da02bbb0[da02bbb0,da02bbb0] 
        LIBRARY OBJECT: object=dadfc344 
        type=CRSR flags=EXS/BCM[0001] pflags=NST [101] status=VALD load=0 
        DEPENDENCIES: count=4 size=16 
        ACCESSES: count=1 size=16 
        TRANSLATIONS: count=1 size=16 
        DATA BLOCKS: 
        data#     heap  pointer status pins change 
        ----- -------- -------- ------ ---- ------ 
            0 c9cee16c dadfc1b4 I/P/A     0 NONE  
            2 dadfc3d8 c9c4bb3c I/P/A     1 NONE  
            3 dadfc438        0 -/P/-     1 NONE  
            4 dadfc498 cf49aef0 I/P/A     1 NONE  
            5 dadfc104        0 -/P/-     1 NONE  
            6 dadfc164 cb43e3e4 I/P/A     1 NONE  
      ---------------------------------------- 
      SO: d97cc9cc, type: 35, owner: d95991a0, flag: INIT/-/-/0x00 
      LIBRARY OBJECT LOCK: lock=d97cc9cc handle=cac6f768 mode=N 


Any ideas are welcome. If you made it this far, thanks for the perseverance
and time you already invested. 

Henry 


---------------------------------------------------------------- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
<http://www.orafaq.com>  
---------------------------------------------------------------- 
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx 
put 'unsubscribe' in the subject line. 
-- 
Archives are at //www.freelists.org/archives/oracle-l/
<//www.freelists.org/archives/oracle-l/>  
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
<//www.freelists.org/help/fom-serve/cache/1.html>  
----------------------------------------------------------------- 



-- Binary/unsupported file stripped by Ecartis --
-- Type: application/ms-tnef


----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------

Other related posts: