Re: cursor: mutex S + library cache lock + library cache: mutex X = disaster

  • From: natalka roshak <natalka.roshak@xxxxxxxxxx>
  • To: frasebook4@xxxxxxxxx, ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 03 Nov 2014 16:30:24 -0500

Hi Nico, did you take a hanganalyze dump? "oradebug hanganalyze 3"
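
For reference, the whole sequence I'd use is roughly this, from a SYSDBA session (a "sqlplus -prelim" connection should also work when normal logins are exhausted):

  oradebug setmypid
  oradebug unlimit
  oradebug hanganalyze 3
  -- wait a minute or two, then take a second dump to confirm the hang
  oradebug hanganalyze 3
  oradebug tracefile_name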

The hanganalyze dump shows wait chains from the hang analysis cache, which are also accessible via v$wait_chains. I haven't used snapper, but I googled it & it doesn't seem to query v$wait_chains.
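
If the instance is still up and hung, a quick way to summarize the chains is something like this (just a sketch; check the column list against your version):

  select chain_id, sid, sess_serial#, blocker_sid, blocker_sess_serial#,
         wait_event_text, in_wait_secs, num_waiters
  from   v$wait_chains
  order  by chain_id, num_waiters desc;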

If you didn't happen to take a hanganalyze dump: I haven't tried this AT ALL, but I just ran across another script by snapper's author Tanel Poder, which seems to pull wait chains from the active session history: http://blog.tanelpoder.com/files/scripts/ash/ash_wait_chains.sql or http://blog.tanelpoder.com/files/scripts/ash/dash_wait_chains.sql
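
The underlying idea (this is not Tanel's code, just a rough sketch of the same thing) is to join each ASH sample of a waiting session to its blocker's row in the same sample and aggregate, e.g.:

  -- rough sketch only: what were the waiters waiting on, and what was the
  -- blocker itself doing in the same sample? Adjust the time window.
  select w.event                               waiter_event,
         nvl(b.event, 'ON CPU / not sampled')  blocker_event,
         count(*)                              samples
  from   dba_hist_active_sess_history w
         left join dba_hist_active_sess_history b
                on  b.sample_id       = w.sample_id
                and b.dbid            = w.dbid
                and b.instance_number = w.instance_number
                and b.session_id      = w.blocking_session
                and b.session_serial# = w.blocking_session_serial#
  where  w.sample_time > sysdate - 7           -- example window: last 7 days
  and    w.blocking_session is not null
  group  by w.event, nvl(b.event, 'ON CPU / not sampled')
  order  by samples desc;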

-Natalka
--
Natalka Roshak | RAC/Storage BDE
Oracle <http://www.oracle.com/index.html> Global Product Support
45 O'Connor St, Suite 400 | Ottawa, Canada
On 03/11/2014 2:24 PM, nico torres wrote:
John:

The machine has 16 cores across 2 sockets; it's a virtual machine on a server cluster. What should I look into?

Thanks!

Nico

2014-11-03 15:47 GMT-03:00 John Clarke <jcclarke00@xxxxxxxxx <mailto:jcclarke00@xxxxxxxxx>>:

    Nico,

    Sounds like it could be CPU oversubscription - while there are
    bugs related to cursor%mutex% waits, I would check this first.

    How many CPU cores are there on your machine?

    On Mon, Nov 3, 2014 at 9:58 AM, nico torres <frasebook4@xxxxxxxxx
    <mailto:frasebook4@xxxxxxxxx>> wrote:

        Hi, list.
         I had this situation last Saturday. I got a call telling
        me that no application could get into the database; I checked
        the alert log: ORA-00018 (maximum number of sessions exceeded)
        everywhere.
         The DB host felt kind of clunky and slow, so I checked htop:
        100% utilization on all CPUs, around *462 load*, wow.
         Tried to log in with / as sysdba, naturally couldn't; tried
        with the -prelim flag, that didn't work either.
         I finally could get into the database and immediately ran
        snapper, and what I saw was all sessions waiting on "*cursor:
        mutex S*", with no running queries and no sql_id available. So
        I came to the conclusion that all sessions were hung on the
        same wait, and meanwhile the applications kept logging in more
        and more sessions, so the instance hit max sessions and nobody
        else could log in.
         All of those sessions had different blockers, blocking each
        other, and different object#, so I couldn't find a main
        culprit either.
         Tried killing those sessions, couldn't; tried closing the
        instance with a shutdown immediate and it hung too, so I ended
        up issuing a shutdown abort. The instance closed, started up
        pretty fast, and things got back to normal.
         Looking at AWR snaps from that timestamp, I got:


        Top 5 Timed Foreground Events
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                            Avg wait   % DB
        Event                           Waits     Time(s)       (ms)   time
        ----------------------- ------------- ----------- ---------- ------
        cursor: mutex S            20,166,853   1,057,135         52   86.5
        library cache lock             90,243      91,516       1014    7.5
        DB CPU                                     43,678                3.6
        library cache: mutex X         74,002      10,816        146     .9
        db file sequential read       276,847       1,417          5     .1

        As expected, lots of "cursor: mutex S", but also "library cache
        lock" and "library cache: mutex X".

        Load Profile              Per Second  Per Transaction
        ~~~~~~~~~~~~         ---------------  ---------------
              DB Time(s):              249.6              1.4
               DB CPU(s):                8.9              0.1
               Redo size:           77,451.6            438.3
           Logical reads:           29,037.2            164.3
           Block changes:              320.9              1.8
          Physical reads:               70.3              0.4
         Physical writes:               23.5              0.1
              User calls:              755.3              4.3
                  Parses:              197.7              1.1
             Hard parses:                0.2              0.0
        W/A MB processed:                0.1              0.0
                  Logons:                1.9              0.0
                Executes:              203.8              1.2
               Rollbacks:              173.7              1.0
            Transactions:              176.7

        Hard parses are not a concern here.

        Statistic Name                                Time (s)  % of DB Time
        ---------------------------------------- ------------- ------------
        parse time elapsed                         1,217,693.8          99.6
        connection management call elapsed time     868,991.0          71.1
        DB CPU                                        43,677.7           3.6

        All sessions clearly hung at parse time.

        Foreground Wait Events
                                               %Time Total Wait   Avg  Waits   % DB
        Event                           Waits  -outs   Time (s)  (ms)   /txn   time
        ----------------------- ------------- ------ ---------- ----- ------ ------
        cursor: mutex S            20,166,853    100  1,057,135    52   23.3   86.5
        library cache lock             90,243      0     91,516  1014    0.1    7.5
        library cache: mutex X         74,002      0     10,816   146    0.1     .9
        db file sequential read       276,847      0      1,417     5    0.3     .1
        cursor: pin S wait on X            94      0        164  1749    0.0     .0

        I searched around MOS and some bugs appeared, but couldn't
        find one that matches entirely.

        Some information about the database:
        -------------------------

        Oracle *11.2.0.1*; it needs some patching urgently, and a bug
        is my main suspect.

        It uses ASSM.

                                     begin       end
         SGA use (MB):             6,656.0   6,656.0
         PGA use (MB):               561.1   1,213.3

        Cache sizes between the snaps:
            Buffer Cache:  3,840M
        Shared Pool Size:  2,496M


        Parameters:

        memory_target                13421772800
        open_cursors                         300
        processes                            600
        sessions                             928
        session_cached_cursors                50

        Could anyone please give me some advice on how to further
        investigate this issue? Is any other information needed?

        Thanks
