cursor: mutex S + library cache lock + library cache: mutex X = disaster

  • From: nico torres <frasebook4@xxxxxxxxx>
  • To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 3 Nov 2014 14:58:29 -0300

Hi, list.
 I had this situation last Saturday. I got a call telling me that no
application could get into the database; I checked the alert log and found
ORA-00018: maximum number of sessions exceeded, everywhere.
The DB host felt clunky and slow, so I checked htop: 100% utilization on all
CPUs and a load average around *462*, wow.
  I tried to log in with / as sysdba and naturally couldn't; I also tried
with the -prelim flag, with no luck either.
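(For completeness, the prelim attempt was more or less the standard sequence
below; the oradebug dumps are what I would have collected if the connection
had worked. This is a sketch, not a transcript of the exact session:)

    sqlplus -prelim / as sysdba
    -- once attached, dump hang and system state information:
    oradebug setmypid
    oradebug unlimit
    oradebug hanganalyze 3
    -- wait a minute or two, take a second hanganalyze, then:
    oradebug dump systemstate 258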
 I finally managed to get into the database and immediately ran snapper;
what I saw was all sessions waiting on "*cursor: mutex S*", with no running
queries and no sql_id available. So I came to the conclusion that all
sessions were hung on the same wait, and in the meantime the applications
kept logging in more and more sessions, until the session limit was reached
and nobody else could log in.
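(The snapper invocation was something like the following, Tanel Poder's
script, syntax quoted from memory:)

    -- ASH-style sampling of all sessions, 5 seconds, one snapshot
    @snapper ash 5 1 all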
 All of those sessions had different blockers, blocking each other, and
different object#, so I couldn't identify a single culprit either.
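(The blocker check was basically a query along these lines against v$session,
reconstructed from memory:)

    -- who is waiting on what, and which session each waiter reports as blocker
    select sid, event, blocking_session, p1, p2raw, seconds_in_wait, sql_id
      from v$session
     where event like 'cursor:%'
        or event like 'library cache%';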
 I tried killing those sessions and couldn't; I tried closing the instance
with a shutdown immediate and it hung, so I ended up issuing a shutdown
abort. The instance closed, started up again pretty fast, and things went
back to normal.
 Looking at AWR snaps from that timestamp, I got:


Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                                                           Avg
                                                          wait   % DB
Event                                Waits     Time(s)   (ms)   time
------------------------------ ------------ ----------- ------ ------
cursor: mutex S                  20,166,853   1,057,135     52   86.5
library cache lock                   90,243      91,516   1014    7.5
DB CPU                                           43,678           3.6
library cache: mutex X               74,002      10,816    146     .9
db file sequential read             276,847       1,417      5     .1

As expected, lots of "cursor: mutex S", but also "library cache lock" and
"library cache: mutex X".

Load Profile              Per Second    Per Transaction
~~~~~~~~~~~~         ---------------    ---------------
      DB Time(s):              249.6                1.4
       DB CPU(s):                8.9                0.1
       Redo size:           77,451.6              438.3
   Logical reads:           29,037.2              164.3
   Block changes:              320.9                1.8
  Physical reads:               70.3                0.4
 Physical writes:               23.5                0.1
      User calls:              755.3                4.3
          Parses:              197.7                1.1
     Hard parses:                0.2                0.0
W/A MB processed:                0.1                0.0
          Logons:                1.9                0.0
        Executes:              203.8                1.2
       Rollbacks:              173.7                1.0
    Transactions:              176.7

Hard parses are not a concern here.

Statistic Name                                        Time (s)  % of DB Time
------------------------------------------ ------------------ -------------
parse time elapsed                                 1,217,693.8          99.6
connection management call elapsed time              868,991.0          71.1
DB CPU                                                43,677.7           3.6

All sessions were clearly hung at parse time.
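(Since mutex S contention during parsing is often associated with parent
cursors that have a huge number of child cursors, one check I intend to run
next time is something like the query below; the 100 threshold is arbitrary:)

    -- look for parent cursors with an extreme number of children
    select sql_id, version_count, executions, parse_calls
      from v$sqlarea
     where version_count > 100
     order by version_count desc;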

Foreground Wait Events
                                            %Time  Total Wait    Avg    Waits   % DB
Event                              Waits    -outs    Time (s)   (ms)     /txn   time
------------------------------ ------------ ------ ----------- ------ -------- ------
cursor: mutex S                  20,166,853    100   1,057,135     52     23.3   86.5
library cache lock                   90,243      0      91,516   1014      0.1    7.5
library cache: mutex X               74,002      0      10,816    146      0.1     .9
db file sequential read             276,847      0       1,417      5      0.3     .1
cursor: pin S wait on X                  94      0         164   1749      0.0     .0

I searched around MOS and several bugs came up, but I couldn't find one that
matches this case entirely.
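(Another thing I plan to capture while the problem is happening, to see which
mutex and which blocking session dominates; a rough sketch, assuming the
standard v$mutex_sleep_history columns:)

    -- which mutex / code location / blocker accounts for most recent sleeps
    select mutex_type, location, blocking_session,
           sum(sleeps) as total_sleeps
      from v$mutex_sleep_history
     group by mutex_type, location, blocking_session
     order by total_sleeps desc;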

Some information about the database:
-------------------------

Oracle *11.2.0.1*; it urgently needs some patching, and a bug is my main
suspect.

It uses ASSM.

                      begin          end
 SGA use (MB):      6,656.0      6,656.0
 PGA use (MB):        561.1      1,213.3

Cache sizes between the snaps:
    Buffer Cache:     3,840M
Shared Pool Size:    2,496M


Parameters:

memory_target             13421772800
open_cursors              300
processes                 600
sessions                  928
session_cached_cursors    50

Could anyone please give me some advice on how to investigate this issue
further? Is any other information needed?

Thanks
