Re: cursor: mutex S + library cache lock + library cache: mutex X = disaster

  • From: nico torres <frasebook4@xxxxxxxxx>
  • To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 3 Nov 2014 16:24:31 -0300

John:

 The machine has 16 cores, 2 sockets, it's a virtual machine on a server
cluster, what should I look into?

Thanks!

Nico

2014-11-03 15:47 GMT-03:00 John Clarke <jcclarke00@xxxxxxxxx>:

> Nico,
>
> Sounds like it could be CPU oversubscription - while there are bugs
> related to cursor%mutex% waits, I would this first.
>
> How many CPU cores are there on your machine?
>
> On Mon, Nov 3, 2014 at 9:58 AM, nico torres <frasebook4@xxxxxxxxx> wrote:
>
>> Hi, list.
>>  I've had this situation last saturday. I got a call, telling me that no
>> application could get into the database; checked alert log, ORA-00018:
>> maximum number of sessions exceeded everywhere.
>> Db host felt kind of clunky and slow, so I checked htop, 100% utilization
>> on all CPU's, around *462 load*, wow.
>>   Tried to log / as sysdba, naturally couldn't, tried with -prelim flag,
>> neither.
>>  I finally could get into the database, and immediately run snapper, and
>> what I saw was all sessions waiting for "*cursor: mutex S*", with no
>> running queries, no sql_id available. So I came into the conclussion that
>> all sesions hung with the same wait, and in the same time, applications
>> logged in more and more sessions, so it reached max sessions and nobody
>> else could login.
>>  All of those sessions had different blockers, blocking each other, and
>> different object#, so I coulnd't find a main culprit either.
>>  Tried killing those sessions, couldn't; Tried closing the instance with
>> a shutdown immediate and it got hung, so I ended up issuing a shutdown
>> abort. Instance closed, and started up pretty fast and things got back to
>> normal.
>>  Looking at AWR snaps from that timestamp, I got:
>>
>>
>> Top 5 Timed Foreground Events
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>>    Avg
>>
>>    wait   % DB
>> Event                                        Waits      Time(s)  (ms)
>>  time
>> ------------------------------ ------------ ----------- ------ ------
>> cursor: mutex S                  20,166,853   1,057,135     52   86.5
>> library cache lock                      90,243       91,516   1014    7.5
>> DB CPU                                   43,678
>>     3.6
>> library cache: mutex X               74,002       10,816    146      .9
>> db file sequential read              276,847         1,417      5
>>  .1
>>
>> As expected, lots of "cursor: mutex S", but also Library cache lock and
>> library cache:mutex X.
>>
>> Load Profile              Per Second    Per Transaction
>> ~~~~~~~~~~~~         ---------------    ---------------
>>       DB Time(s):              249.6                1.4
>>        DB CPU(s):                8.9                0.1
>>        Redo size:           77,451.6              438.3
>>    Logical reads:           29,037.2              164.3
>>    Block changes:              320.9                1.8
>>   Physical reads:               70.3                0.4
>>  Physical writes:               23.5                0.1
>>       User calls:              755.3                4.3
>>           Parses:              197.7                1.1
>>      Hard parses:                0.2                0.0
>> W/A MB processed:            0.1                0.0
>>           Logons:                1.9                0.0
>>         Executes:              203.8                1.2
>>        Rollbacks:              173.7                1.0
>>     Transactions:              176.7
>>
>> Hard parses not a concern here.
>>
>> Statistic Name                                                    Time
>> (s)             % of DB Time
>> ------------------------------------------ ------------------ ------------
>> parse time elapsed
>>  1,217,693.8         99.6
>> connection management call elapsed time               868,991.0
>> 71.1
>> DB CPU
>>  43,677.7          3.6
>>
>> All sessions clearly hung at parse time.
>>
>> Foreground Wait Events
>> cursor: mutex S              20,166,853   100  1,057,135      52     23.3
>>   86.5
>> library cache lock                  90,243      0     91,516    1014
>>  0.1    7.5
>> library cache: mutex X           74,002      0     10,816     146
>>  0.1     .9
>> db file sequential read          276,847      0      1,417       5
>>  0.3     .1
>> cursor: pin S wait on X                 94     0        164    1749
>>  0.0     .0
>>
>> I searched around MOS and some bugs appeared, but couldn't find one that
>> matches entirely.
>>
>> Some information about the database:
>> -------------------------
>>
>> Oracle *11.2.0.1**, *it needs some patching urgently and a bug is my
>> main suspect.
>>
>> It uses ASSM
>>                             begin          end
>>  SGA use (MB):      6,656.0      6,656.0
>>  PGA use (MB):        561.1      1,213.3
>>
>> Cache sizes between the snaps:
>>     Buffer Cache:     3,840M
>> Shared Pool Size:    2,496M
>>
>>
>> Parameters:
>>
>> memory_target: 13421772800
>> open_cursors                  300
>> processes                     600
>> sessions                      928
>> session_cached_cursors 50
>>
>> Could anyone  please give me some advice on how to further investigate
>> this issue? Any other information needed?
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>

Other related posts: