Re: cursor: mutex S + library cache lock + library cache: mutex X = disaster

  • From: Tanel Poder <tanel@xxxxxxxxxxxxxx>
  • To: frasebook4@xxxxxxxxx
  • Date: Mon, 3 Nov 2014 22:01:00 -0600

You can look into the ASH data from around your problem time - for cursor/mutex
waits, the P1 parameter shows the library cache object hash value (or, if the
value is less than 131072, likely just the library cache hash bucket number).
SQL> *@ash/dashtop* event,top_level_call_name,sql_opname,p1text,p1 "event like 'cursor%'" "TIMESTAMP'2014-11-03 00:40:05'" "TIMESTAMP'2014-11-03 21:41:05'"

%This  EVENT                          TOP_LEVEL_CALL_NAME       SQL_OPNAME      P1TEXT                       P1 TotalSeconds FIRST_SEEN          LAST_SEEN
------ ------------------------------ ------------------------- --------------- -------------------- ---------- ------------ ------------------- -------------------
  33%  cursor: pin S wait on X        LOB/FILE operations       SELECT          idn                  3309402135           10 2014-10-27 12:35:19 2014-10-27 12:35:19
  33%  cursor: pin S wait on X        VERSION2                  SELECT          idn                  3309402135           10 2014-10-27 12:35:19 2014-10-27 12:35:19
  33%  cursor: pin S wait on X        VERSION2                  SELECT          idn                  3363682207           10 2014-10-13 04:15:32 2014-10-13 04:15:32
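
If you don't have the dashtop script at hand, something like this against
V$ACTIVE_SESSION_HISTORY should give roughly the same picture (just a sketch,
assuming Diagnostics Pack licensing and 11.2+ where TOP_LEVEL_CALL_NAME and
SQL_OPNAME exist in ASH; adjust the time window, and use
DBA_HIST_ACTIVE_SESS_HISTORY if the samples have already aged out of memory):

-- approximate the dashtop output: each ASH sample is roughly one second of wait time
SELECT event, top_level_call_name, sql_opname, p1text, p1,
       COUNT(*)         AS total_seconds,
       MIN(sample_time) AS first_seen,
       MAX(sample_time) AS last_seen
FROM   v$active_session_history
WHERE  event LIKE 'cursor%'
AND    sample_time BETWEEN TIMESTAMP'2014-11-03 00:40:05'
                       AND TIMESTAMP'2014-11-03 21:41:05'
GROUP BY event, top_level_call_name, sql_opname, p1text, p1
ORDER BY total_seconds DESC;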

In addition to P1, you can also use the *top_level_call_name* field in
ASH to see what kind of DB calls your sessions were trying to execute
during the problem (like LOGON / OAUTH).
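
For example, a quick sketch to break the problem window down by DB call (same
Diagnostics Pack and time window assumptions as above):

-- which DB calls the sessions were stuck in, and on what, during the problem
SELECT top_level_call_name, event, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  sample_time BETWEEN TIMESTAMP'2014-11-03 00:40:05'
                       AND TIMESTAMP'2014-11-03 21:41:05'
GROUP BY top_level_call_name, event
ORDER BY samples DESC;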

Now query v$db_object_cache by that hash value (or x$kglob.kglnahsv if your DB
version is older than 11.2):

SQL> SELECT * FROM *v$db_object_cache* WHERE hash_value = *3309402135*
  2  @pr
Pivoting output using Tom Kyte's printtab....
==============================
OWNER                         :
NAME                          : select
pos#,intcol#,col#,spare1,bo#,spare2,spare3 from icol$ where obj#=:1
DB_LINK                       :
NAMESPACE                     : SQL AREA
TYPE                          : CURSOR
SHARABLE_MEM                  : 20232
LOADS                         : 2
EXECUTIONS                    : 14031
LOCKS                         : 1
PINS                          : 0
KEPT                          : NO
CHILD_LATCH                   : 0
INVALIDATIONS                 : 1
HASH_VALUE                    : 3309402135
LOCK_MODE                     : NULL
PIN_MODE                      : NONE
STATUS                        : VALID
TIMESTAMP                     : 2014-10-30/15:08:02
PREVIOUS_TIMESTAMP            :
LOCKED_TOTAL                  : 779
PINNED_TOTAL                  : 14832
PROPERTY                      :
FULL_HASH_VALUE               : 346f70cda3227d855a05d824c5417817
CON_ID                        : 0
CON_NAME                      :
ADDR                          : 00000000C56F0548
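
On pre-11.2, the same lookup against X$KGLOB would look roughly like this (run
as SYS; the column names below are my reading of the X$KGLOB fixed table, so
verify them on your version):

-- map the ASH P1 hash value back to the library cache object (pre-11.2 sketch)
SELECT kglnaobj AS name,
       kglnahsh AS hash_value,
       kglnahsv AS full_hash_value,
       kglhdnsp AS namespace#,
       kglobtyd AS type
FROM   x$kglob
WHERE  kglnahsh = 3309402135;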


But as you saw all kinds of different blockers contributing to the issue, and
especially if you also see lots of different P1 values for your waits, it looks
like you had some systemic problem that caused processes to hold their mutexes
for a very long time (CPU overload / scheduling unfairness, memory
shortage/paging, VM resource scheduling issues). The fact that you ran out of
sessions indicates that you have a connection pool configured that started up
many new connections, greatly amplifying the performance problem.
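
To check how widespread this was, you can count how many different P1 values
(and blockers) show up under these waits; a rough sketch, with the same
Diagnostics Pack and time window assumptions as above:

-- many distinct P1 values / blockers would point to a systemic cause
-- rather than one hot cursor
SELECT event,
       COUNT(DISTINCT p1)               AS distinct_p1_values,
       COUNT(DISTINCT blocking_session) AS distinct_blockers,
       COUNT(*)                         AS samples
FROM   v$active_session_history
WHERE  (event LIKE 'cursor%' OR event LIKE 'library cache%')
AND    sample_time BETWEEN TIMESTAMP'2014-11-03 00:40:05'
                       AND TIMESTAMP'2014-11-03 21:41:05'
GROUP BY event
ORDER BY samples DESC;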

Tanel




On Mon, Nov 3, 2014 at 11:58 AM, nico torres <frasebook4@xxxxxxxxx> wrote:

> Hi, list.
>  I had this situation last Saturday. I got a call telling me that no
> application could get into the database; checked the alert log, ORA-00018:
> maximum number of sessions exceeded everywhere.
> The DB host felt kind of clunky and slow, so I checked htop: 100% utilization
> on all CPUs, around *462 load*, wow.
>   Tried to log in with / as sysdba, naturally couldn't; tried with the
> -prelim flag, couldn't either.
>  I finally could get into the database and immediately ran snapper, and
> what I saw was all sessions waiting on "*cursor: mutex S*", with no
> running queries and no sql_id available. So I came to the conclusion that
> all sessions hung on the same wait, and at the same time the applications
> logged in more and more sessions, so the instance reached max sessions and
> nobody else could log in.
>  All of those sessions had different blockers, blocking each other, and
> different object#, so I couldn't find a main culprit either.
>  Tried killing those sessions, couldn't; tried closing the instance with a
> shutdown immediate and it hung, so I ended up issuing a shutdown abort.
> The instance closed, started up pretty fast, and things got back to normal.
>
>
