John: The machine has 16 cores, 2 sockets, it's a virtual machine on a server cluster, what should I look into? Thanks! Nico 2014-11-03 15:47 GMT-03:00 John Clarke <jcclarke00@xxxxxxxxx>: > Nico, > > Sounds like it could be CPU oversubscription - while there are bugs > related to cursor%mutex% waits, I would this first. > > How many CPU cores are there on your machine? > > On Mon, Nov 3, 2014 at 9:58 AM, nico torres <frasebook4@xxxxxxxxx> wrote: > >> Hi, list. >> I've had this situation last saturday. I got a call, telling me that no >> application could get into the database; checked alert log, ORA-00018: >> maximum number of sessions exceeded everywhere. >> Db host felt kind of clunky and slow, so I checked htop, 100% utilization >> on all CPU's, around *462 load*, wow. >> Tried to log / as sysdba, naturally couldn't, tried with -prelim flag, >> neither. >> I finally could get into the database, and immediately run snapper, and >> what I saw was all sessions waiting for "*cursor: mutex S*", with no >> running queries, no sql_id available. So I came into the conclussion that >> all sesions hung with the same wait, and in the same time, applications >> logged in more and more sessions, so it reached max sessions and nobody >> else could login. >> All of those sessions had different blockers, blocking each other, and >> different object#, so I coulnd't find a main culprit either. >> Tried killing those sessions, couldn't; Tried closing the instance with >> a shutdown immediate and it got hung, so I ended up issuing a shutdown >> abort. Instance closed, and started up pretty fast and things got back to >> normal. >> Looking at AWR snaps from that timestamp, I got: >> >> >> Top 5 Timed Foreground Events >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> Avg >> >> wait % DB >> Event Waits Time(s) (ms) >> time >> ------------------------------ ------------ ----------- ------ ------ >> cursor: mutex S 20,166,853 1,057,135 52 86.5 >> library cache lock 90,243 91,516 1014 7.5 >> DB CPU 43,678 >> 3.6 >> library cache: mutex X 74,002 10,816 146 .9 >> db file sequential read 276,847 1,417 5 >> .1 >> >> As expected, lots of "cursor: mutex S", but also Library cache lock and >> library cache:mutex X. >> >> Load Profile Per Second Per Transaction >> ~~~~~~~~~~~~ --------------- --------------- >> DB Time(s): 249.6 1.4 >> DB CPU(s): 8.9 0.1 >> Redo size: 77,451.6 438.3 >> Logical reads: 29,037.2 164.3 >> Block changes: 320.9 1.8 >> Physical reads: 70.3 0.4 >> Physical writes: 23.5 0.1 >> User calls: 755.3 4.3 >> Parses: 197.7 1.1 >> Hard parses: 0.2 0.0 >> W/A MB processed: 0.1 0.0 >> Logons: 1.9 0.0 >> Executes: 203.8 1.2 >> Rollbacks: 173.7 1.0 >> Transactions: 176.7 >> >> Hard parses not a concern here. >> >> Statistic Name Time >> (s) % of DB Time >> ------------------------------------------ ------------------ ------------ >> parse time elapsed >> 1,217,693.8 99.6 >> connection management call elapsed time 868,991.0 >> 71.1 >> DB CPU >> 43,677.7 3.6 >> >> All sessions clearly hung at parse time. >> >> Foreground Wait Events >> cursor: mutex S 20,166,853 100 1,057,135 52 23.3 >> 86.5 >> library cache lock 90,243 0 91,516 1014 >> 0.1 7.5 >> library cache: mutex X 74,002 0 10,816 146 >> 0.1 .9 >> db file sequential read 276,847 0 1,417 5 >> 0.3 .1 >> cursor: pin S wait on X 94 0 164 1749 >> 0.0 .0 >> >> I searched around MOS and some bugs appeared, but couldn't find one that >> matches entirely. >> >> Some information about the database: >> ------------------------- >> >> Oracle *11.2.0.1**, *it needs some patching urgently and a bug is my >> main suspect. >> >> It uses ASSM >> begin end >> SGA use (MB): 6,656.0 6,656.0 >> PGA use (MB): 561.1 1,213.3 >> >> Cache sizes between the snaps: >> Buffer Cache: 3,840M >> Shared Pool Size: 2,496M >> >> >> Parameters: >> >> memory_target: 13421772800 >> open_cursors 300 >> processes 600 >> sessions 928 >> session_cached_cursors 50 >> >> Could anyone please give me some advice on how to further investigate >> this issue? Any other information needed? >> >> Thanks >> >> >> >> >> >> >