Hi Nico, did you take a hanganalyze dump? ("oradebug hanganalyze 3")

The hanganalyze dump shows wait chains from the hang analysis cache, which are also accessible via v$wait_chains. I haven't used snapper, but I googled it & it doesn't seem to query v$wait_chains.
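For what it's worth, something along these lines (a sketch I haven't run on your system; column names are from the 11.2 reference) shows what v$wait_chains exposes:

-- sketch only: list the blocker -> waiter chains that the hang analysis
-- cache currently knows about, one row per session in a chain
select chain_id, chain_signature, instance, sid, sess_serial#,
       blocker_sid, wait_event_text, in_wait_secs, num_waiters
  from v$wait_chains
 order by chain_id, num_waiters desc;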
If you didn't happen to take a hanganalyze dump: I haven't tried this AT ALL, but I just ran across another script by snapper's author Tanel Poder, which seems to pull wait chains from the active session history: http://blog.tanelpoder.com/files/scripts/ash/ash_wait_chains.sql or http://blog.tanelpoder.com/files/scripts/ash/dash_wait_chains.sql
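I haven't verified the parameters either, but going by the script's header comments it takes a grouping expression, a filter predicate and a time range, roughly like this (check the header for the real parameter order; the grouping expression and the one-hour window below are just illustrative):

-- assumed invocation, not verified -- see the script header
SQL> @ash_wait_chains program2||event2 1=1 sysdate-1/24 sysdate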
-Natalka

--
Natalka Roshak | RAC/Storage BDE
Oracle <http://www.oracle.com/index.html> Global Product Support
45 O'Connor St, Suite 400 | Ottawa, Canada

On 03/11/2014 2:24 PM, nico torres wrote:
John:

The machine has 16 cores, 2 sockets; it's a virtual machine on a server cluster. What should I look into?

Thanks!
Nico

2014-11-03 15:47 GMT-03:00 John Clarke <jcclarke00@xxxxxxxxx <mailto:jcclarke00@xxxxxxxxx>>:

Nico,

Sounds like it could be CPU oversubscription - while there are bugs related to cursor%mutex% waits, I would check this first. How many CPU cores are there on your machine?

On Mon, Nov 3, 2014 at 9:58 AM, nico torres <frasebook4@xxxxxxxxx <mailto:frasebook4@xxxxxxxxx>> wrote:

Hi, list.

I had this situation last Saturday. I got a call telling me that no application could get into the database; checked the alert log: ORA-00018: maximum number of sessions exceeded, everywhere. The DB host felt clunky and slow, so I checked htop: 100% utilization on all CPUs, around a *462 load*, wow. Tried to log in "/ as sysdba", naturally couldn't; tried with the -prelim flag, no luck either.

I finally could get into the database and immediately ran snapper, and what I saw was all sessions waiting on "*cursor: mutex S*", with no running queries and no sql_id available. So I came to the conclusion that all sessions hung on the same wait, and in the meantime the applications logged in more and more sessions, so it reached max sessions and nobody else could log in. All of those sessions had different blockers, blocking each other, and different object#, so I couldn't find a main culprit either. Tried killing those sessions, couldn't; tried closing the instance with a shutdown immediate and it hung, so I ended up issuing a shutdown abort. The instance closed, started up pretty fast, and things got back to normal.

Looking at the AWR snaps from that timestamp, I got:

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg wait   % DB
Event                                Waits     Time(s)      (ms)   time
------------------------------ ------------ ----------- --------- ------
cursor: mutex S                  20,166,853   1,057,135        52   86.5
library cache lock                   90,243      91,516      1014    7.5
DB CPU                                            43,678              3.6
library cache: mutex X               74,002      10,816       146     .9
db file sequential read             276,847       1,417         5     .1

Load Profile              Per Second   Per Transaction
~~~~~~~~~~~~         ---------------   ---------------
      DB Time(s):             249.6               1.4
       DB CPU(s):               8.9               0.1
       Redo size:          77,451.6             438.3
   Logical reads:          29,037.2             164.3
   Block changes:             320.9               1.8
  Physical reads:              70.3               0.4
 Physical writes:              23.5               0.1
      User calls:             755.3               4.3
          Parses:             197.7               1.1
     Hard parses:               0.2               0.0
W/A MB processed:               0.1               0.0
          Logons:               1.9               0.0
        Executes:             203.8               1.2
       Rollbacks:             173.7               1.0
    Transactions:             176.7

Hard parses are not a concern here.

Statistic Name                                       Time (s) % of DB Time
------------------------------------------ ------------------ ------------
parse time elapsed                                1,217,693.8         99.6
connection management call elapsed time             868,991.0         71.1
DB CPU                                                43,677.7          3.6

All sessions were clearly hung at parse time.

Foreground Wait Events
                                       %Time  Total Wait    Avg  Waits   % DB
Event                           Waits  -outs    Time (s)   (ms)   /txn   time
----------------------- ------------ ------ ----------- ------ ------ ------
cursor: mutex S            20,166,853    100   1,057,135     52   23.3   86.5
library cache lock             90,243      0      91,516   1014    0.1    7.5
library cache: mutex X         74,002      0      10,816    146    0.1     .9
db file sequential read       276,847      0       1,417      5    0.3     .1
cursor: pin S wait on X            94      0         164   1749    0.0     .0

I searched around MOS and some bugs appeared, but couldn't find one that matches entirely.

Some information about the database:
-------------------------
Oracle *11.2.0.1*; it needs some patching urgently and a bug is my main suspect.
It uses ASSM.

                      Begin        End
SGA use (MB):       6,656.0    6,656.0
PGA use (MB):         561.1    1,213.3

Cache sizes between the snaps:
Buffer Cache:       3,840M
Shared Pool Size:   2,496M

Parameters:
memory_target            13421772800
open_cursors             300
processes                600
sessions                 928
session_cached_cursors   50

Could anyone please give me some advice on how to further investigate this issue? Any other information needed?

Thanks
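One more avenue, since snapper showed no sql_id for the waiters: for "cursor: mutex S" the P1 value is the hash value of the cursor whose mutex is being contended, so grouping the ASH samples from the incident window by P1 and joining to v$sqlarea (or DBA_HIST_SQLTEXT, if the cursor aged out after the bounce) may point at the hot cursor. This is only a sketch I haven't run against this system, and the time window below is a placeholder:

-- sketch: find the cursor behind the "cursor: mutex S" storm via ASH;
-- for this event, P1 = hash value of the contended cursor
select ash.p1 as hash_value,
       count(*) as ash_samples,
       min(sa.sql_id) as sql_id,
       min(substr(sa.sql_text, 1, 60)) as sql_text
  from dba_hist_active_sess_history ash
  left join v$sqlarea sa
    on sa.hash_value = ash.p1
 where ash.event = 'cursor: mutex S'
   and ash.sample_time between timestamp '2014-11-01 10:00:00'   -- placeholder
                           and timestamp '2014-11-01 14:00:00'   -- window
 group by ash.p1
 order by ash_samples desc;

While the instance is still up, v$mutex_sleep_history should show similar information (its MUTEX_IDENTIFIER is the same hash value), which can help confirm whether one cursor or many are involved.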