Re: risk: hangalalyze and system state

  • From: John Hurley <hurleyjohnb@xxxxxxxxx>
  • To: oracle-l digest users <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 11 Feb 2014 15:53:07 -0800 (PST)

This is also, of course, one of the standard tricks to have canned and tested 
regularly on test systems that match your production environment as closely as 
you can.

The usual setup is a Unix script that you can invoke with parameters (level / 
how many to take / how long apart in time / just system state / just hang 
analyze, etc.).
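
A minimal sketch of that kind of script follows. The function names, the mode 
names (hang / state / both) and the RUN switch are made up for illustration, 
and the levels in the usage note are only commonly quoted values, so check 
them against MOS and Support advice for your version. It only prints the 
oradebug commands unless RUN=1 is set, so it can be rehearsed without touching 
an instance.

```shell
#!/bin/sh
# Sketch only: function names, mode names and the RUN switch are assumptions.

# Print the oradebug commands for one diagnostic sample.
#   $1 mode: hang | state | both
#   $2 level for hanganalyze
#   $3 level for the system state dump
oradebug_commands() {
    mode=$1 hang_level=$2 state_level=$3
    echo "oradebug setmypid"
    echo "oradebug unlimit"
    case $mode in
        hang|both)  echo "oradebug hanganalyze $hang_level" ;;
    esac
    case $mode in
        state|both) echo "oradebug dump systemstate $state_level" ;;
    esac
    echo "oradebug tracefile_name"
    echo "exit"
}

# Take $4 samples, $5 seconds apart.
# Dry run by default: commands are printed, not executed.
# With RUN=1 each sample is piped into SQL*Plus as SYSDBA.
take_samples() {
    mode=$1 hang_level=$2 state_level=$3 count=$4 interval=$5
    i=1
    while [ "$i" -le "$count" ]; do
        echo "-- sample $i of $count"
        if [ "${RUN:-0}" = 1 ]; then
            oradebug_commands "$mode" "$hang_level" "$state_level" |
                sqlplus -S / as sysdba
            # Wait between samples, but not after the last one.
            if [ "$i" -lt "$count" ]; then
                sleep "$interval"
            fi
        else
            oradebug_commands "$mode" "$hang_level" "$state_level"
        fi
        i=$((i + 1))
    done
}

# Only act when arguments are supplied: mode hang_level state_level count interval
if [ $# -gt 0 ]; then
    take_samples "$@"
fi
```

Saved under a hypothetical name such as diag_dump.sh, "sh diag_dump.sh hang 3 
266 3 60" would print three hang analyze samples; prefixing RUN=1 would run 
them for real, 60 seconds apart. Keeping the dry run as the default is 
deliberate, given how disruptive a system state dump can be.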

You then document it ... test it on test systems ... keep it ready for a rainy 
day.

Then, of course, the most frustrating part is when you have all of that ready, 
you hit bad problems, you get all the evidence to Oracle Support, and they are 
still unable to debug it.

 

________________________________
 From: Austin Hackett <hacketta_57@xxxxxx>
To: oracle-l digest users <oracle-l@xxxxxxxxxxxxx> 
Sent: Tuesday, February 11, 2014 4:01 PM
Subject: Re: risk: hangalalyze and system state
  

Hi Jeremy

I have had a bad experience with system state dumps on Solaris 10 (SPARC) and 
single instance 11.2.0.2.1. 

This is going back a couple of years to a previous role, so sorry that I can't 
provide much detail.

It involved an investigation into a parent cursor memory leak bug. Support 
asked me to take some system state dumps, which I did during a quiet time. This 
was a pretty high volume system, so there was a fair amount going on even 
during quiet periods. I'd scoured MOS for bugs involving system state dumps in 
my version, asked the analyst multiple times for confirmation that it was safe 
to do this, got clearance from management, etc. Needless to say, within a 
minute or so of me running the first command, there were monitoring dashboards 
glowing red and application servers timing out. If I remember correctly, an 
exclusive mutex was held whilst the dump was being written to disk, leading to 
a fairly severe system hang. The feedback from support was that this kind of 
thing could happen in rare circumstances (which wasn't what they told me when I 
first asked!).

I remember reading another 11.2 war story shortly afterwards: 
http://oracledoug.com/serendipity/index.php?/archives/1645-Systemstate-Dump-warning.html

Unless things are totally hosed anyway, it's only something I'd do following a 
thorough search of MOS for bugs and a discussion with relevant people about the 
risk of taking the system state dump versus the impact/frequency of the issue 
at hand and any possible workarounds.

Thanks

Austin



--
//www.freelists.org/webpage/oracle-l
