Re: risk: hanganalyze and system state

  • From: Austin Hackett <hacketta_57@xxxxxx>
  • To: oracle-l digest users <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 11 Feb 2014 21:01:10 +0000

Hi Jeremy

I have had a bad experience with system state dumps on Solaris 10 (SPARC) and 
single instance 11.2.0.2.1. 

This is going back a couple of years in a previous role, so sorry I can't 
provide much detail.

It involved an investigation into a parent cursor memory leak bug. Support 
asked me to take some system state dumps, which I did during a quiet time. This 
was a pretty high volume system, so there was a fair amount going on even 
during quiet periods. I'd scoured MOS for bugs involving system state dumps in 
my version, asked the analyst multiple times for confirmation that it was safe 
to do this, got clearance from management etc. Needless to say, within a minute 
or so of me running the first command, monitoring dashboards were glowing red 
and application servers were timing out. If I remember correctly, an exclusive 
mutex was held whilst the dump was being written to disk, leading to a fairly 
severe system hang. The feedback from support was that this kind of thing could 
happen in rare circumstances (which wasn't what they told me when I first 
asked!)

I remember reading another 11.2 war story shortly afterwards: 
http://oracledoug.com/serendipity/index.php?/archives/1645-Systemstate-Dump-warning.html

Unless things are totally hosed anyway, it's only something I'd do following a 
thorough search of MOS for bugs and a discussion with relevant people about the 
risk of taking the system state dump versus the impact/frequency of the issue 
at hand and any possible workarounds.

Thanks

Austin



--
//www.freelists.org/webpage/oracle-l

