RE: SunFire Server Hangs

From: "Mark W. Farnham" <mwf@xxxxxxxx>
To: <ian@xxxxxxxxxxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
Date: Mon, 9 Mar 2015 12:59:55 -0400

One class of things I'd check is any trend toward zero in the available headroom
of various resources, up to the point where the system (and log writing)
suspends, along with correlations between resources likely to conflict with
each other.

FOR EXAMPLE (not necessarily real): perhaps freemem is trending downward while
total swap is almost full, and then an attempt to actually swap or page out
can't finish.
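One way to look for that kind of trend in collected logs is to fit a straight line to periodic headroom samples and project when it would reach zero. This is a minimal sketch under stated assumptions: the samples are (seconds, value) pairs already extracted from something like vmstat's swap/free columns or swap -s output (parsing not shown), the function name time_to_exhaustion is hypothetical, and the readings are invented. A linear fit is a simplification, because real memory consumption is rarely that tidy.

```python
"""Sketch: estimate when a resource's headroom will reach zero.

Assumption: samples are (seconds, value) pairs, e.g. free memory or free
swap in KB taken from periodic vmstat/swap output. Parsing is not shown,
and the sample values below are made up for illustration.
"""
from typing import Optional, Sequence, Tuple


def time_to_exhaustion(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Fit value = intercept + slope * t by ordinary least squares and return
    the t at which the fitted line crosses zero, or None if headroom is not
    shrinking (or there is too little data to fit)."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in samples)
    if s_tt == 0:
        return None  # all samples at the same instant: no slope to fit
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = s_tv / s_tt
    if slope >= 0:
        return None  # flat or growing headroom: no projected exhaustion
    intercept = mean_v - slope * mean_t
    return -intercept / slope


# Hypothetical free-swap readings (KB), one every 60 seconds.
free_swap = [(0, 4000.0), (60, 3400.0), (120, 2800.0), (180, 2200.0)]
print(time_to_exhaustion(free_swap))  # → 400.0
```

Running the same projection for several resources over the same window would also show whether two of them head toward zero together, which is the kind of correlation worth a closer look.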

Any such deadly embrace should of course either time out or fail, possibly
causing a reboot. However, there are infinitely many possible ones, and you may
have found one that didn't present itself as likely enough to code against.
(Which raises the question of how it evades some global time-out mechanism.)

Again, the example is merely to clarify the sort of thing I'm suggesting you
check the logs for; it might be entirely impossible in and of itself. (But
something like it was surely present in early editions of a few operating
systems.)

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of MacGregor, Ian A.
Sent: Monday, March 09, 2015 11:21 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: SunFire Server Hangs

It's the entire server that hangs. Once we can access the machine, after the
reset through the system process, a check of the system down time shows the
time the machine froze. It's as if the server panicked but did not make it all
the way down, since it remains pingable. When it happens, all programs that
might provide some information as to the cause stop. However, up to that point
things look very normal indeed.
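Because every monitoring program stops when the hang hits, a hedged option is a sampler that forces each reading to disk as it is taken, so the last pre-freeze sample is more likely to survive the reset. This is a minimal sketch, not a tested remedy: log_samples and its sample callback are hypothetical names, and if the freeze is in the I/O path itself the fsync may never complete, in which case sending samples to another host would be the better bet.

```python
"""Sketch: append one timestamped sample per interval and force it to disk.

Assumption: `sample` is a stand-in for whatever reading you care about,
e.g. one line of vmstat or swap -s output captured by the caller.
"""
import os
import time
from typing import Callable, Optional


def log_samples(path: str, sample: Callable[[], str],
                interval: float = 10.0, count: Optional[int] = None) -> None:
    """Write `count` samples (forever if None), flushing and fsync-ing after
    each one so a sudden freeze loses at most the sample being written."""
    written = 0
    with open(path, "a", encoding="utf-8") as f:
        while count is None or written < count:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            f.write(f"{stamp} {sample()}\n")
            f.flush()             # Python's buffer -> kernel
            os.fsync(f.fileno())  # kernel page cache -> disk
            written += 1
            if count is None or written < count:
                time.sleep(interval)  # no sleep after the final sample
```

For example, `log_samples("/var/tmp/headroom.log", lambda: "load=%.2f %.2f %.2f" % os.getloadavg())` would record Unix load averages every ten seconds; the load average is only a placeholder metric here.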

All the storage is onboard. These machines accommodate 16 drives internally.

None of the machines that have had this problem is clustered. They are
dedicated database machines. The OS is Solaris 10.

Ian MacGregor
SLAC National Accelerator Center

<snip>

--
//www.freelists.org/webpage/oracle-l
