One class of things I'd check is a trend toward zero in the available headroom of various resources, up to the point where the system (and log writing) suspends, along with correlations between resources likely to conflict with each other.

FOR EXAMPLE (not necessarily real): perhaps freemem is trending downward and total swap is almost full, so that an attempt to actually swap or page out can't finish. Any such deadly embrace should of course either time out or fail, possibly causing a reboot. However, there are infinitely many possible combinations, and you may have found one that didn't present itself as likely enough to code against. (Which raises the question of how it evades some global time-out mechanism.)

Again: the example is merely to clarify the sort of thing I'm suggesting you check the logs for; it might be entirely impossible in and of itself. (But something like it surely was present in the early editions of a few early operating systems.)

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of MacGregor, Ian A.
Sent: Monday, March 09, 2015 11:21 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: SunFire Server Hangs

It's the entire server which hangs. Once we can access the machine, after the reset through the system process, a check of the system down time shows the time the machine froze. It's like the server panicked, but did not make it all the way down, as it remains pingable. When it happens, all programs which might provide some information as to the cause stop. However, up to that point things look very normal indeed.

All the storage is onboard. These machines accommodate 16 drives internally. None of the machines which has had this problem is clustered. They are dedicated database machines. The OS is Solaris 10.

Ian MacGregor
SLAC National Accelerator Center

<snip>

--
//www.freelists.org/webpage/oracle-l
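
The log check suggested at the top of this thread (headroom trending toward zero, plus correlation between resources that could conflict) might be sketched roughly as follows. This is only an illustration under assumptions: the function names are made up for the example, and the input lists stand for equally spaced samples of something like free memory and free swap, pulled from whatever periodic logging was running before the hang.

```python
import math


def slope_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def samples_until_exhausted(values):
    """Estimate how many more samples until the resource reaches zero.

    Returns None when the series is flat or rising (no downward trend).
    """
    slope = slope_per_sample(values)
    if slope >= 0:
        return None
    return values[-1] / -slope


def pearson(xs, ys):
    """Pearson correlation of two equal-length series.

    Returns None if either series has zero variance.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least two")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)
```

A resource whose estimated time to exhaustion lands close to the moment the logs stop, especially one strongly correlated with another shrinking resource, would be the kind of candidate worth a closer look. A straight-line fit is a crude choice; it is used here only because it is the simplest way to express "trending toward zero".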