We have a vexing problem at the moment which I felt might warrant posting.
We have an HP Tru64 Alpha machine which, until recently, was a hard-working machine supporting between 10 and 20 instances.
The machine is a 16-processor GS-class server with 16 GB of memory.
Recently, the instances have had one or another of the background processes (pmon, smon, dbwr) fail, crashing the instance.
The background processes are failing with ORA-00600. Dedicated user connections are failing with ORA-07445. Neither has left much
in the way of "conclusive" evidence.
Instances are 9i (different revisions) and 10gR1.
The machine isn't configured to support 10gR1 properly (it is missing an O/S patchset), but the two 10gR1 instances worked before.
We did have a hardware change on the machine, but the problem first occurred before that change. We also had some
disk issues, but those appear to be resolved as well. The sys admins say there is nothing in the log that indicates hardware issues.
We've been thinking fairly "outside the box" on this; to that end, we have:
- relinked all the binary sets
- built a new kernel and relinked all the binary sets
We are considering:
- setting an event to capture further detail - at this point, unsure specifically of what event to set - still researching
- perhaps running truss on one of the background processes in hopes of getting more information
- applying the missing O/S patchset to allow the 10gR1 to be properly supported - this will require another relink
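While we research which event to set, one low-risk step is to tally the errors already recorded in the alert logs, to see whether a single ORA-00600 argument or ORA-07445 function dominates across instances. Below is a minimal sketch in Python, assuming the standard message text for both errors. The name tally_errors and the idea of passing alert log paths on the command line are purely illustrative, not something we have in place:

```python
import re
import sys
from collections import Counter

# The first bracketed argument of an ORA-00600 usually identifies the failing code path.
ORA_600 = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")
# ORA-07445 reports the function that took the exception in its first bracket.
ORA_7445 = re.compile(r"ORA-07445: exception encountered: core dump \[([^\]]*)\]")


def tally_errors(lines):
    """Count ORA-00600 / ORA-07445 lines, keyed by (error, first argument)."""
    counts = Counter()
    for line in lines:
        match = ORA_600.search(line)
        if match:
            counts[("ORA-00600", match.group(1))] += 1
            continue
        match = ORA_7445.search(line)
        if match:
            counts[("ORA-07445", match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: pass one or more alert log paths as arguments.
    total = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            total.update(tally_errors(fh))
    for (err, arg), n in total.most_common():
        print(f"{n:5d}  {err}  [{arg}]")
```

If the same first argument shows up on both the 9i and 10gR1 instances, that would point more toward the O/S or hardware than toward any one Oracle release.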
If you have any ideas, I would appreciate them, as this is gnawing at the insides of everyone involved.
Stephen de Vries