RE: Shutdown Abort

  • From: "Robert Freeman" <robertgfreeman@xxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 3 Jul 2007 10:13:00 -0600

Anything is possible given the right set of conditions.
Clearly there is an issue with fsck in AIX5L that can cause problems with
temporary tablespace temp files. This same problem would seem to exist when
doing a shutdown immediate too. So if one were to do a shutdown abort, and
reboot the system and have this failure occur, one might well jump to the
conclusion that it's a problem with shutdown abort, when in fact it is not.

So there may be other OS interactions that happen in very odd cases that
might cause it to appear that a shutdown abort is the issue when it is, in
reality, something else. Also there could be bugs present anywhere in the
configuration (OS, firmware, etc.) that could cause IO corruption given the
right conditions. Then there is the possibility of something running in the
background of the OS that caused the problem, who is to know? The bottom
line is that you are supposed to be able to pull the PLUG on the thing, and
expect that it will come up without help *every* *single* time (assuming
that pulling the plug didn't take out a physical disk for some reason).

I think you are spot on that this was not a controlled test, so anything
could have caused it. In my mind, if it's not reproducible, then there is
something about the test that was not controlled. The exact set of
conditions needs to be known and reproducible to call a test controlled.
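
To make "controlled and reproducible" concrete, here is a minimal sketch of
a harness that repeats the same shutdown/startup cycle for each shutdown
mode and tallies the outcomes. Everything in it is an assumption for
illustration: `cycle` is a hypothetical callable you would supply yourself
(shut the instance down in the given mode, restart it, report whether the
database opened cleanly), and the names `run_trials` and `is_reproducible`
are made up here. It does not talk to Oracle at all.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def run_trials(
    cycle: Callable[[str], bool],
    modes: Iterable[str] = ("abort", "immediate"),
    runs: int = 20,
) -> Dict[str, Counter]:
    """Run the same cycle `runs` times per shutdown mode and tally results.

    `cycle(mode)` is a hypothetical, user-supplied function that returns
    True if the database came back up cleanly and False otherwise.
    """
    results: Dict[str, Counter] = {}
    for mode in modes:
        tally: Counter = Counter()
        for _ in range(runs):
            try:
                ok = cycle(mode)
            except Exception:
                # Treat any error raised by the cycle as a failed run.
                ok = False
            tally["ok" if ok else "failed"] += 1
        results[mode] = tally
    return results


def is_reproducible(tally: Counter, runs: int) -> bool:
    """Call a failure reproducible only if it occurred on every run."""
    return tally["failed"] == runs
```

The point of running both modes side by side is the comparison: if the
failure shows up just as often after a shutdown immediate, the test says
nothing against shutdown abort in particular.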

The OP's problem might well have happened with a shutdown immediate. There
is no way of telling, so blaming shutdown abort is jumping to conclusions
that are not supported by any hard facts other than the fact that one of the
actions performed was a shutdown abort. How do we know it wasn't the startup
command that was at fault? This test makes shutdown abort a suspect but not
a criminal. If you can replicate the problem consistently, then we have
something to work with, and I'd LOVE to see your results.

I guess my point of view is that you are as likely to find a bug with
shutdown immediate as you are with shutdown abort, so do we just not
shut down the database at all because there *might* be a bug? All other
things being equal, shutdown abort should not have negative impacts on your
database, and if it does, it's a bug. Besides, by the time I go production
on any given system, I've done a shutdown abort enough times on test and
development that it should be exercised pretty well. You can't spend your
life worrying about the bugs that *might* be there (there are enough real
ones to contend with!); that is what backup strategies are for.

Finally the *body* of experience here seems to be that shutdown abort works,
and is perfectly safe. I've yet to see one case where anyone can reliably
replicate a shutdown abort bug in 9i or 10g that is exclusive to shutdown
abort. If anyone can provide a reproducible test case of shutdown abort
failure, please let me know.

RF


Robert G. Freeman
Oracle Consultant/DBA/Author
Principal Engineer/Team Manager
The Church of Jesus Christ of Latter-Day Saints
Father of Five, Husband of One,
Author of various geeky computer titles
from Osborne/McGraw Hill (Oracle Press)
Oracle Database 11g New Features Now Available for Pre-sales on Amazon.com!
Sig V1.1

  -----Original Message-----
  From: Joel.Patterson@xxxxxxxxxxx [mailto:Joel.Patterson@xxxxxxxxxxx]
  Sent: Tuesday, July 03, 2007 9:47 AM
  To: robertgfreeman@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
  Subject: RE: Shutdown Abort


  I agree. It was very unusual - but apparently possible. My original
note sided with immediate, but if that was unacceptable, I would use
t.     . If Randy claims it has to be a controlled test, then so be it. I
would think it quite hard to find exactly what was happening and make it
happen.



  Joel Patterson
  Database Administrator
  joel.patterson@xxxxxxxxxxx
  x72546
  904  727-2546
