RE: Shutdown Abort, a defense of some guessing

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <jeremiah@xxxxxxxxxxx>, <jkstill@xxxxxxxxx>
  • Date: Tue, 3 Jul 2007 16:01:08 -0400

To be precise, and please contradict at will if you think I've got this
wrong, the only additional exposure from shutdown abort is the set of
unwritten dirty blocks lying past the point of a putative corruption in the
online redo log, in the case where those dirty blocks contain correct
information.

Corruption in the online redo log almost never happens, and the exposure is
narrowed further by the requirement that the log be broken while none of the
dirty blocks past that point are broken (if they were, that too would result
in a failure).

Now, Oracle's recovery model cannot survive telling the computer to write
"a" with a checksum and having the actual contents end up as "b" with a
valid checksum, because something went wrong somewhere between issuing the
write of "a" and getting the information saved on the spinning rust. I'm not
aware of any recovery model that will survive those conditions. Aside from
the theoretical error rates that can be teased out of how bulletproof
mechanisms such as ECC are, and all the vulnerabilities inherent in physical
reality, the last documented software corruption of the online redo logs I'm
aware of was a race condition in high CPU count SMP ports around 6.0.33.x,
fixed by 6.0.36.x (with a corresponding patch to 7.0.?, which I think was
still in beta). Maybe I missed one, but that one was more than half a
lifetime of Oracle ago.

This is my long-winded way of agreeing that it is passing strange, and
ignores the numbers, to eschew "shutdown abort" as part of routine
maintenance. There was a period of years when "shutdown immediate" was a
good way to get into trouble on some ports because it was buggy, and before
Oracle allowed opening the database before all pending rollbacks were
complete, "abort" could itself result in protracted outages if you were
silly enough to kneecap a large monolith. But nothing else was going to be
quicker anyway, and in fact the rollback proceeded faster and with less
noise since no one could connect to consume resources. Allowing the database
to open as soon as the system tablespace was coherent and the transactions
managing the pending rollbacks were set up was an outgrowth of the demand
for reducing downtime after a shutdown abort.

Now, as for the other thread, I have a slight quibble. I hold that there is
a variable amount of economically justified guessing time, ranging from zero
seconds to perhaps 5 or 15 minutes depending on the nature and complexity of
the question, within which guessing keeps the total cost of the
troubleshooting or optimization lower than leaping immediately to measuring
and analyzing the data that will tell you for sure what is happening.
However, I also applaud Alex G.'s "rants", because whatever guessing time is
likely to be economic should most certainly have expired well before anyone
consults the list. When in doubt, giving up on guessing early is better than
guessing too long. As tools leading to the provably correct answer get easier
to use and faster to execute, perhaps I'll someday agree that the upper limit
on guessing time has come so close to zero seconds in every situation that
it never pays to guess. For now I'll subscribe to the notion that those of us
who guess at all probably tend to guess too long.

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Jeremiah Wilton
Sent: Tuesday, July 03, 2007 1:47 PM
To: jkstill@xxxxxxxxx
Cc: randyjo@xxxxxxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: Re: Shutdown Abort

Thanks Jared.  It has taken all of my strength to not reply to some of 
the most egregious postings in this thread.  My more recent blurb on 
shutdown abort can be found on page four of my 2004 HA paper:

http://www.ora-600.net/articles/stayinalive.pdf

I will confirm that practically every site that requires very high 
availability uses 'abort' as SOP.  I am really surprised and 
disappointed by the wild theoretical conjecture that accompanies the 
steadfast resistance to 'abort'.

Speaking of wild theoretical conjecture, thanks to Alex G. for his 
recent rants on DBAs and guessing.  I have long been an opponent of the 
'guessing method' of Oracle tuning, which goes hand in hand with the 
'try a bunch of stuff' method of Oracle troubleshooting :-)

Best to all, including the guessers,

Jeremiah Wilton
ORA-600 Consulting
http://www.ora-600.net

Jared Still wrote:
> 
> Please see
> http://www.speakeasy.org/~jwilton/oracle/shutdown-abort-bad.html
> 
> If you are not familiar with Jeremiah Wilton, he was a DBA at Amazon.com
> from early days.
> 
> Amazon has a few databases, and they were/are regularly shut down with
> abort.
--
//www.freelists.org/webpage/oracle-l
