Re: Oracle backups using Snapshot Technology

Comments inline:

On 11/9/06, Hameed, Amir <Amir.Hameed@xxxxxxxxx> wrote:

... "until the split is sent to tape, there is no good
backup because a disk failure in the primary backup storage can destroy
the entire snapshot". Even though this is true, there are ways to
protect the online backup mirror by mirroring it with 1+0 or 0+1. It is
certainly not a cheap solution but there is no guarantee that a tape
will not go bad after the snapshot is copied to the tape....


Sure, tapes fail.  That is precisely why most "enterprise" backup solutions
allow you to duplex (or multiplex) tapes.  They not only assume that tapes
can fail, but sensibly assume that they will fail.  Just as a good backup
strategy must assume that RAID-1 storage not only can, but will, fail.

Please bear in mind that I made that quoted comment in the context of people
who use a "snapshot" volume as their only backup, that is, they never write
the contents of the snapshot volume to tape.  (Yes, such sites do exist, as
horrible as this is to contemplate.)  And besides, no level of RAID on the
snapshot volume will protect you from failure of the primary media until
after the snapshot has "hardened".  (Some types of snapshot never "harden",
by the way.)

Of course, at "sane" sites, this is a non-issue.  So the "primary" storage
fails while I'm in the middle of writing the snapshot volume to tape?  Big
deal.  I still have yesterday's backups (on duplexed tapes), and all of the
archive logs.  And my datafiles, archive logs, and online redo never share
common (physical) spindles, so no matter which RAID-10 volume failed, I can
still recover right up to the last committed transaction.  At "sane" sites.
Alas, I haven't actually seen one of those for a while...  :-(

(For some reason, it no longer seems fashionable to place datafiles, online
redo, and archived redo on disjoint sets of disks.  It almost seems that
people have forgotten that we do that for data protection purposes, not for
performance purposes...)
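
A minimal sketch of how one might check that separation, assuming you
already have a mapping from each file class to the physical disks behind it.
The layout dictionaries and disk names below are hypothetical; how you obtain
the real mapping depends entirely on your volume manager and storage array.

```python
from itertools import combinations

def shared_disks(layout):
    """Return {(class_a, class_b): shared_disks} for every pair of file
    classes that sits on at least one common physical disk."""
    overlaps = {}
    for a, b in combinations(sorted(layout), 2):
        common = set(layout[a]) & set(layout[b])
        if common:
            overlaps[(a, b)] = common
    return overlaps

# Hypothetical layouts, for illustration only.
good_layout = {
    "datafiles":     {"disk01", "disk02", "disk03", "disk04"},
    "online_redo":   {"disk05", "disk06"},
    "archived_redo": {"disk07", "disk08"},
}
bad_layout = {
    "datafiles":     {"disk01", "disk02", "disk03", "disk04"},
    "online_redo":   {"disk05", "disk06"},
    "archived_redo": {"disk05", "disk07"},  # shares disk05 with online redo
}

good_overlaps = shared_disks(good_layout)
bad_overlaps = shared_disks(bad_layout)
```

An empty result means no two file classes share a spindle, so losing any one
volume cannot take out more than one of the three.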

Some people might think I am paranoid about data protection.  But those
people have probably never experienced a situation where 30 disks out of a
set of 200 failed simultaneously.  (Yes, I have seen that, and lived to talk
about it.)  Even with RAID-10, you have to be pretty lucky to come through an
incident like that unscathed.  We didn't get through without downtime, in
part because we had RAID-0+1 instead of RAID-10, so the odds against us were
astronomical.  But we survived, and didn't lose a single committed
transaction.  Paranoid?  Maybe, but events like this end careers.  And
corporations.  It pays to protect against them.
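
To put rough numbers on the RAID-10 versus RAID-0+1 comparison, here is a
back-of-the-envelope sketch.  It rests on assumptions that are NOT from the
incident above: all 200 disks form a single array (100 mirrored pairs for
RAID-10, or two 100-disk stripes mirrored against each other for RAID-0+1),
and the 30 failed disks are picked uniformly at random.  Real simultaneous
failures are usually correlated, so treat this as an illustration only.

```python
from math import comb

def raid10_survival(pairs, failures):
    """Chance that no mirrored pair loses both members when `failures`
    disks out of 2 * pairs fail uniformly at random."""
    # Pick which pairs are hit, then which side of each hit pair failed.
    return comb(pairs, failures) * 2**failures / comb(2 * pairs, failures)

def raid01_survival(disks_per_side, failures):
    """Chance that every failed disk lands on the same side of a mirror
    built from two stripes of `disks_per_side` disks each."""
    if failures == 0:
        return 1.0
    return 2 * comb(disks_per_side, failures) / comb(2 * disks_per_side, failures)

# Assumed scenario: 30 random failures in one 200-disk array.
p10 = raid10_survival(100, 30)
p01 = raid01_survival(100, 30)
print(f"RAID-10  survival chance: {p10:.3g}")
print(f"RAID-0+1 survival chance: {p01:.3g}")
```

Under these assumptions the RAID-10 figure comes out well under one in ten,
and the RAID-0+1 figure is many orders of magnitude smaller still.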

By the way, don't ask about the RAID-0+1 -- it was the best available
technology at the time...  ;-)

--
Cheers,
-- Mark Brinsmead
  Senior DBA,
  The Pythian Group
  http://www.pythian.com/blogs
