FW: San & single point of failure

  • From: "Herring Dave - dherri" <Dave.Herring@xxxxxxxxxx>
  • To: "oracle_l" <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 19 Nov 2008 06:54:33 -0600

From this awesome discussion you've all helped me realize that I've got a 
vulnerable spot on my new servers.  All of them are using ASM, all with 1 
big ol' disk group.  I have multiple copies of the controlfile, but they're all 
on that 1 ASM disk group.  I'll adjust that as soon as I can by putting 1 
copy on 1 of the available filesystems, so I know for sure that I've got 
controlfiles on separate LUNs.  Thanks!
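
A rough way to sanity-check this kind of layout is to group the controlfile 
paths by where they live.  The Python sketch below is only an illustration: the 
function names, the example paths and the mount point list are all made up, and 
it treats an ASM disk group name or a mount point as a stand-in for a storage 
location.  Two different mount points can still sit on the same LUN or the same 
SAN device, so a "pass" here proves nothing about the hardware underneath.

```python
from collections import defaultdict


def storage_location(path, mount_points):
    """Return a coarse storage location for a file path (hypothetical helper).

    ASM file names start with '+<diskgroup>/', so the disk group name is used.
    Otherwise the longest matching mount point from the supplied list is used.
    """
    if path.startswith("+"):
        return path.split("/", 1)[0]
    matches = [
        m for m in mount_points
        if path == m or path.startswith(m.rstrip("/") + "/")
    ]
    # Fall back to the root filesystem when no listed mount point matches.
    return max(matches, key=len) if matches else "/"


def group_by_location(paths, mount_points):
    """Map each storage location to the list of paths stored there."""
    groups = defaultdict(list)
    for p in paths:
        groups[storage_location(p, mount_points)].append(p)
    return dict(groups)


def is_single_point_of_failure(paths, mount_points):
    """True when every path ends up in the same disk group or mount point."""
    return len(group_by_location(paths, mount_points)) < 2


# Assumed example values, not taken from any real system.
asm_only = ["+DATA/orcl/controlfile/c1.ctl", "+DATA/orcl/controlfile/c2.ctl"]
mixed = asm_only + ["/u01/oradata/orcl/c3.ctl"]

print(is_single_point_of_failure(asm_only, ["/u01"]))  # all copies in +DATA
print(is_single_point_of_failure(mixed, ["/u01"]))     # +DATA and /u01
```

The list of controlfile paths would have to come from the database itself, and 
the mapping from mount points to LUNs still has to be confirmed with whoever 
runs the SAN.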

Dave
___________________________________
Dave Herring, DBA |   A c x i o m  M I C S / C S O
630-944-4762 office | 630-430-5988 wireless | 630-944-4989 fax
________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Mark Brinsmead
Sent: Monday, November 17, 2008 8:34 PM
To: piontekdd@xxxxxxxxx
Cc: czeiler@xxxxxxxxxx; oracle_l
Subject: Re: San & single point of failure

Absolutely correct, Bradd.

In theory, if we were willing to put complete faith in our SAN devices, and to 
put complete faith in our operating systems and software, and to put complete 
faith in the humans configuring and operating them, we wouldn't need any 
redundancy.  Everything would work as it was supposed to, and nothing would 
ever fail.  Unless, of course, our faith was misplaced.  :-)

The sad fact is, though, stuff fails.  Hardware fails, firmware fails, software 
fails (a lot) and people fail (often even more).

People who use (good) SAN devices rarely suffer data loss these days as the 
result of disk failures.  (Rarely, but not "never".)

I have been present at sales presentations where sales reps (and pre-sales 
engineers who really ought to know better) actually swore that their SAN device 
is "infallible", and that no customer using that particular device had ever 
lost data.  

I also have friends who work for storage / backup vendors, and have heard 
plentiful (first-hand) horror stories of simple hardware or firmware upgrades 
completely obliterating the entire contents of multi-terabyte disk arrays.  
Permanently and irretrievably.  No human error (provably) involved!

SAN devices have become amazingly good at protecting us from data loss due to 
failure of a single disk, or sometimes even many disks.  But what protects us 
from failure of the SAN device?

Few things move me closer to tears than to review a customer's systems and find 
all of the following on the same SAN device:
*  Datafiles
*  (All) Online redologs
*  Archived redo logs
*  (All) Controlfiles
*  (All) Backups.

Don't get me wrong.  RAID arrays are great.  But we really need to be careful 
not to trust them too much.

On Mon, Nov 17, 2008 at 2:06 PM, Bradd Piontek <piontekdd@xxxxxxxxx> wrote:
There are other reasons to multiplex the controlfile.  If you only have one, you 
aren't guarded from logical controlfile corruption.  Or, say, maybe a DBA or 
admin accidentally removes one of your controlfiles. 

In theory, provided your SAN administrators lay things out correctly, there may 
be something to be said for their redundancy at the hardware level. I've seen 
databases sliced up into just /data and /archive. As time has gone on in my 
career, I've asked more questions about the layout and thought about things a bit 
more. I'm not sure there is a clear-cut answer, but it definitely does 'depend'.

Bradd Piontek
  "Next to doing a good job yourself, 
        the greatest joy is in having someone 
        else do a first-class job under your  
        direction."
 -- William Feather

On Mon, Nov 17, 2008 at 2:50 PM, Claudia Zeiler <czeiler@xxxxxxxxxx> wrote:
All, 
I have just been given a new server to put a database on.  It is a SAN server, 
but the apparent layout of drives to me is:
/redo1
/redo2
/big    everything_else_disk
 
This means that I have just put control_file1, 2, and 3  all in the same place 
- on /big.  I thought that the whole point of multiple control files was to 
avoid single points of failure, such as a single location.
 
I am told that the SAN layout handles mirroring, striping, & hot spots behind 
the scenes and I don't need to worry.  If this is true, why do I need duplicates 
of the control file?
 
Something smells fishy to me.  Does anyone else have an opinion?
 
-Claudia 
-- 
Cheers,
-- Mark Brinsmead
  Senior DBA,
  The Pythian Group
  http://www.pythian.com/blogs
