FW: Some Dataguard is good, lots more must be better? (specifically, when do most actual failovers really occur?)

From: "Laimutis Nedzinskas" <Laimutis.Nedzinskas@xxxxxxxxxxxxx>
To: <oracle-l@xxxxxxxxxxxxx>
Date: Thu, 21 Sep 2006 10:27:18 -0000
 

________________________________

From: Laimutis Nedzinskas 
Sent: 21. september 2006 10:26
To: 'Carel-Jan Engel'
Subject: RE: Some Dataguard is good, lots more must be better?
(specifically, when do most actual failovers really occur?)


 

________________________________

From: Carel-Jan Engel [mailto:cjpengel.dbalert@xxxxxxxxx] 


On Thu, 2006-09-21 at 08:46 +0000, Laimutis Nedzinskas wrote: 


        No, I am not confusing things. I just was not 100% sure whether Oracle
can do it, because I've never tested it myself. 
        

Your phrase 'Well, it is not a good option for maximum data
protection(as Oracle defines it.)' is misplaced then. You don't know,
you haven't tested whether it is a good option. Stating an untested
assumption as a fact is not right. I did test this a lot. It actually
works.  
 
Agreed. I hope not too much harm was done.
 

        The reason I've never used this option is that, together with
Data Protection, one wants High Availability, and a time lag
contradicts that requirement. In numbers: if 15 minutes of downtime is
allowed, then recovery must take no more than 15 minutes.
        

No. I have never seen (which doesn't mean it isn't possible) recovery
lasting as long as the timeframe spanned by the redo to be applied. In
general 15 minutes worth of redo does not take 15 minutes to apply.  
 
Yes, this is my experience too. However, if you have a few hours of lag,
then I am not sure how to calculate the upper bound of the redo apply
time, which usually varies from a few minutes to 15-30 minutes (15
minutes is a coffee break, 30 is 2 coffee breaks, something that a normal
business can tolerate). 

        I am not sure how to calculate the maximum lag allowed, as it
depends on machine speed, redo size and probably redo contents. 
        

The maximum lag allowed should be business driven. How much time does
the business allow itself to discover a logical error? How much time do
they allow you to do the same? The time it takes to apply the amount of
redo for such a timeframe can only be determined by testing. How much
redo is generated at most during such a timeframe? How much time does it
take to apply that amount of redo? That depends mainly on your CPU and
storage capabilities. Frequently I see 8 hours worth of redo being applied
in a handful of minutes. This is not a very idle system, BTW. Your
Mileage May Vary. TEST! 
Again, this is why many organisations tend to install two standbys, once
the decision for installing a standby is made. 
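
The sizing questions above can be turned into rough arithmetic once you
have measured two things on your own system: the peak redo generation
rate and the redo apply rate on the standby. The Python sketch below is
only an illustration of that arithmetic. The function names and all
numbers are made up for the example, and it assumes the apply rate is
constant, which real redo contents may not honour. It is no substitute
for testing.

```python
def estimated_apply_minutes(lag_hours, peak_redo_mb_per_hour, apply_mb_per_minute):
    """Worst-case minutes needed to apply the redo accumulated during the lag.

    Assumes (hypothetically) a constant apply rate measured in your own tests.
    """
    if apply_mb_per_minute <= 0:
        raise ValueError("apply rate must be positive")
    backlog_mb = lag_hours * peak_redo_mb_per_hour
    return backlog_mb / apply_mb_per_minute


def max_lag_hours(allowed_downtime_minutes, peak_redo_mb_per_hour, apply_mb_per_minute):
    """Largest lag whose redo backlog can still be applied within the allowed downtime."""
    if peak_redo_mb_per_hour <= 0:
        raise ValueError("redo generation rate must be positive")
    return allowed_downtime_minutes * apply_mb_per_minute / peak_redo_mb_per_hour


# Illustrative inputs only: 3000 MB/hour of redo at peak, 2000 MB/minute apply rate.
print(estimated_apply_minutes(8, 3000, 2000))  # 12.0 minutes for an 8 hour lag
print(max_lag_hours(15, 3000, 2000))           # 10.0 hours of lag fit into 15 minutes
```

With measured rates plugged in, the second function gives a first guess
at the lag that still fits the downtime the business allows. The result
should then be confirmed by actually applying that much redo.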
 
2 standbys is my choice too. After all, good sleep costs. Testing is not
enough, not for me at least. I prefer proofs.
 
If your business really cannot afford an outage of say, > 4-6 hours, 2
standbys are required IMHO. 
 
4-6 hours would be a disaster for the business I happened to work for in
the last 8 years. 
 
And that 2 standby thing brings me back to the point the OP had in mind
when he started with this thread. Storage replication versus Data Guard.

 
Well, recently I had to think a lot about that in terms of HA (data
protection was part of HA in my case too, meaning that the solution must
preserve 100% of committed data after the disaster).
 
First of all, let's ask: why the hell go for a storage solution or DG at
all? Why not go for triple RAID 10? If a single disk fails then 2 more
are left.
Why not go for a cluster (non-parallel)? If one box fails then the
other is available with all its functionality: CPU, RAM, network cards,
etc.
 
So far I have arrived at this:
 
- DG or modern storage (all kinds of journaled file systems) allow you to
have a geographically separated online copy of your database. As you
said, the question of which is faster is open. As far as I understand,
Oracle redo logs keep change vectors, not SQL statements
(meaning "insert as select from" will generate a lot of redo). If this
is so, then a journaled file system should be comparable to Oracle
in terms of bandwidth.
 
- DG interfaces with the database via redo logs. I assume this minimizes
the possibility of human or software induced data corruption. If the
primary database server (rdbms or OS or hardware) goes berserk and writes
zeros into datafiles, then there is quite a good chance that the standby
will detect corrupted redo and just stop. The same is true if a human
destroys a file system, for example by error or deliberately.
 
- keeping a lag is easy with DG.

Best regards,

Carel-Jan Engel

===
If you think education is expensive, try ignorance. (Derek Bok)
===     

Fyrirvari/Disclaimer
http://www.landsbanki.is/disclaimer