RE: Some Dataguard is good, lots more must be better?(specifically, when do most actual failovers really occur?)

  • From: Carel-Jan Engel <cjpengel.dbalert@xxxxxxxxx>
  • To: Laimutis.Nedzinskas@xxxxxxxxxxxxx
  • Date: Thu, 21 Sep 2006 11:51:27 +0200

On Thu, 2006-09-21 at 08:46 +0000, Laimutis Nedzinskas wrote: 

> No, I do not confuse. I just was not 100% sure if Oracle can do it
> because I've never tested it myself. 

Then your phrase 'Well, it is not a good option for maximum data
protection(as Oracle defines it.)' is misplaced. You don't know whether
it is a good option, because you haven't tested it. Stating an untested
assumption as a fact is not right. I have tested this a lot. It actually
works. 

> The point is I've never used this option is that together with Data
> Protection one wants High Availability which means that time lag is
> contradicting this requirement. In numbers if 15 minutes downtime is
> allowed then recovery must be 15 minutes.

No. I have never seen (which doesn't mean it isn't possible) recovery
lasting as long as the timeframe spanned by the redo to be applied. In
general, 15 minutes' worth of redo does not take 15 minutes to apply. 

> I am not sure how to calculate maximum lag allowed as it depends on
> machine speed and redo size and probably redo contents. 

The maximum lag allowed should be business driven. How much time does
the business allow itself to discover a logical error? How much time do
they allow you to do the same? The time it takes to apply the redo
generated in such a timeframe can only be determined by testing. How
much redo is generated at most during such a timeframe? How much time
does it take to apply that amount of redo? That depends mainly on your
CPU and storage capabilities. Frequently I see 8 hours' worth of redo
being applied in a handful of minutes. This is not a very idle system,
BTW. Your Mileage May Vary. TEST!
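
The arithmetic behind such a test can be sketched in a few lines of
Python. This is only a rough illustration: the function names, the
parameters and the sample figures are all hypothetical, and a constant
apply rate is assumed. The real inputs have to be measured on your own
system.

```python
def estimated_apply_minutes(lag_hours, peak_redo_mb_per_hour, apply_rate_mb_per_sec):
    """Estimate the minutes needed to apply the redo built up during the lag.

    All inputs are assumed to be measured values:
      lag_hours              - the apply delay the business asks for
      peak_redo_mb_per_hour  - highest redo generation rate observed
      apply_rate_mb_per_sec  - redo apply throughput seen on the standby
    """
    if apply_rate_mb_per_sec <= 0:
        raise ValueError("apply rate must be positive")
    backlog_mb = lag_hours * peak_redo_mb_per_hour
    return backlog_mb / apply_rate_mb_per_sec / 60.0


def lag_fits_downtime(lag_hours, peak_redo_mb_per_hour,
                      apply_rate_mb_per_sec, allowed_downtime_minutes):
    """True if the estimated apply time stays within the allowed downtime."""
    needed = estimated_apply_minutes(
        lag_hours, peak_redo_mb_per_hour, apply_rate_mb_per_sec)
    return needed <= allowed_downtime_minutes


if __name__ == "__main__":
    # Hypothetical figures: 8 hours of lag, 3000 MB of redo per hour at
    # peak, and a standby that applies 80 MB of redo per second.
    minutes = estimated_apply_minutes(8, 3000, 80)
    print(f"estimated apply time: {minutes:.1f} minutes")
    print("fits in 15 minutes:", lag_fits_downtime(8, 3000, 80, 15))
```

The estimate is no substitute for the test itself. It only tells you
which numbers to collect and how they relate.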

Again, this is why many organisations tend to install two standbys, once
the decision for installing a standby is made.

If your business really cannot afford an outage of, say, > 4-6 hours, 2
standbys are required IMHO. Think of the situation when a real disaster
strikes. Then you are running at your DR center, and that is your last
resort. If failover is not tested on a regular basis, this is an
extraordinary situation for all admins involved. It leads to an even
more error-prone situation, with all the admins working on systems they
have never used for real. Then there is the distraction of the
destroyed primary DC or system. Does new hardware need to be ordered, or
selected first? Does it need a setup? Are co-workers in hospital, or
have they even died? How do you concentrate on your daily work then,
which additionally isn't routine at all in the strange environment? And
then you're running just one system, no standbys (I've even seen 'no
backup hardware at the DR site'), and so on. How vulnerable do you want
to be? If a second disaster strikes (more likely to be originated by
humanoid carbon objects (Thank you Casey Dyke) under the circumstances
described), it's over. How much time will it take then to get new
hardware, restore, and so on? 4-6 hours is rather optimistic I guess.

If you take countermeasures for HA, investigate the risks first. And not
just the risks during normal operation, but also the risks when running
at the DR site with a missing main DC. Calculate the costs of disasters
in terms of business interruption. Find out how much 'insurance premium'
it is worth to the business to cover the risks. Calculate the costs of
the various solutions that cover the risks. Make clear what the leftover
risks are when a certain (combination of) countermeasure(s) is chosen.
Then let the business decide what the game plan will be. It's their
data, it's their budget.
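
One way to lay such a comparison out for the business is a plain
expected-value table, sketched here in Python. The Countermeasure class,
the function names and every figure are hypothetical and only there for
illustration. The model is deliberately simplistic: it ignores a second
disaster striking while you run at the DR site, and it does not pick a
winner, because that decision belongs to the business.

```python
from dataclasses import dataclass


@dataclass
class Countermeasure:
    name: str
    annual_cost: float    # yearly 'insurance premium' for this solution
    outage_hours: float   # assumed outage per disaster with it in place


def expected_annual_loss(outage_hours, cost_per_outage_hour, years_between_disasters):
    """Expected yearly cost of business interruption (plain expected value)."""
    if years_between_disasters <= 0:
        raise ValueError("years_between_disasters must be positive")
    return outage_hours * cost_per_outage_hour / years_between_disasters


def compare(options, cost_per_outage_hour, years_between_disasters):
    """Return (name, premium, leftover risk, total) per option, in input order."""
    rows = []
    for opt in options:
        leftover = expected_annual_loss(
            opt.outage_hours, cost_per_outage_hour, years_between_disasters)
        rows.append((opt.name, opt.annual_cost, leftover, opt.annual_cost + leftover))
    return rows


if __name__ == "__main__":
    # Hypothetical options and figures; replace them with your own estimates.
    options = [
        Countermeasure("no standby", 0, 72),
        Countermeasure("one standby", 40_000, 6),
        Countermeasure("two standbys", 70_000, 1),
    ]
    # Assumed: one disaster per 10 years, 50,000 lost per hour of outage.
    for name, premium, leftover, total in compare(options, 50_000, 10):
        print(f"{name:<14} premium={premium:>9,.0f} "
              f"leftover={leftover:>9,.0f} total={total:>9,.0f}")
```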

And that 2 standby thing brings me back to the point the OP had in mind
when he started with this thread. Storage replication versus Data Guard.
How easy is it to replicate storage to two standbys, let's say from A to
B and C? And how easy is it to switch to the situation B to A and C, and
then to C to B and A, and then back to A to B and C? I can do this
easily with standby databases. Is it as easy for storage? With Data
Guard I can do this on a per database basis. Can storage replication
handle that? Or is it a 'box' granularity in role switching between
primary and standby?

Storage replication versus Data Guard. We skipped the bandwidth
discussion so far. I've disappointed the OP on that part. He was so
much expecting me to start discussing that. Maybe more about bandwidth
later. I have to figure out how to test that for a fair comparison
first. Much later would be a better term. All suggestions are most
welcome. 

Best regards,

Carel-Jan Engel

===
If you think education is expensive, try ignorance. (Derek Bok)
===
