Re: Re[6]: 2 Node RAC Standby -- A mix of Managed Recovery and Read only?

  • From: Carel-Jan Engel <cjpengel.dbalert@xxxxxxxxx>
  • To: Leyi Zhang <kamusis@xxxxxxxxx>
  • Date: Sun, 09 Jan 2005 16:31:38 +0100

Hi Leyi,
On Fri, 2005-01-07 at 19:32, Leyi Zhang wrote:

> Disk failure is just one of the risks that can leave the standby site
> short of archived redo logs.
> What if a network failure happens, so that one or more archived logs
> cannot be transferred to your standby site? If you keep your standby in
> managed recovery mode, FAL_SERVER and FAL_CLIENT will take care of this
> situation. But in read-only mode, FAL_SERVER and FAL_CLIENT will have
> no effect until you switch back to recovery mode, maybe late at night.
> Of course, you can monitor the alert log at the primary site to detect
> these errors, but that is also not a good idea.
> 
> 

That is a valid argument. Thank you for bringing it up. One always needs
to consider _all_ risks, and then decide how big or relevant each risk
is for one's own situation. My phrasing 'the _only_ risk' was too
limited. There are more risks, as you pointed out.

I must add that at sites where I play a role in defining the monitoring
scripts for the system, I also monitor the log_archive_dest_state_n
parameters and the number of unapplied redo log files. Depending on the
threshold settings, and the severity level attached to each threshold,
the DBA will be warned sooner or later. But I admit that when only one
redo log file (or even one transaction) hasn't been forwarded yet, even
'sooner' can turn out to be 'too late'. I need to do some more detailed
testing on this, but I think that Maximum Protection mode will actually
take care of the problem by shutting down the primary when the redo of a
transaction can't be forwarded to the last standby, even when that
standby is in R/O mode. So far, I have only tested this with standbys in
Managed Recovery mode. In that case, pulling the network cable cuts off
the primary almost instantly. In Maximum Availability mode the primary
will continue once the timeouts have expired. At that moment the
log_archive_dest_state_n parameter will show the problem.
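
As a minimal sketch of the threshold idea described above (not the
actual monitoring scripts), the following maps a destination state and a
count of unapplied redo log files to a severity level. The names
StandbyStatus, THRESHOLDS and severity, and the threshold values
themselves, are hypothetical; how the two inputs are collected from the
database is left out and would have to be supplied by the site's own
scripts.

```python
from dataclasses import dataclass

# Hypothetical thresholds: (minimum number of unapplied logs, severity).
# Ordered from most to least severe; the values are illustrative only.
THRESHOLDS = [(10, "CRITICAL"), (3, "WARNING"), (1, "INFO")]


@dataclass
class StandbyStatus:
    dest_state: str      # value of log_archive_dest_state_n, e.g. ENABLE or DEFER
    unapplied_logs: int  # number of redo log files not yet applied on the standby


def severity(status: StandbyStatus) -> str:
    """Return the severity level the DBA should be warned with."""
    # Assumption: any destination that is not enabled is the worst case.
    if status.dest_state.upper() != "ENABLE":
        return "CRITICAL"
    # Otherwise pick the first (most severe) threshold that is reached.
    for limit, level in THRESHOLDS:
        if status.unapplied_logs >= limit:
            return level
    return "OK"


if __name__ == "__main__":
    print(severity(StandbyStatus("ENABLE", 5)))
```

Even with the lowest threshold set to 1, a check like this can only
report a gap after it exists, which is the 'sooner can be too late'
point made above.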

If data protection is the main reason for implementing DG, configure two
standbys and use Maximum Protection mode. I stand by my first opinion
that there is no single 'better' or even 'best' solution. Everything
depends on SLAs, budgets, needs, and expectations. The 'best' solution is
the one that just satisfies the customer (and no more than that) for the
budget available. That requires careful investigation of the risks, the
allowed downtime, the allowed recovery time, and the allowed data loss.
I have a customer that needs an HA system that gets queried about 20
times/week(!), is updated (batchwise) once a day, and experiences about
2-4 online transactions every week. For several reasons HA is absolutely
required, with allowed downtime of less than half an hour. This system
will get different specifications from that of a second customer, which
produces 800 MB of redo log, 24*7. Another customer has a smaller server
for the standby than for the primary. They accept a performance loss
when the primary fails and the standby takes over, and will simply stop
running some less urgent reports and back-office work on that occasion.
If the downtime of the primary turns out to last longer, they will
upgrade the standby server. All three customers, however, have the
'best' solution: the best for their needs and budget. If the second
customer had had a RAC configuration to handle the number of
transactions per minute, I would have recommended a RAC configuration
for the standby as well. Otherwise the system wouldn't have had enough
capacity to really act as a standby, and performance loss as a result of
a switchover/failover is not acceptable for them.


Best regards,

Carel-Jan Engel

===
If you think education is expensive, try ignorance. (Derek Bok)
===

Upcoming appearances: 

      * Jan 27, 2005: London, UKOUG Unix SIG: Data Guard Best Practices 
      * Feb 9-10, 2005: Denver, RMOUG Training Days: Data Guard
        Performance Issues 
      * Mar 6-10, 2005: Dallas, Hotsos Symposium: Data Guard Performance
        Issues


--
//www.freelists.org/webpage/oracle-l
