RE: DataGuard

From: "Allen, Brandon" <Brandon.Allen@xxxxxxxxxxx>
To: "Carel-Jan Engel" <cjpengel.dbalert@xxxxxxxxx>, <rgoulet@xxxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
Date: Tue, 16 Jan 2007 10:33:17 -0700
Yes, we are most likely going to increase the bandwidth, but regardless
of that, we want to ensure that the standby will *never* halt the
production database. We are okay with some transaction loss; that's why
we are running in maximum performance mode and using ARCH to transfer
the logs.  The business folks have been well informed and are okay with
the potential transaction loss - they are not okay with slowing down
their production system.  No matter how much bandwidth we buy, there
could always be problems with the WAN service provider, or someone
accidentally FTPing some huge files over the same pipe and slowing
things down, so we also need to make sure that this will not cause
problems for our primary database.
 
Are you aware of any configuration that would meet these requirements
other than the proposed cascading primary>localSB>remoteSB?
 
We are running a COTS app (BaanIV ERP) so we can't do much about the
redo.
 
Thanks!
Brandon
 

________________________________

From: Carel-Jan Engel [mailto:cjpengel.dbalert@xxxxxxxxx] 
Sent: Monday, January 15, 2007 5:23 PM
To: Allen, Brandon
Cc: rgoulet@xxxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: RE: DataGuard


Brandon,

If the system is important for the business, they (the business) should
provide enough (budget for enough) bandwidth. If they can afford the
money for an extra server, with extra Oracle EE licenses, a proper line
with enough bandwidth shouldn't be a very difficult business case.


If you follow the suggestion to use the local standby as a buffer for
redo forwarding, be aware that an unknown amount of redo has not yet been
sent to the DR site at any given point in time. If disaster strikes at
that point, you will lose transactions. If the business is aware of that
risk, and made the trade-off, fine! Let them confirm that to you in
writing. Too often they are unaware, and the technicians get blamed in
the event of a failover to the DR site that loses important transactions,
just because the techies were responsible for the HA solution. Wrong.
Management doesn't make housekeeping responsible for insuring the
building either.
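
To put a number on that "unknown amount of redo", here is a minimal
Python sketch of the arithmetic. It assumes constant redo generation and
constant link throughput, which real systems do not have, and the
function names and the rates in the example are made up for
illustration. They are not taken from this thread or from any Oracle
tool.

```python
def unsent_redo_mb(gen_rate_mb_s, link_rate_mb_s, seconds, start_backlog_mb=0.0):
    """Redo (MB) generated but not yet shipped after `seconds` at constant rates."""
    growth = (gen_rate_mb_s - link_rate_mb_s) * seconds
    # A backlog cannot go below zero: once caught up, the link simply idles.
    return max(0.0, start_backlog_mb + growth)


def exposure_seconds(backlog_mb, gen_rate_mb_s):
    """Approximate span of recent work (in seconds) that the backlog represents."""
    return backlog_mb / gen_rate_mb_s


if __name__ == "__main__":
    # Hypothetical figures: 2 MB/s of redo over a link that moves 1.5 MB/s,
    # sustained for one hour.
    backlog = unsent_redo_mb(2.0, 1.5, 3600)
    print(backlog, exposure_seconds(backlog, 2.0))
```

With those hypothetical figures the backlog after an hour is 1800 MB,
or roughly 15 minutes of work that would be lost in a failover at that
moment. That is the kind of figure the business should be signing off on.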

Consider using hardware line cards to compress the redo traffic. Use QoS
on the routers to prioritize the data sent on the port number you choose
for redo transport. People might come up with the suggestion of using
ssh tunneling with compression. IMHO: Too cumbersome for HA. You need an
extra process, it will consume CPU, it can fail, needs monitoring, etc.
James Morle used Cisco line cards at a DG site we set up and they
typically got 4:1 compression. They were approx. $1000 each. No setup, no
worries, no monitoring.
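
As a rough illustration of what a compression ratio like that means for
a link, here is a small Python sketch. The 4:1 figure is the one
reported above for one site; the ratio you actually get depends on your
redo, and the helper names and rates below are hypothetical.

```python
def redo_capacity_mb_s(link_rate_mb_s, compression_ratio):
    """Uncompressed redo (MB/s) the link can carry at a given compression ratio."""
    return link_rate_mb_s * compression_ratio


def link_keeps_up(gen_rate_mb_s, link_rate_mb_s, compression_ratio=1.0):
    """True if the link can ship redo at least as fast as it is generated."""
    return redo_capacity_mb_s(link_rate_mb_s, compression_ratio) >= gen_rate_mb_s


if __name__ == "__main__":
    # Hypothetical: 2 MB/s of redo, 1.5 MB/s of raw link throughput.
    print(link_keeps_up(2.0, 1.5))        # no compression
    print(link_keeps_up(2.0, 1.5, 4.0))   # assuming the 4:1 ratio holds
```

In this made-up case the uncompressed link falls behind, while the same
link at 4:1 has headroom. Sustained averages hide peaks, so measure
against your busiest redo periods rather than the daily mean.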

Finally: is the application optimized for minimized redo generation?


Best regards,

Carel-Jan Engel

===
If you think education is expensive, try ignorance. (Derek Bok)
===     
On Mon, 2007-01-15 at 16:48 -0700, Allen, Brandon wrote: 

        I'm not questioning Carel-Jan's recommendation at all - I think
he has more DG knowledge in his pinky than I'll ever have - but I'm just
passing on a case where a cascading setup might be appropriate/necessary: 

        

        We have been struggling to get a standard (single standby) DG
setup working for the last few months because our network connection
isn't sufficient to keep up with the rate of our redo generation and
when the transfer of archived logs falls behind far enough, it
eventually freezes the production database.  We're using ARCH to
transfer the logs and already tried upgrading to 9.2.0.8 and setting the
hidden parameter _log_archive_callout='LOCAL_FIRST=true', but we still
see this behavior.  Oracle Support's recommendation is to implement a
cascading standby where we ship the logs to a local standby first and
then go from the local to the remote so that the local standby operates
as a buffer to keep the slow network from halting our primary.  We are
considering their recommendation, but are going to try everything else we
can think of to avoid it first. That will probably include upgrading to
10.2, because supposedly this problem no longer occurs in 10g. But that's
the same thing we were told about 9.2.0.7 with the local_first=true
setting (Metalink 260040.1), and we're not very confident based on our
experience with that config. 

        

        Of course, another option that we're considering is increasing
the network bandwidth to the remote destination, but we would really
like to have Data Guard configured such that it will absolutely never
impair production performance, because even with the increased bandwidth,
there is always the possibility of WAN problems, someone accidentally
clogging the pipe with other large files, etc. 

        

        Carel-Jan, if you have any recommendations, we'd love to hear
them! 

        

        Thanks, 

        Brandon 

        


________________________________


        From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Carel-Jan Engel
        Sent: Monday, January 15, 2007 4:12 PM
        
        


        Stay away from cascaded redo transport as far as you can. You
simply don't want that in HA environments. As Miracleas.dk states:
'..Complexity is the enemy of availability...' (and I'd like to add: '..
but the friend of consultancy...' :-) ). Imagine a switchover with cascaded
transport. The whole redo transport stack has to be reinvented. A
star configuration (primary points to both standbys) is much easier to
set up. 
        
        Privileged/Confidential Information may be contained in this
message or attachments hereto. Please advise immediately if you or your
employer do not consent to Internet email for messages of this kind.
Opinions, conclusions and other information in this message that do not
relate to the official business of this company shall be understood as
neither given nor endorsed by it.
        


