Re: data guard fast start failover
- From: Alex Gorbachev <ag@xxxxxxxxxxxx>
- To: fairlie_r@xxxxxxxxx
- Date: Tue, 20 Jan 2009 11:26:31 +1100
There are two issues. One is WebLogic-specific, as they have their own
connection management with multi-pools for Data Guard (don't ask; they
are working on integration with FAN and RLB but that's not available
yet).
The second issue is generic, and with the introduction of Oracle
Clusterware, Oracle solved it with VIPs. The problem is that when an IP
is not available, a connection attempt only times out after a while.
This is why VIPs are taken over by the surviving nodes in RAC, but I
don't need to explain that to you. However, a Data Guard standby does
not take over the VIPs when it's promoted to primary. This means that
application connections to the VIPs of the old primary (now unavailable
if the site is down or the hosts are down) will take a while to time
out. If client-side load balancing is ON between the standby and
primary address lists (in the rare cases when there is no real DR and
people switch between sites regularly), then about 50% of connection
requests will time out after a minute or two, or whatever your
tcp_timeout setting is on the apps tier. If you configure your
descriptor without the load balance option between the primary and
standby address lists but only with failover, then 100% of reconnects
will be delayed.
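A rough sketch of that arithmetic, in Python. The 60-second timeout is an assumed value, the function name is made up for illustration, and the model charges one timeout per dead address list (in practice each dead address in the list may add its own wait):

```python
def expected_connect_delay(tcp_timeout_s, dead_first_fraction):
    """Average extra delay per connection request, in seconds.

    dead_first_fraction is the share of requests that try the dead
    (old primary) address list before the live one.
    """
    if not 0.0 <= dead_first_fraction <= 1.0:
        raise ValueError("dead_first_fraction must be between 0 and 1")
    return tcp_timeout_s * dead_first_fraction


TCP_TIMEOUT_S = 60  # assumed value, not a measured one

# Load balancing ON between the primary and standby address lists:
# roughly half of the requests pick the dead list first.
load_balanced = expected_connect_delay(TCP_TIMEOUT_S, 0.5)

# Failover only, old primary listed first: every request waits.
failover_only = expected_connect_delay(TCP_TIMEOUT_S, 1.0)

print(load_balanced, failover_only)
```

Either way the delay is driven by the TCP timeout, which is why a takeover of the VIPs would help more than any descriptor tweak.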
Fairlie, please correct what I've got wrong here.
Cheers,
Alex
On 20/01/2009, at 9:43 AM, fairlie rego wrote:
You have a connection to each node in RAC, but how do you handle
connections to the standby?
Alex,
In the environment I am currently working on (two 8-node clusters in a
DG config) we have the node virtual IPs of both the primary and standby
clusters in the tnsnames.ora (16 nodes).
The application connects to RAC services which run only on the
primary cluster. Upon switchover/failure the db_role_change trigger
fires, which starts the services on the standby nodes. Of course it is
a pain that dbms_service does not update the OCR but let me not
digress....
Am just curious as to why this may not work for you
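For illustration only, a connect descriptor of that shape could be generated as below. The host names (prim1-vip and so on), the service name app_svc, port 1521 and the FAILOVER/LOAD_BALANCE choices are all assumptions, not the actual settings of this environment:

```python
def address_list(hosts, port=1521, load_balance=True):
    """Build one ADDRESS_LIST clause for a set of node VIPs."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={h})(PORT={port}))" for h in hosts
    )
    lb = "on" if load_balance else "off"
    return f"(ADDRESS_LIST=(LOAD_BALANCE={lb}){addresses})"


def connect_descriptor(primary_vips, standby_vips, service_name):
    """Primary list first, standby list second, failover between them."""
    return (
        "(DESCRIPTION=(FAILOVER=on)(LOAD_BALANCE=off)"
        + address_list(primary_vips)
        + address_list(standby_vips)
        + f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


# Hypothetical VIP names for two 8-node clusters.
primary = [f"prim{n}-vip" for n in range(1, 9)]
standby = [f"stby{n}-vip" for n in range(1, 9)]
descriptor = connect_descriptor(primary, standby, "app_svc")
print(descriptor)
```

Because the service runs only on whichever cluster currently holds the primary role, the same descriptor works before and after the role change; what it cannot avoid is the wait on the dead addresses listed first.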
Thanks
Fairlie Rego
Senior Oracle Consultant
http://el-caro.blogspot.com/
M: +61 402 792 405
--- On Mon, 19/1/09, Alex Gorbachev <ag@xxxxxxxxxxxx> wrote:
From: Alex Gorbachev <ag@xxxxxxxxxxxx>
Subject: Re: data guard fast start failover
To: "Mark Strickland" <strickland.mark@xxxxxxxxx>
Cc: Laimutis.Nedzinskas@xxxxxx, oracle-l@xxxxxxxxxxxxx
Received: Monday, 19 January, 2009, 9:58 AM
Thanks Mark,
What about Data Guard now? You have a connection to each node in
RAC, but how do you handle connections to the standby?
On one project I'm working on now, with RAC on primary and RAC on
standby, we plan to set up a multi-pool controlling underlying pools
for each instance on primary *AND* standby. Theoretically, a WebLogic
multi-pool with load balancing will not send transactions to the
"broken" pools, but in the past we didn't have good experience with
that.
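To make the idea concrete, here is a generic round-robin sketch in Python. It is not WebLogic's implementation or API; the class, the pool names and the way a pool gets marked as broken are all hypothetical:

```python
from itertools import cycle


class MultiPool:
    """Round-robin over named pools, skipping the ones marked broken."""

    def __init__(self, pool_names):
        self.healthy = {name: True for name in pool_names}
        self._order = cycle(pool_names)
        self._size = len(pool_names)

    def mark(self, name, healthy):
        self.healthy[name] = healthy

    def next_pool(self):
        # Look at each pool at most once per call.
        for _ in range(self._size):
            name = next(self._order)
            if self.healthy[name]:
                return name
        raise RuntimeError("no healthy pool available")


pools = MultiPool(["prim1", "prim2", "stby1", "stby2"])
pools.mark("prim1", False)  # pretend the primary instances are gone
pools.mark("prim2", False)
picks = [pools.next_pool() for _ in range(4)]
print(picks)
```

The selection logic is the easy part. The hard part is deciding quickly and reliably that a pool is broken, and that is where the experience has not been good.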
Another issue is the failover time. VIPs are not taken over by the
standby on a role switch and, of course, a connection timeout takes a
long time. So if it's 60 seconds for you, is your OS setting for
tcp_timeout 60 seconds?
Has anybody attempted to automate VIP management, integrating it
with the Observer and FSFO?
Cheers,
Alex
On 19/01/2009, at 9:17 AM, Mark Strickland wrote:
I'll find out more from our WebLogic SME, but we're using WebLogic
multi-pools (multi-datasources?), i.e. each server running WebLogic
has three connection pools -- one for each of the RAC instances.
The connections do re-connect automatically after failover. We're
finding that it takes 60-90 seconds for failover and reconnect. I
believe that we are using WebLogic XA transactions but I'll verify.
-Mark
On Sun, Jan 18, 2009 at 1:49 PM, Alex Gorbachev <ag@xxxxxxxxxxxx>
wrote:
Hi Mark,
Could you elaborate on the WebLogic config you are using for RAC?
- Is it configured using WebLogic multi-datasources?
- Do you use WebLogic XA transactions? Does the WebLogic datasource
retry the transaction on reconnect?
- What are the patches you mentioned (perhaps you have a
reference to the WebLogic support docs)?
Cheers,
Alex
On 17/01/2009, at 8:52 AM, Mark Strickland wrote:
We've been testing FSF with 10.2.0.2 and my co-DBA discovered a
bug that can cause a split-brain to occur. I don't remember the
exact circumstances, but the fix is in 10.2.0.4 which is driving
us to apply that patchset. Our FSF testing with 10.2.0.4 has been
going very well. If you use WebLogic, it will handle a failover
but it requires a patch depending on what version you use. I've
been doing new 10.2.0.4 builds with RAC and Data Guard with FSF
for a new customer. No issues so far.
Mark
Seattle, WA
On Thu, Jan 15, 2009 at 11:27 PM, <Laimutis.Nedzinskas@xxxxxx>
wrote:
Hi all
Is anyone using Data Guard fast-start failover?
What are your experiences?
What about split brain?
Does it interfere heavily with normal database activities?
Any other comments?
Thank you in advance,
Laimis N
--
http://www.freelists.org/webpage/oracle-l