Re: data guard fast start failover
- From: fairlie rego <fairlie_r@xxxxxxxxx>
- To: Alex Gorbachev <ag@xxxxxxxxxxxx>
- Date: Thu, 22 Jan 2009 03:50:44 -0800 (PST)
That is correct, Alex.
We get around these issues partially by using outbound_connect_timeout (OCT)
in the sqlnet.ora of the mid tiers (not sure what your client version is).
We have a value of 3 seconds for OCT.
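For reference, a minimal sketch of the client-side entry being described,
assuming OCT here refers to the SQLNET.OUTBOUND_CONNECT_TIMEOUT parameter.
The value is in seconds and simply mirrors the 3 seconds quoted above; it is
an illustration, not a copy of the actual file in use:

```text
# sqlnet.ora on each mid-tier host (client side)
# Abandon a connect attempt to one address after 3 seconds so the
# client can move on to the next ADDRESS in the connect descriptor.
SQLNET.OUTBOUND_CONNECT_TIMEOUT = 3
```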
So, taking the following connect string as an example:
xxxx =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = OFF)
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby1-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby2-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby3-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby4-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby5-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby6-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby7-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby8-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = prim1-vip.sys.au.eds.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = xxxx.commsec.com.au)
    )
  )
I have set load_balance = OFF so that we traverse through all the standby
nodes, which are down (in this case fictitious nodes), and with OCT set to 3
it takes around 12 seconds to establish a connection from a Solaris 10.2.0.3
client.
The other benefit of having all nodes in the mid tier is that we did not have
to change the tnsnames.ora each time we did a switchover. We have done 8
switchovers over the past 3 months.
Hope that makes sense. The rest over some beer.
Fairlie Rego
Senior Oracle Consultant
http://el-caro.blogspot.com/
M: +61 402 792 405
--- On Tue, 20/1/09, Alex Gorbachev <ag@xxxxxxxxxxxx> wrote:
From: Alex Gorbachev <ag@xxxxxxxxxxxx>
Subject: Re: data guard fast start failover
To: fairlie_r@xxxxxxxxx
Cc: "ORACLE-L Freelists" <oracle-l@xxxxxxxxxxxxx>
Received: Tuesday, 20 January, 2009, 11:26 AM
There are two issues. One is WebLogic specific, as they have their own
connection management with multi-pools for Data Guard (don't ask - they are
working on integration with FAN and RLB but that's not available yet).
The 2nd issue is generic, and with the introduction of Oracle Clusterware,
Oracle solved it with VIPs. The problem is that when an IP is not available,
the connection times out only after a while. This is why VIPs are taken over
by surviving nodes in RAC, but I don't need to explain that to you. However, a
Data Guard standby does not take over VIPs when it's promoted to primary. This
means that application connections to the VIPs of the old primary (now
unavailable if the site is down or the hosts are down) will take a while to
time out. If client-side load balancing is ON between the standby and primary
address_lists (in rare cases when there is no real DR and people switch
between sites regularly), then about 50% of connection requests will time out
after a minute or two, or whatever your tcp_timeout setting is in the apps
tier. If you configure your descriptor without the load balance option between
the primary and standby address lists but only with failover, then 100% of
re-connects will be delayed.
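As an illustration of the failover-only layout described above, here is a
minimal sketch of such a descriptor. The alias, host names and service name
are hypothetical placeholders (example.com), and the two-node lists are an
assumption made only to keep the example short:

```text
# tnsnames.ora sketch: failover only between the two address lists
xxxx =
  (DESCRIPTION =
    (FAILOVER = ON)
    (LOAD_BALANCE = OFF)
    # First list: primary cluster VIPs, tried first
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = prim1-vip.example.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = prim2-vip.example.com)(PORT = 1521))
    )
    # Second list: standby cluster VIPs, reached only after the first list fails
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby1-vip.example.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby2-vip.example.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = xxxx.example.com)
    )
  )
```

With this ordering, once the roles have switched every new connection has to
wait for the attempts against the old primary's addresses to fail before it
reaches the second list, which is the delay on 100% of re-connects mentioned
above.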
Fairlie, please correct what I've got wrong here.
Cheers,
Alex
On 20/01/2009, at 9:43 AM, fairlie rego wrote:
You have a connection to each node in RAC, but how do you handle connections
to the standby?
Alex,
In the environment I am currently working on (two 8-node clusters in a DG
config) we have the node virtual IPs of both the primary and standby clusters
in the tnsnames.ora (16 nodes).
The application connects to RAC services which run only on the primary cluster.
Upon switchover/failure the db_role_change trigger fires, which starts the
services on the standby nodes. Of course it is a pain that dbms_service does
not update the OCR, but let me not digress...
I am just curious as to why this may not work for you.
Thanks
Fairlie Rego
Senior Oracle Consultant
http://el-caro.blogspot.com/
M: +61 402 792 405
--- On Mon, 19/1/09, Alex Gorbachev <ag@xxxxxxxxxxxx> wrote:
From: Alex Gorbachev <ag@xxxxxxxxxxxx>
Subject: Re: data guard fast start failover
To: "Mark Strickland" <strickland.mark@xxxxxxxxx>
Cc: Laimutis.Nedzinskas@xxxxxx, oracle-l@xxxxxxxxxxxxx
Received: Monday, 19 January, 2009, 9:58 AM
Thanks Mark,
What about Data Guard now? You have a connection to each node in RAC, but
how do you handle connections to the standby?
On one project I'm working on now, with RAC on primary and RAC on standby, we
plan to set up a multi-pool controlling underlying pools for each instance on
primary *AND* standby. Theoretically, a WebLogic multi-pool with load balancing
will not send transactions to the "broken" pools, but in the past we didn't
have good experience with that.
Another issue is the failover time - VIPs are not taken over by the standby on
role switch and, of course, the connection timeout takes a long time, so if
it's 60 seconds for you, is your OS setting for tcp_timeout 60 seconds?
Has anybody attempted to automate VIP management, integrating it with the
Observer and FSFO?
Cheers,
Alex
On 19/01/2009, at 9:17 AM, Mark Strickland wrote:
I'll find out more from our WebLogic SME, but we're using WebLogic multi-pools
(multi-datasources?), i.e. each server running WebLogic has three connection
pools -- one for each of the RAC instances. The connections do re-connect
automatically after failover. We're finding that it takes 60-90 seconds for
failover and reconnect. I believe that we are using WebLogic XA transactions
but I'll verify.
-Mark
On Sun, Jan 18, 2009 at 1:49 PM, Alex Gorbachev <ag@xxxxxxxxxxxx> wrote:
Hi Mark,
Could you elaborate on the WebLogic config you are using for RAC?
- Is it configured using WebLogic multi-datasources?
- Do you use WebLogic XA transactions? Does the WebLogic datasource retry the
transaction on reconnect?
- What are the patches you mentioned (perhaps you have a reference to the
WebLogic support docs)?
Cheers,
Alex
On 17/01/2009, at 8:52 AM, Mark Strickland wrote:
We've been testing FSF with 10.2.0.2 and my co-DBA discovered a bug that can
cause a split-brain to occur. I don't remember the exact circumstances, but
the fix is in 10.2.0.4 which is driving us to apply that patchset. Our FSF
testing with 10.2.0.4 has been going very well. If you use WebLogic, it will
handle a failover but it requires a patch depending on what version you use.
I've been doing new 10.2.0.4 builds with RAC and Data Guard with FSF for a new
customer. No issues so far.
Mark
Seattle, WA
On Thu, Jan 15, 2009 at 11:27 PM, <Laimutis.Nedzinskas@xxxxxx> wrote:
Hi all
Is anyone using Data Guard fast-start failover?
What are your experiences?
What about split brain?
Does it interfere heavily with normal database activities?
Any other comments?
Thank you in advance,
Laimis N
--
http://www.freelists.org/webpage/oracle-l