Re: data guard fast start failover

  • From: fairlie rego <fairlie_r@xxxxxxxxx>
  • To: Alex Gorbachev <ag@xxxxxxxxxxxx>
  • Date: Thu, 22 Jan 2009 03:50:44 -0800 (PST)

 
That is correct, Alex.
We partially get around these issues by setting outbound_connect_timeout
(SQLNET.OUTBOUND_CONNECT_TIMEOUT, OCT below) in the sqlnet.ora of the mid
tiers. (I am not sure what your client version is.) We use a value of 3
seconds for OCT.
 
Take the following connect string as an example:

xxxx =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = OFF)
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby1-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby2-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby3-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby4-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby5-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby6-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby7-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby8-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = prim1-vip.sys.au.eds.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = xxxx.commsec.com.au)
    )
  )
 
I have set load_balance = OFF so that the client traverses all the standby
nodes in order before reaching the primary. With those nodes down (in this
case they are fictitious nodes) and OCT set to 3, it takes around 12 seconds
to establish a connection from a Solaris 10.2.0.3 client.
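
The ordered traversal described above can be sketched at the TCP level as
follows. This is only a rough Python analogy, not how the Oracle Net client is
implemented: the function name connect_first_available and its parameters are
made up for illustration, and the 3-second default simply mirrors the OCT
value mentioned above.

```python
import socket


def connect_first_available(addresses, timeout_s=3.0,
                            connect=socket.create_connection):
    """Try each (host, port) strictly in list order, like LOAD_BALANCE = OFF.

    Returns (connection, failed), where failed lists the addresses that were
    tried and rejected before the first success. Illustrative sketch only.
    """
    failed = []
    for host, port in addresses:
        try:
            # Each attempt is bounded by timeout_s instead of the OS default.
            conn = connect((host, port), timeout=timeout_s)
            return conn, failed
        except OSError as exc:  # covers timeouts, refusals, DNS failures
            failed.append((host, port, exc))
    raise ConnectionError(
        f"no address reachable after {len(failed)} attempts")
```

Called with the standby VIPs listed first and the primary VIP last, the
function only reaches the primary after every earlier address has failed or
timed out, which is why a short per-attempt timeout matters so much here.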
 
The other benefit of having all nodes in the mid-tier tnsnames.ora is that we
did not have to change it each time we did a switchover. We have done 8
switchovers over the past 3 months.
 
Hope that makes sense. The rest over some beer.







Fairlie Rego
Senior Oracle Consultant
http://el-caro.blogspot.com/
M: +61 402 792 405
 

--- On Tue, 20/1/09, Alex Gorbachev <ag@xxxxxxxxxxxx> wrote:


From: Alex Gorbachev <ag@xxxxxxxxxxxx>
Subject: Re: data guard fast start failover
To: fairlie_r@xxxxxxxxx
Cc: "ORACLE-L Freelists" <oracle-l@xxxxxxxxxxxxx>
Received: Tuesday, 20 January, 2009, 11:26 AM


There are two issues. One is WebLogic-specific, as they have their own
connection management with multi-pools for Data Guard (don't ask; they are
working on integration with FAN and RLB, but that's not available yet).


The 2nd issue is generic, and with the introduction of Oracle Clusterware,
Oracle solved it with VIPs. The problem is that when an IP is not available,
the connection times out only after a while. This is why VIPs are taken over
by the surviving nodes in RAC, but I don't need to explain that to you.
However, a Data Guard standby does not take over VIPs when it's promoted to
primary. This means that application connections to the VIPs of the old
primary (now unavailable if the site is down or the hosts are down) will take
a while to time out. If client-side load balancing is ON between the standby
and primary address lists (in rare cases when there is no real DR and people
switch between sites regularly), then about 50% of connection requests will
time out after a minute or two, or whatever your tcp_timeout setting is in the
apps tier. If you configure your descriptor without the load balance option
between the primary and standby address lists but only with failover, then
100% of re-connects will be delayed.
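
The 50% versus 100% figures can be checked with a small simulation. This is a
minimal sketch under stated assumptions: there are two address lists, the old
primary's list is written first in the descriptor and is completely down, and
the function name share_delayed and the labels "old_primary" and
"new_primary" are hypothetical names used only for illustration.

```python
import random


def share_delayed(address_lists, down, load_balance, trials=100_000, seed=0):
    """Estimate the share of connects whose first attempt hits a dead list."""
    rng = random.Random(seed)
    delayed = 0
    for _ in range(trials):
        # Load balancing picks a list at random; failover-only always
        # starts with the first list in the descriptor.
        first = rng.choice(address_lists) if load_balance else address_lists[0]
        if first in down:
            delayed += 1
    return delayed / trials


lists = ["old_primary", "new_primary"]
with_lb = share_delayed(lists, {"old_primary"}, load_balance=True)
failover_only = share_delayed(lists, {"old_primary"}, load_balance=False)
print(f"load balance ON:  {with_lb:.0%} of connects delayed")
print(f"failover only:    {failover_only:.0%} of connects delayed")
```

With load balancing the estimate comes out close to one half, and with
failover only every connect goes to the dead list first, which matches the
reasoning above as long as the old primary is the first list in the
descriptor.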


Fairlie, please correct what I've got wrong here.


Cheers,
Alex



On 20/01/2009, at 9:43 AM, fairlie rego wrote:






You have a connection to each node in RAC, but how do you handle connections
to the standby?
 
Alex,
 
In the environment I am currently working on (two 8-node clusters in a DG
config), we have the node virtual IPs of both the primary and standby clusters
in the tnsnames.ora (16 nodes).
 
The application connects to RAC services which run only on the primary
cluster. Upon switchover/failure, the db_role_change trigger fires and starts
the services on the standby nodes. Of course it is a pain that dbms_service
does not update the OCR, but let me not digress....
 
I am just curious as to why this may not work for you.

Thanks
 





 

--- On Mon, 19/1/09, Alex Gorbachev <ag@xxxxxxxxxxxx> wrote:


From: Alex Gorbachev <ag@xxxxxxxxxxxx>
Subject: Re: data guard fast start failover
To: "Mark Strickland" <strickland.mark@xxxxxxxxx>
Cc: Laimutis.Nedzinskas@xxxxxx, oracle-l@xxxxxxxxxxxxx
Received: Monday, 19 January, 2009, 9:58 AM


Thanks Mark, 


What about Data Guard now? You have a connection to each node in RAC, but how
do you handle connections to the standby?
On one project I'm working on now, with RAC on primary and RAC on standby, we
plan to set up a multi-pool controlling the underlying pools for each instance
on primary *AND* standby. Theoretically, a WebLogic multi-pool with load
balancing will not send transactions to the "broken" pools, but in the past we
didn't have a good experience with that.
Another issue is the failover time: VIPs are not taken over by the standby on
a role switch and, of course, the connection timeout takes a long time. So if
it's 60 seconds for you, is your OS setting for tcp_timeout 60 seconds?


Has anybody attempted to automate VIP management, integrating it with the
Observer and FSFO?


Cheers,
Alex



On 19/01/2009, at 9:17 AM, Mark Strickland wrote:

I'll find out more from our WebLogic SME, but we're using WebLogic multi-pools
(multi-datasources?), i.e. each server running WebLogic has three connection
pools, one for each of the RAC instances.  The connections do re-connect
automatically after failover.  We're finding that it takes 60-90 seconds for
failover and reconnect.  I believe that we are using WebLogic XA transactions
but I'll verify.

-Mark



On Sun, Jan 18, 2009 at 1:49 PM, Alex Gorbachev <ag@xxxxxxxxxxxx> wrote:


Hi Mark, 


Could you elaborate on the WebLogic config you are using for RAC?
- Is it configured using WebLogic multi-datasources?
- Do you use WebLogic XA transactions? Does the WebLogic datasource retry the
transaction on reconnect?
- What are the patches you mentioned (perhaps you have a reference to the
WebLogic support docs)?


Cheers,
Alex






On 17/01/2009, at 8:52 AM, Mark Strickland wrote:

We've been testing FSF with 10.2.0.2 and my co-DBA discovered a bug that can 
cause a split-brain to occur.  I don't remember the exact circumstances, but 
the fix is in 10.2.0.4 which is driving us to apply that patchset.  Our FSF 
testing with 10.2.0.4 has been going very well.  If you use WebLogic, it will 
handle a failover but it requires a patch depending on what version you use.  
I've been doing new 10.2.0.4 builds with RAC and Data Guard with FSF for a new 
customer.  No issues so far.

Mark
Seattle, WA



On Thu, Jan 15, 2009 at 11:27 PM, <Laimutis.Nedzinskas@xxxxxx> wrote:


Hi all

Is anyone using Data Guard fast-start failover?
What are your experiences?
What about split brain?
Does it interfere heavily with normal database activities?
Any other comments?

Thank you in advance,

Laimis N









