RE: DCD and TCP timeout

  • From: "Hameed, Amir" <Amir.Hameed@xxxxxxxxx>
  • To: "riyaj.shamsudeen@xxxxxxxxx" <riyaj.shamsudeen@xxxxxxxxx>
  • Date: Wed, 13 Nov 2013 00:06:08 +0000

Thanks Riyaj.
This whole investigation started while doing destructive testing for the ERP 
Concurrent Managers. There are two VM servers in the Concurrent Processing (CP) 
tier, configured in an active/passive manner. When the node where 
the Internal Concurrent Manager (ICM) was running was shut down, the ICM and the 
other managers would not fail over in a timely manner. Further investigation 
showed that the connections of those managers were still reported as active in 
the V$SESSION view. Those connections started to clean up after about 15-18 
minutes, and that is when the ICM started to fail over to its secondary node, 
followed by the other managers.
I see the following in the note "Performance problem with Oracle*Net Failover 
when TCP Network down (no IP address)" (Doc ID 249213.1):

net.ipv4.tcp_keepalive_time 3000
net.ipv4.tcp_retries 5
net.ipv4.tcp_syn_retries 1

These settings are supposed to reduce the timeout period to about 20 seconds.

Do you have any suggestions on the above settings?

Thanks again.
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Riyaj Shamsudeen
Sent: Tuesday, November 12, 2013 6:36 PM
To: Hameed, Amir
Cc: Oracle List List
Subject: Re: DCD and TCP timeout

Amir
  Setting expire_time to 1 will send a SQLNET packet every minute (so a 
TCP/IP probe is sent every minute). Under normal conditions, a TCP ACK will be 
received immediately.

 But if the client dies abruptly or its connections are killed, the TCP 
retransmission code kicks in for the unacknowledged TCP/IP transmission. On 
Linux, the tcp_retries2 kernel parameter controls the retransmission behavior. 
For a connection in the ESTABLISHED state, TCP/IP retransmits 15 times (the 
default value of the tcp_retries2 kernel parameter), with an exponential backoff 
of the retransmission interval, before raising an error to the application. Read 
the link below; I think I had results similar to your test case the last time I 
performed the test. This behavior applies only to Linux.
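
As a rough illustration of the backoff arithmetic described above, here is a small Python sketch that estimates how long Linux keeps retransmitting before giving up. It assumes the retransmission timer starts at 200 ms and doubles up to a 120 second cap, which are the usual Linux defaults; the function name and its arguments are made up for this example. The real timer starts from the measured round-trip time, so treat the result as a lower bound, not an exact figure.

```python
def retransmission_timeout(retries, rto_min=0.2, rto_max=120.0):
    """Estimate seconds until TCP gives up on an ESTABLISHED connection.

    Assumption: the first interval is rto_min, each later interval doubles,
    and no interval exceeds rto_max. The initial transmission plus `retries`
    retransmissions gives retries + 1 waiting intervals in total.
    """
    return sum(min(rto_min * 2 ** i, rto_max) for i in range(retries + 1))


if __name__ == "__main__":
    for retries in (5, 8, 15):
        seconds = retransmission_timeout(retries)
        print(f"tcp_retries2={retries}: about {round(seconds, 1)} s "
              f"({round(seconds / 60, 1)} min)")
```

With the default of 15 this comes to about 924.6 seconds (a little over 15 minutes), which is in line with the 15-18 minutes seen in the test. A value of 8 gives about 102 seconds, and a value of 5 about 13 seconds.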

  Do you really want to change the TCP-level parameters? If yes, you can reduce 
the tcp_retries2 parameter to a lower value that is still greater than 8 and 
test it. (Please let me know if you still see different behavior after the 
adjustment.) If that isn't enough, tcp_keepalive_time can be reduced to 10 
minutes, but this can increase network traffic marginally (one TCP keepalive 
probe every 10 minutes on every live TCP/IP socket).
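
Before changing anything, it may help to record the current values. The sketch below reads them from the Linux /proc/sys/net/ipv4 directory; the function name and the list of parameters are chosen only for this example, and any parameter that cannot be read is skipped.

```python
from pathlib import Path

# Parameters discussed in this thread, plus the two related keepalive knobs.
PARAMS = (
    "tcp_retries2",
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
)


def read_tcp_sysctls(base="/proc/sys/net/ipv4", names=PARAMS):
    """Return {name: int value} for each parameter that can be read."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing file (for example, not Linux) or unexpected content.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_tcp_sysctls().items():
        print(f"{name} = {value}")
```

Running it before and after the change gives a simple record of what was actually in effect during each failover test.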

 Read : 
http://stackoverflow.com/questions/5907527/application-control-of-tcp-retransmission-on-linux

 HTH


Cheers

Riyaj Shamsudeen
Principal DBA,
Ora!nternals - http://www.orainternals.com/ - 
Specialists in Performance, RAC and EBS
Blog: http://orainternals.wordpress.com/
Oracle ACE Director and OakTable member (http://www.oaktable.com/)

Co-author of the books: Expert Oracle Practices 
(http://tinyurl.com/book-expert-oracle-practices/), Pro Oracle SQL 
(http://tinyurl.com/ahpvms8), Expert RAC Practices 12c 
(http://tinyurl.com/expert-rac-12c), Expert PL/SQL Practices 
(http://tinyurl.com/book-expert-plsql-practices)

