12170/ORA-12535/12537

From: Kellyn Pedersen <kjped1313@xxxxxxxxx>
To: oracle Freelists <oracle-l@xxxxxxxxxxxxx>
Date: Wed, 13 Oct 2010 12:45:13 -0700 (PDT)
I am officially stuck and looking for some help (please, please, please...).

We have had specific settings in the sqlnet.ora files on all of our database
servers since Oracle was first introduced at my company:
SQLNET.EXPIRE_TIME =10
SQLNET.INBOUND_CONNECT_TIMEOUT = 300
INBOUND_CONNECT_TIMEOUT_LISTENER=300
SQLNET.SEND_TIMEOUT = 300
SQLNET.RECV_TIMEOUT = 300

About five weeks ago, people started to complain about disconnects (the error
codes in the subject line above) from two of our main databases connected by
db links, and one of our duplicates failed with 12170 errors, on both 10.2.0.4
and 11.1.0.7. I dug in and ended up having to put in the oddest values to get
the disconnects to stop appearing in sqlnet.log and log.xml, and to keep the
users content with no more failures. Things took another turn for the worse
last week, and I had to "tweak" the numbers a bit more to get a duplicate to
complete from one production server to another. Here is the current
configuration in the sqlnet.ora files:
SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF
SQLNET.EXPIRE_TIME =100000000
SQLNET.INBOUND_CONNECT_TIMEOUT = 300000
SQLNET.SEND_TIMEOUT = 300000000
SQLNET.RECV_TIMEOUT = 300000000
DEFAULT_SDU_SIZE=8832
INBOUND_CONNECT_TIMEOUT_LISTENER=0
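
To see just how far apart the two configurations are, here is a minimal Python
sketch that converts each timeout parameter into a duration. It assumes the
units documented for Oracle Net in 10gR2/11gR1 (SQLNET.EXPIRE_TIME in minutes,
the other timeouts in seconds, 0 meaning disabled); verify those against the
Net Services Reference for your exact release. parse_ora and timeouts are
hypothetical helpers, and the parser only handles flat KEY = VALUE lines, not
parenthesized descriptors.

```python
from datetime import timedelta

# ASSUMPTION: units as documented for Oracle Net 10gR2/11gR1; check the Net
# Services Reference for your release. Values are multipliers to seconds.
UNIT_SECONDS = {
    "SQLNET.EXPIRE_TIME": 60,               # minutes between dead-connection probes
    "SQLNET.INBOUND_CONNECT_TIMEOUT": 1,    # seconds
    "SQLNET.SEND_TIMEOUT": 1,               # seconds
    "SQLNET.RECV_TIMEOUT": 1,               # seconds
    "INBOUND_CONNECT_TIMEOUT_LISTENER": 1,  # seconds
}

ORIGINAL = """
SQLNET.EXPIRE_TIME =10
SQLNET.INBOUND_CONNECT_TIMEOUT = 300
INBOUND_CONNECT_TIMEOUT_LISTENER=300
SQLNET.SEND_TIMEOUT = 300
SQLNET.RECV_TIMEOUT = 300
"""

CURRENT = """
SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF
SQLNET.EXPIRE_TIME =100000000
SQLNET.INBOUND_CONNECT_TIMEOUT = 300000
SQLNET.SEND_TIMEOUT = 300000000
SQLNET.RECV_TIMEOUT = 300000000
DEFAULT_SDU_SIZE=8832
INBOUND_CONNECT_TIMEOUT_LISTENER=0
"""


def parse_ora(text):
    """Parse flat KEY = VALUE lines into a dict; '#' starts a comment."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip().upper()] = value.strip()
    return settings


def timeouts(settings):
    """Map each known timeout parameter to a timedelta, or None if 0 (disabled).
    Parameters not in UNIT_SECONDS (e.g. DEFAULT_SDU_SIZE) are ignored."""
    result = {}
    for key, factor in UNIT_SECONDS.items():
        if key in settings:
            seconds = int(settings[key]) * factor
            result[key] = timedelta(seconds=seconds) if seconds > 0 else None
    return result


if __name__ == "__main__":
    for label, text in (("original", ORIGINAL), ("current", CURRENT)):
        print(label)
        for key, delta in timeouts(parse_ora(text)).items():
            print(f"  {key:34} {'disabled' if delta is None else delta}")
```

Under those assumed units, the original settings come out to a 10-minute probe
interval and 5-minute timeouts, while the current SQLNET.EXPIRE_TIME works out
to roughly 190 years between probes and the send/receive timeouts to about 9.5
years, i.e. effectively switched off.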

OK, let's all be honest here: the numbers I've used are outrageous, but they
are the only thing that has stopped system processes from failing. For the
first week the complaints came in 24x7... and we all know the database is
guilty until proven innocent!

I have tested just about every app we have, traced every disconnect back to the
user through the logs, and even used my own connections as guinea pigs, testing
each parameter and each value against different system processes to arrive at
the final values.

Since putting this in place, we still get one or two client-side disconnects in
the office per day. The disconnect times don't match any of the timeouts I have
set, which makes me wonder if I'm just fighting a losing battle here. It's not
consistent across the client base, either. Only two developers lose
connectivity consistently, from PL/SQL Developer, just a minute or so after
they go inactive, and many of those disconnects don't show up in the logs at
all. They use the network client's TNS and SQL*Net files, which are configured
to match what I have on the server, so there is nothing local that could be
tripping them up. I myself was disconnected from every SSH session I had open
just yesterday morning, and no one seems to understand *HOW* it happened or
what went wrong. That those were PuTTY sessions, and that these same two users
have had their PuTTY sessions drop from time to time (they don't use PuTTY
often...), makes me wonder some more...
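
The point that the disconnects don't line up with the configured timeouts can
be checked mechanically. This is a hypothetical sketch: the function name, the
30-second tolerance, and treating "a minute or so" as exactly 60 seconds are
all assumptions, and it reuses the assumed units (SQLNET.EXPIRE_TIME in
minutes, the rest in seconds). An empty result is consistent with something
outside Oracle Net closing idle sessions, but it doesn't prove it.

```python
from datetime import timedelta

# Current sqlnet.ora values under ASSUMED units (EXPIRE_TIME in minutes,
# the other parameters in seconds).
CONFIGURED = {
    "SQLNET.EXPIRE_TIME": timedelta(minutes=100_000_000),
    "SQLNET.INBOUND_CONNECT_TIMEOUT": timedelta(seconds=300_000),
    "SQLNET.SEND_TIMEOUT": timedelta(seconds=300_000_000),
    "SQLNET.RECV_TIMEOUT": timedelta(seconds=300_000_000),
}


def explaining_timeouts(observed, configured, tolerance=timedelta(seconds=30)):
    """Return the names of configured timeouts within `tolerance` of an
    observed idle-to-disconnect interval. An empty list means no configured
    sqlnet.ora timeout accounts for the drop."""
    return [name for name, limit in configured.items()
            if abs(limit - observed) <= tolerance]


if __name__ == "__main__":
    # "a minute or so" of inactivity before the PL/SQL Developer sessions drop
    observed = timedelta(minutes=1)
    print(explaining_timeouts(observed, CONFIGURED))
```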

The network guys say they have changed nothing, but a multi-server MySQL farm
went in five weeks ago, and given how we move data and how much of it we move,
I have a hard time believing someone didn't sneak something in on them...


Does anybody have any ideas or recommendations? I'm pulling my hair out here.
Honestly, other than a small change recommended by Oracle, I've always felt that
needing this kind of tweaking pointed to either a network problem or code that
needed tuning so the waits weren't severe enough to cause connectivity loss...

Kellyn Pedersen
Sr. Database Administrator
I-Behavior Inc.
http://www.linkedin.com/in/kellynpedersen
www.dbakevlar.blogspot.com
 
"Go away before I replace you with a very small and efficient shell script..."