The network guys are a bit "gun shy" right now after their network has been targeted for quite some time. Because of this, I am not getting any information on what changes might have been made to the network, and I have no visibility into it myself... :(

Kellyn

________________________________
From: "rajendra.pande@xxxxxxx" <rajendra.pande@xxxxxxx>
To: kjped1313@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Sent: Wed, October 13, 2010 2:11:47 PM
Subject: RE: 12170/ORA-12535/12537

Everything that I read here indicates, to me, a network item. Have they introduced anything that has a timeout feature: PowerBroker, any other PAM module?

But that still does not explain these parameters and the values. Then again, I am not fully conversant with how these really work to affect any timeouts.

________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Kellyn Pedersen
Sent: Wednesday, October 13, 2010 3:45 PM
To: oracle Freelists
Subject: 12170/ORA-12535/12537

I am officially stuck and looking for some help (please, please, please...).

We have specific configurations in our sqlnet.ora files on all our database servers that have been present since the inception of Oracle at my company:

SQLNET.EXPIRE_TIME =10
SQLNET.INBOUND_CONNECT_TIMEOUT = 300
INBOUND_CONNECT_TIMEOUT_LISTENER=300
SQLNET.SEND_TIMEOUT = 300
SQLNET.RECV_TIMEOUT = 300

About five weeks ago, people started to complain about disconnects (the error codes in the subject line above) from two of our main dblinked databases, and one of our duplicates failed due to 12170 errors, in both the 10.2.0.4 and 188.8.131.52 databases. I dug in and ended up having to put in the oddest values to get the disconnects to stop appearing in the sqlnet.log and log.xml, and to keep the users content with no more failures. It took another turn for the worse this last week, and I had to "tweak" the numbers just a bit more to get a duplicate to complete from one production server to another.

Here is the current configuration for the SQLNET.ORA files:

SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF
SQLNET.EXPIRE_TIME =100000000
SQLNET.INBOUND_CONNECT_TIMEOUT = 300000
SQLNET.SEND_TIMEOUT = 300000000
SQLNET.RECV_TIMEOUT = 300000000
DEFAULT_SDU_SIZE=8832
INBOUND_CONNECT_TIMEOUT_LISTENER=0

OK, we can all be honest here: the numbers I've used are outrageous, but they are the only thing that has stopped system processes from failing, where for the first week it was 24x7 with complaints... and we all know the database is guilty until proven innocent!

I have tested just about every app we have, traced back every disconnect, gone through every log back to the user, and even used my own connections as guinea pigs, testing out each parameter and each value with different system processes to come up with the final values.

Since putting this into place, we continue to have one or two disconnects at the client side in the office per day. The timing of the disconnects does not match the timeout values I have set, which makes me wonder if I'm just fighting a losing battle here. It's not consistent across the client base, either. I have only two developers who are losing connectivity consistently, from PL/SQL Developer. We're talking just a minute or so after they have gone inactive, and many of the disconnects do not show up in the logs at all. They are using the network client TNS and SQLNet files, so they are using the files configured to match what I have on the server, with nothing local that could be tripping them up.

I myself was disconnected from every SSH session I had open just yesterday morning, and no one seems to understand *HOW* it happened or what went wrong. But the fact that they were PuTTY sessions, and that the same thing has happened to these two users from time to time (they do not use PuTTY sessions often...), makes me wonder some more...

The network guys are saying they have changed nothing, but we just had a multi-server MySQL farm go in 5 weeks ago, and given the way we move data and how much of it we move, I have a hard time believing someone didn't sneak something in on them...

Anybody have any ideas or recommendations? I'm pulling my hair out here. Honestly, other than a small change recommended by Oracle, I've always felt that this type of tweaking pointed to either a problem with the network or a problem with code that needed to be tuned so the waits were not severe enough to cause connectivity loss...

Kellyn Pedersen
Sr. Database Administrator
I-Behavior Inc.
http://www.linkedin.com/in/kellynpedersen
www.dbakevlar.blogspot.com

"Go away before I replace you with a very small and efficient shell script..."
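
As a rough aid for reading the "current configuration" values quoted in the message, the sketch below converts each timeout to seconds and to a human-readable duration. It assumes the documented units (SQLNET.EXPIRE_TIME in minutes, the other four parameters in seconds); the helper names `parse_params`, `timeouts_in_seconds` and `humanize` are hypothetical, and the parser only handles simple single-line NAME=VALUE entries, not the full sqlnet.ora syntax.

```python
# Illustrative sketch only: helper names are made up for this example.
# Assumed units: SQLNET.EXPIRE_TIME is in minutes, the others in seconds.
UNIT_SECONDS = {
    "SQLNET.EXPIRE_TIME": 60,
    "SQLNET.INBOUND_CONNECT_TIMEOUT": 1,
    "SQLNET.SEND_TIMEOUT": 1,
    "SQLNET.RECV_TIMEOUT": 1,
    "INBOUND_CONNECT_TIMEOUT_LISTENER": 1,
}

# The "current configuration" exactly as quoted in the message.
CURRENT_CONFIG = """
SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF
SQLNET.EXPIRE_TIME =100000000
SQLNET.INBOUND_CONNECT_TIMEOUT = 300000
SQLNET.SEND_TIMEOUT = 300000000
SQLNET.RECV_TIMEOUT = 300000000
DEFAULT_SDU_SIZE=8832
INBOUND_CONNECT_TIMEOUT_LISTENER=0
"""


def parse_params(text):
    """Parse simple single-line NAME=VALUE entries into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip().upper()] = value.strip()
    return params


def timeouts_in_seconds(params):
    """Return the known timeout parameters converted to seconds."""
    result = {}
    for name, factor in UNIT_SECONDS.items():
        value = params.get(name)
        if value is not None and value.isdigit():
            result[name] = int(value) * factor
    return result


def humanize(seconds):
    """Render a number of seconds as a coarse human-readable duration."""
    days = seconds / 86400
    if days >= 365:
        return f"{days / 365:.1f} years"
    if days >= 1:
        return f"{days:.1f} days"
    return f"{seconds} seconds"


if __name__ == "__main__":
    for name, secs in timeouts_in_seconds(parse_params(CURRENT_CONFIG)).items():
        print(f"{name}: {secs} s (about {humanize(secs)})")
```

Under those unit assumptions, the quoted SQLNET.EXPIRE_TIME works out to roughly 190 years and the send/receive timeouts to roughly 9.5 years, so a disconnect a minute or so after going idle cannot be the result of these settings expiring.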