Re: Client and Shadow Process stuck communicating in Network Layer

  • From: Patrick Jolliffe <jolliffe@xxxxxxxxx>
  • To: Stefan Koehler <contact@xxxxxxxx>
  • Date: Fri, 26 Jun 2015 14:07:06 +0800

Thanks for the pointers, Stefan. I have checked, and this particular host
is not using VIO.
Host OS is 6.1.8.15, which I believe contains fixes for all three.
IV31011 is particularly interesting, as the connection IS loopback.

We actually have a slightly similar, but different, problem right now with
a DB link from Linux to AIX.
I can see the AIX side is stuck in write, and the Linux side is stuck in read.
(Note this AIX host DOES use VIO.)
Interestingly, using the "netstat -an" command mentioned in one of the
links you sent, I can see the data is sitting in the "Send-Q" buffer.
To me this proves that it is an OS-level/networking issue rather than an
application/DB-level one.
At least that is some evidence I can use to direct our efforts.
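
The "netstat -an" check described above can be scripted so that it is easy to repeat while a session is hung. The following is only a minimal sketch in Python: the function name stuck_send_queues and the SAMPLE text are made up for illustration (the addresses are not from the affected hosts), and it assumes the usual column order of Proto, Recv-Q, Send-Q, Local Address, Foreign Address, state.

```python
# Hypothetical helper: list TCP sockets whose Send-Q is non-zero in
# "netstat -an" output. Assumes columns are
# Proto, Recv-Q, Send-Q, Local Address, Foreign Address, state.

# Invented sample output, for illustration only.
SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  *.1521                 *.*                    LISTEN
tcp4       0  65160  10.0.0.5.1521          10.0.0.9.40212         ESTABLISHED
tcp4       0      0  127.0.0.1.1521         127.0.0.1.51044        ESTABLISHED
"""


def stuck_send_queues(netstat_output, min_bytes=1):
    """Return (local, foreign, send_q, state) for TCP sockets with queued data."""
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and non-TCP sockets.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        try:
            send_q = int(fields[2])
        except ValueError:
            continue
        if send_q >= min_bytes:
            hits.append((fields[3], fields[4], send_q, fields[5]))
    return hits
```

A single non-zero Send-Q is normal on a busy connection. The reading becomes more convincing as evidence if the same socket shows the same number of queued bytes across several samples taken a few seconds apart.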
Regards
Patrick


On 25 June 2015 at 19:14, Stefan Koehler <contact@xxxxxxxx> wrote:

Hi Patrick,
"OS is AIX 6.1" and the call stack of the Oracle shadow process in read()
got my attention, as some of my clients have hit this issue several times :)

Do you use VIO by chance? If yes (and even if not, you can still hit the
LPAR OS issue - see last APAR), then you may have hit a nice "well-known" OS
bug, which is/was pushed through various AIX / VIO levels.

http://www-01.ibm.com/support/docview.wss?uid=isg1IZ59298
http://www-01.ibm.com/support/docview.wss?uid=isg1IZ96155
https://www-304.ibm.com/support/docview.wss?uid=isg1IV20656

Please don't be confused by the AIX version level. If you follow the
sysrouted APARs, you will find the fix for AIX 6.1 as well.

Best Regards
Stefan Koehler

Freelance Oracle performance consultant and researcher
Homepage: http://www.soocs.de
Twitter: @OracleSK


On 25 June 2015 at 12:57, Patrick Jolliffe <jolliffe@xxxxxxxxx> wrote:

We have been getting a very occasional problem with a third-party
application, where the client process and the Oracle shadow process both
seem to hang waiting on read in the Oracle network layer.
The database is 11.2.0.4 and the OS is AIX 6.1, although this has persisted
through various versions of the database (and application).
Interestingly, it always seems to be within a similar call stack on the
application side (BatchReviseOnExit).
This leads me to suspect some kind of memory corruption on the client
application side, but there is nothing I can identify from the source we
have access to.
I have spent some time with support, but was just getting bounced between
the application side and the database side (even though both sides are
within Oracle; the application is JDEdwards).
I wonder if anybody has any ideas about what I can do, either
pre-emptively or once the application gets into this state, to gather more
information when it happens.
I have pasted output from v$session and process stacks below.
Patrick

select process, spid, state, status, event, paddr from v$session s,
v$process p where p.addr = s.paddr and sid = 982

55575914/28050178/WAITING/INACTIVE/SQL*Net message from client/07000106C51BC9B0

[oracle@jdelogichk:/home/oracle]$ procstack 28050178
28050178: oracleJDE (LOCAL=NO)
0x090000000002dc94 read(??, ??, ??) + 0x274
0x00000001009e63d4 ntusfprd(0x57b, 0x110a0cf16, 0xfffffffffff9050, 0x2822484100000020, 0x1003c5bb0) + 0x54
0x0000000100a7d01c nsbasic_brc(??, ??, ??, ??) + 0x45c
0x0000000100a7f3e0 nsbrecv(??, ??, ??, ??) + 0x80
0x00000001018ae3c8 nioqrc(??, ??, ??, ??, ??) + 0x4448
0x0000000108a80228 opikndf2(??, ??, ??, ??) + 0x7e8
0x0000000108a528f8 opitsk(??, ??) + 0x318
0x0000000108a820cc opiino(??, ??, ??) + 0x3ac
0x0000000108a559ac opiodr(??, ??, ??, ??) + 0x38c
0x0000000108a959ec opidrv(??, ??, ??) + 0x46c
0x0000000108a8b4c8 sou2o(??, ??, ??, ??) + 0x88
0x0000000100000a10 opimai_real(??, ??) + 0x230
0x00000001000f7494 ssthrdmain(??, ??) + 0x114
0x000000010000064c main(??, ??) + 0xcc
0x0000000100000340 _text() + 0x70

[jdespxx@jdelogichk:/home/jdespxx]$ procstack 55575914
55575914: jdenet_k 6209
0xd0121548 read(??, ??, ??) + 0x268
0xd6ab3ff0 ntusfprd(??, ??, ??, ??, ??) + 0x50
0xd6b4ae50 nsbasic_brc(??, ??, ??, ??) + 0x550
0xd6b4cee0 nsbrecv(??, ??, ??, ??) + 0xa0
0xd77bdf5c nioqrc(??, ??, ??, ??, ??) + 0x1bbc
0xd65c6908 ttcdrv(??, ??) + 0x408
0xd77dab2c nioqwa(??, ??, ??, ??, ??, ??) + 0x4c
0xd66066a0 upirtrc(0x6, 0x24262888, 0x24ed1e5c, 0x24ed1f7c, 0x24ed2cbc, 0xf1eb2d74, 0x24ed34bc, 0x24260d50) + 0x740
0xd6dd7f28 kpurcsc(??, ??, ??, ??, ??, ??, ??, ??) + 0x68
0xd6e69f88 kpuexec(??, ??, ??, ??, ??, ??, ??, ??) + 0x2388
0xd5e6b618 OCIStmtExecute(??, ??, ??, ??, ??, ??, ??, ??) + 0x18
0xd79e7118 BFOCIStmtExecute(0x23c0ffa4, 0x24260c50, 0x24262888, 0x1, 0x0, 0x0, 0x0, 0x20) + 0x4c
0xd79fd7b0 performRequestInternal(0x25033f38, 0x1) + 0x110
0xd79fdd68 dballPerformRequest(0x25033f38) + 0xfc
0xd79fddc4 DBPerformRequest(0x25033f38) + 0x14
0xd3ad1ee4 JDB_DBPerformRequest(0x21ffbfa8, 0x25033f38, 0x25052798) + 0x40
0xd3d57b54 TM_DBPerformRequest(0x20ffad18, 0x20ff4cb8, 0x25052798, 0x24a3da38) + 0x290
0xd3ab2b64 DeleteTable(0x24a3da38, 0x2ff206bc, 0x0, 0x1, 0x2ff206dc, 0x20002, 0x1, 0x0) + 0x184c
0xd3ab3e2c JDB_DeleteTable(0x24a3da38, 0x2ff206bc, 0x0, 0x1, 0x2ff206dc, 0x20002) + 0xb0
0x206ccc74 BatchReviseOnExit() + 0x7f4
0xd1335ae4 jdeCallObject(0x2228fe78, 0x0, 0x2426be08, 0x242e14c8, 0x24f784d8, 0x0, 0x0, 0x2228feb8) + 0x2420
0xd3b750b4 JDEK_ProcessCallRequest(0x190bd1, 0xac1508fc, 0x0, 0x2228fc58, 0x23676c28, 0x20ff4cb8) + 0xce4
0xd3b75b64 JDEK_StartCallRequest(0x190bd1, 0xac1508fc, 0x0, 0x2228fc58) + 0x46c
0xd3b5a9a8 JDEK_DispatchCallObjectMessage(0x190bd1, 0xac1508fc, 0x0, 0x2228fc58, 0x0, 0x3850385, 0x0) + 0x4cc
0xd7f48bb4 XMLCallObjectDispatch() + 0xd0
0xd11c03ac callDispatchFunction(0x5, 0x190bd1, 0xac1508fc, 0x0, 0x2228fc58, 0x0, 0x385, 0x8000) + 0x52c
0xd11c05f0 kernelMsgThread(0x2228fc18) + 0x1c0
0xd11c200c processKernelQueueMsg(0x2228fc18) + 0x14
0xd11b0730 processKernelQueue() + 0x49c
0xd11a5798 JDENET_RunKernel(0x2ff22621) + 0x188
0x10001f70 main(0x2, 0x2ff22568) + 0x290
0x100001c0 __start() + 0x98
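
On the question of gathering more information once the processes are in this state, one option is to capture the same diagnostics shown in this thread (a procstack of each process plus "netstat -an") in a single timestamped snapshot, and to repeat it a few times to confirm nothing is moving. The following is only a sketch in Python: run_command and snapshot are hypothetical names, and it assumes procstack and netstat are on the PATH and that the invoking user is allowed to inspect both processes.

```python
import subprocess
import time


def run_command(argv):
    """Run a command and return its combined output as text; never raises."""
    try:
        done = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=30)
        return done.stdout
    except (OSError, subprocess.SubprocessError) as exc:
        return "failed to run {}: {}\n".format(" ".join(argv), exc)


def snapshot(pids, runner=run_command):
    """Collect one timestamped snapshot: procstack per pid, then netstat -an."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    commands = [["procstack", str(pid)] for pid in pids]
    commands.append(["netstat", "-an"])
    sections = []
    for argv in commands:
        header = "== {} {} ==".format(stamp, " ".join(argv))
        sections.append(header + "\n" + runner(argv))
    return "\n".join(sections)
```

For the session above, this could be called as snapshot([28050178, 55575914]) and the returned text appended to a log file. Taking two or three snapshots a minute apart would show whether the stacks and socket queues are genuinely static.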
