RE: 10046 L12 - Very long ela for 'SQL*Net message from client'?

From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
To: Rich <richa03@xxxxxxxxx>
Date: Wed, 24 Sep 2014 20:27:04 +0000

Is it reproducible ? If I had to guess I'd look at network routing - e.g.  did 
the session start trying to use the wrong network address on each fetch, 
timeout, and fall back to the correct address ?


Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: Rich [richa03@xxxxxxxxx]
Sent: 24 September 2014 21:23
To: Jonathan Lewis
Cc: Oracle-L Freelists
Subject: Re: 10046 L12 - Very long ela for 'SQL*Net message from client'?

Thanks, Matthew and Jonathan.

The autorace and 10046 are from separate sessions - I let the AT run overnight 
and killed the 10046 part way through.

The client (for both) is me on the DB machine at a SQL Plus prompt with a 
dedicated connection.

I'm starting to suspect the OS...but where?


On Wed, Sep 24, 2014 at 1:03 PM, Jonathan Lewis 
<jonathan@xxxxxxxxxxxxxxxxxx<mailto:jonathan@xxxxxxxxxxxxxxxxxx>> wrote:

Something's not consistent here.

You have an array size of 15, a wait of 16 seconds between fetches, and 207,906 
rows fetched - which should total about 2.5 days to complete !

How many of the waits were about 16 seconds, did they all appear in a single 
batch.

When you ran autotrace was the client elapsed time consistent with the server 
trace time ? What did the tkprof output give as the "time in the trace file".


Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx<mailto:oracle-l-bounce@xxxxxxxxxxxxx> 
[oracle-l-bounce@xxxxxxxxxxxxx<mailto:oracle-l-bounce@xxxxxxxxxxxxx>] on behalf 
of Rich [richa03@xxxxxxxxx<mailto:richa03@xxxxxxxxx>]
Sent: 24 September 2014 17:38
To: Oracle-L Freelists
Subject: Fwd: 10046 L12 - Very long ela for 'SQL*Net message from client'?


Hi List,
This is 11.1.0.7 on Oracle Linux 5 U3 (Enterprise Linux Server release 5.3 
(Carthage) (kernel 2.6.18-128.el5))

rpm -qf /etc/redhat-release
enterprise-release-5-0.0.17

(I know the version is “very” old, but that’s where we are right now.)

I seem to remember a post by Tanel (or was it Kyle?) a while back about 
this/these types of waits (SQL*Net), however, my Google fu is not working for 
me right now.

Snip from 10046 level 12 trace from a dedicated SQL*Plus session on the DB 
server:
…
*** 2014-09-24 08:03:20.151
WAIT #2: nam='SQL*Net message from client' ela= 16701919 driver id=1650815232 
#bytes=1 p3=0 obj#=1237502 tim=1411571000151138
WAIT #2: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 
p3=0 obj#=1237502 tim=1411571000151302
FETCH 
#2:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,plh=133634354,tim=1411571000143712

*** 2014-09-24 08:03:36.853
WAIT #2: nam='SQL*Net message from client' ela= 16702053 driver id=1650815232 
#bytes=1 p3=0 obj#=1237502 tim=1411571016853884
WAIT #2: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 
p3=0 obj#=1237502 tim=1411571016854041
FETCH 
#2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,plh=133634354,tim=1411571016838765

*** 2014-09-24 08:03:53.541
WAIT #2: nam='SQL*Net message from client' ela= 16686602 driver id=1650815232 
#bytes=1 p3=0 obj#=1237502 tim=1411571033541263
WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 
p3=0 obj#=1237502 tim=1411571033541424
FETCH 
#2:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,plh=133634354,tim=1411571033533874
…

From client is consistently ~16sec!

Autotrace of the statement looks like:
…
207963 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 133634354

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Rows  | Bytes 
|TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |  5759 |  2991K|     
  | 16679   (1)| 00:03:21 |
|*  1 |  TABLE ACCESS FULL            | PFM_AUDIT_LOG_HDR |     1 |    22 |     
  |   309   (1)| 00:00:04 |
|   2 |  SORT ORDER BY                |                   |  5759 |  2991K|  
9224K| 16679   (1)| 00:03:21 |
|*  3 |   HASH JOIN                   |                   |  5759 |  2991K|     
  | 16026   (1)| 00:03:13 |
|*  4 |    INDEX FAST FULL SCAN       | PFM_COMPANY_AI1   |   331 | 11916 |     
  |    60   (2)| 00:00:01 |
|*  5 |    TABLE ACCESS BY INDEX ROWID| PFM_POS           | 17428 |  8441K|     
  | 15965   (1)| 00:03:12 |
|*  6 |     INDEX RANGE SCAN          | PFM_POS_AI1       | 18366 |       |     
  |  8013   (1)| 00:01:37 |

-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("MODEL_TYPE_ID"='POS' AND TO_NUMBER("DOCUMENT_OID")=:B1)
   3 - access("POS"."COMPANY_ENSOID"="VER_ENSOID")
   4 - filter(UPPER("NAME") LIKE 'SOME STRING' ESCAPE  AND "VER_LATEST_VEROID" 
IS NOT NULL)
   5 - filter("POS"."TRANS_SUBTYPE_ID"='4')
   6 - access("POS"."TRANS_END_DATE">=TO_DATE(' 2011-04-01 00:00:00', 
'syyyy-mm-dd hh24:mi:ss')
              AND "POS"."TRANS_END_DATE"<=TO_DATE(' 2014-03-31 23:59:59', 
'syyyy-mm-dd hh24:mi:ss'))
       filter(UPPER(TO_CHAR("POS"."HIDDEN")) LIKE '0' ESCAPE  AND 
"POS"."VER_LATEST_VEROID" IS NOT NULL)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    264266641  consistent gets – I know it might be better to just FTS the POS 
table rather than use the index, however, I’m trying to understand why the long 
‘SQL*Net message from client’ waits
          0  physical reads
          0  redo size
   45412064  bytes sent via SQL*Net to client – this is “only” ~43MB
     157869  bytes received via SQL*Net from client - ~150KB
      13866  SQL*Net roundtrips to/from client – might be able to reduce this 
with arraysize (currently 15) and/or SDU/TDU/IO Buffers (currently default 
AFAIK), but, still why the long ‘SQL*Net message from client’ waits

          1  sorts (memory)
          0  sorts (disk)

     207963  rows processed

The POC table is very wide (533 columns with type spread of 237 number, 17 date 
and 213 varchar2 – no LOB) - AVG_ROW_LEN is 496 from DBA_TABLES (last analyzed 
15SEP2014).
The PFM_AUDIT_LOG_HDR table is very small (1126 blocks from DBA_TABLES – last 
analyzed 22SEP2014).

We could do a SQL Net trace, however, on this version, I think it’d be  have to 
be for all traffic which is not doable in the current environment…
Also, there’s no strace on production (where I am right now).
Might be able to try to repro in DEV, however, it might be difficult to get 
time from them…

Any help would be greatly appreciated.

Thanks,
Rich
References:
- Fwd: 10046 L12 - Very long ela for 'SQL*Net message from client'?
  - From: Rich
- RE: 10046 L12 - Very long ela for 'SQL*Net message from client'?
  - From: Jonathan Lewis
- Re: 10046 L12 - Very long ela for 'SQL*Net message from client'?
  - From: Rich
RE: 10046 L12 - Very long ela for 'SQL*Net message from client'?

Other related posts: