RE: 10046 L12 - Very long ela for 'SQL*Net message from client'?

  • From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
  • To: Rich <richa03@xxxxxxxxx>
  • Date: Wed, 24 Sep 2014 20:27:04 +0000


Is it reproducible ? If I had to guess I'd look at network routing - e.g.  did 
the session start trying to use the wrong network address on each fetch, 
timeout, and fall back to the correct address ?


Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: Rich [richa03@xxxxxxxxx]
Sent: 24 September 2014 21:23
To: Jonathan Lewis
Cc: Oracle-L Freelists
Subject: Re: 10046 L12 - Very long ela for 'SQL*Net message from client'?

Thanks, Matthew and Jonathan.

The autorace and 10046 are from separate sessions - I let the AT run overnight 
and killed the 10046 part way through.

The client (for both) is me on the DB machine at a SQL Plus prompt with a 
dedicated connection.

I'm starting to suspect the OS...but where?


On Wed, Sep 24, 2014 at 1:03 PM, Jonathan Lewis 
<jonathan@xxxxxxxxxxxxxxxxxx<mailto:jonathan@xxxxxxxxxxxxxxxxxx>> wrote:

Something's not consistent here.

You have an array size of 15, a wait of 16 seconds between fetches, and 207,906 
rows fetched - which should total about 2.5 days to complete !

How many of the waits were about 16 seconds, did they all appear in a single 
batch.

When you ran autotrace was the client elapsed time consistent with the server 
trace time ? What did the tkprof output give as the "time in the trace file".


Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx<mailto:oracle-l-bounce@xxxxxxxxxxxxx> 
[oracle-l-bounce@xxxxxxxxxxxxx<mailto:oracle-l-bounce@xxxxxxxxxxxxx>] on behalf 
of Rich [richa03@xxxxxxxxx<mailto:richa03@xxxxxxxxx>]
Sent: 24 September 2014 17:38
To: Oracle-L Freelists
Subject: Fwd: 10046 L12 - Very long ela for 'SQL*Net message from client'?


Hi List,
This is 11.1.0.7 on Oracle Linux 5 U3 (Enterprise Linux Server release 5.3 
(Carthage) (kernel 2.6.18-128.el5))

rpm -qf /etc/redhat-release
enterprise-release-5-0.0.17

(I know the version is “very” old, but that’s where we are right now.)

I seem to remember a post by Tanel (or was it Kyle?) a while back about 
this/these types of waits (SQL*Net), however, my Google fu is not working for 
me right now.

Snip from 10046 level 12 trace from a dedicated SQL*Plus session on the DB 
server:
…
*** 2014-09-24 08:03:20.151
WAIT #2: nam='SQL*Net message from client' ela= 16701919 driver id=1650815232 
#bytes=1 p3=0 obj#=1237502 tim=1411571000151138
WAIT #2: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 
p3=0 obj#=1237502 tim=1411571000151302
FETCH 
#2:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,plh=133634354,tim=1411571000143712

*** 2014-09-24 08:03:36.853
WAIT #2: nam='SQL*Net message from client' ela= 16702053 driver id=1650815232 
#bytes=1 p3=0 obj#=1237502 tim=1411571016853884
WAIT #2: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 
p3=0 obj#=1237502 tim=1411571016854041
FETCH 
#2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,plh=133634354,tim=1411571016838765

*** 2014-09-24 08:03:53.541
WAIT #2: nam='SQL*Net message from client' ela= 16686602 driver id=1650815232 
#bytes=1 p3=0 obj#=1237502 tim=1411571033541263
WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 
p3=0 obj#=1237502 tim=1411571033541424
FETCH 
#2:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,plh=133634354,tim=1411571033533874
…

From client is consistently ~16sec!

Autotrace of the statement looks like:
…
207963 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 133634354

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Rows  | Bytes 
|TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |  5759 |  2991K|     
  | 16679   (1)| 00:03:21 |
|*  1 |  TABLE ACCESS FULL            | PFM_AUDIT_LOG_HDR |     1 |    22 |     
  |   309   (1)| 00:00:04 |
|   2 |  SORT ORDER BY                |                   |  5759 |  2991K|  
9224K| 16679   (1)| 00:03:21 |
|*  3 |   HASH JOIN                   |                   |  5759 |  2991K|     
  | 16026   (1)| 00:03:13 |
|*  4 |    INDEX FAST FULL SCAN       | PFM_COMPANY_AI1   |   331 | 11916 |     
  |    60   (2)| 00:00:01 |
|*  5 |    TABLE ACCESS BY INDEX ROWID| PFM_POS           | 17428 |  8441K|     
  | 15965   (1)| 00:03:12 |
|*  6 |     INDEX RANGE SCAN          | PFM_POS_AI1       | 18366 |       |     
  |  8013   (1)| 00:01:37 |

-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("MODEL_TYPE_ID"='POS' AND TO_NUMBER("DOCUMENT_OID")=:B1)
   3 - access("POS"."COMPANY_ENSOID"="VER_ENSOID")
   4 - filter(UPPER("NAME") LIKE 'SOME STRING' ESCAPE  AND "VER_LATEST_VEROID" 
IS NOT NULL)
   5 - filter("POS"."TRANS_SUBTYPE_ID"='4')
   6 - access("POS"."TRANS_END_DATE">=TO_DATE(' 2011-04-01 00:00:00', 
'syyyy-mm-dd hh24:mi:ss')
              AND "POS"."TRANS_END_DATE"<=TO_DATE(' 2014-03-31 23:59:59', 
'syyyy-mm-dd hh24:mi:ss'))
       filter(UPPER(TO_CHAR("POS"."HIDDEN")) LIKE '0' ESCAPE  AND 
"POS"."VER_LATEST_VEROID" IS NOT NULL)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    264266641  consistent gets – I know it might be better to just FTS the POS 
table rather than use the index, however, I’m trying to understand why the long 
‘SQL*Net message from client’ waits
          0  physical reads
          0  redo size
   45412064  bytes sent via SQL*Net to client – this is “only” ~43MB
     157869  bytes received via SQL*Net from client - ~150KB
      13866  SQL*Net roundtrips to/from client – might be able to reduce this 
with arraysize (currently 15) and/or SDU/TDU/IO Buffers (currently default 
AFAIK), but, still why the long ‘SQL*Net message from client’ waits

          1  sorts (memory)
          0  sorts (disk)

     207963  rows processed

The POC table is very wide (533 columns with type spread of 237 number, 17 date 
and 213 varchar2 – no LOB) - AVG_ROW_LEN is 496 from DBA_TABLES (last analyzed 
15SEP2014).
The PFM_AUDIT_LOG_HDR table is very small (1126 blocks from DBA_TABLES – last 
analyzed 22SEP2014).

We could do a SQL Net trace, however, on this version, I think it’d be  have to 
be for all traffic which is not doable in the current environment…
Also, there’s no strace on production (where I am right now).
Might be able to try to repro in DEV, however, it might be difficult to get 
time from them…

Any help would be greatly appreciated.

Thanks,
Rich

Other related posts: