Re: Fw: sql with SQL_OPCODE=0

From: Denis <denis.sun@xxxxxxxxx>
To: Karl Arao <karlarao@xxxxxxxxx>, oracle-l@xxxxxxxxxxxxx
Date: Wed, 1 Dec 2010 07:59:39 -0800 (PST)
Karl,

Thanks for your analysis and scripts - I will certainly save them for furture 
use.

(1) blog post was for a then-9i db two years ago
(2) ASH was for another db having similar problem on  the past Monday after 
adding datafile.

More env info:

DB (1)  9.2.0.8  Sun Solaris 9   16 CPU   VSU SAN storage  (two years ago
     ( a same incident occurred month ago when this db was 10.2.0.4 and OS was 
10)

      CPU util 60% idle typically

DB (2) 10.2.0.4 Sun Solaris 10  16 CPU util 70-80% idle typically. VSU SAN 
storage

I am afraid even I correlate the slowness with high I/O latency or any OS data, 
this is just a symptom, I still can not explain or understand why add datafile 
caused hugh slow. I can not believe our I/O system is at the edge of crash so 
that an add datafile operation  really crashes it. 


For the database (1) when we add datafile to those tablespace not hosting lob 
segment during buzy hours, no problem.  For database (2)  not always adding 
datafile cause problem, just this monday it happened with similar symptom as 
that database (1).  I dare to say similar symptom becase I confirmed that top 
sql  in the ASH is doning some kind of lob operation. 


Denis
 
Hi Denis,  

________________________________

From: Karl Arao <karlarao@xxxxxxxxx>
To: denis.sun@xxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Sent: Wed, December 1, 2010 9:55:29 AM
Subject: Re: Fw: sql with SQL_OPCODE=0


Interesting.. from your last message you mentioned that you were 

- adding datafile
- you have many concurrent sessions doing lob access for insert/update
- and then experienced system slowdown with lob operation

And then I saw your first post regarding the SQL_ID 6s8fdgnw2u49h having high 
"CPU + Wait for CPU" and from my experience once you started encountering this 
event you are likely to have high run queue that could be attributed by slow IO 
(high wait IO%) or simply because of CPU bottleneck. 

Looking on your blog post, you have drilled down on the IO stats and the most 
relevant would be getting the latency numbers.. so the formula would be 
 latency(ms)=(readtim/phyreads)*10

and here's what I've derived from your data

Time       latency (ms)
13:40:0237.14
13:50:0262.00
14:00:042.00
14:10:0128.89
14:20:031.43
14:30:0327.14
14:40:0267.14
14:50:0230.00
15:00:0463.33
15:10:031428.57
15:20:101676.67
15:30:307455.00<-- probably the peak with 7sec latency
15:40:35755.00
15:50:173080.00
16:00:46446.67
16:10:128.57
16:20:0513.75<-- ASH 
16:30:0545.26
16:40:038.18
16:50:018.75
17:00:0416.25

You sent the ASH for 16:20-16:30... the interesting periods would be 15:10:03 
to 16:00:46. 
I suggest you also correlate this with your OS performance data probably SAR if 
you're in linux (use kSar to graph it) since you'd like to know what happened 
on 
those past periods.. 

It will also help if you can share to us how many CPUs, what disk subsystem is 
this running on, and some general info about your environment.


BTW, you can get the latency numbers from these views

dba_hist_filestatxs, dba_hist_tempstatxs      <-- where the AWR pulls data, 
yours is already 10mins interval
v$filemetric_history                                            <-- 10 mins 
intervals (fixed)
v$filestat and v$tempstat        <-- you can do 1 sec sampling from here if you 
are worried about the averages normalizing the latency numbers

I've uploaded the scripts at http://karlarao.wordpress.com/scripts-resources/ ;, 
all with the same column output.. 
awr_io_ts.sql, awr_io_file.sql
filemetric.sql
filestat.sql


-- 
Karl Arao
karlarao.wordpress.com
karlarao.tiddlyspot.com
References:
- Fw: sql with SQL_OPCODE=0
  - From: Denis
- Re: Fw: sql with SQL_OPCODE=0
  - From: Karl Arao
Re: Fw: sql with SQL_OPCODE=0

Other related posts: