FW: same sql, same execution plan, but the time to complete is very different when run in different time.

From: "Mark W. Farnham" <mwf@xxxxxxxx>
To: "'oracle-l'" <oracle-l@xxxxxxxxxxxxx>
Date: Mon, 12 Nov 2007 09:12:45 -0500

snipped to fit.

  _____  

From: Mark W. Farnham [mailto:mwf@xxxxxxxx] 
Sent: Saturday, November 10, 2007 6:48 AM
To: 'staywithpin@xxxxxxxxx'; 'oracle-l'
Subject: RE: same sql, same execution plan, but the time to complete is very
different when run in different time.

Your total time difference is less than a factor of three. Of course from
your data as presented we have no way to know whether each "bad running" is
proprotionally a bit slower or if one or a few of the "bad running"
executions are relatively very long possibly meaning most of the "bad
running" are actually faster than most of the "good running."  Possibly you
know, but we don't. Cary Millsap has called this "skewing" and is actually
presenting a paper about this being at the bottom of most of the problem
resolution brain cramps he has seen. (I'm paraphrasing horribly, but I hope
you take away a correct image of the overall theme.)

Interestingly enough, I saw almost the same relative performance ratio on a
process in 1989 or 1990 that turned out to be an artifact of having a
mixture of 25MHz and 75MHz processors shoved into an SMP box. (I promise you
I did not put that machine together. I believe that led to Sequent
increasing the font size in the configuration guideline memo for the
paragraph where they told you not to do that.)

Now whether you're spinning more on latches, or reading more buffers that
have to be reconstructed for read consistency, or have more system level
context switching, or simply scanning a bigger buffer pool, or any other
greater use of CPU in at least one of the "Bad running" executions, the way
to find out is a trace. But which of 1.6 million executions to trace?

Someone may have a better suggestion, but I'm thinking the good experimental
news is that you do this every day. If you record the minimum and maximum
execution times you'll know whether you're looking for proportional
degradation across the system or a small number of outliers. If you have a
small number of outliers you're looking for a transient cause that
dramatically affected a few of 1.6 million. Then you could reasonably snap
tracing on any execution that exceeded the average execution time by a
factor of two or so and find out what it is really waiting for (you'll miss
part of the job, but probably still see enough if the skewing is large.) If
your distribution of execution times is very flat, you're looking for a
chronic problem over the duration of the window you measured, so snapping a
trace and system statistics on any average "Good running" execution and any
average "Bad running" execution has decent a chance to give you your answer
to what is "eating" extra CPU. 

Or it could be that you have extra indexes that must be changed in the "Bad
running" case that have nothing to do with the execution plan but which have
columns updated. That is extra work. Or it could be a row movement issue
dependent on what the updates are. This is two questions in a row that
appear designed to generate guesses. I'd almost guess you're pulling our
legs, but that would be a guess and rude, so I won't.

Regards,

mwf

  _____  

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of qihua wu
Sent: Saturday, November 10, 2007 2:12 AM
To: oracle-l
Subject: same sql, same execution plan, but the time to complete is very
different when run in different time.

Hi, we have a batch job(run throught SQLPLUS) which runs every day. For the
same sql with the same execution plan, but the time spent on CPU is largely
different. One is 781 seconds and another one is 2193 seconds. As buffer
gets eat CPU, so I also compared the number of buffer gets and found the
number of buffer gets are very close. SQL parsing also eats CPU, but the sql
run millions of times inside a procedure, so it should be only parsed once,
and one time of parse shouldn't use much CPU. What else can eat so much CPU?
<snip>

FW: same sql, same execution plan, but the time to complete is very different when run in different time.

Other related posts: