Re: SQL performance in prod

From: Cee Pee <carlospena999@xxxxxxxxx>
To: elkinsl@xxxxxxxxxxx
Date: Thu, 19 Jul 2018 15:49:47 -0500

Nice thread, we get same kind of problems too. Is there a book or paper on
how to read the 10053 trace file, like "basics of 10053"?

Reading Tim's response, I think it is a good time someone wrote a book on
that topic.

CP

On Thu, Jul 19, 2018 at 9:38 AM, Larry Elkins <elkinsl@xxxxxxxxxxx> wrote:

I’ve normally used DBMS_SHARED_POOL.PURGE with the C option to purge
specific SQL’s and get a reparse and desired plan. It continued to work ok,
for me anyway, in 12c, but Carlos Sierra, and maybe some others, had
mentioned potential issues in 12x. Like the link mentions, I’d normally do
this to get a re-parse it to pick up a baseline we just implemented, or
simply to get a re-parse away from an “odd” value such as is already being
discussed.

https://carlos-sierra.net/2017/11/22/purging-a-cursor-in-oracle-revisited/

Regards,

Larry G. Elkins

elkinsl@xxxxxxxxxxx

214.695.8605

*From:* oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@
freelists.org] *On Behalf Of *Mark W. Farnham
*Sent:* Thursday, July 19, 2018 9:08 AM
*To:* tim.evdbt@xxxxxxxxx; 'Stefan Knecht' <knecht.stefan@xxxxxxxxx>
*Cc:* vraman4list@xxxxxxxxx; 'oracle-l-freelists' <oracle-l@xxxxxxxxxxxxx>
*Subject:* RE: SQL performance in prod

well… if you change the number of cpus, then you can get a different
number of parallel servers and a different hash distribution, so something
prone to skew of row assignments could in theory be pretty flat with one
number of cpus getting one number of parallel servers and slow with a
different number of cpus getting a different number of parallel servers
with a small number of them getting the load.

So you were initially right (but for a pretty rare yet interesting case.)

I don’t know whether JL’s laundry list of “nothing changed, but the
performance is different” laundry list has been mentioned in this thread,
but as laundry lists go it is better than most.

Of course something like Method-R’s tool can pin down what is taking time
in the pair. That removes all guessing about WHAT is consuming the time,
which sometimes helps diagnose what triggers the different classes of
response.

But most likely you are getting entirely different plan, so you want to
focus on that to rule it out first. It is often the case that non-prod has
a less robust parameter choice in the collection of regression tests for
performance. As in the story of the rainy day, if in prod an unusual
parameter choice produces a plan that is sub-optimal for the majority of
re-uses of the plan for that query, then you would like to be able to flush
just that one query from the shared pool and let it get reparsed. I don’t
know how to do that, despite asking for that rifle shot in place of the
shot gun flush shared pool that can cause a parse storm decades ago.

Still, if you have the sql text you can modify it with a comment and see
if a fresh parse is likely to get the good plan and more importantly,
eliminate the prescience required to ask for a Wolfgang trace.

In addition to difference of parameter (in the sense of bind variable
predicate choice as opposed to init stuff), timing of stats collection
versus running of the query such that some predicates exceed the recorded
high value is a classic way to get a completely different plan.

Your mileage may vary. In addition to this oracle-l thread, I would
certainly also review the Oracle Scratchpad laundry list.

mwf

*From:* oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@
freelists.org <oracle-l-bounce@xxxxxxxxxxxxx>] *On Behalf Of *Tim Gorman
*Sent:* Thursday, July 19, 2018 9:36 AM
*To:* Stefan Knecht
*Cc:* vraman4list@xxxxxxxxx; oracle-l-freelists
*Subject:* Re: SQL performance in prod

Stefan,

I had one too many:  hardware changes could not affect execution plans.  I
added that to pad the list without much thought just before pressing SEND,
and of course regretted it two seconds later.

There are certainly more items to add, but I started with the
easily-verifiable stuff.

Clearly a 10053 trace is the ultimate, but as the OP noted, it requires
prescience to be set before it is needed, and prescience isn't always
available.

10053 trace output is also not easy to read.  Reading two such traces,
comprehending both, and then comparing and contrasting usually requires
intelligence and attention to detail approaching the level of Wolfgang
Breitling.

Thanks!

-Tim

On 7/18/18 21:03, Stefan Knecht wrote:

Tim, you forgot one:

7. The fact whether it rains Monday morning or not

The original anecdote referred to the fact that if it rained, a certain
employee that normally arrives first on a sunny day, would get to the
office later - which caused a different employee to first trigger execution
plan creation, with different bind variables, leading to a different plan.

So the query would run fast all day on a sunny day, but slow all day when
it rained.

Venky - try looking at the values of the bind variables of a good run vs a
bad run.

On Thu, Jul 19, 2018 at 5:58 AM, Tim Gorman <tim.evdbt@xxxxxxxxx> wrote:

Venky,

"Assuming there is not much change in the DB"

Let's narrow down the things that can change an execution plan...

   1. hardware change (i.e. #-cpus, # GB of RAM, storage, etc)
   2. application software change (i.e. change to the SQL text)
   3. Oracle software change (i.e. patch, upgrade, etc)
   4. initialization parameter change
   5. gathering system statistics
   6. gathering table, index, column statistics

When you state the assumption about "no much change in the DB", I am
assuming that you're discussing items #1-4.

How about item #5?  Can you query the SYS.AUX_STATS$ table and display the
column PVAL1 where PNAME has the value "DSTART" or "DSTOP"?

How about item #6?  Can you display the contents of DBA_TAB_STATS_HISTORY
for the tables involved in the query?  Please refer to the useful blog
posts by Uwe Hesse HERE
<https://uhesse.com/2012/04/23/diff_table_stats_in_history-example/> and
by Marcel-Jan Krijgsman HERE
<https://mjsoracleblog.wordpress.com/2013/02/19/more-statistics-history/>
for more information, if necessary?

Hope this helps?

Thanks!

-Tim

On 7/18/18 15:30, V Raman wrote:

List

We have a SQL that is performing intermittently bad in our prod env. The
good ones take 2 to 5 mins, the bad ones run for hours we kill them. They
run fine in the non prod env. I ran an awsqrpt and based on that I see that
there are a few executions with the bad ones taking hours. Looking at the
differences in the execution plan, the good ones have lots of nested loops
in them, with the bad ones having lots of hash joins.

I am trying to figure out the cause(s). Assuming there is not much change
in the DB, the first thing that comes to mind is statistics. Can the
listers help with ideas? Thanks.

If anyone is interested is seeing the report, i can provide a link to them
by email.

Venky

--

//

zztat - The Next-Gen Oracle Performance Monitoring and Reaction Framework!

Visit us at zztat.net | @zztat_oracle | fb.me/zztat | zztat.net/blog/

Follow-Ups:
- Re: SQL performance in prod
  - From: Stefan Koehler
- Re: SQL performance in prod
  - From: Mladen Gogala

References:
- SQL performance in prod
  - From: V Raman
- Re: SQL performance in prod
  - From: Tim Gorman
- Re: SQL performance in prod
  - From: Stefan Knecht
- Re: SQL performance in prod
  - From: Tim Gorman
- RE: SQL performance in prod
  - From: Mark W. Farnham
- RE: SQL performance in prod
  - From: Larry Elkins

Re: SQL performance in prod

Other related posts: