Battle Against Any Guess [was "high cpu on query"]

  • From: "Alex Gorbachev" <ag@xxxxxxxxxxxx>
  • To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Sun, 1 Jul 2007 03:40:29 -0400

<repost>I meant actually to change the subject line to avoid
side-tracking Joe's thread but hit send too fast so here it is

Dear list,

It was very interesting to track the development of this thread and
I've been waiting whether someone will come up with a suggestion to
take a little different investigation approach instead of, well,
guesswork. It's a little slow due to the weekend and vacation season
and I'm a bit impatient. :) So my apologies to Joe for hijacking this

First of all, it's interesting to have a look at the initial questions:

- what can cause a difference in the execution plans besides the stats and 
- in general what can cause a query to use a lot of cpu?

I don't want to guess here :) so I would like to ask the author how he
came up with those questions?

The problem is very poor query performance. There is some additional
information - the query performed well few days ago and still performs
well on a cloned database. The first impulse is to jump on it with
guesses - what might have changed that could have caused performance

Some guesses were already investigated - indexes didn't change as Joe
suggested. *Some* stats (table stats? index stats? system stats?
column stats as well?)  were manually re-synchronized with the same
results. Now it's time to ask Oracle gurus to do some guesswork.
Typical scenario.

Fair enough - an impressive list was produced (and every point can be
elaborated much further) and I'm sure there might be quite a few less
obvious but still valid guess that some people might be able to

Hey, what about another hint - increased CPU consumption?
OK... Let's start guessing on this one:
- inappropriate hash join
- sorts
- inappropriate nested loop
- out of control sub-query filtering
- latch contention (maybe workload pattern changed a lot?)
The list can go on an on.

Is that an effective approach to try and verify all our guesses even
if they seem to be educated guesses?

Let's step back for a moment... What is the problem we are trying to
solve? The query is running extremely slow.
OK. We start from there - run it through 10046 and 10053 traces. Get
execution plan. Create session profile. Now we can see where the time
is spent. Set STATISTICS_LEVEL=ALL and investigate
V$SQL_PLAN_STATISTICS - see on which step the time is spent.

We shouldn't ignore additional information (performance changed in the
last 2 days) and helpful availability of an older database clone.
However, instead of using it to validate our guesses - why not to use
it for the benefit of troubleshooting process itself?
- Compare execution plans (author mentioned they are different).
- If execution plan hasn't changed - compare session profile and SQL profile.
- If execution plan changed - compare 10053 trace - simple diff would
be a great start.

Over the last years, thanks to few very smart and dedicated gentlemen,
we have gone through revolutionary change in the approach to Oracle
performance tuning. The revolution that allowed to avoid great deal of
guessing and transformed semi-magical/gut-feeling/intuition-based
performance tuning ritual that only best of the best could really
master into a well-defined methodological scientific approach that
everyone of us, mortals, could follow. Performance analysts could stop
relying on intuition and educated (or not) guesses.

Why on earth do we still rely on guesswork in other troubleshooting
tasks especially when more deterministic paths are available?

Think about it and recall how many times you found yourself in the
same situation - jumping from one guess to another trying to find a
shortcut without success. If you recall at least once - you probably
would be interested in joining the BAAG Party -- Battle Against Any
Guess. If you don't reckon it and managed to avoid guesswork in your
troubleshooting process - join as well and spread the world how you
managed that.

Some of you already know what the BAAG party is and have already
become a member. Some of you already started to contribute. For those
of you who doesn't know yet -
Join us and don't forget to subscribe to the RSS feed.

On 6/29/07, Cary Millsap <cary.millsap@xxxxxxxxxx> wrote:
Some more for the list:

5) Change in schema (create/drop indexes, histograms,
partition/cluster/parallelism declarations, etc.)
6) Stored outline manipulation
7) Oracle CBO code change (e.g., upgrade)

Cary Millsap
Hotsos Enterprises, Ltd.

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Jeremy Paul Schneider

four things can change an exec plan:
1) change in text of sql
2) change in init params
3) change in object stats (tables and indexes)
4) change in system stats

i'm pretty sure that if those four things don't change then your plan can't
change.  do you have a job collecting system stats?


On 6/29/07, Joe Armstrong-Champ
<joseph.armstrong-champ@xxxxxxxxx> wrote:

We upgraded to 10.2 a month ago and everything was running fine until 2
 days ago when a query started running very slow in prod. It is selecting
 data from a view using distinct and does an 'order by', too. It runs ok
 in a copy of the db which was refreshed just after the upgrade. The
 difference in the number of rows between the 2 databases isn't that much
 but the execution plans are very different. The indexes are the same in
 both. I manually updated the stats for the affected tables in another
 copy which was refreshed yesterday but the plan didn't change. There is
 a high percentage of cpu involved in the long running query.

 - what can cause a difference in the execution plans besides the stats
 and indexes?
 - in general what can cause a query to use a lot of cpu?


Alex Gorbachev, Oracle DBA Brewer, The Pythian Group
BAAG Party -

Other related posts:

  • » Battle Against Any Guess [was "high cpu on query"]