RE: dba_tables.num_rows is less than dba_indexes.num_rows (why is compute out of the question?)

From: "Mark W. Farnham" <mwf@xxxxxxxx>
To: <Leng.Kaing@xxxxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
Date: Tue, 9 Aug 2005 05:20:51 -0400

Wolfgang and Lex really answered your primary question. I'm curious about a
combination of comments in your question:

>>
I've even used estimate_percent of 50 and still getting lower numbers for
the table.

The table is 30G in size and growing, so a COMPUTE is out of the question.

>>



<snip>



So - you already know approximately the time the statistics COMPUTE would
run (2 times 50 percent estimate). Whether or not it is out of the question
depends on your operational situation. (You didn't mention how long it took
for a 50 percent estimate, but that is only a curious aside not related to
the analysis of the situation.)



I contend that the real question is whether you have wasteful plans* with
the current statistics and whether you can make appreciable savings with
better* statistics.



Wasteful plans -

    abstractly - suboptimal. That is, another plan would resolve the query
and return the required results with some combination of less real time and
with less resources consumed.

    If interactive users or a sequential stream of batch operations in a
limited operational window is the important factor of the business
situation, then less real time is likely to be more important.

    If no one or no thing is "waiting" for the results in a way that
matters, then less resources consumed may be more important (if you are near
a margin where the percentage consumption of the throughput capacity of some
component of your system is driving you toward a "knee curve" performance
degradation). If you have many concurrent users and/or competing parallel
batch streams, this may be an important consideration.



Better statistics:



1) As in a compute is a "better" statistic than a 10 percent estimate.

2) Most recently collected.

3) I almost typed histograms - read entire long threads from this forums on
the good, bad, and indifferent about which histograms and when depending on
your data texture and the queries you write against the data.



Now, IF you were to test your important queries against a static copy of
your database, and you got the same plans with, say, a 10 percent estimate
and a COMPUTE, then even though the COMPUTE has "better" statistics, it
makes no difference to you except that it possibly cost* more to collect the
COMPUTEd statistics. (Cost*, sigh, well if you already bought the resource
and no one and no thing is waiting to use it otherwise, you might reasonably
consider it free.)



On the other hand, if COMPUTE gets you better plans such that the cost of
collecting COMPUTE minus the cost of x percent estimate minus the sum of the
savings in the queries is less than zero, then COMPUTE makes sense. (Of
course between 10 percent and COMPUTE there may be a lesser number where the
plans reliably stabilize as the "best" plans.)



I've seen several "thumbrules" posted about the average best practice, and I
won't even quote them. What matters to you is what happens with your data.
Okay, I guess the cost of research matters too.



Now, on the second point: Most recently collected.



The same decision matrix applies: Will more recently collected statistics
affect your plans? If you have histograms that are useful in getting better
plans that have time components in the keys, and your time components drift
with the calendar, then very recent statistics are probably important to
you. If you have relatively unskewed data the just grows proportionally,
then old statistics may well generate the same plans as new statistics.
Knowledge about the texture and change characteristics of your data can help
you make a rational GUESS if you know a lot about the particular vintage of
the CBO you are using. Well designed tests will give you the answer (and
confirm or deny your GUESS, if you made one.) Some people know enough that a
GUESS might be described as a scientific prediction based on a theory of
operations, and the test usually confirms the GUESS in that case.



If your plans don't change with new statistics collections, but they do
change with COMPUTE instead of 50 percent, then an infrequent collection to
the COMPUTE level on a shadow system that you then import (statistics) is
probably better.



All of these things are sliding scales, and just because something might run
long in real time does not mean it is not worthwhile. That depends on the
benefits.



Sorry for the length of this note. "I didn't have time to make it shorter."
(Argue amongst yersefs on that quote.)



And it really didn't directly answer anything you asked, but I hope it was
useful to you.



Regards,



mwf

References:
- dba_tables.num_rows is less than dba_indexes.num_rows
  - From: Leng Kaing

RE: dba_tables.num_rows is less than dba_indexes.num_rows (why is compute out of the question?)

Other related posts: