Re: dba_tables.num_rows is less than dba_indexes.num_rows



Leng Kaing wrote:

Wolfgang,

This is all news to me...

1) I always thought that NUM_ROWS is important. I know that a few weeks
ago when I noticed the differences in the num_rows in tables vs indexes,
we manually increased the table's num_rows and the query would revert to
using an index rather than a full table scan. I have yet to do a test to
see if what would happen if I set NUM_ROWS to a small number, eg. 5.

num_rows is important, but the CBO only uses the value from *_tables, not the one from *_indexes. As Christian already said, if someone has information to the contrary - with a repeatable testcase - let me know.



2) I also did "..., estimate_percent=>dbms_stats.auto_sample_size,
method_opt=>'FOR ALL INDEXED COLUMNS', degree=>2, cascade=>true"


Are you saying that FOR ALL INDEX COLUMNS or FOR COLUMNS only gathers
histograms for the tables, but not the stats? So do I have to do it in 2
steps - gather stats, then gather histograms?

3) I've never computed on any of the systems that I've worked on. And it
certainly cannot be done on the current database as it would take too
long. I always thought an estimate would be good enough. Now I'm hearing
that only a compute will do. ARGH!! We cannot afford to do this.

Estimate is usually good enough for tables. In fact amazingly small estimate percentages (.01 or smaller) produce very good estimates for num_rows and avg_row_len.
Estimate is often not good enough for index statistics and IMO never good enough for column histograms.


Maybe one of the reasons you do not have enough time for a gather_compute is that you gather too many histograms and maybe miss a few worth having. A gather_table_stats(.. method_opt=>'for all [indexed] columns size <anything other than 1>') will do a full scan of the table for every column for which a histogram is collected. For estimate_percent=n a temp table with a n% sample of the original table is used instead of the full table.
This testcase shows that even for a very moderatley sized table. I ran every gather twice to alleviate any caching issues:


SQL> select count(*) from pspnlfield;
  COUNT(*)
----------
    282100

SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:00.04

SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 254');
Elapsed: 00:05:33.01


SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:01.02

SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 1');
Elapsed: 00:00:56.01


SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:00.03

SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 254');
Elapsed: 00:06:04.06


SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:01.02

SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for all columns size 1');
Elapsed: 00:01:01.01


SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:00.01

SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 254');
Elapsed: 00:00:32.00


SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:01.00

SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 1');
Elapsed: 00:00:06.08


SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:00.00

SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 254');
Elapsed: 00:00:31.07


SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:01.01

SQL> exec dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for all columns size 1');
Elapsed: 00:00:06.07



A clarification to the statement that you probably collect too many histograms and at the same time may be missing some: You have method_opt=> 'for all indexed columns [size 75]'
a) not all indexed columns really need histograms, only crucial ones with significant skew - that's the too many piece.
b) histograms on crucial columns with significant skew, even if not indexed, can lead the optimizer to a better performing plan - that's the too few piece.


If anyone thinks that too many histograms can't hurt (other than the time wasted collecting them), I had an incident where a client changed their statistics gathering from 'all columns size 1' (i.e. no histograms) to 'all indexed columns size skewonly' and the performance of a frequently used sql went from 0.01 seconds to ~ 90 seconds.

--
Regards

Wolfgang Breitling
Centrex Consulting Corporation
www.centrexcc.com
--
http://www.freelists.org/webpage/oracle-l

Other related posts: