Re: dba_tables.num_rows is less than dba_indexes.num_rows
- From: Wolfgang Breitling <breitliw@xxxxxxxxxxxxx>
- To: Leng Kaing <Leng.Kaing@xxxxxxxxxxx>
- Date: Thu, 11 Aug 2005 09:19:22 -0600
Leng Kaing wrote:
Wolfgang,
This is all news to me...
1) I always thought that NUM_ROWS is important. I know that a few weeks
ago when I noticed the differences in the num_rows in tables vs indexes,
we manually increased the table's num_rows and the query would revert to
using an index rather than a full table scan. I have yet to do a test to
see if what would happen if I set NUM_ROWS to a small number, eg. 5.
num_rows is important, but the CBO only uses the value from *_tables,
not the one from *_indexes. As Christian already said, if someone has
information to the contrary - with a repeatable testcase - let me know.
2) I also did "..., estimate_percent=>dbms_stats.auto_sample_size,
method_opt=>'FOR
ALL INDEXED COLUMNS', degree=>2, cascade=>true"
Are you saying that FOR ALL INDEX COLUMNS or FOR COLUMNS only gathers
histograms for the tables, but not the stats? So do I have to do it in 2
steps - gather stats, then gather histograms?
3) I've never computed on any of the systems that I've worked on. And it
certainly cannot be done on the current database as it would take too
long. I always thought an estimate would be good enough. Now I'm hearing
that only a compute will do. ARGH!! We cannot afford to do this.
Estimate is usually good enough for tables. In fact amazingly small
estimate percentages (.01 or smaller) produce very good estimates for
num_rows and avg_row_len.
Estimate is often not good enough for index statistics and IMO never
good enough for column histograms.
Maybe one of the reasons you do not have enough time for a
gather_compute is that you gather too many histograms and maybe miss a
few worth having. A gather_table_stats(.. method_opt=>'for all [indexed]
columns size <anything other than 1>') will do a full scan of the table
for every column for which a histogram is collected. For
estimate_percent=n a temp table with a n% sample of the original table
is used instead of the full table.
This testcase shows that even for a very moderatley sized table. I ran
every gather twice to alleviate any caching issues:
SQL> select count(*) from pspnlfield;
COUNT(*)
----------
282100
SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:00.04
SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for
all columns size 254');
Elapsed: 00:05:33.01
SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:01.02
SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for
all columns size 1');
Elapsed: 00:00:56.01
SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:00.03
SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for
all columns size 254');
Elapsed: 00:06:04.06
SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:01.02
SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>null,method_opt=>'for
all columns size 1');
Elapsed: 00:01:01.01
SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:00.01
SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for
all columns size 254');
Elapsed: 00:00:32.00
SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:01.00
SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for
all columns size 1');
Elapsed: 00:00:06.08
SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:00.00
SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for
all columns size 254');
Elapsed: 00:00:31.07
SQL> exec dbms_stats.delete_table_stats(user,'PSPNLFIELD');
Elapsed: 00:00:01.01
SQL> exec
dbms_stats.gather_table_stats(user,'PSPNLFIELD',estimate_percent=>10,method_opt=>'for
all columns size 1');
Elapsed: 00:00:06.07
A clarification to the statement that you probably collect too many
histograms and at the same time may be missing some: You have
method_opt=> 'for all indexed columns [size 75]'
a) not all indexed columns really need histograms, only crucial ones
with significant skew - that's the too many piece.
b) histograms on crucial columns with significant skew, even if not
indexed, can lead the optimizer to a better performing plan - that's the
too few piece.
If anyone thinks that too many histograms can't hurt (other than the
time wasted collecting them), I had an incident where a client changed
their statistics gathering from 'all columns size 1' (i.e. no
histograms) to 'all indexed columns size skewonly' and the performance
of a frequently used sql went from 0.01 seconds to ~ 90 seconds.
--
Regards
Wolfgang Breitling
Centrex Consulting Corporation
www.centrexcc.com
--
http://www.freelists.org/webpage/oracle-l
- References:
- RE: dba_tables.num_rows is less than dba_indexes.num_rows
- From: Leng Kaing
Other related posts:
- » dba_tables.num_rows is less than dba_indexes.num_rows
- » Re: dba_tables.num_rows is less than dba_indexes.num_rows
- » RE: dba_tables.num_rows is less than dba_indexes.num_rows
- » RE: dba_tables.num_rows is less than dba_indexes.num_rows
- » Re: dba_tables.num_rows is less than dba_indexes.num_rows
- » RE: dba_tables.num_rows is less than dba_indexes.num_rows
- » RE: dba_tables.num_rows is less than dba_indexes.num_rows
- » RE: dba_tables.num_rows is less than dba_indexes.num_rows
- » RE: dba_tables.num_rows is less than dba_indexes.num_rows
- » RE: dba_tables.num_rows is less than dba_indexes.num_rows
- » RE: dba_tables.num_rows is less than dba_indexes.num_rows
- » RE: dba_tables.num_rows is less than dba_indexes.num_rows
- » Re: dba_tables.num_rows is less than dba_indexes.num_rows
Wolfgang,
This is all news to me...
1) I always thought that NUM_ROWS is important. I know that a few weeks ago when I noticed the differences in the num_rows in tables vs indexes, we manually increased the table's num_rows and the query would revert to using an index rather than a full table scan. I have yet to do a test to see if what would happen if I set NUM_ROWS to a small number, eg. 5.
2) I also did "..., estimate_percent=>dbms_stats.auto_sample_size,
method_opt=>'FOR ALL INDEXED COLUMNS', degree=>2, cascade=>true"
Are you saying that FOR ALL INDEX COLUMNS or FOR COLUMNS only gathers histograms for the tables, but not the stats? So do I have to do it in 2 steps - gather stats, then gather histograms?
3) I've never computed on any of the systems that I've worked on. And it certainly cannot be done on the current database as it would take too long. I always thought an estimate would be good enough. Now I'm hearing that only a compute will do. ARGH!! We cannot afford to do this.
- RE: dba_tables.num_rows is less than dba_indexes.num_rows
- From: Leng Kaing