Re: More on dbms_stats

  • From: Wolfgang Breitling <breitliw@xxxxxxxxxxxxx>
  • To: achoto@xxxxxxxxxxxx
  • Date: Wed, 10 Aug 2005 14:01:34 -0600

Ana,

gather_schema_stats(options=>'GATHER AUTO'... implies estimate_percent=>auto_sample_size, which also extends to the indexes because of cascade=>true. It also implies method_opt=>'for all columns size auto' (regardless of what you specify in your gather_schema_stats command). You therefore get histograms on columns that have previously been referenced in predicates.
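Incidentally, 'size auto' decides histogram candidates from the column usage Oracle records in sys.col_usage$. A query like this (run as SYS; table name is a placeholder, and it assumes the table name is unique across schemas) shows what usage has been recorded:

```sql
-- Columns of MYTABLE with recorded predicate usage - these are the
-- candidates 'for all columns size auto' will consider for histograms.
select c.name column_name,
       u.equality_preds, u.equijoin_preds, u.range_preds, u.like_preds
from   sys.col_usage$ u, sys.col$ c, sys.obj$ o
where  o.name    = 'MYTABLE'
and    u.obj#    = o.obj#
and    c.obj#    = o.obj#
and    u.intcol# = c.intcol#;
```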

gather_table_stats(... defaults to estimate_percent=>NULL (i.e. full compute) which again extends to the indexes because of cascade=>true.

An estimate_percent <> NULL is not a problem for table statistics since sampling generally estimates num_rows and avg_row_len reasonably accurately (blocks is always exact).

However, that is not the case for indexes or histograms. For them anything but a full compute can introduce significant estimation errors which have an impact on the CBO's calculations. Furthermore, for gather_table_stats the method_opt parameter is not ignored, so with 'FOR ALL COLUMNS SIZE 1' that gather does NOT collect histograms on any columns. Contrary to popular belief, the mere presence of a histogram - even if collected properly with a compute, but especially if collected with an estimate - can cause the CBO to choose a bad plan.
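To see which columns ended up with histograms after a gather, something along these lines works (owner and table name are placeholders; the histogram column of dba_tab_columns is 10g, on 9i go by num_buckets alone):

```sql
-- Columns of a table that currently carry a histogram (> 1 bucket).
select column_name, num_distinct, num_buckets, histogram
from   dba_tab_columns
where  owner = 'DATATEL'
and    table_name = 'MYTABLE'
and    num_buckets > 1
order  by column_name;
```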

Try saving the statistics when doing the gather_table_stats(... stattab=>'xxx',statid=>'yyy') (you should always do that anyway so you can restore the stats if the new ones don't work out) and then compare the before and after, especially for indexes and histograms.
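A sketch of that (statistics table, statid, and table names here are placeholders, not anything from your setup):

```sql
-- One-time setup: a statistics table to hold the saved stats.
exec dbms_stats.create_stat_table(ownname=>'DATATEL',stattab=>'STATS_BACKUP');

-- Gather with stattab/statid: the OUTGOING statistics are saved
-- under the given statid before the new ones are put in place.
exec dbms_stats.gather_table_stats(ownname=>'DATATEL',tabname=>'MYTABLE',-
     cascade=>TRUE,stattab=>'STATS_BACKUP',statid=>'PRE_GATHER');

-- If the new statistics cause bad plans, restore the saved set.
exec dbms_stats.import_table_stats(ownname=>'DATATEL',tabname=>'MYTABLE',-
     stattab=>'STATS_BACKUP',statid=>'PRE_GATHER');
```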

A few suggestions:
a) You can simulate the gather auto with your own criteria and then call gather_table_stats for the qualifying tables. That way you can set estimate_percent and method_opt to what you want.


b) You can get the list of qualifying objects from gather_schema_stats(options=>'LIST AUTO') and then run gather_table_stats for each table in the returned list.

c) You could just follow your "gather auto" job with the gather_table_stats script, but then you do the work for that table twice.
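Suggestion (b) could look roughly like this (schema name taken from your example; the estimate_percent and method_opt in the loop are where you plug in your own choices):

```sql
declare
  l_list dbms_stats.objecttab;
begin
  -- Ask dbms_stats which objects it WOULD gather, without gathering.
  dbms_stats.gather_schema_stats(ownname => 'DATATEL',
                                 options => 'LIST AUTO',
                                 objlist => l_list);
  for i in 1 .. l_list.count loop
    if l_list(i).objtype = 'TABLE' then
      dbms_stats.gather_table_stats(ownname          => l_list(i).ownname,
                                    tabname          => l_list(i).objname,
                                    estimate_percent => NULL,  -- full compute
                                    method_opt       => 'FOR ALL COLUMNS SIZE 1',
                                    cascade          => TRUE);
    end if;
  end loop;
end;
/
```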

Ana Choto wrote:

Yesterday I posted a question regarding dbms_stats and analyze.  A
developer was running analyze compute on some tables because a report
wouldn't run unless he analyzed compute statistics on those tables.  I
received great responses from the list, for which I'm really thankful.  The
developer now is using dbms_stats and the report works just fine.

The only thing is that he still has to run dbms_stats even after the daily
job that gathers statistics finishes because the report doesn't work.  I'm
gathering schema stats with the following options:

exec dbms_stats.gather_schema_stats(ownname=>'DATATEL',options=>'GATHER
AUTO',degree=>8,cascade=>TRUE,method_opt=>'FOR ALL COLUMNS SIZE 1');

This is done on all the data warehouse schemas, and the dbms_stats job runs
after the daily load has finished.  If I understand correctly 'gather auto'
collects statistics for those tables that have experienced a 10% change or
more.  I have checked these tables and they have been analyzed.  But, the
report still won't run until the developer reanalyzes (now with dbms_stats)
those tables.  He runs dbms_stats with the following options:

exec
dbms_stats.gather_table_stats(ownname=>'DATATEL',tabname=>'<table_name>',cascade=>TRUE,method_opt=>'FOR
 ALL COLUMNS SIZE 1');


-- Regards

Wolfgang Breitling
Centrex Consulting Corporation
www.centrexcc.com
--
//www.freelists.org/webpage/oracle-l
