Re: dbms_stats.gather_table_stats

  • From: "Greg Rahn" <greg@xxxxxxxxxxxxxxxxxx>
  • To: genegurevich@xxxxxxxxxxxx
  • Date: Fri, 5 Sep 2008 14:59:41 -0700

My first comment is that you are probably spending more time gathering
stats than you need to be.  Here is the reason why:
If you are using the command you listed below, the OPTIONS parameter
is defaulted (to GATHER).  GATHER gathers ALL statistics on every
table/partition, whether they are stale or not.  If you have tables
that are partitioned and many of those partitions are nonvolatile, it
is unlikely that the data has changed *significantly* enough to matter
(stats are still representative), so why regather stats on them?  This
is the case for most time series data warehouses.
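As a sketch of the alternative (schema name and settings here are placeholders, and GATHER STALE assumes table monitoring is enabled, which it is by default in 10g), a weekly job using dbms_stats.gather_schema_stats with OPTIONS=>'GATHER STALE' would regather stats only for objects whose data has changed enough to be flagged stale:

```sql
-- Sketch only: regather stats just for stale tables/partitions,
-- skipping nonvolatile historical partitions.  'SCHEMA' is a placeholder.
begin
  dbms_stats.gather_schema_stats(
    ownname => 'SCHEMA',
    options => 'GATHER STALE',
    degree  => 8,
    cascade => true );
end;
/
```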

Also, with the command you are using, GRANULARITY defaults to 'AUTO'
(unless you changed the default with dbms_stats.set_param), so when
you specify a partition you are 1) gathering stats for that specific
partition, 2) gathering stats for the entire table (global), and 3)
gathering stats for the indexes (cascade).  If you want to gather only
partition stats and not update the global stats, use
GRANULARITY=>'PARTITION'.
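Concretely, that would look something like this (owner, table, and partition names are placeholders standing in for yours):

```sql
-- Sketch: gather stats for one partition only, without also
-- regathering the table-level (global) stats.
begin
  dbms_stats.gather_table_stats(
    ownname     => 'OWNER',
    tabname     => 'TABLE',
    partname    => 'PART2',
    granularity => 'PARTITION',
    degree      => 8,
    cascade     => true );
end;
/
```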

WRT the time difference, it may be that the first run did only a
single pass because it gathered stats for every partition and used
those samples for the global stats as well.  This may be an
optimization for this case (I would have to verify).  That would
explain why the two single-partition gathers combined take a bit
longer than the first run: each needs to do 1 partition + the full
table, whereas the first run just does the full table.

Also, 25% seems like quite a large ESTIMATE_PERCENT.  Do you require
such a large sample to get representative stats?  Does
dbms_stats.auto_sample_size work?
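That is, something like the following (names are placeholders), letting Oracle choose the sample size instead of fixing it at 25%:

```sql
-- Sketch: replace the fixed 25% sample with Oracle's automatic
-- sample-size selection.
begin
  dbms_stats.gather_table_stats(
    ownname          => 'OWNER',
    tabname          => 'TABLE',
    estimate_percent => dbms_stats.auto_sample_size,
    degree           => 8,
    cascade          => true );
end;
/
```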

I have some more notes on my blog on the subject
http://structureddata.org/2008/03/26/choosing-an-optimal-stats-gathering-strategy/

Hope that helps.

On Fri, Sep 5, 2008 at 12:00 PM,  <genegurevich@xxxxxxxxxxxx> wrote:
> Hi everybody:
>
> I am running oracle 10.2.0.1. Once a week we have an analyze process run to
> gather stats on all the tables in the schema via commands like this one:
>
> exec dbms_stats.gather_table_stats( 'schema', 'table', null, 25,
> degree=>8, cascade=>true );
>
> Some of the tables are partitioned and we have 6 or 7 historical partitions
> which are not being updated anymore, so I thought I would save time by
> restricting this command to only the last two partitions:
>
> exec dbms_stats.gather_table_stats( 'owner', 'table', 'part1', 25,
> degree=>8, cascade=>true );
> exec dbms_stats.gather_table_stats( 'owner', 'table', 'part2', 25,
> degree=>8, cascade=>true );
>
> The partitions are roughly the same size. The indices are globally
> partitioned. The first command (analyze the whole table) took 25 minutes.
> The last two commands combined - 27 min. Considering that there are 5 more
> partitions in the table, the math does not add up. What am I missing?
>
>
> thank you
>
> Gene Gurevich
>
>
> --
> //www.freelists.org/webpage/oracle-l
>
-- 
Regards,
Greg Rahn
http://structureddata.org