Re: Performance impact of MONITORING and GATHER_STALE

  • From: Stephen Barr <ascaroth969@xxxxxxxxxxx>
  • To: niall.litchfield@xxxxxxxxx, Leng.Kaing@xxxxxxxxxxx
  • Date: Wed, 16 Mar 2005 11:25:44 +0000 (GMT)

We've actually gone down the route of not gathering
statistics at all on the majority of our tables once
they have been "baselined".

This is a DSS environment and our structures mainly
fall into three categories -

1. Partitioned, time series
2. Non-partitioned, time series
3. Non-partitioned, non-time series

For the partitioned tables we do analyze the most
recent partition on a nightly basis post ETL. The
global statistics on the table are updated through a
package we've put together to update specific stats
such as num_rows, num_blks, null_cnt, max/min values,
etc.
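
As a rough sketch - the schema, table, and partition
names and the figures below are all invented - the
nightly partition analyze plus the manual global
update looks something like this:

  BEGIN
    -- Gather stats on the newest partition only,
    -- post ETL.
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname     => 'DW',
      tabname     => 'SALES_FACT',
      partname    => 'P_200503',
      granularity => 'PARTITION',
      cascade     => TRUE);

    -- Set the global (table-level) figures from
    -- counts the ETL process already knows, rather
    -- than re-scanning 8 years of history.
    DBMS_STATS.SET_TABLE_STATS(
      ownname => 'DW',
      tabname => 'SALES_FACT',
      numrows => 950000000,   -- illustrative
      numblks => 12000000);   -- illustrative
  END;
  /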

For the other tables we have a few different
requirements, but the process is similar - we manually
update the stats using this package and the data from
the ETL process.
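
The column-level pieces (null counts, min/max values)
can be set with DBMS_STATS.SET_COLUMN_STATS; a
minimal sketch, again with invented names and values:

  DECLARE
    srec  DBMS_STATS.STATREC;
    vals  DBMS_STATS.NUMARRAY;
  BEGIN
    -- Known min and max values for the column.
    vals     := DBMS_STATS.NUMARRAY(20040901, 20050316);
    srec.epc := 2;
    DBMS_STATS.PREPARE_COLUMN_VALUES(srec, vals);

    DBMS_STATS.SET_COLUMN_STATS(
      ownname => 'DW',
      tabname => 'SALES_FACT',
      colname => 'LOAD_DATE_KEY',
      distcnt => 180,          -- illustrative NDV
      nullcnt => 0,
      srec    => srec,
      avgclen => 6);
  END;
  /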

All of our stats are also under change control, so we
back up the statistics each night, just in case, and
keep a month's worth of statistics offline in a change
management program.
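
With a user statistics table, the nightly backup is a
one-liner (the table and STATID naming here are just
for illustration):

  BEGIN
    -- One-off setup:
    --   DBMS_STATS.CREATE_STAT_TABLE('DW', 'STATS_HIST');
    -- Nightly: snapshot the schema stats, keyed by
    -- date, into that table.
    DBMS_STATS.EXPORT_SCHEMA_STATS(
      ownname => 'DW',
      stattab => 'STATS_HIST',
      statid  => 'N' || TO_CHAR(SYSDATE, 'YYYYMMDD'));
  END;
  /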

Histograms are collected and refreshed only when
required. The profile of our skewed data doesn't
usually change, so we create the histogram once and
then leave it alone until it is identified as a
problem through proactive monitoring.
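
A one-off histogram build, sketched with an invented
column name:

  BEGIN
    -- Histogram on the one known-skewed column. Note
    -- this also refreshes the table-level stats, so
    -- we re-apply our manual figures afterwards.
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname    => 'DW',
      tabname    => 'SALES_FACT',
      method_opt => 'FOR COLUMNS STATUS_CODE SIZE 254');
  END;
  /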

Our analyze run now completes in ~10 minutes on our
4TB DSS.


I suppose the main question may be WHY take this
approach? Well, we have a small subset of our data
being queried by the majority of our users...this is
recent data (last 6 months). However, we have 8 years
of data in the warehouse. Most of the data was
migrated from an older source system and therefore
has a very different profile from the data being
loaded from the new source systems.

Essentially, we need to skew the statistics towards
the data we will actually be querying...something we
couldn't do with traditional gather_stale options. We
completed this during a "baselining" period.
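
For contrast, the conventional setup we moved away
from is roughly this (table name invented):

  -- Enable DML tracking, then let GATHER STALE pick
  -- out the tables that have changed enough.
  ALTER TABLE sales_fact MONITORING;

  BEGIN
    DBMS_STATS.GATHER_SCHEMA_STATS(
      ownname => 'DW',
      options => 'GATHER STALE');
  END;
  /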

Of course there are disadvantages to this approach -
probably a little extra maintenance, much more
proactive monitoring is required, and the statistics
will undoubtedly drift out of sync over time and
require manual intervention...but in our situation I
believe the benefits outweigh the disadvantages.

I would be interested to hear from anyone else who has
tried this approach - or anyone who thoroughly
disagrees!

Many thanks,

Steve.

--- Niall Litchfield <niall.litchfield@xxxxxxxxx>
wrote:
> On Wed, 16 Mar 2005 12:32:55 +1100, Leng Kaing
> <Leng.Kaing@xxxxxxxxxxx> wrote:
> > Hi Mladen,
> > 
> > Sorry, I don't quite understand. What do you mean
> when you say "It also
> > takes away any purpose from gathering statistics
> based on STALE status."
> > 
> > I was going to turn on schema MONITORING and then
> use GATHER_STALE to
> > speed up the analyse command. Is this not the
> correct approach?
> 
> I think it rather depends what your goal is.
> Certainly using GATHER_STALE will reduce the time
> that stats gathering takes, but I'm not convinced
> that this is a good goal.
> 
> I think that what Mladen was on about was that the
> idea of monitoring and GATHER_STALE is that you only
> need to collect statistics on objects that have been
> subject to a significant amount of DML. The trouble
> with the modification tracking not being part of the
> transaction is that you can radically overstate the
> amount of DML that has actually occurred, leading to
> unnecessary analysis.
> 
> My view is that whilst gather_stale is a step in the
> right direction,
> it is predicated upon a bad assumption (or rather a
> rule of thumb)
> that changes of 10% are significant for all tables
> and changes of less
> than 10% aren't. If you could do ALTER TABLE T
> MONITORING THRESHOLD x%
> then I'd be more inclined to go along with it.
> 
> Incidentally I'm wondering how a strategy of never
> gathering stats at all, but relying on dynamic
> sampling at a high level (probably 4), would work.
> My guess is it would break down in high-load
> systems, but it might help more normal systems with
> specific problem SQLs because of the differences
> between dbms_stats and dynamic sampling. If anyone
> has looked at this idea it would be interesting to
> know what you found.
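> 
> For instance, something like this (table and
> predicate invented):
> 
>   select /*+ dynamic_sampling(t 4) */ count(*)
>   from   big_table t
>   where  skewed_col = :b1;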
> 
>  
> -- 
> Niall Litchfield
> Oracle DBA
> http://www.niall.litchfield.dial.pipex.com
> 

--
//www.freelists.org/webpage/oracle-l
