Re: Periodic Stats Collection -- CBO Stats Myth?

  • From: "Sriram Kumar" <k.sriramkumar@xxxxxxxxx>
  • To: tim@xxxxxxxxx
  • Date: Fri, 3 Oct 2008 14:50:04 -0400

Hi Tim,

Thanks for your inputs. Appriciate your inputs.

I had one more question. If we are performing sampling ( lets say 10
percent), how does the stats gatherer estimate the high value especially for
date columns?.

Best Regards

Sriram Kumar




On Wed, Oct 1, 2008 at 11:20 AM, Tim Gorman <tim@xxxxxxxxx> wrote:

> For changing just the high-value column-level statistics, don't waste time
> and resources re-gathering on the table, but instead set them manually using
> DBMS_STATS.SET_COLUMN_STATS.  You can just query the current SYSDATE value
> or query the NEXTVAL of the sequence in question to know the current
> high-value, so finding the values to set is cheap.
>
> You'll have to use the PREPARE_COLUMN_VALUES procedure to put the histogram
> endpoint(s) into PL/SQL tables for the SET_COLUMN_STATS call first, so there
> is some involved PL/SQL coding involving collections of record types.
>
> Hope this helps...
>
>
>
> Sriram Kumar wrote:
>
>  Hi William/Andrew,
>
> Thanks for your valuable inputs.
>
> I totally agree with you william on the potential impact of outdated
> low_value and high_value on date columns and columns that are generated
> using sequences.
>
> I have a follow up question on how this data is populated. If I had done an
> estimate percent say 10%, how does CBO guess the high value?.
>
> Best Regards
>
> Sriram Kumar
>
>
>
>
> On Wed, Oct 1, 2008 at 6:54 AM, William Robertson <
> william@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>> I've found that dates are more likely to present a problem because
>> sequentially generated keys are usually highly selective anyway. Say a
>> financial application loads three million trades a day. If the stats are not
>> updated the optimizer will think a given TRADE_ID has 0 occurrences (since
>> it's after the column's high endpoint) and round that up to 1, which may
>> well be correct anyway. However when you run a query for this week's trades
>> on a Friday and it thinks there are none when in fact there are 15 million
>> then you have a much bigger problem. That's my experience anyway.
>>
>>
>> 2008/10/1 Niall Litchfield <niall.litchfield@xxxxxxxxx>
>>
>>> dates and sequence driven key columns both exhibit this behaviour.
>>>
>>> On 30/09/2008, William Robertson <william@xxxxxxxxxxxxxxxxxxxx> wrote:
>>> > I'm not sure how using or not using histograms makes any fundamental
>>> > difference to your approach.
>>> >
>>> > One thing that comes to mind is the use of date predicates. If the
>>> > optimizer knows the highest sale date you have (for example) is 30 Sept
>>> > 2008, you are fine today and maybe some of next week, but in a few
>>> > months' time any queries for "last week's sales" may get
>>> unrealistically
>>> > low cardinality estimates leading to inappropriate index/nested loop
>>> > access paths, Cartesian joins and so on.
>>> >
>>> > If you use partitioning and you have date columns as
>>> > partition/subpartition keys you could have even more fun in store.
>>> >
>>> >
>>> > -----Original message-----
>>> > From: Sriram Kumar
>>> > Date: 30/9/08 22:41
>>> >> Hi Folks,
>>> >>
>>> >>        We had migrated a OLTP/Batch hybrid 9i RBO application to 10g
>>> >> and after rounds of tuning the application now is working at its best.
>>> >> We had experimented with various sampling  rates for statistics for
>>> >> various tables and now I feel we are in a optimal point of performance
>>> >> where CBO is picking up the right Indexes.
>>> >>
>>> >> 1) We do not use histograms
>>> >> 2) From here on the application volumes are bound increase
>>> proportionately
>>> >> 3) No new tables would be added in production.
>>> >> 4) As of now, We have locked the schema stats for the application
>>> >> schemas. The default scheduler job collects stats for all other
>>> schemas
>>> >>
>>> >> Since the increase in volume of data is quite proportional to the
>>> >> existing data, what would be the compelling reason to periodically
>>> >> recollect statistics for the tables that already have good execution
>>> >> plans?
>>> >>
>>> >> e.g lets say that there is a table with million records and with few
>>> >> indexes. We have collected the stats for this table and indexes with
>>> >> million records and all the access paths to this table has been
>>> >> validated that the CBO is picking up the correct optimal index in
>>> >> every case.
>>> >>
>>> >> Now the volume of the table increases to 2 million and from
>>> >> application point of view, the access paths that were valid for 1
>>> >> million would be the valid for 2 million as well.
>>> >>
>>> >> Given this scenario, is it required to collect statistics for 2
>>> >> million as well?. I opine that it would not be required but some of my
>>> >> colleagues feel that we should collect statistics periodically but I
>>> >> am not able to get a valid reason for periodically recollecting the
>>> >> stats?. Is this one of the myths?
>>> >>
>>> >> I would agree to periodically recollect stats if we are using
>>> >> histograms but we are not using histograms. Any other reason that
>>> >> would need a periodic statistic gathering?
>>>
>>
>>
>

Other related posts: