Re: Optimizer issue - cost of full table scans

Pavel-

This is actually a poor recommendation.  Using "estimate_percent=>null"
(i.e. a 100% compute) will be very costly, and since the OP is on 11g (11.2
in fact, as he is on Exadata V2), the default estimate_percent of
dbms_stats.auto_sample_size is much faster and usually achieves better than
99% of the accuracy of a 100% sample.

More details on why:
http://structureddata.org/2007/09/17/oracle-11g-enhancements-to-dbms_stats/
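For reference, the equivalent gather using the 11g default would look
something like this (a sketch only, reusing the owner/table names from the
example below; in 11g, omitting estimate_percent entirely has the same
effect):

```sql
-- Sketch: same gather as in the quoted example, but letting 11g's
-- AUTO_SAMPLE_SIZE (the default) choose the sample instead of a
-- 100% compute.
BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'GMMI_CDI',
    tabname          => 'CUST_ADDRESS',
    estimate_percent => dbms_stats.auto_sample_size,
    method_opt       => 'FOR ALL COLUMNS SIZE 1',
    cascade          => true);
END;
/
```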


On Mon, Sep 13, 2010 at 6:14 AM, Pavel Ermakov <ocp.pauler@xxxxxxxxx> wrote:

> Hi
>
> Try to gather table statistics with this approach:
>
> exec dbms_stats.gather_table_stats(ownname=>'GMMI_CDI',
> tabname=>'CUST_ADDRESS', estimate_percent=>null,
> method_opt=>'FOR ALL COLUMNS SIZE 1', cascade=>true)
>
> exec dbms_stats.gather_table_stats(ownname=>'GMMI_STG',
> tabname=>'NAME_ADDRESS_STG', estimate_percent=>null,
> method_opt=>'FOR ALL COLUMNS SIZE 1', cascade=>true)
>
>
> Best regards, Pavel.
>
>
>
> 2010/9/10 Brad Peek <brad_peek@xxxxxxxxx>
>
>
>>
>> Listers – Is there a way to “inform” the optimizer of the true cost of the
>> full scans?   I think there is, and I fully intend to RTFM, but I’m hoping
>> for a quick answer….
>>
>>
>>
>> Background:
>>
>> Our company has been in production on an Exadata V2 quarter-rack (SATA
>> drives) for several months now and has been generally pleased with
>> performance.   However, over the past few days a number of queries have
>> switched from hash joins to nested loop joins, and the performance of
>> those queries is MUCH worse.
>>
>>
>>
>> This is an ETL application using Informatica to apply batch updates from
>> staging tables to our target tables.
>>
>>
>>
>> I’ve just started looking at this, but I have begun to wonder if part of
>> the problem is that the optimizer is over-estimating the cost of full table
>> scans.   Right from the beginning (i.e. before this current problem) I
>> noticed that the time estimates in the execution plans were significantly
>> higher than our actual response times.   My theory is that as our tables
>> have grown we have reached a tipping point that has now caused the optimizer
>> to favor nested loops, and that the inflated cost of the full scans is a
>> contributing factor.
>>
>>
>>
>> I feel the fact that we are on Exadata likely compounds the issue, since
>> full scans are much faster than on non-Exadata hardware while single-block
>> reads are about the same (discounting the potential speed-up from flash
>> cache).
>>
>>
>>
>> Most of these queries are quite simple.  Below is a representative
>> example:
>>
>>
>>
>> SELECT gmmi_stg.name_address_stg.batch_id
>>      , gmmi_stg.name_address_stg.first_name
>>      , gmmi_stg.name_address_stg.last_name
>>      , gmmi_stg.name_address_stg.oid_customer
>>      , gmmi_stg.name_address_stg.oid_address
>>   FROM gmmi_cdi.cust_address
>>      , gmmi_stg.name_address_stg
>>  WHERE cust_address.oid_customer (+)    = name_address_stg.oid_customer
>>    AND cust_address.oid_address (+)     = name_address_stg.oid_address
>>    AND name_address_stg.ident_matched   = 'Y'
>>    AND name_address_stg.address_matched = 'Y'
>>    AND cust_address.oid_cust_address   IS NULL
>>    AND name_address_stg.batch_id        = (select 100908091626685.0 from
>> dual) ;
>>
>>
>>
>> Since I know the SELECT FROM DUAL on the last line will look odd to some
>> people, let me explain that we use that syntax to trick the optimizer into
>> treating the literal as a bind variable rather than checking the value
>> against the BATCH_ID column histogram.  We found that the optimizer would
>> greatly underestimate the cardinality if the value was not present when
>> statistics were last gathered.   At any rate, I don’t think it is part of
>> the current nested loop issue since that syntax has not changed.
>>
>>
>>
>> Note that if I “ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=200;” then it
>> goes back to hash joins.
>>
>>
>>
>> Thanks in advance.   BPeek
>>
>
>


-- 
Regards,
Greg Rahn
http://structureddata.org
