Re: Optimizer issue - cost of full table scans

Hi,

Try re-gathering table statistics with this approach:

exec dbms_stats.gather_table_stats(ownname=>'GMMI_CDI', tabname=>'CUST_ADDRESS', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1', cascade=>true)

exec dbms_stats.gather_table_stats(ownname=>'GMMI_STG', tabname=>'NAME_ADDRESS_STG', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1', cascade=>true)
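Note that METHOD_OPT=>'FOR ALL COLUMNS SIZE 1' drops the column histograms, so the optimizer will cost the joins from basic column statistics only.

If the plans still flip to nested loops after that, you can also tell the optimizer directly how cheap your multiblock reads are via system statistics. A rough sketch only - the timings and MBRC below are placeholders, not measured Exadata values; derive your own numbers from AWR before setting anything:

exec dbms_stats.set_system_stats(pname=>'MBRC',     pvalue=>64)
exec dbms_stats.set_system_stats(pname=>'MREADTIM', pvalue=>2)
exec dbms_stats.set_system_stats(pname=>'SREADTIM', pvalue=>8)

-- current values are visible in sys.aux_stats$
select sname, pname, pval1 from sys.aux_stats$;

With MREADTIM well below SREADTIM the optimizer costs full scans lower relative to indexed single-block access, which is closer to the truth on Exadata.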


Best regards, Pavel.



2010/9/10 Brad Peek <brad_peek@xxxxxxxxx>

>
>
> Listers – Is there a way to “inform” the optimizer of the true cost of the
> full scans?   I think there is, and I fully intend to RTFM, but I’m hoping
> for a quick answer….
>
>
>
> Background:
>
> Our company has been in production on an Exadata V2 quarter-rack (SATA
> drives) for several months now and have been generally pleased with
> performance.   However, over the past few days we have had a number of
> queries that switched from hash joins to nested loop joins and the
> performance of these queries is MUCH worse.
>
>
>
> This is an ETL application using Informatica to apply batch updates from
> staging tables to our target tables.
>
>
>
> I’ve just started looking at this, but I have begun to wonder if part of
> the problem is that the optimizer is over-estimating the cost of full table
> scans.   Right from the beginning (i.e. before this current problem) I
> noticed that the time estimates in the execution plans were significantly
> higher than our actual response times.   My theory is that as our tables
> have grown we have reached a tipping point that has now caused the optimizer
> to favor nested loops, and that the inflated cost of the full scans is a
> contributing factor.
>
>
>
> I suspect the fact that we are on Exadata compounds the issue, since full
> scans are much faster than on non-Exadata hardware while single-block reads
> are about the same (discounting the potential speed-up from flash cache).
>
>
>
> Most of these queries are quite simple.  Below is a representative example:
>
>
>
> SELECT gmmi_stg.name_address_stg.batch_id
>      , gmmi_stg.name_address_stg.first_name
>      , gmmi_stg.name_address_stg.last_name
>      , gmmi_stg.name_address_stg.oid_customer
>      , gmmi_stg.name_address_stg.oid_address
>   FROM gmmi_cdi.cust_address
>      , gmmi_stg.name_address_stg
>  WHERE cust_address.oid_customer (+)    = name_address_stg.oid_customer
>    AND cust_address.oid_address (+)     = name_address_stg.oid_address
>    AND name_address_stg.ident_matched   = 'Y'
>    AND name_address_stg.address_matched = 'Y'
>    AND cust_address.oid_cust_address    IS NULL
>    AND name_address_stg.batch_id        = (select 100908091626685.0 from dual) ;
>
>
>
> Since I know the SELECT FROM DUAL on the last line will look odd to some
> people, let me explain that we use that syntax to trick the optimizer into
> treating the literal as a bind variable rather than checking the value
> against the BATCH_ID column histogram.  We found that the optimizer would
> greatly underestimate the cardinality if the value was not present when the
> statistics were last gathered.   At any rate, I don’t think it is part of
> the current nested loop issue since that syntax has not changed.
>
>
>
> Note that if I “ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=200;” then it
> goes back to hash joins.
>
>
>
> Thanks in advance.   BPeek
>
