Pavel-

This is actually a poor recommendation. Using "estimate_percent=>null" will be
very costly, and since the OP is on 11g (11.2 in fact, as he is on Exadata V2),
the default estimate_percent of dbms_stats.auto_sample_size is much faster and
usually within >99% of the accuracy of a 100% sample. More details on why:
http://structureddata.org/2007/09/17/oracle-11g-enhancements-to-dbms_stats/

On Mon, Sep 13, 2010 at 6:14 AM, Pavel Ermakov <ocp.pauler@xxxxxxxxx> wrote:
> Hi
>
> Try to gather table statistics with this approach:
>
> exec dbms_stats.gather_table_stats(ownname=>'GMMI_CDI', tabname=>'CUST_ADDRESS', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1', cascade=>true)
>
> exec dbms_stats.gather_table_stats(ownname=>'GMMI_STG', tabname=>'NAME_ADDRESS_STG', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1', cascade=>true)
>
>
> Best regards, Pavel.
>
>
> 2010/9/10 Brad Peek <brad_peek@xxxxxxxxx>
>
>> Listers – Is there a way to “inform” the optimizer of the true cost of the
>> full scans? I think there is, and I fully intend to RTFM, but I’m hoping
>> for a quick answer….
>>
>> Background:
>>
>> Our company has been in production on an Exadata V2 quarter-rack (SATA
>> drives) for several months now and has been generally pleased with
>> performance. However, over the past few days we have had a number of
>> queries that switched from hash joins to nested loop joins, and the
>> performance of these queries is MUCH worse.
>>
>> This is an ETL application using Informatica to apply batch updates from
>> staging tables to our target tables.
>>
>> I’ve just started looking at this, but I have begun to wonder if part of
>> the problem is that the optimizer is over-estimating the cost of full table
>> scans. Right from the beginning (i.e. before this current problem) I
>> noticed that the time estimates in the execution plans were significantly
>> higher than our actual response times. My theory is that as our tables
>> have grown we have reached a tipping point that has now caused the
>> optimizer to favor nested loops, and that the inflated cost of the full
>> scans is a contributing factor.
>>
>> I suspect the fact that we are on Exadata is compounding the issue, since
>> full scans are much faster than on non-Exadata hardware while single-block
>> reads are about the same (discounting the potential speed-up from flash
>> cache).
>>
>> Most of these queries are quite simple. Below is a representative
>> example:
>>
>> SELECT gmmi_stg.name_address_stg.batch_id
>>      , gmmi_stg.name_address_stg.first_name
>>      , gmmi_stg.name_address_stg.last_name
>>      , gmmi_stg.name_address_stg.oid_customer
>>      , gmmi_stg.name_address_stg.oid_address
>>   FROM gmmi_cdi.cust_address
>>      , gmmi_stg.name_address_stg
>>  WHERE cust_address.oid_customer (+) = name_address_stg.oid_customer
>>    AND cust_address.oid_address (+) = name_address_stg.oid_address
>>    AND name_address_stg.ident_matched = 'Y'
>>    AND name_address_stg.address_matched = 'Y'
>>    AND cust_address.oid_cust_address IS NULL
>>    AND name_address_stg.batch_id = (select 100908091626685.0 from dual) ;
>>
>> Since I know the SELECT FROM DUAL on the last line will look odd to some
>> people, let me explain that we use that syntax to trick the optimizer into
>> treating the literal as a bind variable rather than checking the value
>> against the BATCH_ID column histogram.
>> We found that the optimizer would greatly underestimate the cardinality if
>> the value was not present when the statistics were last gathered. At any
>> rate, I don’t think it is part of the current nested loop issue since that
>> syntax has not changed.
>>
>> Note that if I issue “ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=200;” then
>> it goes back to hash joins.
>>
>> Thanks in advance. BPeek
>>

--
Regards,
Greg Rahn
http://structureddata.org
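
For anyone following along, here is a minimal sketch of what Greg's suggestion
amounts to on 11.2, using the schema and table names from the thread. Omitting
estimate_percent lets it default to dbms_stats.auto_sample_size; the
method_opt and cascade settings are simply carried over from Pavel's commands,
not anything confirmed as appropriate for these tables:

  begin
    -- estimate_percent is omitted on purpose so it defaults to
    -- dbms_stats.auto_sample_size (the 11g default) instead of
    -- forcing a 100% sample with estimate_percent=>null
    dbms_stats.gather_table_stats(
      ownname    => 'GMMI_CDI',
      tabname    => 'CUST_ADDRESS',
      method_opt => 'FOR ALL COLUMNS SIZE 1',
      cascade    => true);

    dbms_stats.gather_table_stats(
      ownname    => 'GMMI_STG',
      tabname    => 'NAME_ADDRESS_STG',
      method_opt => 'FOR ALL COLUMNS SIZE 1',
      cascade    => true);
  end;
  /

Whether 'FOR ALL COLUMNS SIZE 1' (i.e. no histograms) is the right choice for
these columns is a separate question that the thread does not settle.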