Hi,

Try to gather table statistics with this approach:

exec dbms_stats.gather_table_stats(ownname=>'GMMI_CDI',tabname=>'CUST_ADDRESS',estimate_percent=>null,method_opt=>'FOR ALL COLUMNS SIZE 1',cascade=>true)
exec dbms_stats.gather_table_stats(ownname=>'GMMI_STG',tabname=>'NAME_ADDRESS_STG',estimate_percent=>null,method_opt=>'FOR ALL COLUMNS SIZE 1',cascade=>true)

Best regards,
Pavel.

2010/9/10 Brad Peek <brad_peek@xxxxxxxxx>

> Listers – Is there a way to "inform" the optimizer of the true cost of
> full scans?  I think there is, and I fully intend to RTFM, but I'm hoping
> for a quick answer....
>
> Background:
>
> Our company has been in production on an Exadata V2 quarter-rack (SATA
> drives) for several months now, and we have been generally pleased with
> performance.  However, over the past few days a number of queries have
> switched from hash joins to nested loop joins, and the performance of
> these queries is MUCH worse.
>
> This is an ETL application using Informatica to apply batch updates from
> staging tables to our target tables.
>
> I've just started looking at this, but I have begun to wonder whether part
> of the problem is that the optimizer is over-estimating the cost of full
> table scans.  Right from the beginning (i.e. before this current problem)
> I noticed that the time estimates in the execution plans were
> significantly higher than our actual response times.  My theory is that
> as our tables have grown we have reached a tipping point that now causes
> the optimizer to favor nested loops, and that the inflated cost of the
> full scans is a contributing factor.
>
> I suspect the fact that we are on Exadata is compounding the issue, since
> full scans are much faster than on non-Exadata hardware while
> single-block reads are about the same speed (discounting the potential
> speed-up from flash cache).
>
> Most of these queries are quite simple.
> Below is a representative example:
>
> SELECT gmmi_stg.name_address_stg.batch_id
>      , gmmi_stg.name_address_stg.first_name
>      , gmmi_stg.name_address_stg.last_name
>      , gmmi_stg.name_address_stg.oid_customer
>      , gmmi_stg.name_address_stg.oid_address
>   FROM gmmi_cdi.cust_address
>      , gmmi_stg.name_address_stg
>  WHERE cust_address.oid_customer (+) = name_address_stg.oid_customer
>    AND cust_address.oid_address (+) = name_address_stg.oid_address
>    AND name_address_stg.ident_matched = 'Y'
>    AND name_address_stg.address_matched = 'Y'
>    AND cust_address.oid_cust_address IS NULL
>    AND name_address_stg.batch_id = (SELECT 100908091626685.0 FROM dual);
>
> Since I know the SELECT FROM DUAL on the last line will look odd to some
> people, let me explain that we use that syntax to trick the optimizer
> into treating the literal as a bind variable rather than checking the
> value against the BATCH_ID column histogram.  We found that the optimizer
> would greatly underestimate the cardinality if the value was not present
> when statistics were last gathered.  At any rate, I don't think it is
> part of the current nested loop issue, since that syntax has not changed.
>
> Note that if I "ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=200;" then it
> goes back to hash joins.
>
> Thanks in advance.  BPeek
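
For the opening question ("is there a way to inform the optimizer of the true cost of full scans"), one standard mechanism is workload system statistics via DBMS_STATS. The sketch below uses real DBMS_STATS procedures, but the numeric values are purely illustrative assumptions, not measurements from this system; the right values would come from gathering during a representative workload window.

```sql
-- Sketch only: teach the optimizer the actual relative cost of multiblock
-- (full scan) vs. single-block reads via system statistics.

-- Option 1: gather workload system statistics over a representative window.
exec dbms_stats.gather_system_stats('START');
-- ... let a representative workload run for an hour or so ...
exec dbms_stats.gather_system_stats('STOP');

-- Option 2: set the key values manually.  The numbers below are
-- hypothetical placeholders, NOT recommendations:
--   MBRC     = average blocks per multiblock read
--   SREADTIM = single-block read time in ms
--   MREADTIM = multiblock read time in ms
exec dbms_stats.set_system_stats('MBRC', 64);
exec dbms_stats.set_system_stats('SREADTIM', 5);
exec dbms_stats.set_system_stats('MREADTIM', 10);

-- Current values can be inspected in SYS.AUX_STATS$:
SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN';
```

A high MBRC combined with a low MREADTIM/SREADTIM ratio lowers the computed cost of full scans relative to indexed single-block access, which is the direction Exadata's fast smart scans would pull in; unlike OPTIMIZER_INDEX_COST_ADJ, this applies instance-wide rather than per session.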