Re: Optimizer issue - cost of full table scans

  • From: Brad Peek <brad_peek@xxxxxxxxx>
  • To: Andrew Kerber <andrew.kerber@xxxxxxxxx>, Kerry Osborne <kerry.osborne@xxxxxxxxxxx>
  • Date: Tue, 14 Sep 2010 16:03:34 -0700 (PDT)

Dropping the indexes is a way to get around this and other query optimization 
issues, but at least in our case the application applies individual key-based 
updates that depend on the indexes being in place. It will take some time to 
re-engineer our application to do less of that type of processing. I imagine 
there are more than a few packaged applications that wouldn't perform as well 
if the indexes were dropped. Since Exadata V2 is marketed for both DW and OLTP, 
I view the "drop the indexes" advice as problematic for Oracle.

By the way, the closest thing I have found to what I was looking for in terms 
of tailoring the optimizer for a particular environment is 

Rather than start another thread to get opinions on whether or not that is a 
good idea, I think I will just do some experimenting in our DEV/QA environment. 
For what it's worth, I found the following quote in section 13.4 of the 
Database Performance Tuning Guide (11gR2) -- "Oracle highly recommends that you 
gather system statistics". It was also recommended in an 11g upgrade 
presentation I found over the weekend.

I already know that a lot of people disagree with that recommendation... I 
promise to be careful.
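For anyone curious, the workload-based capture I have in mind uses DBMS_STATS; 
roughly like this (the 60-minute interval is just an illustrative value):

```sql
-- Start capturing workload system statistics (sreadtim, mreadtim, MBRC, ...)
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('START');

-- ... run a representative workload for a while ...

-- Stop the capture; the measured values become the active system stats
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('STOP');

-- Or capture over a fixed window instead (minutes):
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('INTERVAL', interval => 60);
```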

From: Andrew Kerber <andrew.kerber@xxxxxxxxx>
To: Kerry Osborne <kerry.osborne@xxxxxxxxxxx>
Cc: ORACLE-L L <oracle-l@xxxxxxxxxxxxx>; Brad Peek <brad_peek@xxxxxxxxx>; Greg 
Rahn <greg@xxxxxxxxxxxxxxxxxx>
Sent: Tue, September 14, 2010 8:44:49 AM
Subject: Re: Optimizer issue - cost of full table scans

I did have a conversation with an oracle consultant who has worked on exadata 
several months ago.  He (smugly) told me that the first step in tuning oracle 
for exadata is removing indexes...  So, if the index is causing a problem, why 
not drop it?

On Tue, Sep 14, 2010 at 8:36 AM, Kerry Osborne <kerry.osborne@xxxxxxxxxxx> wrote:

Well, I hate to say I agree with using that parameter, because in general I am 
dead set against "big knob" tuning like this, but in this case I think it might 
be a valid approach. Brad and I had an offline discussion and he has done a 
pretty thorough analysis of the issue. It does appear that the optimizer will 
favor a NL join using an index over a hash join using full table scans once it 
gets past a certain threshold. Unfortunately, the hash join is still far 
faster when it crosses that threshold. I believe that the optimizer is 
completely unaware of how much better full scans can be when using Exadata 
storage (someone please correct me if I'm wrong on this point). But if that's 
the case, then it stands to reason that there will be situations where the 
optimizer will favor index usage where it probably shouldn't. That's because 
the optimizer just doesn't know how fast the table scans can be. Removing 
indexes is a valid option (or making them invisible), but in some mixed usage 
scenarios this is not really possible. So I'm thinking that this may actually 
work pretty well. The parameter basically applies a fudge factor to the 
optimizer's calculations saying index access is faster or slower. So setting 
it to a high value makes it look like index access is slower. In this type of 
mixed use Exadata environment this seems to make some sense, and in fact Brad 
has had some success with setting this parameter at the session level for this 
particular set of statements already. I need to do some testing though to 
verify the theory.
>Feel free to jump in and correct me anywhere I misspoke, Brad.
>Kerry Osborne
>On Sep 14, 2010, at 7:33 AM, Andrew Kerber wrote:
>Did someone already mention the optimizer_index_cost_adj setting?  It sets the 
>relative cost of index use versus table scans. Setting it at something like 200 
>would probably force full table scans.  I have never tried that though.
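At the session level that would look something like this (200 is just the 
illustrative value mentioned above):

```sql
-- Make index access paths look twice as expensive to the optimizer
-- (default is 100; values below 100 favor indexes, above 100 favor scans)
ALTER SESSION SET optimizer_index_cost_adj = 200;

-- To revert to the default costing:
ALTER SESSION SET optimizer_index_cost_adj = 100;
```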
>>On Tue, Sep 14, 2010 at 4:05 AM, Niall Litchfield wrote:
>>Well, system stats generally boil down to what happens to sreadtim and 
>>mreadtim and the ratio between them. I'm not clear what would happen on a 
>>Database Machine, but I can imagine it *might* break the assumptions of the 
>>system costing model, not least that multi-block reads are slower than single 
>>block reads.
>>>I personally on a non-exadata enabled machine would deal with the stability 
>>>issue that Greg mentions by grabbing system stats repeatedly over a period 
>>>(a week or a month most likely) to a user defined stattab and then sticking 
>>>them in a spreadsheet to determine what values are representative of the 
>>>system. You then just set the stats once and for all. After all, I/O and CPU 
>>>capability rarely changes in a normal server (VMs of course blow this 
>>>assumption out of the water). I may be overly optimistic, but I personally 
>>>believe (and it is mostly a matter of logic/faith; sadly I don't have enough 
>>>actual data to say one way or the other) that representative system 
>>>statistics are likely to be a sensible choice for *most* systems -- certainly 
>>>ones where the OICA and OIC tricks are being used.
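The stattab workflow described above would be roughly the following (table 
name, statid, and the final values are all illustrative):

```sql
-- Create a user stat table to hold the captured values
EXEC DBMS_STATS.CREATE_STAT_TABLE(ownname => USER, stattab => 'MY_STATTAB');

-- Capture a window of system stats into the stat table, tagged with an id
BEGIN
  DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'INTERVAL',
                                 interval       => 60,
                                 stattab        => 'MY_STATTAB',
                                 statid         => 'WEEK1_MON');
END;
/

-- Later, after picking representative values from the spreadsheet,
-- set them once and for all:
EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM', 4);
EXEC DBMS_STATS.SET_SYSTEM_STATS('MREADTIM', 10);
EXEC DBMS_STATS.SET_SYSTEM_STATS('MBRC', 16);
```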
>>>On 13 Sep 2010 21:43, "Job Miller" <jobmiller@xxxxxxxxx> wrote:
>>>>I don't know how system stats are calculated, but if they are measured from 
>>>>the perspective of the database server, wouldn't they reflect the benefits 
>>>>of exadata storage?
>>>>The server knows it needs to do a FTS, the high water mark says that means 
>>>>reading N blocks, exadata storage skips 50% of the blocks via the storage 
>>>>indexes, and the multi-block read times from the server's perspective look 
>>>>fast because storage optimized away some of the i/o.
>>>>If that's true, the system stats would measure the typical system benefit 
>>>>exadata storage experienced during the system stats workload capture.
>>>>Does that make sense?  What am I missing?
>>>>> How does one maintain how much of a table is on disk vs memory? A 
>>>>> costing model that considers all those cases becomes increasingly more 
>>>>> difficult to program and manage.
>>>>The costing model doesn't really attempt to consider all that though, does 
>>>>it? I just thought that it measures the average system-wide performance 
>>>>based on the workload during capture. Hopefully what happens during that 
>>>>workload capture reflects your cache ratios for data, your data placement 
>>>>ratios and your true multi-block average read rates.
>>>>That average system-wide multi-block read should lead to pretty good plans 
>>>>for everything, instead of plans that are good for only a few special cases.
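For reference, the values the optimizer is actually working from can be 
inspected directly (this is the SYS-owned table where system stats live; the 
query is read-only):

```sql
-- Current system statistics as seen by the optimizer
SELECT pname, pval1
  FROM sys.aux_stats$
 WHERE sname = 'SYSSTATS_MAIN';
```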
>>>>--- On Mon, 9/13/10, Greg Rahn <greg@xxxxxxxxxxxxxxxxxx> wrote:
>>>>> From: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
>>>>> Subject: Re: Optimizer issue - cost of full table scans
>>>>> To: "Niall Litchfield" <niall.litchfield@xxxxxxxxx>
>>>>> Cc: "ORACLE-L" <oracle-l@xxxxxxxxxxxxx>
>>>>> Date: Monday, September 13, 2010, 2:16 PM
>>>>> There are no special changes as of
>>>>> today.  When I made the reference
>>>>> to representative stats, ...
>>>>> --
>>Andrew W. Kerber
>>'If at first you dont succeed, dont take up skydiving.'

Andrew W. Kerber

'If at first you dont succeed, dont take up skydiving.'
