Re: Indexing options to avoid contention -- 10gR2

  • From: David Aldridge <david@xxxxxxxxxxxxxxxxxx>
  • To: Stephane Faroult <sfaroult@xxxxxxxxxxxx>
  • Date: Wed, 9 Apr 2008 19:25:24 -0700 (PDT)

It's really a matter of the lesser of two evils, I think -- losing the index 
range scan or being vulnerable to index contention. The fast full index scan 
necessitated by the reverse index would still be better than the full table 
scan without any index, even if it's not up to the standards of an index range 
scan. I could cut down on the size by using a function-based index that nulls 
anything older than the first day that this process will run, which is a 
considerable amount of data, and redefining the historical cut-off every now 
and then to keep it trimmed down.

Updates are probably more frequent than inserts, given the nature of the table 
and the data. After all, the average number of times for a row to be updated 
only has to be slightly greater than one for it to be grater than the number of 
inserts by definition.

I've got doubts about the UNION -- I'd want to avoid the inherent DISTINCT by 
using UNION ALL, and slightly more complex logic in one of the two queries to 
avoid projecting the same row more than once if it was both created and updated 
within the capture time period. Unfortunately partitioning, while technically 
an option installed on the database, is not an option in the more casual sense. 
It would be a Really Big Deal and I have to tread very lightly on this system.

I think you're right about the FBI. It does sound like an attractive option.

Thanks Stephane

----- Original Message ----
From: Stephane Faroult <sfaroult@xxxxxxxxxxxx>
To: david@xxxxxxxxxxxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Sent: Wednesday, April 9, 2008 12:38:27 AM
Subject: Re: Indexing options to avoid contention -- 10gR2

David,

    First of all, forget about reverse indexes, because then you 
couldn't have range scans.
There is something that is missing from your analysis, which is the 
relative ratio of inserts vs updates.
Presumably, contention is mostly a concern for inserts, and create_dt. 
Unless updates are applied following a condition on create_dt, they 
shouldn't be much of an issue.
I believe that you should consider your query as the union of two 
different SELECT statements (it's quite possible that this is what the 
optimizer does, anyway). For the SELECT that involves create_dt, what 
I'd like to have in such a case is a table partitioned by day, which 
would spare me an index and make contention not much worse than what it 
is without an index (moreover, depending on your data, subpartitioning 
might help). This would make update_dt the only column worth indexing, 
adding some overhead because of index management but little contention 
issues.

If partitioning isn't an option (not least because the table is already 
partitioned), two separate columns are more costly than one, and I don't 
believe that ii) would be very efficient.  I'd have the same doubts as 
you concerning iii), you have high odds of sending the optimizer on a 
wrong track with such a query. The solution I'd like best is probably 
the function-based index, it would cost a little extra CPU, and all the 
more that rows are frequently updated, but if CPU isn't the bottleneck 
it may be OK. Note also that if you have more updates than inserts, it 
should relieve contention. I wouldn't consider such a simple FBI to be 
much more of a risk than a regular index.

HTH

Stéphane Faroult

David Aldridge wrote:
> I have a large and busy OLTP table, 100GB or so, against which there 
> is a need to capture changes. Until an asynchronous CDC solution is in 
> place we have to rely on two columns: create_date and update_date 
> (null until the first update), both being of DATE type of course.
>
> These are currently unindexed, but there is a desire to index them to 
> improve change capture performance for queries such as:
>  
> select ...
> from  ...
> where    (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate))
>        or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
>  
> The desire is obviously to provide the maximum performance benefit 
> while reducing the impact on the OLTP system.
>  
> I thought of four different indexing options:
>  
> i) Indexing the columns separately, leading in the best case to an 
> unusual execution plan where the indexes are range scanned and the 
> results merged before accessing the table.
> ii) A single composite index (create_date,update_date),leading to a 
> fast full index scan.
> iii) A single composite index (update_date,create_date), rewriting the 
> query predicate as ...
>    (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate) and 
> update_date is null)
> or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
> ... and leading to two index range scans. (not sure about this)
> iv) A single-column function based index on 
> (Nvl(update_dt,create_dt)) and rewriting the predicate appropriately.
> Whichever of these is chosen the growth pattern for the data naturally 
> tends towards index contention as all the new values are equal to 
> sysdate.
>  
> So the nub of my problem comes down to these questions:
>  
> Is it possible to mitigate the index contention problem with a 
> suitably high initrans values? Does it seem likely that implementing a 
> reverse index would be worthwhile in reducing contention (I could 
> probably take the pain of the fast full scan if it was, although it 
> would lead to a larger index due to the 50/50 block splits).
>  
> Would you regard implementing a function-based index as risky in any 
> significant way on major tables of a busy OLTP system? How about a 
> reverse function based index?
>  
> Basically, "what would you do"?
>  
> Maybe I'm overthinking it, but I'd like to go to the sytem developers 
> with a fully thought out set of options (even if they do tell me to 
> take a running jump anyway).

Other related posts: