Designing a process flow, was RE: Limit on number of columns in an index

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <veeeraman@xxxxxxxxx>, "'Bobak, Mark'" <Mark.Bobak@xxxxxxxxxxxx>
  • Date: Thu, 20 Mar 2008 10:07:53 -0400

Someone else has described the current technical limits of Oracle on this
(about 32 or 30 columns, depending on the index type), and a bit more than
two times the longest possible concatenated key plus rowid needs to fit in
the block size remaining after overhead. The 32/30 limit is arbitrary,
while the length limit is a logical requirement unless Oracle decides to
break through the huge technical barrier [one unlikely to be of real
benefit] of splicing blocks together. [It is far simpler to just use a
larger blocksize tablespace for that index if you ever really need it, and
Oracle could de-deprecate multiple block sizes by removing the artificially
imposed penalty of refusing to implement the option to automanage the
buffer cache size(s) when you have multiple block sizes.]
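
For what it's worth, a minimal sketch of the larger-blocksize-tablespace
option might look like the following. The names, sizes, and datafile path
are purely illustrative (none of them come from the original post), and it
assumes the instance is on an spfile so the 32K buffer cache can be set
dynamically:

    -- A non-default block size needs its own buffer cache first.
    ALTER SYSTEM SET db_32k_cache_size = 128M SCOPE=BOTH;

    -- Illustrative tablespace name and datafile path.
    CREATE TABLESPACE idx32k
      DATAFILE '/u01/oradata/mydb/idx32k01.dbf' SIZE 1G
      BLOCKSIZE 32K;

    -- The wide concatenated index goes into the 32K tablespace.
    CREATE INDEX big_key_ix ON stg_rows (col01, col02, col03 /* ... */)
      TABLESPACE idx32k;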

 

But you have asked "before it is considered a bad design."

 

I consider a design to be bad if it is ill suited to the problem at hand or
error prone.

 

In your case you are populating a temporary table and then updating the
table based on various criteria.

 

The first design question is: with a single row in hand at the beginning of
this process, could you make all the updates? If so, you almost certainly
should grab a buffer of rows of a convenient size (small enough to avoid
memory problems and accumulating too much required undo, large enough to
avoid generating significant commit overhead; without knowing anything
about your data I'd hazard only a very wide guess, such as greater than 500
and less than 100,000 rows per chunk processed, circa 2008 AD). Then you
pass over the inbound data once, without a monolith problem, and you can
carve the inbound data into as many parallel threads as is appropriate to
the available hardware. A crude means of buffering could be to write many
temporary tables in parallel and simply full-table-scan (FTS) each of them
in parallel to perform the desired manipulations.
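
To make the chunked, single-pass idea concrete, here is a minimal,
hypothetical sketch. The table STG_ROWS, the STATUS column, the derivation
rule, and the chunk size are all illustrative assumptions, not details from
the original post; the point is one pass, every per-row transform applied
together, and one commit per chunk:

    DECLARE
      CURSOR c IS SELECT rowid, status FROM stg_rows;
      TYPE rid_t IS TABLE OF ROWID;
      TYPE val_t IS TABLE OF stg_rows.status%TYPE;
      l_rids rid_t;
      l_vals val_t;
    BEGIN
      OPEN c;
      LOOP
        FETCH c BULK COLLECT INTO l_rids, l_vals LIMIT 10000; -- chunk size to tune
        EXIT WHEN l_rids.COUNT = 0;
        FOR i IN 1 .. l_rids.COUNT LOOP
          -- apply ALL the per-row derivations here, in one pass,
          -- instead of issuing many whole-table UPDATE statements
          IF l_vals(i) = 'NEW' THEN
            l_vals(i) := 'READY';
          END IF;
        END LOOP;
        FORALL i IN 1 .. l_rids.COUNT
          UPDATE stg_rows SET status = l_vals(i) WHERE rowid = l_rids(i);
        COMMIT;  -- bounds the undo and spreads the commit overhead per chunk
      END LOOP;
      CLOSE c;
    END;
    /

(Committing across the open driving cursor does risk ORA-01555 if undo is
tight, so size undo retention with the chunking in mind.)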

 

If, on the other hand, there is a set of ordered updates made on subsets of
the rows such that each ultimate tuple cannot be determined until other
phased updates are made to other rows, you may have an interesting problem.
For example, let's say you determine some column category value from other
column values and you treat categories differently based on rank. Then you
would need all the rows in hand at once for the rank analysis.
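
If you do land in that situation, an analytic function can often still get
the whole-set analysis done in a single pass. A hedged sketch, with
illustrative names (CATEGORY, SCORE, and CATEGORY_RANK are my assumptions,
not columns from the original post):

    MERGE INTO stg_rows t
    USING (SELECT rowid AS rid,
                  RANK() OVER (PARTITION BY category
                               ORDER BY score DESC) AS rnk
             FROM stg_rows) s
       ON (t.rowid = s.rid)
     WHEN MATCHED THEN UPDATE SET t.category_rank = s.rnk;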

 

At that point you need to play manual cost-based optimizer, based on your
long-term knowledge and/or likely assumptions about the inbound data,
combined with the transforms you need to apply to the data.

 

If a column is used as a filter/value factor only once, it makes no sense to
index it. (Think that through. If someone thinks I'm wrong, please educate
me.)

On the other hand, a single column reference *might* provide an effective
intermediate partitioning key, if quite different process groups follow
from such segregation, even if you have to Union-ize the result sets for
subsequent processing.
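
A hedged illustration of that idea (CATEGORY, the target table, and the
per-group arithmetic are all made up for the sketch): each group takes its
own transform path, and the results are Union-ized at the end:

    INSERT INTO results_tab (id, adjusted_val)
    SELECT id, val * 1.10 FROM stg_rows WHERE category = 'A'  -- group A rule
    UNION ALL
    SELECT id, val * 0.90 FROM stg_rows WHERE category = 'B'; -- group B rule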

 

Barring some correction, at some value greater than one filter/value
affecting reference, it becomes worthwhile to index a column. Whether there
are pairwise or setwise concatenated indexes that provide net utility is
another interesting question. During your temporary processing, it might be
the case that some column you indexed for efficiency in an early step is
later updated after it is no longer needed for any more decisions. In that
case it is likely productive to drop the index before that column starts
getting updated.
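
As a sketch of that index lifecycle (the index and column names are
illustrative assumptions):

    -- index supports the early, filter-heavy steps
    CREATE INDEX stg_status_ix ON stg_rows (status);

    -- ... early steps that filter on STATUS run here ...

    -- once STATUS no longer drives any decision, drop the index
    -- BEFORE the step that rewrites the column
    DROP INDEX stg_status_ix;
    UPDATE stg_rows SET status = 'DONE';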

 

Pruning the sets on which you must operate makes sense when it is a good bet
that the pruning costs less than performing a null operation on the rows
that would have been pruned.
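
In SQL terms, that bet might be as simple as the following (the predicates
are illustrative): one cheap pass to discard rows no later step will touch,
rather than dragging them through every subsequent update.

    DELETE FROM stg_rows
     WHERE qty = 0
       AND status = 'IGNORED';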

 

Usually, if you lay out the ordered transform steps (and consider whether
any of the steps are arbitrarily ordered and therefore offer flexibility
with respect to the transform), you can easily discern a pretty good
transform-centered design, and it will become obvious when in the overall
transform you need particular indexes.

 

Good luck!

 

mwf

 

 

  _____  

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Ram Raman
Sent: Wednesday, March 19, 2008 4:06 PM
To: Bobak, Mark
Cc: oracle-l
Subject: Re: Limit on number of columns in an index

 

The table has 60 columns. The problem is that this is a temporary table
which gets used only when the process is run and gets truncated afterwards.
While running, it currently gets populated with 1.4 million rows. The rows
are inserted and then updated based on several criteria. Towards the end of
the process, the values from this table are inserted into another permanent
table. The process now takes more than twice as long (2+ hrs) as before,
because of increased volumes of data.

   

The insertion was slow; I tuned it. Many update statements are issued
against the temp table. I am trying to make things faster. I tried creating
a few indexes on the temp table(!). I hope they don't slow down the
insertions. I did not try parallelizing the queries.

 

Env: PeopleSoft, 10g



 

<snip>

-- original question:

Is there any limit on the number of columns a BTree index can have, before
it is considered a bad design? I see a need to build an index with almost
30 columns.

<snip>

 
