RE: indexes in big character columns

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <sfaroult@xxxxxxxxxxxx>, <oralrnr@xxxxxxxxx>
  • Date: Wed, 26 Dec 2012 15:38:11 -0500

Excellent point by Stephane, as usual.

Now if you do have an application where using an index to dramatically
reduce the candidate pool and then you paw through the remaining actual full
values for an exact match, you *MAY* want to consider a hash (possibly
virtual column) on the long text and then index that surrogate. Or implement
the similar thing as a function based index.

So your predicate for that bit would end up looking like
...
...
and vhash_of_title = your_hash_function(<title_column_expression>)
and title = <title_column_expression>...


That will still not solve the difference in value between a full name and a
nick name, but it does potentially dramatically shrink the size of the index
"keys" so that many more will fit per page at the cost of having extra full
values to sift through on any hash collisions.

Whether this makes sense is a case-by-case measurement of actuals if you
have test data accumulated, and a case-by-case prognostication if you do
not.

And, a hash might make no sense at all in some applications, so if I see
this as a silver bullet or best practice, I'll be very disappointed.

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Stephane Faroult
Sent: Wednesday, December 26, 2012 2:32 AM
To: oralrnr@xxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: indexes in big character columns

Orlando,
   I believe that there are two sides to your question.

First of all, from a technical point of view, an index is basically a tree
plugged over an ordered list of keys and locators (in other words, the
values that are indexed and the rowids where to find them). The bigger the
keys, the bigger the index, the more bytes you have to shuffle when you
insert or delete rows, or update the value. Note though that you can have
index compression (it basically removes common prefixes/suffixes from
successive key values) which improves the byte count, at the cost of
additional table accesses for some queries since you no longer find the full
value in the index. And it's a question of actual value size, not column
size.

The second side is that index search is based on equality between input and
key value. The longer the key, the higher the odds of having divergences. If
you take the example of book titles, for instance, if people look for "Don
Quixote" (the full title of which is "The Ingenious Gentleman Don Quixote of
La Mancha") they will never find it in the index. That's where full text
indexing chimes in.

For me, the second reason is the bigger problem.

HTH,

--
Stephane Faroult
RoughSea Ltd <http://www.roughsea.com>
Konagora <http://www.konagora.com>
RoughSea Channel on Youtube <http://www.youtube.com/user/roughsealtd>




On 12/26/2012 06:53 AM, Orlando L wrote:
> All,
> Me and a colleague were having discussion about creating indexes in
> character columns that are big in size. He is of the opinion that big
> character columns should not be indexed. He could not explain why. I
> remember hearing about it, but did not know why that was the case. Does it
> still apply and can someone please explain. Thanks for the help.
>
> Orlando.
>
>
> --
> //www.freelists.org/webpage/oracle-l
>
>
>



--
//www.freelists.org/webpage/oracle-l


--
//www.freelists.org/webpage/oracle-l


Other related posts: