Re: indexes in big character columns

  • From: Stephane Faroult <sfaroult@xxxxxxxxxxxx>
  • To: jonathan@xxxxxxxxxxxxxxxxxx
  • Date: Fri, 28 Dec 2012 16:10:27 +0100

Jonathan,
> The index compression deduplicates the prefixes in the block, but that
> doesn't mean Oracle can't reconstruct the full index entry.
You are of course absolutely right. I had in mind the removal of common 
suffixes but that was a blast from the past.
> Perhaps the "shouldn't be indexed" is really related to the fact that the
> statistics on character columns that are longer than a few characters can
> be really bad.  Essentially Oracle has some stats about the first 6 bytes
> of a character column - though this can go up to 15 bytes for histograms on
> character columns, and in special cases 32 bytes.  If your 'long character
> columns" all start with http://www. then the optimizer may say that there's
> only one distinct value for the column across the whole table !
>
I wouldn't call a 15 character column a "long column" . You have a 
significant number of surnames that are longer than that, and not only 
in Madagascar. I am not sure that stats should enter in the picture - or 
perhaps to express myself better, the advice should probably be to have 
no stats on columns past a certain width rather than have no index. 
Although in that case Mark's suggestion of a hash could come handily.
I'm wondering whether when the index is compressed it wouldn't be 
possible to omit the prefix from histogram analysis, by replacing 
prefixes by an index (not as in "Oracle", as in "GIF"), hopefully 
shorter than 15 bytes, and completing by the really different bytes. 
That would solve the URL problem you mention.
-- 
Stephane Faroult
RoughSea Ltd <http://www.roughsea.com>
Konagora <http://www.konagora.com>
RoughSea Channel on Youtube <http://www.youtube.com/user/roughsealtd>


--
//www.freelists.org/webpage/oracle-l


Other related posts: