Re: Indexing a char column

  • From: MARK BRINSMEAD <mark.brinsmead@xxxxxxxxx>
  • To: mark.powell2@xxxxxx
  • Date: Tue, 10 Mar 2015 16:36:29 -0400

Agreed.

If the predicate (and the data distribution) is selective enough, then the
index may be beneficial.

Ultimately, somebody may have to decide which is more important -- faster
performance for the SELECT statement, or (slightly) slower performance on
INSERTS.  Its a tradeoff, but it is one than we cannot make observing from
here.

Putting an index on a VARCHAR2(150) might not be "ideal".  But I have seen
perfectly good results with things that look far worse.  At the end of the
day, if you need the index, you need the index.

If the size of the key is what's freaking you out  (and again, I have seen
useful indexes with MUCH larger keys), and the predicates of interest are
ALWAYS equality comparisons, then you could maybe consider an FBI based on
something like a 32-bit CRC_CHECKSUM (or maybe an MD5_HASH, although this
is much more computationally expensive), but I expect that in the end
indexing the original VARCHAR2(150) column will provide the best result.

As just pointed out, if you wanted to use an FBI like this, you would need
to modify the queries, anyway.  A predicate like:
   ...
*and some_tab_column = :SOME_CONSTANT*
would need to be rewritten as
   ...

*and (some_tab_column = :SOME_CONSTANT              AND
CRC_CHECKSUM(some_tab_column) = CRC_CHECKSUM(:SOME_CONSTANT)           )*
Really, I don't see the value to that -- at best the checksums will degrade
the selectivity of index only slightly, but occasional hash collisions will
be inevitable, so degrade the selectivity it will.

On Tue, Mar 10, 2015 at 3:38 PM, Powell, Mark <mark.powell2@xxxxxx> wrote:

>  I do not see how a function based index on the length of the varchar2
> data would be beneficial since there are only six lengths between 85 and 90
> and if the number of distinct lengths is fairly evenly distributed then the
> CBO is probably not going to use the index anyway because I would expect
> the data to fairly evenly distributed throughout the blocks.  If you have
> to visit very block or nearly every block why use an index?  What you
> really want is for this column to have hundreds of thousands of distinct
> values.
>
>
>
> The other problem with using a FBI is that for the CBO to use it you
> usually have to modify your code.  I had thought about the possibility of a
> FBI earlier but I was thinking if the data is the column of interest
> followed the pattern where a great many of the rows shared the first N
> characters, such as might happen with stored file name paths, then you
> might be able to build a FBI index on a substring of the tailing portion.
> But you have not described the nature of the data in the column so I held
> off on the suggestion since it will probably not be applicable.
>
>
>
> If this column in question is not updated after the initial insert then
> you are only adding one index key insert in overhead.  If the data is
> selective that has to be far cheaper than full scanning a million rows
> repeatedly.  It should not be that hard to test.
>
>
>
> *From:* stephen van linge [mailto:swvanlinge@xxxxxxxxx]
> *Sent:* Tuesday, March 10, 2015 3:18 PM
> *To:* Powell, Mark; oracle-l@xxxxxxxxxxxxx
>
> *Subject:* Re: Indexing a char column
>
>
>
> I'm not an expert by any means, but just a thought, if the distribution is
> fairly equal between 85 and 90 characters in length, would the
> performance benefit be enough to justify the DML overhead of adding a
> function-based index on length of the varchar2 column?  If not, I'm
> interested in hearing the reasons why it wouldn't as well.
>
>
>
> Thanks,
>
>
>
> Stephen
>
>
>    ------------------------------
>
> *From:* "Powell, Mark" <mark.powell2@xxxxxx>
> *To:* "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
> *Sent:* Tuesday, March 10, 2015 10:39 AM
> *Subject:* RE: Indexing a char column
>
>
>
> If the table has more than a million rows then I expect the cost of the
> repeated full table scans will exceed the cost of maintaining the index
> handily.  The real question is how distinctive is the column in question?
>   If the column is not fairly distinctive then an index will not be of any
> real use anyway.  In the query below is the subject column x or xa?  Is the
> other column already indexed?  Is the constant value shown a single value
> or should the code show comma delimited digits?  Just trying to make sure I
> understand what you posted.
>
>
>
>
>
>
>
>
>
> *From:* oracle-l-bounce@xxxxxxxxxxxxx [
> mailto:oracle-l-bounce@xxxxxxxxxxxxx <oracle-l-bounce@xxxxxxxxxxxxx>] *On
> Behalf Of *Cee Pee
> *Sent:* Tuesday, March 10, 2015 12:53 PM
> *To:* l.flatz@xxxxxxxxxx
> *Cc:* oracle-l@xxxxxxxxxxxxx
> *Subject:* Re: Indexing a char column
>
>
>
>
>
> It is a varchar2 column with max length of 150.  I checked and the lengths
> of strings stored currently vary from 85 to 90 characters distributed
> evenly across the values in about 1M rows. The sql in modified form with
> table name and col name changed:
>
>        *SELECT DISTINCT x FROM xa WHERE xa IN (SELECT xa FROM  xa WHERE x
> IN ('123456789101112') ) *
>
>
> table name and col name in the predicates are same; the subquery selects
> from the same user.table as the main query.
>
> The same SQL is getting executed several hundreds of times during the peak
> hours.
>
>
>
>
> On Tue, Mar 10, 2015 at 7:44 AM, Lothar Flatz <l.flatz@xxxxxxxxxx> wrote:
>
> Hi CP,
>
> I would rather not index a 150 char long  column. Is the actual length
> really 150?
> what is the comparison operator? I think there is a chance it is not
> equal, but substr, like or similar.
>
> Thanks
>
> Lothar
>
>
>
> On 10.03.2015 12:22, Cee Pee wrote:
>
> List
>
> I see a table being hit and queried a few thousand times over peak hours.
> The table has more than million rows and grows while operational. I did not
> measure the growth fully, but based on monitoring for several minutes, it
> is adding about 20 to 40 rows in few seconds, when i was checking via
> sqlplus. I dont see this growth most of the times though. It is a small
> table with 6 columns and the app in a web application. There is also
> another sql that is run constantly that accesses the rows based on one of
> the char columns which is 150 characters, but there is no index on the
> column which is causing a tablescan. Is this column a bad candidate for
> indexing. Any rule of thumb length for char columns above which adding
> index is considered moot? v11.2.
>
> CP.
>
>
>
> --
>
>
>
>
>
> ---
> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
> http://www.avast.com
>
> --
> //www.freelists.org/webpage/oracle-l
>
>
>
>
>

Other related posts: