Re: Indexing a char column

  • From: Lothar Flatz <l.flatz@xxxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Wed, 11 Mar 2015 08:54:54 +0100

Hi,

I think the index is inefficient because you do by no means need 80 character to uniquely identify a million rows. We can assume to need 12 digits for reasonable key playing it very safe and the 80-90 column still has an overhead of factor 7. This is also the reason why I am less concerned about coalitions using a FBI based on ora_hash, checkum, etc. The size of the index key does matter in many respects. Mainly there will be less entries per index block as compared to a key that is shorter. As a consequence the index is larger, will trigger more block splits, more maintenance overhead. It will have a likely lesser hit rate per block in buffer cache on average, will therefore not cache as well etc. I was once involved in a project where they where mimicking ISAM by padding key values to the max length. It was quite a nightmare how slow that stuff worked.

My gut feeling would still be that you might want to just go ahead and try it, but it is possible by the rate of data growth that will build a maintenance nightmare. I think a viable plan B might be to index on ORA_HASH(xa,any_prime_number_with_enough_digits) and use VPD to change your query to:

/SELECT DISTINCT x FROM xa WHERE xa IN (SELECT xa FROM xa WHERE x IN ('123456789101112') *and */*/ORA_HASH(xa,any_prime_number_with_enough_digits) = /**/ORA_HASH(/*/*/'123456789101112'/**,any_prime_number_with_enough_digits) *) /;

Thanks

Lothar

On 10.03.2015 22:29, Cee Pee wrote:
To answer Seth and Lothar, the column is almost unique though no unique constraint is declared. There is only one duplicate value among the million rows, could be due to testing. I think that would make the index highly selective. Can you please explain what you mean by the index will not be efficient.

I do not know yet how big the table can grow or what will be the delete or insert frequency, we are testing.


On Tue, Mar 10, 2015 at 12:16 PM, Lothar Flatz <l.flatz@xxxxxxxxxx <mailto:l.flatz@xxxxxxxxxx>> wrote:

    Hi,

    I think you could eventually use an Index here, but i am certainly
    not very happy with it considering the update frequency you are
    talking about. I have seen Indexes based on columns that long.
    There are not very efficient.
    I would consider clever ways to decrease the length of the column,
    e.g. use the ORA_HASH function. Certainly the consequence is
    rewriting the query though.
     If the data is not only growing but also gets deleted eventually
    you might even consider hash cluster. I normally do not recommend
    it, but that could be an exceptional case.
    If teh table is constantly growing forget about hash cluster.

    On other more conservative way would be using hash partitioning
    and compressing the table. Few columns is supporting compression.
    but your long column looks bad in that respect.
    Is the column unique?
    I have to admit the choice is not easy and I am not very happy
    with either suggestion.
    I am afraid you have to try.

    Thanks

    Lothar



    On 10.03.2015 17:52, Cee Pee wrote:

    It is a varchar2 column with max length of 150. I checked and the
    lengths of strings stored currently vary from 85 to 90 characters
    distributed evenly across the values in about 1M rows. The sql in
    modified form with table name and col name changed:

    /SELECT DISTINCT x FROM xa WHERE xa IN (SELECT xa FROM  xa WHERE
    x IN ('123456789101112') ) /

    table name and col name in the predicates are same; the subquery
    selects from the same user.table as the main query.
    The same SQL is getting executed several hundreds of times during
    the peak hours.



    On Tue, Mar 10, 2015 at 7:44 AM, Lothar Flatz <l.flatz@xxxxxxxxxx
    <mailto:l.flatz@xxxxxxxxxx>> wrote:

        Hi CP,

        I would rather not index a 150 char long  column. Is the
        actual length really 150?
        what is the comparison operator? I think there is a chance it
        is not equal, but substr, like or similar.

        Thanks

        Lothar


        On 10.03.2015 12:22, Cee Pee wrote:

            List

            I see a table being hit and queried a few thousand times
            over peak hours. The table has more than million rows and
            grows while operational. I did not measure the growth
            fully, but based on monitoring for several minutes, it is
            adding about 20 to 40 rows in few seconds, when i was
            checking via sqlplus. I dont see this growth most of the
            times though. It is a small table with 6 columns and the
            app in a web application. There is also another sql that
            is run constantly that accesses the rows based on one of
            the char columns which is 150 characters, but there is no
            index on the column which is causing a tablescan. Is this
            column a bad candidate for indexing. Any rule of thumb
            length for char columns above which adding index is
            considered moot? v11.2.

            CP.



--




        ---
        Diese E-Mail wurde von Avast Antivirus-Software auf Viren
        geprüft.
        http://www.avast.com

        --
        //www.freelists.org/webpage/oracle-l





--





    ------------------------------------------------------------------------
    Avast logo <http://www.avast.com/>    

    Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
    www.avast.com <http://www.avast.com/>





--






---
Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
http://www.avast.com

Other related posts: