Re: Indexing a char column

  • From: Cee Pee <carlospena999@xxxxxxxxx>
  • To: jonathan@xxxxxxxxxxxxxxxxxx
  • Date: Fri, 13 Mar 2015 17:41:19 -0500

Thanks to everyone who answered.  I have been too busy before to get back
quickly. Jonathan was spot on when he said it was a web application. Sorry
about not being open with table names; restrictions with the project. Here
is the plan with autotrace:

Execution Plan
----------------------------------------------------------
Plan hash value: 2021814760

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes |
Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |     1 |   206 |
4723   (2)| 00:00:57 |
|   1 |  HASH UNIQUE                  |                |     1 |   206 |
4723   (2)| 00:00:57 |
|*  2 |   HASH JOIN                   |                |     1 |   206 |
4722   (2)| 00:00:57 |
|   3 |    TABLE ACCESS BY INDEX ROWID| XA            |     1 |   103 |
4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | I1394550123266 |     1 |
|     3   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL          | XA            |   888K|    87M|
4707   (1)| 00:00:57 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("XA"="XA")
   4 - access("X"='1234567891011')


The stats are bit old, the table has close to million rows now. The testing
is paused temporarily due to a different reason. I am unable to find the
answer for how big the table will grow yet.

There are two indexes currently on the table, a single column index on a
number column (not in this query) and a two column index on column 'X' and
a third column. Jonathan's idea of building index on both columns in the
query to skip the table looks like a brilliant idea to me especially given
that the query is executed so many times.

The columns all have the same value upto first 10 characters. Here is an
example of a couple of rows:

siteAlias:G5TdJiXR4fRpJM2yvcdSy9JDZzsRcrjBsp8hBTfxThLYvT04KnkW!1121631113!1415190289822
siteAlias:RRcQJgXSblWT0MJt9BSC13RMTPy7JrhGpRGr7ZbrlWlFnqBgfcn0!1121631113!1415190290627
siteAlias:flBxJhXSJmGtcZk68T3ZvnZfMJqBWW970vvQ6qMXRWKnGQlB9v3R!1121631113!1415190290628

Can you please explain why the first 7 characters being same would mess up
the cardinality estimates. Is the column not indexed fully to the whole
length.




On Wed, Mar 11, 2015 at 4:01 AM, Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx
> wrote:

>
>  Given the nature of your query, the first thing to do is check the
> execution plan. If the query can be made efficient by indexing then you
> will need two indexes on the table, one on the X column to drive the
> subquery and one on the XA column to be the target of the values identified
> by the subquery.
>
>  As it stands it's possible (if the XA index already exists) that the
> tablescan is the outer select and the optimizer is using the subquery as a
> filter subquery (or maybe semi-join) through an index on XA.
>
>  If the table is only 1M rows, and you're adding  20 to 40 rows in a few
> seconds, then you seem to be growing the table at the rate of around 1M
> rows per week (ballpark) - so you must also have a process that is deleting
> those rows; I'd guess that this table is probably keeping track of
> web-connections in some way and that rapid access to this table is
> important to the web-users; that being the case, and given the "near
> uniqueness" of the column I think I would index it; if you're modified SQL
> is close to true I'd index (x, xa) so that the subquery could drive through
> an index range scan and avoid the table. The one special thing I would
> consider is whether or not to create a reverse key index: when I say this,
> I'm guessing that the table is a fairly fixed size with balanced inserts
> and deletes, and I'm guessing that the X column might be a constructed
> value that starts with a time or sequence-based component; if the latter
> (particularly) is not true and the data values arrive in a completely
> random order then reverse key won't be of any benefit; if mu guesses happen
> to be right you'll be minimising the impact of bug in the handling of
> leaf-block splits.  (
> https://jonathanlewis.wordpress.com/2009/09/19/index-itls/ )
>
>  The fact that the column averages 85 to 90 characters is a little
> undesirable - but not a disaster. On a related note, though, if many of
> the values look similar across the first 6 or 7 characters the optimizer
> could get really messed up with its cardinality estimates and produce
> massive over-estimates of the expected row counts.
>
>
> Regards
> Jonathan Lewis
> http://jonathanlewis.wordpress.com
> @jloracle
>
>
>>
>> On 10.03.2015 17:52, Cee Pee wrote:
>>
>>
>>  It is a varchar2 column with max length of 150.  I checked and the
>> lengths of strings stored currently vary from 85 to 90 characters
>> distributed evenly across the values in about 1M rows. The sql in modified
>> form with table name and col name changed:
>>
>>        *SELECT DISTINCT x FROM xa WHERE xa IN (SELECT xa FROM  xa WHERE
>> x IN ('123456789101112') ) *
>>
>> table name and col name in the predicates are same; the subquery selects
>> from the same user.table as the main query.
>>  The same SQL is getting executed several hundreds of times during the
>> peak hours.
>>
>>
>>>
>>> On 10.03.2015 12:22, Cee Pee wrote:
>>>
>>>> List
>>>>
>>>> I see a table being hit and queried a few thousand times over peak
>>>> hours. The table has more than million rows and grows while operational. I
>>>> did not measure the growth fully, but based on monitoring for several
>>>> minutes, it is adding about 20 to 40 rows in few seconds, when i was
>>>> checking via sqlplus. I dont see this growth most of the times though. It
>>>> is a small table with 6 columns and the app in a web application. There is
>>>> also another sql that is run constantly that accesses the rows based on one
>>>> of the char columns which is 150 characters, but there is no index on the
>>>> column which is causing a tablescan. Is this column a bad candidate for
>>>> indexing. Any rule of thumb length for char columns above which adding
>>>> index is considered moot? v11.2.
>>>>
>>>> CP.
>>>>
>>>
>>>
>>>  --
>>>
>>>
>>>
>>>
>>>
>>> ---
>>> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
>>> http://www.avast.com
>>>
>>> --
>>> //www.freelists.org/webpage/oracle-l
>>>
>>>
>>>
>>
>>
>>  --
>>
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>   [image: Avast logo] <http://www.avast.com/>
>>
>> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
>> www.avast.com
>>
>>
>
>
> --
>
>
>
>
>
>
>
> ------------------------------
>   [image: Avast logo] <http://www.avast.com/>
>
> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
> www.avast.com
>
>

Other related posts: