Re: Indexing a char column

From: "Jonathan Lewis" <jonathan@xxxxxxxxxxxxxxxxxx>
To: <oracle-l@xxxxxxxxxxxxx>
Date: Mon, 16 Mar 2015 12:01:33 -0000
The problem with cardinality estimates is to do with the way that Oracle 
gathers statistics, not to do with the actual index.

If you've got a histogram on the column and Oracle uses the histogram 
information to estimate cardinality then you run in to a problem with the 
limited string length that Oracle uses for histogram data.  Here's an 
introductory note to the topic: 
https://jonathanlewis.wordpress.com/2010/10/05/frequency-histogram-4/ and 
here's a note that highlights the silly accidents that can happen with very 
long, but similar, strings: 
https://jonathanlewis.wordpress.com/2010/10/13/frequency-histogram-5/




Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com/all-postings

Author: Oracle Core (Apress 2011)
http://www.apress.com/9781430239543

----- Original Message ----- 
From: "Cee Pee" <carlospena999@xxxxxxxxx>
To: <jonathan@xxxxxxxxxxxxxxxxxx>
Cc: <oracle-l@xxxxxxxxxxxxx>
Sent: Friday, March 13, 2015 10:41 PM
Subject: Re: Indexing a char column



Can you please explain why the first 7 characters being same would mess up
the cardinality estimates. Is the column not indexed fully to the whole
length.




On Wed, Mar 11, 2015 at 4:01 AM, Jonathan Lewis 
<jonathan@xxxxxxxxxxxxxxxxxx
> wrote:

>
>  Given the nature of your query, the first thing to do is check the
> execution plan. If the query can be made efficient by indexing then you
> will need two indexes on the table, one on the X column to drive the
> subquery and one on the XA column to be the target of the values 
> identified
> by the subquery.
>
>  As it stands it's possible (if the XA index already exists) that the
> tablescan is the outer select and the optimizer is using the subquery as 
> a
> filter subquery (or maybe semi-join) through an index on XA.
>
>  If the table is only 1M rows, and you're adding  20 to 40 rows in a few
> seconds, then you seem to be growing the table at the rate of around 1M
> rows per week (ballpark) - so you must also have a process that is 
> deleting
> those rows; I'd guess that this table is probably keeping track of
> web-connections in some way and that rapid access to this table is
> important to the web-users; that being the case, and given the "near
> uniqueness" of the column I think I would index it; if you're modified 
> SQL
> is close to true I'd index (x, xa) so that the subquery could drive 
> through
> an index range scan and avoid the table. The one special thing I would
> consider is whether or not to create a reverse key index: when I say 
> this,
> I'm guessing that the table is a fairly fixed size with balanced inserts
> and deletes, and I'm guessing that the X column might be a constructed
> value that starts with a time or sequence-based component; if the latter
> (particularly) is not true and the data values arrive in a completely
> random order then reverse key won't be of any benefit; if mu guesses 
> happen
> to be right you'll be minimising the impact of bug in the handling of
> leaf-block splits.  (
> https://jonathanlewis.wordpress.com/2009/09/19/index-itls/ )
>
>  The fact that the column averages 85 to 90 characters is a little
> undesirable - but not a disaster. On a related note, though, if many of
> the values look similar across the first 6 or 7 characters the optimizer
> could get really messed up with its cardinality estimates and produce
> massive over-estimates of the expected row counts.
>
>
> Regards
> Jonathan Lewis
> http://jonathanlewis.wordpress.com
> @jloracle
>
>
>>
>> On 10.03.2015 17:52, Cee Pee wrote:
>>
>>
>>  It is a varchar2 column with max length of 150.  I checked and the
>> lengths of strings stored currently vary from 85 to 90 characters
>> distributed evenly across the values in about 1M rows. The sql in 
>> modified
>> form with table name and col name changed:
>>
>>        *SELECT DISTINCT x FROM xa WHERE xa IN (SELECT xa FROM  xa WHERE
>> x IN ('123456789101112') ) *
>>
>> table name and col name in the predicates are same; the subquery selects
>> from the same user.table as the main query.
>>  The same SQL is getting executed several hundreds of times during the
>> peak hours.
>>
>>
>>>
>>> On 10.03.2015 12:22, Cee Pee wrote:
>>>
>>>> List
>>>>
>>>> I see a table being hit and queried a few thousand times over peak
>>>> hours. The table has more than million rows and grows while 
>>>> operational. I
>>>> did not measure the growth fully, but based on monitoring for several
>>>> minutes, it is adding about 20 to 40 rows in few seconds, when i was
>>>> checking via sqlplus. I dont see this growth most of the times though. 
>>>> It
>>>> is a small table with 6 columns and the app in a web application. 
>>>> There is
>>>> also another sql that is run constantly that accesses the rows based 
>>>> on one
>>>> of the char columns which is 150 characters, but there is no index on 
>>>> the
>>>> column which is causing a tablescan. Is this column a bad candidate 
>>>> for
>>>> indexing. Any rule of thumb length for char columns above which adding
>>>> index is considered moot? v11.2.
>>>>
>>>> CP.
>>>>
>>>
>>>
>>>  --
>>>
>>>
>>>
>>>
>>>
>>> ---
>>> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
>>> http://www.avast.com
>>>
>>> --
>>> //www.freelists.org/webpage/oracle-l
>>>
>>>
>>>
>>
>>
>>  --
>>
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>   [image: Avast logo] <http://www.avast.com/>
>>
>> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
>> www.avast.com
>>
>>
>
>
> --
>
>
>
>
>
>
>
> ------------------------------
>   [image: Avast logo] <http://www.avast.com/>
>
> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
> www.avast.com
>
>



-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2015.0.5751 / Virus Database: 4306/9311 - Release Date: 03/15/15



-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2015.0.5751 / Virus Database: 4306/9313 - Release Date: 03/16/15

--
//www.freelists.org/webpage/oracle-l
References:
- Indexing a char column
  - From: Cee Pee
- Re: Indexing a char column
  - From: Lothar Flatz
- Re: Indexing a char column
  - From: Cee Pee
- Re: Indexing a char column
  - From: Lothar Flatz
- Re: Indexing a char column
  - From: Cee Pee
- Re: Indexing a char column
  - From: Lothar Flatz
- RE: Indexing a char column
  - From: Jonathan Lewis
- Re: Indexing a char column
  - From: Cee Pee
Re: Indexing a char column

Other related posts: