The problem with cardinality estimates is to do with the way that Oracle gathers statistics, not to do with the actual index. If you've got a histogram on the column and Oracle uses the histogram information to estimate cardinality then you run in to a problem with the limited string length that Oracle uses for histogram data. Here's an introductory note to the topic: https://jonathanlewis.wordpress.com/2010/10/05/frequency-histogram-4/ and here's a note that highlights the silly accidents that can happen with very long, but similar, strings: https://jonathanlewis.wordpress.com/2010/10/13/frequency-histogram-5/ Regards Jonathan Lewis http://jonathanlewis.wordpress.com/all-postings Author: Oracle Core (Apress 2011) http://www.apress.com/9781430239543 ----- Original Message ----- From: "Cee Pee" <carlospena999@xxxxxxxxx> To: <jonathan@xxxxxxxxxxxxxxxxxx> Cc: <oracle-l@xxxxxxxxxxxxx> Sent: Friday, March 13, 2015 10:41 PM Subject: Re: Indexing a char column Can you please explain why the first 7 characters being same would mess up the cardinality estimates. Is the column not indexed fully to the whole length. On Wed, Mar 11, 2015 at 4:01 AM, Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx > wrote: > > Given the nature of your query, the first thing to do is check the > execution plan. If the query can be made efficient by indexing then you > will need two indexes on the table, one on the X column to drive the > subquery and one on the XA column to be the target of the values > identified > by the subquery. > > As it stands it's possible (if the XA index already exists) that the > tablescan is the outer select and the optimizer is using the subquery as > a > filter subquery (or maybe semi-join) through an index on XA. > > If the table is only 1M rows, and you're adding 20 to 40 rows in a few > seconds, then you seem to be growing the table at the rate of around 1M > rows per week (ballpark) - so you must also have a process that is > deleting > those rows; I'd guess that this table is probably keeping track of > web-connections in some way and that rapid access to this table is > important to the web-users; that being the case, and given the "near > uniqueness" of the column I think I would index it; if you're modified > SQL > is close to true I'd index (x, xa) so that the subquery could drive > through > an index range scan and avoid the table. The one special thing I would > consider is whether or not to create a reverse key index: when I say > this, > I'm guessing that the table is a fairly fixed size with balanced inserts > and deletes, and I'm guessing that the X column might be a constructed > value that starts with a time or sequence-based component; if the latter > (particularly) is not true and the data values arrive in a completely > random order then reverse key won't be of any benefit; if mu guesses > happen > to be right you'll be minimising the impact of bug in the handling of > leaf-block splits. ( > https://jonathanlewis.wordpress.com/2009/09/19/index-itls/ ) > > The fact that the column averages 85 to 90 characters is a little > undesirable - but not a disaster. On a related note, though, if many of > the values look similar across the first 6 or 7 characters the optimizer > could get really messed up with its cardinality estimates and produce > massive over-estimates of the expected row counts. > > > Regards > Jonathan Lewis > http://jonathanlewis.wordpress.com > @jloracle > > >> >> On 10.03.2015 17:52, Cee Pee wrote: >> >> >> It is a varchar2 column with max length of 150. I checked and the >> lengths of strings stored currently vary from 85 to 90 characters >> distributed evenly across the values in about 1M rows. The sql in >> modified >> form with table name and col name changed: >> >> *SELECT DISTINCT x FROM xa WHERE xa IN (SELECT xa FROM xa WHERE >> x IN ('123456789101112') ) * >> >> table name and col name in the predicates are same; the subquery selects >> from the same user.table as the main query. >> The same SQL is getting executed several hundreds of times during the >> peak hours. >> >> >>> >>> On 10.03.2015 12:22, Cee Pee wrote: >>> >>>> List >>>> >>>> I see a table being hit and queried a few thousand times over peak >>>> hours. The table has more than million rows and grows while >>>> operational. I >>>> did not measure the growth fully, but based on monitoring for several >>>> minutes, it is adding about 20 to 40 rows in few seconds, when i was >>>> checking via sqlplus. I dont see this growth most of the times though. >>>> It >>>> is a small table with 6 columns and the app in a web application. >>>> There is >>>> also another sql that is run constantly that accesses the rows based >>>> on one >>>> of the char columns which is 150 characters, but there is no index on >>>> the >>>> column which is causing a tablescan. Is this column a bad candidate >>>> for >>>> indexing. Any rule of thumb length for char columns above which adding >>>> index is considered moot? v11.2. >>>> >>>> CP. >>>> >>> >>> >>> -- >>> >>> >>> >>> >>> >>> --- >>> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft. >>> http://www.avast.com >>> >>> -- >>> //www.freelists.org/webpage/oracle-l >>> >>> >>> >> >> >> -- >> >> >> >> >> >> >> >> ------------------------------ >> [image: Avast logo] <http://www.avast.com/> >> >> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft. >> www.avast.com >> >> > > > -- > > > > > > > > ------------------------------ > [image: Avast logo] <http://www.avast.com/> > > Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft. > www.avast.com > > ----- No virus found in this message. Checked by AVG - www.avg.com Version: 2015.0.5751 / Virus Database: 4306/9311 - Release Date: 03/15/15 ----- No virus found in this message. Checked by AVG - www.avg.com Version: 2015.0.5751 / Virus Database: 4306/9313 - Release Date: 03/16/15 -- //www.freelists.org/webpage/oracle-l