RE: index columns

  • From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
  • To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Sat, 18 Apr 2015 08:24:40 +0000



I think an example like this is worth a couple of pages on its own,
particularly in light of my comment about "consider the full set of indexes on
the table".

First - it is extremely likely that an index on just (account_id) would exist
because it's "almost distinct" and looks like the sort of column which might
commonly be used to find data (irrespective of whether the flag was Y or N).
This "sole use" of column would (almost guaranteeably) over-ride any choice of
column ordering.

Second - If you had a query that was concerned with the 99% YES rows then you
could argue that the 1% overhead introduced by going to the table rather than
resolving in the index was not sufficient to justify adding the column to the
index anyway - presumably it's a column that can change value, so there's a
cost in undo and redo to consider if you do add it.

Third - if you had a common requirement to pick account_ids from in the 1% NO
rows then you could argue the case for creating a function-based index (and
modifying the SQL to match) defined as something like: (case yes_or_no when
'NO' then account_id end)


Regarding the accidental use of the index - accidents will always happen with
the optimizer, and the potential for side effects and anomalies is just part of
the consideration that goes into the indexing set and planning for special
queries. In this example, by the way, I'm not sure that column group stats
would be likely to help: with a 99/1 split I imagine you'd have a histogram on
yes_or_no as a general aid to cardinality calculations, and with "almost
distinct" on the account_id and only two values on yes_or_no the column group
"num_distinct" would be very close to the number of distinct values for
account_id, and should echo the distinct_keys in the two-column index anyway:
and you wouldn't get a useful histogram on the column group, then you'd have to
worry about odd interference between the column group stats and the histogram
on yes_or_no. (I don't know what the state of play is at present when
individual columns have histograms and the column group doesn't, but I recall
that Chris Antognini found at least one anomaly:
http://antognini.ch/2014/02/extension-bypassed-because-of-missing-histogram/ )






Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [oracle-l-bounce@xxxxxxxxxxxxx] on behalf
of Tim Gorman [tim@xxxxxxxxx]
Sent: 18 April 2015 00:34
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: index columns

Consider the scenario where there are two columns, ACCOUNT_ID and YES_OR_NO,
where ACCOUNT_ID is almost distinct and YES_OR_NO is populated with the "YES"
in 99% of rows and "NO" in the remainder?

If both columns are used in a query, then clearly the ordering of the columns
doesn't matter, as cited in Richard Foote's referenced blog posts.

But what if *only* the YES_OR_NO column is used in a query? If that column is
leading the index, then it is quite possible that the optimizer might choose to
perform an INDEX RANGE SCAN on that index rather than performing a FULL table
scan, which would be disastrous for performance. However, if ACCOUNT_ID is the
leading column, then that is far less likely to happen, although a rogue INDEX
SKIP SCAN remains a remote possibility.

Column statistics, especially extended column statistics such as column groups
(as described by Maria Colgan online
here<https://blogs.oracle.com/optimizer/entry/how_do_i_know_what_extended_statistics_are_needed_for_a_given_workload>),
can provide the optimizer enough information to prevent it from making an
inappropriate decision in the scenario described above.

If column statistics or extended column statistics are not guaranteed to be in
use and gathered correctly, then it is definitely worthwhile to consider
cardinality when ordering columns in a concatenated index.

If your goal is that your trousers never fall down accidentally, then wear both
a belt and suspenders.





On 4/17/15 16:44, Stefan Koehler wrote:

Hi Orlando,
oh a classic oldie and common myth :-))

I think the only source i need to point to is Richard Foote's blog post about
this:
-
https://richardfoote.wordpress.com/2008/02/13/its-less-efficient-to-have-low-cardinality-leading-columns-in-an-index-right/
-
https://richardfoote.wordpress.com/2008/02/15/clustering-factor-a-consideration-in-index-lead-column-decision-sweet-thing/

Best Regards
Stefan Koehler

Freelance Oracle performance consultant and researcher
Homepage: http://www.soocs.de
Twitter: @OracleSK



Orlando L <oralrnr@xxxxxxxxx><mailto:oralrnr@xxxxxxxxx> hat am 18. April 2015
um 00:02 geschrieben:

All

My colleague and I got into a discussion about indexes. I feel that putting
the most selective column first while creating multi column indexes is
the correct approach, followed by second most selective column as the second
column in the index and so on. My colleague feels that the order does
not matter. Can someone clarify.

OL



--
//www.freelists.org/webpage/oracle-l






Other related posts: