"Multi-Column" Histogram / Histogram for Concatenated Index

  • From: Hemant K Chitale <hkchital@xxxxxxxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Thu, 13 Dec 2007 23:35:38 +0800

I have a table with an Index on 4 columns. One particular column (the third in the index)
is highly skewed with one value accounting for 70% of the rows.
Queries with the Index columns as predicates are perfectly fine for all values
other than that one value.
Even if I gather a histogram on the skewed column, queries for the "bad" value
still use the index.  The bad value alone accounts for 70% of the rows.
In combination with a particular set of values for the other columns, the query
fetches about 55% of the rows.  I do not want to use the index.

Say :
Table TAB_1
Columns  C_1, C_2, C_3, C_4, C_5, C_6, C7
Index on C_2,C_4,C_5,C6

Value "881" in column C_5 accounts for 70% of the rows.
A particular combination of values for C_2, C_4, C_5 ("881") and C_6
account for 55% of the rows.

I cannot change the SQL code.  It is "generated" by the application.
Even if I were to change the code,  I find that this SQL query runs
in a loop for different combinations  of these columns.  It is only
one combination that is my problem.  The combinations of
C_2, C_4, C_5, C_6   are driven from three other tables
Besides these columns C_1 and  C_7 are also linked to the other tables.
Some joins are equi-joins, some are ranges.
{Yes, this sounds like a fact table and dimension tables, doesn't it ?}

This is, 64-bit.
Statistics are gathered nightly.  I am allowed to modify the gather statistics
job to collect specific histograms.

I find that gathering histogram on C_5 alone doesn't help.
{There are 6- distinct values, and I can specify upto 254 buckets}.
Queries still use the index

Questions :
1.  If I rebuild the index to use C_5 as the leading column would
Oracle be using the Histogram to convert queries for C_5='881'
into FullTableScans ?
2.  Should I gather histograms on all the columns of this index ?

This is not exactly like the "correlated" columns that is used as an
example for 11g MultiColumn Statistics ("Extended Statistics")
as there is no relationship between the column values.
What I am looking for is something like a Histogram on the Index Key
(rather than just "num_rows" and "distinct_keys", I want the density
for different combinations of key values -- a Histogram on the Index !)

Hemant K Chitale

"There is more to life than increasing its speed."
Mohandas Gandhi Quotes : http://www.brainyquote.com/quotes/authors/m/mohandas_gandhi.html


Other related posts: