RE: re "CBO - A Configuration Roadmap" -- Histograms on Non-Indexed Columns

  • From: "Christian Antognini" <Christian.Antognini@xxxxxxxxxxxx>
  • To: <hkchital@xxxxxxxxxxxxxx>
  • Date: Fri, 29 Dec 2006 07:35:20 +0100

Hi Hemant

You probably already got enough information from Mladen and especially
from Wolfgang... Anyway, let me show you the demo I eventually show when
I present that paper...

> Can you explain the second sentence about Histograms on non-indexed
> columns as well ?
> How would they be useful ?  I would understand Histograms on
> non-indexed columns
> as providing _me_  information about the data in those columns and
> which could allow
> me to make a judgement call as to whether I should index the
> columns.  How does the
> optimizer use Histograms on non-indexed columns ?

First of all let's create a table with a primary key (ID) and two
additional columns (COL1 and COL2). For the purpose of this demo
especially COL1 is interesting... In fact it contains one time all the
values from 1 up to 500 and 500 times the value 666.

SQL> CREATE TABLE t (id, col1, col2, CONSTRAINT t1_pk PRIMARY KEY (id))
  2  AS
  3  SELECT rownum, CASE WHEN rownum>500 THEN 666 ELSE rownum END,
rownum
  4  FROM dba_objects WHERE rownum <= 1000;

Then let's gather statistics without histograms.

SQL> BEGIN
  2   dbms_stats.gather_table_stats(
  3     ownname=>user,
  4     tabname=>'T',
  5     cascade=>TRUE,
  6     method_opt=>'for all columns size 1');
  7  END;
  8  /

Now let's check the query optimizer estimations when COL1 is referenced
in the WHERE clause... Since no histograms are available and that the
query optimizer "only" knows min/max value and number of distinct
values, its estimation for the predicate "t2.col1 = 666" (see line 2 of
the execution plan) is that two rows are returned. Why two? Basically
because it multiply the number of rows store in the table (1000) by the
selectivity of the predicate (1/num_distinct = 1/501 = 0.002). The
interesting thing to notice is that this estimation, i.e. the high
selectivity, leads to a nested loop.

SQL> SET AUTOTRACE TRACE EXP
SQL> SELECT * FROM t t1 JOIN t t2 USING (id) WHERE t2.col1 = 666;

Execution Plan
------------------------------------------------------
Plan hash value: 2076359606

------------------------------------------------------
| Id  | Operation                    | Name  | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     2 |
|   1 |  NESTED LOOPS                |       |     2 |
|*  2 |   TABLE ACCESS FULL          | T     |     2 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T     |     1 |
|*  4 |    INDEX UNIQUE SCAN         | T1_PK |     1 |
------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T2"."COL1"=666)
   4 - access("T1"."ID"="T2"."ID")

Then let's gather a histogram on COL1...

SQL> BEGIN
  2   dbms_stats.gather_table_stats(
  3     ownname=>user,
  4     tabname=>'T',
  5     cascade=>TRUE,
  6     method_opt=>'for columns id size 1, col1 size 254');
  7  END;
  8  /

Now the estimations have changed. In fact with the histogram on COL1 the
query optimizer knows that the predicate "t2.col1 = 666" has a much
higher selectivity (circa 0.5). And since the number of returned rows is
much higher now the query optimizer decides to use a hash join... 

SQL> SELECT * FROM t t1 JOIN t t2 USING (id) WHERE t2.col1 = 666;

Execution Plan
-------------------------------------------
Plan hash value: 2135975663

-------------------------------------------
| Id  | Operation          | Name | Rows  |
-------------------------------------------
|   0 | SELECT STATEMENT   |      |   506 |
|*  1 |  HASH JOIN         |      |   506 |
|*  2 |   TABLE ACCESS FULL| T    |   506 |
|   3 |   TABLE ACCESS FULL| T    |  1000 |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."ID"="T2"."ID")
   2 - filter("T2"."COL1"=666)


So, the aim of this demo is to show that the query optimizer, thanks to
a histogram, doesn't change the access path for a table but the join
method.


Best regards,
Chris
--
//www.freelists.org/webpage/oracle-l


Other related posts: