Re: ** histograms

  • From: "Stephen Booth" <stephenbooth.uk@xxxxxxxxx>
  • To: ajoshi977@xxxxxxxxx
  • Date: Sun, 19 Nov 2006 20:22:58 +0000

On 19/11/06, A Joshi <ajoshi977@xxxxxxxxx> wrote:
Hi,
  About use of histograms : I think histograms are useful for indexes on
columns that are not very selective. However I came across note 1031826.6 on
metalink. About maintenance and space cost. I think space cost is negligible
and can be ignored. About maintenance : does it mean statistics need to be
gather often? Or does it mean some other cost.

Question : Is there any other overhead or any other negative impact of using
histograms?

One downside I've been hearing a lot about over the last couple of
years results from Bind Variable Peeking
(http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#sthref1254).
When a where clause contains bind variables (either explicitly
declared or synthesised due to cursor_sharing=force) the optimizer can
see what they are and use the information to build an execution plan
during the hard parse phase.  If the application executes the same
query again whilst the plan is still cached then it will reuse the
same plan even if the bind variables have different values (no need to
reparse so no peeking).


Suppose you have a table (t) where a field (thefield) has a number of
possible values, one of which  (say 'X') appears in  50% of the
records whilst the others are uniformly distributed over the remaining
50%.  Very highly skewed and obviously you would expect to get a
different plan for the high frequency value (probably a full table
scan) than you would for a low frequency one say 'Y' (probably an
index read anbd access by row id).  You have an index on thefield and
have collected stats with histograms on that table and thefield.

If you don't use bind variables then a queries like:

select [field_list]
from t
where thefield='X';

and

select [field_list]
from t
where thefield='Y';

will both be hard parsed get different execution plans appropriate to
the values used in the where clause.

Now suppose you rewrite the query replaciong the constants with a bind
variable.  The first time the query is parsed the optimizer will use
the value of the bind variable and the histograms to work out the best
execution plan.  The next time it will reuse the same plan.  This is
fine if the values used in the bind variable are of similar
distribution but if they are not then you get issues.

Say the first time through the value is 'X' (appears in 50% of rows)
the optimizer will  peek the bind variable and probably go for a full
table scan as that is the most efficient way to pull back that
proportion of the rows thanks to the wonders of multiblock reads and
read ahead caching in the filesystem/SAN.  The second time through the
value is 'Y' (appears in very few rows), because there is already an
execution plan cached the optimizer won't do the hard parse and so
won't peek the bind variable, it will just use the existing plan (a
full table scan).  A full table scan is a really inefficient way to
read a small proportion of the ows in a table.

Reverse the order of the queries ('Y' first then 'X') and you have the
database doing an index read then access by rowid to retrieve 50% of
the rows in a table.  Hitting at least 50% of the index and at least
50% of the table blocks (probably as single block reads), a really
inefficient way to a large proportion of the rows in a table.




Is it advisable to use histograms just for some tables and some specific
columns or is it OK to just set database wide?


It depends on your app and your data.  If the data is highly skewed
and the app uses bind variables (or you have cursor_sharing=force)
then possibly not.

Stephen
--
It's better to ask a silly question than to make a silly assumption.

http://stephensorablog.blogspot.com/
http://www.linkedin.com/in/stephenboothuk
--
//www.freelists.org/webpage/oracle-l


Other related posts: