Re: artificial values vs null

  • From: "Richard Foote" <richard.foote@xxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Sun, 26 Sep 2004 00:57:02 +1000

----- Original Message -----
From: "Nuno Souto" <nsouto@xxxxxxxxxxxxxx>
To: <oracle-l@xxxxxxxxxxxxx>
Sent: Friday, September 24, 2004 11:38 PM
Subject: Re: artificial values vs null


----- Original Message -----
From: "Niall Litchfield" <niall.litchfield@xxxxxxxxx>

> and use artificial values instead. This wasn't the focus of the
> original discussion, which is why I've renamed the subject line, but
> is an area where not using NULL when one should can seriously mess
> with the optimizer.

> This is where I don't get it.  I'm trying to understand
> where using a specific value in a column that is indexed can
> be worse than using a NULL (thereby for sure causing FTS).

Hi Nuno,

It's because by default, Oracle assumes an even distribution of values
between the low and high values of your (date) column data.

Simplistically, if you have 1000 days worth of data between your lowest and
highest dates, and you perform a date search looking for a range of say 100
days (between 1 Jan 2004 and 10 April 2004 or so), then Oracle calculates
the cardinality to be approx 10% of data:

(high search value - low search value) / (highest value - lowest value) *
100

However, if you insert an artificial high (or low) value that isn't
representative of your data distribution, this "stuffs" up the above formula
and Oracle incorrectly assumes a much smaller % of data to be returned than
is valid. Setting a high date of say year 4000 means that any range scan of
your "real" date distribution will be considered incredibly small (even
potentially when searching for *all* your data) and push Oracle to perhaps
use indexes when FTS are more appropriate.

Sure, your accesses to the year 4000 data may be fine, but all other date
range based scans will not be so great (unless you now look at using
histograms and the various overheads that then implies).

As soon as you start messing with the even distribution of Oracle's
perception of the data, you potentially start introducing all sorts of new
problems. I remember coming across a 1/3 party app that used American's
Independence Day as it's "default" date and it's caused all sorts of these
type of problems (hence the same issue with dates way in the past).

Hope it makes sense !!

Cheers

Richard







--
//www.freelists.org/webpage/oracle-l

Other related posts: