Re: artificial values vs null

  • From: "Nuno Souto" <nsouto@xxxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Sat, 25 Sep 2004 01:24:50 +1000

Thanks Richard. Makes clear sense now.
Niall clarified it as well with his example,
offline.

I still think that a high value is not
"messing" with Oracle's perception of even
data distribution. Quite frankly, in this case
I reckon it's wishful thinking on Oracle's part to
assume the distribution is even to start with.
As Mark put it: it is an implementation detail,
not a necessity of the design. It's Oracle's
perception that is wrong.  Assuming an even
distribution of any column's values is always
dangerous.  As is, for example, assuming an
ordered distribution.

Does anyone recall when low-high value weighting
became part of the optimiser?  I don't remember it
ever being in 7 or 8i; I probably missed it.

In my case the usual problem is someone
slapping a NULL on the end date of a time series
to mean "current row", when the most accessed single
row is precisely that one.  So now instead of a FTS
for a range scan (which is perfectly acceptable)
I end up having to do a FTS to find one row: the
current one.  And given that replacing the NULL
with a high value will only make that single-row
access use an index (no range scan), I'm
relatively safe from this problem.
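
Something like this rough sketch is what I mean (table and column
names are invented just for illustration):

  -- hypothetical time-series table, names for illustration only
  CREATE TABLE price_history (
    item_id    NUMBER        NOT NULL,
    start_date DATE          NOT NULL,
    end_date   DATE,                    -- NULL = "current row" today
    price      NUMBER(10,2)  NOT NULL
  );

  CREATE INDEX price_history_end_i ON price_history (end_date);

  -- the "current row" lookup: IS NULL can't use the single-column
  -- B-tree index (entirely-NULL keys aren't indexed), so it's a FTS
  SELECT *
    FROM price_history
   WHERE end_date IS NULL;

  -- replace the NULL with an artificial high value...
  UPDATE price_history
     SET end_date = TO_DATE('9999-12-31', 'YYYY-MM-DD')
   WHERE end_date IS NULL;

  -- ...and the same lookup becomes an indexable equality access
  SELECT *
    FROM price_history
   WHERE end_date = TO_DATE('9999-12-31', 'YYYY-MM-DD');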

We have to use histograms anyway: they're the only thing
that will avoid problems with all the large joins
we have, short of re-writing the app, which is what
I wanted to do.  But once again that got over-ruled
in the name of "budgeting".  I wonder if the long-term
consequences of these decisions are EVER
included in those budgets?...
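
In case it helps anyone else, this is roughly what we run (table and
column names are made up, and SIZE 254 is just our habit, not gospel):

  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname    => USER,
      tabname    => 'PRICE_HISTORY',
      method_opt => 'FOR COLUMNS end_date SIZE 254',
      cascade    => TRUE);
  END;
  /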

Cheers
Nuno Souto
nsouto@xxxxxxxxxxxxxx
----- Original Message -----
>
> Hi Nuno,
>
> It's because by default, Oracle assumes an even distribution of values
> between the low and high values of your (date) column data.
>
> Simplistically, if you have 1000 days worth of data between your lowest
> and highest dates, and you perform a date search looking for a range of
> say 100 days (between 1 Jan 2004 and 10 April 2004 or so), then Oracle
> calculates the cardinality to be approx 10% of data:
>
> (high search value - low search value) / (highest value - lowest value)
> * 100
>
> However, if you insert an artificial high (or low) value that isn't
> representative of your data distribution, this "stuffs" up the above
> formula and Oracle incorrectly assumes a much smaller % of data to be
> returned than is valid. Setting a high date of say year 4000 means that
> any range scan of your "real" date distribution will be considered
> incredibly small (even potentially when searching for *all* your data)
> and push Oracle to perhaps use indexes when FTS are more appropriate.
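
(To put rough numbers on that example: with 1000 days of real data, say
2002 through 2004, and an artificial high value in the year 4000, the
denominator becomes roughly 730,000 days, so even a range covering all
1000 real days is estimated at well under 1% of the rows instead of
close to 100%.)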
>
> Sure, your accesses to the year 4000 data may be fine, but all other date
> range based scans will not be so great (unless you now look at using
> histograms and the various overheads that then implies).
>
> As soon as you start messing with Oracle's perception of the even
> distribution of the data, you potentially start introducing all sorts of
> new problems. I remember coming across a 3rd party app that used America's
> Independence Day as its "default" date and it caused all sorts of these
> types of problems (hence the same issue with dates way in the past).
>
> Hope it makes sense !!
>
> Cheers
>
> Richard
>

--
//www.freelists.org/webpage/oracle-l
