Re: artificial values vs null

  • From: "Richard Foote" <richard.foote@xxxxxxxxxxx>
  • To: <niall.litchfield@xxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
  • Date: Sat, 25 Sep 2004 20:47:05 +1000

Hi Niall,

I've seen on quite a number of occasions the damage that "artificial" values
can do in stuffing up Oracle's assumption of range value distributions and
hence Oracle's ability to select an appropriate execution plan.

Setting end dates to some distant future is full of dangers and is a classic
example.

Beware indeed !!

Cheers

Richard
----- Original Message -----
From: "Niall Litchfield" <niall.litchfield@xxxxxxxxx>
To: <nsouto@xxxxxxxxxxxxxx>
Cc: <oracle-l@xxxxxxxxxxxxx>
Sent: Friday, September 24, 2004 7:37 PM
Subject: artificial values vs null


On Thu, 23 Sep 2004 21:21:40 +1000, Nuno Souto <nsouto@xxxxxxxxxxxxxx>
wrote:
> Whatever the optimiser may do with 31 dec 2099,
> it will be a darn long shot better than a NULL value
> that can't be indexed...

you may regret making that statement.... see below.

>
> Still, the correct design is to add a "status" column
> where the "current" can be indicated.  The high date
> is a poor substitute.  But still better than having to do
> range or full table scans to find the current row of a
> particular category.

For this purpose I entirely agree. Though in the case of current
'logins' you can bet your bottom dollar that a crashed session will
still be shown as current when it isn't. At least then you have
demonstrably incorrect data rather than assigning meaning to
meaningless data :)

I have created a script to demonstrate what the 'optimizer' does (10g)
with the use of various strategies for assigning an artificial value
for what should be a NULL. It isn't pretty. You can see the full
results at my site

The script is available at
http://www.niall.litchfield.dial.pipex.com/scripts/tests/date_distribution.s
ql
 and you are welcome to play around with it on test systems yourself.

What I find is that *in the absence of histograms* then using an
artificial value - especially if you do the other thing app designers
do and use the wrong datatype - gives you woefully wrong estimates of
cardinalities. This may or may not matter too much in the case of a
simple single table query, but if the resulting rowset will be used in
a join and the stats for cardinality can be wrong (as they are in at
least one of my cases) by an order of magnitude then what the CBO will
do will most likely not be sensible at all. histograms do fix this,
but at a non-zero cost.

I do of course have to admit that my range scan was chosen not just
because it runs from christmas to one of my best friends birthdays -
but also because it is a good way to illustrate the perils of bad
datatypes.


--
//www.freelists.org/webpage/oracle-l

Other related posts: