artificial values vs null

  • From: Niall Litchfield <niall.litchfield@xxxxxxxxx>
  • To: nsouto@xxxxxxxxxxxxxx
  • Date: Fri, 24 Sep 2004 10:37:55 +0100

On Thu, 23 Sep 2004 21:21:40 +1000, Nuno Souto <nsouto@xxxxxxxxxxxxxx> wrote:
> Whatever the optimiser may do with 31 dec 2099,
> it will be a darn long shot better than a NULL value
> that can't be indexed...

you may regret making that statement.... see below. 

> 
> Still, the correct design is to add a "status" column
> where the "current" can be indicated.  The high date
> is a poor substitute.  But still better than having to do
> range or full table scans to find the current row of a
> particular category.

For this purpose I entirely agree. Though in the case of current
'logins' you can bet your bottom dollar that a crashed session will
still be shown as current when it isn't. At least then you have
demonstrably incorrect data rather than assigning meaning to
meaningless data :)

I have created a script to demonstrate what the 'optimizer' does (10g)
with the use of various strategies for assigning an artificial value
for what should be a NULL. It isn't pretty. You can see the full
results at my site

The script is available at
http://www.niall.litchfield.dial.pipex.com/scripts/tests/date_distribution.sql
 and you are welcome to play around with it on test systems yourself.

What I find is that *in the absence of histograms* then using an
artificial value - especially if you do the other thing app designers
do and use the wrong datatype - gives you woefully wrong estimates of
cardinalities. This may or may not matter too much in the case of a
simple single table query, but if the resulting rowset will be used in
a join and the stats for cardinality can be wrong (as they are in at
least one of my cases) by an order of magnitude then what the CBO will
do will most likely not be sensible at all. histograms do fix this,
but at a non-zero cost.

I do of course have to admit that my range scan was chosen not just
because it runs from christmas to one of my best friends birthdays -
but also because it is a good way to illustrate the perils of bad
datatypes.


-- 
Niall Litchfield
Oracle DBA
http://www.niall.litchfield.dial.pipex.com
--
//www.freelists.org/webpage/oracle-l

Other related posts: