Re: artificial values vs null

  • From: Niall Litchfield <niall.litchfield@xxxxxxxxx>
  • To: nsouto@xxxxxxxxxxxxxx
  • Date: Fri, 24 Sep 2004 13:59:59 +0100

On Fri, 24 Sep 2004 21:51:33 +1000, Nuno Souto <nsouto@xxxxxxxxxxxxxx> wrote:
> ----- Original Message -----
> From: "Niall Litchfield" <niall.litchfield@xxxxxxxxx>
> >> Whatever the optimiser may do with 31 dec 2099,
> >> it will be a darn long shot better than a NULL value
> >> that can't be indexed...
> >
> > you may regret making that statement.... see below.
> Not at all. My statement stands: it's not a throwaway comment.
> The use of a specific value as opposed to a NULL will
> cause the optimiser to pick an index if one exists and the
> query is a simple one.

For finding values that are 'CURRENT' I think we both agree that
having a status field is exactly the right approach and that using
NULL is incorrect. I was trying to deal with what happens to queries
that operate on data where one has taken the decision to avoid nulls
and use artificial values instead. This wasn't the focus of the
original discussion, which is why I've renamed the subject line, but
is an area where not using NULL when one should can seriously mess
with the optimizer.
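To make that concrete, here's a minimal sketch of why a far-future sentinel distorts range estimates. It's plain Python, nothing Oracle-specific: the uniform-distribution model below is just the textbook no-histogram selectivity estimate, not a claim about what any particular optimizer version does, and the dates are made up for illustration.

```python
from datetime import date

def uniform_range_selectivity(lo, hi, pred_lo, pred_hi):
    """Selectivity estimate for a range predicate, assuming values are
    spread uniformly between the column's low and high values (the
    classic no-histogram model)."""
    span = (hi - lo).days
    covered = (min(pred_hi, hi) - max(pred_lo, lo)).days
    return max(covered, 0) / span

# Genuinely closed rows: end_date runs from early 2003 to today.
est_real = uniform_range_selectivity(date(2003, 1, 1), date(2004, 9, 24),
                                     date(2004, 1, 1), date(2004, 9, 24))

# Same data plus a 31-Dec-2099 sentinel on the "open" rows: the column
# high value jumps to 2099, so the identical predicate suddenly looks
# far more selective than it really is.
est_sentinel = uniform_range_selectivity(date(2003, 1, 1), date(2099, 12, 31),
                                         date(2004, 1, 1), date(2004, 9, 24))

assert est_sentinel < est_real / 10  # estimate collapses by ~50x here
```

A histogram repairs this precisely because it records that nearly all the probability mass sits well below the sentinel.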

> Of course: if there isn't one or you use an index-stop form of
> the predicate, or you use a complex series of predicates where a
> FTS or hash makes more sense, then it might not.
> I did not say it was the perfect solution.  However, it works when
> the perfect solution is just too expensive to implement.  For
> whatever reasons.

I'd go down the histogram route then... 

> What I fail to understand is what the char/number part of your
> script proves.  That predicates involving mismatched data types
> don't use an index?  Known since V4 and it has nothing to do
> with using artificial dates.

Not at all - there shouldn't be any *mismatched* datatypes in my
queries (if there are I'll amend the script) - it was designed to deal
with those apps/design decisions that use character fields or numerics
for dates. Here the problem can get even worse, since the range of
valid values for the field in business terms (i.e. of the form
YYYYMMDD) is so much less than the range of allowable values for the
underlying datatype.
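To put numbers on the YYYYMMDD problem: the toy arithmetic below (again an illustration, not any real optimizer's code) shows how the phantom gaps at month and year boundaries in number space wreck a range estimate over dates held as numbers.

```python
def numeric_selectivity(lo, hi, pred_lo, pred_hi):
    """Range selectivity assuming a uniform spread of *numbers*
    between the column's low and high values."""
    return max(min(pred_hi, hi) - max(pred_lo, lo), 0) / (hi - lo)

# Dates held as numbers of the form YYYYMMDD.  One month out of
# roughly two years of data is really about 1/24th of the rows...
est = numeric_selectivity(20030101, 20041231, 20040901, 20040930)

# ...but in number space the year boundary alone contributes a huge
# phantom gap (20031231 -> 20040101 is 8,870 "values" for one day),
# so the uniform estimate comes out more than an order of magnitude
# too low.
assert est < (1 / 24) / 10
```

With a real DATE column the value space and the business-valid space coincide, which is exactly why the datatype choice matters to the estimate.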

> > What I find is that *in the absence of histograms* then using an
> > artificial value - especially if you do the other thing app designers
> > do and use the wrong datatype
> Oh no they don't!  Not *THIS* app designer.  The correct way of
> implementing this artificial date is to create a stored function
> that returns the correct type and value.  It is the function
> return value that we compare against in the predicate:
> guaranteed data type match. And guaranteed storage of the
> correct high value, with a couple of triggers.

Not sure if you are reacting personally, or to the idea of using wrong
datatypes as a bad idea. I didn't intend to single any individual out
with the post, and I apologize if it reads like I thought you
personally did bad designs.

> > do will most likely not be sensible at all. histograms do fix this,
> > but at a non-zero cost.
> Agreed.  I still think though that if one has the possibility of
> complex joins, histograms are a must (10g possibly excepted?)
> In this I must grant that DKB is correct, IME: use of histograms
> is mandatory to get the optimiser to pick the correct mix
> of indexes in a complex join.  At least in 7 and 8i. With 9i,
> I've already found a few cases where it makes the correct
> choices, histograms or no histograms. It appears dynamic sampling
> works quite well.

I did think about including a note on this. What I have observed under
10g - I don't have any 9i databases with willfully missing stats - is
that dynamic sampling means calculating some key stats on a sample of
the query you are about to execute. I.e. I *think* it gathers stats
based on your query, rather than just randomly sampling the tables in
the query that have no stats on them [1]. Thus it may be that having
no stats is better than having insufficient stats.
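The general idea - gathering stats from a sample at query time rather than relying on whatever stored stats exist - can be sketched like this. This is a pure-Python toy and assumes nothing about Oracle's actual mechanism; the row counts and the 'sentinel' tag are invented for the example.

```python
import random

random.seed(1)

# A skewed column: the vast majority of rows are closed, a small
# minority carry an artificial sentinel date.
rows = ['closed'] * 99_000 + ['sentinel'] * 1_000
random.shuffle(rows)

def sampled_count(rows, predicate, sample_size=10_000):
    """Estimate how many rows satisfy a predicate by evaluating it on
    a random sample and scaling up - stats gathered for the query at
    hand instead of read from a (possibly missing or stale) store."""
    sample = random.sample(rows, sample_size)
    hits = sum(1 for r in sample if predicate(r))
    return hits * len(rows) // sample_size

est = sampled_count(rows, lambda r: r == 'sentinel')
# The per-query estimate lands near the true 1,000 matching rows even
# though no statistics were ever stored for this "table".
```

The point being that even gross skew shows up in a per-query sample, which stored column min/max values alone would completely miss.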
Niall Litchfield
Oracle DBA

[1] It appears that actually it does block sampling on the tables
*and* a sample based on rownum on your query in case your query is
skewed, but I don't have anything like a full understanding of dynamic
sampling yet.
