RE: Useful Oracle books - C.J. Date theory vs. practicality

  • From: "Lex de Haan" <lex.de.haan@xxxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 28 May 2004 20:00:49 +0200

thanks for the debate Dan -- same trick again ...
(nice quoting exercise)

Cheers,
Lex.

---------------------------------------------
visit my website at http://www.naturaljoin.nl
---------------------------------------------

DST: So, how do we handle Spouse_ID for a single person in a Persons table?
DST: I am looking for a theory
DST: that applies well to SQL as actually implemented, and that is directly
DST: useful to developers and DBAs stuck with that implementation.

LDH> I prefer indeed to turn things around -- there is only one NULL,
LDH> so overloading that NULL with multiple meaning in bound to cause
trouble.
LDH> This should be the most important guidance during data modeling;
LDH> You should try to solve these issues in the data model, where the fact
LDH> that someone is married with someone else becomes an entity of its own.
LDH> Or, to quote Hugh Darwen: each table corresponds with exactly one
predicate,
LDH> and the rows are the corresponding propositions evaluating to true.
LDH> in a PERSONS table you store facts about the existence of persons;
LDH> in a MARRIAGES table you store intersection information about persons
with persons.

<snip>

DST: Yes, but what should we do in the meantime, lacking domains? Isn't
using
DST: NULL best, then?

LDH> I can only agree here, unfortunately :-)
LDH> Using nulls is indeed the best thing -- but I repeat my strong opinion:
LDH> as soon as your nulls can have different meanings, you are in trouble.
LDH> try to avoid all nulls of type "inapplicable" with proper data
modeling.

> but wait, sometimes there *is* no value that could never be real for a
given
> column, so we'll have to add whole extra Boolean columns for each such
> no-longer-nullable column to handle specifying whether we really ought to
be
> ignoring the value in that other column over there ...

> LDH> why? can't you represent this by a special value too?
> LDH> (maybe I am missing your point here)
> LDH> anyway, additional logic should NOT be built into the applications,
> LDH> but rather in the methods that are associated with the domains.

DST: Yes, but what should we do in the meantime, lacking domains?

LDH> you got me here ... the only solution I can bring up (again)
LDH> is to change the data model. So far I have not seen a single example
LDH> where I was unable to model all pseudo-nulls away, so the only
remaining
LDH> meaning becomes "unknown". by the way, before you think I am too
fanatic:
LDH> in case you don't care about the reason why information is missing,
LDH> nulls are OK in all cases; but then you should not try to interpret
them!

> but wait, what about the column that starts out seemingly safe for
> designating, say, "-1" as the special "does not apply" value, but ends up
in
> some special case needing "-1" for a *real* value - WHOOPS ...

> LDH> sounds like a serious design mistake to me;
> LDH> don't get me started about serious RDBMS design mistakes :-)

DST: In the real world, with the current number of near-perfect database
DST: designers, can design mistakes be made rare enough that we needn't
worry
DST: about what we'll do when we discover them?

LDH> I assume this is a rhetorical question, right?
LDH> but that's still not an excuse ;-) and certainly not a reaso
LDH> to blame the relational model ot the three-valued logic ;-)

> Imagine that you want data from bar-code readers at a supermarket to go
> straight into a database.
>
> LDH> the "famous" cat food example ;-)
> LDH> I think this is already the wrong way to start;
> LDH> why insist that this data should go straight into the database?
> LDH> this looks like a transaction design mistake to me.
> LDH> what if the customer wants to cancel (forgot his wallet)?

DST: What's wrong with
DST: ROLLBACK;
DST:
DST: ? Seriously, though, for performance reasons, I agree with clumping all
DST: the inserts until they're ready to happen all at once, at the end of
the
DST: transaction, but that still doesn't mean it's necessarily worth the
DST: coding hassle to presort and pre-aggregate (now at the application
layer)
DST: to avoid duplicates, when duplicates are rare enough that the extra
rows
DST: are virtually cost-free, and the rows are never read except as
DST: aggregations. I run into a similar problem when I snapshot performance
DST: data into my own tables, with simple SQL and shell scripts - the data
DST: rarely produces a duplicate row (which really means that the snapshot
saw
DST: the something twice, placing double weight on the event), and it'd be a
DST: hassle (and for what benefit?!) to eliminate the duplicates with such
DST: simple scripting tools, unless I added a purely artificial, useless
DST: Data_Row_ID column, where the data are only ever queried in aggregate.

LDH> duplicates are the biggest mistake ever made in SQL.
LDH> this is why tables are no sets -- so we have to talk about multisets,
LDH> this is why all kind of sound principles suddenly don't hold anymore,
LDH> this is why certain query transformations cannot be performed by the
optimizer,
LDH> this is why we end up with all those exceptions and special case
handling...
LDH> I agree they are virtually cost-free in the context you describe,
LDH> but they can become very expensive in other contexts.
LDH> and again, you are asking for trouble if duplicate rows "sometimes"
happen
LDH> to mean something. if they really mean something, there is an attribute
missing;
LDH> and if they don't mean anything, it is useless and dangerous to store
any row twice.

A very good book about all this (indeed talking about possible future
implementations)
is "the third manifesto" by Chris Date and Hugh Darwen, ISBN 0-201-70928-7.
the first attempts to implement their "tutorial D" language are in the
works.

By the way, I guess we should take further discussion offline,
before guys like Mladen start intervening ;-)

----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------
BEGIN:VCARD
VERSION:2.1
N:de Haan;Lex
FN:Lex de Haan
ORG:Natural Join B.V.
TEL;WORK;VOICE:+31.30.2515022
TEL;HOME;VOICE:+31.30.2518795
TEL;CELL;VOICE:+31.62.2955714
TEL;WORK;FAX:+31.30.2523366
ADR;WORK:;;Pieter Breughelstraat 10;Utrecht;;3583 SK;Netherlands
LABEL;WORK;ENCODING=QUOTED-PRINTABLE:Pieter Breughelstraat 10=0D=0AUtrecht 3583 
SK=0D=0ANetherlands
URL;WORK:http://www.naturaljoin.nl
EMAIL;PREF;INTERNET:lex.de.haan@xxxxxxxxxxxxxx
REV:20040224T160439Z
END:VCARD

Other related posts: