thanks for the debate Dan -- same trick again ... (nice quoting exercise)
Cheers, Lex.
---------------------------------------------
visit my website at http://www.naturaljoin.nl
---------------------------------------------

DST: So, how do we handle Spouse_ID for a single person in a Persons table?
DST: I am looking for a theory
DST: that applies well to SQL as actually implemented, and that is directly
DST: useful to developers and DBAs stuck with that implementation.

LDH> I prefer indeed to turn things around -- there is only one NULL,
LDH> so overloading that NULL with multiple meanings is bound to cause trouble.
LDH> This should be the most important guidance during data modeling;
LDH> you should try to solve these issues in the data model, where the fact
LDH> that someone is married to someone else becomes an entity of its own.
LDH> Or, to quote Hugh Darwen: each table corresponds with exactly one predicate,
LDH> and the rows are the corresponding propositions evaluating to true.
LDH> In a PERSONS table you store facts about the existence of persons;
LDH> in a MARRIAGES table you store intersection information about persons with persons.

<snip>

DST: Yes, but what should we do in the meantime, lacking domains? Isn't using
DST: NULL best, then?

LDH> I can only agree here, unfortunately :-)
LDH> Using nulls is indeed the best thing -- but I repeat my strong opinion:
LDH> as soon as your nulls can have different meanings, you are in trouble.
LDH> Try to avoid all nulls of type "inapplicable" with proper data modeling.

> but wait, sometimes there *is* no value that could never be real for a given
> column, so we'll have to add whole extra Boolean columns for each such
> no-longer-nullable column to handle specifying whether we really ought to be
> ignoring the value in that other column over there ...
>
> LDH> why? can't you represent this by a special value too?
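The PERSONS/MARRIAGES advice above can be sketched concretely. The following is a minimal illustration (using Python's sqlite3 for brevity; table and column names are my own, not from the thread): the marriage becomes an entity of its own, so a single person simply has no row in MARRIAGES, and no nullable Spouse_ID is needed anywhere.

```python
import sqlite3

# One predicate per table: PERSONS states "this person exists",
# MARRIAGES states "these two persons are married".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE marriages (
    person1_id   INTEGER NOT NULL REFERENCES persons,
    person2_id   INTEGER NOT NULL REFERENCES persons,
    wedding_date TEXT NOT NULL,
    PRIMARY KEY (person1_id, person2_id)
);
""")
conn.executemany("INSERT INTO persons VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Carol")])
# Only the married couple gets a MARRIAGES row; Carol (single)
# is represented by the *absence* of a row, not by a NULL.
conn.execute("INSERT INTO marriages VALUES (1, 2, '1999-06-12')")

single = conn.execute("""
    SELECT name FROM persons p
    WHERE NOT EXISTS (SELECT 1 FROM marriages m
                      WHERE p.person_id IN (m.person1_id, m.person2_id))
""").fetchall()
print(single)  # [('Carol',)]
```

Note that "single" is now derived by a query over two clean predicates rather than read off an overloaded NULL.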
> LDH> (maybe I am missing your point here)
> LDH> anyway, additional logic should NOT be built into the applications,
> LDH> but rather in the methods that are associated with the domains.

DST: Yes, but what should we do in the meantime, lacking domains?

LDH> you got me here ... the only solution I can bring up (again)
LDH> is to change the data model. So far I have not seen a single example
LDH> where I was unable to model all pseudo-nulls away, so the only remaining
LDH> meaning becomes "unknown". By the way, before you think I am too fanatical:
LDH> in case you don't care about the reason why information is missing,
LDH> nulls are OK in all cases; but then you should not try to interpret them!

> but wait, what about the column that starts out seemingly safe for
> designating, say, "-1" as the special "does not apply" value, but ends up in
> some special case needing "-1" for a *real* value - WHOOPS ...
>
> LDH> sounds like a serious design mistake to me;
> LDH> don't get me started about serious RDBMS design mistakes :-)

DST: In the real world, with the current number of near-perfect database
DST: designers, can design mistakes be made rare enough that we needn't worry
DST: about what we'll do when we discover them?

LDH> I assume this is a rhetorical question, right?
LDH> but that's still not an excuse ;-) and certainly not a reason
LDH> to blame the relational model or the three-valued logic ;-)

> Imagine that you want data from bar-code readers at a supermarket to go
> straight into a database.
>
> LDH> the "famous" cat food example ;-)
> LDH> I think this is already the wrong way to start;
> LDH> why insist that this data should go straight into the database?
> LDH> this looks like a transaction design mistake to me.
> LDH> what if the customer wants to cancel (forgot his wallet)?

DST: What's wrong with
DST: ROLLBACK;
DST:
DST: ?
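Dan's ROLLBACK answer to the "forgot his wallet" objection can be shown in a few lines. This is a sketch (again via sqlite3, with a made-up schema): the scanned items accumulate inside one open transaction, and a single ROLLBACK discards them all, so nothing ever reaches the committed data.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so we control the transaction explicitly with BEGIN / ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sale_items (barcode TEXT, price REAL)")

conn.execute("BEGIN")
conn.execute("INSERT INTO sale_items VALUES ('5901234123457', 1.25)")  # cat food
conn.execute("INSERT INTO sale_items VALUES ('4006381333931', 0.99)")
# Customer forgot his wallet -- cancel the whole sale:
conn.execute("ROLLBACK")

remaining = conn.execute("SELECT COUNT(*) FROM sale_items").fetchone()[0]
print(remaining)  # 0
```

The uncommitted inserts were never visible to other sessions, which is exactly why sending the scans "straight into the database" inside a transaction is not the design mistake it might appear to be.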
DST: Seriously, though, for performance reasons, I agree with clumping all
DST: the inserts until they're ready to happen all at once, at the end of the
DST: transaction, but that still doesn't mean it's necessarily worth the
DST: coding hassle to presort and pre-aggregate (now at the application layer)
DST: to avoid duplicates, when duplicates are rare enough that the extra rows
DST: are virtually cost-free, and the rows are never read except as
DST: aggregations. I run into a similar problem when I snapshot performance
DST: data into my own tables, with simple SQL and shell scripts - the data
DST: rarely produces a duplicate row (which really means that the snapshot saw
DST: the same thing twice, placing double weight on the event), and it'd be a
DST: hassle (and for what benefit?!) to eliminate the duplicates with such
DST: simple scripting tools, unless I added a purely artificial, useless
DST: Data_Row_ID column, where the data are only ever queried in aggregate.

LDH> duplicates are the biggest mistake ever made in SQL.
LDH> This is why tables are not sets -- so we have to talk about multisets,
LDH> this is why all kinds of sound principles suddenly don't hold anymore,
LDH> this is why certain query transformations cannot be performed by the optimizer,
LDH> this is why we end up with all those exceptions and special-case handling...
LDH> I agree they are virtually cost-free in the context you describe,
LDH> but they can become very expensive in other contexts.
LDH> And again, you are asking for trouble if duplicate rows "sometimes" happen
LDH> to mean something. If they really mean something, there is an attribute missing;
LDH> and if they don't mean anything, it is useless and dangerous to store any row twice.

A very good book about all this (indeed talking about possible future
implementations) is "The Third Manifesto" by Chris Date and Hugh Darwen,
ISBN 0-201-70928-7. The first attempts to implement their "Tutorial D"
language are in the works.
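Lex's point that "if duplicates really mean something, there is an attribute missing" can be made concrete. In this sketch (sqlite3 again; both table designs are invented for illustration), three cans of cat food are stored once as three identical rows and once as a single row with a quantity attribute. Both designs agree on the aggregate, but an innocent DISTINCT silently destroys the information only in the multiset design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scan_multiset (barcode TEXT);   -- duplicate rows carry meaning
CREATE TABLE scan_relation (barcode TEXT PRIMARY KEY,
                            quantity INTEGER);  -- the "missing attribute"
INSERT INTO scan_multiset VALUES ('catfood'), ('catfood'), ('catfood');
INSERT INTO scan_relation VALUES ('catfood', 3);
""")

# Both designs give the same total ...
multiset_total = conn.execute(
    "SELECT COUNT(*) FROM scan_multiset").fetchone()[0]
relation_total = conn.execute(
    "SELECT SUM(quantity) FROM scan_relation").fetchone()[0]
print(multiset_total, relation_total)  # 3 3

# ... but DISTINCT quietly collapses the multiset to one row,
# losing the fact that three cans were scanned.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT barcode) FROM scan_multiset").fetchone()[0]
print(distinct_count)  # 1
```

In the relation design no query can accidentally throw the count away, which is the practical content of the "attribute missing" argument.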
By the way, I guess we should take further discussion offline,
before guys like Mladen start intervening ;-)

----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to: oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------