RE: Useful Oracle books - C.J. Date theory vs. practicality

  • From: "David Kurtz" <info@xxxxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 28 May 2004 11:58:06 +0100

PeopleSoft does something similar.  It has the concept of a 'required'
field - one where the user must enter a value in the application, i.e.
NOT NULL at the application level.

All character and numeric columns are NOT NULL on the database, and when a
non-required field has no value, PeopleSoft puts a single space in the
character columns and 0 in the numeric columns.  Only Date and long columns
that are not required may be null on the database; all 'required' columns
are NOT NULL on the database.

Indexes contain the 0 or single-space values, but histograms can be helpful.
For example, much of the batch processing in Financials is flag- or
status-driven.  An index on the flag is usually only useful when searching
for a rare value (e.g. unposted invoices), and a histogram helps the
optimiser make the right choice.

If you want to create a sparse index - one where you search for the few NOT
NULL values, relying on Oracle not storing entirely-null keys in a b-tree
index - you have to use a date column.
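The idea can be sketched outside Oracle, too. The snippet below uses
Python's sqlite3 purely for illustration - the table and column names are
made up, and SQLite, unlike Oracle, *does* index NULL keys, so a partial
index stands in for Oracle's behaviour of omitting entirely-null keys from
a b-tree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, posted_dt TEXT)")

# SQLite analogue of a sparse index: only the rare unposted rows are indexed.
# (In Oracle the same effect falls out of a plain index on a nullable DATE column.)
cur.execute("CREATE INDEX inv_unposted ON invoices (posted_dt) WHERE posted_dt IS NULL")

# Hypothetical data: almost everything posted, two invoices not.
rows = [(i, "2004-05-28") for i in range(1, 999)] + [(999, None), (1000, None)]
cur.executemany("INSERT INTO invoices VALUES (?, ?)", rows)

unposted = cur.execute(
    "SELECT invoice_id FROM invoices WHERE posted_dt IS NULL ORDER BY invoice_id"
).fetchall()
# unposted == [(999,), (1000,)] - the tiny index covers exactly these rows
```

The index stays small because it holds only the handful of NULL rows,
which is exactly what a single-space or 0 default defeats.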

_________________________
David Kurtz
Go-Faster Consultancy Ltd.
tel: +44 (0)7771 760660
fax: +44 (0)7092 348865
mailto:david.kurtz@xxxxxxxxxxxxxxx
web: www.go-faster.co.uk
PeopleSoft DBA Forum: http://groups.yahoo.com/group/psftdba


-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx]On Behalf Of jo_holvoet@xxxxxxxx
Sent: 28 May 2004 08:28
To: oracle-l@xxxxxxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx; oracle-l-bounce@xxxxxxxxxxxxx
Subject: RE: Useful Oracle books - C.J. Date theory vs. practicality


SAP doesn't use nulls. For char fields they typically use a single space,
for dates (which they store in chars) they use '00000000', etc., etc.
And yes, it is every bit as ugly and horrendous as you describe ... I
assume in their case it is partly a legacy issue and partly a database
independence issue ?

mvg/regards

Jo


Dan Tow <dantow@xxxxxxxxxxxxxx>
Sent by: oracle-l-bounce@xxxxxxxxxxxxx
05/28/2004 02:35
Please respond to oracle-l


        To:     oracle-l@xxxxxxxxxxxxx
        cc:
        Subject:        RE: Useful Oracle books - C.J. Date theory vs. practicality


I have a few thoughts regarding Date's theory versus practicality:

All in all, Date's great and taught me a lot, but I completely
sympathize with anyone who thinks he at least occasionally neglected
practicality. My main case in point is his position on nulls in _An
Introduction to Database Systems_ (6th Ed.); I quote (p. 123):

"...incorporation of nulls or any similar feature into the model is
premature at
this time."

Date made some good points about how confusing nulls can be, and all the
troubles that relate to them. It is important to understand that null
might
mean "We don't know this property of this entity." OR it might mean "This
property does not *apply* to the entity." (Spouse_ID of a Persons table,
for a
single person, being a good example.) I'd personally like to see
extensions to
SQL to handle these subtleties. The subtleties are there, however, whether
we
forbid nulls or not, and forbidding nulls will only add to the confusion!
Can
any of you actually imagine building a practical RDBMS design without
nulls? (I
admit I haven't read everything Date wrote, so it is possible he
moderated the position elsewhere, or at least addressed the practicalities
he
ignored in the aforementioned text.)

Imagining a null-less database: Two options:

1) Everywhere you have a column that would currently be nullable, declare,
for
that column, some *special* value to handle "we don't know" or "does not
apply"
(or perhaps two values...), and then we'll all diligently remember to keep
track of what that value is for every such column in the DB, and to build
special-case logic throughout our application so that it doesn't do
horrendous
things like sum those special-case values along with the *real* values,
...

but wait, sometimes there *is* no value that could never be real for a
given
column, so we'll have to add whole extra Boolean columns for each such
no-longer-nullable column to handle specifying whether we really ought to
be
ignoring the value in that other column over there, and we build special
logic
into the application to handle referring to these special Boolean
pseudo-null-indicating columns almost everywhere we refer to the
no-longer-allowed-null columns,...

but wait, what about the column that starts out seemingly safe for
designating,
say, "-1" as the special "does not apply" value, but ends up in some
special
case needing "-1" for a *real* value - WHOOPS, I guess we better revert to
another value, or to the special Boolean pseudo-null-indicating columns,
and
we'll have to fix all that special logic that formerly did special
handling on
"-1",...

But wait, what about the column that starts out seeming not to *need*
"does not
apply" or "we don't know", but later is discovered to occasionally need
this
feature - WHOOPS, I guess we'll have to go back and change all the
application
logic that uses this column to take account of either some special value
or
some new Boolean pseudo-null-indicating column. (Admittedly, you might
*also*
need to change *some* of your application logic if you made a not-null
column
nullable, but at least *some* of it would likely work without recoding in
that
scenario.)...

Uugghh.
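The aggregate pollution in option 1 is easy to demonstrate. A minimal
sketch using Python's sqlite3, with a hypothetical discount column where
-1 plays the "does not apply" sentinel:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, discount REAL)")

# Hypothetical data: -1 is the agreed "does not apply" sentinel for order 2.
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, -1.0), (3, 5.0)])

# Forget the special-case logic just once and the sentinel poisons the total:
sentinel_sum = cur.execute("SELECT SUM(discount) FROM orders").fetchone()[0]
# sentinel_sum == 14.0 - wrong: the -1 was summed along with the real values

# With a real NULL, SQL aggregates skip the row automatically:
cur.execute("UPDATE orders SET discount = NULL WHERE discount = -1.0")
null_sum = cur.execute("SELECT SUM(discount) FROM orders").fetchone()[0]
# null_sum == 15.0 - the correct total, with no application-side bookkeeping
```

Every query touching the column needs that special-case filter under
option 1; NULL gives it to you for free.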

OK, quite likely Date would not have advocated inventing a whole lot of
*new*
values that *mean* null, to replace the old value NULL, so let's look at
option
2:

Everywhere we would currently have a nullable column, let's invent a new
sub-entity (zero-to-one)-to-one with the original entity to avoid nulls.
So, if
we have a Persons table, which originally had a nullable Spouse_ID column,
we'll migrate the Spouse_ID column to a new
Persons_Who_We_Know_Are_Married_And_Whose_Spouse_IDs_We_Know table, which
has
just the columns Person_ID and Spouse_ID, and then we never need a null
Spouse_ID in the table. (We will, however, need a great many outer joins,
which
will assuredly create NULL values in query results, unless we break
current
simple, single queries into vast numbers of separate queries that handle
the
different cases separately!) Imagine a perfectly realistic situation with
30
different columns that can each *independently* hold a null value, today:
You'll need *30* new tables to handle each subset that is not-null for
that
column, and where you would have selected all 30 columns, you'll need a
30-way
outer join (on top of however many joins you'd need for all the *other*
columns
of all the *other* tables!) I can say with some authority that the result
would
be slower and harder to tune, not to say harder to write and maintain! ...
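The outer-join point is easy to see in miniature. A sketch using Python's
sqlite3 (the tables and names are hypothetical), with Spouse_ID migrated to
its own sub-table exactly as option 2 prescribes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# The null-free sub-entity: a row exists only when the spouse is known.
cur.execute("""CREATE TABLE person_spouses
               (person_id INTEGER PRIMARY KEY, spouse_id INTEGER NOT NULL)""")

cur.executemany("INSERT INTO persons VALUES (?, ?)", [(1, "Pat"), (2, "Sam")])
cur.execute("INSERT INTO person_spouses VALUES (1, 42)")  # Sam has no row

# Neither base table contains a NULL, yet the moment we ask the obvious
# question, the outer join manufactures one anyway:
rows = cur.execute("""
    SELECT p.name, s.spouse_id
    FROM persons p LEFT JOIN person_spouses s ON p.person_id = s.person_id
    ORDER BY p.person_id""").fetchall()
# rows == [("Pat", 42), ("Sam", None)] - the NULL is back, just moved into
# the query result instead of the table
```

Multiply that join by thirty independent nullable columns and the cost of
option 2 becomes clear.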

But wait, a column that we think will never need a null value might later
be
found to need nulls, so we'll just have to create these new subset tables
every
time such a case arises, and we'll have to modify our code like crazy to
handle
every such case, as in the first option...

Uugghh, again!

Sure, nulls are confusing, and associated with all sorts of logic
problems, but
those problems are largely *inherent* in the underlying logical problem,
and
pretending they are not helps no one, and taking away the tool of nulls
would
hugely confuse the still-real-and-unavoidable underlying issues!

Did I beat that dead horse sufficiently? Michael mentioned Date's
no-duplicates
rule. For this one, I have much more sympathy. I believe there is a
special
case, though, where duplicates are reasonable and practical, which I'll
illustrate with a concrete example, first, then generalize:

Imagine that you want data from bar-code readers at a supermarket to go
straight
into a database. The natural tuple (to use Date's terminology) would look
something like (Transaction_ID, Bar_Code, Item_Count) (assuming that we
don't
care about details like what order the items were read, or the precise
date-time of each code-reading event, beyond the date-time of the
transaction
as a whole, which we store separately in a Transactions table), but wait,
that
tuple won't handle (without duplicates or special logic) a case where the
cashier fails to group all identical items together and read the code on
one
item while entering the correct Item_Count separately. Again, there seem
to be
two solutions within Date's rules:

#1) Make the application logic (or the RDBMS itself) check, for every row
coming
in, whether the combination (Transaction_ID, Bar_Code) is new, and do an
insert
if it is, else an update (incrementing the Item_Count) if it is not. (Yes,
this
could be handled more gracefully with a new "UPSERT" or extended MERGE
operation that hid the complexity from the coder - nice feature - but the
underlying extra work remains.)
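To make the check-then-insert-or-update logic concrete, here is a sketch
using Python's sqlite3, whose INSERT ... ON CONFLICT (SQLite 3.24+) plays
the role of the "UPSERT" mentioned above; the schema is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE scan_lines (
    transaction_id INTEGER,
    bar_code       TEXT,
    item_count     INTEGER,
    PRIMARY KEY (transaction_id, bar_code))""")

def record_scan(txn, code, count=1):
    # Insert-or-increment: every single scan pays the uniqueness check.
    cur.execute("""INSERT INTO scan_lines VALUES (?, ?, ?)
                   ON CONFLICT (transaction_id, bar_code)
                   DO UPDATE SET item_count = item_count + excluded.item_count""",
                (txn, code, count))

# The cashier scans beans, milk, then beans again (items not grouped).
for code in ["beans", "milk", "beans"]:
    record_scan(1, code)

rows = cur.execute(
    "SELECT bar_code, item_count FROM scan_lines ORDER BY bar_code").fetchall()
# rows == [("beans", 2), ("milk", 1)] - no duplicates, at the price of a
# key lookup on every insert
```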

OR

#2) Create a new, purely artificial key Read_Event_ID that is different
for
every code-reading event.

The latter solution is the most common, and is usually a good solution for
this
general kind of problem, but consider this: Here, we truly do not care
about the individual "entities" (Read_Events, I suppose) that this table
represents - we care *only* about the data in *aggregate*, the sum over
the
transaction, the sum over a given Bar_Code, for a given day at a given
store,... Given this, we have no need whatsoever to ever refer to a single
row
of the table, uniquely. Another way to put this (to generalize the
problem) is:

-We have a table of rows that only matter in aggregate.
-The individual rows of the table never require updates.
-There is no detail table (nor is there likely to be) that stores
still-more-detailed data that needs to reference this table as a master
table
with a foreign key.
-It is simpler (or at least faster) to aggregate a few more rows, which
were
inserted with simple INSERT operations, without checking for uniqueness
violations, than to perform UPDATE/INSERT ("UPSERT") operations that
pre-aggregate the rows at the lowest useful level of aggregation. The
example
is clearly like this - the reduction in rowcount if we aggregated on
(Transaction_ID, Bar_Code) would be trivial compared to the cost of
maintaining
uniqueness on (Transaction_ID, Bar_Code) using UPSERT operations, or
adding a
wholly useless Read_Event_ID key that resulted in no rowcount
consolidation at
all.
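Under those conditions the duplicate-tolerant design is just plain inserts
plus a GROUP BY at query time, as in this sqlite3 sketch (schema
hypothetical, for illustration only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Deliberately no key at all: duplicate (transaction_id, bar_code) rows
# are allowed, and each scan is a cheap blind INSERT.
cur.execute("""CREATE TABLE scan_events
               (transaction_id INTEGER, bar_code TEXT, item_count INTEGER)""")

cur.executemany("INSERT INTO scan_events VALUES (?, ?, ?)",
                [(1, "beans", 1), (1, "milk", 1), (1, "beans", 1)])

# The rows only ever matter in aggregate, so duplicates cost nothing more
# than a few extra rows to sum:
rows = cur.execute("""SELECT bar_code, SUM(item_count)
                      FROM scan_events
                      GROUP BY bar_code ORDER BY bar_code""").fetchall()
# rows == [("beans", 2), ("milk", 1)] - same answer as the UPSERT version,
# without any uniqueness check on the insert path
```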

This general case doesn't come up all that often, I think, but it does
happen.
The main thing is that the entity being tracked must be so trivial that
there
is virtually no chance that you'll *even later* want to point to the rows
individually, or especially to update the rows or to create a details
table
under the current table.

Dan Tow
Author, SQL Tuning, O'Reilly
650-858-1557
www.singingsql.com


Quoting Michael Milligan <Michael.Milligan@xxxxxxxxxxx>:

> Anthony,
>
> Excellent remarks. However, you seem to be pitting theory against
practicality
> in a way that does not allow them to coexist. As Date would say, "Theory
is
> practical!" That is actually one of his chapter names. Your example of
> preventing duplicates is a great case in point. What is the meaning of a
> result set with duplicates? What can you do with it? If I see three
> identical rows, meaning there is nothing returned to differentiate them,
how
> can I, or my client use that information? I may as well just do a count.
>
> What you find anal I find brilliant. Every "i" is dotted and every "t"
is
> crossed in Date's logic. It has symmetry and completeness I have rarely
> found elsewhere.
>
> Look at it this way: you may argue that there are times when we should
break
> away from the theoretical to get the practical accomplished. But every
time
> you "break away" from the theory, you are making an exception you will have
to
> keep track of and account for. Yes, you can duplicate data to prevent
joins.
> But make sure you have a trigger to handle the extra update, etc., etc.
> Sometimes it's worth it, sometimes it gets out of hand.
>
> I enjoyed your remarks.
>
> Thanks,
>
> Mike
>
> -----Original Message-----
> From: Anthony Molinaro [mailto:amolinaro@xxxxxxxx]
> Sent: Thursday, May 27, 2004 3:54 PM
> To: oracle-l@xxxxxxxxxxxxx
> Subject: RE: Useful Oracle books
>
>
> Mike,
> Date's textbooks are very good, but FUNDAMENTALS OF DATABASE SYSTEMS by
> Navathe and Elmasri is much better (imho). I agree that his relational
> database writings are excellent, but slightly anal. Some of his
> ideas just would not be practical in the real world. He loves taking
> shots at sql, which is fine, but some of his suggestions are
> unreasonable.
>
> For example, relations, by definition, should not have duplicates.
> Relational algebra operations (on paper!) can perform selection and
> projection operations on these relations and do not return duplicate
> tuples.
> He would like sql to do the same...
> Think about the overhead the sql language would have to incur to
> always select distinct under the covers, on every table, on every join.
> I agree with his views on nulls but not sure his "special value" or
> "default value" scheme is much better though.
>
> As for Celko, I feel he's more concerned with the practical nature of
> databases/sql, not theory; whereas Date wants to put theory into
> practice. In any case, without the use (or knowledge) of analytics Celko
> has written some nifty sql.
>
> Ryan - break out an old math text and brush up on set theory, after that
> you'll find relational algebra a breeze
>
>  - ant
>

----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------




Other related posts: