RE: Re[2]: A Cure for Madness -- define the error value

"Mark W. Farnham" <mwf@xxxxxxxx> · Thu, 2 Sep 2004 10:14:14 -0400

Handling nulls consistently is specified in the "rules." (Ted Codd's 12
(which are really 13 if you count zero))

Specification of a value for "some error happened along the way, so I don't
know what the value is and it might not actually be null but it has no
useful meaning in the context you queried it", which I would call "error" is
not defined in any standard for relational rules of which I am aware.

Now this, in my opinion, would be a grand addition to the utility of
relational systems, because it would allow proper definition of functions to
return a defined value rather than an error bail-out when implicit source
domain requirements are violated.

Consider that to_number_x could then be defined to return a value which
could include a number, a null, or error. Then you would get a value set,
and in the case of the original query you could write the predicate to
exclude nulls and errors (which would be definitely the desired action in
the original poster's question and I'm guessing pretty much all the time.)
The existing to_number would continue to just blow up so that existing
program and application suited would continue to function as they do.

(I'm not going to explicate the entire thread where the qualitative and
systematic differences between an error bail-out and a returned error value
are fully considered; suffice it to say one method allows the return of a
well-defined set in cases where other does not. Kemeny and Kurtz provided a
reasonable solution in the original BASIC by defining the basic language
constructs without error handling and subprogram libraries with error
handling. It may have been solved in other languages even prior to that,
although I think the solution in BASIC predated implementing the solution in
LISP, FORTRAN, ALGOL, and COBOL even though those are earlier 3GLs. In
assembly languages I think the treatment was no where systematic or
univerally observed except by conventions.)

Now if you subscribe to the idea that any order of access and processing
*should* return the same value set, then I believe relational algebra as
currently defined requires that you filter the worst case traverse of
candidate rows so that you deliver no value set rather than deliver possibly
different value sets depending on the order of processing. No commercial
implementation is likely to ever implement that notion.

If I understood Chris Date, he was implying that there is an implicit order
of processing depending on the language definition (in this case SQL) that
follows from a natural reading of the query in that language. That is in
conflict with the idea that any order of access and processing *should*
return the save value set, but it does allow for a non worst case traverse
and replaces it with a rule that says "You must return a result set no worse
than the result set from the implicit order of processing of the query."
Hmm. Either that or "You must return the result set that would result from
the implicit order of processing of the query." I don't think he addressed
what you should do if some artificial optimized solution path actually
avoided an error generating case that the natural implicit order would hit.
Either you allow the possibility of different returned value sets based on
order processing or you don't. I'm not sure there is in general a more
efficient way to guarantee exactly the same returned set than actually
executing in the order implicit in the language. But that is in conflict
with the general idea relational queries should not specify the order of
access and in fact should free you from that consideration and the
implementation system is responsible to find a fast way for you.

So anyway, I think SQL is broken in that regard without the notion of an
"error value" universally handled on a par with null. I'm not sure whether
there are other relational query languages that are not similarly broken. In
lieu of a language that implicitly handles domain errors in a defined way, I
think this logically comes down to which erosion of the relational model you
think is less bad: Definition of an implicit required order of execution or
the possibility that some paths of execution can trigger a bail-out error
when other paths of execution would return a result set.

If I got that right, I agree with Wolfgang's insistence that you're
responsible for injecting a dice roll into the situation if you apply a
function to a domain the function is not guaranteed to understand, because I
think it is less destructive to implementation rules for SQL. For Oracle,
that is in fact the way it works, right, and the discussion is about how it
*should* work.

Okay. My head's spinning, so someone help me out here if I wandered into a
false conundrum.

mwf
-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx]On Behalf Of Niyi
Sent: Wednesday, September 01, 2004 6:28 PM
To: oracle-l@xxxxxxxxxxxxx; Wolfgang Breitling
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Re[2]: A Cure for Madness

 WB> I have said it before and I still content that the original problem is
not
> WB> caused by the optimizer incorrectly reordering the sequence of
> operations
> WB> but by the fact that the to_number function is not well defined over
> the
> WB> domain (varchar2). If that was not the case we would not have this
> debate
> WB> as the end result would be the same no matter the order of operations.

I personally think Oracle's to_number is working as it should, throwing an
exception if you try to convert an invalid number, not SILENTLY converting
it to null. The fundamental issue here is with predicate ordering
determining the result of a query. I don't know the solution to this issue,
but I know that  to_number is just one of the scenarios that can run into
it.

For the simple example below, do you suggest that Oracle not throw division
by zero errors too?. I have simulated Jonathan's example, this time with
division by values instead of to_number. I am sure other scenarios can be
easily deduced.

SQL> select *
  2  from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE    9.2.0.1.0       Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production

5 rows selected.

SQL> drop table subtest;

Table dropped.

SQL>
SQL> create table subtest(id_type varchar2(1), divisor number(2));

Table created.

SQL> insert into subtest values('P',1);

1 row created.

SQL>
SQL> insert into subtest values('P',2);

1 row created.

SQL> insert into subtest values('Z',0);

1 row created.

SQL>
SQL> insert into subtest values('Z',0);

1 row created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select *
  2  from subtest
  3  where  1/divisor > 0 and id_type = 'P';

I    DIVISOR
- ----------
P          1
P          2

2 rows selected.

SQL> select *
  2  from subtest
  3  where id_type = 'P' and 1/divisor > 0;
ERROR:
ORA-01476: divisor is equal to zero

no rows selected

SQL>
SQL> select *
  2  from (
  3  select *
  4  from subtest
  5  where id_type = 'P')
  6  where 1/divisor > 0;

I    DIVISOR
- ----------
P          1
P          2

2 rows selected.

SQL>

----- Original Message -----
From: "Jonathan Gennick" <jonathan@xxxxxxxxxxx>
To: "Wolfgang Breitling" <breitliw@xxxxxxxxxxxxx>
Cc: <oracle-l@xxxxxxxxxxxxx>
Sent: Sunday, August 29, 2004 6:54 PM
Subject: Re[2]: A Cure for Madness

> WB> And I am sure everyone would be very upset
> WB> if they implemented the rule that a subquery must be evaluated before
> WB> predicates and selection are applied.
>
> Don't forget that Chris is really arguing that the results
> must be "as if" the view had been materialized. Optimization
> is still ok, and I can think of at least two approaches by
> which the subquery can be merged into the main query while
> retaining that "as if" behavior.
>
> WB> I have said it before and I still content that the original problem is
> not
> WB> caused by the optimizer incorrectly reordering the sequence of
> operations
> WB> but by the fact that the to_number function is not well defined over
> the
> WB> domain (varchar2). If that was not the case we would not have this
> debate
> WB> as the end result would be the same no matter the order of operations.
>
> This is actually an interesting line of thought. I like very
> much your my_to_number example. It dovetails with some
> thinking I've been doing lately about null values. Excellent
> example...
>
> Best regards,
>
> Jonathan Gennick --- Brighten the corner where you are
> http://Gennick.com * 906.387.1698 * mailto:jonathan@xxxxxxxxxxx
>
> Join the Oracle-article list and receive one
> article on Oracle technologies per month by
> email. To join, visit
> http://five.pairlist.net/mailman/listinfo/oracle-article,
> or send email to Oracle-article-request@xxxxxxxxxxx and
> include the word "subscribe" in either the subject or body.
>
>
> Sunday, August 29, 2004, 12:43:59 PM, Wolfgang Breitling
> (breitliw@xxxxxxxxxxxxx) wrote:
> WB> It is probably heresy to contradict the great Chris Date, but I
> disagree
> WB> with this part of his follow-up:
>
> WB> "
> WB> Now, the SQL standard makes it perfectly clear that the result of the
> query
>
> WB> SELECT *
> WB> FROM   R1
> WHERE  NUM >> 0 ;
>
> WB> is defined as follows:
>
> WB> 1.      Evaluate R1.
> WB> 2.      Restrict the result of the previous step to just those rows
> satisfying NUM >> 0.
> WB> 3.      Project the result of the previous step over all of its
> columns
> WB> (which is effectively a no op, of course).
>
> WB> In other words, the inner subquery must be evaluated before the outer
> WHERE
> WB> and SELECT clauses are executed (hence my unequivocal no to the
> question
> WB> "Is this rewriting on the part of the optimizer valid?").  But there's
> WB> still a little more to be said.  "
>
> WB> Maybe that is the SQL standard, I don't know, I don't have the text of
> it,
> WB> but it is certainly not part of the relational theory whose strength
> is the
> WB> fact that the operations selection (where clause), projection (select
> WB> clause) and join can be freely interchanged as long as the rules laid
> out
> WB> by the theory are followed. And none of the relational databases
> implement
> WB> Chris' 1-2-3 order of operation. And I am sure everyone would be very
> upset
> WB> if they implemented the rule that a subquery must be evaluated before
> WB> predicates and selection are applied. Think of the emp table (EMPNO,
> ENAME,
> WB> JOB, MGR, HIREDATE, SAL, COMM, DEPTNO) and create a view that excludes
> the
> WB> sensitive data: emp_vw (select EMPNO, ENAME, JOB, MGR, DEPTNO from
> emp).
> WB> Now, according to Chris' stand, a query like
> WB> select * from emp_vw where empno = 7369 would not be allowed to apply
> the
> WB> predicate to the underlying emp table but would need to evaluate the
> view
> WB> first ( which consists solely of a select , i.e. a temporary rowsource
> WB> consisting of the selected columns from the entire emp table has to be
> WB> built ) before applying the restriction "where empno = 7369". Imagine
> the
> WB> performance.
>
> WB> I have said it before and I still content that the original problem is
> not
> WB> caused by the optimizer incorrectly reordering the sequence of
> operations
> WB> but by the fact that the to_number function is not well defined over
> the
> WB> domain (varchar2). If that was not the case we would not have this
> debate
> WB> as the end result would be the same no matter the order of operations.
> To
> WB> prove it I created a function my_to_number which returns a NULL (i.e.
> WB> undefined) if presented with something that does not represent a
> number:
> WB> create or replace function my_to_number(input in varchar2)
> WB> return number is
> WB> l_num number := NULL;
> WB> begin
> WB>   begin
> WB>    l_num := to_number(input);
> WB>   exception
> WB>    when others then null;
> WB>   end;
> WB>   return l_num;
> WB> end;
>
> SQL>> create table subtest (flag varchar2(1), num varchar2(5));
> WB> Table created.
> SQL>> insert into subtest values('N','123');
> WB> 1 row created.
> SQL>> insert into subtest values('X','123');
> WB> 1 row created.
> SQL>> insert into subtest values('Y','pqr');
> WB> 1 row created.
> SQL>> insert into subtest values('N','456');
> WB> 1 row created.
> SQL>> insert into subtest values('Z','ijk');
> WB> 1 row created.
> SQL>> commit;
> WB> Commit complete.
> SQL>> analyze table subtest compute statistics;
> WB> Table analyzed.
>
> WB> Executing the query as is produces the error:
> SQL>> SELECT SELECT FLAG, NUM
> WB>    2  FROM ( SELECT FLAG, TO_NUMBER ( NUM ) NUM
> WB>    3         FROM   SUBTEST
> WB>    4         WHERE  FLAG = 'N' )
> WB>    5  WHERE  NUM > 0 ;
>
> WB> ERROR:
> WB> ORA-01722: invalid number
>
> WB> Forcing the subquery to be evaluated first gives us the result
> SQL>> SELECT SELECT FLAG, NUM
> WB>    2  FROM ( SELECT rownum, FLAG, TO_NUMBER ( NUM ) NUM
> WB>    3         FROM   SUBTEST
> WB>    4         WHERE  FLAG = 'N' )
> WB>    5  WHERE  NUM > 0 ;
>
> WB> F        NUM
> WB> - ----------
> WB> N        123
> WB> N        456
> WB> 2 rows selected.
>
> WB> Using the extended TO_NUMBER function also gives the same result,
> proving
> WB> that the reordering does not affect the final resultset.
> SQL>> SELECT SELECT FLAG, NUM
> WB>    2  FROM ( SELECT FLAG, my_TO_NUMBER ( NUM ) NUM
> WB>    3         FROM   SUBTEST
> WB>    4         WHERE  FLAG = 'N' )
> WB>    5  WHERE  NUM > 0 ;
>
> WB> F        NUM
> WB> - ----------
> WB> N        123
> WB> N        456
> WB> 2 rows selected.
>
>
> WB> At 03:19 PM 8/27/2004, you wrote:
>
>>>Some of you might have read the article I recently posted on
>>>"Subquery Madness!":
>>>
>>>http://five.pairlist.net/pipermail/oracle-article/2004/000012.html
>>>
>>>Chris Date took an interest in the issue, and he wrote a
>>>fascinating follow-up, "A Cure for Madness," which just went
>>>live today:
>>>
>>>http://www.dbdebunk.com/page/page/1351381.htm
>>>
>>>Please note that Chris is *not* saying that the subquery
>>>results need to be materialized, but only that the final
>>>results need to be "as if" they had been. I can think of at
>>>least two ways to merge the subquery and yet maintain the
>>>required "as if" behavior.
>>>
>>>Chris's article is really clear and well-written. I wish I
>>>could write as well as he does. I think you'll find his
>>>article interesting no matter which side of the issue you
>>>personally fall on.
>
> WB> Regards
>
> WB> Wolfgang Breitling
> WB> Centrex Consulting Corporation
> WB> www.centrexcc.com
>
> WB> ----------------------------------------------------------------
> WB> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> WB> ----------------------------------------------------------------
> WB> To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
> WB> put 'unsubscribe' in the subject line.
> WB> --
> WB> Archives are at //www.freelists.org/archives/oracle-l/
> WB> FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
> WB> -----------------------------------------------------------------
>
> ----------------------------------------------------------------
> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> ----------------------------------------------------------------
> To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
> put 'unsubscribe' in the subject line.
> --
> Archives are at //www.freelists.org/archives/oracle-l/
> FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
> -----------------------------------------------------------------

----------------------------------------------------------------
To unsubscribe send an email to: oracle-l-request@xxxxxxxxxxxxx
and put 'unsubscribe' in the subject line.
----------------------------------------------------------------

---
To unsubscribe - mailto:oracle-l-request@xxxxxxxxxxxxx&subject=unsubscribe 
To read recent messages - //freelists.org/archives/oracle-l/09-2004

Other related posts: