Re: Join order and intermediate results

  • From: Dan Tow <dantow@xxxxxxxxxxxxxx>
  • To: Smiley John - IL <SMILEYJ@xxxxxxxx>
  • Date: Sun, 3 Oct 2004 19:22:47 -0500

John,

You're most welcome. I have some responses below, in-line.

Yours,

Dan Tow
650-858-1557
www.singingsql.com


Quoting Smiley John - IL <SMILEYJ@xxxxxxxx>:

> Dan,
>
> Thanks for taking the time to provide such a detailed explanation.  I'm
> probably missing something, but it appears that this explanation of how
> one-at-a-time (linear) join orders can match (or nearly so) the performance
> of other join orders is based on the assumption that the tables have a
> master-detail relationship of some kind.  Queries in a data warehouse rarely
> have this property.

It has not been my experience, even for the largest queries I've dealt with,
that the master-detail rule somehow fails to hold for data-warehouse queries.
Admittedly, most (but not all) of my experience with data-warehouse-type
queries has been with very large, batch queries running against mixed-use
databases, rather than pure data-warehouse applications, so my set of
data points surely differs from yours. I think I have pretty good theoretical
reasons to believe that a well-designed data warehouse application would tend
to avoid these many-to-many joins as well, though, as I describe below.

>                      In this particular case, joining A to B gave a
> relatively small result set and joining C to D gave a relatively small
> result set, so that joining AB to CD was very efficient.
>
> The other combinations were either very inefficient or had no semantic use:
> *     A cannot be joined to C or D (other than a meaningless Cartesian
> product) because they have no columns in common
> *     D cannot be joined to A or B (other than a meaningless Cartesian
> product) because they have no columns in common
> *     B joined to C without filtering B through A and C through D gave an
> enormous intermediate result set
>

OK, now to the theoretical argument for why we should tend to avoid these
many-to-many joins, using your case as an example. I'll make a couple of guesses
about the undescribed details of your case just to make this concrete, but
alternate examples consistent with the information you provide should look much
the same:

You have a normal join from A to B, which I'll guess is in the direction of B
(that is, toward B's primary key), with a super filter on A, such that the join
of A and B has at most a few hundred rows:

A (Fa<<1)
|
V
B

with a join that looks like A.Fkey1=B.Pkey1. Both A and B are very large (likely
A being largest, since it is the detail table), but the filter condition on A is
so selective that the filter ratio on A is a very small fraction, maybe
Fa=0.0001 or less, resulting in a result set for (AB) that is Fa*Ca, where Ca
is the cardinality of A. (This assumes, as is usually the case, that the join
itself, from A to B, is "non-filtering.")

Likewise, you have a normal join from C to D, which I'll assume is in the
direction of D (toward D's primary key), and which also returns at most a few
hundred rows:

C
|
V
D (Fd<<1)


with a join that looks like C.Fkey2=D.Pkey2. Both C and D are very large (likely
C being largest, since it is the detail table), but the filter condition on D is
so selective that the filter ratio on D is a very small fraction, maybe
Fd=0.0001 or less, resulting in a result set for (CD) that is Fd*Cc, where Cc
is the cardinality of C. (This assumes, as is usually the case, that the join
itself, from C to D, is "non-filtering.")

Finally, you have a many-to-many join between B and C:

B--C

on some many-to-many match like B.Colx=C.Colx, where the join columns are not
even close to unique on either end of the join:

A (Fa<<1)
|
V
B--C
   |
   V
   D (Fd<<1)
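
Just to make the shape of the query concrete, here is roughly what I'm
picturing (the filter predicates on A and D are placeholders I'm inventing;
only the join columns come from your description):

    SELECT A.*, B.*, C.*, D.*
      FROM A, B, C, D
     WHERE A.Fkey1 = B.Pkey1         -- detail-to-master join from A to B
       AND C.Fkey2 = D.Pkey2         -- detail-to-master join from C to D
       AND B.Colx  = C.Colx          -- the many-to-many join
       AND A.filter_col = :a_value   -- the very selective filter on A (Fa<<1)
       AND D.filter_col = :d_value   -- the very selective filter on D (Fd<<1)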

I agree that in this scenario, you're better off doing as you are doing, a
many-to-many join of (AB) and (CD), rather than following a one-at-a-time path
from A to B to C to D (if Fa<Fd, generally), or D to C to B to A (if Fd<Fa,
generally). If I were writing the application, though, I'd opt for a third
alternative:

Let's make the problem concrete by guessing that (AB) and (CD), with the great
filters on A and D, each return 30 rows by themselves. If that's the case, then
your full query will return anywhere between 0 rows (if the set of B.Colx values
in (AB) has no overlap with the set of C.Colx values) and 900 rows (if all rows
in (AB) have a single B.Colx value, and it is the same as C.Colx for every row
in (CD)). In every case, the set of combinations (AB) and combinations (CD),
queried separately, and perhaps sorted by Colx for convenience, has no less
information than the result of your full query, but the upper limit of the
total number of rows returned by a query of (AB) separate from a query of (CD)
is 60, in my example, instead of 900, and even if you get lucky, and find few
many-to-many matches after applying the filters, you'll still do just as much
work in the database (if you optimize as you have) querying ABCD together as
you would querying (AB) and (CD) separately.
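
In other words, in place of the four-way join, I'd run the two small queries
separately (same invented placeholder filters as above) and let the
application pair them up by Colx:

    -- Query 1: the (AB) combinations - at most a few dozen rows in this example
    SELECT A.*, B.*
      FROM A, B
     WHERE A.Fkey1 = B.Pkey1
       AND A.filter_col = :a_value
     ORDER BY B.Colx;

    -- Query 2: the (CD) combinations - likewise small
    SELECT C.*, D.*
      FROM C, D
     WHERE C.Fkey2 = D.Pkey2
       AND D.filter_col = :d_value
     ORDER BY C.Colx;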

In the single-query result, for any given value of Colx, you have a Cartesian
product of all the rows in (AB) having that value, and all the rows of (CD)
having that value, but the Cartesian product reveals absolutely nothing more
than the two sets, (AB) and (CD), queried separately, and sorted for
convenience by Colx, with perhaps a post-query step to discard rows having Colx
values not found in the other set.
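
(If you'd rather have the database do that discarding for you, each of the two
separate queries can carry a subquery against the other side, still without
ever forming the Cartesian product - again just a sketch with my invented
filters:

    SELECT A.*, B.*
      FROM A, B
     WHERE A.Fkey1 = B.Pkey1
       AND A.filter_col = :a_value
       AND B.Colx IN (SELECT C.Colx
                        FROM C, D
                       WHERE C.Fkey2 = D.Pkey2
                         AND D.filter_col = :d_value)
     ORDER BY B.Colx;

and the mirror-image query for the (CD) side.)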

Any time you have a many-to-many join, you have potential for this sort of
Cartesian-product-generated redundancy, and it is rarely useful, in my
experience. More often, developers fail even to test a case where the
Cartesian product is really many-to-many (for example, they test only cases
where (AB) or (CD) happen to be unique on Colx after the very powerful filters
are applied). In these cases, developers often assume that, like their test cases,
rows of the Cartesian product will map one-to-one with some entity, for
example, the entity that maps to table A, and will have application logic that
fails in scary ways when it turns out that sometimes the query returns multiple
rows for a given "A" entity/row. When the application explicitly breaks the
logic up to handle (AB) and (CD) as separate queries (which cannot fail to map
to a table entity, having only something-to-one joins), and explicitly handles
the combinations of these two sets for any given Colx, this sort of logic
mistake becomes much less likely.

> One could argue that the data model was not suited to this query, but that
> doesn't solve the problem and the data model may be very well suited to many
> other classes of queries that are useful to end users.
>
> Also, this sentence left me wondering how non-Oracle databases store and
> retrieve data:
>
> "So, if (AB) costs N logical I/Os, it is very unlikely to contain more than
> N rows."
> I assume this is not referring to Oracle databases since it is very easy to
> write meaningful queries in Oracle that return many more rows than the
> number of LIOs needed to retrieve them.  Multiple rows per block and a good
> clustering factor make this a snap.

I was referring to Oracle, but I oversimplified, looking at the worst case,
where clustering doesn't help. As near as I can tell, clustering didn't do a
thing to reduce logical I/O in earlier Oracle versions, though it was very
useful for reducing physical I/O. I have noticed that Oracle now seems to take
advantage of clustering for logical I/O reduction, though I don't know
how much CPU (which is what *really* matters) that really saves - whether or
not clustering reduces logical I/O, the database must surely do *some* per-row
work to do a table access by rowid, and your runtime ratios were consistent
with your logical I/O ratios.
>
> And while the next statement is true, I didn't see how I could apply it
> except in unusual cases:
>
> "However, if it contains N rows, and if all the primary keys are indexed,
> then there has to be a nested-loops path from (AB) to C and D
> *individually* that does *at most* 2N index unique scans and 2N table
> accesses by ROWID, ..."
> This is only true if (1) all of the tables follow a master-detail chain and
> (2) I'm joining from the detail rows first (both of which you've stated as
> assumptions).  However, this bounding formula only applies to the class of
> queries where it is more efficient to filter the detail rows first and then
> join to the parents.  It appears to ignore the other very large class of
> useful queries where it is more efficient to start with the master rows
> first and work our way down.

Remember, I set the problem up at the beginning by assuming that the join tree
was "well-formed" - to quote myself:

> In a normal join tree, one of these 4 tables will be the "detail" table of
> the whole query, and joins will reach the other 3 tables by three
> detail-to-master joins that begin from that table.

I further assumed, without loss of generality, that said "detail" table was
either A or B. (The argument just substitutes "C" for "A" and "D" for "B" if
you lose this assumption.) With these assumptions, yes, you can reach the rest
of the tables by detail-to-master joins. Now, you're absolutely right that a
better path might be to start from C or D, if one of those has a better filter
than either A or B (and I made no assumption, needing none, about whether (AB)
started from the master or from the detail). However, remember that I only said
that a nested-loops path from (AB) needed at most that many logical I/Os (far
fewer than your query used) to reach C and D, not that this was the best path -
I'm looking at upper limits here, and the upper limit looked *way* better than
the one-at-a-time path you got. This is explained by your revelation that my
assumption of this being a normal join tree was wrong.

>
>
> Finally, this statement seems to imply that one-at-a-time paths can always
> provide reasonable approximations to optimal join paths in terms of LIOs
> (maybe I'm reading too much into it):
>
> "So, I'm fascinated that you appear to have found a case where the sum of
> the costs of (AB) and (CD) was 85 logical I/Os (implying either very small
> tables or excellent indexed paths to less than 85 rows for both joins), but
> did not find a one-at-a-time path that cost less than (or even close to!)
> 935 logical I/Os (11*85)."
> It seems to me that the one-at-a-time approach has some significant
> drawbacks for certain classes of queries if the best it can achieve is an
> order of magnitude increase in LIOs over other join paths.  The optimal join
> order in this case just leaps right out at you if you look at the schema and
> the data.  It would be very nice if the CBO could see it too and I didn't
> have to resort to trickery such as adding a pseudo-column like ROWNUM in the
> inline views to force it out of the one-at-a-time path.  Not only that,
> one-at-a-time paths are serial.  Individual row source scans and joins can
> be performed in parallel, but the overall structure of the join order is
> serial whereas nonlinear join orders lend themselves to parallel execution
> by their very nature.  This would seem to be a good reason to invest in
> improving the CBO to consider nonlinear join orders in parallel processing
> environments.
>

One-at-a-time join paths do not have to use serial plans, either - nothing
inherently prevents the work from the driving table from being divided up. I
*very* rarely find a query that needs a parallel plan to perform just fine,
however, once it is well tuned. More often, I find parallelism hiding just how
inefficient a non-optimal plan is, by throwing resources at it.
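
For what it's worth, even a plain nested-loops plan can have its driving-table
work split across parallel query slaves - something like the sketch below
(illustrative hints only, with the same invented filter as before), though as
I say, I rarely find this necessary on a well-tuned query:

    SELECT /*+ FULL(A) PARALLEL(A, 4) USE_NL(B) */ A.*, B.*
      FROM A, B
     WHERE A.Fkey1 = B.Pkey1
       AND A.filter_col = :a_value;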

Yes, I am saying that one-at-a-time paths tend to perform roughly as well or
better than more exotic paths, *in my own experience*, and the argument is
based on my assumption of a "normal" join tree. The best counterexamples I've
found have been queries that were wrong from a logical point of view, doing
Cartesian products or many-to-many joins that the application was better off
without. You may have found examples of functionally useful many-to-many joins
(where the "many" on both sides really is "many" not "nearly one"), I grant - I
just haven't seen them in my own experience, save maybe once in a blue moon.

> Regards,
>
> John Smiley

--
//www.freelists.org/webpage/oracle-l
