Re: Join order and intermediate results

  • From: Dan Tow <dantow@xxxxxxxxxxxxxx>
  • To: SMILEYJ@xxxxxxxx
  • Date: Fri, 1 Oct 2004 13:14:54 -0500

I see that Mark Farnham already did a very nice job of answering the question of
how to accomplish this, so I'll focus on the question of "Why?". As it turns
out, the need for this sort of unusual join-tree is quite rare. As you already
noticed, John, Oracle has a strong tendency to join tables one-at-a-time,
something like A to B to get (AB), followed by (AB) to C to get (ABC), followed
by (ABC) to D to get (ABCD),..., and with normal hints like ORDERED (without
using inline-view tricks, that is), all you can force is which table is "A"
(first), which is "B" (second), which is "C" (third), etc., in this
one-at-a-time join style.

So, let's examine what normally happens with a 4-way join where you might want
to join A to B to make (AB), C to D to make (CD), followed by a join
(presumably a hash or a sort-merge join, since there can't be a useful
nest-loops path into a pre-joined result) of (AB) to (CD):

In a normal join tree, one of these 4 tables will be the "detail" table of the
whole query, and joins will reach the other 3 tables by three detail-to-master
joins that begin from that table. Since we can assume there's a join between A
and B and another join between C and D, we can assume, without any significant
loss of generality that A is the detail table, and that the joins look either
like

A.Fkey1=B.Pkey1 AND
C.Fkey2=D.Pkey2 AND
A.FKey3=C.PKey3

or

A.Fkey1=B.Pkey1 AND
C.Fkey2=D.Pkey2 AND
B.FKey3=C.PKey3

Now, assuming, again without loss of generality, that A or B is the detail
table, and that table is large enough that optimization is an issue, it is
almost invariably the case that you are going to need an indexed path such that
the number of logical I/Os to achieve (AB) is at least, roughly, as high as (and
quite possibly several times higher than) the number of rows in the join result
(AB) (since each result row will probably map to a specific table-access by
ROWID, not even counting other indexed accesses and the table access for the
other table). So, if (AB) costs N logical I/Os, it is very unlikely to contain
more than N rows. However, if it contains N rows, and if all the primary keys
are indexed, then there has to be a nested-loops path from (AB) to C and D
*individually* that does *at most* 2N index unique scans and 2N table accesses
by ROWID, *without* pre-joining C and D. Since an index unique scan is very
unlikely to require more than 4 (usually less) logical I/Os, and a table access
by ROWID requires precisely 1 logical I/O, the entire worst-case plan that
drives from (AB) to C and D individually should have an upper-bound cost of 11N
logical I/Os if (AB) cost N logical I/Os. Furthermore, the ratio of *runtime* is
probably even better (a lot better, usually) than 11-to-1 between the whole cost
and the cost of (AB), because, A or B being the detail table, and therefore
probably the biggest table, you are more likely to see physical I/O during the
reads of A and B than during the reads of C and D. Even if the join (CD) was
*free*, that would mean that the best one-at-a-time plan in the style
A-to-B-to-C-to-D or A-to-B-to-D-to-C should cost no more than 11 times as much
as the best plan

(A-to-B)-to-(C-to-D)

Keep in mind that all this is true even restricting ourselves nested-loops joins
to C and D. When we allow for hash joins (one at a time, only where they are
best) to C and/or D, the likelihood of dramatic advantages for
(A-to-B)-to-(C-to-D) grows even smaller.

When you factor in all the practical details, the probable ratio of the best
plan in the one-at-a-time style to the best plan that first pairs tables off is
*usually* <= one (one-at-a-time wins or ties), and *very* rarely more than 2 or
3. In fact, I've known of the possibility of plans joining already-joined
results for years, and it's a trick I haven't needed even once (which is why I
didn't talk about it in my book) to get an excellent result in real-world
tuning.

So, I'm fascinated that you appear to have found a case where the sum of the
costs of (AB) and (CD) was 85 logical I/Os (implying either very small tables
or excellent indexed paths to less than 85 rows for both joins), but did not
find a one-at-a-time path that cost less than (or even close to!) 935 logical
I/Os (11*85). Without going into too much more tedious detail, even unusual
join trees that do not have a single detail table, as long as all joins have a
primary key on one end, and point to a single table on both ends, also obey
this 11-to-1 upper bound (between the best one-at-a-time cost and the best
(AB)-to-(CD) cost) if we allow for hash joins to individual tables. I can only
see a few possibilities:

-You're missing at least one primary-key index.

or

-You didn't find the *best* one-at-a-time join order, and the best one would
cost less than 935 logical I/Os, and run in less than a second.

or

-You have an unusual join tree, where at least one of the joins is many-to-many,
or some even more unusual feature/ I discuss most cases of unusual join trees in
my book , and they usually indicate a functional mistake in the query, though
there are of course exceptions to that rule.

Thanks,

Dan Tow
650-858-1557
www.singingsql.com


Quoting Smiley John - IL <SMILEYJ@xxxxxxxx>:

> Suppose we have four tables A, B, C, and D that are all used in a SELECT
> statement.  Let's further suppose that we know that the best way to join
> these tables (best meaning fastest RT, and fewest LIOs) is to join A to B to
> produce row source AB, join C to D to produce row source CD, then join AB to
> CD.
>
> How do I get Oracle to do this?  At first I simply wrote the query as a
> simple SELECT like this:
>
> SELECT ...
> FROM A, B, C, D
> WHERE ...
>
> When that didn't work and I made sure the table, index, and column stats
> were correct, I tried various hints for join order and join method.  When
> that didn't work, I decided to smack Oracle in the face with the answer like
> this:
>
> SELECT ...
> FROM (SELECT ... FROM A, B WHERE ...) AB, (SELECT ... FROM C, D WHERE ...)
> CD
> WHERE ...
>
> However, it stubbornly refused to join in the correct order and I had to
> result to storing the intermediate results in GTTs and then joining the
> GTTs.
>
> This is not a very satisfying means of solving the problem.  Oracle seems to
> doggedly follow a linear path for satisfying the query, which goes something
> like this:  join two row sources to create an intermediate result, pick
> another row source and join it to the intermediate result to produce a new
> intermediate result, repeat until done (in the case of parallel query, each
> PQ slave seems to follow this same approach and then the query coordinator
> merges the results).
>
> Expressed as a tree, it looks like this (where / | and \ are join
> operations):
>
>      D (or C)
>     /
>   C (or D)
>  /
> A
> |
> B
>
> When what I want is this:
>
>   Result
>  /      \
> A        C
> |        |
> B        D
>
>
> In the case I'm describing, this makes the difference between a response
> time of 0.1 seconds with 85 LIOs vs. a response time of 120 seconds with
> 60,000 LIOs.
>
> I have a feeling I'm going to be smacking my forehead on this one, but I
> just don't see what I'm missing.
>
> John Smiley
>
> P.S.  This is 9.2.0.4
>
> --
> //www.freelists.org/webpage/oracle-l
>

--
//www.freelists.org/webpage/oracle-l

Other related posts: