RE: Table Joins

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <gints.plivna@xxxxxxxxx>, <genegurevich@xxxxxxxxxxxxxxxxxxxxx>
  • Date: Thu, 25 May 2006 05:59:35 -0400

That was a pretty slick way to benchmark the comparison filter costs.

For the original poster I'm wondering whether they are looking for the
solution to a particular case or for a development guideline.

For a particular case, I'll echo the "measure it" sense of the thread.

For a development guideline, my prediction is that over a wide range of
joins of a wide range of objects, numbers will indeed win.

It appears from the results below that numbers have a little less than a 1
percent advantage on the comparison, and your mileage may vary.
For example, if you used rownum plus a quadrillion, the character
comparisons would probably take longer by a bigger margin. (Gints - if you
run that on exactly the same system for comparison, it might be useful to
the thread.)
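
Something along these lines, riffing on Gints' script, would do it
(big_wide is just a throwaway name, and dba_source is only a convenient
row source, same as in his test):

create table big_wide as
select rownum + 1000000000000000 rn1,   -- one quadrillion, so the
       rownum + 1000000000000000 rn2,   -- character columns are 16 digits
       to_char(rownum + 1000000000000000) rn3,
       to_char(rownum + 1000000000000000) rn4
  from dba_source;

set timing on
select count(*) from big_wide where rn1 = rn2;  -- number = number
select count(*) from big_wide where rn3 = rn4;  -- varchar2 = varchar2
select count(*) from big_wide where rn1 = rn3;  -- cross-type comparison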

Cross-type comparisons, as Gints succinctly demonstrated, are significantly
more expensive.

Over the broad case of pre-designed queries and joins, you'll probably have
indexes of proper types designed to support the joins, and the number of
"keys" per block and row to block selectivity will probably dominate.
Numbers are not always shorter than character fields, but usually they are
in cases of large numbers of rows and reasonable row selectivity. If two
indexes of roughly equal row to block cohesion have different depths due to
size at the steady state number of rows you expect to have in the table,
that will make a difference unless the join is traversing the leafs (leaves
if you prefer).
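
If you want to eyeball the depth part of that on your own keys, a quick
dictionary query after gathering stats shows the tree height and leaf block
counts the optimizer sees - BIG here is just Gints' throwaway table,
substitute your own:

-- indexes on the numeric and character versions of the same key
create index big_rn1_ix on big (rn1);
create index big_rn3_ix on big (rn3);

exec dbms_stats.gather_table_stats(user, 'BIG', cascade => true)

select index_name, blevel, leaf_blocks
  from user_indexes
 where table_name = 'BIG';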

So whatever produces a smaller overall key tends to be good.

Whatever tends to have correspondence to the order new rows are inserted
into the database tends to be good, because that tends to result in good row
to block cohesion. Generating a need for later periodic maintenance of row
to block cohesion with respect to a join key at design time is a bad idea.
(And in some specific expensive cases such periodic maintenance is one of
the few remaining valid justifications for rebuild maintenance, meaning bad
working hours for DBAs, which in turn is bad for your data since sleep
deprivation is a leading cause of operator error.) Sequence-generated
artificial numeric keys have historically had excellent row to block
cohesion, aside from any bugs in particular versions or data accidents that
may generate tree skewing, which is curable in any case.
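
For what it's worth, the usual shape of that pattern is nothing more than
this (all the names below are made up):

create sequence event_seq;

create table events (
  event_id   number constraint events_pk primary key,
  event_name varchar2(100),
  event_date date
);

-- each insert takes the next sequence value, so new keys arrive in
-- roughly ascending order and pack into adjacent index leaf blocks
insert into events (event_id, event_name, event_date)
values (event_seq.nextval, 'LOAD COMPLETE', sysdate);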

Mark Bobak's advice will probably tend to get you good cohesiveness between
row selection and block selection as well. Following the natural logic of a
situation tends to do that, especially with dates. Also notice that if
generating a numeric key as a proxy for a name means the query will be
forced to do an extra lookup and pull an extra table into the join, that is
not likely to be a win for an individual query where you measure joins, but
it might well be an overall benefit to your relational model.
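
To make that concrete with a made-up orders/customers pair:

-- with a numeric surrogate key the name lives only in the lookup table,
-- so even a query that just wants order ids and names joins two tables
select o.order_id, c.customer_name
  from orders o
  join customers c on c.customer_id = o.customer_id;

-- with the name carried as the key this query touches one table,
-- at the price of a wider key in every child row and index
select o.order_id, o.customer_name
  from orders o;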

Jared's advice about using the tool probably tends to get you a small key,
but I'm at a loss to predict whether it will tend to produce row to block
cohesion. It certainly might; in any case you don't always have a reason to
suspect that one or another key will dominate queries in future use, and
that tool is unlikely to be worse than neutral.

If you decide to benchmark entire designs, be extra careful about accidents
of sample size and row to block cohesion. If your data and keys may be
frequently updated in your steady state system, the best row to block
cohesion may also have the side effect of tending to generate hot blocks.
Since the general cure for that is reducing the number of rows per block,
which tends to mean more blocks to handle per query, that is not to be
ignored.
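
If you do end up needing fewer rows per block to spread out the heat, the
blunt instrument is simply leaving more free space per block at create
time - a sketch only, the right percentage depends entirely on your row
size and concurrency:

create table hot_counters (
  counter_id  number,
  counter_val number
) pctfree 90;   -- reserve most of each block as free space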

All my musings about tendencies, of course, pale beside testing actuals to
determine what is best when you have actuals to test.

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]On
Behalf Of Gints Plivna
Sent: Thursday, May 25, 2006 2:42 AM
To: genegurevich@xxxxxxxxxxxxxxxxxxxxx
Cc: oracle-l
Subject: Re: Table Joins

I'd say that for such a comparison you probably don't even need
anything to join.
In my very simple test below you can see that comparing two identical
number columns across 5,877,112 rows mostly took 01.02 sec, while the
same comparison on the same values stored in varchar2 columns took
01.03 sec.

BUT as you can imagine the worst case is comparing a number to a varchar2,
because it takes about twice as long: 02.04 sec.

So that's the comparison that really should be avoided.

Of course you can use Tom Kyte's nice runstats tool to get more precise
numbers for more indicators and a better picture of all that.
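
For reference, once the runstats script is installed the usual pattern is
roughly this (assuming it is installed under its published package name
runstats_pkg - adjust to whatever you have):

exec runstats_pkg.rs_start
select count(*) from big where rn1 = rn2;
exec runstats_pkg.rs_middle
select count(*) from big where rn3 = rn4;
exec runstats_pkg.rs_stop(500)
-- rs_stop prints latch and statistic differences above the threshold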

Gints Plivna

SQL> create table big as select rownum rn1, rownum rn2,
to_char(rownum) rn3, to_char(rownum) rn4
  2  from dba_source;

Table created.

Elapsed: 00:00:06.04

SQL> insert into big select * from big;

734639 rows created.

Elapsed: 00:00:04.09
SQL> /

1469278 rows created.

Elapsed: 00:00:09.06
SQL> /

2938556 rows created.

Elapsed: 00:00:19.04
SQL> select count(*) from big where rn1 = rn2;

  COUNT(*)
----------
   5877112

Elapsed: 00:00:02.01
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.02
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.02
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.02
SQL> select count(*) from big where rn3 = rn4;

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.03
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.03
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.03
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:01.03
SQL> select count(*) from big where rn1 = rn3;

  COUNT(*)
----------
   5877112

Elapsed: 00:00:02.05
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:02.04
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:02.05
SQL> /

  COUNT(*)
----------
   5877112

Elapsed: 00:00:02.04


2006/5/25, genegurevich@xxxxxxxxxxxxxxxxxxxxx
<genegurevich@xxxxxxxxxxxxxxxxxxxxx>:
> Hi all:
>
> One of my developers insists that joins by numerical fields result in
> better performance than joins by
> character fields. I don't remember reading much on this point. Is there
> any truth in it?
>
> thank you
>
> Gene Gurevich
--
//www.freelists.org/webpage/oracle-l



