Re: Table Joins

  • From: "stephen booth" <stephenbooth.uk@xxxxxxxxx>
  • To: genegurevich@xxxxxxxxxxxxxxxxxxxxx
  • Date: Wed, 24 May 2006 22:48:26 +0100

On 24/05/06, genegurevich@xxxxxxxxxxxxxxxxxxxxx
<genegurevich@xxxxxxxxxxxxxxxxxxxxx> wrote:
One of my developers insists that joins by numerical fields result in
better preformance than the joins by
character fields. I don't remember reading much on this point. Is there any
truth in it?

I don't recall reading anything about that either. I suppose there's a certain logic to it, at the processor level you're comparing numbers and there's probably fewer steps when you're already dealing with numbers rather than character strings. I suspect that, in a real world system, any time savings from using numbers instead of charcaters would be swamped by the other things that take time.

I'd be tempted to say tot eh developer:  "OK, prove it."

It shouldn't be too hard to generate a test case to compare.  Create a
pair of tables each with a common key column and a content column.

e.g.

create table a1
(
thekey varchar2(10),
content varchar2(2000)
);

create table a2
(
thekey varchar2(10),
morecontent varchar2(2000)
);

Make thekey the primary key of table a1 and a forigen key on a2
referencing a1.thekey.  a1.content and a2.content and just for random
junk data to pad it out.  Create some code to load up the tables, so
that some or all of the  rows in a1 have one or more rows with
matching values for thekey in a2 and the column a1.content and
a2.morecontent contain long strings (maybe concatenate the value for
the value of thekey for that row a 20 times), with a large number of
rows.  Maybe use a sequence (and cast to a character string) to
generate the key values?

Query the table:

select a1.content, a2.morecontent
from a1, a2
where a1.thekey=a2.thekey;

and time how long it takes.

Repeat but change the thekey column to type number then compare the times.

If nothing else getting the developer to develop and run a test case
will get them out of your hair for a while.

I can't help wondering if this is really an arguement over natural vs
surrogate keys?  Natural keys tend to be character strings (names,
postal codes &c) where as surrogate keys tend to be numeric (generated
by a sequence or similar).  That's a much bigger debate than I'd want
to get into right now (besides which there are much more real world
relevant factors in that debate than the speed of joining).

Stephen

--
It's better to ask a silly question than to make a silly assumption.

http://stephensorablog.blogspot.com/

'nohup cd /; rm -rf * > /dev/null 2>&1 &'
--
//www.freelists.org/webpage/oracle-l


Other related posts: