RE: Cost of hash join

  • From: "Christian Antognini" <Christian.Antognini@xxxxxxxxxxxx>
  • To: "Jared Still" <jkstill@xxxxxxxxx>
  • Date: Thu, 9 Jun 2005 10:39:37 +0200

Hi Jared
 

Some observations show that the cost is not proportional with the number of 
partitions.

Below you find a simple test. Of course any comment is welcome!!!

 

HTH

Chris

 

 

1) The setup. 

 

CREATE TABLE t1 (id NUMBER, v1 CHAR(1000));

CREATE TABLE t2 (id NUMBER, v1 CHAR(1000));

 

INSERT INTO t1 SELECT rownum, object_name FROM all_objects WHERE rownum <= 10;

COMMIT;

 

INSERT INTO t2 SELECT rownum, object_name FROM all_objects WHERE rownum <= 
10000;

INSERT INTO t2 SELECT rownum, v1 FROM t2;

INSERT INTO t2 SELECT rownum, v1 FROM t2;

INSERT INTO t2 SELECT rownum, v1 FROM t2;

COMMIT;

 

exec dbms_stats.gather_table_stats(user,'t1')

exec dbms_stats.gather_table_stats(user,'t2')

 

2) Two test statements.

 

SELECT t1.v1 FROM t1, t2 WHERE t1.id = t2.id;

SELECT t2.v1 FROM t1, t2 WHERE t1.id = t2.id;

 

3) The execution plan reported by AUTOTRACE. Notice the difference between the 
two hash joins.

 

SQL> SELECT t1.v1 FROM t1, t2 WHERE t1.id = t2.id;

 

Execution Plan

----------------------------------------------------------

SELECT STATEMENT Optimizer=CHOOSE (Cost=1105 Card=20 Bytes=20180)

  HASH JOIN (Cost=1105 Card=20 Bytes=20180)

    TABLE ACCESS (FULL) OF 'T1' (Cost=2 Card=10 Bytes=10040)

    TABLE ACCESS (FULL) OF 'T2' (Cost=1101 Card=80000 Bytes=400000)

 

SQL> SELECT t2.v1 FROM t1, t2 WHERE t1.id = t2.id;

 

Execution Plan

----------------------------------------------------------

SELECT STATEMENT Optimizer=CHOOSE (Cost=1139 Card=20 Bytes=20160)

  HASH JOIN (Cost=1139 Card=20 Bytes=20160)

    TABLE ACCESS (FULL) OF 'T1' (Cost=2 Card=10 Bytes=30)

    TABLE ACCESS (FULL) OF 'T2' (Cost=1101 Card=80000 Bytes=80400000)

 

4) What does event 10053 say? The cost for the join is different: 2 for the 
first one, 36 for the second one. The different comes, IMO, from the size of 
the inner table set (rcz*cdn): 5*80000=400000 for the first one, 
1005*80000=80400000 for the second one.

 

  Outer table:

    resc: 2  cdn: 10  rcz: 1004  deg: 1  resp: 2

  Inner table: T2

    resc: 1101  cdn: 80000  rcz: 5  deg:  1  resp: 1101

    using join:8 distribution:2 #groups:1

  Hash join one ptn Resc: 2   Deg: 1

      hash_area:  128 (max=128)  buildfrag: 129 probefrag: 167 ppasses: 2

  Hash join   Resc: 1105   Resp: 1105

 

  Outer table:

    resc: 2  cdn: 10  rcz: 3  deg: 1  resp: 2

  Inner table: T2

    resc: 1101  cdn: 80000  rcz: 1005  deg:  1  resp: 1101

    using join:8 distribution:2 #groups:1

  Hash join one ptn Resc: 36   Deg: 1

      hash_area:  128 (max=128)  buildfrag: 129 probefrag: 9932 ppasses: 2

  Hash join   Resc: 1139   Resp: 1139

 

5) What does event 10104 say? The number of partitions is 8 in both cases. 
Therefore this cannot lead to a different cost! What is important, IMO, is the 
size and the number of slots that can be placed in memory. For the first 
statement there's enough place for the whole set, therefore the cost of the 
hash join is very low (2). In the second case we need about 116 slots but only 
17 are available. Therefore lot of them will be written to disk, this lead to 
the cost of 36.

 

Original memory: 1048576

Memory after all overhead: 1024307

Memory for slots: 983040

Calculated overhead for partitions and row/slot managers: 41267

Hash-join fanout: 8

Number of partitions: 8

Number of slots: 15

Multiblock IO: 8

Block size(KB): 8

Cluster (slot) size(KB): 64

 

Original memory: 1048576

Memory after all overhead: 1002495

Memory for slots: 696320

Calculated overhead for partitions and row/slot managers: 306175

Hash-join fanout: 8

Number of partitions: 8

Number of slots: 17

Multiblock IO: 5

Block size(KB): 8

Cluster (slot) size(KB): 40

 

 

>-----Original Message-----

>From: Jared Still [mailto:jkstill@xxxxxxxxx]

>Sent: 08 June 2005 19:34

>To: Christian Antognini

>Subject: Re: Cost of hash join

> 

>Hi Chris,

> 

>Are you sure about that?

> 

>The formula presented in the manual would seem correct.

> 

>That is, the cost is that of scanning A, scanning B, and the cost

>of joining each partition of B (which may be just 1) to A.

> 

>Jared

> 

>On 6/7/05, Christian Antognini <Christian.Antognini@xxxxxxxxxxxx> wrote:

> 

>Hi Naveen

> 

>>cost= (access cost of A * number of hash partitions of

>>B) + access cost of B.

>> 

>>Why is the access cost of A multiplied by no. of hash

>>partitions?

> 

>In my opinion this is not a good one (perhaps for this reason it's no more in 
>the 10g

>documentation...).

> 

>In fact the cost is something like:

> 

>cost = access cost A + access cost B + cost join.

> 

>If the hash table is "small", the cost of the join itself will be close to 0.

>If the hash table is "large", the cost is strongly dependent on the amount of 
>"available" PGA

>memory.


--
//www.freelists.org/webpage/oracle-l

Other related posts: