RE: Optimizer

  • From: "Christian Antognini" <Christian.Antognini@xxxxxxxxxxxx>
  • To: "Freeman Robert - IL" <FREEMANR@xxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 1 Oct 2004 11:18:36 +0200

Hi Robert

Sorry for the previous email... I resend it with the correct title...

>>> For hash joins the optimizer creates the hash table on the smaller=20
>>> row source. In this case it's obviously DEPT.
>
>If this is true, then why isn't the LIO the same regardless of the=20
>order of the rows in the FROM clause with or without the ORDERED hint?=20
>If a hash join ALWAYS uses the smallest table as the hash table,=20
>shouldn't the LIO's be the same in this simple join? It appears to me=20
>that the hint overrides this rule somehow.

The fact that the smallest row source is used to build the hash table =
can be found in the documentation as well, see =
http://download-west.oracle.com/docs/cd/B14117_01/server.101/b10752/optim=
ops.htm#76074.

If you take event 10053 you will see that independently of the order of =
the tables in the from clause the CBO starts to use the smaller table as =
outer table. Since both costs are the same the first one is used.

>As for array size, I don't think that is an issue. While it can be used =

>to reduce LIO's, that is not the point of the question. The point is,=20
>with the array size being the same, why didn't the optimizer take the=20
>better path? What changed when I used the /*+ ORDERED */ hint? Why=20
>didn't Oracle use whatever changed to get me the smaller numbers of=20
>LIO's to begin with.

The point about array size is that the LIO are generated to get the data =
and not to join the tables. Otherwise I cannot explain myself the =
following behavior....

- If I perform the same join *without* getting the rows, the number of =
LIO is the same for both join orders:

SQL> select /*+ leading(emp) */ count(ename) from dept, emp where=20
SQL> emp.deptno =3D dept.deptno;

Execution Plan
----------------------------------------------------------
SELECT STATEMENT Optimizer=3DALL_ROWS (Cost=3D15 Card=3D1 Bytes=3D22)
  SORT (AGGREGATE)
    HASH JOIN (Cost=3D15 Card=3D1562624 Bytes=3D34377728)
      TABLE ACCESS (FULL) OF 'EMP' (Cost=3D10 Card=3D14336 =
Bytes=3D129024)
      TABLE ACCESS (FULL) OF 'DEPT' (Cost=3D2 Card=3D327 Bytes=3D4251)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         95  consistent gets
          0  physical reads
          0  redo size
        309  bytes sent via SQL*Net to client
        375  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select /*+ leading(dept) */ count(ename) from dept, emp where=20
SQL> emp.deptno =3D dept.deptno;

Execution Plan
----------------------------------------------------------
SELECT STATEMENT Optimizer=3DALL_ROWS (Cost=3D13 Card=3D1 Bytes=3D22)
  SORT (AGGREGATE)
    HASH JOIN (Cost=3D13 Card=3D1562624 Bytes=3D34377728)
      TABLE ACCESS (FULL) OF 'DEPT' (Cost=3D2 Card=3D327 Bytes=3D4251)
      TABLE ACCESS (FULL) OF 'EMP' (Cost=3D10 Card=3D14336 =
Bytes=3D129024)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         95  consistent gets
          0  physical reads
          0  redo size
        309  bytes sent via SQL*Net to client
        375  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

- If I get the rows, the number of LIO for the two join orders is =
different:

SQL> select /*+ leading(dept) */ * from dept, emp where emp.deptno =3D=20
SQL> dept.deptno

28672 rows selected.

Execution Plan
----------------------------------------------------------
SELECT STATEMENT Optimizer=3DALL_ROWS (Cost=3D13 Card=3D1562624 =
Bytes=3D104695808)
  HASH JOIN (Cost=3D13 Card=3D1562624 Bytes=3D104695808)
    TABLE ACCESS (FULL) OF 'DEPT' (Cost=3D2 Card=3D327 Bytes=3D9810)
    TABLE ACCESS (FULL) OF 'EMP' (Cost=3D10 Card=3D14336 Bytes=3D530432)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1990  consistent gets
          0  physical reads
          0  redo size
     957209  bytes sent via SQL*Net to client
      21516  bytes received via SQL*Net from client
       1913  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      28672  rows processed

SQL> select /*+ leading(emp) */ * from dept, emp where emp.deptno =3D=20
SQL> dept.deptno;

28672 rows selected.

Execution Plan
----------------------------------------------------------
SELECT STATEMENT Optimizer=3DALL_ROWS (Cost=3D19 Card=3D1562624 =
Bytes=3D104695808)
  HASH JOIN (Cost=3D19 Card=3D1562624 Bytes=3D104695808)
    TABLE ACCESS (FULL) OF 'EMP' (Cost=3D10 Card=3D14336 Bytes=3D530432)
    TABLE ACCESS (FULL) OF 'DEPT' (Cost=3D2 Card=3D327 Bytes=3D9810)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        101  consistent gets
          0  physical reads
          0  redo size
    1214632  bytes sent via SQL*Net to client
      21516  bytes received via SQL*Net from client
       1913  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      28672  rows processed

>This isn't about tuning the statement, it's about understanding why the =

>optimizer does what it does.

I agree on this point... I'm also only interested in understanding the =
CBO!=20
My understanding is the following:
- Oracle reads the outer table and prepares the hash table in memory
- When the first fetch comes the first block of the inner table is read =
and the hash table is probed, as soon as the first rows is found, it is =
sent back to the client
- Then another fetch comes and Oracle has to find out the second row =
which could be or not in the same block as the previous... when it is in =
the same block sometimes it has to read the block again, thus generating =
more LIO on the same block
- The same appends for the subsequent fetches...

Now, when the array size is bigger, Oracle returns all rows associated =
to a single block of the inner table in a single fetch operation, =
therefore only a single LIO for each block is needed! If I look at the =
number of blocks my test tables have, I can see that Oracle has to =
perform at minimum 95 LIO.=20

SQL> select table_name, blocks, num_rows from user_tables where=20
SQL> table_name in ('EMP','DEPT');

TABLE_NAME                         BLOCKS   NUM_ROWS
------------------------------ ---------- ----------
DEPT                                    4          8
EMP                                    91      14336

This is exactly the number of LIO for query 1 and 2, and almost the same =
for query 4. For query 4 the blocks of the table DEPT are probably =
accessed a couple of times. But the outer table, i.e. EMP, only once to =
build the hash table. Therefore, if I use a larger array size, also =
query 3 will generate almost the same number of LIO.

SQL> set arraysize 5000
SQL> select /*+ leading(dept) */ * from dept, emp where emp.deptno =3D=20
SQL> dept.deptno;

28672 rows selected.

Execution Plan
----------------------------------------------------------
SELECT STATEMENT Optimizer=3DALL_ROWS (Cost=3D13 Card=3D28672 =
Bytes=3D1634304)
  HASH JOIN (Cost=3D13 Card=3D28672 Bytes=3D1634304)
    TABLE ACCESS (FULL) OF 'DEPT' (Cost=3D2 Card=3D8 Bytes=3D160)
    TABLE ACCESS (FULL) OF 'EMP' (Cost=3D10 Card=3D14336 Bytes=3D530432)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        101  consistent gets
          0  physical reads
          0  redo size
     804729  bytes sent via SQL*Net to client
        550  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      28672  rows processed


Therefore, in my opinion, the optimizer correctly costs the join, but, =
since it has no idea which array size is used, it has no possibility to =
correctly cost the retrieval of the rows. Then, it simply minimizes the =
memory need for the hash table by choosing the smaller row source.


Chris

--
//www.freelists.org/webpage/oracle-l

Other related posts: