Hi Robert Sorry for the previous email... I resend it with the correct title... >>> For hash joins the optimizer creates the hash table on the smaller=20 >>> row source. In this case it's obviously DEPT. > >If this is true, then why isn't the LIO the same regardless of the=20 >order of the rows in the FROM clause with or without the ORDERED hint?=20 >If a hash join ALWAYS uses the smallest table as the hash table,=20 >shouldn't the LIO's be the same in this simple join? It appears to me=20 >that the hint overrides this rule somehow. The fact that the smallest row source is used to build the hash table = can be found in the documentation as well, see = http://download-west.oracle.com/docs/cd/B14117_01/server.101/b10752/optim= ops.htm#76074. If you take event 10053 you will see that independently of the order of = the tables in the from clause the CBO starts to use the smaller table as = outer table. Since both costs are the same the first one is used. >As for array size, I don't think that is an issue. While it can be used = >to reduce LIO's, that is not the point of the question. The point is,=20 >with the array size being the same, why didn't the optimizer take the=20 >better path? What changed when I used the /*+ ORDERED */ hint? Why=20 >didn't Oracle use whatever changed to get me the smaller numbers of=20 >LIO's to begin with. The point about array size is that the LIO are generated to get the data = and not to join the tables. Otherwise I cannot explain myself the = following behavior.... - If I perform the same join *without* getting the rows, the number of = LIO is the same for both join orders: SQL> select /*+ leading(emp) */ count(ename) from dept, emp where=20 SQL> emp.deptno =3D dept.deptno; Execution Plan ---------------------------------------------------------- SELECT STATEMENT Optimizer=3DALL_ROWS (Cost=3D15 Card=3D1 Bytes=3D22) SORT (AGGREGATE) HASH JOIN (Cost=3D15 Card=3D1562624 Bytes=3D34377728) TABLE ACCESS (FULL) OF 'EMP' (Cost=3D10 Card=3D14336 = Bytes=3D129024) TABLE ACCESS (FULL) OF 'DEPT' (Cost=3D2 Card=3D327 Bytes=3D4251) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 95 consistent gets 0 physical reads 0 redo size 309 bytes sent via SQL*Net to client 375 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL> select /*+ leading(dept) */ count(ename) from dept, emp where=20 SQL> emp.deptno =3D dept.deptno; Execution Plan ---------------------------------------------------------- SELECT STATEMENT Optimizer=3DALL_ROWS (Cost=3D13 Card=3D1 Bytes=3D22) SORT (AGGREGATE) HASH JOIN (Cost=3D13 Card=3D1562624 Bytes=3D34377728) TABLE ACCESS (FULL) OF 'DEPT' (Cost=3D2 Card=3D327 Bytes=3D4251) TABLE ACCESS (FULL) OF 'EMP' (Cost=3D10 Card=3D14336 = Bytes=3D129024) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 95 consistent gets 0 physical reads 0 redo size 309 bytes sent via SQL*Net to client 375 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed - If I get the rows, the number of LIO for the two join orders is = different: SQL> select /*+ leading(dept) */ * from dept, emp where emp.deptno =3D=20 SQL> dept.deptno 28672 rows selected. Execution Plan ---------------------------------------------------------- SELECT STATEMENT Optimizer=3DALL_ROWS (Cost=3D13 Card=3D1562624 = Bytes=3D104695808) HASH JOIN (Cost=3D13 Card=3D1562624 Bytes=3D104695808) TABLE ACCESS (FULL) OF 'DEPT' (Cost=3D2 Card=3D327 Bytes=3D9810) TABLE ACCESS (FULL) OF 'EMP' (Cost=3D10 Card=3D14336 Bytes=3D530432) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 1990 consistent gets 0 physical reads 0 redo size 957209 bytes sent via SQL*Net to client 21516 bytes received via SQL*Net from client 1913 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 28672 rows processed SQL> select /*+ leading(emp) */ * from dept, emp where emp.deptno =3D=20 SQL> dept.deptno; 28672 rows selected. Execution Plan ---------------------------------------------------------- SELECT STATEMENT Optimizer=3DALL_ROWS (Cost=3D19 Card=3D1562624 = Bytes=3D104695808) HASH JOIN (Cost=3D19 Card=3D1562624 Bytes=3D104695808) TABLE ACCESS (FULL) OF 'EMP' (Cost=3D10 Card=3D14336 Bytes=3D530432) TABLE ACCESS (FULL) OF 'DEPT' (Cost=3D2 Card=3D327 Bytes=3D9810) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 101 consistent gets 0 physical reads 0 redo size 1214632 bytes sent via SQL*Net to client 21516 bytes received via SQL*Net from client 1913 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 28672 rows processed >This isn't about tuning the statement, it's about understanding why the = >optimizer does what it does. I agree on this point... I'm also only interested in understanding the = CBO!=20 My understanding is the following: - Oracle reads the outer table and prepares the hash table in memory - When the first fetch comes the first block of the inner table is read = and the hash table is probed, as soon as the first rows is found, it is = sent back to the client - Then another fetch comes and Oracle has to find out the second row = which could be or not in the same block as the previous... when it is in = the same block sometimes it has to read the block again, thus generating = more LIO on the same block - The same appends for the subsequent fetches... Now, when the array size is bigger, Oracle returns all rows associated = to a single block of the inner table in a single fetch operation, = therefore only a single LIO for each block is needed! If I look at the = number of blocks my test tables have, I can see that Oracle has to = perform at minimum 95 LIO.=20 SQL> select table_name, blocks, num_rows from user_tables where=20 SQL> table_name in ('EMP','DEPT'); TABLE_NAME BLOCKS NUM_ROWS ------------------------------ ---------- ---------- DEPT 4 8 EMP 91 14336 This is exactly the number of LIO for query 1 and 2, and almost the same = for query 4. For query 4 the blocks of the table DEPT are probably = accessed a couple of times. But the outer table, i.e. EMP, only once to = build the hash table. Therefore, if I use a larger array size, also = query 3 will generate almost the same number of LIO. SQL> set arraysize 5000 SQL> select /*+ leading(dept) */ * from dept, emp where emp.deptno =3D=20 SQL> dept.deptno; 28672 rows selected. Execution Plan ---------------------------------------------------------- SELECT STATEMENT Optimizer=3DALL_ROWS (Cost=3D13 Card=3D28672 = Bytes=3D1634304) HASH JOIN (Cost=3D13 Card=3D28672 Bytes=3D1634304) TABLE ACCESS (FULL) OF 'DEPT' (Cost=3D2 Card=3D8 Bytes=3D160) TABLE ACCESS (FULL) OF 'EMP' (Cost=3D10 Card=3D14336 Bytes=3D530432) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 101 consistent gets 0 physical reads 0 redo size 804729 bytes sent via SQL*Net to client 550 bytes received via SQL*Net from client 7 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 28672 rows processed Therefore, in my opinion, the optimizer correctly costs the join, but, = since it has no idea which array size is used, it has no possibility to = correctly cost the retrieval of the rows. Then, it simply minimizes the = memory need for the hash table by choosing the smaller row source. Chris -- //www.freelists.org/webpage/oracle-l