Re: Hint for self-join connect by

From: "Alberto Dell'Era" <alberto.dellera@xxxxxxxxx>
To: "amit poddar" <amit.poddar@xxxxxxxx>
Date: Tue, 1 May 2007 23:13:30 +0200

Amit,

I'm researching as well, the algorithm is not 100% clear to me,
and I have yet to RTFM in full and look around for "prior art" :)

I have created as well a minitable this afternoon, and I have
an interesting variant to share (9.2.0.6):

insert into 
dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID)
values ('_MERGED_',2,1);
insert into 
dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID)
values ('',3,2);
insert into 
dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID)
values ('',4,3);
insert into 
dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID)
values ('',5,4);
insert into 
dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID)
values ('_MERGED_',6,99);
insert into 
dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID)
values ('',7,6);
insert into 
dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID)
values ('',8,7);
insert into 
dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID)
values ('',9,8);
insert into 
dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID)
values ('',10,9);
insert into 
dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID)
values ('',11,10);

using your DDL:

create index test_index on dch_work_surr_mgmt_ordered (acct_id);
create index test_index2 on dch_work_surr_mgmt_ordered (surrogate_id,
forward_surrogate_id);

exec dbms_stats.gather_table_stats (user,
'dch_work_surr_mgmt_ordered', cascade=>true, method_opt=>'for all
columns size 1', estimate_percent=>100);

explain plan for
SELECT SUBSTR(sys_connect_by_path(surrogate_id, '|'), 2) tree
                       , LEVEL AS lev
                         FROM dch_work_surr_mgmt_ordered
                        START WITH acct_id = '_MERGED_'
                      CONNECT BY PRIOR forward_surrogate_id = surrogate_id;

select * from table (dbms_xplan.display);

---------------------------------------------------------------------------------------------
| Id  | Operation                     |  Name                       |
Rows  | Bytes | Cost  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                             |
  1 |     8 |     2 |
|*  1 |  CONNECT BY WITH FILTERING    |                             |
    |       |       |
|   2 |   NESTED LOOPS                |                             |
    |       |       |
|*  3 |    INDEX RANGE SCAN           | TEST_INDEX                  |
  2 |     6 |     1 |
|   4 |    TABLE ACCESS BY USER ROWID | DCH_WORK_SURR_MGMT_ORDERED  |
    |       |       |
|   5 |   NESTED LOOPS                |                             |
    |       |       |
|   6 |    BUFFER SORT                |                             |
  1 |     8 |       |
|   7 |     CONNECT BY PUMP           |                             |
    |       |       |
|   8 |    TABLE ACCESS BY INDEX ROWID| DCH_WORK_SURR_MGMT_ORDERED  |
  1 |     8 |     2 |
|*  9 |     INDEX RANGE SCAN          | TEST_INDEX2                 |
  1 |       |     1 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  1 - filter("DCH_WORK_SURR_MGMT_ORDERED"."ACCT_ID"='_MERGED_')
  3 - access("DCH_WORK_SURR_MGMT_ORDERED"."ACCT_ID"='_MERGED_')
  9 - access("DCH_WORK_SURR_MGMT_ORDERED"."SURROGATE_ID"=NULL)

This shows the algorithm as I expect it to be:
(a) 3+4 gets the starting rows, feed them to the row sources 5-9.
(b) 7 gets the rows, 6 orders them (probably to minimize the number
   of blocks to read), sends to 8+9 that retrieves the first level of the
   hierarchy
(c) rows retrieved are sent back to (b) until no rows found.

I cannot understand row source 8 (FTS) in your plan, why
it should FTS ... unless (wild shot in the dark) it keeps in "memory"
the rowids only while doing (b)+(c) and then FTS to retrieve the remaining
columns.

Alberto

On 5/1/07, amit poddar <amit.poddar@xxxxxxxx> wrote:

Alberto from your reply it seems that you understand the connect by
runtime algorithm quite well.
Can you please clarify my confusion ?

I have create the table dch_work_surr_mgmt_ordered and created two indexes
test_index (acct_id) and test_index2(surrogate_id, forward_surrogate_id)

My question is:

1. Step 2 and 3 in the explain plan are for getting the rows for the
start with clause (first selection)

which step is the hierarchy visit you mention (probably step 7) ?

amit

SQL> explain plan for
  2  SELECT SUBSTR(sys_connect_by_path(surrogate_id, '|'), 2) tree
                             , LEVEL AS lev
                          FROM dch_work_surr_mgmt_ordered
                         START WITH acct_id = '&ACCT_MERGE'
                       CONNECT BY PRIOR forward_surrogate_id =
surrogate_id  3    4    5    6
  7  /
Enter value for acct_merge: aa
old   5:                          START WITH acct_id = '&ACCT_MERGE'
new   5:                          START WITH acct_id = 'aa'

Explained.

SQL> @?/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3651564746

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                       |
Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                            |
10M|   247M|     1   (0)| 00:00:01 |
|*  1 |  CONNECT BY WITH FILTERING   |
|       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| DCH_WORK_SURR_MGMT_ORDERED |
10M|   495M|     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | TEST_INDEX                 |
4000K|       |     1   (0)| 00:00:01 |
|   4 |   NESTED LOOPS               |
|       |       |            |          |
|   5 |    BUFFER SORT               |
|       |       |            |          |
|   6 |     CONNECT BY PUMP          |
|       |       |            |          |
|*  7 |    INDEX RANGE SCAN          | TEST_INDEX2                |
10M|   247M|     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS FULL          | DCH_WORK_SURR_MGMT_ORDERED |
1000M|    36G| 93987 (100)| 00:05:49 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("SURROGATE_ID"=PRIOR "FORWARD_SURROGATE_ID")
   3 - access("ACCT_ID"=TO_NUMBER('aa'))
   7 - access("SURROGATE_ID"=PRIOR "FORWARD_SURROGATE_ID")

22 rows selected.



Alberto Dell'Era wrote:
> That doesn't seem to tally with any of your plans - neither
> has both id 5 and 6 with an asterisk. May you please
> check it out and repost both plans with the predicate infos ?
>
> It would also interesting to know num_distinct, num_null and density
> (from dba_tab_columns) for the columns
> acct_id, forward_surrogate_id and surrogate_id.
>
> 450k out of 70M - that's 0.6%. Maybe an index on
> acct_id, forward_surrogate_id
> may help the first selection, and another on
> surrogate_id, forward_surrogate_id
> *might* help the hierarchy visit (or at least turn the FTS
> into an index FFS). Why not giving it a shot.



--
Alberto Dell'Era
"dulce bellum inexpertis"
--
//www.freelists.org/webpage/oracle-l

References:
- Re: Hint for self-join connect by
  - From: Alberto Dell'Era
- RE: Hint for self-join connect by
  - From: Herring Dave - dherri
- Re: Hint for self-join connect by
  - From: Alberto Dell'Era
- Re: Hint for self-join connect by
  - From: amit poddar

Re: Hint for self-join connect by

Other related posts: