Removing duplicate subtrees from CONNECT-BY query

  • From: Riku Räsänen <riku.rasanen@xxxxxxxxxxxxxxxx>
  • To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Sun, 07 Oct 2007 21:50:08 +0300

Hello,

Certain application has a table with huge hierarchy, and a subtree of the hierarchy is allowed to exist in several places in the tree. No loops are allowed though.

The problem is that this hierarchy has to be searched effectively. Querying a certain top-level hierarchy returns over 600 000 DISTINCT nodes. With the duplicate subtrees, the query returns almost 6 000 000 nodes, where certain node appears 2000 times in the result.

Of course the requirement is that the search has to be effective and even this worst case of 6M rows should be handled in reasonable time (this is an OLTP application). This "subtree allowed to appear several times" is completely new case for me in the land of CONNECT BY's. Did RTFM, did "use the Google, Luke" etc, but did not find anything sensible.

So the question is: What is the most efficient way of removing the duplicates of this resultset? Currently there is no way to identify duplicate subtrees from the data except that the same ID appears multiple times. Is there a way to make CONNECT BY -operator to identify and prune the duplicate subtrees?

Example setup:

CREATE TABLE tree_hierarchy (
  id        NUMBER (20)
 ,parent_id NUMBER (20)
);

INSERT INTO tree_hierarchy (id, parent_id) VALUES (1, NULL);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (2, 1);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (4, 2);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (5, 2);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (6, 2);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (9, 4);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (10, 4);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (3, 1);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (7, 3);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (8, 3);
-- Here is the duplicate:
INSERT INTO tree_hierarchy (id, parent_id) VALUES (2, 8);

COMMIT;

COL id FOR A15


SELECT LPAD (' ', (LEVEL - 1) * 2) || TO_CHAR (id, '999') id
     , parent_id
  FROM tree_hierarchy
START WITH id = 1
CONNECT BY PRIOR id = parent_id
;


ID               PARENT_ID
--------------- ----------
   1
     2                   1
       4                 2
         9               4
        10               4
       5                 2
       6                 2
     3                   1
       7                 3
       8                 3
         2               8              -- The duplicate subtree starts here
           4             2
             9           4
            10           4
           5             2
           6             2




--
Riku
--
//www.freelists.org/webpage/oracle-l


Other related posts: