Re: Removing duplicate subtrees from CONNECT-BY query

Riku,

Great question. Unfortunately, CONNECT merely refers to a particular way of traversing the rows, and there is, to my knowledge, nothing to tell Oracle to prune a subdirectory it has already visited - it would mean remembering all the nodes, ouch. Duplicates are just a natural thing in trees. We all have ancestors who appear several times in our family tree. You could of course try to manage it yourself with a cursor, a PL/SQL table and a stored procedure but I think you mentioned performance ...

If I understand well your problem, the question is "what are the descendants of <some thing that identify a row here> (condition #1) that also verify <something that tells you you have what you want> (condition #2). What about reversing the problem? If condition #2 is selective enough, perhaps it would be simpler to search for condition #2 first, and check that there is a row among the "ancestors" of the row found that verify condition #1? I assume implicitly, of course, that you don't want to do this for millions of rows, because then all things considered a brutal DISTINCT on the result set may well be the most efficient ...

Short of that, I would question the design ...

HTH

Stéphane Faroult

Riku Räsänen wrote:
Hello,

Certain application has a table with huge hierarchy, and a subtree of the hierarchy is allowed to exist in several places in the tree. No loops are allowed though.

The problem is that this hierarchy has to be searched effectively. Querying a certain top-level hierarchy returns over 600 000 DISTINCT nodes. With the duplicate subtrees, the query returns almost 6 000 000 nodes, where certain node appears 2000 times in the result.

Of course the requirement is that the search has to be effective and even this worst case of 6M rows should be handled in reasonable time (this is an OLTP application). This "subtree allowed to appear several times" is completely new case for me in the land of CONNECT BY's. Did RTFM, did "use the Google, Luke" etc, but did not find anything sensible.

So the question is: What is the most efficient way of removing the duplicates of this resultset? Currently there is no way to identify duplicate subtrees from the data except that the same ID appears multiple times. Is there a way to make CONNECT BY -operator to identify and prune the duplicate subtrees?


--
http://www.freelists.org/webpage/oracle-l


Other related posts: