Re: Removing duplicate subtrees from CONNECT-BY query
- From: Stephane Faroult <sfaroult@xxxxxxxxxxxx>
- To: riku.rasanen@xxxxxxxxxxxxxxxx
- Date: Sun, 07 Oct 2007 22:24:16 +0200
Riku,
Great question. Unfortunately, CONNECT merely refers to a particular
way of traversing the rows, and there is, to my knowledge, nothing to
tell Oracle to prune a subdirectory it has already visited - it would
mean remembering all the nodes, ouch. Duplicates are just a natural
thing in trees. We all have ancestors who appear several times in our
family tree. You could of course try to manage it yourself with a
cursor, a PL/SQL table and a stored procedure but I think you mentioned
performance ...
If I understand well your problem, the question is "what are the
descendants of <some thing that identify a row here> (condition #1) that
also verify <something that tells you you have what you want> (condition
#2).
What about reversing the problem? If condition #2 is selective enough,
perhaps it would be simpler to search for condition #2 first, and check
that there is a row among the "ancestors" of the row found that verify
condition #1? I assume implicitly, of course, that you don't want to do
this for millions of rows, because then all things considered a brutal
DISTINCT on the result set may well be the most efficient ...
Short of that, I would question the design ...
HTH
Stéphane Faroult
Riku Räsänen wrote:
Hello,
Certain application has a table with huge hierarchy, and a subtree of
the hierarchy is allowed to exist in several places in the tree. No
loops are allowed though.
The problem is that this hierarchy has to be searched effectively.
Querying a certain top-level hierarchy returns over 600 000 DISTINCT
nodes. With the duplicate subtrees, the query returns almost 6 000 000
nodes, where certain node appears 2000 times in the result.
Of course the requirement is that the search has to be effective and
even this worst case of 6M rows should be handled in reasonable time
(this is an OLTP application). This "subtree allowed to appear several
times" is completely new case for me in the land of CONNECT BY's. Did
RTFM, did "use the Google, Luke" etc, but did not find anything sensible.
So the question is: What is the most efficient way of removing the
duplicates of this resultset? Currently there is no way to identify
duplicate subtrees from the data except that the same ID appears
multiple times. Is there a way to make CONNECT BY -operator to
identify and prune the duplicate subtrees?
--
http://www.freelists.org/webpage/oracle-l
- Follow-Ups:
- Re: Removing duplicate subtrees from CONNECT-BY query
- From: Riku Räsänen
- References:
- Removing duplicate subtrees from CONNECT-BY query
- From: Riku Räsänen
Other related posts:
- » Removing duplicate subtrees from CONNECT-BY query
- » Re: Removing duplicate subtrees from CONNECT-BY query
- » Re: Removing duplicate subtrees from CONNECT-BY query
- » Re: Removing duplicate subtrees from CONNECT-BY query
Hello,Certain application has a table with huge hierarchy, and a subtree of the hierarchy is allowed to exist in several places in the tree. No loops are allowed though.
The problem is that this hierarchy has to be searched effectively. Querying a certain top-level hierarchy returns over 600 000 DISTINCT nodes. With the duplicate subtrees, the query returns almost 6 000 000 nodes, where certain node appears 2000 times in the result.
Of course the requirement is that the search has to be effective and even this worst case of 6M rows should be handled in reasonable time (this is an OLTP application). This "subtree allowed to appear several times" is completely new case for me in the land of CONNECT BY's. Did RTFM, did "use the Google, Luke" etc, but did not find anything sensible.
So the question is: What is the most efficient way of removing the duplicates of this resultset? Currently there is no way to identify duplicate subtrees from the data except that the same ID appears multiple times. Is there a way to make CONNECT BY -operator to identify and prune the duplicate subtrees?
- Re: Removing duplicate subtrees from CONNECT-BY query
- From: Riku Räsänen
- Removing duplicate subtrees from CONNECT-BY query
- From: Riku Räsänen