Re: de-dup process

  • From: A Ebadi <ebadi01@xxxxxxxxx>
  • To: mgogala@xxxxxxxxxxx, tboss@xxxxxxxxxxxxxxxxxx
  • Date: Wed, 13 Dec 2006 14:38:01 -0800 (PST)

The biggest problem we've faced in coming up with a solution is that none of the 
solutions so far scale.  In other words, things are fine with a 20 million row 
table that has 2-3 million duplicates - the delete runs in 10-15 minutes.  
However, against a 100+ million row table it runs for hours!
   
  We've even had another tool (Informatica) select out the ROWIDs of the 
duplicates into a separate table, and then used a PL/SQL cursor to delete 
those rows from the large table, but this doesn't scale either!
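   
  For illustration only, a rough sketch of that staged delete, rewritten with 
BULK COLLECT/FORALL instead of single-row fetches, might look something like 
this - the staging table name DUP_ROWIDS, its ROWID column RID, the target 
table BIG_TABLE and the 10,000 batch size are all placeholders, not our actual 
names:

declare
  -- staging table of duplicate ROWIDs extracted by Informatica
  -- (table and column names are placeholders)
  cursor c is select rid from dup_rowids;

  type rid_tab is table of rowid index by pls_integer;
  l_rids rid_tab;
begin
  open c;
  loop
    fetch c bulk collect into l_rids limit 10000;  -- arbitrary batch size
    exit when l_rids.count = 0;

    -- delete one batch of duplicates in a single bulk operation
    forall i in 1 .. l_rids.count
      delete from big_table
       where rowid = l_rids(i);

    commit;  -- commit per batch to keep undo manageable
  end loop;
  close c;
end;
/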
   
  I'm currently looking at some of the suggestions I got from this list and 
seeing if any of them will work for us.
  
Thanks to all who replied, and I'll let you know how it goes!
  

Mladen Gogala <mgogala@xxxxxxxxxxx> wrote:
  
On 12/12/2006 08:42:38 PM, tboss@xxxxxxxxxxxxxxxxxx wrote:
> From asktom, the best way I've found is to use Tom's little code snippet 
> below:
> 
> delete from your_huge_table
> where rowid in
> (select rid
> from
> (select rowid rid,
> row_number() over
> (partition by varchar_that_defines_duplicates
> order by rowid ) rn
> from your_huge_table
> )
> where rn <> 1
> )
> /



Good luck with that if the cardinality is 60M+ rows. I would also add a 
condition like WHERE ROWID IN (SELECT ROW_ID FROM EXCEPTIONS) to your query. 
You only need to populate the exceptions table with the duplicates, and if 
those duplicates are a small percentage of the total number of records, your 
task will be done two orders of magnitude faster than without the exceptions 
table.
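
For illustration only (the constraint name is a placeholder), the exceptions 
table can be populated with the standard EXCEPTIONS INTO mechanism from 
utlexcpt.sql; the ALTER fails because of the duplicates, but it leaves the 
offending rowids behind, and Tom's delete can then be restricted to that 
small set:

-- one-time: create the EXCEPTIONS table (ROW_ID, OWNER, TABLE_NAME, CONSTRAINT)
@?/rdbms/admin/utlexcpt.sql

-- fails with ORA-02299 because duplicates exist, but the violating
-- rowids are still written to EXCEPTIONS
alter table your_huge_table
  add constraint your_huge_table_uq
  unique (varchar_that_defines_duplicates)
  exceptions into exceptions;

-- Tom's delete, limited to the offending rowids
delete from your_huge_table
 where rowid in
       (select rid
          from (select rowid rid,
                       row_number() over
                         (partition by varchar_that_defines_duplicates
                          order by rowid) rn
                  from your_huge_table
                 where rowid in (select row_id from exceptions))
         where rn <> 1);
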
-- 
Mladen Gogala
http://www.mladen-gogala.com

--
//www.freelists.org/webpage/oracle-l




        