Re: de-dup process

  • From: A Ebadi <ebadi01@xxxxxxxxx>
  • To: Oracle-L@xxxxxxxxxxxxx, Tony van Lingen <tony.vanlingen@xxxxxxxxxxxxxx>
  • Date: Mon, 22 Jan 2007 09:51:46 -0800 (PST)

Thanks for everyone's suggestions a few weeks ago regarding this de-dup issue.  
I just wanted to let everyone know that the solution we decided on was to go 
with hourly partitioning instead of daily, which reduced the subset of data 
that has to be de-duped at one time by a factor of 24.  The app had to be 
modified slightly to accomplish this.  Now we are able to remove the dups 
(i.e. CTAS out the data we are keeping).  We are also able to run many of 
these de-dups in parallel.
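
For anyone following along, here is a rough sketch of the "CTAS out the data 
we are keeping" idea, one hour at a time. It is not our actual code: it uses 
Python's built-in sqlite3 instead of Oracle, and the names events, load_hour, 
dup_key, payload and dedup_hour are made up for illustration. The hour column 
stands in for the hourly partition, and the delete/insert swap stands in for 
whatever partition-level swap you would use on Oracle. It also assumes dups 
only need to be removed within a single hour.

```python
import sqlite3


def dedup_hour(conn, hour):
    """Rebuild one hour's slice, keeping the first row seen per dup_key."""
    hour = int(hour)  # embedded in the SQL text below, so force an integer
    keep = f"keep_h{hour:02d}"  # hypothetical name for the "keep" table

    # CTAS: copy out only the rows we are keeping for this hour.
    conn.execute(f"DROP TABLE IF EXISTS {keep}")
    conn.execute(
        f"CREATE TABLE {keep} AS "
        f"SELECT load_hour, dup_key, payload FROM events "
        f"WHERE rowid IN (SELECT MIN(rowid) FROM events "
        f"                WHERE load_hour = {hour} GROUP BY dup_key)"
    )

    # Swap the de-duped rows back in place of the original slice.
    conn.execute(f"DELETE FROM events WHERE load_hour = {hour}")
    conn.execute(f"INSERT INTO events SELECT * FROM {keep}")
    conn.execute(f"DROP TABLE {keep}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (load_hour INTEGER, dup_key TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (0, "A", "first"),
        (0, "A", "second"),
        (0, "A", "third"),  # more than 2 rows for one key
        (0, "B", "only"),
        (1, "A", "other hour"),
        (1, "C", "x"),
        (1, "C", "y"),
    ],
)

# Each hour is independent, which is why the real jobs can run in parallel.
for hour in (0, 1):
    dedup_hour(conn, hour)

remaining = conn.execute(
    "SELECT load_hour, dup_key, payload FROM events "
    "ORDER BY load_hour, dup_key"
).fetchall()
print(remaining)
```

Note that key "A" survives once in each hour, because the sketch only 
de-dups within an hour, not across hours.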
   
  Again, I understand this doesn't scale to infinity, but it will get us by 
for 12-18 months based on the volume estimates.
   
  Thanks,
  Abdul

A Ebadi <ebadi01@xxxxxxxxx> wrote:
    Putting a unique key constraint on it and loading it direct-path with dups 
will insert the dups into the table and make the index unusable, so I don't 
see how this could help us.  
  I don't want the dups inserted.
   
  Thanks.

Tony van Lingen <tony.vanlingen@xxxxxxxxxxxxxx> wrote:
  It may even be easier... You say "We are doing direct path load so no unique 
key indexes can be put on the table to take care of the duplicates". The 
utility guide (10gR2) however explicitly names unique constraints as a 
constraint that can be enforced during direct path loads: 
    Integrity Constraints: All integrity constraints are enforced during 
    direct path loads, although not necessarily at the same time. NOT NULL 
    constraints are enforced during the load. Records that fail these 
    constraints are rejected.
    UNIQUE constraints are enforced both during and after the load. A record 
    that violates a UNIQUE constraint is not rejected (the record is not 
    available in memory when the constraint violation is detected).
    (Utilities, B14215-01, chapter 11)

Did you actually try this?

  Cheers,
Tony



A Ebadi wrote:
    We have a huge table (> 160 million rows) which has about 20 
million duplicate rows that we need to delete.  What is the most efficient way 
to do this, as we will need to do it daily?
  A single varchar2(30) column is used to identify duplicates.  A given value 
could appear in more than 2 rows.
   
  We are doing direct path load so no unique key indexes can be put on the 
table to take care of the duplicates.
   
  Platform: Oracle 10g RAC (2 node) on Solaris 10.
   
  Thanks!
  
  