Hi All
I have a MySQL table containing around 2,000,000 (20,00,000) records.
The table structure is: id (primary key, auto-generated), review (text),
hash (int).
There is a column called "hash", which holds a hash value generated
programmatically.
I am pretty sure that there are duplicate records in my table; that's why I
generated the hash.
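The post doesn't say how the hash was generated. As a purely illustrative sketch, assuming a CRC-32 of the review text (the helper name `review_hash` is hypothetical), a value that fits the signed `int` column could be produced in Python like this:

```python
import zlib


def review_hash(review: str) -> int:
    """Hypothetical helper: 32-bit CRC of a review, mapped to a signed MySQL INT.

    Assumption: the original post does not say which hash function was used.
    """
    unsigned = zlib.crc32(review.encode("utf-8"))  # range 0 .. 2**32 - 1
    # Shift into the signed range -2**31 .. 2**31 - 1 accepted by MySQL's INT.
    return unsigned - 2**32 if unsigned >= 2**31 else unsigned


if __name__ == "__main__":
    first = review_hash("Great product, works as described.")
    again = review_hash("Great product, works as described.")
    print(first == again)  # identical text always gives the identical hash
```

Because an `int` holds at most 32 bits, two different reviews can share a hash. For a uniform 32-bit hash over about two million rows, a few hundred chance collisions would be expected (roughly n²/2³³), so comparing the review text as well is safer before deleting.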
Now I would like to dedupe the table using the hash.
This is the query I used for that purpose:
Positive : table name
id : auto generated id
hash : hash value
review : reviews
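delete Positive
from Positive,
(
    -- lowest id for each hash value that occurs more than once
    select MIN(id) minIdent, hash s
    from Positive m
    group by hash
    having count(1) > 1
) as derived
where Positive.hash = derived.s
and Positive.id > derived.minIdent; -- every later copy of that hash is deleted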
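To illustrate what the query keeps and what it deletes, here is a small sketch using Python's built-in sqlite3 module on an in-memory copy of the table. SQLite does not support MySQL's multi-table DELETE syntax, so the same rule (keep the lowest id for each hash, delete every later row with that hash) is written with NOT IN. This is an equivalent reformulation for the demo, not the MySQL query itself, and `dedupe_by_hash` is a hypothetical helper.

```python
import sqlite3


def dedupe_by_hash(rows):
    """Load (review, hash) rows, delete duplicates by hash, return the survivors.

    Keeps the row with the lowest id for each hash, as the MySQL query does.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Positive ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " review TEXT,"
        " hash INTEGER)"
    )
    conn.executemany("INSERT INTO Positive (review, hash) VALUES (?, ?)", rows)
    # "id > MIN(id) for the same hash" expressed as
    # "id is not the minimum id of its hash group".
    conn.execute(
        "DELETE FROM Positive"
        " WHERE id NOT IN (SELECT MIN(id) FROM Positive GROUP BY hash)"
    )
    survivors = conn.execute(
        "SELECT id, review, hash FROM Positive ORDER BY id"
    ).fetchall()
    conn.close()
    return survivors


if __name__ == "__main__":
    sample = [("good", 11), ("bad", 22), ("good", 11), ("ok", 33), ("bad", 22)]
    for row in dedupe_by_hash(sample):
        print(row)
    # prints (1, 'good', 11), (2, 'bad', 22), (4, 'ok', 33) on separate lines
```

Rows with a unique hash are their own group minimum, so they survive, which matches the original query skipping them via `having count(1) > 1`.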
The above dedupe query works. I checked it on a table containing 10,000
records, and all the duplicate hash values were removed.
My problem is that when I run the same query on the large table
(2,000,000 records), it takes far too long: in a test run, the query ran
for 24 hours without completing.
Is there anything wrong with the query? I am not much of a DB expert.
--
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog
*ILUGCBE*
http://ilugcbe.techstud.org