So far I have done .... CASE WHEN :new.domain_name LIKE '%.imageshack.us' THEN 'imageshack.us' -- we need to collapse these WHEN :new.domain_name LIKE '%.adtexh.de' THEN 'adtech.de' -- we need to collapse these WHEN domain_name LIKE '%.echo.cx' THEN 'echo.cx' -- we need to collapse these WHEN domain_name LIKE '%.exs.cx' THEN 'exs.cx' -- we need to collapse these WHEN domain_name LIKE '%.bigoo.ws' THEN 'bigoo.ws' -- we need to collapse these -- check if the domain name is just numeric WHEN regexp_like(trim(domain_name),'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$') THEN domain_name -- if it is .com/.name/.org etc get last two pieces WHEN length(szlastpiece) > 2 THEN substrdomain_name,pos1+1) -- if last piece is a country tld, then get the last 3 pieces ELSE substr(domain_name,pos2+1) END szLast piece is the piece after the last '.' in the domain name and pos1, pos2 refer to positions of '.' (second from last and 3rd from last respectively). Any gotcha's, missed rules are welcome. The results so far are pretty good ... a sample query on 4m+ rows so far shows reliable output. there will always be caveats, but I am happy with 99% "hit ratio" ... though any improvements are always welcome. uh oh ... I said the H word ... I am marked now ... Raj On 4/10/06, Gus Spier <gspier@xxxxxxxxxxx> wrote: > Raj, > It looks to me like you're going to have to do some rule based ETL. Start > by parsing your URIs on the dots into varrays and then examinining data .... > if seg. first == 'www' and seg.last == "com" then harvest seg.length-1 ... > > if seg.last='uk' and seg.length-1='co' then harvest seg.length-2 ... > > et cetera ad endless nauseam. > > But I don't think you can build a script that will reliably trundle out > there and correctly get what you want first try. > > Good luck > Gus -- ---------------------------------------------- Got RAC? -- //www.freelists.org/webpage/oracle-l