Hi Raj, If the exception list grows I think you could improve your procedure putting the domain exceptions in a table and looping on that table to find the domain you need. the domain table could have this structure: create table DOMAINS ( LIKE_PATTERN varchar2(102), DOMAIN_NAME varchar2(100), DOMAIN_ORDER number ); with a function like create or replace function getdomain(full_domain in varchar2) return varchar2 is begin for domain_match in (select * from DOMAINS order by DOMAIN_ORDER) loop if full_domain LIKE domain_match.like_pattern then return domain_match.domain_name; end if; end loop; return null; end getdomain; you could manage your exception and even try the more probable ones first (which could lead to less resources usage).This is not production code but I think it may give you an idea of my suggestion. Hope it helps, Giovanni On 4/10/06, rjamya <rjamya@xxxxxxxxx> wrote: > So far I have done .... > > CASE > WHEN :new.domain_name LIKE '%.imageshack.us' THEN 'imageshack.us' > -- we need to collapse these > WHEN :new.domain_name LIKE '%.adtexh.de' THEN 'adtech.de' -- > we need to collapse these > WHEN domain_name LIKE '%.echo.cx' THEN 'echo.cx' -- we need to > collapse these > WHEN domain_name LIKE '%.exs.cx' THEN 'exs.cx' -- we need to collapse > these > WHEN domain_name LIKE '%.bigoo.ws' THEN 'bigoo.ws' -- we need to > collapse these > -- check if the domain name is just numeric > WHEN > regexp_like(trim(domain_name),'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$') > THEN domain_name > -- if it is .com/.name/.org etc get last two pieces > WHEN length(szlastpiece) > 2 THEN substrdomain_name,pos1+1) > -- if last piece is a country tld, then get the last 3 pieces > ELSE substr(domain_name,pos2+1) > END > > szLast piece is the piece after the last '.' in the domain name and > pos1, pos2 refer to positions of '.' (second from last and 3rd from > last respectively). > > Any gotcha's, missed rules are welcome. The results so far are pretty > good ... a sample query on 4m+ rows so far shows reliable output. > there will always be caveats, but I am happy with 99% "hit ratio" ... > though any improvements are always welcome. > > uh oh ... I said the H word ... I am marked now ... > Raj > > On 4/10/06, Gus Spier <gspier@xxxxxxxxxxx> wrote: > > Raj, > > It looks to me like you're going to have to do some rule based ETL. Start > > by parsing your URIs on the dots into varrays and then examinining data .... > > if seg. first == 'www' and seg.last == "com" then harvest seg.length-1 ... > > > > if seg.last='uk' and seg.length-1='co' then harvest seg.length-2 ... > > > > et cetera ad endless nauseam. > > > > But I don't think you can build a script that will reliably trundle out > > there and correctly get what you want first try. > > > > Good luck > > Gus > -- > ---------------------------------------------- > Got RAC? > -- > //www.freelists.org/webpage/oracle-l > > > -- -------------------------------------------------------------------- Another free oracle resource profiler http://sourceforge.net/projects/oraresprof/ Now version 0.9 -- //www.freelists.org/webpage/oracle-l