Algorithm or ideas wanted for creative text parsing

  • From: rjamya <rjamya@xxxxxxxxx>
  • To: "Oracle Discussion List" <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 10 Apr 2006 12:51:54 -0400

Basically I am looking to isolate just the (distinct) domain name from
fully qualified domain names that you'd normally see in web-surfing.

I am working on a couple of techniques, but it gets complicated since
TLDs differ in format and there is only so much you can do with
substr().

sample data ...

a836.v8519e.c8519.g.vm.akamaistream.net
a705.l1923962123.c19239.n.lm.akamaistream.net
db.c7.bf.a0.top.list.ru
a1657.l1923962104.c19239.n.lm.akamaistream.net
a1181.v21080b.c21080.g.vm.akamaistream.net
dl1.games.vip.scd.yahoo.com
lcp.mud.us.music.yahoo.com
www.celhs.osceola.k12.fl.us
www.celhs.osceola.k12.fl.us
www.celhs.osceola.k12.fl.us
w.s0.gc.sj.ipixmedia.com
w.s0.gc.sj.ipixmedia.com
v.s0.gc.sj.ipixmedia.com
us.1.p6.webhosting.yahoo.com
p1.music.vip.sc5.yahoo.com
lib1.store.vip.sc5.yahoo.com
www.twingroves.district96.k12.il.us
www.twingroves.district96.k12.il.us
www.the-simpsons.hpg.ig.com.br
www.schools.pinellas.k12.fl.us
www.rails4days.pwp.blueyonder.co.uk
www.rails4days.pwp.blueyonder.co.uk
www.garrp.dhr.state.ga.us
www.celhs.osceola.k12.fl.us
www.williamrobertson.pwp.blueyonder.co.uk
www.williamrobertson.pwp.blueyonder.co.uk
lcp.mud.us.music.yahoo.com
c.s0.gc.sj.ipixmedia.com
c.s0.gc.sj.ipixmedia.com
ax.phobos.apple.com
ax.phobos.apple.com
0982660.1206.feed.yellowpagecity.com
0982660.1207.feed.yellowpagecity.com

and by some magic the output should be ....

akamaistream.net
apple.com
yahoo.com
fl.us
ipixmedia.com
il.us
ig.com.br
blueyonder.co.uk
ga.us
yellowpagecity.com

Any ideas or thoughts? I'd prefer to do this in SQL if possible;
otherwise PL/SQL. The data is already in a 10.1.0.4 database.
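
To make the rule concrete, here is a rough sketch of one possible approach, not a tested solution. It assumes the names sit in a hypothetical table web_hosts with a column fqdn, and that the list of generic second-level labels (com, co, org, ...) is only illustrative and far from complete. It relies on REGEXP_SUBSTR, which is available as of 10g.

```sql
-- Sketch only: web_hosts and fqdn are hypothetical names, and the
-- second-level label list (com, co, org, ...) is an assumption.
SELECT DISTINCT
       COALESCE(
         -- Rule 1: keep three labels when the name ends in a generic
         -- second-level label followed by a two-letter country code,
         -- e.g. ig.com.br or blueyonder.co.uk
         REGEXP_SUBSTR(LOWER(fqdn),
                       '[^.]+\.(com|co|org|net|gov|edu|ac)\.[a-z][a-z]$'),
         -- Rule 2: otherwise keep only the last two labels,
         -- e.g. yahoo.com or fl.us
         REGEXP_SUBSTR(LOWER(fqdn), '[^.]+\.[^.]+$')
       ) AS domain_name
FROM   web_hosts;
```

REGEXP_SUBSTR returns NULL when the pattern does not match, so COALESCE falls through to the two-label rule. Against the sample data this should give fl.us for the k12.fl.us hosts, since k12 is not in the label list, and it would also return list.ru for the list.ru host.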

Thanks in advance
Raj
----------------------------------------------
Got RAC?