Hello,

I'd like to announce the release of version 0.1 of ParBASH. With ParBASH you can write bash scripts that are automatically translated into Hadoop Streaming jobs.

Here is an example script that finds the top 10 referenced domains on Wikipedia pages mentioning Barack Obama, using Amazon EC2:

wiki.sh:

cat hdfs:/wikipedia-out/* | grep Obama | \
  perl -ne 'while (/<link type="external" href="([^"]+)">/g) { print "$1\n"; }' | \
  perl -ne 'if (/http:\/\/([^\/]+)(\/|$)/) { print "$1\n"; }' | \
  perl -ne 'if (/([^\.]\.)+([^\.]+\.[a-zA-Z]{2,3}\.[^\.]+)$/) { print "$2\n"; }
            elsif (/([^\.]+\.[a-zA-Z]{2,3}\.[^\.]+)$/) { print "$1\n"; }
            elsif (/([^\.]\.)*([^\.]+\.[^\.]+)$/) { print "$2\n"; }' | \
  sort | uniq -c > hdfs:/out

The how and why of wiki.sh and ParBASH: http://cloud-dev.blogspot.com/2009/06/introduction-to-parbash.html

Source code and more examples: http://code.google.com/p/parbash

If anyone is interested in trying it out, I can help you get started.

Thanks,
Milenko