[Ilugc] extract sentences from a xml file in bash
- From: suraj@xxxxxxxxxxxxx (Suraj Kumar)
- Date: Wed, 7 Mar 2012 20:33:51 +0530
On Wed, Mar 7, 2012 at 1:53 PM, Shrinivasan T <tshrinivasan at gmail.com> wrote:
The same block repeats multiple times in the file, but the fields within
each record are in a different order.
Is there a fixed set of 'fields' per record? If yes, the solution could be
a lot simpler and possibly done using awk or a combination of shell tools.
If not, try this Perl "one-liner":
sample input file:
$ cat file
name=ravi
phone=101
email=a at b.c
phone=011
email=c at b.a
name=vira
email=a at a.com
name=crap
email=s at s.com
name=supercrap
The solution (which I've formatted here for ease of readability):
$ cat file | perl -MYAML -n -e '
BEGIN {
@db = ();
$rec = {};
%keycounter = ();
$next_rec_at = 1;
}
chomp;
($k, $v) = $_ =~ m/^(.*?)\s*=\s*(.*)/gms;
$kcount = ( defined $keycounter{$k} ? $keycounter{$k} : 0 );
if ($kcount >= $next_rec_at) {
push(@db, $rec);
$rec = {};
$next_rec_at++; # the next record starts when a key is seen one more time
}
$rec->{$k} = $v;
$keycounter{$k}++;
END {
push(@db, $rec);
print YAML::Dump(\@db); # ... or do your report generation here
}'
output:
- email: a at b.c
name: ravi
phone: 101
- email: c at b.a
name: vira
phone: 011
- email: a at a.com
name: crap
- email: s at s.com
name: supercrap
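For the simpler case mentioned at the top (a fixed set of fields per record), here is a rough awk sketch. It is only an illustration under assumptions that the sample file above does not satisfy: every record has exactly the three fields name, phone and email (in any order), there are no blank lines, and no value contains "=". The function name fixed_records and the comma-separated output format are made up for this example.

```shell
# Sketch only: assumes exactly three fields (name, phone, email) per record,
# one key=value pair per line, no blank lines, and no "=" inside values.
# The function name fixed_records is hypothetical.
fixed_records() {
    awk -F'[ \t]*=[ \t]*' -v nfields=3 '
        { rec[$1] = $2 }                 # store the value under its key
        NR % nfields == 0 {              # every nfields-th line ends a record
            print rec["name"] "," rec["phone"] "," rec["email"]
            split("", rec)               # portable way to empty the array
        }
    '
}

# Example run on a cut-down input where both records are complete.
fixed_records <<'EOF'
name=ravi
phone=101
email=a at b.c
phone=011
email=c at b.a
name=vira
EOF
```

This counts lines instead of watching for repeated keys, so it breaks as soon as a record is missing a field; for input like the sample file above, the Perl approach is the safer one.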
cheers,
-Suraj
--
Career Gear - Industry Driven Talent Factory
http://careergear.in/