--- On Tue, 3/11/09, Siva Subramanian <elpostssv@xxxxxxxxxxxxxx> wrote:
I am into my 1st BSc Statistics and into a project for an
MNC. I am
trying my first hand at linux and was recently introduced
to gawk.
I am having trouble processing a statistical dump that is
provided to us in the
form of a csv file. The format of the file is given below
C_ID, ID_NO, stat1, vol2, amount3,...
There are around 40 fields and the csv file has close to a
million records
The C_ID, is the customer id and is only way to identify
the customer.
The ID_NO field is the premium plan the customer is in
stat1, vol2, amounts are all numbers
I can write a query that uses a few if statements and gets
the details if the
ID_NO is sequential.
1.
However,
there are over 1000 different ID_NO (legacy) and we need to
add the stat1, vol2,
amount3 for each of the ID_NOs separately (I need to group
the ID_NO, and sum
of the fields)
2.
If
I have the C_ID of the customers in a separate csv, is it
possible to compare
the C_ID with that of the C_ID in the dump and determine
the sum of stat1,
vol2, amount3? (sum of the fields only
for a set of customers and not for the ID_NOs in whole)
We are extensively using
MS-Access for this and it has been a pain. A friend
suggested that I try my
hand at using tools in linux.
I am not sure if this is the
right mailing list for this.
Will really appreciate any help
in this regard.