[program-l] Re: Scripting or ot: Consolidating a large number of CSV files and Excel sheets

  • From: "Pranav Lal" <pranav.lal@xxxxxxxxx>
  • To: <program-l@xxxxxxxxxxxxx>
  • Date: Sat, 15 Jun 2019 13:35:40 +0530

Hi Nick and all,

Indeed, Python seems to be the way to go. The pandas module has a lot of
what I need.

This is my script so far.
import glob
import os

import pandas as pd

path = "my_dir_full_path"
allFiles = glob.glob(os.path.join(path, "*.csv"))

# Read each CSV into its own DataFrame, using the first row as column names.
frames = [pd.read_csv(file_, index_col=None, header=0) for file_ in allFiles]

# pd.concat keeps the column names. Going through df.as_matrix() (removed in
# pandas 1.0) and np.vstack discards them, so subset='content' would fail.
big_frame = pd.concat(frames, ignore_index=True)
big_frame.drop_duplicates(subset='content', inplace=True)
big_frame.to_csv('out.csv', index=False)
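
For what it is worth, the same consolidation can be wrapped in a function so it is easier to reuse and test. This is only a minimal sketch, assuming every CSV has a header row with a 'content' column; the function name consolidate_csvs and the extra source_file column are hypothetical and are not part of the script above.

```python
import glob
import os

import pandas as pd


def consolidate_csvs(directory, key="content"):
    """Combine every *.csv in `directory` and drop rows that repeat `key`."""
    # Sort the paths so the result does not depend on filesystem ordering.
    paths = sorted(glob.glob(os.path.join(directory, "*.csv")))
    if not paths:
        # pd.concat raises ValueError on an empty list, so return early.
        return pd.DataFrame()

    frames = []
    for p in paths:
        df = pd.read_csv(p, header=0)
        # Hypothetical provenance column: records which file a row came from.
        df["source_file"] = os.path.basename(p)
        frames.append(df)

    combined = pd.concat(frames, ignore_index=True)
    # Keep the first occurrence of each `key` value and renumber the rows.
    return combined.drop_duplicates(subset=key).reset_index(drop=True)
```

Calling consolidate_csvs("my_dir_full_path").to_csv("out.csv", index=False) would then write the combined file. Keeping the data in DataFrames throughout means the column names survive, which is what drop_duplicates needs.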


Pranav
