In part 1 the file is opened and read line by line. Note that there are commented-out print lines that can be uncommented to trace what is happening.

Once the file is open, each line is processed in turn. The string method split() breaks the line into a list of words, splitting wherever there is whitespace (spaces, tabs, newlines).

wordMap is a data structure called a dictionary. A dictionary is somewhat like a list, except that its entries are accessed by a key (such as a word in this example) rather than by a numeric index. So if our text file contains the word 'the' 5 times, wordMap['the'] gives 5. I use a dictionary with the words of the text file as the keys to count how many times each word occurs.

A second for loop processes each word in the list. Words not already in wordMap are added with a count of 1, and the counts of existing words are incremented. The get method looks up the count for its first argument; if that key is not in wordMap, it returns the second argument instead. So the statement:

    wordMap[word] = wordMap.get(word, 0) + 1

has the same result as:

    if word not in wordMap:
        wordMap[word] = 0
    wordMap[word] = wordMap[word] + 1

When the whole file has been read, the words and their counts are printed out. They are not sorted in any way.

The if __name__ == '__main__': test at the end of the file is True when the script is run directly and False when it is imported into another script. That makes this section a good place to put testing code.

To try it, make up a file with some text in it and either name it words.txt or change the file name in the code to match. Then run the script.

This version still needs some work: capitalized words are counted separately from their lowercase forms ('The' and 'the'), and punctuation attached to a word ('end.' and 'end') splits its count. We will look at that in the next version.
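To check the two claims above, that split() breaks on any run of whitespace and that the get idiom gives the same counts as the explicit if-test, here is a small sketch you can run on its own. It assumes Python 3 and uses a made-up sample string instead of a file; the names sample, counts_get and counts_if are just for illustration.

```python
# Sample text made up for illustration (no file needed).
sample = "the cat saw the dog and the bird"

# split() with no argument splits on any run of spaces, tabs or newlines.
print("the  cat\tsat\n".split())   # ['the', 'cat', 'sat']

# Idiom 1: dict.get returns 0 for words not yet in the dictionary.
counts_get = {}
for word in sample.split():
    counts_get[word] = counts_get.get(word, 0) + 1

# Idiom 2: explicit membership test, then increment.
counts_if = {}
for word in sample.split():
    if word not in counts_if:
        counts_if[word] = 0
    counts_if[word] = counts_if[word] + 1

print(counts_get == counts_if)     # True: both idioms give the same counts
print(counts_get['the'])           # 3
```

The get version is shorter and only looks the word up once before assigning, which is why the script uses it.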
# wordCount1.py count the words in a file (no word cleanup)
"""
This script shows how to:
- open a text file
- read file by line
- split line into words
- count words
- return dictionary of words->count
"""


def counter(file):
    # Open file and count words/frequency in file
    # create an empty dictionary for words and their counts
    # (created inside the function so each call starts from zero)
    wordMap = {}
    # open file for reading
    inFile = open(file, 'r')
    # read file by lines
    for line in inFile:
        #print('line', line.strip())
        # split lines into words
        words = line.split()
        #print('words', words)
        # put word into dictionary incrementing count
        for word in words:
            #print('word', word)
            # update word's count or initialize if first time
            wordMap[word] = wordMap.get(word, 0) + 1
        # end for word loop
    # end for line loop
    # close open file
    inFile.close()
    return wordMap
# end counter function


if __name__ == '__main__':
    file = 'words.txt'
    wordCounts = counter(file)
    #exit()
    print('list of words and counts:')
    for k, v in wordCounts.items():
        print('%s %d' % (k, v))
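For comparison, Python's standard library already has a dictionary subclass built for this job, collections.Counter. This is not the approach used in this post; it is swapped in here only to show an equivalent result. The sketch below assumes Python 3; the function name count_words is hypothetical, and it writes its own throwaway temporary file so it does not depend on words.txt. Counter's most_common() method also gives a sorted listing, highest count first, which the script above does not provide.

```python
import os
import tempfile
from collections import Counter


def count_words(path):
    """Return a Counter mapping each whitespace-separated word to its count."""
    counts = Counter()
    # 'with' closes the file automatically, even if an error occurs
    with open(path, 'r') as in_file:
        for line in in_file:
            # update() adds 1 for every word in the list
            counts.update(line.split())
    return counts


if __name__ == '__main__':
    # Write a small temporary file so the example is self-contained.
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write("the cat saw the dog\nthe dog ran\n")
        path = tmp.name
    try:
        counts = count_words(path)
        # most_common() lists words from highest count to lowest
        for word, n in counts.most_common():
            print('%s %d' % (word, n))
    finally:
        os.remove(path)
```

Because Counter is a dict subclass, counts['the'] works just like wordMap['the'], and a missing word gives 0 instead of raising KeyError.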