[pythonvis] Re: Part 2: How to split line into words and count them

  • From: "Richard Dinger" <rrdinger@xxxxxxxxxx>
  • To: <pythonvis@xxxxxxxxxxxxx>
  • Date: Thu, 22 May 2014 10:32:30 -0700

Part 1 should be in the archives for this list. In a sense, though, part 1 is included in part 2: part 2 adds the code in the first loop after #print 'line' and the line wordMap = {} just before def counter.





-----Original Message----- From: Robert Spangler
Sent: Thursday, May 22, 2014 9:23 AM
To: pythonvis@xxxxxxxxxxxxx
Subject: [pythonvis] Re: Part 2: How to split line into words and count them

Hello,

I accidentally deleted part 1.  Where may I find it?

Robert

On 5/22/2014 12:03 PM, Richard Dinger wrote:
In part 1 the file was opened and read line by line.  Note there are
print statements that can be uncommented to trace what is happening.
Once the file is opened, each line is processed.  The string method
split is used to split the line into a list of words at each
whitespace location.
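
As a minimal sketch of the splitting step (the sample line below is made up for illustration and is not taken from the original script), calling split with no arguments breaks a string at runs of whitespace:

```python
# Hypothetical sample line; the real script reads its lines from a file.
line = "the quick  brown fox\n"

# With no arguments, split() breaks on any run of whitespace and
# ignores the trailing newline, so no empty strings end up in the list.
words = line.split()
print(words)
```

Note that the doubled space between "quick" and "brown" does not produce an empty word.
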
wordMap is a data structure called a dictionary.  A dictionary is
somewhat like a list, except that its items are accessed by a key
(such as a word in this example) rather than by an index.  So if our
text file has the word 'the' in it 5 times:
wordMap['the']
would give 5.  So I use a dictionary with the words of the text file as
the keys to count how many of each word there are.
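
A short sketch of dictionary access may help here. The counts are invented for the example; only the name wordMap comes from the script:

```python
# Hypothetical counts, keyed by word rather than by position.
wordMap = {'the': 5, 'cat': 2}

# Look up a count by its key.
the_count = wordMap['the']
print(the_count)

# Assigning to a key that does not exist yet adds it.
wordMap['dog'] = 1

# The "in" operator tests whether a key is present.
print('dog' in wordMap)
```
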
Another for loop processes each word in the list.  Words not already
in the wordMap are added with a count of 1 and existing members are
incremented.  The get method looks up the count for its first
argument, and if that key is not in the wordMap the second argument is
returned instead.  So the statement:
wordMap[word] = wordMap.get(word, 0) + 1
has the same result as:
if word not in wordMap:
   wordMap[word] = 0
wordMap[word] = wordMap[word] + 1
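
The equivalence of the two forms can be checked with a small sketch. The function names count_with_get and count_with_if and the sample words are made up for this example and do not appear in the original script:

```python
def count_with_get(words):
    """Count words using dict.get with a default of 0."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def count_with_if(words):
    """Count words using an explicit membership test."""
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 0
        counts[word] = counts[word] + 1
    return counts


# Hypothetical sample input.
words = "the cat saw the other cat".split()
print(count_with_get(words) == count_with_if(words))
```

Both functions build the same dictionary; the get form just folds the membership test into one line.
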
At the end of the file the result is printed out in no particular order.
Note the if __name__ == '__main__' test at the end of the file is True
when the script is run directly and False when it is imported into
another script.  So a section like this is a good place to put some
testing code.
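
Here is a minimal sketch of that layout. The function count_words and its list-of-lines argument are assumptions made so the example runs without a file; the original script's counter function may look different:

```python
def count_words(lines):
    """Return a dict mapping each word to how many times it occurs."""
    wordMap = {}
    for line in lines:
        for word in line.split():
            wordMap[word] = wordMap.get(word, 0) + 1
    return wordMap


if __name__ == '__main__':
    # This part runs only when the file is executed directly,
    # not when it is imported from another script.
    sample = ["the cat sat", "the end"]  # hypothetical test data
    for word, count in count_words(sample).items():
        print('%s %d' % (word, count))
```

Importing this file from another script makes count_words available without printing anything.
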
So make up a file with some text in it and either name it words.txt or
change the code to match the name.  Then run this thing.
Now this still needs some work, since capitalized words are counted
separately from their lowercase forms and punctuation attached to a
word changes the counts.  But we will look at that in the next version.
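
One possible way to handle both issues is sketched below. This is only an illustration and not necessarily what the next version does; the helper name normalize and the sample text are made up:

```python
import string


def normalize(word):
    """Lowercase a word and trim punctuation from both ends only."""
    return word.lower().strip(string.punctuation)


# Hypothetical sample text with mixed case and trailing punctuation.
words = 'The cat. the Cat!'.split()

counts = {}
for word in words:
    key = normalize(word)
    if key:  # skip tokens that were nothing but punctuation
        counts[key] = counts.get(key, 0) + 1
print(counts)
```

Because strip only trims the ends, an apostrophe inside a word such as don't is left alone.
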
List web page is
//www.freelists.org/webpage/pythonvis

To unsubscribe, send email to
pythonvis-request@xxxxxxxxxxxxx with "unsubscribe" in the Subject field.