[COMP] Re: C program to count the number of words in a text file

  • From: John Madden <weez@xxxxxxxxxxxxx>
  • To: computers@xxxxxxxxxxxxxxxxx
  • Date: Mon, 30 Apr 2001 15:25:19 -0500

On Monday 30 April 2001 14:45, you wrote:
> Hi guys
>
> I have a small problem.  A friend needs help writing a
> C program which has to read a text file, and count
> the number of occurrences of each word in the file.
> I know this can be done using Linked-Lists, but his
> version needs to use arrays.
>
> The input file has the following:
> TO BE OR NOT TO BE

Hmm.  This shouldn't be *too* difficult, using something of a state 
machine that uses a space (obviously) as a word delimiter.  There's no 
need for a linked list, but doing this with arrays is sort of... funny. :)
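
For what it's worth, here's a rough sketch of the state-machine idea.  It 
isn't from any real program: the name next_word() and the choice to scan a 
string (rather than a FILE) are just assumptions to keep it short, and I've 
treated tabs and newlines as delimiters along with the space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies the next space-delimited word from s into
   buf (size bytes; longer words are truncated).  Returns a pointer to
   where scanning stopped, or NULL if no word was left. */
const char *next_word(const char *s, char buf[], int size)
{
        enum { BETWEEN_WORDS, IN_WORD } state = BETWEEN_WORDS;
        int len = 0;

        assert(size > 0);
        for (; *s != '\0'; s++) {
                int is_delim = (*s == ' ' || *s == '\n' || *s == '\t');
                if (state == BETWEEN_WORDS) {
                        if (is_delim)
                                continue;       /* still skipping delimiters */
                        state = IN_WORD;        /* first character of a word */
                } else if (is_delim) {
                        break;                  /* delimiter ends the word */
                }
                if (len < size - 1)
                        buf[len++] = *s;
        }
        buf[len] = '\0';
        return state == IN_WORD ? s : NULL;
}
```

You'd call it in a loop, passing the returned pointer back in until you get 
NULL.  The same two states should work just as well on characters pulled 
from a file with fgetc().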

Use two arrays: one to store the words, another to store the count.  The 
quick-n-dirty but inefficient way of doing this would be to have a 
function to search for the word in the array and return its index:

#include <string.h>  // for strcmp()

int findword(char target[])
{
        int i;
        for (i=0; i<total_word_count; i++)  // the count is global
                if(strcmp(word_array[i], target)==0) // word array is global
                        return i;  // loc of word in the array
        return -1; // word not in the array
}

So if you call this and get a -1, you know you have to add the word to the 
array: 

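void addword(char word[])
{
        // copy the text; saving only the pointer breaks once the caller
        // reuses its buffer (assumes word_array is a 2-D char array)
        strcpy(word_array[total_word_count++], word);
}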

That'll put each new word into the array.  Then put a 1 inside 
word_count[total_word_count - 1], the matching slot in the other global 
array, to hold the count for this word.  (It's "- 1" because addword() has 
already bumped total_word_count.)

And if you don't get a -1 from findword(), you'll have to use the index 
returned to increment that location in word_count: word_count["returned 
int"]++;

I think you can figure out the rest. :)
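
If it helps, here's one way the pieces might fit together.  Treat it as a 
sketch: MAX_WORDS, MAX_WORD_LEN, record_word() and print_counts() are names 
and limits I'm assuming for illustration, and I've folded the "add it" step 
into record_word() rather than keeping a separate addword().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORDS    1000  /* assumed capacity */
#define MAX_WORD_LEN 32    /* assumed longest word, including the '\0' */

char word_array[MAX_WORDS][MAX_WORD_LEN];  /* distinct words seen so far */
int  word_count[MAX_WORDS];                /* word_count[i] goes with word_array[i] */
int  total_word_count = 0;                 /* number of distinct words stored */

/* Linear search: index of target in word_array, or -1 if absent. */
int findword(const char target[])
{
        int i;
        for (i = 0; i < total_word_count; i++)
                if (strcmp(word_array[i], target) == 0)
                        return i;
        return -1;
}

/* Count one occurrence of word, adding it to the arrays if it is new. */
void record_word(const char word[])
{
        char key[MAX_WORD_LEN];
        int i;

        /* truncate first so over-long words are looked up consistently */
        snprintf(key, sizeof key, "%s", word);
        i = findword(key);
        if (i >= 0) {
                word_count[i]++;
                return;
        }
        assert(total_word_count < MAX_WORDS);
        strcpy(word_array[total_word_count], key);
        word_count[total_word_count] = 1;
        total_word_count++;
}

/* Print each word with its count, in first-seen order. */
void print_counts(void)
{
        int i;
        for (i = 0; i < total_word_count; i++)
                printf("%s %d\n", word_array[i], word_count[i]);
}
```

Call record_word() once for every word you pull out of the file, then 
print_counts() at the end.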

Anyway, the much-faster way of doing this would be to keep the array 
sorted as you add words, so findword() can do a binary search instead of 
walking the whole array -- the simple version will be really slow with 
large (like more than a couple hundred words) files -- but that increases 
the complexity of addword() and findword() by quite a bit.
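
To give an idea of the extra complexity, here's a sketch of a sorted 
version.  It's my guess at one reasonable way to do it, not the only one: 
the array limits, the "negative insertion point" return value from 
findword(), and the extra pos argument to addword() are all assumptions.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS    1000  /* assumed capacity */
#define MAX_WORD_LEN 32    /* assumed longest word, including the '\0' */

char word_array[MAX_WORDS][MAX_WORD_LEN];  /* kept in strcmp() order */
int  word_count[MAX_WORDS];
int  total_word_count = 0;

/* Binary search.  Returns the index of target if present; otherwise
   returns -(insertion point) - 1, which is always negative. */
int findword(const char target[])
{
        int lo = 0, hi = total_word_count - 1;

        while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;
                int cmp = strcmp(word_array[mid], target);
                if (cmp == 0)
                        return mid;
                if (cmp < 0)
                        lo = mid + 1;
                else
                        hi = mid - 1;
        }
        return -lo - 1;
}

/* Insert a new word at index pos, shifting later entries up by one. */
void addword(const char word[], int pos)
{
        assert(total_word_count < MAX_WORDS);
        assert(strlen(word) < MAX_WORD_LEN);
        memmove(word_array[pos + 1], word_array[pos],
                (size_t)(total_word_count - pos) * sizeof word_array[0]);
        memmove(&word_count[pos + 1], &word_count[pos],
                (size_t)(total_word_count - pos) * sizeof word_count[0]);
        strcpy(word_array[pos], word);
        word_count[pos] = 1;
        total_word_count++;
}

/* Count one occurrence of word, inserting it in order if it is new. */
void record_word(const char word[])
{
        int i = findword(word);
        if (i >= 0)
                word_count[i]++;
        else
                addword(word, -i - 1);
}
```

Lookups drop to roughly log2(n) comparisons, though each insert of a new 
word still has to shift the entries after it.  As a bonus the words come 
out in alphabetical order when you print them.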

John



-- 
# John Madden  weez@xxxxxxxxxxxxx ICQ: 2EB9EA
# FreeLists, Free mailing lists for all: //www.freelists.org
# UNIX Systems Engineer, Ivy Tech State College: http://www.ivy.tec.in.us
# Linux, Apache, Perl and C: All the best things in life are free!
========================================
Avenir Web's Computers Mailing List
========================================