Tino APCS

Lab 19.3 CountWords

Background

  1. In this lab assignment, you will count the occurrences of words in a text file. You must take the following special cases into account:

    Special Cases | Explanation
    hyphenated words (e.g., sixty-three) | Count as one word.
    hyphenated words with a blank space on each side of the hyphen (e.g., joyous - sparkling) | Count as two words.
    words with apostrophes (e.g., 'tis or can't) | Count as one word.
    upper and lower case (e.g., The and the) | Both count as occurrences of the word 'the'. Convert any capital letters to lower case before counting such words.
  2. You are encouraged to use a combination of all the programming tools you have learned so far, such as:

    Data Structures: arrays, String class, ArrayList class, a custom class
    Algorithms: sorting, searches, text file processing
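
One possible way to handle the special cases listed above is sketched here in Java. The class name WordSplitter, the method names, and the use of a regular expression are assumptions made for illustration and are not part of the assignment; a hand-written loop over the characters would work just as well. The sketch also assumes that a token containing no letters (such as a stray quotation mark) does not count as a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits one line of text into lower-case words.
class WordSplitter {
    // A word is a run of letters/apostrophes, optionally joined to more
    // such runs by single hyphens (so "sixty-three" stays together).
    private static final Pattern WORD = Pattern.compile("[a-z']+(?:-[a-z']+)*");

    static List<String> splitWords(String line) {
        List<String> words = new ArrayList<>();
        // Lower-case first so that "The" and "the" become the same word.
        Matcher m = WORD.matcher(line.toLowerCase(Locale.ROOT));
        while (m.find()) {
            String token = m.group();
            if (hasLetter(token)) { // assumption: skip tokens such as a lone '
                words.add(token);
            }
        }
        return words;
    }

    // Returns true if the token contains at least one letter.
    private static boolean hasLetter(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (Character.isLetter(token.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String line = "The joyous - sparkling sixty-three; 'Tis what we can't do.";
        System.out.println(splitWords(line));
    }
}
```

Running main prints [the, joyous, sparkling, sixty-three, 'tis, what, we, can't, do]. The hyphen surrounded by blanks is dropped, so joyous and sparkling are counted separately.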

Assignment

  1. Here is a sample data file to analyze (lincoln.txt). Parse the file and print out the following statistical results:
    • Total number of unique words used in the file. (Unique words disregard duplicates. For example, the list { "apple", "banana", "apple" } contains 3 words in total, but only 2 unique words.)
    • Total number of words in the file.
    • The 30 most frequently occurring words, sorted in descending order by count.

  2. Note that although the example output on this page is DOUBLY sorted (by frequency, then alphabetically), doing so is not required. The only requirement is that your results are sorted by frequency.

  3. Example output for lincoln.txt

     1  13  the
     2  12  that
     3  10  we
     4   8  here
     5   8  to
     6   7  a
     7   6  and
     8   5  for
     9   5  have
    10   5  it
    11   5  nation
    12   5  of
    
    ... rest of top 30 words ...
    
    Number of unique words used = 139
    Total # of words = 269
    
  4. If you have multiple classes, put them into a single file as usual and submit your code using the submission form below.
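
The counting and sorting steps in this assignment could be approached as follows. This is a minimal sketch, assuming the words have already been split and converted to lower case; WordCount and FrequencyTable are hypothetical names, and the linear search is chosen for simplicity rather than speed. The comparison breaks ties alphabetically, which the assignment allows but does not require.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical custom class pairing a word with its number of occurrences.
class WordCount {
    final String word;
    int count;

    WordCount(String word) {
        this.word = word;
        this.count = 1; // created the first time the word is seen
    }
}

class FrequencyTable {
    // Builds one WordCount per unique word, sorted by count (descending),
    // with ties broken alphabetically.
    static List<WordCount> tally(List<String> words) {
        List<WordCount> table = new ArrayList<>();
        for (String w : words) {
            WordCount found = null;
            for (WordCount wc : table) { // linear search for an existing entry
                if (wc.word.equals(w)) {
                    found = wc;
                    break;
                }
            }
            if (found == null) {
                table.add(new WordCount(w));
            } else {
                found.count++;
            }
        }
        table.sort((a, b) -> {
            if (a.count != b.count) {
                return Integer.compare(b.count, a.count); // larger count first
            }
            return a.word.compareTo(b.word); // optional alphabetical tie-break
        });
        return table;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "apple");
        List<WordCount> table = tally(words);
        int limit = Math.min(30, table.size()); // top 30, or fewer if the file is small
        for (int i = 0; i < limit; i++) {
            WordCount wc = table.get(i);
            System.out.printf("%2d %3d  %s%n", i + 1, wc.count, wc.word);
        }
        System.out.println("Number of unique words used = " + table.size());
        System.out.println("Total # of words = " + words.size());
    }
}
```

For the three-word list used as an example in the assignment, this reports 2 unique words and 3 total words, with apple ranked first.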

Resources

  1. Here are three sample data files to analyze.
  2. Here are output answers to the data files.
  3. Finally, here is the unabridged list of words for each file. To compare these to your program output you may need to write your data to a file (instead of System.out.println) or increase the buffer size of the Console window.
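
Writing the results to a file, as suggested above, might look like the following sketch, which uses java.io.PrintWriter. The names ReportWriter and writeLines and the file name wordlist.txt are hypothetical; the two sample lines are copied from the example output for lincoln.txt.

```java
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes each line of a report to a text file.
class ReportWriter {
    static void writeLines(String fileName, List<String> lines)
            throws FileNotFoundException {
        // try-with-resources closes (and flushes) the file automatically.
        try (PrintWriter out = new PrintWriter(fileName)) {
            for (String line : lines) {
                out.println(line); // same call as System.out.println
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // wordlist.txt is an assumed output name; any writable path works.
        writeLines("wordlist.txt", Arrays.asList(" 1  13  the", " 2  12  that"));
    }
}
```

Because PrintWriter offers the same println and printf methods as System.out, existing output statements can usually be redirected by changing only the object they are called on.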

Last modified: February 16, 2024
