811312A 数据结构

811312A Data Structures and Algorithms 2020-2021 Assignment, Instructions
This document describes both general and specific requirements for the assignment of the course
Data Structures and Algorithms.

  1. General Instructions
    Everyone shall return one assignment implemented in C. Return the solution to course's Moodle
    workspace. The returned work will be graded as approved/disproved. If not approvable, a returned
    solution can be revised one time, according to inspector's advice.
    The assignment is an individual work. You are not allowed to copy code from others.
    However, the course materials, for instance exercise solutions and example programs can be used
    freely. When utilizing already made solutions, referencing practices need to be followed and
    explanation of how it has been modified. The solution must be provided with the standard features
    of C language. The use of extra code libraries is not allowed.
    The solution shall contain
  2. program code and
  3. a report
    in one zipped file.
    The report must be written in English and it shall contain
  4. description of the solution and
  5. analysis of the solution program.
    Analysis contains both analysis of the solution algorithm and measurements of program
    running times. Based on this, estimates for the performance of the solution algorithm should
    be provided. The code should be commented to improve its readability, but no other
    documentation is required. The following description of the assignment describes the contents of
    the report in more detail.
    The assignment for this instance of the course shall be returned at latest 9.4.2021 23.59
  6. Description of the Assignment
    In this assignment, you shall find the 100 most frequently occurring words from a large text
    file. The program shall be implemented in C language. A continuous string of characters a..z and
    A..Z, with possible apostrophes ’, is considered a word. Words with uppercase and lowercase
    letters are considered equal. For example, in the text
    Herman Melville’s book Moby Dick starts, as we all know, with the sentence ”Call me Ishmael”.
    the words are herman, melville’s, book, moby, dick, starts, as, we, all, know, with, the, sentence,
    call, me, and ishmael
    The name of the input text file shall be given as an input from the user. The program shall print the
  7. most frequently occurring words and their frequencies. The words are printed in descending
    order according to their frequencies.
    Because the file can be very large, you need a suitable data structure to store words and their
    frequencies. This can be, for instance, a hash table or a binary search tree. You can find the most
    frequent words by sorting the structure with some fast algorithm.
    There are examples of input files and corresponding outputs in Moodle. All the files are texts
    written in English. You should use these to test your program. The size of the input is
    measured as the number of words in the text file. Measure the running times of your program
    with given test input files and make an estimate, how many words there can be in a file that you can
    process in 15 seconds. Based on this, estimate the maximum size (in megabytes) of files that your
    program can process in the mentioned time, when we know that the average word length in English
    is 5.1 letters.
    In addition to program code, you shall return a report that describes
  8. the solution, and
  9. analysis of program’s performance as mentioned above.
    Analysis contains thus the measurements of program’s running times and the mentioned estimates
    for number of words and file size. The code shall be commented to improve readability, but no
    other documentation is required. The report shall naturally contain student’s name and student
    number. If you wish, you can also give feedback in your report.
    Remember to zip the files before returning.

你可能感兴趣的:(后端)