Assignment 7: Pollution Lookup
Revisited
CSE30: Computer Organization and Systems Fall 2021
Instructors: Bryan Chin and George Obaido
Due: Monday, March 7th, 2022 @ 11:59PM
Please read over the entire assignment before starting to get a sense of what you will need to
get done in the next week. REMEMBER: Everyone procrastinates but it is important to know
that you are procrastinating and still leave yourself enough time to finish. Start early. You
MUST run the assignment on the pi cluster. You HAVE to SSH: You will not be able to
compile or run the assignment otherwise.
Please read the FAQ and search the existing questions on Edstem before asking for help.
This reduces the load on the teaching staff, and clutter/duplicate questions on Edstem.
Additionally, staff support is not guaranteed to be available after 9pm on Wednesdays.
Table of Contents
- Table of Contents
- Choose your Teammate
- Learning Goals
- Assignment Overview
- Getting Started
a. Developing Your Program
b. Running Your Program - How the Program Works
a. Pollution Lookup Arguments
b. Output Examples - Program Implementation
a. Functions and Behaviors to Implement
i. node* node_lookup()
ii. void print_info() - Midpoint
- Submission and Grading
a. Submitting
b. Grading Breakdown [5 + 5 + 40 pts]
i. Checking For Exact Output Match
Choose your Teammate
This HW may be on the longer side compared to previous HWs and you will be optionally
allowed to choose one teammate with whom you will work on this HW. There will be no extra
credit if you work alone, but at the same time we will not be responsible if your teammate
doesn’t do as much work as you’d want them to do. Choose your teammate wisely and divide
your work carefully. Changing teammates will not be allowed once you have submitted the
form. Both members of the team receive the same grade. No discussion is allowed with
members not in your team with regard to AI policy.
If you decide to work with a teammate, ONE of you has to submit this Form before Sunday 2/27
11.59 PM PST.
Learning Goals
● Programming in ARM assembly
● Passing parameters to assembly functions
● Iterating through linked lists and arrays in assembly
● Working with a teammate
Assignment Overview
The animals in the jungle were able to get their pollution statistics with your help in A5.
However, humans learnt C and were able to intercept their communications. Now, the animals
have no choice but to look up the pollution statistics using ARM. They need your help to do it.
For this assignment, you will be given the pollution data of a city provided in the form of a CSV
(comma separated value) file. The database will contain the following fields. All the fields are
integers this time (notice the float iws field is no longer there)
year month day hour pm2.5 TEMP
Your job is to implement the functions print_info() and node_lookup() from A5 in ARM.
For a reminder of how hashed chaining works, watch this 5 minute video from Prof. Leo Porter
about hash tables. Video on Google Drive
Getting Started
Developing Your Program
For this class, you MUST compile and run your programs on the pi-cluster. To access the
server, you will need your cs30wi22xx student account, and know how to SSH into the cluster.
Need help or instructions? See Edstem FAQ. (Do NOT wait until the end to try this. There will
be limited or no ETS support on the weekends!)
We’ve provided you with the starter code at the following link:
https://github.com/cse30-wi22... - Download the files in the repository.
a. You can either use
git clone https://github.com/cse30-wi22...
directly from the pi-cluster, or download the repo locally and scp the folder to
pi-cluster if that doesn’t work. - Fill out the fields in the README before turning in.
Running Your Program
We’ve provided you with a Makefile so compiling your program should be easy!
Additionally, we’ve placed the reference solution binary at:
/home/linux/ieng6/cs30wi22/public/bin/poll_rv-a7-ref
You can directly use poll_rv-a7-ref as a command.
Makefile: The Makefile provided will create a poll_lookup executable from the source files
provided. Compile it by typing make into the terminal. By default, it will run with warnings turned
on. To compile without warnings, run make WARN= instead. Run make clean to remove files
generated by make.
How the Program Works
This section is the same as A5 (except that the iws column is removed - so there are 6
columns now). The program shall take a date to query, and the filename of the CSV as
arguments. It also takes some optional arguments: a flag to indicate printing of hash table
metadata, and a hash table size.
Inputs:
● Date
● Flag to print stats corresponding to a date
● Filename of the CSV
● Optional: Flag to print metadata
● Optional: Hash table size
● Optional: Flag which indicates if you need to remove all entries corresponding to a date
Outputs:
● The minimum, maximum, and average parameters of the query date, OR an error
message if the date couldn’t be found if the flag indicates you to print stats (The error
message is already handled for you).
● Optional: Hash table metadata
Pollution Lookup Arguments
Format for calling this executable with arguments (the flags cannot be grouped with each other,
e.g. -itc):
./poll_lookup [-i] [-t tablesize] <-d date> [-r date]
Argument(s) Description
-i
(User-optional flag)
Prints descriptive metadata about the hash table and its linked lists chains
after the query (as shown in the output examples).
-t tablesize
(User-optional flag)
tablesize (hash table size) is an integer specifying the size to be used to
create the hash table. If -t is not specified, the default is the constant
TABLE_SIZE.
If the argument to -t is smaller than the constant MIN_TABLE_SIZE, an
error message is printed with the usage message and the program quits.
-d date date is a string specifying the date in yyyy-mm-dd format. You need to
calculate the stats corresponding to date
-r date date is a string specifying the date in yyyy-mm-dd format. You need to
remove the hash table entries for date
filename This is the path to the input CSV. It must have six columns, in this order:
Year, month, day, hour, pm2.5, TEMP
We will not give malformed input CSVs to your program. The input path will
always be valid and be a properly-formatted CSV.
You do not have to write the option parsing: It is done for you in the provided function
parse_opts(). Do not edit parse_opts()!
If the mandatory arguments are not provided, their error messages are printed, followed by the
usage message. You can assume that a valid CSV will be included in any test command.
Output Examples
$ ./poll_lookup -i -d 2013-01-01 pollution.csv
Minimum pm2.5: 7 Maximum pm2.5: 35 Average pm2.5: 16
Minimum temp: -12 Maximum temp: -3 Average temp: -7
Total size: 1873
Total entries: 43824
Longest chain: 168
Shortest chain: 24
Empty buckets: 881
$ ./poll_lookup -i -d 2013-01-01 -r 2013-02-02 pollution.csv
Minimum pm2.5: 7 Maximum pm2.5: 35 Average pm2.5: 16
Minimum temp: -12 Maximum temp: -3 Average temp: -7
Total size: 1873
Total entries: 43800
Longest chain: 168
Shortest chain: 24
Empty buckets: 881
$ ./poll_lookup -i -t 13 -d 2013-01-01 pollution.csv
Minimum pm2.5: 7 Maximum pm2.5: 35 Average pm2.5: 16
Minimum temp: -12 Maximum temp: -3 Average temp: -7
Total size: 13
Total entries: 43824
Longest chain: 3552
Shortest chain: 3240
Empty buckets: 0
$ ./poll_lookup -i -t 13 -d 2013-01-01 -r 2013-02-02 pollution.csv
Minimum pm2.5: 7 Maximum pm2.5: 35 Average pm2.5: 16
Minimum temp: -12 Maximum temp: -3 Average temp: -7
Total size: 13
Total entries: 43800
Longest chain: 3552
Shortest chain: 3240
Empty buckets: 0
$ ./poll_lookup -i -t 1331 -d 2015-01-01 pollution.csv
Unable to find any data for the date 2015-1-1.
Total size: 1331
Total entries: 43824
Longest chain: 192
Shortest chain: 24
Empty buckets: 510
Program Implementation
Functions and Behaviors to Implement
node* node_lookup()
This has the same signature as the node_lookup() function in A5.
Arguments:
node* front → The current head of the linked list chain
int year → year
int month → month
int day → day
int hour → hour
Operation:
● Searches the linked list chain for a node that matches the year year, the month
month, the day day and hour hour.
● Returns the pointer to the node with matching data, otherwise, return NULL.
Simplifying node_lookup():
● The function node_lookup() takes 5 parameters. The first four parameters will be in
R0 to R3. The fifth parameter is stored by the caller just above your stackframe (fp). So
you can access the fifth parameter as [fp, NCOLS_OFFSET]. What will be the value
of NCOLS_OFFSET?
Stack Frame with some addresses for illustration purposes:
0x7fff_fff4
+4 5th parameter
node_lookup
stack frame
FP->
0x7fff_fff0 - caller’s FP
0x7fff_ffec
-4 caller’s LR
0x7fff_ffe8
-8 callee saved registers
-12 callee saved registers
... local variables + parameter saved
local variables + parameter saved
SP->
0x7fff_ffxx
local variables + parameter saved
0x7fff_ffxx
● If you run out of registers for variables, you can declare local variables and store and
load their values from the stack as needed. Define a fixed offset from the frame pointer
with .equ to do this.
● Here is a sample struct to make you understand how structures are stored in memory.
The values in red indicate sample memory address.
struct pride_lands
{
char c;
int x;
};
struct pride_lands p[10]; - 2001 2002 2003 2004 2005 2006 2007 2008
p[0].c unused unused unused p[0].x p[1].c
Each element of the struct is allocated 4 bytes of memory. However, as char takes 1
byte, 3 bytes are unused. However, int uses all 4 bytes. Notice where the next element
in the array starts. How is it related to the size of the struct?
Can you use the above example to answer the following questions?
a. Given node* front, how would you load front->year, front->month,
front->day and front->hour from memory?
b. How are they stored in memory relative to front?
c. How would you do front = front->next? What is the size of the structure?
● What will be the loop condition? What is the value of NULL?
void print_info()
This has the same signature as the print_info() function in A5.
Arguments:
node** table → pointer to hash table
unsigned long size → hash table size
Operation:
● Walk the hash table chain by chain.
● Output the following:
○ The size of the table (Table size)
○ The total number of nodes in the table (Total entries)
○ The length of longest and shortest non-empty chains (Longest chain, Shortest
chain)
○ The number of empty buckets in the table (Empty buckets)
● Prints all this information to stdout using the following format strings, in this order:
"Table size: %lu\n"
"Total entries: %lu\n"
"Longest chain: %lu\n"
"Shortest chain: %lu\n"
"Empty buckets: %lu\n"
Simplifying print_info():
● You have to iterate through the table for each entry, iterate through the linked list for that
entry. Given table and i, how would you find table[i] in ARM?
● How will you initialize the variables for minimum chain length and maximum chain
length?
● How would you call printf() in ARM to print to standard output?
● Use the same logic as you used in node_lookup() for iterating through a linked list.
Allowed ARM Instructions
You are only allowed to use the instructions provided in the ARM ISA Green Card. Failure to
comply will result in a score of 0 on the assignment.
Style Requirements
Points WILL be given for style, and teaching staff won't be able to provide assistance or
regrades unless code is readable. Please follow these Style Guidelines for ARM assembly.
Midpoint
This part of the assignment is due earlier than the full assignment, on
Wednesday 3/2 at 11:59 pm PST. There are no late submissions on
the Midpoint.
Complete the Gradescope assignment “A7: Midpoint”, an Online Assignment that is done
entirely through Gradescope. This assignment consists of a few quiz questions and a
free-response question where you will document the pseudocode or C code of both functions.
This is a planning document and does not need to reflect your final implementation, although
you are encouraged to keep it up to date. Answers in plain English will receive 0 points.
Discuss your implementations of the following functions: node_lookup and print_info.
Submission and Grading
Submitting - Submit your files to Gradescope under the assignment titled “A7: Pollution Lookup
Revisited”. As this is a group submission, you need to make only one submission per
group from any one of your gradescope accounts. After submitting, add your
teammate’s email id to your submission.
You will submit the following files:
node_lookup.s
print_info.s
README.md
You can upload multiple files to Gradescope by holding CTRL (⌘ on a Mac) while you
are clicking the files. You can also hold SHIFT to select all files between a start point
and an endpoint. Alternatively, you can place all files in a folder and upload the folder to
the assignment. Gradescope will upload all files in the folder. You can also zip all of the
files and upload the .zip to the assignment. Ensure that the files you submit are not in
a nested folder. - After submitting, the autograder will run a few tests:
a. Check that all required files were submitted.
b. Check that your files compile without warnings.
c. Runs some tests on the resulting poll_lookup executable.
Grading Breakdown [5 + 5 + 40 pts]
Make sure to check the autograder output after submitting! We will be running
additional tests after the deadline passes to determine your final grade. Also, throughout this
course, make sure to write your own test cases. It is bad practice to rely on the minimal
public autograder tests.
To encourage you to write your own tests, we are not providing any public tests that have
not already been detailed in this writeup.
The assignment will be graded out of 50 points and will be allocated as follows:
● Midpoint writeup: 5 points. This part of the assignment is due earlier than the full
assignment, on Wednesday 3/2 at 11:59 pm PST. Complete the Gradescope
assignment “A7: Midpoint”.
● Style: 5 points
● Public tests with the provided examples.
● Private tests with hidden test cases.
NOTE: The end to end tests expect an EXACT output match with the reference binary.
There will be NO partial credit for any differences in the output. Test your code - DO NOT
RELY ON THE AUTOGRADER FOR PROGRAM VALIDATION (READ THIS SENTENCE
AGAIN).
Make sure your assignment compiles correctly through the provided Makefile on the
pi-cluster without warnings. Any assignment that does not compile will receive 0 credit.
Checking For Exact Output Match
A common technique is to redirect the outputs to files and compare against the reference
solution
1
:
./your-program args > output
our-reference args > ref
diff output ref
If the second command outputs nothing, there is no difference. Otherwise, it will output lines that
differ with a < or > in front to tell which file the line came from.
END OF INSTRUCTIONS, PLEASE RE-READ! - You might want to check out vimdiff on the pi-cluster (https://www.tutorialspoint.co...).