W4111 -- Introduction to Databases

W4111 -- Introduction to Databases
Homework 3
Spring 2019, Sections 03, V03, H03
Introduction
This is the specification of for homework 3 for W4111 - Introduction to Databases, section 03,
H03, V03 for spring semester 2019. This document is always the current version of the
specification. Developers are responsible for continuously reviewing the document for changes.
Document Control
Roles
Author UNI Role
Ferguson, Donald, F. dff9 Instructor
Approver UNI Role
Ferguson, Donald, F. dff9 Instructor
Reviewer UNI Role
Dalchand, Samantha sd2995 Assistant Instructor
Dhillon, Kirit ksd2142 Assistant Instructor
Gandikota, Chandana cg3111 Assistant Instructor
Gorrela, Meghna mg3740 Assistant Instructor
Huang, Rose rh2805 Assistant Instructor
Hudson, Alysha alh2202 Assistant Instructor
Karasev, Mikhail mak2257 Assistant Instructor
Peterson, Ara alp2210 Assistant Instructor
Saosun, Tahsina ts2931 Assistant Instructor
Swaroop, Vatsala vs2671 Assistant Instructor
Tan, Xinyue xt2215 Assistant Instructor
Change Log
Change No. Date Document
Version
Changes
Change Process
Students should post clarification requests on this Piazza thread. The current version of this
document and the change log will note changes/clarifications. There will not be any other source
documenting changes or clarifications.
Overview
This project has two parts:

  1. Implement indexes on top of CSV data in CSVDataTable implementations.
  2. Query optimization:
  3. Add a JOIN function to the CSVDataTable and implement query optimizations.
  4. Use Access Paths based on index selection to optimize find_by_template() and
    join().
    Allowed Frameworks/Libraries
    ● You MAY only use libraries that are part of the core Python environment, e.g. csv, json,
    etc.
    ● You MUST NOT use Pandas.
    Indexes and File
  5. The CSVDataTable manages dictionaries (also known as maps, name value pairs). An
    individual dictionary represents a row. The CSVDataTable as a whole represents all of
    the rows in a CSV file.
  6. A template is also a dictionary. A row matches a template if for each key in the
    dictionary, the row has a key that identifies a piece of data with exactly the same value
    as in the template.
  7. Your CSVDataTable must implement the following operations:
  8. insert(row)
  9. find_by_template(template, field_list, index_allowed)
  10. This must return a CSVDataTable.
  11. The table contains dictionaries that match the template and contain the
    requested fields (dictionary keys and values).
  12. If index_allowed is True, the find may use an index if one is supports the
    template.
  13. delete(template) deletes all rows matching a template.
  14. add_index(name, kind, column_list):
  15. name is a caller defined name.
  16. kind is one of “PRIMARY”, “UNIQUE”, “INDEX”.
  17. “UNIQUE” means that at most one row may exist in the table for a
    set of column values specified by the column_list.
  18. “PRIMARY” has the same behavior as unique, but there can be
    only one “PRIMARY” index.
  19. “INDEX” allows duplicate values.
  20. column_list is the set of column names that comprise the index definition.
  21. import(rows): rows is a list of dictionaries. This operation inserts the rows into the
    CSVDataTable.
  22. save(): This function saves the CSVDataTable data (rows) and index information
    to a single file.
  23. load(): This function loads the rows and index information from a single file.
  24. Data file behavior:
  25. Index information/state must persist between a save() and load(). You may not
    rebuild indexes on data load.
  26. load() loads the entire data and indexes.
  27. save() saves all of the data and index information.
  28. Your implementation should perform input validation on methods.
  29. Indexes only need to support equality comparisons.
    Join and Query Optimization
    Your CSVDataTable implementation must support the following operation:
    join(other_table, on_columns, where_template, field_list):
    ● Other table is a reference to a CSVDataTable.
    ● on_columns is a list of column names common to both tables. The join() function
    implements an equi-join using these columns.
    ● where_template is a dictionary. The only comparison operator is “==” and the resulting
    rows must match the template completely. They keys in the dictionary are of the form:
    ○ table_name.column_name
    ○ This specified the name of the table and column to which the template element
    applies.
    ● field_list is of columns of the same form as the template column definition. The query
    only returns the requested columns.
    The join() function returns a CSVDataTable, which supports all CSVDataTable operations.
    Your implementation of MUST implement at least three optimizations that are analogous to SQL
    optimizations covered in lectures.
    Your implementation should test for obvious error conditions and demonstrate improvements
    from your optimizations.
    Submission Format
    You will submit your homework as a zip file. The file will have the following directory structure:
    ● /src: Your implementation code.
    ● /tests: Your test code.
    ● /test_output: The console output from running the tests.
    ● /CSVFile: Contains any CSV files you use in testing
    ● /DB: Contains the holding tables/indexes.
    You must name the zip file following guidelines previously posted on Piazza.

WX:codehelp

你可能感兴趣的:(vue.js)