Princeton Alogorithm COS226 Week6 Hash Table

in previous lecture, we learn symbol table implemented by bst. and have lg n time complexity in worst case.
Can we do better?
yes, but with different access to the data.
it is the famous data structure, hash table.
the key of hash table is the hash function, the issue of hash function is the hash collision. so when implement hash table, we need to handle two keys that hash to the same array index.
so i will introduce hash function.

hash function

hash function is a function convert any object to a int.
idealistic goal is that scramble the keys uniformly to produce a table index


Princeton Alogorithm COS226 Week6 Hash Table_第1张图片
image.png
Princeton Alogorithm COS226 Week6 Hash Table_第2张图片
image.png
Princeton Alogorithm COS226 Week6 Hash Table_第3张图片
image.png

Princeton Alogorithm COS226 Week6 Hash Table_第4张图片
image.png

uniform hash assumption

Princeton Alogorithm COS226 Week6 Hash Table_第5张图片
image.png

separate chainning

Princeton Alogorithm COS226 Week6 Hash Table_第6张图片
image.png

Princeton Alogorithm COS226 Week6 Hash Table_第7张图片
image.png

linear probing

open addressing, when a new key collides, find next empty slot, and put it there


Princeton Alogorithm COS226 Week6 Hash Table_第8张图片
image.png

knuth's parking problem
cars arrive at one-way street with M parking spaces. each desire a random space i: if space i is taken, try i+1, i+2 etc
what is mean displacement of a car?


Princeton Alogorithm COS226 Week6 Hash Table_第9张图片
image.png

Princeton Alogorithm COS226 Week6 Hash Table_第10张图片
image.png

algorithmic complexity attack on java

goal: find family of strings with the same hash code.


Princeton Alogorithm COS226 Week6 Hash Table_第11张图片
image.png

so we need a one-way hash function, it is hard to find a key that will hash to a desired value(or two keys that hash to some value)

spearate chaining vs linear probing

Princeton Alogorithm COS226 Week6 Hash Table_第12张图片
image.png

QUIZ

Q1

image.png

we assume we have uniform hashing assumption, then we can use hashmap to record all the n^2 pair , then we can use O(N) to find the result, be care find the two pair with some same indices.

Q2

Princeton Alogorithm COS226 Week6 Hash Table_第13张图片
image.png

if you override hashCode() but not equals(), only same reference will be dedup in hashmap.

if you override equals() but not hashCode().
only same reference will be dedup, but equal may return false on same reference

if you override hashCode() but implement wrong equals signature

public boolean equals(OlympicAthletethat)
have no use in dedup of hashmap.

Searching application

exception filter application

read in a list of words from one file, print out all words from standard input that are {in, not in} the list


Princeton Alogorithm COS226 Week6 Hash Table_第14张图片
image.png

and dictionary lookup


Princeton Alogorithm COS226 Week6 Hash Table_第15张图片
image.png

File indexing

goal: given a list of files specified, create an index so that you can efficiently find all files containing a given query string


Princeton Alogorithm COS226 Week6 Hash Table_第16张图片
image.png

Concordance

goal: preprocess a text corpus to support concordance queries: given a word, find all occurrences with their immediate contexts


Princeton Alogorithm COS226 Week6 Hash Table_第17张图片
image.png

Sparse matrix-vector multiplication

assumption: matrix dimension is 10,000; average nonzeros per row ~10


Princeton Alogorithm COS226 Week6 Hash Table_第18张图片
image.png
Princeton Alogorithm COS226 Week6 Hash Table_第19张图片
image.png

Princeton Alogorithm COS226 Week6 Hash Table_第20张图片
image.png

course quiz and project : https://github.com/yixuaz/algorithm4-princeton-cos226

你可能感兴趣的:(Princeton Alogorithm COS226 Week6 Hash Table)