1. Purpose of Hash Tables: maintain a (possibly evolving) set of objects.
Supported Operations:
a) Insert: add new record
b) Delete: delete existing record
c) Lookup: check for a particular record
All operations run in O(1) time (assuming a properly implemented table and non-pathological data)
2. Application: De-Duplication
Given: a “stream” of objects. (Linear scan of a huge file or objects arriving in real time)
Goal: remove duplicates
Solution: when a new object x arrives, Lookup x in hash table H; if not found, Insert x into H (if found, x is a duplicate and is skipped)
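A minimal sketch of the de-duplication loop, assuming Python's built-in set stands in for the hash table H. The function name dedupe is illustrative and not part of the notes.

```python
def dedupe(stream):
    """Yield each object from `stream` the first time it appears."""
    seen = set()  # plays the role of the hash table H
    for x in stream:
        if x not in seen:  # Lookup x in H
            seen.add(x)    # Insert x into H
            yield x        # first occurrence: keep it


if __name__ == "__main__":
    print(list(dedupe([3, 1, 3, 2, 1])))  # [3, 1, 2]
```

Because the function is a generator, it works on a file scan or on objects arriving one at a time, and it does one Lookup and at most one Insert per object.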
3. Application: The 2-SUM Problem
Input: unsorted array A of n integers. Target sum t.
Goal: determine whether or not there are two numbers x,y in A with x + y = t
Naive Solution: O(n^2) time via exhaustive search
Better Solution: 1) sort A (O(n log n) time)
2) for each x in A, look for t-x in A via binary search (O(n log n) time in total)
Amazing: 1) insert all elements of A into hash table H (O(n) time)
2) for each x in A, Lookup t-x in H (O(n) time in total)
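The hash-table solution can be sketched as follows, again assuming Python's built-in set as H. The name two_sum_exists is illustrative. The sketch follows the two steps literally, so it does not require x and y to sit at different positions in A: for A = [2] and t = 4 it reports True. If distinct positions are required, the case t - x == x needs an extra occurrence count.

```python
def two_sum_exists(A, t):
    """Return True if some x, y in A satisfy x + y == t."""
    H = set(A)  # step 1: insert all elements of A into H
    # step 2: for each x in A, Lookup t - x in H
    return any((t - x) in H for x in A)


if __name__ == "__main__":
    print(two_sum_exists([1, 5, 9, 12], 14))  # True (5 + 9)
    print(two_sum_exists([1, 5, 9, 12], 4))   # False
```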
4. High-Level Idea of Implementation:
Setup: universe U (generally, really big)
Goal: maintain an evolving set S, a subset of U (generally, of reasonable size)
Solution: 1) pick n = # of “buckets” with n ~ |S| (for simplicity assume |S| doesn’t vary much)
2) choose a hash function h: U -> {0, 1, ..., n-1}
3) use array A of length n, store x in A[h(x)]
5. Resolving Collisions
Collision: distinct x, y in U such that h(x) = h(y)
Solution #1 : (separate) chaining
- keep linked list in each bucket
- given a key/object x, perform Insert/Delete/Lookup in the list in A[h(x)]
Solution #2 : open addressing (only one object per bucket)
- Hash function now specifies a probe sequence h1(x), h2(x), ... (keep trying until an open slot is found)
- Examples : linear probing (look consecutively), double hashing
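Both collision strategies can be sketched in a few lines. This is an illustration under stated assumptions, not a production design: the class names are made up, Python's built-in hash() reduced mod n stands in for h, Python lists replace linked lists in the chaining version, and the table size is fixed. The linear-probing version supports only Insert and Lookup, since Delete under open addressing needs extra bookkeeping (for example, marking deleted slots).

```python
class ChainedHashSet:
    """Separate chaining: each bucket holds the keys that hash to it."""

    def __init__(self, n=11):
        self.buckets = [[] for _ in range(n)]

    def _chain(self, x):
        # A[h(x)], with h(x) = hash(x) mod n
        return self.buckets[hash(x) % len(self.buckets)]

    def lookup(self, x):
        return x in self._chain(x)

    def insert(self, x):
        chain = self._chain(x)
        if x not in chain:
            chain.append(x)

    def delete(self, x):
        chain = self._chain(x)
        if x in chain:
            chain.remove(x)


class LinearProbingSet:
    """Open addressing with linear probing (Insert and Lookup only)."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, n=11):
        self.slots = [self._EMPTY] * n

    def _probe(self, x):
        # Probe sequence: h(x), h(x)+1, h(x)+2, ... (wrapping around)
        n = len(self.slots)
        start = hash(x) % n
        for i in range(n):
            yield (start + i) % n

    def insert(self, x):
        for i in self._probe(x):
            if self.slots[i] is self._EMPTY:
                self.slots[i] = x
                return
            if self.slots[i] == x:
                return  # already present
        raise RuntimeError("table is full")

    def lookup(self, x):
        for i in self._probe(x):
            if self.slots[i] is self._EMPTY:
                return False  # an open slot ends the search
            if self.slots[i] == x:
                return True
        return False
```

With n = 11, the keys 0, 11 and 22 all land in bucket 0. Chaining stores them in one list, while linear probing places them in slots 0, 1 and 2.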
6. Properties of a “Good” Hash function
1) Should lead to good performance, i.e., should “spread data out”
(gold standard: completely random hashing)
2) Should be easy to store and very fast to evaluate.
7. Quick-and-Dirty Hash Functions:
Objects in U --> (hash code) --> integers --> (compression function, e.g. mod n) --> buckets {0, 1, ..., n-1}
How to choose n = # of buckets:
a) Choose n to be a prime (within a constant factor of the # of objects in the table)
b) Not too close to a power of 2
c) Not too close to a power of 10
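A rough sketch of the two-stage pipeline for string keys, under several assumptions not in the notes: the helper names are hypothetical, the hash code is a simple base-31 polynomial over character codes (one common quick-and-dirty choice), and choose_num_buckets only enforces rule (a). It does not check closeness to a power of 2 or 10, so the caller should pick a starting size away from those.

```python
def is_prime(m):
    """Trial-division primality test; fine for table-sized numbers."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def choose_num_buckets(expected_size):
    """Smallest prime >= expected_size (rule a only)."""
    n = max(2, expected_size)
    while not is_prime(n):
        n += 1
    return n


def hash_code(s):
    """Map a string to an integer (assumed base-31 polynomial code)."""
    code = 0
    for ch in s:
        code = code * 31 + ord(ch)
    return code


def bucket(s, n):
    """Compression step: reduce the hash code to {0, 1, ..., n-1}."""
    return hash_code(s) % n


if __name__ == "__main__":
    n = choose_num_buckets(1500)
    print(n)                   # 1511
    print(bucket("hello", n))  # some index in 0..n-1
```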