Introduction to Algorithms (Table Doubling, Karp-Rabin)

How large should the table be?

  • want m = Θ(n) at all times

Idea

Start small (constant size) and grow (or shrink) as necessary

Rehashing

To grow or shrink the table, the hash function must change

  • must rebuild hash table from scratch
  • Θ(n + m) time = Θ(n), if m = Θ(n)

How fast to grow

When n reaches m, say

  • m += 1, rebuild every step, n inserts cost Θ(n^2)
  • m *= 2, rebuild at insertion 2^i for each i, n inserts cost Θ(1 + 2 + 4 + ... + n) = Θ(n)
  • a few inserts cost linear time, but Θ(1) “on average”
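The difference between the two growth rules can be checked by counting how many items the rebuilds move. The following is a minimal sketch, not part of the original notes: the function name total_rebuild_cost, the starting size m = 1, and the choice to count one unit of work per copied item are all assumptions made for illustration.

```python
def total_rebuild_cost(n, grow):
    """Count items copied by rebuilds during n inserts (hypothetical cost model).

    The table starts with m = 1 slots. Whenever the item count reaches m,
    the table is rebuilt at size grow(m) and every stored item is copied once.
    """
    m = 1        # current table size
    copied = 0   # total items moved by all rebuilds so far
    for count in range(1, n + 1):   # count = number of items after this insert
        if count == m:              # n has reached m: rebuild
            copied += count
            m = grow(m)
    return copied


additive = total_rebuild_cost(1024, lambda m: m + 1)   # m += 1
doubling = total_rebuild_cost(1024, lambda m: 2 * m)   # m *= 2
print(additive, doubling)
```

With m += 1 the count grows like n^2 / 2, while with m *= 2 it stays below 2n, which is what "Θ(1) on average" refers to.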

Amortized Analysis

This is a common technique in data structures 

  • an operation has amortized cost T(n) if k operations cost ≤ k · T(n)
  • “T(n) amortized” roughly means T(n) “on average”, but averaged over all ops.
  • e.g. inserting into a hash table takes O(1) amortized time.

Back to hashing

Maintain m = Θ(n) ⇒ α = Θ(1) ⇒ support search in O(1) expected time (assuming simple uniform or universal hashing)

Deletion

Also, O(1) expected as is.

  • space can get big with respect to n, e.g. n× insert followed by n× delete leaves a table of size Θ(n) holding no items
  • solution: when n decreases to m/4, shrink to half the size ⇒ O(1) amortized cost for both insert and delete
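The grow and shrink rules above can be sketched as a small chained hash table. This is an illustrative sketch only: the class name ResizingHashTable, the floor MIN_SIZE = 8, and the use of Python's built-in hash in place of a universal hash family are assumptions, not something the notes specify.

```python
class ResizingHashTable:
    """Chained hash table that keeps m = Θ(n) by doubling and halving."""

    MIN_SIZE = 8  # assumed floor so the table never shrinks below a constant

    def __init__(self):
        self.m = self.MIN_SIZE                      # number of slots
        self.n = 0                                  # number of stored keys
        self.buckets = [[] for _ in range(self.m)]

    def _chain(self, key):
        # Built-in hash stands in for a properly chosen hash function.
        return self.buckets[hash(key) % self.m]

    def _rebuild(self, new_m):
        # Rehash everything from scratch: Θ(n + m) time.
        items = [pair for chain in self.buckets for pair in chain]
        self.m = new_m
        self.buckets = [[] for _ in range(new_m)]
        for key, value in items:
            self._chain(key).append((key, value))

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1
        if self.n >= self.m:             # n reached m: double
            self._rebuild(2 * self.m)

    def search(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                # n dropped to m/4: halve (but never below MIN_SIZE)
                if self.m > self.MIN_SIZE and self.n <= self.m // 4:
                    self._rebuild(self.m // 2)
                return True
        return False
```

Shrinking at m/4 rather than m/2 leaves the table half full after a resize, so a run of alternating inserts and deletes near the threshold cannot trigger a rebuild on every operation.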

Resizable Arrays

list.append and list.pop in O(1) amortized

String Matching

Given two strings s & t: does s occur as a substring of t?

Simple Algorithm:

any(s == t[i : i + len(s)] for i in range(len(t) - len(s) + 1))

O(|s|) time for each substring comparison

O(|s| · (|t| − |s|)) time = O(|s| · |t|) potentially quadratic
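For reference, the brute-force check can be written as a function that reports every match position rather than a single yes/no answer. This is a sketch under assumptions: the name naive_find and the choice to return a list of indices are illustrative. The range runs up to len(t) - len(s) inclusive so that the final alignment is also tested.

```python
def naive_find(s, t):
    """Return every index i where s occurs in t, by direct comparison.

    Each slice comparison costs O(|s|), and there are |t| - |s| + 1
    alignments, so the total is O(|s| * |t|) in the worst case.
    """
    k = len(s)
    # If s is longer than t the range is empty and no index is returned.
    return [i for i in range(len(t) - k + 1) if t[i:i + k] == s]


print(naive_find("ab", "abcab"))
```

The worst case is reached by inputs such as s = "aaab" and t = "aaaa...a", where almost every comparison scans nearly all of s before failing.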

Karp-Rabin Algorithm

Rolling Hash ADT:

Maintain string x subject to

  • r(): reasonable hash function h(x) on string x
  • r.append(c): add letter c to end of string x
  • r.skip(c): remove the front letter from string x, assuming it is c

Karp-Rabin Application:

for c in s:
    rs.append(c)
for c in t[:len(s)]:
    rt.append(c)
if rs() == rt(): ...
# setup above: O(|s|)
for i in range(len(s), len(t)):
    rt.skip(t[i-len(s)])
    rt.append(t[i])
    if rs() == rt(): ...
# loop above: O(|t|) + O(#matches*|s|)

Data Structure:

Treat string x as a multi-digit number u in base a where a denotes the alphabet size, e.g., 256

  • r() = u mod p for (ideally random) prime p ≈ |s| or |t| (division method)
  • r stores u mod p and |x| (really a^{|x|} ), not u ⇒ smaller and faster to work with (u mod p fits in one machine word)
  • r.append(c): (u·a + ord(c)) mod p = [(u mod p) · a + ord(c)] mod p
  • r.skip(c): [u − ord(c) · a^{|x|-1}] mod p = [(u mod p) − ord(c) · (a^{|x|-1} mod p)] mod p
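The rolling hash and its use in Karp-Rabin can be sketched as follows. Several choices here are assumptions rather than part of the notes: the fixed prime 1,000,000,007 (the notes call for an ideally random prime), the use of a modular inverse to step a^{|x|} down to a^{|x|-1} (needs Python 3.8+ and a prime p that does not divide a), the explicit string comparison on each hash match, and the decision to report no matches for an empty pattern.

```python
class RollingHash:
    """Maintains u mod p for a string x read as a number u in base a."""

    def __init__(self, base=256, prime=1_000_000_007):
        self.a = base
        self.p = prime
        self.u = 0                           # u mod p
        self.power = 1                       # a^{|x|} mod p
        self.a_inv = pow(base, -1, prime)    # inverse of a mod p (Python 3.8+)

    def __call__(self):
        return self.u

    def append(self, c):
        # Shift existing digits left by one place and add the new digit.
        self.u = (self.u * self.a + ord(c)) % self.p
        self.power = (self.power * self.a) % self.p

    def skip(self, c):
        # Turn a^{|x|} into a^{|x|-1}, the place value of the front letter,
        # then subtract that letter's contribution.
        self.power = (self.power * self.a_inv) % self.p
        self.u = (self.u - ord(c) * self.power) % self.p


def karp_rabin(s, t):
    """Return all indices where s occurs in t (empty s yields no matches)."""
    if not s or len(s) > len(t):
        return []
    rs, rt = RollingHash(), RollingHash()
    for c in s:
        rs.append(c)
    for c in t[:len(s)]:
        rt.append(c)
    matches = []
    if rs() == rt() and t[:len(s)] == s:     # verify to rule out collisions
        matches.append(0)
    for i in range(len(s), len(t)):
        rt.skip(t[i - len(s)])
        rt.append(t[i])
        start = i - len(s) + 1
        if rs() == rt() and t[start:i + 1] == s:
            matches.append(start)
    return matches


print(karp_rabin("ab", "abcab"))
```

Each append and skip is O(1) arithmetic on word-sized values, so the scan costs O(|s| + |t|) plus O(|s|) for every position whose hash matches and is then verified.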
