Bloom Filters

1.  Bloom Filters: Supported Operations

     Fast Inserts and Lookups.

 

2.  Comparison to Hash Tables:

     Pros: more space efficient.

     Cons:

    1) Can’t store an associated object

    2) No deletions

    3) Small false positive probability (i.e., a lookup might say x has been inserted even though it hasn’t been)

 

 

3.  Applications:

    Original: early spellcheckers (insert the valid words into the Bloom filter)

    Canonical: list of forbidden passwords

    Modern: network routers (limited memory, lookups need to be super-fast)

 

4.  Bloom Filter Implementation

    Ingredients: 1) array A of n bits (so n/|S| = # of bits per object in data set S)

                 2) k hash functions h1,…,hk (k = small constant)

     Insert(x): for i = 1,2,…,k

                       set A[hi(x)] = 1 (whether or not the bit is already set to 1)

     Lookup(x): return true if and only if A[hi(x)] = 1 for every i = 1,2,…,k.

     Note: no false negatives (if x was inserted, Lookup(x) is guaranteed to succeed),

               but a false positive occurs if all k bits A[hi(x)] were already set to 1 by other insertions.
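
     The Insert and Lookup operations above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class name BloomFilter, the method names, and the choice of salted BLAKE2b digests to stand in for the k hash functions h1,…,hk are all assumptions made for this example, and it only accepts strings.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an array of n bits plus k hash functions."""

    def __init__(self, n_bits: int, k: int) -> None:
        if n_bits <= 0 or k <= 0:
            raise ValueError("n_bits and k must be positive")
        self.n = n_bits
        self.k = k
        # Pack the n bits into bytes; every bit starts at 0.
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, x: str):
        """Yield h_1(x), ..., h_k(x) as indices into the bit array.

        Assumption for this sketch: the i-th hash function is BLAKE2b
        salted with i, reduced modulo n.
        """
        data = x.encode("utf-8")
        for i in range(self.k):
            salt = i.to_bytes(8, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield int.from_bytes(digest, "little") % self.n

    def insert(self, x: str) -> None:
        # Set A[h_i(x)] = 1 for every i, whether or not it is already 1.
        for pos in self._positions(x):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def lookup(self, x: str) -> bool:
        # True only if all k bits are set; may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(x)
        )


# Usage: a small list of forbidden passwords, 8 bits per object, k = 6.
forbidden = ["123456", "password", "qwerty", "letmein"]
bf = BloomFilter(n_bits=8 * len(forbidden), k=6)
for word in forbidden:
    bf.insert(word)
print(all(bf.lookup(word) for word in forbidden))  # inserted items are always found
```

     Note that the filter stores only bits, never the inserted strings themselves, which is why it cannot return an associated object or support deletions.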

 

5.  Heuristic Analysis

     Intuition: should be a trade-off between space and error (false positive) probability.

     Assume: all hi(x)’s uniformly random and independent (across different i’s and x’s).

     Setup: n bits, insert data set S into the Bloom filter.

     Note: for each bit of A, the probability that it has been set to 1 is (under the above assumption):

       1 - (1 - 1/n)^(k|S|) ≈ 1 - e^(-k|S|/n) = 1 - e^(-k/b),  where b = n/|S| is the number of bits used per object

       (the approximation uses 1 - 1/n ≈ e^(-1/n), which is accurate for large n)

       So under the assumption, for x not in S, the false positive probability is about (1 - e^(-k/b))^k.

        For fixed b, the error rate is minimized by setting k to about (ln 2)b.

        So the min error rate is about (1/2)^((ln 2)b), or equivalently b is about 1.44 log2(1/min error rate).

     Example: with b = 8, choose k = 5 or 6; the error probability is then only approximately 2%.
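
     The formulas in this section are easy to check numerically. The sketch below evaluates the heuristic estimate (1 - e^(-k/b))^k directly; the function names false_positive_rate, best_k, and bits_per_object are made up for this illustration, and best_k assumes the best integer k is one of the two integers next to (ln 2)b.

```python
import math


def false_positive_rate(b: float, k: int) -> float:
    """Heuristic false positive estimate (1 - e^(-k/b))^k for b bits per object."""
    return (1.0 - math.exp(-k / b)) ** k


def best_k(b: float) -> int:
    """Integer k near (ln 2) * b with the smaller estimated error."""
    target = b * math.log(2)
    candidates = {max(1, math.floor(target)), max(1, math.ceil(target))}
    return min(candidates, key=lambda k: false_positive_rate(b, k))


def bits_per_object(error_rate: float) -> float:
    """Bits per object needed for a target error: about 1.44 * log2(1/error)."""
    return math.log2(1.0 / error_rate) / math.log(2)


# The example from the notes: b = 8 bits per object, k = 5 or 6.
for k in (5, 6):
    print(k, false_positive_rate(8, k))
print(best_k(8), bits_per_object(0.02))
```

     Both k = 5 and k = 6 give an estimate close to 2% for b = 8, and a 2% target works out to a little over 8 bits per object, which agrees with the example above.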

        

     

 

 
