Solution:
DIRECT-ADDRESS-FINDMAX(T)
    for i = T.length - 1 downto 0
        if T[i] != NIL
            return T[i]
    return NIL
The worst-case running time is $O(m)$.
Idea: a bit value of 1 means the element is present and 0 means it is absent; insertion sets the bit and deletion clears it.
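The bit-vector idea can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the class name `BitVectorSet` and the byte-packed layout are assumptions made for the example, and keys are assumed to be distinct integers in `range(m)`.

```python
class BitVectorSet:
    """Dynamic set of distinct integer keys in range(m), one bit per key."""

    def __init__(self, m):
        self.m = m
        # One bit per possible key, packed 8 per byte; all bits start at 0 (absent).
        self.bits = bytearray((m + 7) // 8)

    def insert(self, k):
        # Set the bit for key k.
        self.bits[k >> 3] |= 1 << (k & 7)

    def delete(self, k):
        # Clear the bit for key k (mask keeps the value inside one byte).
        self.bits[k >> 3] &= ~(1 << (k & 7)) & 0xFF

    def search(self, k):
        # True if the bit for key k is set.
        return bool(self.bits[k >> 3] & (1 << (k & 7)))
```

Each operation touches a single byte, so all three run in $O(1)$ time.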
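Idea: let each entry of the direct-address table point to a doubly linked circular list holding all the elements that share that key, then use the linked-list techniques from Chapter 10.
Solution (from the reference answer):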
We denote the huge array by $T$ and, taking the hint from the book, we also have a stack implemented by an array $S$. The size of $S$ equals the number of keys actually stored, so that $S$ should be allocated at the dictionary’s maximum size. The stack has an attribute $S.top$, so that only entries $S[1..S.top]$ are valid.
The idea of this scheme is that entries of $T$ and $S$ validate each other. If key $k$ is actually stored in $T$, then $T[k]$ contains the index, say $j$, of a valid entry in $S$, and $S[j]$ contains the value $k$. Let us call this situation, in which $1 \le T[k] \le S.top$, $S[T[k]] = k$, and $T[S[j]] = j$, a validating cycle.
Assuming that we also need to store pointers to objects in our direct-address table, we can store them in an array that is parallel to either $T$ or $S$. Since $S$ is smaller than $T$, we’ll use an array $S'$, allocated to be the same size as $S$, for these pointers. Thus, if the dictionary contains an object $x$ with key $k$, then there is a validating cycle and $S'[T[k]]$ points to $x$.
The operations on the dictionary work as follows:
Initialization: Simply set $S.top = 0$, so that there are no valid entries in the stack.
SEARCH: Given key $k$, we check whether we have a validating cycle, i.e., whether $1 \le T[k] \le S.top$ and $S[T[k]] = k$. If so, we return $S'[T[k]]$, and otherwise we return $\text{NIL}$.
INSERT: To insert object $x$ with key $k$, assuming that this object is not already in the dictionary, we increment $S.top$, set $S[S.top] = k$, set $S'[S.top] = x$, and set $T[k] = S.top$.
DELETE: To delete object $x$ with key $k$, assuming that this object is in the dictionary, we need to break the validating cycle. The trick is to also ensure that we don’t leave a “hole” in the stack, and we solve this problem by moving the top entry of the stack into the position that we are vacating, and then fixing up that entry’s validating cycle. That is, we execute the following sequence of assignments:
$$\begin{aligned} & S[T[k]] = S[S.top] \\ & S'[T[k]] = S'[S.top] \\ & T[S[T[k]]] = T[k] \\ & T[k] = 0 \\ & S.top = S.top - 1 \end{aligned}$$
Each of these operations (initialization, $\text{SEARCH}$, $\text{INSERT}$, and $\text{DELETE}$) takes $O(1)$ time.
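The scheme above can be sketched in Python. Python has no truly uninitialized arrays, so this sketch simulates one by filling `T` with arbitrary garbage; the class name `HugeArrayDict`, the `capacity` parameter, and the attribute name `Sp` (standing in for $S'$) are assumptions made for the illustration.

```python
import random

class HugeArrayDict:
    def __init__(self, universe_size, capacity):
        # Simulated "uninitialized" huge array: arbitrary garbage values.
        self.T = [random.randrange(capacity + 2) for _ in range(universe_size)]
        self.S = [None] * (capacity + 1)    # stack of keys, 1-indexed
        self.Sp = [None] * (capacity + 1)   # parallel array of objects (S')
        self.top = 0                        # initialization: no valid entries

    def _valid(self, k):
        # Validating cycle: 1 <= T[k] <= S.top and S[T[k]] == k.
        j = self.T[k]
        return 1 <= j <= self.top and self.S[j] == k

    def search(self, k):
        return self.Sp[self.T[k]] if self._valid(k) else None

    def insert(self, k, x):
        # Assumes k is not already in the dictionary.
        self.top += 1
        self.S[self.top] = k
        self.Sp[self.top] = x
        self.T[k] = self.top

    def delete(self, k):
        # Assumes k is in the dictionary. Move the top stack entry into the
        # vacated position, repair its validating cycle, then pop.
        j = self.T[k]
        self.S[j] = self.S[self.top]
        self.Sp[j] = self.Sp[self.top]
        self.T[self.S[j]] = j
        self.T[k] = 0
        self.top -= 1
```

No loop appears in any method, which is what makes each operation $O(1)$ regardless of the garbage initially stored in `T`.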
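Solution: each pair of distinct keys $k \ne l$ collides with probability $1/m$; summing over all $\binom{n}{2}$ such pairs gives an expected $\frac{n(n-1)}{2m}$ collisions.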
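Solution: the intermediate steps are omitted; the final table, written as slot: chain, is
0: (empty)
1: 10 → 19 → 28
2: 20
3: 12
4: (empty)
5: 5
6: 33 → 15
7: (empty)
8: 17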
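The table above can be reproduced with a short Python sketch. The key sequence and the hash function $h(k) = k \bmod 9$ are taken from the exercise statement, which is not quoted in this text, so treat them as assumptions; `build_chained_table` is a hypothetical helper name. New keys are inserted at the head of each chain.

```python
def build_chained_table(keys, m):
    """Insert keys into m chains using h(k) = k mod m, newest key first."""
    table = [[] for _ in range(m)]
    for k in keys:
        table[k % m].insert(0, k)  # insert at the head of the chain
    return table

# Key order assumed from the exercise statement.
keys = [5, 28, 19, 15, 20, 33, 12, 17, 10]
table = build_chained_table(keys, 9)
for slot, chain in enumerate(table):
    print(slot, chain)
```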
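Solution:
Search: the expected time is unchanged, but larger values take longer to find (assuming a singly linked list kept in ascending order).
Insertion: the expected time is unchanged, though slightly more work is needed (one $O(1)$ splice into the list once the position has been found).
Deletion: the expected time is unchanged, but larger values take longer to delete (assuming a singly linked list kept in ascending order).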
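Idea: the flag marks whether the slot is occupied. If it is free, the two pointers point to the previous and next free slots (as in a doubly linked list); if it is occupied, one pointer points to the stored element.
Solution (from the reference answer):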
The flag in each slot will indicate whether the slot is free.
A free slot is in the free list, a doubly linked list of all free slots in the table. The slot thus contains two pointers.
A used slot contains an element and a pointer (possibly $\text{NIL}$) to the next element that hashes to this slot. (Of course, that pointer points to another slot in the table.)
Operations
Insertion:
If the element hashes to a free slot, just remove the slot from the free list and store the element there (with a $\text{NIL}$ pointer). The free list must be doubly linked in order for this deletion to run in $O(1)$ time.
If the element hashes to a used slot $j$, check whether the element $x$ already there “belongs” there (its key also hashes to slot $j$).
If so, add the new element to the chain of elements in this slot. To do so, allocate a free slot (e.g., take the head of the free list) for the new element and put this new slot at the head of the list pointed to by the hashed-to slot ($j$).
If not, $x$ is part of another slot’s chain. Move it to a new slot by allocating one from the free list, copying the old slot’s ($j$'s) contents (element $x$ and pointer) to the new slot, and updating the pointer in the slot that pointed to $j$ to point to the new slot. Then insert the new element in the now-empty slot as usual.
To update the pointer to $j$, it is necessary to find it by searching the chain of elements starting in the slot $x$ hashes to.
Deletion:
Let $j$ be the slot the element $x$ to be deleted hashes to.
If $x$ is the only element in $j$ ($j$ doesn’t point to any other entries), just free the slot, returning it to the head of the free list.
If $x$ is in $j$ but there’s a pointer to a chain of other elements, move the first pointed-to entry to slot $j$ and free the slot it was in.
If $x$ is found by following a pointer from $j$, just free $x$'s slot and splice it out of the chain (i.e., update the slot that pointed to $x$ to point to $x$'s successor).
Searching:
Check the slot the key hashes to, and if that is not the desired element, follow the chain of pointers from the slot.
All the operations take expected $O(1)$ time for the same reason they do with the version in the book: The expected time to search the chains is $O(1 + \alpha)$ regardless of where the chains are stored, and the fact that all the elements are stored in the table means that $\alpha \le 1$. If the free list were singly linked, then operations that involved removing an arbitrary slot from the free list would not run in $O(1)$ time.
This looks immediate from the pigeonhole principle: since $\frac{|U|}{m} > n$, at least one slot must receive more than $n$ of the keys in $U$.
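Idea: let $L$ be the length of the longest chain. With $m$ chains, the table can be viewed as a matrix with $m$ rows and $L$ columns. Call RANDOM(1, m) and RANDOM(1, L) repeatedly until the chosen position contains an element; this takes an expected $mL/n$ (that is, $L/\alpha$) attempts. Then walk the chain to that element.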
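The rejection-sampling idea can be sketched in Python as below. This is only an illustration under assumptions: chains are modeled as Python lists, `random_element` is a hypothetical name, and `random.randrange` (0-indexed) stands in for RANDOM(1, m) and RANDOM(1, L).

```python
import random

def random_element(table, longest):
    """Return a uniformly random stored element.

    table   -- list of m chains (each a list of elements)
    longest -- L, the length of the longest chain
    """
    m = len(table)
    while True:
        row = random.randrange(m)        # pick a chain
        col = random.randrange(longest)  # pick a position within the m x L grid
        if col < len(table[row]):        # occupied cell: accept
            return table[row][col]
        # empty cell: reject and retry
```

Every cell of the $m \times L$ grid is equally likely, so every stored element is returned with the same probability.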
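Idea: compare the hash value of each element in the list with the hash value of the given key, and compare the keys themselves only when the hash values match.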
Solution:
sum = 0
for i = 1 to r
    sum = (sum * 128 + s[i]) mod m    // use sum as the hash value
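The same loop in Python, as a minimal sketch. It assumes the string is given as ASCII `bytes` so each character is already an integer below 128; `radix128_hash` is a hypothetical name. Reducing modulo `m` at every step keeps the running value small, so only a constant number of machine words is needed beyond the string itself.

```python
def radix128_hash(s, m):
    """Hash an ASCII byte string, read as a radix-128 number, by division."""
    total = 0
    for ch in s:
        # Horner's rule with a reduction at each step.
        total = (total * 128 + ch) % m
    return total
```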
Solution (from the reference answer):
First, we observe that we can generate any permutation by a sequence of interchanges of pairs of characters. One can prove this property formally, but informally, consider that both heapsort and quicksort work by interchanging pairs of elements and that they have to be able to produce any permutation of their input array. Thus, it suffices to show that if string $x$ can be derived from string $y$ by interchanging a single pair of characters, then $x$ and $y$ hash to the same value.
Let us denote the $i$th character in $x$ by $x_i$, and similarly for $y$. The interpretation of $x$ in radix $2^p$ is $\sum_{i = 0}^{n - 1} x_i 2^{ip}$, and so $h(x) = \big(\sum_{i = 0}^{n - 1} x_i 2^{ip}\big) \mod (2^p - 1)$. Similarly, $h(y) = \big(\sum_{i = 0}^{n - 1} y_i 2^{ip}\big) \mod (2^p - 1)$.
Suppose that $x$ and $y$ are identical strings of $n$ characters except that the characters in positions $a$ and $b$ are interchanged: $x_a = y_b$ and $y_a = x_b$. Without loss of generality, let $a > b$. We have
$$h(x) - h(y) = \Big(\sum_{i = 0}^{n - 1} x_i 2^{ip}\Big) \mod (2^p - 1) - \Big(\sum_{i = 0}^{n - 1} y_i 2^{ip}\Big) \mod (2^p - 1).$$
Since $0 \le h(x), h(y) < 2^p - 1$, we have that $-(2^p - 1) < h(x) - h(y) < 2^p - 1$. If we show that $(h(x) - h(y)) \mod (2^p - 1) = 0$, then $h(x) = h(y)$.
Since the sums in the hash functions are the same except for indices $a$ and $b$, we have
$$\begin{aligned} (h(x) - h(y)) \mod (2^p - 1) & = ((x_a 2^{ap} + x_b 2^{bp}) - (y_a 2^{ap} + y_b 2^{bp})) \mod (2^p - 1) \\ & = ((x_a 2^{ap} + x_b 2^{bp}) - (x_b 2^{ap} + x_a 2^{bp})) \mod (2^p - 1) \\ & = ((x_a - x_b)2^{ap} - (x_a - x_b) 2^{bp}) \mod (2^p - 1) \\ & = ((x_a - x_b)(2^{ap} - 2^{bp})) \mod (2^p - 1) \\ & = ((x_a - x_b)2^{bp}(2^{(a - b)p} - 1)) \mod (2^p - 1). \end{aligned}$$
By equation $\text{(A.5)}$,
$$\sum_{i = 0}^{a - b - 1} 2^{pi} = \frac{2^{(a - b)p} - 1}{2^p - 1},$$
and multiplying both sides by $2^p - 1$, we get $2^{(a - b)p} - 1 = \big(\sum_{i = 0}^{a - b - 1} 2^{pi}\big)(2^p - 1)$. Thus,
$$\begin{aligned} (h(x) - h(y))\mod(2^p - 1) & = \Bigg((x_a - x_b)2^{bp}\Bigg(\sum_{i = 0}^{a - b - 1} 2^{pi}\Bigg)(2^p - 1)\Bigg) \mod (2^p - 1) \\ & = 0, \end{aligned}$$
since one of the factors is $2^p - 1$.
We have shown that $(h(x) - h(y)) \mod (2^p - 1) = 0$, and so $h(x) = h(y)$.
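The result is easy to check numerically. The sketch below is an illustration only: it assumes $p = 8$ with characters given as byte values, and `radix_hash` is a hypothetical name. It hashes every permutation of a short string and confirms that they all collide.

```python
from itertools import permutations

def radix_hash(chars, p):
    """Interpret chars as a radix-2^p number and reduce it mod 2^p - 1."""
    value = 0
    for i, c in enumerate(chars):
        value += c << (i * p)          # c * 2^(i*p)
    return value % ((1 << p) - 1)

# Every permutation of the same characters produces the same hash value.
word = b"hash"
hashes = {radix_hash(perm, 8) for perm in permutations(word)}
print(hashes)
```

Because $2^p \equiv 1 \pmod{2^p - 1}$, the hash reduces to the sum of the characters modulo $2^p - 1$, which is why order does not matter.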
Solution:
$h(61) = 700$
$h(62) = 318$
$h(63) = 936$
$h(64) = 554$
$h(65) = 172$
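These values can be recomputed with a few lines of Python. The parameters are assumptions taken from the exercise statement, which is not quoted in this text: the multiplication method $h(k) = \lfloor m (kA \bmod 1) \rfloor$ with $m = 1000$ and $A = (\sqrt{5} - 1)/2$. The function name is hypothetical, and floating-point arithmetic is used, which is adequate for keys this small.

```python
import math

def multiplication_hash(k, m=1000, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: floor(m * fractional_part(k * A))."""
    return math.floor(m * ((k * A) % 1))

for k in range(61, 66):
    print(k, multiplication_hash(k))
```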
Solution (from the reference answer):
Let $b = |B|$ and $u = |U|$. We start by showing that the total number of collisions is minimized by a hash function that maps $u / b$ elements of $U$ to each of the $b$ values in $B$. For a given hash function, let $u_j$ be the number of elements that map to $j \in B$. We have $u = \sum_{j \in B} u_j$. We also have that the number of collisions for a given value of $j \in B$ is $\binom{u_j}{2} = u_j(u_j - 1) / 2$.
Lemma
The total number of collisions is minimized when $u_j = u / b$ for each $j \in B$.
Proof
If $u_j \le u / b$, let us call $j$ underloaded, and if $u_j \ge u / b$, let us call $j$ overloaded. Consider an unbalanced situation in which $u_j \ne u / b$ for at least one value $j \in B$. We can think of converting a balanced situation in which all $u_j$ equal $u / b$ into the unbalanced situation by repeatedly moving an element that maps to an underloaded value to map instead to an overloaded value. (If you think of the values of $B$ as representing buckets, we are repeatedly moving elements from buckets containing at most $u / b$ elements to buckets containing at least $u / b$ elements.)
We now show that each such move increases the number of collisions, so that all the moves together must increase the number of collisions. Suppose that we move an element from an underloaded value $j$ to an overloaded value $k$, and we leave all other elements alone. Because $j$ is underloaded and $k$ is overloaded, $u_j \le u / b \le u_k$. Considering just the collisions for values $j$ and $k$, we have $u_j(u_j - 1) / 2 + u_k(u_k - 1) / 2$ collisions before the move and $(u_j - 1)(u_j - 2) / 2 + (u_k + 1)u_k / 2$ collisions afterward. We wish to show that
$$u_j(u_j - 1) / 2 + u_k(u_k - 1) / 2 < (u_j - 1)(u_j - 2) / 2 + (u_k + 1)u_k / 2.$$
We have the following sequence of equivalent inequalities:
$$\begin{aligned} u_j & < u_k + 1 \\ 2u_j & < 2u_k + 2 \\ -u_k & < u_k - 2u_j + 2 \\ u_j^2 - u_j + u_k^2 - u_k & < u_j^2 - 3u_j + 2 + u_k^2 + u_k \\ u_j(u_j - 1) + u_k(u_k - 1) & < (u_j - 1)(u_j - 2) + (u_k + 1)u_k \\ u_j(u_j - 1) / 2 + u_k(u_k - 1) / 2 & < (u_j - 1)(u_j - 2) / 2 + (u_k + 1)u_k / 2. \end{aligned}$$
Thus, each move increases the number of collisions. We conclude that the number of collisions is minimized when $u_j = u / b$ for each $j \in B$.
By the above lemma, for any hash function, the total number of collisions must be at least $b(u / b)(u / b - 1) / 2$. The number of pairs of distinct elements is $\binom{u}{2} = u(u - 1) / 2$. Thus, the number of collisions per pair of distinct elements must be at least
$$\begin{aligned} \frac{b(u / b)(u / b - 1) / 2}{u(u - 1) / 2} & = \frac{u / b - 1}{u - 1} \\ & > \frac{u / b - 1}{u} \\ & = \frac{1}{b} - \frac{1}{u}. \end{aligned}$$
Thus, the bound on the probability of a collision for any pair of distinct elements can be no less than $1 / b - 1 / u = 1 / |B| - 1 / |U|$.
Proof (from the reference answer):
Fix $b \in \mathbb Z_p$. By exercise 31.4-4, $h_b(x)$ collides with $h_b(y)$ for at most $n - 1$ other $y \in U$. Since there are a total of $p$ possible values that $h_b$ takes on, the probability that $h_b(x) = h_b(y)$ is bounded from above by $\frac{n - 1}{p}$. Since this holds for any value of $b$, $\mathcal H$ is $((n - 1) / p)$-universal.
Solution (column $T_j$ shows the table after the first $j + 1$ keys have been inserted):
Linear probing:
$$\begin{array}{r|ccccccccc}
h(k, i) = (k + i) \mod 11 & T_0 & T_1 & T_2 & T_3 & T_4 & T_5 & T_6 & T_7 & T_8 \\ \hline
0 \mod 11 & & 22 & 22 & 22 & 22 & 22 & 22 & 22 & 22 \\
1 \mod 11 & & & & & & & & 88 & 88 \\
2 \mod 11 & & & & & & & & & \\
3 \mod 11 & & & & & & & & & \\
4 \mod 11 & & & & 4 & 4 & 4 & 4 & 4 & 4 \\
5 \mod 11 & & & & & 15 & 15 & 15 & 15 & 15 \\
6 \mod 11 & & & & & & 28 & 28 & 28 & 28 \\
7 \mod 11 & & & & & & & 17 & 17 & 17 \\
8 \mod 11 & & & & & & & & & 59 \\
9 \mod 11 & & & 31 & 31 & 31 & 31 & 31 & 31 & 31 \\
10 \mod 11 & 10 & 10 & 10 & 10 & 10 & 10 & 10 & 10 & 10
\end{array}$$
Quadratic probing:
$$\begin{array}{r|ccccccccc}
h(k, i) = (k + i + 3i^2) \mod 11 & T_0 & T_1 & T_2 & T_3 & T_4 & T_5 & T_6 & T_7 & T_8 \\ \hline
0 \mod 11 & & 22 & 22 & 22 & 22 & 22 & 22 & 22 & 22 \\
1 \mod 11 & & & & & & & & & \\
2 \mod 11 & & & & & & & & 88 & 88 \\
3 \mod 11 & & & & & & & 17 & 17 & 17 \\
4 \mod 11 & & & & 4 & 4 & 4 & 4 & 4 & 4 \\
5 \mod 11 & & & & & & & & & \\
6 \mod 11 & & & & & & 28 & 28 & 28 & 28 \\
7 \mod 11 & & & & & & & & & 59 \\
8 \mod 11 & & & & & 15 & 15 & 15 & 15 & 15 \\
9 \mod 11 & & & 31 & 31 & 31 & 31 & 31 & 31 & 31 \\
10 \mod 11 & 10 & 10 & 10 & 10 & 10 & 10 & 10 & 10 & 10
\end{array}$$
Double hashing:
$$\begin{array}{r|ccccccccc}
h(k, i) = (k + i(1 + k \mod 10)) \mod 11 & T_0 & T_1 & T_2 & T_3 & T_4 & T_5 & T_6 & T_7 & T_8 \\ \hline
0 \mod 11 & & 22 & 22 & 22 & 22 & 22 & 22 & 22 & 22 \\
1 \mod 11 & & & & & & & & & \\
2 \mod 11 & & & & & & & & & 59 \\
3 \mod 11 & & & & & & & 17 & 17 & 17 \\
4 \mod 11 & & & & 4 & 4 & 4 & 4 & 4 & 4 \\
5 \mod 11 & & & & & 15 & 15 & 15 & 15 & 15 \\
6 \mod 11 & & & & & & 28 & 28 & 28 & 28 \\
7 \mod 11 & & & & & & & & 88 & 88 \\
8 \mod 11 & & & & & & & & & \\
9 \mod 11 & & & 31 & 31 & 31 & 31 & 31 & 31 & 31 \\
10 \mod 11 & 10 & 10 & 10 & 10 & 10 & 10 & 10 & 10 & 10
\end{array}$$
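The final columns of the three tables can be reproduced with a small Python sketch. The insertion order is read off the table columns and the probe functions off the table headers; `insert_all` and the variable names are hypothetical.

```python
def insert_all(keys, m, probe):
    """Insert keys into an open-address table of size m using probe(k, i)."""
    table = [None] * m
    for k in keys:
        for i in range(m):
            j = probe(k, i)
            if table[j] is None:   # first empty slot in the probe sequence
                table[j] = k
                break
        else:
            raise RuntimeError("hash table overflow")
    return table

M = 11
KEYS = [10, 22, 31, 4, 15, 28, 17, 88, 59]

linear = insert_all(KEYS, M, lambda k, i: (k + i) % M)
quadratic = insert_all(KEYS, M, lambda k, i: (k + i + 3 * i * i) % M)
double = insert_all(KEYS, M, lambda k, i: (k + i * (1 + k % 10)) % M)

for name, t in (("linear", linear), ("quadratic", quadratic), ("double", double)):
    print(name, t)
```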
Solution:
HASH-DELETE(T, k)
    i = 0
    repeat
        j = h(k, i)
        if T[j] == k
            T[j] = DELETED
            return j
        i = i + 1
    until T[j] == NIL or i == m
    error "element does not exist"
HASH-INSERT(T, k)
    i = 0
    repeat
        j = h(k, i)
        if T[j] == NIL or T[j] == DELETED
            T[j] = k
            return j
        else i = i + 1
    until i == m
    error "hash table overflow"
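A Python rendering of the two procedures, as a hedged sketch: the `DELETED` sentinel object, the `hash_search` helper, and the probe function passed as `h` are assumptions added for the illustration. The point it demonstrates is that a search must continue past `DELETED` slots, while an insertion may reuse them.

```python
NIL = None
DELETED = object()   # sentinel distinct from every key

def hash_insert(T, k, h):
    for i in range(len(T)):
        j = h(k, i)
        if T[j] is NIL or T[j] is DELETED:   # DELETED slots are reusable
            T[j] = k
            return j
    raise OverflowError("hash table overflow")

def hash_search(T, k, h):
    for i in range(len(T)):
        j = h(k, i)
        if T[j] is NIL:          # a never-used slot ends the search
            return None
        if T[j] == k:            # DELETED never compares equal to a key
            return j
    return None

def hash_delete(T, k, h):
    j = hash_search(T, k, h)
    if j is None:
        raise KeyError("element does not exist")
    T[j] = DELETED               # mark, do not reset to NIL
    return j
```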
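Solution (upper bounds on the expected number of probes):
$\alpha = 3/4$: at most 4 probes for an unsuccessful search, and about 1.848 for a successful search.
$\alpha = 7/8$: at most 8 probes for an unsuccessful search, and about 2.377 for a successful search.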
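The numbers above come from the standard bounds $1/(1 - \alpha)$ for an unsuccessful search and $\frac{1}{\alpha}\ln\frac{1}{1 - \alpha}$ for a successful one. A short Python check, with hypothetical function names:

```python
import math

def unsuccessful_bound(alpha):
    """Upper bound on expected probes in an unsuccessful search."""
    return 1 / (1 - alpha)

def successful_bound(alpha):
    """Upper bound on expected probes in a successful search."""
    return (1 / alpha) * math.log(1 / (1 - alpha))

for alpha in (3 / 4, 7 / 8):
    print(alpha, unsuccessful_bound(alpha), round(successful_bound(alpha), 3))
```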
Proof: let $d = \gcd(m, h_2(k))$, so that the least common multiple is $l = m \cdot h_2(k) / d$.
Since $d \mid h_2(k)$, the quantity $h_2(k) / d$ is an integer, so $l = m \cdot (h_2(k) / d)$ is a multiple of $m$ and $l \mod m = 0$. Therefore $(l + ih_2(k)) \mod m = ih_2(k) \mod m$. Because $l = (m / d) \cdot h_2(k)$, this says that probes $i$ and $i + m / d$ land in the same slot, which means that $ih_2(k) \mod m$ has period $m / d$.
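The period can be confirmed by brute force. The sketch below (function name hypothetical) walks the double-hashing probe sequence $(h_1 + i h_2) \bmod m$ until a slot repeats and compares the number of distinct slots with $m / \gcd(m, h_2)$.

```python
from math import gcd

def distinct_probes(h1, h2, m):
    """Slots visited by (h1 + i*h2) mod m before the sequence first repeats."""
    seen = []
    i = 0
    while True:
        slot = (h1 + i * h2) % m
        if slot in seen:
            return seen
        seen.append(slot)
        i += 1

# Example: m = 12, h2 = 8, so d = 4 and only m/d = 3 slots are examined.
print(distinct_probes(3, 8, 12))
```

When $d = 1$ the whole table is examined, which is the usual reason for choosing $h_2(k)$ relatively prime to $m$.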
Solution: we need the load factor $\alpha$ that satisfies
$$\frac{1}{1 - \alpha} = 2 \cdot \frac{1}{\alpha} \ln\frac{1}{1 - \alpha},$$
which gives $\alpha \approx 0.71533$.
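The equation has no closed-form solution, so the value of $\alpha$ has to be found numerically. One way is bisection, sketched below; the bracket $[0.5, 0.9]$ and the function names are assumptions chosen for the illustration (the left side is smaller than the right at $0.5$ and larger at $0.9$).

```python
import math

def gap(alpha):
    """Unsuccessful-search bound minus twice the successful-search bound."""
    return 1 / (1 - alpha) - (2 / alpha) * math.log(1 / (1 - alpha))

def solve_alpha(lo=0.5, hi=0.9, tol=1e-10):
    """Bisection on gap(alpha); assumes gap(lo) < 0 < gap(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(solve_alpha())
```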
Proof (from the reference answer):
$$\begin{aligned} p(n, m) & = \frac{m}{m} \cdot \frac{m - 1}{m} \cdots \frac{m - n + 1}{m} \\ & = \frac{m \cdot (m - 1) \cdots (m - n + 1)}{m^n}. \end{aligned}$$
Pairing the factor $m - i$ with the factor $m - n + i$ for $1 \le i \le n - 1$, we have
$$\begin{aligned} (m - i) \cdot (m - n + i) & = \Big(m - \frac{n}{2} + \frac{n}{2} - i\Big) \cdot \Big(m - \frac{n}{2} - \frac{n}{2} + i\Big) \\ & = \Big(m - \frac{n}{2}\Big)^2 - \Big(i - \frac{n}{2}\Big)^2 \\ & \le \Big(m - \frac{n}{2}\Big)^2. \end{aligned}$$
Hence
$$\begin{aligned} p(n, m) & \le \frac{m \cdot (m - \frac{n}{2})^{n - 1}}{m^n} \\ & = \Big(1 - \frac{n}{2m}\Big)^{n - 1}. \end{aligned}$$
By inequality $\text{(3.12)}$, $e^x \ge 1 + x$, so
$$\begin{aligned} p(n, m) & \le (e^{-n / 2m})^{n - 1} \\ & = e^{-n(n - 1) / 2m}. \end{aligned}$$
(The following is from the reference answer.)
Since we assume uniform hashing, we can use the same observation as is used in Corollary 11.7: that inserting a key entails an unsuccessful search followed by placing the key into the first empty slot found. As in the proof of Theorem 11.6, if we let $X$ be the random variable denoting the number of probes in an unsuccessful search, then $\Pr\{X \ge i\} \le \alpha^{i - 1}$. Since $n \le m / 2$, we have $\alpha \le 1 / 2$. Letting $i = k + 1$, we have $\Pr\{X > k\} = \Pr\{X \ge k + 1\} \le (1 / 2)^{(k + 1) - 1} = 2^{-k}$.
Substituting $k = 2\lg n$ into the statement of part (a) yields that the probability that the $i$th insertion requires more than $k = 2\lg n$ probes is at most $2^{-2\lg n} = (2^{\lg n})^{-2} = n^{-2} = 1 / n^2$.
We must deal with the possibility that $2\lg n$ is not an integer, however. Then the event that the $i$th insertion requires more than $2\lg n$ probes is the same as the event that the $i$th insertion requires more than $\lfloor 2\lg n \rfloor$ probes. Since $\lfloor 2\lg n \rfloor > 2\lg n - 1$, we have that the probability of this event is at most $2^{-\lfloor 2\lg n \rfloor} < 2^{-(2\lg n - 1)} = 2 / n^2 = O(1 / n^2)$.
Let the event $A$ be $X > 2\lg n$, and for $i = 1, 2, \ldots, n$, let the event $A_i$ be $X_i > 2\lg n$. In part (b), we showed that $\Pr\{A_i\} = O(1 / n^2)$ for $i = 1, 2, \ldots, n$. From how we defined these events, $A = A_1 \cup A_2 \cup \cdots \cup A_n$. Using Boole’s inequality, $\text{(C.19)}$, we have
$$\begin{aligned} \Pr\{A\} & \le \Pr\{A_1\} + \Pr\{A_2\} + \cdots + \Pr\{A_n\} \\ & \le n \cdot O(1 / n^2) \\ & = O(1 / n). \end{aligned}$$
We use the definition of expectation and break the sum into two parts:
$$\begin{aligned} \text E[X] & = \sum_{k = 1}^n k \cdot \Pr\{X = k\} \\ & = \sum_{k = 1}^{\lceil 2\lg n \rceil} k \cdot \Pr\{X = k\} + \sum_{k = \lceil 2\lg n \rceil + 1}^n k \cdot \Pr\{X = k\} \\ & \le \sum_{k = 1}^{\lceil 2\lg n \rceil} \lceil 2\lg n \rceil \cdot \Pr\{X = k\} + \sum_{k = \lceil 2\lg n \rceil + 1}^n n \cdot \Pr\{X = k\} \\ & = \lceil 2\lg n \rceil \sum_{k = 1}^{\lceil 2\lg n \rceil} \Pr\{X = k\} + n \sum_{k = \lceil 2\lg n \rceil + 1}^n \Pr\{X = k\}. \end{aligned}$$
Since $X$ takes on exactly one value, we have that $\sum_{k = 1}^{\lceil 2\lg n \rceil} \Pr\{X = k\} = \Pr\{X \le \lceil 2\lg n \rceil\} \le 1$ and $\sum_{k = \lceil 2\lg n \rceil + 1}^n \Pr\{X = k\} \le \Pr\{X > 2\lg n\} = O(1 / n)$, by part (c). Therefore,
$$\begin{aligned} \text E[X] & \le \lceil 2\lg n \rceil \cdot 1 + n \cdot O(1 / n) \\ & = \lceil 2\lg n \rceil + O(1) \\ & = O(\lg n). \end{aligned}$$
A particular key is hashed to a particular slot with probability $1 / n$. Suppose we select a specific set of $k$ keys. The probability that these $k$ keys are inserted into the slot in question and that all other keys are inserted elsewhere is
$$\Big(\frac{1}{n}\Big)^k \Big(1 - \frac{1}{n}\Big)^{n - k}.$$
Since there are $\binom{n}{k}$ ways to choose our $k$ keys, we get
$$Q_k = \Big(\frac{1}{n}\Big)^k \Big(1 - \frac{1}{n}\Big)^{n - k} \binom{n}{k}.$$
For $i = 1, 2, \ldots, n$, let $X_i$ be a random variable denoting the number of keys that hash to slot $i$, and let $A_i$ be the event that $X_i = k$, i.e., that exactly $k$ keys hash to slot $i$. From part (a), we have $\Pr\{A_i\} = Q_k$. Then,
$$\begin{aligned} P_k & = \Pr\{M = k\} \\ & = \Pr\Big\{\Big(\max_{1 \le i \le n} X_i\Big) = k\Big\} \\ & = \Pr\{\text{there exists $i$ such that $X_i = k$ and that $X_i \le k$ for $i = 1, 2, \ldots, n$}\} \\ & \le \Pr\{\text{there exists $i$ such that $X_i = k$}\} \\ & = \Pr\{A_1 \cup A_2 \cup \cdots \cup A_n\} \\ & \le \Pr\{A_1\} + \Pr\{A_2\} + \cdots + \Pr\{A_n\} \qquad \text{(by inequality (C.19))} \\ & = nQ_k. \end{aligned}$$
We start by showing two facts. First, $1 - 1 / n < 1$, which implies $(1 - 1 / n)^{n - k} < 1$. Second, $n! / (n - k)! = n \cdot (n - 1) \cdot (n - 2) \cdots (n - k + 1) < n^k$. Using these facts, along with the simplification $k! > (k / e)^k$ of equation $\text{(3.18)}$, we have
$$\begin{aligned} Q_k & = \Big(\frac{1}{n}\Big)^k \Big(1 - \frac{1}{n}\Big)^{n - k} \frac{n!}{k!(n - k)!} \\ & < \frac{n!}{n^k k! (n - k)!} & ((1 - 1 / n)^{n - k} < 1) \\ & < \frac{1}{k!} & (n! / (n - k)! < n^k) \\ & < \frac{e^k}{k^k}. & (k! > (k / e)^k) \end{aligned}$$
Notice that when $n = 2$, $\lg\lg n = 0$, so to be precise, we need to assume that $n \ge 3$.
In part (c), we showed that $Q_k < e^k / k^k$ for any $k$; in particular, this inequality holds for $k_0$. Thus, it suffices to show that $e^{k_0} / k_0^{k_0} < 1 / n^3$ or, equivalently, that $n^3 < k_0^{k_0} / e^{k_0}$.
Taking logarithms of both sides gives an equivalent condition:
$$\begin{aligned} 3\lg n & < k_0(\lg k_0 - \lg e) \\ & = \frac{c\lg n}{\lg\lg n}(\lg c + \lg\lg n - \lg\lg\lg n - \lg e). \end{aligned}$$
Dividing both sides by $\lg n$ gives the condition
$$\begin{aligned} 3 & < \frac{c}{\lg\lg n} (\lg c + \lg\lg n - \lg\lg\lg n - \lg e) \\ & = c \Big(1 + \frac{\lg c - \lg e}{\lg\lg n} - \frac{\lg\lg\lg n}{\lg\lg n}\Big). \end{aligned}$$
Let $x$ be the last expression in parentheses:
$$x = \Big(1 + \frac{\lg c - \lg e}{\lg\lg n} - \frac{\lg\lg\lg n}{\lg\lg n}\Big).$$
We need to show that there exists a constant $c > 1$ such that $3 < cx$.
Noting that $\lim_{n \to \infty} x = 1$, we see that there exists $n_0$ such that $x \ge 1 / 2$ for all $n \ge n_0$. Thus, any constant $c > 6$ works for $n \ge n_0$.
We handle smaller values of $n$ (in particular, $3 \le n < n_0$) as follows. Since $n$ is constrained to be an integer, there are a finite number of $n$ in the range $3 \le n < n_0$. We can evaluate the expression $x$ for each such value of $n$ and determine a value of $c$ for which $3 < cx$ for all values of $n$. The final value of $c$ that we use is the larger of
$6$, which works for all $n \ge n_0$, and
$\max_{3 \le n < n_0}\{c: 3 < cx\}$, i.e., the largest value of $c$ that we chose for the range $3 \le n < n_0$.
Thus, we have shown that $Q_{k_0} < 1 / n^3$, as desired.
To see that $P_k < 1 / n^2$ for $k \ge k_0$, we observe that by part (b), $P_k \le nQ_k$ for all $k$. Choosing $k = k_0$ gives $P_{k_0} \le nQ_{k_0} < n \cdot (1 / n^3) = 1 / n^2$. For $k > k_0$, we will show that we can pick the constant $c$ such that $Q_k < 1 / n^3$ for all $k \ge k_0$, and thus conclude that $P_k < 1 / n^2$ for all $k \ge k_0$.
To pick $c$ as required, we let $c$ be large enough that $k_0 > 3 > e$. Then $e / k < 1$ for all $k \ge k_0$, and so $e^k / k^k$ decreases as $k$ increases. Thus,
$$\begin{aligned} Q_k & < e^k / k^k \\ & \le e^{k_0} / k_0^{k_0} \\ & < 1 / n^3 \end{aligned}$$
for $k \ge k_0$.
The expectation of M M M is
E [ M ] = ∑ k = 0 n k ⋅ Pr { M = k } = ∑ k = 0 k 0 k ⋅ Pr { M = k } + ∑ k = k 0 + 1 n k ⋅ Pr { M = k } ≤ ∑ k = 0 k 0 k 0 ⋅ Pr { M = k } + ∑ k = k 0 + 1 n n ⋅ Pr { M = k } ≤ k 0 ∑ k = 0 k 0 Pr { M = k } + n ∑ k = k 0 + 1 n Pr { M = k } = k 0 ⋅ Pr { M ≤ k 0 } + n ⋅ Pr { M > k 0 } , \begin{aligned} \text E[M] & = \sum_{k = 0}^n k \cdot \Pr\{M = k\} \\ & = \sum_{k = 0}^{k_0} k \cdot \Pr\{M = k\} + \sum_{k = k_0 + 1}^n k \cdot \Pr\{M = k\} \\ & \le \sum_{k = 0}^{k_0} k_0 \cdot \Pr\{M = k\} + \sum_{k = k_0 + 1}^n n \cdot \Pr\{M = k\} \\ & \le k_0 \sum_{k = 0}^{k_0} \Pr\{M = k\} + n \sum_{k = k_0 + 1}^n \Pr\{M = k\} \\ & = k_0 \cdot \Pr\{M \le k_0\} + n \cdot \Pr\{M > k_0\}, \end{aligned} E[M]=k=0∑nk⋅Pr{M=k}=k=0∑k0k⋅Pr{M=k}+k=k0+1∑nk⋅Pr{M=k}≤k=0∑k0k0⋅Pr{M=k}+k=k0+1∑nn⋅Pr{M=k}≤k0k=0∑k0Pr{M=k}+nk=k0+1∑nPr{M=k}=k0⋅Pr{M≤k0}+n⋅Pr{M>k0},
which is what we needed to show, since k 0 = c lg n / lg lg n k_0 = c \lg n / \lg\lg n k0=clgn/lglgn.
To show that E [ M ] = O ( lg n / lg lg n ) \text E[M] = O(\lg n / \lg\lg n) E[M]=O(lgn/lglgn), note that Pr { M ≤ k 0 } ≤ 1 \Pr\{M \le k_0\} \le 1 Pr{M≤k0}≤1 and
Pr { M > k 0 } = ∑ k = k 0 + 1 n Pr { M = k } = ∑ k = k 0 + 1 n P k < ∑ k = k 0 + 1 n 1 / n 2 (by part (d)) < n ⋅ ( 1 / n 2 ) = 1 / n . \begin{aligned} \Pr\{M > k_0\} & = \sum_{k = k_0 + 1}^n \Pr\{M = k\} \\ & = \sum_{k = k_0 + 1}^n P_k \\ & < \sum_{k = k_0 + 1}^n 1 / n^2 & \text{(by part (d))} \\ & < n \cdot (1 / n^2) \\ & = 1 / n. \end{aligned} Pr{M>k0}=k=k0+1∑nPr{M=k}=k=k0+1∑nPk<k=k0+1∑n1/n2<n⋅(1/n2)=1/n.(by part (d))
We conclude that
$$
\begin{aligned}
\text E[M] & \le k_0 \cdot 1 + n \cdot (1 / n) \\
           & = k_0 + 1 \\
           & = O(\lg n / \lg\lg n).
\end{aligned}
$$
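The bound can be explored empirically. The following is a minimal sketch, assuming $n$ keys are hashed independently and uniformly into $n$ slots; the function names `max_load` and `average_max_load` are hypothetical and chosen only for illustration. It estimates $\text E[M]$ by simulation and prints it next to $\lg n / \lg\lg n$ so the growth rates can be compared by eye. It does not prove anything about the constant $c$.

```python
import math
import random


def max_load(n, rng):
    """Hash n keys uniformly into n slots and return the largest slot size M."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return max(counts)


def average_max_load(n, trials, seed=0):
    """Estimate E[M] as the mean of M over several independent trials."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return sum(max_load(n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (2**8, 2**11, 2**14):
        reference = math.log2(n) / math.log2(math.log2(n))
        print(n, average_max_load(n, trials=10), round(reference, 2))
```

The estimate should grow very slowly with $n$, in line with the $O(\lg n / \lg\lg n)$ bound.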
From how the probe-sequence computation is specified, it is easy to see that the probe sequence is
$$\langle h(k), h(k) + 1, h(k) + 1 + 2, h(k) + 1 + 2 + 3, \ldots, h(k) + 1 + 2 + 3 + \cdots + i, \ldots \rangle,$$
where all arithmetic is modulo $m$. Starting the probe numbers from $0$, the $i$th probe is offset (modulo $m$) from $h(k)$ by
$$\sum_{j = 0}^i j = \frac{i(i + 1)}{2} = \frac{1}{2}i^2 + \frac{1}{2}i.$$
Thus, we can write the probe sequence as
$$h'(k, i) = \Big(h(k) + \frac{1}{2} i + \frac{1}{2} i^2 \Big) \mod m,$$
which demonstrates that this scheme is a special case of quadratic probing.
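As an illustration, here is a small sketch that generates the probes incrementally (the step grows by one each time) and compares them with the closed form $(h(k) + i(i + 1)/2) \bmod m$. The names `probe_sequence` and `closed_form` are hypothetical, and the initial hash value `h_k` is passed in directly rather than computed from a key.

```python
def probe_sequence(h_k, m):
    """Yield the first m probes, growing the step by 1 after each probe."""
    pos = h_k % m
    for i in range(m):
        yield pos
        pos = (pos + i + 1) % m  # the next probe is i + 1 slots further on


def closed_form(h_k, m, i):
    """The i-th probe (counting from 0) written as a quadratic in i."""
    return (h_k + i * (i + 1) // 2) % m


if __name__ == "__main__":
    m, h_k = 8, 3
    incremental = list(probe_sequence(h_k, m))
    quadratic = [closed_form(h_k, m, i) for i in range(m)]
    print(incremental)  # [3, 4, 6, 1, 5, 2, 0, 7]
    print(incremental == quadratic)  # True
```

Since $i(i + 1)$ is always even, the integer division by two is exact, so the two computations agree for every $i$.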
Let $h'(k, i)$ denote the $i$th probe of our scheme. We saw in part (a) that $h'(k, i) = (h(k) + i(i + 1) / 2) \mod m$. To show that our algorithm examines every table position in the worst case, we show that for a given key, each of the first $m$ probes hashes to a distinct value. That is, for any key $k$ and for any probe numbers $i$ and $j$ such that $0 \le i < j < m$, we have $h'(k, i) \ne h'(k, j)$. We do so by showing that $h'(k, i) = h'(k, j)$ yields a contradiction.
Let us assume that there exists a key $k$ and probe numbers $i$ and $j$ satisfying $0 \le i < j < m$ for which $h'(k, i) = h'(k, j)$. Then
$$h(k) + i(i + 1) / 2 = h(k) + j(j + 1) / 2 \mod m,$$
which in turn implies that
$$i(i + 1) / 2 = j(j + 1) / 2 \mod m,$$
or
$$j(j + 1) / 2 - i(i + 1) / 2 = 0 \mod m.$$
Since $j(j + 1) / 2 - i(i + 1) / 2 = (j - i)(j + i + 1) / 2$, we have
$$(j - i)(j + i + 1) / 2 = 0 \mod m.$$
The factors $j - i$ and $j + i + 1$ must have different parities, i.e., $j - i$ is even if and only if $j + i + 1$ is odd. (Work out the various cases in which $i$ and $j$ are even and odd.) Since $(j - i)(j + i + 1) / 2 = 0 \mod m$, we have $(j - i)(j + i + 1) / 2 = rm$ for some integer $r$ or, equivalently, $(j - i)(j + i + 1) = r \cdot 2m$. Using the assumption that $m$ is a power of $2$, let $m = 2^p$ for some nonnegative integer $p$, so that now we have $(j - i)(j + i + 1) = r \cdot 2^{p + 1}$. Because exactly one of the factors $j - i$ and $j + i + 1$ is even, $2^{p + 1}$ must divide one of the factors. It cannot be $j - i$, since $j - i < m < 2^{p + 1}$. But it also cannot be $j + i + 1$, since $j + i + 1 \le (m - 1) + (m - 2) + 1 = 2m - 2 < 2^{p + 1}$. Thus we have derived the contradiction that $2^{p + 1}$ divides neither of the factors $j - i$ and $j + i + 1$. We conclude that $h'(k, i) \ne h'(k, j)$.
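The conclusion can be checked by brute force for small tables. This is only a sanity-check sketch, not a substitute for the proof; the helper name `visits_every_slot` is hypothetical. It collects the first $m$ probes for a given starting slot and tests whether they cover all $m$ positions.

```python
def visits_every_slot(m, start=0):
    """Return True if the first m probes (start + i(i+1)/2) mod m are all distinct."""
    probes = {(start + i * (i + 1) // 2) % m for i in range(m)}
    return len(probes) == m


if __name__ == "__main__":
    # Powers of two: every slot is reached, as the proof predicts.
    print(all(visits_every_slot(2**p) for p in range(11)))
    # Table sizes that are not powers of two can repeat a slot early.
    print([m for m in range(1, 13) if not visits_every_slot(m)])
```

For a table size such as $m = 6$, the probes from slot $0$ are $0, 1, 3, 0, 4, 3$, so some slots are never examined, which shows that the power-of-two assumption is doing real work in the argument.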
The number of hash functions for which $h(k) = h(l)$ is $\frac{m}{m^2}|\mathcal H| = \frac{1}{m}|\mathcal H|$, therefore the family is universal.
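For $x = \langle 0, 0, \ldots, 0 \rangle$, every function in $\mathcal H$ maps $x$ to $0$, so the pair of hash values for $x$ and any other key $y$ can only take values whose first component is $0$. The pairs are therefore not equally likely, and $\mathcal H$ is not $2$-universal.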
Let $x, y \in U$ be fixed, distinct $n$-tuples. As $a_i$ and $b$ range over $\mathbb Z_p$, $h'_{ab}(x)$ is equally likely to achieve every value from $0$ to $p - 1$, since for any sequence $a$, we can let $b$ vary from $0$ to $p - 1$.
Thus, $\langle h'_{ab}(x), h'_{ab}(y) \rangle$ is equally likely to be any of the $p^2$ sequences, so $\mathcal H$ is $2$-universal.
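Both claims can be checked exhaustively for tiny parameters. The sketch below assumes the definitions from the problem statement, namely $h_a(x) = \big(\sum_j a_j x_j\big) \bmod p$ and $h'_{ab}(x) = \big(\sum_j a_j x_j + b\big) \bmod p$ with $p$ prime; the function names and the `with_offset` flag are hypothetical. It counts how often each pair of hash values occurs over the whole family.

```python
from collections import Counter
from itertools import product


def pair_counts(p, n, x, y, with_offset):
    """Count each (h(x), h(y)) pair over all a in Z_p^n (and b in Z_p if with_offset)."""
    counts = Counter()
    offsets = range(p) if with_offset else (0,)
    for a in product(range(p), repeat=n):
        for b in offsets:
            hx = (sum(ai * xi for ai, xi in zip(a, x)) + b) % p
            hy = (sum(ai * yi for ai, yi in zip(a, y)) + b) % p
            counts[(hx, hy)] += 1
    return counts


def uniform_on_pair(p, n, x, y, with_offset):
    """True if all p^2 value pairs occur, each equally often, for keys x and y."""
    counts = pair_counts(p, n, x, y, with_offset)
    return len(counts) == p * p and len(set(counts.values())) == 1


if __name__ == "__main__":
    p, n = 3, 2
    keys = list(product(range(p), repeat=n))
    # Without the offset b, the zero tuple always hashes to 0.
    print(uniform_on_pair(p, n, (0, 0), (1, 0), with_offset=False))  # False
    # With the offset b, every pair of distinct keys passes.
    print(all(uniform_on_pair(p, n, x, y, with_offset=True)
              for x in keys for y in keys if x != y))  # True
```

The check is exponential in $n$, so it is only practical for very small $p$ and $n$.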
Since $\mathcal H$ is $2$-universal, every pair $\langle t, t' \rangle$ is equally likely to appear, so $t'$ could be any value from $\mathbb Z_p$. Even if the adversary knows $\mathcal H$, because $\mathcal H$ is $2$-universal and hence universal, the probability of choosing a hash function for which $h(k) = h(l)$ is at most $1 / p$. Therefore the adversary's probability of success is at most $1 / p$.