Problem: Given n elements in an array, find the kth smallest element (rank k).
The naive algorithm: sort the array A and return A[k]. With heap sort or merge sort, this takes Theta(n lg n) time.
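As a baseline, the sort-and-index approach can be sketched in Python. This is a minimal sketch: the function name select_by_sorting is made up for illustration, and k is treated as a 1-based rank to match the notes, so the lookup uses index k - 1.

```python
def select_by_sorting(a, k):
    """Return the k-th smallest element of a (k is a 1-based rank) by sorting."""
    if not 1 <= k <= len(a):
        raise ValueError("k must be between 1 and len(a)")
    # sorted() returns a new list, so a is left unmodified; cost is O(n lg n).
    return sorted(a)[k - 1]


if __name__ == "__main__":
    print(select_by_sorting([7, 2, 9, 4, 1], 2))
```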
We can do better: the problem can be solved in linear time.
Notable special cases:
1) k = 1: the minimum,
2) k = n: the maximum; both of these cases take Theta(n) time.
3) k = floor((n+1)/2) or ceiling((n+1)/2): the median.
Idea: randomized divide and conquer. Use the randomized partition from Quicksort, then recurse only into the side of the split that contains the element of rank i.
RANDOM-SELECT(A, p, q, i)
    if p = q then return A[p]
    r := RAND-PARTITION(A, p, q)    // randomly select pivot, partition around it, return its index
    k := r - p + 1                  // rank of the pivot within A[p..q]
    if i = k then return A[r]
    if i < k then return RANDOM-SELECT(A, p, r-1, i)
    else return RANDOM-SELECT(A, r+1, q, i-k)
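The pseudocode above could be rendered in Python roughly as follows. This is a sketch under stated assumptions, not the notes' own implementation: the names rand_partition and random_select are illustrative, p and q are 0-based inclusive indices, i is a 1-based rank within a[p..q], and the partition scheme (pivot moved to the front, then a single left-to-right scan) is one common choice that the notes do not specify. The tail recursion is written as a loop so that deep recursion is not an issue in Python.

```python
import random


def rand_partition(a, p, q):
    """Partition a[p..q] (inclusive) around a random pivot; return the pivot's final index."""
    s = random.randint(p, q)      # randint includes both endpoints
    a[p], a[s] = a[s], a[p]       # move the chosen pivot to the front
    pivot = a[p]
    i = p                         # a[p+1..i] holds the elements <= pivot seen so far
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]       # place the pivot between the two parts
    return i


def random_select(a, p, q, i):
    """Return the i-th smallest (1-based) element of a[p..q]; reorders a in place."""
    while True:
        if p == q:
            return a[p]
        r = rand_partition(a, p, q)
        k = r - p + 1             # rank of the pivot within a[p..q]
        if i == k:
            return a[r]
        if i < k:
            q = r - 1             # continue in the lower part
        else:
            p, i = r + 1, i - k   # continue in the upper part


if __name__ == "__main__":
    data = [7, 2, 9, 4, 1, 8, 3]
    print(random_select(list(data), 0, len(data) - 1, 4))  # the median of data
```

Passing a copy (list(data)) keeps the caller's list in its original order, since the partition step rearranges its input.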
Intuition for analysis:
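Lucky case - 1/10 : 9/10 splits (only one side is searched, at worst the larger one):
T(n) <= T(9n/10) + Theta(n)
= Theta(n), since the work forms a geometric series n + (9/10)n + (9/10)^2 n + ...
Unlucky case - 0 : n-1 splits:
T(n) = T(n-1) + Theta(n)
= Theta(n^2), an arithmetic series.
So RANDOM-SELECT takes Theta(n^2) time in the worst case.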
Expected running time of RANDOM-SELECT: Let T(n) be the random variable for the running time of RANDOM-SELECT on an input of size n, assuming the random numbers are independent. Define indicator random variables X_k for k = 0, 1, 2, ..., n-1:
X_k = 1 if RAND-PARTITION generates k:n-k-1 split,
= 0 otherwise.
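Then, charging the recursion to the larger side of the split to get an upper bound,
T(n) <= sum_{k=0}^{n-1} X_k T(max{k, n-k-1}) + Theta(n).
Since E[X_k] = 1/n and X_k is independent of T(max{k, n-k-1}),
E[T(n)] <= 1/n sum_{k=0}^{n-1} E[T(max{k,n-k-1})] + Theta(n)
<= 2/n sum_{k=floor(n/2)}^{n-1} E[T(k)] + Theta(n),
because each term E[T(k)] with k >= floor(n/2) appears at most twice in the sum.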
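Claim: E[T(n)] <= cn for some constant c > 0. Proof by substitution, using sum_{k=floor(n/2)}^{n-1} k <= 3n^2/8:
E[T(n)] <= 2/n sum_{k=floor(n/2)}^{n-1} ck + Theta(n)
<= 3cn/4 + Theta(n)
= cn - (cn/4 - Theta(n))
<= cn if c is chosen large enough that cn/4 dominates the Theta(n) term.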
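The step from the sum to cn - cn/4 relies on the bound sum_{k=floor(n/2)}^{n-1} k <= 3n^2/8. A worked derivation, valid for n >= 2 (the symbol m is shorthand introduced here for floor(n/2)):

```latex
\begin{align*}
\sum_{k=\lfloor n/2 \rfloor}^{n-1} k
  &= \sum_{k=0}^{n-1} k \;-\; \sum_{k=0}^{m-1} k
   = \frac{n(n-1)}{2} - \frac{m(m-1)}{2}, \qquad m = \lfloor n/2 \rfloor \\
  &\le \frac{n(n-1)}{2} - \frac{(n-1)(n-3)}{8}
   \qquad \text{since } m \ge \tfrac{n-1}{2} \\
  &= \frac{3(n^2-1)}{8} \;\le\; \frac{3n^2}{8}, \\[4pt]
\frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck
  &\le \frac{2c}{n} \cdot \frac{3n^2}{8} = \frac{3cn}{4} = cn - \frac{cn}{4}.
\end{align*}
```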
So E[T(n)] = Theta(n), i.e. the expected running time is linear.
A worst-case linear-time order statistics algorithm [Blum, Floyd, Pratt, Rivest, Tarjan 1973]: the idea is to generate a good pivot recursively (one that is guaranteed to be good).
SELECT(i, n)
1. Divide the n elements into floor(n/5) groups of 5 elements each. Find the median of each group.
2. Recursively select the median x of the floor(n/5) group medians.
3. Partition with x as pivot, let k = rank(x).
4. if i = k then return x
5. if i < k then recursively select the ith smallest element in the lower part,
otherwise recursively select the (i-k)th smallest element in the upper part.
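Steps 1 to 5 can be sketched in Python as below. This is an illustrative sketch under a few assumptions that go beyond the notes: the function name select is hypothetical, it takes the list itself rather than (i, n), a leftover group of fewer than 5 elements is kept as its own group instead of being dropped, inputs of at most 5 elements are handled by sorting directly, and duplicates are handled by counting the elements equal to the pivot. It builds new lists instead of partitioning in place, which is simpler to read but uses extra memory.

```python
def select(a, i):
    """Return the i-th smallest (1-based) element of the list a."""
    if not 1 <= i <= len(a):
        raise ValueError("i must be between 1 and len(a)")
    if len(a) <= 5:
        return sorted(a)[i - 1]           # base case: constant-size input

    # Step 1: split into groups of 5 and take each group's median.
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]

    # Step 2: recursively select the median x of the group medians.
    x = select(medians, (len(medians) + 1) // 2)

    # Step 3: partition around x.
    lower = [v for v in a if v < x]
    upper = [v for v in a if v > x]
    equal = len(a) - len(lower) - len(upper)   # copies of x, at least 1

    # Steps 4 and 5: x itself, or recurse into the side holding rank i.
    if i <= len(lower):
        return select(lower, i)
    if i <= len(lower) + equal:
        return x
    return select(upper, i - len(lower) - equal)


if __name__ == "__main__":
    data = [(37 * j) % 101 for j in range(60)]
    print(select(data, 30) == sorted(data)[29])
```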
Analysis:
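After the partition, at least 3*floor(floor(n/5)/2) = 3*floor(n/10) elements are less than or equal to x: at least half of the group medians are <= x, and each such median has two more elements of its group below it. By symmetry, at least as many elements are greater than or equal to x.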
A simplification for the analysis: for n >= 50, 3*floor(n/10) >= n/4, so the recursive call in step 5 runs on at most 3n/4 elements.
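The inequality 3*floor(n/10) >= n/4 can be spot-checked numerically. The snippet below is only a sanity check over a finite range, not a proof; the helper name bound_holds and the upper limit of the range are arbitrary choices made here. It uses integer arithmetic, multiplying both sides by 4 to avoid floating point.

```python
def bound_holds(n):
    """True if 3 * floor(n / 10) >= n / 4, checked as 12 * floor(n / 10) >= n."""
    return 12 * (n // 10) >= n


# The bound holds throughout the tested range starting at 50 ...
assert all(bound_holds(n) for n in range(50, 100_000))

# ... and 50 cannot be lowered: the bound fails at n = 49 (12 < 12.25).
assert not bound_holds(49)
```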
So T(n) <= T(n/5) + T(3n/4) + Theta(n) for n >= 50, with T(n) = Theta(1) for n < 50. Claim: T(n) <= cn.
T(n) <= cn/5 + 3cn/4 + Theta(n)
= cn - (cn/20 - Theta(n))
<= cn if cn/20 dominates Theta(n)
So T(n) = Theta(n): SELECT runs in linear time in the worst case.