Introduction to Algorithms, Lecture 6: Medians and Order Statistics

Order statistics

Problem: Given n elements in an array, find the kth smallest element (rank k).

 

The naive algorithm to solve this problem: sort the array A and return A[k]. With heapsort or merge sort, this takes Theta(n lg n) time.
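
As a baseline for comparison, the sort-and-index approach can be sketched in Python. The function name `select_by_sorting` is made up for this illustration, and the rank k is taken to be 1-based as in the problem statement.

```python
def select_by_sorting(items, k):
    """Return the k-th smallest element (1-based rank) by sorting a copy."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    # sorted() runs in O(n log n) time, which dominates the cost.
    return sorted(items)[k - 1]


if __name__ == "__main__":
    print(select_by_sorting([6, 10, 13, 5, 8, 3, 2, 11], 7))
```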

 

We can do better than this: the problem can be solved in linear time.

 

Some special cases:

 

1) k = 1, it's the minimum,

2) k = n, it's the maximum; both of these cases take Theta(n) time.

3) k = floor((n+1)/2) or ceiling((n+1)/2), it's the median.

 

Idea: randomized divide and conquer. Use the randomized partition from Quicksort, then recurse only into the side of the split that contains the desired element.

 

RANDOM-SELECT(A, p, q, i)
if p = q then return A[p]
r := RAND-PARTITION(A, p, q) //randomly select a pivot, partition around it, return the pivot's final index.
k := r - p + 1 //k = rank of the pivot within A[p..q]
if i = k then return A[r]
if i < k then return RANDOM-SELECT(A, p, r-1, i)
         else return RANDOM-SELECT(A, r+1, q, i-k)
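
The pseudocode above can be sketched in Python as follows. This is one possible rendering, not the lecture's own code: it uses 0-based array indices with a 1-based rank i, the partition scheme (Lomuto-style) is an assumption since the notes do not specify one, and the tail recursion is written as a loop to stay clear of Python's recursion limit.

```python
import random


def rand_partition(a, p, q):
    """Partition a[p..q] in place around a random pivot; return the pivot's final index."""
    s = random.randint(p, q)          # randint is inclusive on both ends
    a[p], a[s] = a[s], a[p]           # move the chosen pivot to the front
    pivot = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]           # put the pivot between the two parts
    return i


def random_select(a, p, q, i):
    """Return the i-th smallest (1-based) element of a[p..q]; reorders a."""
    while True:
        if p == q:
            return a[p]
        r = rand_partition(a, p, q)
        k = r - p + 1                 # rank of the pivot within a[p..q]
        if i == k:
            return a[r]
        if i < k:
            q = r - 1                 # continue in the lower part
        else:
            p, i = r + 1, i - k       # continue in the upper part


if __name__ == "__main__":
    data = [6, 10, 13, 5, 8, 3, 2, 11]
    print(random_select(list(data), 0, len(data) - 1, 7))
```

Passing a copy (`list(data)`) keeps the caller's list unchanged, since the partition reorders its argument.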

 

Intuition for analysis:

 

Lucky case - 1/10 : 9/10 splits (only one side is searched, at worst the larger one):

T(n) <= T(9n/10) + Theta(n)

T(n) = Theta(n) by case 3 of the master theorem;

 

Unlucky case - 0 : n-1 splits:

T(n) = T(n-1) + Theta(n)

      = Theta(n^2)

 

So RANDOM-SELECT takes Theta(n^2) time in the worst case.

 

Expected running time of RANDOM-SELECT: Let T(n) be the random variable for the running time of RANDOM-SELECT on an input of size n, assuming the random numbers are independent. Define indicator random variables X_k for k = 0, 1, 2, ..., n-1:

X_k = 1 if RAND-PARTITION generates k:n-k-1 split,

       = 0 otherwise.

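Then, assuming for an upper bound that the desired element always falls in the larger side of the split,

T(n) <= sum_{k=0}^{n-1} X_k T(max{k, n-k-1}) + Theta(n).

Since E[X_k] = 1/n and X_k is independent of T(max{k, n-k-1}),

E[T(n)] <= 1/n sum_{k=0}^{n-1} E[T(max{k,n-k-1})] + Theta(n)

            <= 2/n sum_{k=floor(n/2)}^{n-1} E[T(k)] + Theta(n).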

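Claim: E[T(n)] <= cn for some constant c > 0. Substituting the inductive hypothesis and using sum_{k=floor(n/2)}^{n-1} k <= 3n^2/8:

E[T(n)] <= 2/n sum_{k=floor(n/2)}^{n-1} ck + Theta(n)

            <= 2c/n * 3n^2/8 + Theta(n)

             = cn - (cn/4 - Theta(n))

            <= cn if c is chosen large enough that cn/4 dominates Theta(n).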
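
The substitution step relies on the arithmetic-series bound sum_{k=floor(n/2)}^{n-1} k <= 3n^2/8, which is what turns the sum into cn - cn/4. A quick numerical sanity check of that bound in Python (an illustration over a finite range, not a proof; the helper name `upper_half_sum` is made up here):

```python
def upper_half_sum(n):
    """Sum of k for k = floor(n/2), ..., n-1."""
    return sum(range(n // 2, n))


def bound_holds(n):
    # Compare 8 * sum against 3 * n^2 to stay in exact integer arithmetic.
    return 8 * upper_half_sum(n) <= 3 * n * n


if __name__ == "__main__":
    print(all(bound_holds(n) for n in range(1, 2001)))
```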

 

So E[T(n)] = Theta(n), i.e. the expected running time is linear.

 

There is also a worst-case linear-time order statistics algorithm [Blum, Floyd, Pratt, Rivest, Tarjan 1973]. The idea is to generate a good pivot recursively (one that is guaranteed to be good).

 

SELECT(i, n)

1. Divide the n elements into floor(n/5) groups of 5 elements each. Find the median of each group.

2. Recursively select the median x of the floor(n/5) group medians.

3. Partition with x as pivot, let k = rank(x).

4. if i = k then return x

5. if i < k then recursively select the ith smallest element in the lower part

              otherwise select the (i-k)th smallest element in the upper part.
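
The five steps above can be sketched in Python as follows. Several choices here are assumptions of this sketch rather than part of the notes: the partition is done out of place with list comprehensions (costing extra memory), it is a three-way partition so that duplicate values are handled, inputs of at most 5 elements are sorted directly as the base case, and leftover elements beyond the floor(n/5) full groups are skipped when collecting medians.

```python
def select(items, i):
    """Return the i-th smallest (1-based) element of items."""
    a = list(items)
    while True:
        n = len(a)
        if n <= 5:
            return sorted(a)[i - 1]   # base case: constant-size input
        # Step 1: median of each of the floor(n/5) full groups of 5.
        medians = [sorted(a[j:j + 5])[2] for j in range(0, n - n % 5, 5)]
        # Step 2: recursively select the median x of the group medians.
        x = select(medians, (len(medians) + 1) // 2)
        # Step 3: partition around x (three-way, to cope with duplicates).
        lower = [v for v in a if v < x]
        upper = [v for v in a if v > x]
        equal_count = n - len(lower) - len(upper)
        # Steps 4-5: return x or continue in the part that holds rank i.
        if i <= len(lower):
            a = lower
        elif i <= len(lower) + equal_count:
            return x
        else:
            i -= len(lower) + equal_count
            a = upper


if __name__ == "__main__":
    data = [(37 * j) % 101 for j in range(101)]   # a permutation of 0..100
    print(select(data, 51))
```

Only the recursion on the medians is a true recursive call; the continuation into the lower or upper part is written as a loop, which does not change the running-time recurrence.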

 

Analysis:

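After the partition, at least 3*floor(floor(n/5)/2) = 3*floor(n/10) elements are less than or equal to x: half of the group medians are <= x, and each such group contributes its median plus the two elements below it. By symmetry, at least the same number are greater than or equal to x.

To simplify the analysis, note that for n >= 50, 3*floor(n/10) >= n/4, so the recursive call in step 5 runs on at most 3n/4 elements.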
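
The threshold n >= 50 can be checked numerically. The short Python sketch below (the helper name `guaranteed_each_side` is made up for this illustration) verifies 3*floor(n/10) >= n/4 over a finite range and collects the smaller n for which it fails; for instance n = 49 gives 3*4 = 12 < 49/4.

```python
def guaranteed_each_side(n):
    """Lower bound on the number of elements known to be on each side of the pivot."""
    return 3 * (n // 10)


def inequality_holds(n):
    # 3*floor(n/10) >= n/4, written with integers only.
    return 4 * guaranteed_each_side(n) >= n


if __name__ == "__main__":
    print(all(inequality_holds(n) for n in range(50, 10_000)))
    print([n for n in range(1, 50) if not inequality_holds(n)])
```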

 

So for n >= 50, T(n) <= T(n/5) + T(3n/4) + Theta(n), and T(n) = Theta(1) for n < 50. Claim: T(n) <= cn.

T(n) <= cn/5 + 3cn/4 + Theta(n)

       = cn - (cn/20 - Theta(n))

      <= cn if cn/20 dominates Theta(n)

 

So SELECT runs in linear time in the worst case.
