1. Merge Sort and Quick Sort are two classic sorting algorithms and Critical components in the world’s computational infrastructure.
2. Arrays.sort() use Quick Sort for primitive types and Merge Sort ( or Tim Sort) for Objects.
3. Basic Plan of Merge Sort:
a) Divide array into two halves.
b) Recursively sort each half.
c) Merge two halves.
4. Java assert statement throws an exception unless boolean condition is true. It can be enalbed/disabled at runtime by parameter -ea/-da.
5. Java Implementation :
public class Merge { private static void merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi) { assert isSorted(a, lo, mid); // precondition: a[lo..mid] sorted assert isSorted(a, mid+1, hi); // precondition: a[mid+1..hi] sorted for (int k = lo; k <= hi; k++) aux[k] = a[k]; int i = lo, j = mid+1; for (int k = lo; k <= hi; k++) { if (i > mid) a[k] = aux[j++]; else if (j > hi) a[k] = aux[i++]; else if (less(aux[j], aux[i])) a[k] = aux[j++]; else a[k] = aux[i++]; } assert isSorted(a, lo, hi); // postcondition: a[lo..hi] sorted } private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi) { if (hi <= lo) return; int mid = lo + (hi - lo) / 2; sort (a, aux, lo, mid); sort (a, aux, mid+1, hi); merge(a, aux, lo, mid, hi); } public static void sort(Comparable[] a) { aux = new Comparable[a.length]; sort(a, aux, 0, a.length - 1); } }
6. Proposition: Mergesort uses at most N lg N compares and 6 N lg N array accesses to sort any array of size N.
Pf : C (N) ≤ C (ceil(N / 2)) + C (floor(N / 2)) + N for N > 1, with C (1) = 0.
A (N) ≤ A (ceil(N / 2)) + A (floor(N / 2)) + 6 N for N > 1, with A (1) = 0.
D (N) = 2 D (N / 2) + N, for N > 1, with D (1) = 0.
7. Mergesort uses extra space proportional to N.
8. A sorting algorithm is in-place if it uses ≤ c log N extra memory. Ex. Insertion sort, selection sort, shellsort.
9. Improvement 1: Use insertion sort for small subarrays:
a) Mergesort has too much overhead for tiny subarrays.
b) Cutoff to insertion sort for ≈ 7 items.
private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi) { if (hi <= lo + CUTOFF - 1) Insertion.sort(a, lo, hi); int mid = lo + (hi - lo) / 2; sort (a, aux, lo, mid); sort (a, aux, mid+1, hi); merge(a, aux, lo, mid, hi); }
10. Improvement 2: Stop if already sorted.
1) Is biggest item in first half ≤ smallest item in second half?
2) Helps for partially-ordered arrays.
11. Improvement 3: Eliminate the copy to the auxiliary array. Save time (but not space) by switching the role of the input and auxiliary array in each recursive call.
private static void merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi) { int i = lo, j = mid+1; for (int k = lo; k <= hi; k++) { if (i > mid) aux[k] = a[j++]; else if (j > hi) aux[k] = a[i++]; else if (less(a[j], a[i])) aux[k] = a[j++]; else aux[k] = a[i++]; } } private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi) { if (hi <= lo) return; int mid = lo + (hi - lo) / 2; sort (aux, a, lo, mid); sort (aux, a, mid+1, hi); merge(aux, a, lo, mid, hi); }
12. Bottom-up mergesort:
a) Pass through array, merging subarrays of size 1.
b) Repeat for subarrays of size 2, 4, 8, 16, ....
public static void sort(Comparable[] a) { int N = a.length; aux = new Comparable[N]; for (int sz = 1; sz < N; sz = sz+sz) for (int lo = 0; lo < N-sz; lo += sz+sz) merge(a, lo, lo+sz-1, Math.min(lo+sz+sz-1, N-1)); }
13. Computational complexity: Framework to study efficiency of algorithms for solving a particular problem X.
Model of computation : Allowable operations.
Cost model : Operation count(s).
Upper bound : Cost guarantee provided by some algorithm for X.
Lower bound : Proven limit on cost guarantee of all algorithms for X.
Optimal algorithm : Algorithm with best possible cost guarantee for X.
14. Proposition: Any compare-based sorting algorithm must use at least lg ( N ! ) ~ N lg N compares in the worst-case.
Pf.
a) Assume array consists of N distinct values a1 through aN.
b) Worst case dictated by height h of decision tree.
c) Binary tree of height h has at most 2 h leaves.
d) N ! different orderings ⇒ at least N ! leaves.
2 h ≥ # leaves ≥ N ! ⇒ h ≥ lg ( N ! ) ~ N lg N (Stirling's formula)
15. Compare-based sort :
1) Model of computation: decision tree.
2) Cost model: # compares.
3) Upper bound: ~ N lg N from mergesort.
4) Lower bound: ~ N lg N.
5) Optimal algorithm = mergesort.
16. Lower bound may not hold if the algorithm has information about:
1) The initial order of the input. (Insertion sort take ~N for partially sorted array)
2) The distribution of key values. (Duplicated keys)
3) The representation of the keys. (key is inform of digit or character)
17. A stable sort preserves the relative order of items with equal keys. Insertion sort and mergesort are stable but selection sort and shellsort are not. (Long-distance exchange might move an item past some equal item.)