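The Top K problem asks for the K largest numbers in a data set. Common examples include the 10 most popular searches or the 20 most played songs.
The first idea that comes to mind is usually to sort the data and take the first K elements. This works, but it is inefficient: we only need a partial ordering, yet sorting orders the entire data set. A min-heap is a good way to solve the Top K problem (to select the K smallest numbers, use a max-heap instead).
A min-heap, also called a small-root heap, is a complete binary tree in which every child node's value is greater than or equal to its parent's value. For how a min-heap is built and adjusted, see the article "Java 优先级队列" (Java Priority Queue).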
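To make the heap property concrete, here is a minimal sketch that checks it on an array-backed tree. It assumes the usual array layout, where the parent of index `i` sits at `(i - 1) / 2`; the class and method names (`MinHeapCheck`, `isMinHeap`) are made up for illustration.

```java
final class MinHeapCheck {
    // Returns true if every element is >= its parent, i.e. the array is a valid min-heap.
    static boolean isMinHeap(int[] a) {
        for (int child = 1; child < a.length; child++) {
            int parent = (child - 1) >>> 1; // parent index in the array layout
            if (a[child] < a[parent]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] heap = {4, 12, 16, 14, 19}; // 12 and 16 are children of 4; 14 and 19 are children of 12
        int[] notHeap = {4, 12, 3};       // 3 at index 2 is smaller than its parent 4
        System.out.println(isMinHeap(heap));    // true
        System.out.println(isMinHeap(notHeap)); // false
    }
}
```

Note that a valid min-heap is not a sorted array: only the parent/child relation is constrained, which is why building one is cheaper than a full sort.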
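For the Top K problem, we only need to maintain a min-heap of size K.
For example, to select the 10 largest numbers from an array A: build a min-heap from the first 10 elements, then compare each remaining element with the top of the heap. If the element is larger, it replaces the top and is sifted down to restore the heap. When the scan finishes, the heap holds the 10 largest numbers.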
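The idea of keeping a min-heap of size K can also be sketched with the JDK's `java.util.PriorityQueue`, which behaves as a min-heap under natural ordering. This is only an illustrative alternative to a hand-written heap; the class and method names (`TopKWithPriorityQueue`, `topK`) are assumptions.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class TopKWithPriorityQueue {
    // Returns the k largest values of data, in ascending order.
    static int[] topK(int[] data, int k) {
        if (k <= 0) {
            return new int[0];
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>(k); // min-heap: peek() is the smallest kept value
        for (int value : data) {
            if (heap.size() < k) {
                heap.offer(value);          // fill the heap with the first k values
            } else if (value > heap.peek()) {
                heap.poll();                // drop the current minimum
                heap.offer(value);          // keep the larger value instead
            }
        }
        int[] result = new int[heap.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = heap.poll();        // polling a min-heap yields ascending order
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {41, 34, 39, 58, 37, 9, 70, 18, 97, 75, 92};
        System.out.println(Arrays.toString(topK(data, 3))); // [75, 92, 97]
    }
}
```

Each element costs at most O(log K) work, so the whole scan is O(n log K) instead of the O(n log n) needed to sort everything.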
Building and adjusting the min-heap
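import java.util.Arrays;

@SuppressWarnings("unchecked")
public class MinHeap<T> {
    private Object[] queue; // array-backed complete binary tree
    private int size;       // number of elements currently stored

    public MinHeap() {
        queue = new Object[11];
    }

    public MinHeap(int capacity) {
        queue = new Object[capacity];
    }

    // Appends t at the end of the heap and sifts it up to its position.
    public boolean offer(T t) {
        if (size == queue.length) // grow the backing array when it is full
            queue = Arrays.copyOf(queue, Math.max(1, size << 1));
        int k = size;
        size++;
        moveUp(k, t);
        return true;
    }

    // Moves t up from index k until its parent is strictly smaller than t.
    public void moveUp(int k, T t) {
        Comparable<? super T> key = (Comparable<? super T>) t;
        while (k > 0) {
            int parent = (k - 1) >>> 1;
            Object e = queue[parent];
            if (key.compareTo((T) e) > 0)
                break;
            queue[k] = e;
            k = parent;
        }
        queue[k] = key;
    }

    // Removes and returns the smallest element, or null if the heap is empty.
    public T poll() {
        if (size == 0)
            return null;
        int s = --size;
        T result = (T) queue[0];
        T end = (T) queue[s];
        queue[s] = null;
        if (s != 0)
            moveDown(0, end); // re-insert the last element starting from the root
        return result;
    }

    // Moves end down from index k until it is no larger than both of its children.
    public void moveDown(int k, T end) {
        Comparable<? super T> key = (Comparable<? super T>) end;
        int half = size >>> 1; // indices >= half are leaves
        while (k < half) {
            int left = (k << 1) + 1;
            int right = left + 1;
            Object c = queue[left];
            // pick the smaller of the two children
            if (right < size && ((Comparable<? super T>) c).compareTo((T) queue[right]) > 0) {
                c = queue[left = right];
            }
            if (key.compareTo((T) c) <= 0)
                break;
            queue[k] = c;
            k = left;
        }
        queue[k] = key;
    }

    // Overwrites the root; the caller must then call moveDown(0, t) to restore the heap.
    boolean setHead(T t) {
        queue[0] = t;
        return true;
    }

    // Returns the smallest element without removing it, or null if the heap is empty.
    public T peek() {
        return size == 0 ? null : (T) queue[0];
    }
}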
Selecting the K largest numbers from an array
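import java.util.Arrays;
import java.util.Random;

public class TopK {
    private static Random random = new Random();

    // Generates n random integers in the range [0, 100).
    public static int[] factory(int n) {
        int[] data = new int[n];
        for (int i = 0; i < n; i++)
            data[i] = random.nextInt(100);
        return data;
    }

    // Prints the n largest elements of array in ascending order.
    public void find(int[] array, int n) {
        if (n <= 0 || n > array.length)
            throw new IllegalArgumentException("n must be between 1 and array.length");
        MinHeap<Integer> minHeap = new MinHeap<>(n);
        // Build a min-heap from the first n elements.
        for (int i = 0; i < n; i++) {
            minHeap.offer(array[i]);
        }
        // A later element belongs in the top n only if it beats the current minimum.
        for (int j = n; j < array.length; j++) {
            if (array[j] > minHeap.peek()) {
                minHeap.setHead(array[j]);
                minHeap.moveDown(0, array[j]);
            }
        }
        System.out.print("[");
        for (int t = 0; t < n - 1; t++)
            System.out.print(minHeap.poll() + ", ");
        System.out.println(minHeap.poll() + "]");
    }

    public static void main(String[] args) {
        int[] data = factory(11);
        System.out.println(Arrays.toString(data));
        TopK topK = new TopK();
        topK.find(data, 10);
    }
}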
Output:
[41, 34, 39, 58, 37, 9, 70, 18, 97, 75, 92]
[18, 34, 37, 39, 41, 58, 70, 75, 92, 97]
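The introduction notes that selecting the K smallest numbers uses a max-heap instead. A minimal sketch of that mirrored case, assuming `java.util.PriorityQueue` with a reversed comparator as the max-heap (the names `SmallestK` and `smallestK` are hypothetical):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

final class SmallestK {
    // Returns the k smallest values of data, in descending order.
    static int[] smallestK(int[] data, int k) {
        if (k <= 0) {
            return new int[0];
        }
        // Reversing the natural order turns the queue into a max-heap.
        PriorityQueue<Integer> maxHeap =
                new PriorityQueue<Integer>(k, Collections.<Integer>reverseOrder());
        for (int value : data) {
            if (maxHeap.size() < k) {
                maxHeap.offer(value);
            } else if (value < maxHeap.peek()) {
                maxHeap.poll();       // drop the largest value kept so far
                maxHeap.offer(value); // keep the smaller value instead
            }
        }
        int[] result = new int[maxHeap.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = maxHeap.poll(); // polling a max-heap yields descending order
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {41, 34, 39, 58, 37, 9, 70, 18, 97, 75, 92};
        System.out.println(Arrays.toString(smallestK(data, 3))); // [34, 18, 9]
    }
}
```

The root of the max-heap is the largest of the K candidates kept so far, so it is the one to evict whenever a smaller value arrives.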
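Heap sort relies on the properties of a min-heap or a max-heap. Its time complexity is O(n log n) and its space complexity is O(1). Heap sort sorts in place; it is generally slower than quicksort but uses less memory, so it is worth considering when memory is constrained or when solving problems similar to Top K.
Note that neither heap sort nor quicksort is a stable sorting algorithm.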
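To see what instability means for heap sort, here is a small sketch that sorts tagged items by key only. It assumes a max-heap based, ascending heap sort written for this illustration; `HeapSortStability`, `Item`, and `siftDown` are hypothetical names.

```java
import java.util.Arrays;

final class HeapSortStability {
    // A key plus a tag that records the item's original position.
    static final class Item {
        final int key;
        final String tag;
        Item(int key, String tag) { this.key = key; this.tag = tag; }
        @Override public String toString() { return key + tag; }
    }

    // Restores the max-heap property (by key) for the subtree rooted at i within a[0..size).
    static void siftDown(Item[] a, int size, int i) {
        Item temp = a[i];
        int half = size >>> 1; // indices >= half are leaves
        while (i < half) {
            int child = (i << 1) + 1;
            if (child + 1 < size && a[child + 1].key > a[child].key) {
                child++; // pick the larger child
            }
            if (temp.key >= a[child].key) {
                break;
            }
            a[i] = a[child];
            i = child;
        }
        a[i] = temp;
    }

    // Sorts ascending by key, ignoring the tag.
    static void heapSort(Item[] a) {
        for (int i = (a.length >>> 1) - 1; i >= 0; i--) {
            siftDown(a, a.length, i); // build the max-heap bottom-up
        }
        for (int end = a.length - 1; end > 0; end--) {
            Item t = a[0]; a[0] = a[end]; a[end] = t; // move the current maximum to the end
            siftDown(a, end, 0);
        }
    }

    public static void main(String[] args) {
        Item[] items = { new Item(2, "a"), new Item(2, "b"), new Item(1, "c") };
        heapSort(items);
        System.out.println(Arrays.toString(items)); // [1c, 2b, 2a]
    }
}
```

The two items with key 2 entered as a, b and come out as b, a: the long-distance swap between the root and the last position does not preserve the relative order of equal keys.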
Java heap sort example:
import java.util.Arrays;
import java.util.Random;

public class HeapSort {
    /**
     * Builds a min-heap bottom-up, starting from the last non-leaf node.
     * @param array the array to heapify in place
     */
    public static void buildHeap(int[] array) {
        for (int t = array.length / 2 - 1; t >= 0; t--) {
            heapify(array, array.length, t);
        }
    }

    /**
     * @param array the array holding the heap
     * @param size  length of the portion of the array still to be sorted
     * @param t     index from which to sift down
     */
    public static void heapify(int[] array, int size, int t) {
        int half = size >>> 1; // indices >= half are leaves
        int temp = array[t];
        while (t < half) {
            int left = (t << 1) + 1;
            int right = left + 1;
            int min = array[left];
            // pick the smaller of the two children
            if (right < size && min > array[right])
                min = array[left = right];
            if (temp < min)
                break;
            array[t] = min;
            t = left;
        }
        array[t] = temp;
    }

    /**
     * Sorts an array that has already been turned into a min-heap by buildHeap.
     * Each pass moves the current minimum to the end, so the result is in descending order.
     */
    public static void sort(int[] array) {
        for (int i = array.length - 1; i > 0; i--) {
            int temp = array[0];
            array[0] = array[i];
            array[i] = temp;
            heapify(array, i, 0);
        }
    }

    public static Random random = new Random();

    // Generates i random integers in the range [0, 100).
    public static int[] factory(int i) {
        int[] array = new int[i];
        for (int t = 0; t < i; t++) {
            array[t] = random.nextInt(100);
        }
        return array;
    }

    public static void main(String[] args) {
        int[] array = factory(20);
        System.out.println("Initial array: " + Arrays.toString(array));
        buildHeap(array);
        System.out.println("Heapified array: " + Arrays.toString(array));
        sort(array);
        System.out.println("Array after heap sort: " + Arrays.toString(array));
    }
}
Output:
Initial array: [96, 77, 19, 14, 12, 91, 43, 36, 56, 21, 91, 37, 21, 48, 16, 14, 4, 37, 83, 39]
Heapified array: [4, 12, 16, 14, 19, 37, 21, 14, 37, 21, 91, 91, 43, 77, 48, 96, 36, 56, 83, 39]
Array after heap sort: [96, 91, 91, 83, 77, 56, 48, 43, 39, 37, 37, 36, 21, 21, 19, 16, 14, 14, 12, 4]