上篇堆排序介绍堆排序的基本过程,本文说下堆排序的应用实例。
大纲
- 创建Huffman树
- 合并K个有序数组
1. 创建Huffman树
这里默认我们清楚构建哈夫曼树的过程。那为什么构建哈夫曼树会用到堆排序呢?我们知道在构建哈夫曼树的过程中,每次都要选两个最小的元素,并且添加一个新元素,如果是普通排序的话,最快也需要O(nlogn)。但是如果是堆排序的话,每次调整只是O(logn),这样时间复杂度就少了很多。
但是与堆排序的过程有点区别的是,我们除了实现普通堆排序的过程,还需要添加一个push函数(即添加元素)。而且,细心考虑后会发现,在添加前,root节点比所有的节点值都小,添加元素后只需要调整左子树或右子树即可,而不需要把整棵树都调整一遍。
class Node:
def __init__(self, freq, left=None, right=None):
self.left = left
self.right = right
self.freq = freq
class MinNodeHeap(object):
def __init__(self):
self.heap = []
self.size = 0
def push(self, node):
self.heap.append(node)
cur = self.size
print "before push cur_size", self.size
father = (cur - 1) // 2
while cur > 0:
if self.heap[father].freq > self.heap[cur].freq:
self.heap[father], self.heap[cur] = self.heap[cur], self.heap[father]
cur = father
father = (cur - 1) // 2
else:
break
self.size += 1
def _adjust(self, begin, end):
root = begin
while True:
child = root * 2 + 1
if child > end:
break
if child + 1 <= end and self.heap[child + 1].freq < self.heap[child].freq:
child = child + 1
if child <= end and self.heap[root].freq > self.heap[child].freq:
self.heap[root], self.heap[child] = self.heap[child], self.heap[root]
root = child
else:
break
def pop(self):
top = self.heap[0]
self.heap[0] = self.heap[self.size - 1]
self.size -= 1
self.heap = self.heap[:self.size]
self._adjust(0, self.size - 1)
return top
def create_huffman_tree(array):
# init
heap = MinNodeHeap()
for i in array:
node = Node(i)
heap.push(node)
# create
while heap.size > 1:
left = heap.pop()
right = heap.pop()
new_node = Node(left.freq + right.freq, left, right)
heap.push(new_node)
return heap.heap[0]
def print_huffman_tree(root):
queue = []
queue.append(root)
while queue:
root = queue.pop(0)
print root.freq,
if root.left:
queue.append(root.left)
if root.right:
queue.append(root.right)
def print_heap(heap):
for n in heap:
print n.freq,
if __name__ == "__main__":
array = [19,21,2,3,6,7,10,32]
root = create_huffman_tree(array)
print root.freq
print_huffman_tree(root)
需要注意的坑:
- 因为heap里是node,所以在比较的时候一定要带上freq,不然会出错。
- 在遍历Huffman树时,一定要注意更新root节点,不然陷入死循环了。
- 一定要弄清楚下标的计算!!!
2. 合并K个有序数组
这个是对合并两个有序数组的扩展,可能会有点好奇,为什么多个有序数组的合并会用到堆排序?因此,先回顾下两个有序数组合并的步骤。在合并两个有序数组的时候,我们需要比较两个数组各自最小的值,然后将两者较小的那个添加到结果列表中。当数组扩大到K个的时候,每次我们则需要比较K个数组最小的值,但是,我们每次只会取出K个数中最小的那个,其余的不会变。这种情况和最小堆相似,因此这个操作可以借助最小堆完成,并且每次比较的时间复杂度$log(K)$
,若共有N个数,则一共需要比较N次,因此最终复杂度为$Nlog(K)$
.
# -*- coding:utf-8 -*-
from collections import namedtuple
Item = namedtuple("Item", ["array_index", "value"])
class MinHeap(object):
def __init__(self):
self.heap = []
self.size = 0
def top(self):
return self.heap[0]
def _adjust_heap(self, begin, end):
root = begin
while True:
child = root * 2 + 1
if child > end:
break
if child + 1 <= end and self.heap[child+1].value < self.heap[child].value:
child = child + 1
if self.heap[root].value > self.heap[child].value:
self.heap[root], self.heap[child] = self.heap[child], self.heap[root]
root = child
else:
break
def add(self, item):
self.heap.append(item)
self.size += 1
self._adjust_heap(0, self.size-1)
def pop_add(self, item):
self.heap[0] = item
self._adjust_heap(0, self.size-1)
def pop(self):
self.heap[0], self.heap[self.size-1] = self.heap[self.size-1], self.heap[0]
self.size -= 1
#self.heap = self.heap[:self.size]
self._adjust_heap(0, self.size-1)
def merge_k_sorted_list(arrays, k):
indices = [0] * k
sizes = [len(array) for array in arrays]
total_size = sum(sizes)
mins = MinHeap()
results = []
for ind, array in enumerate(arrays):
mins.add(Item(ind, array[0]))
while len(results) < total_size:
_array_ind, _min_value = mins.top()
results.append(_min_value)
indices[_array_ind] += 1
if indices[_array_ind] < sizes[_array_ind]:
_candidate_value = arrays[_array_ind][indices[_array_ind]]
mins.pop_add(Item(_array_ind, _candidate_value))
else:
mins.pop()
return results
if __name__ == "__main__":
arrays = [[1, 3, 5],
[2, 4, 6, 9, 10],
[3, 7, 8, 13]]
arrays = [[1, 4, 7],
[2, 5, 8],
[3, 6, 9, 10, 13]]
arrays = [[1, 4, 7],
[2, 5, 8],
[14, 15, 16, 17, 18, 19],
[3, 6, 9, 10, 13]]
k = len(arrays)
print merge_k_sorted_list(arrays, k)
实现中除了堆排序的操作,其次是要记录最小值是来自哪个数组,以及每个数组遍历的位置。