堆排序 ---应用篇

上篇堆排序介绍堆排序的基本过程,本文说下堆排序的应用实例。

大纲

  1. 创建Huffman树
  2. 合并K个有序数组

1. 创建Huffman树

这里默认我们清楚构建哈夫曼树的过程。那为什么构建哈夫曼树会用到堆排序呢?我们知道在构建哈夫曼树的过程中,每次都要选两个最小的元素,并且添加一个新元素,如果是普通排序的话,最快也需要O(nlogn)。但是如果是堆排序的话,每次调整只是O(logn),这样时间复杂度就少了很多。

但是与堆排序的过程有点区别的是,我们除了实现普通堆排序的过程,还需要添加一个push函数(即添加元素)。而且,细心考虑后会发现,在添加前,root节点比所有的节点值都小,添加元素后只需要调整左子树或右子树即可,而不需要把整棵树都调整一遍。

class Node:
    def __init__(self, freq, left=None, right=None):
        self.left = left
        self.right = right
        self.freq = freq
        

class MinNodeHeap(object):
    def __init__(self):
        self.heap = []
        self.size = 0

    def push(self, node):
        self.heap.append(node)
        cur = self.size
        print "before push cur_size", self.size
        father = (cur - 1) // 2
        while cur > 0:
            if self.heap[father].freq > self.heap[cur].freq:
                self.heap[father], self.heap[cur] = self.heap[cur], self.heap[father]
                cur = father
                father = (cur - 1) // 2
            else:
                break
        self.size += 1

    def _adjust(self, begin, end):
        root = begin
        while True:
            child = root * 2 + 1
            if child > end:
                break
            if child + 1 <= end and self.heap[child + 1].freq < self.heap[child].freq:
                child = child + 1
            if child <= end and self.heap[root].freq > self.heap[child].freq:
                self.heap[root], self.heap[child] = self.heap[child], self.heap[root]
                root = child
            else:
                break

    def pop(self):
        top = self.heap[0]
        self.heap[0] = self.heap[self.size - 1]
        self.size -= 1
        self.heap = self.heap[:self.size]
        self._adjust(0, self.size - 1)
        return top


def create_huffman_tree(array):
    # init
    heap = MinNodeHeap()
    for i in array:
        node = Node(i)
        heap.push(node)

    # create
    while heap.size > 1:
        left = heap.pop()
        right = heap.pop()
        new_node = Node(left.freq + right.freq, left, right)
        heap.push(new_node)
    return heap.heap[0]


def print_huffman_tree(root):
    queue = []
    queue.append(root)
    while queue:
        root = queue.pop(0)
        print root.freq,
        if root.left:
            queue.append(root.left)
        if root.right:
            queue.append(root.right)


def print_heap(heap):
    for n in heap:
        print n.freq,


if __name__ == "__main__":
    array = [19,21,2,3,6,7,10,32]
    root = create_huffman_tree(array)
    print root.freq
    print_huffman_tree(root)

需要注意的坑:

  1. 因为heap里是node,所以在比较的时候一定要带上freq,不然会出错。
  2. 在遍历Huffman树时,一定要注意更新root节点,不然陷入死循环了。
  3. 一定要弄清楚下标的计算!!!

2. 合并K个有序数组

这个是对合并两个有序数组的扩展,可能会有点好奇,为什么多个有序数组的合并会用到堆排序?因此,先回顾下两个有序数组合并的步骤。在合并两个有序数组的时候,我们需要比较两个数组各自最小的值,然后将两者较小的那个添加到结果列表中。当数组扩大到K个的时候,每次我们则需要比较K个数组最小的值,但是,我们每次只会取出K个数中最小的那个,其余的不会变。这种情况和最小堆相似,因此这个操作可以借助最小堆完成,并且每次比较的时间复杂度$log(K)$,若共有N个数,则一共需要比较N次,因此最终复杂度为$Nlog(K)$.

# -*- coding:utf-8 -*-
from collections import namedtuple

Item = namedtuple("Item", ["array_index", "value"])

class MinHeap(object):
    def __init__(self):
        self.heap = []
        self.size = 0

    def top(self):
        return self.heap[0]
 
    def _adjust_heap(self, begin, end):
        root = begin
        while True:
            child = root * 2 + 1
            if child > end:
                break
            if child + 1 <= end and self.heap[child+1].value < self.heap[child].value:
                child = child + 1
            if self.heap[root].value > self.heap[child].value:
                self.heap[root], self.heap[child] = self.heap[child], self.heap[root]
                root = child
            else:
                break

    def add(self, item):
        self.heap.append(item)
        self.size += 1
        self._adjust_heap(0, self.size-1)
 
    def pop_add(self, item):
        self.heap[0] = item
        self._adjust_heap(0, self.size-1)

    def pop(self):
        self.heap[0], self.heap[self.size-1] = self.heap[self.size-1], self.heap[0]
        self.size -= 1
        #self.heap = self.heap[:self.size]
        self._adjust_heap(0, self.size-1)


def merge_k_sorted_list(arrays, k):
    indices = [0] * k
    sizes = [len(array) for array in arrays]
    total_size = sum(sizes)
    mins = MinHeap()
    results = []
    for ind, array in enumerate(arrays):
        mins.add(Item(ind, array[0]))
 
    while len(results) < total_size:
        _array_ind, _min_value = mins.top()
        results.append(_min_value)
        indices[_array_ind] += 1
        if indices[_array_ind] < sizes[_array_ind]:
            _candidate_value = arrays[_array_ind][indices[_array_ind]]
            mins.pop_add(Item(_array_ind, _candidate_value))
        else:
            mins.pop()
    return results


if __name__ == "__main__":
    arrays = [[1, 3, 5],
            [2, 4, 6, 9, 10],
            [3, 7, 8, 13]]
    arrays = [[1, 4, 7],
            [2, 5, 8],
            [3, 6, 9, 10, 13]]
    arrays = [[1, 4, 7], 
            [2, 5, 8], 
            [14, 15, 16, 17, 18, 19],
            [3, 6, 9, 10, 13]]
    k = len(arrays) 
    print merge_k_sorted_list(arrays, k)

实现中除了堆排序的操作,其次是要记录最小值是来自哪个数组,以及每个数组遍历的位置。

你可能感兴趣的:(堆排序 ---应用篇)