Heap in Python

A heap is the data structure most often used to implement a priority queue. Background can be found in the [Wikipedia article](https://en.wikipedia.org/wiki/Heap_(data_structure)).

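"Heap" is the collective name for a special class of data structures in computer science. A heap is usually an array object that can be viewed as a tree. In a plain queue, the scheduler repeatedly takes the first job in the queue and runs it. In practice, though, some short jobs would wait a long time before they finish, and some jobs that are not short but are important should also get priority. The heap is a data structure designed to solve this kind of problem.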

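Logical definition:

A sequence of n elements {k_1, k_2, ..., k_i, ..., k_n} is called a heap if and only if it satisfies one of the following relations:
(k_i <= k_2i and k_i <= k_2i+1) or (k_i >= k_2i and k_i >= k_2i+1), for i = 1, 2, 3, ..., floor(n/2), where the condition on k_2i+1 applies only when 2i+1 <= n.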
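
The definition above can be checked directly against a Python list. The following is a minimal sketch; the function name `is_min_heap` is made up for illustration, and it tests only the first (min-heap) form of the relation, translating the 1-based indexes of the definition to Python's 0-based lists.

```python
def is_min_heap(items):
    """Return True if items satisfies k_i <= k_2i and k_i <= k_2i+1 (1-based)."""
    n = len(items)
    for i in range(1, n // 2 + 1):      # i is a 1-based index, as in the definition
        left, right = 2 * i, 2 * i + 1
        if items[i - 1] > items[left - 1]:
            return False
        # The right child exists only when 2i+1 <= n.
        if right <= n and items[i - 1] > items[right - 1]:
            return False
    return True


print(is_min_heap([4, 9, 19, 10, 11]))   # True
print(is_min_heap([19, 9, 4, 10, 11]))   # False
```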

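Properties:

Heaps are usually implemented as binary heaps, which are a kind of binary tree. Because this implementation is so common, an unqualified "heap" refers to it. The structure has the following properties.

Every node is no greater than (or no less than) all of its descendants, so the minimum (or maximum) element is at the root of the heap (heap order).
A heap is always a complete tree: every level except the bottom one is completely filled, and the bottom level is filled from left to right as far as possible.

A heap whose root holds the largest element is called a max-heap, and a heap whose root holds the smallest element is called a min-heap. Common kinds of heap include the binary heap and the Fibonacci heap.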

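Basic operations supported

| Operation | Description | Time complexity |
| --- | --- | --- |
| build | Build a heap from n existing elements | O(n) |
| insert | Insert a new element into the heap | O(log n) |
| update | Move a newly added element up until the heap property holds | O(log n) |
| get | Return the value at the top of the heap | O(1) |
| delete | Remove the element at the top of the heap | O(log n) |
| heapify | Restore the heap property after the top element has been removed | O(log n) |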
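
As a rough illustration, most of these operations map onto Python's standard-library heapq module. The pairing in the comments below is an informal reading, not an official correspondence; heapq has no separate public "update" step because heappush() and heappop() restore the heap property themselves.

```python
import heapq

data = [19, 9, 4, 10, 11]

heapq.heapify(data)            # build: rearrange an existing list into a heap, O(n)
heapq.heappush(data, 7)        # insert: add an element and sift it into place, O(log n)
top = data[0]                  # get: the smallest element is always at index 0, O(1)
removed = heapq.heappop(data)  # delete: remove and return the smallest, O(log n)

print(top, removed)   # 4 4
print(data[0])        # 7
```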

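Some heap implementations support additional operations. For example, a Fibonacci heap supports checking whether a given element exists in the heap.

Example program:

To insert an element X into the heap, take the next free position and create a hole there. If placing X in the hole satisfies the heap order, the insertion is done. Otherwise, move the parent's element into the hole, which moves the hole up one level, and repeat until the heap order is satisfied. This strategy is called percolate up. [1]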

void Insert( ElementType X, PriorityQueue H )
{
    int i;

    if( IsFull(H) )
    {
        printf( "Queue is full.\n" );
        return;
    }

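    // Percolate up: slide parents down into the hole until X fits.
    // Assumes Elements[0] holds a sentinel no larger than any key,
    // so the loop stops at the root.
    for( i = ++H->Size; H->Elements[i/2] > X; i /= 2 )
        H->Elements[i] = H->Elements[i/2];
    H->Elements[i] = X;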
}

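The code above inserts an element into a binary heap.
DeleteMin removes the minimum element, which is the root of the binary tree. After the root is removed, the last element in the queue has to be moved to some position in the heap so that the heap order still holds. This process of moving elements downward is called percolate down.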

ElementType
DeleteMin( PriorityQueue H )
{
    int i, Child;
    ElementType MinElement, LastElement;

    if( IsEmpty( H ) )
    {
        printf( "Queue is empty.\n" );
        return H->Elements[0];
    }
    MinElement = H->Elements[1];
    LastElement = H->Elements[H->Size--];

    for( i = 1; i*2 <= H->Size; i = Child )
    {
        // Find smaller child.
        Child = i*2;
        if( Child != H->Size && H->Elements[Child+1]
                             <  H->Elements[Child] )
            Child++;

        // Percolate one level.
        if( LastElement > H->Elements[Child] )
            H->Elements[i] = H->Elements[Child];
        else
            break;
    }
    H->Elements[i] = LastElement;
    return MinElement;
}
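
For comparison, the same two strategies can be sketched in Python. This is an illustrative sketch rather than a translation of the C code above: the class name `MinHeap` and its method names are made up here, it uses 0-based indexes (so the children of index i are 2i+1 and 2i+2), and it needs no sentinel element.

```python
class MinHeap:
    """Array-backed binary min-heap using 0-based indexes."""

    def __init__(self):
        self.items = []

    def insert(self, x):
        # Percolate up: move the hole toward the root while the parent is larger.
        self.items.append(x)
        i = len(self.items) - 1
        while i > 0 and self.items[(i - 1) // 2] > x:
            self.items[i] = self.items[(i - 1) // 2]
            i = (i - 1) // 2
        self.items[i] = x

    def delete_min(self):
        # Percolate down: fill the hole at the root with the smaller child
        # until the former last element fits.
        if not self.items:
            raise IndexError("delete_min from an empty heap")
        smallest = self.items[0]
        last = self.items.pop()
        n = len(self.items)
        if n == 0:
            return smallest
        i = 0
        while 2 * i + 1 < n:
            child = 2 * i + 1
            if child + 1 < n and self.items[child + 1] < self.items[child]:
                child += 1
            if last > self.items[child]:
                self.items[i] = self.items[child]
                i = child
            else:
                break
        self.items[i] = last
        return smallest


heap = MinHeap()
for value in [19, 9, 4, 10, 11]:
    heap.insert(value)
print([heap.delete_min() for _ in range(5)])   # [4, 9, 10, 11, 19]
```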

Applications

Heap sort, or more generally using a heap's ordering to pick items by priority.
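
A heap sort can be sketched in a few lines with the standard-library heapq module. The function name `heapsort` is chosen here for illustration; the approach is simply to heapify a copy of the input and pop the smallest element until the heap is empty.

```python
import heapq


def heapsort(iterable):
    """Return a new sorted list by repeatedly popping the smallest element."""
    heap = list(iterable)      # copy, so the caller's data is left untouched
    heapq.heapify(heap)        # O(n)
    return [heapq.heappop(heap) for _ in range(len(heap))]   # n pops, O(log n) each


print(heapsort([19, 9, 4, 10, 11]))   # [4, 9, 10, 11, 19]
```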

The heapq module in Python -- Heap Sort Algorithm

Purpose:

The heapq module implements a min-heap sort algorithm suitable for use with Python’s lists.

A heap is a tree-like data structure in which the child nodes have a sort-order relationship with their parents. Binary heaps can be represented using a list or array organized so that the children of element N are at positions 2N+1 and 2N+2 (for zero-based indexes). This layout makes it possible to rearrange heaps in place, so it is not necessary to reallocate as much memory when adding or removing items.
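
The index arithmetic can be made concrete with two small helpers. The names `parent` and `children` are hypothetical, introduced only for this sketch; the list is a valid min-heap built from the sample values used elsewhere in this article.

```python
def parent(n):
    """Index of the parent of element n (zero-based, n > 0)."""
    return (n - 1) // 2


def children(n, size):
    """Indexes of the children of element n that fall inside a heap of `size` items."""
    return [c for c in (2 * n + 1, 2 * n + 2) if c < size]


heap = [4, 9, 19, 10, 11]
for n, value in enumerate(heap):
    kids = [heap[c] for c in children(n, len(heap))]
    print('index {} value {:>2} children {}'.format(n, value, kids))
```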

A max-heap ensures that the parent is larger than or equal to both of its children. A min-heap requires that the parent be less than or equal to its children. Python’s heapq module implements a min-heap.
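
Since heapq only provides a min-heap, one common workaround for numeric data is to store negated values, which makes the min-heap behave like a max-heap. This is a sketch of that trick, not a feature of the module; it works only for values that can be negated.

```python
import heapq

data = [19, 9, 4, 10, 11]

# Negate every value so the smallest stored item is the largest original one.
max_heap = [-n for n in data]
heapq.heapify(max_heap)

largest = -heapq.heappop(max_heap)   # undo the negation when reading values back
next_largest = -max_heap[0]          # peek without removing

print(largest)        # 19
print(next_largest)   # 11
```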

Example Data

The examples in this section use the data in heapq_heapdata.py.

heapq_heapdata.py

# This data was generated with the random module.

data = [19, 9, 4, 10, 11]

The heap output is printed using heapq_showtree.py:

heapq_showtree.py

import math
from io import StringIO


def show_tree(tree, total_width=36, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = -1
    for i, n in enumerate(tree):
        if i:
            row = int(math.floor(math.log(i + 1, 2)))
        else:
            row = 0
        if row != last_row:
            output.write('\n')
        columns = 2 ** row
        col_width = int(math.floor(total_width / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print(output.getvalue())
    print('-' * total_width)
    print()

Creating a Heap

There are two basic ways to create a heap: heappush() and heapify().

heapq_heappush.py

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heap = []
print('random :', data)
print()

for n in data:
    print('add {:>3}:'.format(n))
    heapq.heappush(heap, n)
    show_tree(heap)

When heappush() is used, the heap sort order of the elements is maintained as new items are added from a data source.

$ python3 heapq_heappush.py

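random : [19, 9, 4, 10, 11]

add  19:

                 19
------------------------------------

add   9:

                 9
        19
------------------------------------

add   4:

                 4
        19                9
------------------------------------

add  10:

                 4
        10                9
    19
------------------------------------

add  11:

                 4
        10                9
    19       11
------------------------------------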

If the data is already in memory, it is more efficient to use heapify() to rearrange the items of the list in place.

heapq_heapify.py

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print('random    :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)

Building a list in heap order one item at a time and building it unordered and then calling heapify() both produce a valid heap, although the arrangement of the elements inside the list can differ between the two.

$ python3 heapq_heapify.py

random    : [19, 9, 4, 10, 11]
heapified :

                 4
        9                 19
    10       11
------------------------------------
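
The two construction methods can be compared side by side. The short sketch below uses the same sample values inline rather than importing heapq_heapdata, so that it runs on its own. Both lists satisfy the heap property and contain the same items, but their internal layouts differ.

```python
import heapq

data = [19, 9, 4, 10, 11]

# Build one heap by pushing items one at a time.
pushed = []
for n in data:
    heapq.heappush(pushed, n)

# Build another by heapifying a copy of the whole list at once.
heapified = list(data)
heapq.heapify(heapified)

print(pushed)      # [4, 10, 9, 19, 11]
print(heapified)   # [4, 9, 19, 10, 11]
print(pushed[0] == heapified[0])   # True: the smallest item is at index 0 in both
```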

Accessing Contents of a Heap

Once the heap is organized correctly, use heappop() to remove the element with the lowest value.

heapq_heappop.py

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print('random    :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)
print()

for i in range(2):
    smallest = heapq.heappop(data)
    print('pop    {:>3}:'.format(smallest))
    show_tree(data)

In this example, adapted from the stdlib documentation, heapify() and heappop() are used to sort a list of numbers.

$ python3 heapq_heappop.py

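random    : [19, 9, 4, 10, 11]
heapified :

                 4
        9                 19
    10       11
------------------------------------


pop      4:

                 9
        10                19
    11
------------------------------------

pop      9:

                 10
        11                19
------------------------------------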

To remove the smallest element and replace it with a new value in a single operation, use heapreplace().

heapq_heapreplace.py

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heapq.heapify(data)
print('start:')
show_tree(data)

for n in [0, 13]:
    smallest = heapq.heapreplace(data, n)
    print('replace {:>2} with {:>2}:'.format(smallest, n))
    show_tree(data)

Replacing elements in place makes it possible to maintain a fixed-size heap, such as a queue of jobs ordered by priority.

$ python3 heapq_heapreplace.py

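start:

                 4
        9                 19
    10       11
------------------------------------

replace  4 with  0:

                 0
        9                 19
    10       11
------------------------------------

replace  0 with 13:

                 9
        10                19
    13       11
------------------------------------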
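
One way to use such a fixed-size heap is to keep only the k largest values seen so far in a stream. The sketch below assumes a hypothetical helper named `k_largest`. The heap never grows beyond k items, and heapreplace() swaps out the current smallest only when a larger value arrives.

```python
import heapq


def k_largest(stream, k):
    """Return the k largest values from stream, largest first."""
    if k <= 0:
        return []
    heap = []
    for value in stream:
        if len(heap) < k:
            heapq.heappush(heap, value)      # still filling the heap
        elif value > heap[0]:
            heapq.heapreplace(heap, value)   # drop the smallest, add the new value
    return sorted(heap, reverse=True)


print(k_largest([19, 9, 4, 10, 11, 13, 0], 3))   # [19, 13, 11]
```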

Data Extremes From a Heap

heapq also includes two functions that examine an iterable to find a range of the largest or smallest values it contains.

heapq_extremes.py

import heapq
from heapq_heapdata import data

print('all       :', data)
print('3 largest :', heapq.nlargest(3, data))
print('from sort :', list(reversed(sorted(data)[-3:])))
print('3 smallest:', heapq.nsmallest(3, data))
print('from sort :', sorted(data)[:3])

Using nlargest() and nsmallest() is efficient only for relatively small values of n > 1, but can still come in handy in a few cases.

$ python3 heapq_extremes.py

all       : [19, 9, 4, 10, 11]
3 largest : [19, 11, 10]
from sort : [19, 11, 10]
3 smallest: [4, 9, 10]
from sort : [4, 9, 10]
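
Both functions also accept a key argument, which is useful when the items are records rather than plain numbers. The job records below are invented purely for illustration.

```python
import heapq

# Hypothetical job records; a lower 'priority' number means more urgent.
jobs = [
    {'name': 'write report', 'priority': 3},
    {'name': 'fix bug', 'priority': 1},
    {'name': 'deploy', 'priority': 2},
    {'name': 'lunch', 'priority': 5},
]

most_urgent = heapq.nsmallest(2, jobs, key=lambda job: job['priority'])
least_urgent = heapq.nlargest(1, jobs, key=lambda job: job['priority'])

print([job['name'] for job in most_urgent])    # ['fix bug', 'deploy']
print([job['name'] for job in least_urgent])   # ['lunch']
```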

Efficiently Merging Sorted Sequences

Combining several sorted sequences into one new sequence is easy for small data sets.

list(sorted(itertools.chain(*data)))

For larger data sets, this technique can use a considerable amount of memory. Instead of sorting the entire combined sequence, merge() uses a heap to generate a new sequence one item at a time, and determines the next item using a fixed amount of memory.

heapq_merge.py

import heapq
import random

random.seed(2016)

data = []
for i in range(4):
    new_data = list(random.sample(range(1, 101), 5))
    new_data.sort()
    data.append(new_data)

for i, d in enumerate(data):
    print('{}: {}'.format(i, d))

print('\nMerged:')
for i in heapq.merge(*data):
    print(i, end=' ')
print()

Because the implementation of merge() uses a heap, it consumes memory based on the number of sequences being merged, rather than the number of items in those sequences.

$ python3 heapq_merge.py

0: [33, 58, 71, 88, 95]
1: [10, 11, 17, 38, 91]
2: [13, 18, 39, 61, 63]
3: [20, 27, 31, 42, 45]

Merged:
10 11 13 17 18 20 27 31 33 38 39 42 45 58 61 63 71 88 91 95
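
Because merge() produces items lazily, it can even combine inputs that never end, provided each input is itself sorted. The generator `multiples` below is a made-up example, and itertools.islice() takes just the first ten merged values.

```python
import heapq
import itertools


def multiples(step):
    """Yield step, 2*step, 3*step, ... forever, in increasing order."""
    n = step
    while True:
        yield n
        n += step


# merge() returns an iterator; nothing is computed until items are requested.
merged = heapq.merge(multiples(3), multiples(5))
first_ten = list(itertools.islice(merged, 10))

print(first_ten)   # [3, 5, 6, 9, 10, 12, 15, 15, 18, 20]
```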

The application that impressed me most is LeetCode problem 23, Merge k Sorted Lists, which can be recast as merging queues with a heap:

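import heapq
import itertools


class ListNode(object):
    def __init__(self, x):
        self.val = x
        self.next = None

def mergeKLists(lists):
    current = dummy = ListNode(0)
    # A running counter breaks ties between equal values, so the heap never
    # has to compare two ListNode objects (a TypeError in Python 3).
    counter = itertools.count()

    heap = []
    for sorted_list in lists:
        if sorted_list:
            heapq.heappush(heap, (sorted_list.val, next(counter), sorted_list))

    while heap:
        # Pop the node with the smallest value and append it to the result.
        smallest = heapq.heappop(heap)[2]
        current.next = smallest
        current = current.next
        if smallest.next:
            heapq.heappush(heap, (smallest.next.val, next(counter), smallest.next))

    return dummy.next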

This turns a linked-list problem into a heap-ordering problem.

Heap problems in Python