Heap in Python
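A heap is the data structure most commonly used to implement a priority queue. See the [Wikipedia article](https://en.wikipedia.org/wiki/Heap_(data_structure)) for background.
In computer science, "heap" is the general name for a family of specialized data structures. A heap is usually an array that can be viewed as a tree. In a plain queue, the scheduler repeatedly takes the first job and runs it, but in practice a short job may then wait a long time behind others, and some jobs that are not short but are important should also be given priority. The heap is a data structure designed for this kind of problem.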
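Logical definition:
A sequence of n elements $\{k_1, k_2, \ldots, k_i, \ldots, k_n\}$ is called a heap if and only if one of the following relations holds:
($k_i \le k_{2i}$ and $k_i \le k_{2i+1}$) or ($k_i \ge k_{2i}$ and $k_i \ge k_{2i+1}$), for $i = 1, 2, \ldots, \lfloor n/2 \rfloor$, ignoring any index greater than $n$.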
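The definition above uses 1-based indices. As a minimal sketch, the same min-heap check can be written for a 0-based Python list, where the children of index i sit at 2i+1 and 2i+2. The function name `is_min_heap` is made up for illustration and is not part of any library.

```python
def is_min_heap(items):
    """Return True if every parent is <= each of its children (0-based)."""
    n = len(items)
    for i in range(n // 2):            # only the first n // 2 positions have children
        left, right = 2 * i + 1, 2 * i + 2
        if items[left] < items[i]:
            return False
        if right < n and items[right] < items[i]:
            return False
    return True


print(is_min_heap([4, 9, 19, 10, 11]))   # True
print(is_min_heap([19, 9, 4, 10, 11]))   # False
```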
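Properties:
A heap is usually implemented as a binary heap, which is a kind of binary tree. Because this implementation is so common, "heap" without further qualification refers to it. The structure has the following properties.
- Every node is less than or equal to (or greater than or equal to) all of its descendants, so the minimum (or maximum) element is at the root (the heap-order property).
- A heap is always a complete tree: every level except the lowest is completely filled, and the lowest level is filled from left to right.
A heap whose root holds the largest element is called a max-heap, and a heap whose root holds the smallest element is called a min-heap. Common heaps include the binary heap and the Fibonacci heap.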
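Basic operations supported
| Operation | Description | Time complexity |
| --- | --- | --- |
| build | Build a heap from n elements | O(n) |
| insert | Insert a new element into the heap | O(log n) |
| update | Sift a newly added element up until the heap property holds | O(log n) |
| get | Return the value of the element at the top of the heap | O(1) |
| delete | Remove the element at the top of the heap | O(log n) |
| heapify | Restore the heap property after the top element has been removed | O(log n) |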
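Some heap implementations support additional operations. A Fibonacci heap, for example, supports merging two heaps and decreasing the key of an element efficiently.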
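As a rough illustration, the basic operations listed above map onto Python's standard `heapq` module as follows. The sample numbers are arbitrary, and the complexity comments restate the usual binary-heap bounds.

```python
import heapq

heap = [19, 9, 4, 10, 11]
heapq.heapify(heap)              # build: O(n), rearranges the list in place
heapq.heappush(heap, 7)          # insert: O(log n)
smallest = heap[0]               # get: O(1), peek at the top without removing it
removed = heapq.heappop(heap)    # delete: O(log n), removes the smallest item
print(smallest, removed, heap[0])   # 4 4 7
```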
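Example code:
To insert an element X into the heap, create a hole at the next free position. If X can be placed in the hole without violating heap order, the insertion is done. Otherwise, move the hole's parent element down into the hole, which moves the hole up one level, and repeat until heap order is satisfied. This strategy is called percolate up. [1]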
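/* Assumes Elements[0] holds a sentinel smaller than any valid element,
   so the loop below stops at the root without an extra bounds check. */
void Insert( ElementType X, PriorityQueue H )
{
    int i;
    if( IsFull( H ) )
    {
        printf( "Queue is full.\n" );
        return;
    }
    /* Percolate the hole up until its parent is no larger than X. */
    for( i = ++H->Size; H->Elements[i/2] > X; i /= 2 )
        H->Elements[i] = H->Elements[i/2];
    H->Elements[i] = X;
}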
The code above inserts an element into a binary heap.
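For comparison, here is a minimal Python sketch of the same percolate-up idea on a 0-based list, so no sentinel slot is needed. The name `heap_insert` is hypothetical. In practice `heapq.heappush()` does this job.

```python
def heap_insert(heap, x):
    """Insert x into a 0-based min-heap list by moving a hole upward."""
    heap.append(x)                 # open a hole at the first free position
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= x:      # heap order holds, so the hole is in place
            break
        heap[i] = heap[parent]     # move the parent down into the hole
        i = parent
    heap[i] = x


heap = []
for n in [19, 9, 4, 10, 11]:
    heap_insert(heap, n)
print(heap)   # [4, 10, 9, 19, 11]
```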
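DeleteMin removes the minimum element, which is the root of the binary tree. After the root is removed, the last element in the queue must be moved to some position in the heap so that heap order still holds. This process of moving the hole downward is called percolate down.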
ElementType
DeleteMin( PriorityQueue H )
{
    int i, Child;
    ElementType MinElement, LastElement;
    if( IsEmpty( H ) )
    {
        printf( "Queue is empty.\n" );
        return H->Elements[0];
    }
    MinElement = H->Elements[1];
    LastElement = H->Elements[H->Size--];
    for( i = 1; i*2 <= H->Size; i = Child )
    {
        // Find smaller child.
        Child = i*2;
        if( Child != H->Size && H->Elements[Child+1]
                              < H->Elements[Child] )
            Child++;
        // Percolate one level.
        if( LastElement > H->Elements[Child] )
            H->Elements[i] = H->Elements[Child];
        else
            break;
    }
    H->Elements[i] = LastElement;
    return MinElement;
}
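The same percolate-down step can be sketched in Python on a 0-based list. The name `heap_delete_min` is made up for illustration, and it raises an exception on an empty heap instead of returning a sentinel. `heapq.heappop()` is the standard-library equivalent.

```python
def heap_delete_min(heap):
    """Remove and return the smallest item of a 0-based min-heap list."""
    if not heap:
        raise IndexError("heap is empty")
    smallest = heap[0]
    last = heap.pop()              # the last item must find a new position
    n = len(heap)
    if n == 0:
        return smallest            # the heap held a single item
    i = 0
    while 2 * i + 1 < n:
        child = 2 * i + 1
        if child + 1 < n and heap[child + 1] < heap[child]:
            child += 1             # pick the smaller of the two children
        if last <= heap[child]:
            break
        heap[i] = heap[child]      # move the child up into the hole
        i = child
    heap[i] = last
    return smallest


heap = [4, 9, 19, 10, 11]
print(heap_delete_min(heap), heap)   # 4 [9, 10, 19, 11]
```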
Applications
Heap sort, and more generally using a heap's ordering to pick items by priority.
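A short sketch of heap sort built on the standard `heapq` module: heapify the data, then pop the smallest item until nothing is left. The function name `heapsort` is just a label for this example.

```python
import heapq


def heapsort(iterable):
    """Return the items in ascending order by repeatedly popping a min-heap."""
    heap = list(iterable)          # copy, so the caller's data is untouched
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heapsort([19, 9, 4, 10, 11]))   # [4, 9, 10, 11, 19]
```

Unlike `sorted()`, this sort is not stable, so items that compare equal may not keep their original relative order.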
The heapq module in Python: heap sort algorithm
Purpose:
The heapq module implements a min-heap sort algorithm suitable for use with Python's lists.
A heap is a tree-like data structure in which the child nodes have a sort-order relationship with their parents. Binary heaps can be represented using a list or array organized so that the children of element N are at positions 2N+1 and 2N+2 (for zero-based indexes). This layout makes it possible to rearrange heaps in place, so it is not necessary to reallocate as much memory when adding or removing items.
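The index arithmetic is easy to try out. In this small sketch the helper names `parent` and `children` are made up for illustration and are not part of `heapq`.

```python
def parent(i):
    """Index of the parent of position i (only meaningful for i > 0)."""
    return (i - 1) // 2


def children(i, size):
    """Indexes of the children of position i that exist in a heap of this size."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < size]


heap = [4, 9, 19, 10, 11]
for i, value in enumerate(heap):
    kids = [heap[c] for c in children(i, len(heap))]
    print(i, value, kids)
```

For this list, position 0 (value 4) has children 9 and 19, position 1 (value 9) has children 10 and 11, and the remaining positions are leaves.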
A max-heap ensures that the parent is larger than or equal to both of its children. A min-heap requires that the parent be less than or equal to its children. Python's heapq module implements a min-heap.
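Since heapq only provides a min-heap, one common workaround for numeric data is to store negated values and negate them again on the way out. This is a sketch of that trick, not a feature of the module.

```python
import heapq

data = [19, 9, 4, 10, 11]
max_heap = [-n for n in data]      # negate so the largest value becomes the smallest
heapq.heapify(max_heap)

largest = -heapq.heappop(max_heap)
second = -heapq.heappop(max_heap)
print(largest, second)   # 19 11
```

For values that cannot be negated, pushing `(sort_key, item)` tuples with an inverted key is a similar option.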
Example Data
The examples in this section use the data in heapq_heapdata.py.
heapq_heapdata.py
# This data was generated with the random module.
data = [19, 9, 4, 10, 11]
The heap output is printed using heapq_showtree.py:
heapq_showtree.py
import math
from io import StringIO
def show_tree(tree, total_width=36, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = -1
    for i, n in enumerate(tree):
        if i:
            row = int(math.floor(math.log(i + 1, 2)))
        else:
            row = 0
        if row != last_row:
            output.write('\n')
        columns = 2 ** row
        col_width = int(math.floor(total_width / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print(output.getvalue())
    print('-' * total_width)
    print()
Creating a Heap
There are two basic ways to create a heap: heappush() and heapify().
heapq_heappush.py
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heap = []
print('random :', data)
print()
for n in data:
    print('add {:>3}:'.format(n))
    heapq.heappush(heap, n)
    show_tree(heap)
With heappush(), the heap order of the elements is maintained as each new item is added from a data source.
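$ python3 heapq_heappush.py
random : [19, 9, 4, 10, 11]

add  19:

                 19
------------------------------------

add   9:

                 9
        19
------------------------------------

add   4:

                 4
        19                9
------------------------------------

add  10:

                 4
        10                9
    19
------------------------------------

add  11:

                 4
        10                9
    19       11
------------------------------------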
If the data is already in memory, it is more efficient to use heapify() to rearrange the items of the list in place.
heapq_heapify.py
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print('random :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)
Building a heap one item at a time with heappush() and calling heapify() on an unordered list both produce a valid heap, although the arrangement of the items in the list can differ, as it does for this data.
$ python3 heapq_heapify.py
random : [19, 9, 4, 10, 11]
heapified :

                 4
        9                 19
    10       11
------------------------------------
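The heappush() and heapify() outputs above show different arrangements for the same input. As a small sketch, this check confirms that the two lists differ but pop their items in the same order. The helper name `drain` is hypothetical.

```python
import heapq

data = [19, 9, 4, 10, 11]

pushed = []
for n in data:
    heapq.heappush(pushed, n)      # build one item at a time

heapified = list(data)
heapq.heapify(heapified)           # build in place from the full list

print(pushed)      # [4, 10, 9, 19, 11]
print(heapified)   # [4, 9, 19, 10, 11]


def drain(heap):
    """Pop every item from a copy of the heap, smallest first."""
    copy = list(heap)
    return [heapq.heappop(copy) for _ in range(len(copy))]


print(drain(pushed) == drain(heapified))   # True
```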
Accessing Contents of a Heap
Once the heap is organized correctly, use heappop() to remove the element with the lowest value.
heapq_heappop.py
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print('random :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)
print()
for i in range(2):
    smallest = heapq.heappop(data)
    print('pop {:>3}:'.format(smallest))
    show_tree(data)
In this example, adapted from the stdlib documentation, heapify() and heappop() are used to pull the two smallest numbers from a list in order. Popping until the heap is empty would yield the whole list in sorted order.
$ python3 heapq_heappop.py
random : [19, 9, 4, 10, 11]
heapified :

                 4
        9                 19
    10       11
------------------------------------


pop   4:

                 9
        10                19
    11
------------------------------------

pop   9:

                 10
        11                19
------------------------------------
To remove the smallest element and replace it with a new value in a single operation, use heapreplace().
heapq_heapreplace.py
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heapq.heapify(data)
print('start:')
show_tree(data)
for n in [0, 13]:
    smallest = heapq.heapreplace(data, n)
    print('replace {:>2} with {:>2}:'.format(smallest, n))
    show_tree(data)
Replacing elements in place makes it possible to maintain a fixed-size heap, such as a queue of jobs ordered by priority.
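$ python3 heapq_heapreplace.py
start:

                 4
        9                 19
    10       11
------------------------------------

replace  4 with  0:

                 0
        9                 19
    10       11
------------------------------------

replace  0 with 13:

                 9
        10                19
    13       11
------------------------------------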
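One way the fixed-size idea might be used is to keep only the k largest items of a stream in a min-heap of size k, replacing the root whenever a larger item arrives. The function name `k_largest` and the sample stream are assumptions made for this sketch.

```python
import heapq


def k_largest(stream, k):
    """Return the k largest items of stream, largest first, using O(k) memory."""
    if k <= 0:
        return []
    heap = []
    for item in stream:
        if len(heap) < k:
            heapq.heappush(heap, item)       # still filling the heap
        elif item > heap[0]:
            heapq.heapreplace(heap, item)    # drop the smallest kept item, add the new one
    return sorted(heap, reverse=True)


print(k_largest([19, 9, 4, 10, 11, 25, 1], 3))   # [25, 19, 11]
```

The heap never grows beyond k items, so the memory use does not depend on the length of the stream.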
Data Extremes From a Heap
heapq also includes two functions that examine an iterable to find a range of the largest or smallest values it contains.
heapq_extremes.py
import heapq
from heapq_heapdata import data
print('all :', data)
print('3 largest :', heapq.nlargest(3, data))
print('from sort :', list(reversed(sorted(data)[-3:])))
print('3 smallest:', heapq.nsmallest(3, data))
print('from sort :', sorted(data)[:3])
nlargest() and nsmallest() are efficient only for relatively small values of n > 1, but can still come in handy in a few cases. For n == 1, the built-in min() and max() are the better choice, and for larger values of n it is more efficient to use sorted() and take a slice.
$ python3 heapq_extremes.py
all : [19, 9, 4, 10, 11]
3 largest : [19, 11, 10]
from sort : [19, 11, 10]
3 smallest: [4, 9, 10]
from sort : [4, 9, 10]
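Both functions also accept a `key` argument, which helps when the items are records rather than plain numbers. The job records below are invented purely for illustration.

```python
import heapq

# Hypothetical job records, used only for this example.
jobs = [
    {'name': 'backup', 'priority': 2},
    {'name': 'deploy', 'priority': 9},
    {'name': 'lint', 'priority': 1},
    {'name': 'tests', 'priority': 5},
]

top = heapq.nlargest(2, jobs, key=lambda job: job['priority'])
low = heapq.nsmallest(1, jobs, key=lambda job: job['priority'])

print([job['name'] for job in top])   # ['deploy', 'tests']
print([job['name'] for job in low])   # ['lint']
```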
Efficiently Merging Sorted Sequences
Combining several sorted sequences into one new sequence is easy for small data sets.
list(sorted(itertools.chain(*data)))
For larger data sets, this technique can use a considerable amount of memory. Instead of sorting the entire combined sequence, merge() uses a heap to generate a new sequence one item at a time, determining the next item using a fixed amount of memory.
heapq_merge.py
import heapq
import random
random.seed(2016)
data = []
for i in range(4):
    new_data = list(random.sample(range(1, 101), 5))
    new_data.sort()
    data.append(new_data)
for i, d in enumerate(data):
    print('{}: {}'.format(i, d))
print('\nMerged:')
for i in heapq.merge(*data):
    print(i, end=' ')
print()
Because the implementation of merge() uses a heap, it consumes memory based on the number of sequences being merged, rather than the number of items in those sequences.
$ python3 heapq_merge.py
0: [33, 58, 71, 88, 95]
1: [10, 11, 17, 38, 91]
2: [13, 18, 39, 61, 63]
3: [20, 27, 31, 42, 45]
Merged:
10 11 13 17 18 20 27 31 33 38 39 42 45 58 61 63 71 88 91 95
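Because merge() returns an iterator and pulls from its inputs lazily, it can even combine endless sorted streams, which sorting a concatenated list cannot do. In this sketch the helper `multiples` is made up for the example.

```python
import heapq
import itertools


def multiples(step):
    """Return an endless, already sorted iterator: step, 2*step, 3*step, ..."""
    return itertools.count(step, step)


merged = heapq.merge(multiples(3), multiples(5), multiples(7))
first_ten = list(itertools.islice(merged, 10))   # take only what is needed
print(first_ten)   # [3, 5, 6, 7, 9, 10, 12, 14, 15, 15]
```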
The application that impressed me most is LeetCode problem 23, Merge k Sorted Lists, which can be recast as merging with a priority queue:
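import heapq
from itertools import count
class ListNode(object):
    def __init__(self, x):
        self.val = x
        self.next = None
def mergeKLists(lists):
    current = dummy = ListNode(0)
    heap = []
    # A running counter breaks ties between equal values, so the heap
    # never has to compare two ListNode objects (a TypeError in Python 3).
    order = count()
    for sorted_list in lists:
        if sorted_list:
            heapq.heappush(heap, (sorted_list.val, next(order), sorted_list))
    while heap:
        # Take the node with the smallest value and link it into the result.
        smallest = heapq.heappop(heap)[2]
        current.next = smallest
        current = current.next
        if smallest.next:
            heapq.heappush(heap, (smallest.next.val, next(order), smallest.next))
    return dummy.next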
This turns a linked-list problem into a heap-ordering problem.