Python源码阅读:堆的入堆出堆方法实现

基本概念

优先队列(priority queue)是一种特殊的队列,取出元素的顺序是按照元素的优先权(关键字)大小,而不是进入队列的顺序,堆就是一种优先队列的实现。堆一般是由数组实现的,逻辑上堆可以被看做一个完全二叉树(除底层元素外是完全充满的,且底层元素是从左到右排列的)。

堆分为最大堆和最小堆,最大堆是指每个根结点的值大于左右孩子的节点值,最小堆则是根结点的值小于左右孩子的值。

Python源码阅读:堆的入堆出堆方法实现_第1张图片

实现

Python中堆的是以小根堆的方式实现,本文主要介绍Python源码中入堆和出堆方法的实现:

"""
                                  0

                  1                                 2

          3               4                5               6

      7       8       9       10      11      12      13      14

    15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30
"""
# push 将newitem放入底部,从底部开始把父结点往下替换
def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown(heap, 0, len(heap)-1)

def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem

# pop 顶部元素被弹出,把最后一个元素放到顶部,在把子节点(小于自己的)往上替换
def heappop(heap):
    """Pop the smallest item off the heap, maintaining the heap invariant."""
    lastelt = heap.pop()    # pop tail in list
                                                        # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
        return returnitem
    return lastelt

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now. Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

对于_siftup方法实现的解释

可以看到python中_siftup方法的实现是和教科书以及直觉不太吻合的:

  1. siftup只关心哪个子节点小,就把它往上移(并不关心子节点是否大于自己)
  2. 最后当节点到达叶子节点后在通过siftDown把大于自己的父结点往下移

如下是示意图,按理来说每次_siftup的时候直接和子节点比较似乎更高效。

"""
      3 
  4       6 
9  15  [8]
   
     [8]
  4       6
9  15  

      4
 [8]      6
9  15 

      4
  9       6
[8] 15
    
_siftdown

      4
 [8]      6
9  15
"""

接下来,先贴上源码中的解释

# The child indices of heap index pos are already heaps, and we want to make
# a heap at index pos too.  We do this by bubbling the smaller child of
# pos up (and so on with that child's children, etc) until hitting a leaf,
# then using _siftdown to move the oddball originally at index pos into place.
#
# We *could* break out of the loop as soon as we find a pos where newitem <=
# both its children, but turns out that's not a good idea, and despite that
# many books write the algorithm that way.  During a heap pop, the last array
# element is sifted in, and that tends to be large, so that comparing it
# against values starting from the root usually doesn't pay (= usually doesn't
# get us out of the loop early).  See Knuth, Volume 3, where this is
# explained and quantified in an exercise.
#
# Cutting the # of comparisons is important, since these routines have no
# way to extract "the priority" from an array element, so that intelligence
# is likely to be hiding in custom comparison methods, or in array elements
# storing (priority, record) tuples.  Comparisons are thus potentially
# expensive.

从这里的描述,我们可以这么理解,由于位于 pos 的 ball 是从 end 处移动到此的,它的所有 child indices 都是 heap,通过 _siftup bubbling smaller child up till hitting a leaf, then use _siftdown to move the ball to place. 这个过程会发现和教科书中讲的不一样,一般情况下 _siftup 过程中需要比较当前 ball 与 both childs,这样可以尽快从 loop 中 break,但是事实上,ball 的值一般来说会很大(因为在heap中被排在的end),那么把它从 root 处开始和 child 开始比较通常没有意义,也就是由于可能会一直 compare 直到比较底部甚至可能到叶子处,这样来看通常并不会让我们 break loop early(作者提到在The Art of Computer Programming volume3 中有具体的阐述和比较)。此外,由于comparison methods实现方法的不同(比如在tuples,或者自定义类中),comparisons 有可能 more expensive,所以python在对heap的实现上是采用 siftup till leaf then siftdown to right place 的方法。

你可能感兴趣的:(算法堆)