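Before looking at heaps, it helps to review the binary tree. In computer science, a binary tree is a tree structure in which each node has at most two subtrees, usually called the left subtree and the right subtree. Binary trees are commonly used to implement binary search trees and binary heaps.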

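Each node of a binary tree has at most two subtrees (no node has degree greater than 2), and the subtrees are ordered: left and right cannot be swapped. Counting levels from 1, level i of a binary tree holds at most 2^(i-1) nodes; a binary tree of depth k holds at most 2^k - 1 nodes; and for any binary tree T with n0 terminal (leaf) nodes and n2 nodes of degree 2, n0 = n2 + 1.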


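  • A tree has at least 1 node, whereas a binary tree may have 0 nodes
  • The maximum degree of a node in a tree is unbounded, whereas in a binary tree it is 2
  • The children of a tree node have no left/right distinction, whereas those of a binary tree node do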

Two special kinds of binary tree are the complete binary tree and the full binary tree.


(1) Full binary tree: a binary tree of depth k that has exactly 2^k - 1 nodes is called a full binary tree.


[Figure 1, from "[Java Sorting Algorithms] Heap Sort"]

A full binary tree of depth 3.

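(2) Complete binary tree: a binary tree of depth k with n nodes is a complete binary tree if and only if each of its nodes corresponds to one of the nodes numbered 1 to n (level by level, left to right) in the full binary tree of depth k.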

[Figure 2]

A complete binary tree of depth 3.
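The definition of a complete binary tree can be checked with a level-order scan: once a missing child has been seen, no further node may appear. The sketch below is only an illustration; the `Node` class and the `isComplete` method are hypothetical names that do not come from the original post.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class CompleteTreeCheck {
    // Hypothetical node type, used only for this illustration.
    static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Level-order scan: after the first missing child, no later node may exist.
    static boolean isComplete(Node root) {
        if (root == null) return true; // an empty binary tree is trivially complete
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);
        boolean gapSeen = false;
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            for (Node child : new Node[] { n.left, n.right }) {
                if (child == null) {
                    gapSeen = true;
                } else {
                    if (gapSeen) return false; // a node appears after a gap
                    queue.add(child);
                }
            }
        }
        return true;
    }
}
```

A tree whose last level is filled from the left passes the check; a tree with a right child but no left child at any node fails it.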







[Figure 3]


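Introduction to Algorithms notes that, given the index i of a node, the indices of its parent, left child, and right child are easy to compute (with indices starting at 1):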


  • Parent(i) = floor(i/2), the index of i's parent (rounded down)
  • Left(i) = 2i, the index of i's left child
  • Right(i) = 2i + 1, the index of i's right child
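As a small illustration of the 1-based formulas, the sketch below implements them directly. The class name `OneBasedHeapIndex` is hypothetical, and the sketch assumes the heap is stored in an array whose slot 0 is left unused so that the root sits at index 1.

```java
class OneBasedHeapIndex {
    // Integer division already rounds down for positive i, so this is floor(i / 2).
    static int parent(int i) { return i / 2; }

    static int left(int i) { return 2 * i; }

    static int right(int i) { return 2 * i + 1; }
}
```

For the array {unused, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1}, the node at index 5 (value 7) has its parent at index 2 (value 14), and the children of index 2 are at indices 4 and 5.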







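  • The largest element of a max-heap is at the root (the top of the heap)
  • Every parent node's value is greater than or equal to the values of its children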
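The max-heap property can be verified by comparing every element with its parent. The following is a minimal sketch; the name `isMaxHeap` is hypothetical, and the sketch assumes the heap is stored in a zero-based array, where the parent of index i is (i - 1) / 2.

```java
class MaxHeapCheck {
    // Returns true when no element is larger than its parent.
    static boolean isMaxHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int parent = (i - 1) / 2; // zero-based parent index
            if (a[parent] < a[i]) return false;
        }
        return true;
    }
}
```

Because every element is bounded by its parent, the check also implies that the root holds the maximum.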


[Figure 4]




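  • The smallest element of a min-heap is at the root (the top of the heap)
  • Every parent node's value is less than or equal to the values of its children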
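Java's standard library offers a ready-made binary heap in `java.util.PriorityQueue`, which behaves as a min-heap by default and as a max-heap when given a reversed comparator. The sketch below only illustrates the "extreme element is at the top" property; the class and method names are hypothetical, and both methods assume a non-empty input.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class HeapTopDemo {
    // Top of a min-heap built from the values: the smallest element.
    static int minHeapTop(int[] values) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) heap.offer(v);
        return heap.peek();
    }

    // Top of a max-heap built from the values: the largest element.
    static int maxHeapTop(int[] values) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int v : values) heap.offer(v);
        return heap.peek();
    }
}
```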




[Figure 5]


III. How Heap Sort Works



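  • Max-Heapify: adjusts nodes toward the bottom of the heap so that no child is larger than its parent, which maintains the max-heap property
  • Build-Max-Heap: rearranges all the data in the heap so that it becomes a max-heap
  • Heap-Sort: removes the root, which holds the first (largest) element, and recursively applies Max-Heapify to what remains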
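The three operations above fit together roughly as follows. This is a minimal sketch on an `int[]` with zero-based indices (left child 2i + 1, right child 2i + 2), not the author's implementation; the class name `HeapSortSketch` is hypothetical.

```java
class HeapSortSketch {
    // Max-Heapify: sift a[i] down until the subtree rooted at i is a max-heap.
    static void maxHeapify(int[] a, int i, int heapSize) {
        int l = 2 * i + 1, r = 2 * i + 2, largest = i;
        if (l < heapSize && a[l] > a[largest]) largest = l;
        if (r < heapSize && a[r] > a[largest]) largest = r;
        if (largest != i) {
            swap(a, i, largest);
            maxHeapify(a, largest, heapSize);
        }
    }

    // Build-Max-Heap: heapify every non-leaf node, from the last one up to the root.
    static void buildMaxHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) maxHeapify(a, i, a.length);
    }

    // Heap-Sort: move the root behind the heap, shrink the heap, restore the property.
    static void heapSort(int[] a) {
        buildMaxHeap(a);
        for (int end = a.length - 1; end > 0; end--) {
            swap(a, 0, end);       // the current maximum goes to its final position
            maxHeapify(a, 0, end); // the heap now occupies a[0 .. end-1]
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Sorting with a max-heap yields ascending order, because each extracted maximum is placed at the end of the array.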


Note that arrays are zero-based, which means the heap model above has to change.


[Figure 6]



  • Parent(i) = floor((i-1)/2), the index of i's parent
  • Left(i) = 2i + 1, the index of i's left child
  • Right(i) = 2(i + 1), the index of i's right child
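The zero-based formulas translate directly into code. In this sketch the class name `ZeroBasedHeapIndex` is hypothetical. One detail worth noting: Java integer division truncates toward zero, so `(i - 1) / 2` matches floor((i-1)/2) only for i >= 1; the root (i = 0) has no parent and should not be passed in.

```java
class ZeroBasedHeapIndex {
    // Valid for i >= 1; the root at index 0 has no parent.
    static int parent(int i) { return (i - 1) / 2; }

    static int left(int i) { return 2 * i + 1; }

    static int right(int i) { return 2 * (i + 1); }
}
```

For example, the children of index 1 are at indices 3 and 4, and both of them map back to parent index 1.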





[Figure 7]



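Build-Max-Heap turns an array into a max-heap. It takes the array and the heap size as parameters and calls Max-Heapify bottom-up. Max-Heapify makes the subtree rooted at index i a max-heap provided the subtrees below i already are, so calling it bottom-up preserves that property throughout the rebuild. For a heap of n elements, Build-Max-Heap starts at the parent of the last element, which is Parent(n) = floor(n/2) with 1-based indices (index floor(n/2) - 1 with zero-based indices), and calls Max-Heapify on each index in turn, moving up to the root. The process is as follows: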

[Figure 8]



[Figure 9]
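The bottom-up construction can be sketched on a zero-based `int[]` as below. The names `BuildMaxHeapSketch` and `siftDown` are hypothetical, and `siftDown` is an iterative variant of Max-Heapify chosen here only to avoid recursion. With zero-based indices the last non-leaf node is at index heapSize / 2 - 1; every index after it is a leaf and already a one-element heap.

```java
class BuildMaxHeapSketch {
    // Iterative Max-Heapify: move a[i] down until neither child is larger.
    static void siftDown(int[] a, int i, int heapSize) {
        while (true) {
            int l = 2 * i + 1, r = l + 1, largest = i;
            if (l < heapSize && a[l] > a[largest]) largest = l;
            if (r < heapSize && a[r] > a[largest]) largest = r;
            if (largest == i) return;
            int t = a[i];
            a[i] = a[largest];
            a[largest] = t;
            i = largest;
        }
    }

    // Heapify the non-leaf nodes from the last one back to the root.
    static void buildMaxHeap(int[] a, int heapSize) {
        for (int i = heapSize / 2 - 1; i >= 0; i--) {
            siftDown(a, i, heapSize);
        }
    }
}
```

Applied to {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}, the calls run for indices 4, 3, 2, 1, 0 and leave the array as {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.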




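package com.ngaa.bigdata.common.utils.sort;

/**
 * Created by yangjf on 20171024.
 * Update date:
 * Time: 8:46
 * Project: ngaa-cdn-java-sdk
 * Package: com.ngaa.utils
 * Describe : Sorting with a max-heap and a min-heap to keep the top N values
 * Result of Test: test ok
 * Command:
 * Email: [email protected]
 * Status: Using online
 * Please note:
 * Must be checked once every time you submit a configuration file is correct!
 * Data is priceless! Accidentally deleted the consequences!
 */
public class FindTopNUtils {
    // HeapSortUtil (same package) supplies heapDescSort and heapAscSort.
    private static HeapSortUtil heapSortUtil = new HeapSortUtil();

    /**
     * Keeps array a sorted in descending order.
     *
     * @param a     heap array, e.g. a={10,9,8,7,6,5,4,3,2,1}
     * @param value the incoming value
     * @throws Exception on failure
     */
    public synchronized void findMaxTopN(int[] a, int value) throws Exception {
        try {
            int arraySize = a.length; // array length
            /*
             * Possible cases for value:
             * (1) greater than the largest element: value > a[0]
             * (2) between the smallest and the largest: a[arraySize-1] < value < a[0]
             * (3) discarded: value == a[0], value == a[arraySize-1],
             *     or value < a[arraySize-1] (below the minimum)
             */
            if (value > a[0] || (a[arraySize - 1] < value && value < a[0])) {
                // Replace the smallest value with the new value
                a[arraySize - 1] = value;
                // Re-sort via the min-heap so the array stays in descending order
                heapSortUtil.heapDescSort(a, arraySize);
            }
        } catch (Exception minE) {
            throw new RuntimeException(minE);
        }
    }

    /**
     * Keeps array a sorted in ascending order.
     *
     * @param a     heap array, e.g. a={1,2,3,4,5,6,7,8}
     * @param value the incoming value
     * @throws Exception on failure
     */
    public synchronized void findMinTopN(int[] a, int value) throws Exception {
        try {
            int arraySize = a.length; // array length
            /*
             * Possible cases for value:
             * (1) less than the smallest element: value < a[0]
             * (2) between the smallest and the largest: a[0] < value < a[arraySize-1]
             * (3) discarded: value == a[0], value == a[arraySize-1],
             *     or value > a[arraySize-1] (above the maximum)
             */
            if (value < a[0] || (a[0] < value && value < a[arraySize - 1])) {
                // Replace the largest value with the new value
                a[arraySize - 1] = value;
                // Re-sort via the max-heap so the array stays in ascending order
                heapSortUtil.heapAscSort(a, arraySize);
            }
            // While the array still holds its initial zeros nothing would be added,
            // so also accept value > a[0] when a[0] == 0
            if (value > a[0] && a[0] == 0) {
                // Replace the first element with the new value
                a[0] = value;
                // Re-sort via the max-heap so the array stays in ascending order
                heapSortUtil.heapAscSort(a, arraySize);
            }
        } catch (Exception maxE) {
            throw new RuntimeException(maxE);
        }
    }
}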
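For comparison, the same "keep the N largest values" idea can be expressed with the standard `java.util.PriorityQueue` instead of re-sorting a fixed array on every insert. This is a sketch of an alternative technique, not the author's method: a min-heap of at most n elements keeps the smallest retained value on top, so each new value only has to be compared with `peek()`. The names `TopNSketch` and `largest` are hypothetical, and unlike the class above this sketch keeps duplicate values.

```java
import java.util.PriorityQueue;

class TopNSketch {
    // Returns the n largest values of the input, in descending order.
    static int[] largest(int[] values, int n) {
        if (n <= 0) return new int[0];
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap
        for (int v : values) {
            if (heap.size() < n) {
                heap.offer(v);           // heap not full yet: always keep
            } else if (v > heap.peek()) {
                heap.poll();             // evict the smallest retained value
                heap.offer(v);
            }
        }
        // poll() yields ascending order, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }
}
```

Each insert costs O(log n) instead of a full heap sort of the array, which matters when the input stream is long and n is small.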








[Figure 10]



package com.ngaa.bigdata.common.utils.sort

/**
  * Created by yangjf on 20171030.
  * Update date:
  * Time: 10:09
  * Project: sparkmvn
  * Package: com.ngaa.bigdata.common.utils.sort
  * Describe :
  *          Max-heap and min-heap sort keyed on the second element of each tuple.
  * Result of Test: test ok
  * Command:
  * Email:  [email protected]
  * Status:Using online
  * Please note:
  * Must be checked once every time you submit a configuration file is correct!
  * Data is priceless! Accidentally deleted the consequences!
  */
class SortByHeapUtils extends Serializable {

  // Parent index in a zero-based heap: floor((i - 1) / 2), valid for i >= 1
  def parent(i: Int): Int = (i - 1) / 2

  def left(i: Int): Int = 2 * i + 1

  def right(i: Int): Int = 2 * (i + 1)

  def swap(array: Array[(String, Long)], i: Int, j: Int): Unit = {
    val tmp = array(i)
    array(i) = array(j)
    array(j) = tmp
  }

  // Sift a(index) down until the subtree rooted at index is a min-heap (ordered by _2)
  def minHeapify(a: Array[(String, Long)], index: Int, heapSize: Int): Unit = {
    val l = left(index)
    val r = right(index)
    var smallestIndex: Int = index

    if (l < heapSize && a(l)._2 < a(index)._2) {
      smallestIndex = l
    }

    if (r < heapSize && a(r)._2 < a(smallestIndex)._2) {
      smallestIndex = r
    }

    if (smallestIndex != index) {
      swap(a, index, smallestIndex)
      minHeapify(a, smallestIndex, heapSize)
    }
  }

  // Sift a(index) down until the subtree rooted at index is a max-heap (ordered by _2)
  def maxHeapify(a: Array[(String, Long)], index: Int, heapSize: Int): Unit = {
    val l = left(index)
    val r = right(index)
    var largestIndex: Int = index

    if (l < heapSize && a(l)._2 > a(index)._2) {
      largestIndex = l
    }

    if (r < heapSize && a(r)._2 > a(largestIndex)._2) {
      largestIndex = r
    }

    if (largestIndex != index) {
      swap(a, index, largestIndex)
      maxHeapify(a, largestIndex, heapSize)
    }
  }

  // Build a min-heap bottom-up, starting at the parent of the last element
  def buildMinHeapify(a: Array[(String, Long)], heapSize: Int): Unit = {
    for (i <- parent(heapSize - 1) to 0 by -1) {
      minHeapify(a, i, heapSize)
    }
  }

  // Build a max-heap bottom-up, starting at the parent of the last element
  def buildMaxHeapify(a: Array[(String, Long)], heapSize: Int): Unit = {
    for (i <- parent(heapSize - 1) to 0 by -1) {
      maxHeapify(a, i, heapSize)
    }
  }

  // Descending sort: repeatedly move the minimum of the min-heap to the end
  def heapDescSort(a: Array[(String, Long)], headSize: Int): Unit = {
    buildMinHeapify(a, headSize)

    var headSizeTmp = headSize
    for (i <- headSize - 1 to 1 by -1) {
      swap(a, 0, i)
      headSizeTmp -= 1
      minHeapify(a, 0, headSizeTmp)
    }
  }

  // Ascending sort: repeatedly move the maximum of the max-heap to the end
  def heapAscSort(a: Array[(String, Long)], headSize: Int): Unit = {
    buildMaxHeapify(a, headSize)

    var headSizeTmp = headSize
    for (i <- headSize - 1 to 1 by -1) {
      swap(a, 0, i)
      headSizeTmp -= 1
      maxHeapify(a, 0, headSizeTmp)
    }
  }
}




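package com.ngaa.bigdata.common.utils.sort

import com.ngaa.bigdata.common.model.global.NgaaException
import com.ngaa.bigdata.common.traits.HeapSort

/**
  * Created by yangjf on 20171030.
  * Update date:
  * Time: 11:54
  * Project: sparkmvn
  * Package: com.ngaa.bigdata.common.utils.sort
  * Describe :
  *        Scala version: finds the N largest and the N smallest values among the tuples.
  * Result of Test: test ok
  * Command:
  * Email:  [email protected]
  * Status:Using online
  * Please note:
  * Must be checked once every time you submit a configuration file is correct!
  * Data is priceless! Accidentally deleted the consequences!
  */
class FindSortTopN extends HeapSort with Serializable {
  private val sortByHeapUtils = new SortByHeapUtils

  override def findMaxTopN(a: Array[(String, Long)], value: (String, Long)): Unit = {
    try {
      val arraySize: Int = a.length // array length
      /*
       * Possible cases for value (compared by _2):
       * (1) at least as large as the largest element: value >= a(0)
       * (2) between the smallest and the largest: a(arraySize-1) < value < a(0)
       * (3) discarded: value == a(arraySize-1) or value < a(arraySize-1) (below the minimum)
       */
      if (value._2 >= a(0)._2 || (a(arraySize - 1)._2 < value._2 && value._2 < a(0)._2)) {
        // Replace the smallest value with the new value
        a(arraySize - 1) = value
        // Re-sort via the min-heap so the array stays in descending order
        sortByHeapUtils.heapDescSort(a, arraySize)
      }
    } catch {
      case minE: Exception => throw new RuntimeException(minE)
    }
  }

  override def findMinTopN(a: Array[(String, Long)], value: (String, Long)): Unit = {
    try {
      val arraySize = a.length // array length
      /*
       * Possible cases for value (compared by _2):
       * (1) less than the smallest element: value < a(0)
       * (2) between the smallest and the largest: a(0) < value < a(arraySize-1)
       * (3) discarded otherwise
       * This first branch is garbled in the source; it is assumed to follow
       * the same rule as the Java findMinTopN.
       */
      if (value._2 < a(0)._2 || (a(0)._2 < value._2 && value._2 < a(arraySize - 1)._2)) {
        // Replace the largest value with the new value
        a(arraySize - 1) = value
        // Re-sort via the max-heap so the array stays in ascending order
        sortByHeapUtils.heapAscSort(a, arraySize)
      }
      // While the array still holds its initial zeros nothing would be added,
      // so also accept value > a(0) when a(0) == 0
      if (value._2 > a(0)._2 && a(0)._2 == 0) {
        // Replace the first element with the new value
        a(0) = value
        // Re-sort via the max-heap so the array stays in ascending order
        sortByHeapUtils.heapAscSort(a, arraySize)
      }
    } catch {
      case maxE: Exception => throw new RuntimeException(maxE)
    }
  }

  override def initArray(array: Array[(String, Long)], initValue: (String, Long) = ("init", 0L)): Unit = {
    for (i <- array.indices) {
      // The loop body is missing in the source; filling each slot with initValue is assumed
      array(i) = initValue
    }
  }
}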







package com.ngaa.bigdata.common.traits

import com.ngaa.bigdata.common.model.global.NgaaException

/**
  * Created by yangjf on 20171030.
  * Update date:
  * Time: 11:19
  * Project: sparkmvn
  * Package: com.ngaa.bigdata.common.traits
  * Describe : Heap sort interface
  * Result of Test: test ok
  * Command:
  * Email:  [email protected]
  * Status:Using online
  * Please note:
  * Must be checked once every time you submit a configuration file is correct!
  * Data is priceless! Accidentally deleted the consequences!
  */
trait HeapSort extends Serializable {

  /**
    * Initialize the array.
    * @param array       Input array
    * @param initValue   Init value
    * @throws com.ngaa.bigdata.common.model.global.NgaaException exception
    */
  def initArray(array: Array[(String, Long)], initValue: (String, Long) = ("init", 0L)): Unit

  /**
    * Keep the N largest values among the tuples.
    * @param array Input array.
    * @param tuple Tuple
    * @throws com.ngaa.bigdata.common.model.global.NgaaException exception
    */
  def findMaxTopN(array: Array[(String, Long)], tuple: (String, Long)): Unit

  /**
    * Keep the N smallest values among the tuples.
    * @param array Input array.
    * @param tuple Tuple
    * @throws com.ngaa.bigdata.common.model.global.NgaaException exception
    */
  def findMinTopN(array: Array[(String, Long)], tuple: (String, Long)): Unit
}












