A Brief Analysis of the Basic Sorting Algorithm in Hadoop (MergeSort)

I.     Preface:

1.        Terminology:

·        MS:  MergeSort

·        IS:  InsertSort

·        Hadoop version: the analysis is based on the Version 2.7.1 source code

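2.        The idea behind MS (merge sort):

Split the array into two sub-arrays and sort each sub-array recursively. Once the sub-arrays are sorted, merge them into the corresponding index range of the destination array dest. As the recursion unwinds, the merged range grows until it covers 0 through (length-1).

An animation is available at https://www.jianshu.com/p/7d037c332a9d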
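
As a baseline for the comparison that follows, the idea above can be sketched as a textbook top-down merge sort. This is a minimal illustration on plain int keys, not Hadoop code; the class name SimpleMergeSort and the temporary buffer tmp are assumptions made for this example.

```java
import java.util.Arrays;

// Textbook top-down merge sort on int keys (illustrative; not Hadoop code).
class SimpleMergeSort {

  // Sorts a[low, high) in place, using a temporary buffer for each merge.
  static void sort(int[] a, int low, int high) {
    if (high - low < 2) {
      return; // zero or one element is already sorted
    }
    int mid = (low + high) >>> 1;
    sort(a, low, mid);   // sort the left half
    sort(a, mid, high);  // sort the right half

    // Merge the two sorted halves into tmp, then copy back into a.
    int[] tmp = new int[high - low];
    int p = low;
    int q = mid;
    for (int i = 0; i < tmp.length; i++) {
      if (q >= high || (p < mid && a[p] <= a[q])) {
        tmp[i] = a[p++];
      } else {
        tmp[i] = a[q++];
      }
    }
    System.arraycopy(tmp, 0, a, low, tmp.length);
  }

  public static void main(String[] args) {
    int[] data = {5, 2, 9, 1, 5, 6, 0, 3};
    sort(data, 0, data.length);
    System.out.println(Arrays.toString(data));
  }
}
```

This version recurses down to single elements and allocates a buffer per merge, which is the behavior the Hadoop code can be compared against.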

3.        Goal:

Determine whether Hadoop's MS differs from the standard MS, and whether it includes any optimizations.

II.     Content

Source code:

 

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.util;

import java.util.Comparator;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.io.IntWritable;

/** An implementation of the core algorithm of MergeSort. */
@InterfaceAudience.LimitedPrivate({"MapReduce"})
@InterfaceStability.Unstable
public class MergeSort {
  //Reusable IntWritables
  IntWritable I = new IntWritable(0);
  IntWritable J = new IntWritable(0);
  
  //the comparator that the algo should use
  private Comparator<IntWritable> comparator;
  
  public MergeSort(Comparator<IntWritable> comparator) {
    this.comparator = comparator;
  }
  
  public void mergeSort(int src[], int dest[], int low, int high) {
    int length = high - low;

    // Insertion sort on smallest arrays
    if (length < 7) {
      for (int i=low; i<high; i++) {
        for (int j=i; j>low; j--) {
          I.set(dest[j-1]);
          J.set(dest[j]);
          if (comparator.compare(I, J)>0)
            swap(dest, j, j-1);
        }
      }
      return;
    }

    // Recursively sort halves of dest into src
    int mid = (low + high) >>> 1;
    mergeSort(dest, src, low, mid);
    mergeSort(dest, src, mid, high);

    I.set(src[mid-1]);
    J.set(src[mid]);
    // If list is already sorted, just copy from src to dest.  This is an
    // optimization that results in faster sorts for nearly ordered lists.
    if (comparator.compare(I, J) <= 0) {
      System.arraycopy(src, low, dest, low, length);
      return;
    }

    // Merge sorted halves (now in src) into dest
    for (int i = low, p = low, q = mid; i < high; i++) {
      if (q < high && p < mid) {
        I.set(src[p]);
        J.set(src[q]);
      }
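      if (q>=high || p<mid && comparator.compare(I, J) <= 0)
        dest[i] = src[p++];
      else
        dest[i] = src[q++];
    }
  }

  private void swap(int x[], int a, int b) {
    int t = x[a];
    x[a] = x[b];
    x[b] = t;
  }
}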

1.        When the length of a sub-array is less than 7, IS is used to sort it. This is similar to the length < 13 check in Hadoop's QS (QuickSort).

    // Insertion sort on smallest arrays
    if (length < 7) {
      for (int i=low; i<high; i++) {
        for (int j=i; j>low; j--) {
          I.set(dest[j-1]);
          J.set(dest[j]);
          if (comparator.compare(I, J)>0)
            swap(dest, j, j-1);
        }
      }
      return;
    }
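
To try the small-range branch in isolation, here is a minimal sketch that keeps the same loop shape but compares plain int values instead of going through IntWritable and a comparator. The class name SmallRangeInsertionSort and the constant THRESHOLD are assumptions for this example. Note that, as in the excerpt, the inner loop walks all the way back to low rather than stopping at the first pair that is already in order.

```java
import java.util.Arrays;

// Insertion sort over dest[low, high) with plain int comparison
// (illustrative; Hadoop compares through IntWritable and a Comparator).
class SmallRangeInsertionSort {

  // Same cutoff as in the excerpt; the constant name is hypothetical.
  static final int THRESHOLD = 7;

  static void insertionSort(int[] dest, int low, int high) {
    for (int i = low; i < high; i++) {
      // Walk element i back towards low, swapping out-of-order neighbours.
      for (int j = i; j > low; j--) {
        if (dest[j - 1] > dest[j]) {
          int t = dest[j];
          dest[j] = dest[j - 1];
          dest[j - 1] = t;
        }
      }
    }
  }

  public static void main(String[] args) {
    int[] data = {9, 8, 4, 3, 2, 1, 0, 7};
    // Only the range [2, 6) is sorted; its length 4 is below THRESHOLD.
    insertionSort(data, 2, 6);
    System.out.println(Arrays.toString(data));
  }
}
```

Elements outside [low, high) are left untouched, which is what allows the recursive caller to sort each sub-range independently.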

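2.        Split in two and recurse; the approach is the same as in the standard MS. Note that src and dest swap roles in the recursive calls.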

    // Recursively sort halves of dest into src
    int mid = (low + high) >>> 1;
    mergeSort(dest, src, low, mid);
    mergeSort(dest, src, mid, high);
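
Because the two recursive calls pass the arrays in swapped order, it can help to trace which buffer each call writes to. The sketch below does no sorting at all; it only records the range and the target buffer of every call, using the same midpoint computation and the same cutoff of 7. The buffer labels "A" and "B" and the class name BufferRoleTrace are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative trace only: records which buffer each recursive call writes to.
class BufferRoleTrace {

  static void trace(String src, String dest, int low, int high,
                    int depth, List<String> out) {
    out.add("depth " + depth + ": [" + low + ", " + high + ") -> " + dest);
    if (high - low < 7) {
      return; // leaf: the insertion sort runs in place on dest
    }
    int mid = (low + high) >>> 1;
    // Roles swap: the children write into what this call treats as src.
    trace(dest, src, low, mid, depth + 1, out);
    trace(dest, src, mid, high, depth + 1, out);
  }

  public static void main(String[] args) {
    List<String> out = new ArrayList<>();
    trace("A", "B", 0, 16, 0, out);
    for (String line : out) {
      System.out.println(line);
    }
  }
}
```

For a range of 16 elements the leaf calls sort buffer B in place, while for a range of 10 elements they sort buffer A. This suggests that both arrays need to hold the same data before the first call, for example by passing a copy of the input as src.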

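3.        If the two sorted sub-arrays are already in order when concatenated, that is, src[mid-1] is not greater than src[mid] (the values loaded into I and J just before this check), the range is copied straight from src to dest and the merge step is skipped.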

    // If list is already sorted, just copy from src to dest.  This is an
    // optimization that results in faster sorts for nearly ordered lists.
    if (comparator.compare(I, J) <= 0) {
      System.arraycopy(src, low, dest, low, length);
      return;
    }
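
The shortcut needs only one comparison: when both halves are sorted, checking the boundary pair is enough to know the whole range is ordered. A minimal sketch on plain int keys follows; the class name SortedHalvesShortcut and the method copyIfOrdered are assumptions for this example, and the boolean return value is added only to make the outcome observable.

```java
// Boundary check before merging (illustrative; plain int keys).
class SortedHalvesShortcut {

  // Assumes src[low, mid) and src[mid, high) are each already sorted.
  // Returns true when the copy shortcut was taken.
  static boolean copyIfOrdered(int[] src, int[] dest,
                               int low, int mid, int high) {
    // Last element of the left half vs. first element of the right half.
    if (src[mid - 1] <= src[mid]) {
      System.arraycopy(src, low, dest, low, high - low);
      return true;
    }
    return false; // a real merge is still required
  }

  public static void main(String[] args) {
    int[] ordered = {1, 3, 5, 6, 8, 9};
    int[] mixed = {1, 4, 7, 2, 3, 9};
    System.out.println(copyIfOrdered(ordered, new int[6], 0, 3, 6));
    System.out.println(copyIfOrdered(mixed, new int[6], 0, 3, 6));
  }
}
```

On input that is already sorted or nearly sorted, many merges collapse into a single comparison plus an array copy.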

4.        Otherwise, the two halves are merged: elements are placed into the dest array one by one in ascending order.

    // Merge sorted halves (now in src) into dest
    for (int i = low, p = low, q = mid; i < high; i++) {
      if (q < high && p < mid) {
        I.set(src[p]);
        J.set(src[q]);
      }
      if (q>=high || p<mid && comparator.compare(I, J) <= 0)
        dest[i] = src[p++];
      else
        dest[i] = src[q++];
    }
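
The merge loop can be exercised on its own. The sketch below assumes, for illustration only, that the int values being sorted are indices into a separate keys array, so that ordering is decided by keys[index] rather than by the int itself; the article's source does not say what the ints refer to. The class name IndexMerge and the keys parameter are hypothetical.

```java
import java.util.Arrays;

// Merge of two sorted index runs (illustrative; not Hadoop code).
class IndexMerge {

  // Merges src[low, mid) and src[mid, high) into dest[low, high),
  // ordering the indices by keys[index].
  static void merge(int[] keys, int[] src, int[] dest,
                    int low, int mid, int high) {
    for (int i = low, p = low, q = mid; i < high; i++) {
      // Take from the left run when the right run is exhausted, or when
      // the left key is not greater; "<=" keeps equal keys in left-first order.
      if (q >= high || (p < mid && keys[src[p]] <= keys[src[q]])) {
        dest[i] = src[p++];
      } else {
        dest[i] = src[q++];
      }
    }
  }

  public static void main(String[] args) {
    int[] keys = {10, 30, 30, 20};
    int[] src = {0, 1, 3, 2}; // runs {0, 1} and {3, 2} are sorted by key
    int[] dest = new int[4];
    merge(keys, src, dest, 0, 2, 4);
    System.out.println(Arrays.toString(dest));
  }
}
```

The two indices with key 30 come out with the left-run index first, because ties are resolved in favor of the left half.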

