I. Preface:
1. Terminology:
· MS: MergeSort
· IS: InsertSort
· Hadoop version: the analysis is based on the Version 2.7.1 source code
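2. The idea behind MS (merge sort):
Split the array in half into sub-arrays and sort each sub-array recursively. Once two sub-arrays are sorted, merge them into the matching index range of the destination array dest. As the recursion unwinds, the merged range grows until it covers 0 to (length-1).
An animation is available at https://www.jianshu.com/p/7d037c332a9d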
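As a baseline for the comparison that follows, here is a minimal sketch of a textbook top-down merge sort in Java. It is not Hadoop code: the class name TextbookMergeSort, the helper sorted, and the scratch array tmp are names chosen for this illustration only.

```java
import java.util.Arrays;

final class TextbookMergeSort {
    // Sorts a[low, high) (high is exclusive), using tmp as scratch space.
    static void sort(int[] a, int[] tmp, int low, int high) {
        if (high - low < 2) {
            return; // zero or one element is already sorted
        }
        int mid = (low + high) >>> 1;
        sort(a, tmp, low, mid);
        sort(a, tmp, mid, high);
        // Merge the two sorted halves into tmp, then copy the range back into a.
        int p = low, q = mid;
        for (int i = low; i < high; i++) {
            if (q >= high || (p < mid && a[p] <= a[q])) {
                tmp[i] = a[p++];
            } else {
                tmp[i] = a[q++];
            }
        }
        System.arraycopy(tmp, low, a, low, high - low);
    }

    // Convenience wrapper: returns a sorted copy and leaves the input untouched.
    static int[] sorted(int[] input) {
        int[] a = input.clone();
        sort(a, new int[a.length], 0, a.length);
        return a;
    }

    public static void main(String[] args) {
        // prints [1, 2, 5, 5, 6, 9]
        System.out.println(Arrays.toString(sorted(new int[] {5, 2, 9, 1, 5, 6})));
    }
}
```

This version recurses all the way down to single elements and always copies the merged range back, which makes it a convenient reference point when looking for optimizations in another implementation.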
3. Goal:
Determine whether Hadoop's MS differs from the standard MS, and whether it contains any optimizations.
II. Content
Source code:
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.util;

import java.util.Comparator;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.io.IntWritable;

/** An implementation of the core algorithm of MergeSort. */
@InterfaceAudience.LimitedPrivate({"MapReduce"})
@InterfaceStability.Unstable
public class MergeSort {
  //Reusable IntWritables
  IntWritable I = new IntWritable(0);
  IntWritable J = new IntWritable(0);

  //the comparator that the algo should use
  private Comparator<IntWritable> comparator;

  public MergeSort(Comparator<IntWritable> comparator) {
    this.comparator = comparator;
  }

  public void mergeSort(int src[], int dest[], int low, int high) {
    int length = high - low;

    // Insertion sort on smallest arrays
    if (length < 7) {
      for (int i=low; i<high; i++) {
        for (int j=i; j > low; j--) {
          I.set(dest[j-1]);
          J.set(dest[j]);
          if (comparator.compare(I, J)>0)
            swap(dest, j, j-1);
        }
      }
      return;
    }

    // Recursively sort halves of dest into src
    int mid = (low + high) >>> 1;
    mergeSort(dest, src, low, mid);
    mergeSort(dest, src, mid, high);

    I.set(src[mid-1]);
    J.set(src[mid]);
    // If list is already sorted, just copy from src to dest. This is an
    // optimization that results in faster sorts for nearly ordered lists.
    if (comparator.compare(I, J) <= 0) {
      System.arraycopy(src, low, dest, low, length);
      return;
    }

    // Merge sorted halves (now in src) into dest
    for (int i = low, p = low, q = mid; i < high; i++) {
      if (q < high && p < mid) {
        I.set(src[p]);
        J.set(src[q]);
      }
      if (q>=high || p<mid && comparator.compare(I, J) <= 0)
        dest[i] = src[p++];
      else
        dest[i] = src[q++];
    }
  }

  private void swap(int x[], int a, int b) {
    int t = x[a];
    x[a] = x[b];
    x[b] = t;
  }
}
1. When the length of a sub-array is less than 7, IS is used to sort it. This is similar to the length < 13 cutoff in Hadoop's QS (QuickSort).
// Insertion sort on smallest arrays
if (length < 7) {
  for (int i=low; i<high; i++) {
    for (int j=i; j > low; j--) {
      I.set(dest[j-1]);
      J.set(dest[j]);
      if (comparator.compare(I, J)>0)
        swap(dest, j, j-1);
    }
  }
  return;
}
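The small-range step can be tried in isolation with the sketch below. It keeps the same loop structure and the cutoff of 7, but it compares the int values directly through a Comparator<Integer> instead of the reusable IntWritable fields; the class name SmallRangeInsertionSort and the cmp parameter are assumptions made for this illustration. Note that, as in the excerpt, the inner loop always walks down to low rather than stopping at the first pair that is already in order.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SmallRangeInsertionSort {
    static final int THRESHOLD = 7; // same cutoff as in the excerpt above

    // Sorts dest[low, high) in place; elements outside the range are untouched.
    static void insertionSort(int[] dest, int low, int high, Comparator<Integer> cmp) {
        for (int i = low; i < high; i++) {
            for (int j = i; j > low; j--) {
                if (cmp.compare(dest[j - 1], dest[j]) > 0) {
                    int t = dest[j];
                    dest[j] = dest[j - 1];
                    dest[j - 1] = t;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 8, 4, 3, 2, 1, 0};
        // Only indices 1..4 are sorted; the range length (4) is below THRESHOLD.
        insertionSort(a, 1, 5, Comparator.naturalOrder());
        // prints [9, 2, 3, 4, 8, 1, 0]
        System.out.println(Arrays.toString(a));
    }
}
```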
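2. Split in half and recurse; the approach is the same as in the standard MS. Note that src and dest swap roles in the recursive calls, so the two sorted halves end up in src.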
// Recursively sort halves of dest into src
int mid = (low + high) >>> 1;
mergeSort(dest, src, low, mid);
mergeSort(dest, src, mid, high);
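One detail in this step is the midpoint computed with an unsigned shift, (low + high) >>> 1. The sketch below, with the hypothetical names MidpointDemo, naiveMid and shiftMid, shows why that form is safer than dividing by 2 when low + high exceeds Integer.MAX_VALUE. The index values are chosen only to trigger the overflow and are not taken from Hadoop.

```java
final class MidpointDemo {
    // Signed division: a wrapped (negative) sum stays negative.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    // Unsigned shift: the wrapped sum is read as an unsigned 32-bit value.
    static int shiftMid(int low, int high) {
        return (low + high) >>> 1;
    }

    public static void main(String[] args) {
        int low = 1_500_000_000;
        int high = 2_000_000_000;
        System.out.println(naiveMid(low, high)); // prints -397483648
        System.out.println(shiftMid(low, high)); // prints 1750000000
    }
}
```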
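3. If the two sorted sub-arrays are already in order when placed end to end (I holds src[mid-1], the last element of the left half, and J holds src[mid], the first element of the right half), the range is copied straight from src to dest and the merge is skipped.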
// If list is already sorted, just copy from src to dest. This is an
// optimization that results in faster sorts for nearly ordered lists.
if (comparator.compare(I, J) <= 0) {
  System.arraycopy(src, low, dest, low, length);
  return;
}
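The check can be illustrated on its own with the following sketch. It assumes both halves of src are already sorted and non-empty (low < mid < high); the class name SortedHalvesShortcut, the method copyIfOrdered and its boolean return value are inventions for this example and do not appear in the Hadoop class.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SortedHalvesShortcut {
    // Returns true if src[low, mid) followed by src[mid, high) is already in
    // order, in which case dest[low, high) is filled by a plain copy.
    static boolean copyIfOrdered(int[] src, int[] dest, int low, int mid, int high,
                                 Comparator<Integer> cmp) {
        if (cmp.compare(src[mid - 1], src[mid]) <= 0) {
            System.arraycopy(src, low, dest, low, high - low);
            return true;
        }
        return false; // the caller still has to run a real merge
    }

    public static void main(String[] args) {
        int[] ordered = {1, 3, 5, 6, 8, 9};   // 5 <= 6, so one comparison is enough
        int[] interleaved = {1, 4, 7, 2, 3, 9}; // 7 > 2, so a merge is required
        int[] dest = new int[6];
        System.out.println(copyIfOrdered(ordered, dest, 0, 3, 6, Comparator.naturalOrder()));     // prints true
        System.out.println(Arrays.toString(dest));                                                // prints [1, 3, 5, 6, 8, 9]
        System.out.println(copyIfOrdered(interleaved, dest, 0, 3, 6, Comparator.naturalOrder())); // prints false
    }
}
```

For input that is already sorted or nearly sorted, this replaces a full element-by-element merge with a single comparison plus one bulk copy at each level where the condition holds.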
4. Otherwise, the elements of the two halves are placed into the dest array one by one, in ascending order.
// Merge sorted halves (now in src) into dest
for (int i = low, p = low, q = mid; i < high; i++) {
  if (q < high && p < mid) {
    I.set(src[p]);
    J.set(src[q]);
  }
  if (q>=high || p<mid && comparator.compare(I, J) <= 0)
    dest[i] = src[p++];
  else
    dest[i] = src[q++];
}
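To experiment with the four steps together without a Hadoop dependency, the following self-contained sketch follows the same structure but compares int values through a Comparator<Integer> rather than through IntWritable. The class name PingPongMergeSort and the helper sorted are hypothetical. The sketch also assumes that src and dest hold the same values when the top-level call is made, which the alternating use of the two arrays relies on.

```java
import java.util.Arrays;
import java.util.Comparator;

final class PingPongMergeSort {
    private final Comparator<Integer> cmp;

    PingPongMergeSort(Comparator<Integer> cmp) {
        this.cmp = cmp;
    }

    // Sorts dest[low, high); src must contain the same values as dest on entry.
    void mergeSort(int[] src, int[] dest, int low, int high) {
        int length = high - low;
        // Step 1: insertion sort on small ranges.
        if (length < 7) {
            for (int i = low; i < high; i++) {
                for (int j = i; j > low; j--) {
                    if (cmp.compare(dest[j - 1], dest[j]) > 0) {
                        int t = dest[j];
                        dest[j] = dest[j - 1];
                        dest[j - 1] = t;
                    }
                }
            }
            return;
        }
        // Step 2: recurse with the array roles swapped, leaving sorted halves in src.
        int mid = (low + high) >>> 1;
        mergeSort(dest, src, low, mid);
        mergeSort(dest, src, mid, high);
        // Step 3: if the halves are already in order, a bulk copy is enough.
        if (cmp.compare(src[mid - 1], src[mid]) <= 0) {
            System.arraycopy(src, low, dest, low, length);
            return;
        }
        // Step 4: merge the sorted halves from src into dest.
        for (int i = low, p = low, q = mid; i < high; i++) {
            if (q >= high || (p < mid && cmp.compare(src[p], src[q]) <= 0)) {
                dest[i] = src[p++];
            } else {
                dest[i] = src[q++];
            }
        }
    }

    // Convenience wrapper: both working arrays start as copies of the input.
    static int[] sorted(int[] input, Comparator<Integer> cmp) {
        int[] src = input.clone();
        int[] dest = input.clone();
        new PingPongMergeSort(cmp).mergeSort(src, dest, 0, input.length);
        return dest;
    }

    public static void main(String[] args) {
        int[] data = {12, 3, 7, 1, 9, 15, 4, 8, 2, 11};
        // prints [1, 2, 3, 4, 7, 8, 9, 11, 12, 15]
        System.out.println(Arrays.toString(sorted(data, Comparator.naturalOrder())));
    }
}
```

Because each level writes its result into the array that the level above reads from, no copy-back step is needed after a merge, unlike a textbook version that merges into a scratch buffer and copies the range back.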