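Merge sort is a textbook application of divide and conquer. To sort an array A, split it into two parts B and C, sort B and C separately, and then merge the two sorted parts in order.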
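This divide-and-conquer idea maps naturally onto the MapReduce architecture. Because B and C are sorted independently of each other, the two sorts can run in parallel (the Map phase), while merging B and C can be done in the Reduce phase.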
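As a rough illustration of that mapping, the sketch below sorts chunks independently in a thread pool (standing in for the Map phase) and then merges the sorted chunks pairwise (standing in for the Reduce phase). It is not a real MapReduce job: the names `parallel_merge_sort` and `merge_two`, the chunking scheme, and the use of `concurrent.futures` and `heapq.merge` are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import heapq


def merge_two(left, right):
    # "Reduce" step: merge two already-sorted lists into one sorted list.
    return list(heapq.merge(left, right))


def parallel_merge_sort(data, num_chunks=2):
    # Split the input into roughly equal chunks (hypothetical chunking scheme).
    size = max(1, -(-len(data) // num_chunks))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # "Map" step: each chunk is sorted independently of the others.
    with ThreadPoolExecutor() as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # "Reduce" step: fold the sorted chunks together by repeated merging.
    return reduce(merge_two, sorted_chunks, [])


if __name__ == "__main__":
    print(parallel_merge_sort([1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]))
```

Threads are used here only to show the shape of the computation. In CPython they will generally not speed up a CPU-bound sort, and a real deployment would distribute the chunks across processes or machines.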
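Merge sort is a recursive process: the original sequence is repeatedly split into two smaller sequences until a sequence contains only one element (which is trivially sorted), and then the recursion unwinds level by level, merging the pieces pairwise at each call site.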
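To make the recursion concrete, the following sketch only records which index ranges are split and merged, without moving any data. `trace_splits` is a hypothetical helper written for this illustration; it uses the same midpoint rule, `(p + r) // 2`, as the code later in this post.

```python
def trace_splits(p, r, depth=0, log=None):
    # Record the recursion over the index range [p, r] without sorting anything.
    if log is None:
        log = []
    indent = "  " * depth
    if p < r:
        q = (p + r) // 2  # split as evenly as possible
        log.append(f"{indent}split [{p}, {r}] -> [{p}, {q}] + [{q + 1}, {r}]")
        trace_splits(p, q, depth + 1, log)
        trace_splits(q + 1, r, depth + 1, log)
        log.append(f"{indent}merge [{p}, {q}] + [{q + 1}, {r}]")
    else:
        # A single element is already sorted, so the recursion stops here.
        log.append(f"{indent}leaf  [{p}, {r}]")
    return log


if __name__ == "__main__":
    print("\n".join(trace_splits(0, 3)))
```

Each "split" line is printed on the way down and each "merge" line on the way back up, which mirrors the order in which the recursive calls return.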
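To implement this, two functions are needed:
merge() merges two sorted subarrays; it corresponds to the merge step in the figure above.
mergeSort() drives the recursive calls.
For the latest code, see the author's GitHub.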
# merge sort
# assume that A is an array
# the goal is to sort a subsequence of A (in ascending order)
# p is the index of the first element of the subsequence
# r is the index of the last element of the subsequence
# elements [p, q] form a sorted subsequence
# elements [q+1, r] also form a sorted subsequence
def merge(A, p, q, r):
    L = A[p:q+1]
    R = A[q+1:r+1]
    # append positive infinity as a sentinel so that neither
    # L nor R runs out of elements during traversal;
    # this just keeps the code clean
    L.append(float("inf"))
    R.append(float("inf"))
    print("L: ", L)
    print("R: ", R)
    idxL = 0
    idxR = 0
    for idxA in range(p, r+1):
        print("idxL: ", idxL, " idxR: ", idxR)
        if L[idxL] <= R[idxR]:
            A[idxA] = L[idxL]
            idxL = idxL + 1
        else:
            A[idxA] = R[idxR]
            idxR = idxR + 1
        print("round ", idxA, "of sorting result: ", A)


# recursion process
# A is the array to be sorted
# p is the index of the start element
# r is the index of the end element
def mergeSort(A, p, r):
    if p < r:
        # try to split equally
        q = (p + r)//2
        mergeSort(A, p, q)
        mergeSort(A, q+1, r)
        merge(A, p, q, r)


if __name__ == '__main__':
    print("--->Test function merge()...")
    A = [2, 4, 5, 7, 9, 1, 2, 3, 6]
    print("original seq: ", A)
    merge(A, 0, 4, 8)
    print("\n--->Test function mergeSort()...")
    B = [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
    mergeSort(B, 0, len(B)-1)
The idea behind merge() is as follows:
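Because the two subsequences being merged are each already sorted (in ascending order here), each iteration only needs to compare the smallest remaining value of each subsequence (the value at its current index) and write the smaller of the two into the original array. The lengths of the two subsequences add up to N, the length of the range being merged, so N iterations complete the merge.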
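The code uses a small trick: a positive-infinity value is appended to the end of each of the two subarrays. This prevents an index from running past the end once one subsequence has been exhausted. Suppose subsequence B (L in the code) has been fully consumed: its index now points at positive infinity, and since every remaining element of C (R in the code) is smaller than positive infinity, the rest of C is copied into A by the same comparison logic in the loop, with no separate branch needed for the case where one subsequence is empty.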
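For comparison, here is a sketch of the same merge written without sentinels. The name `merge_no_sentinel` is made up for this example. It needs the extra bounds checks that the infinity trick avoids, but in exchange it does not rely on the elements being comparable with `float("inf")`, so it also works for values such as strings.

```python
def merge_no_sentinel(A, p, q, r):
    # A[p..q] and A[q+1..r] are each sorted; merge them in place into A[p..r].
    L = A[p:q + 1]
    R = A[q + 1:r + 1]
    idxL = 0
    idxR = 0
    for idxA in range(p, r + 1):
        # Take from L when R is exhausted, or when L still has elements
        # and its current element is not larger than R's.
        if idxR >= len(R) or (idxL < len(L) and L[idxL] <= R[idxR]):
            A[idxA] = L[idxL]
            idxL += 1
        else:
            A[idxA] = R[idxR]
            idxR += 1


if __name__ == "__main__":
    words = ["apple", "pear", "banana", "kiwi"]
    merge_no_sentinel(words, 0, 1, 3)
    print(words)
```

The condition in the loop is exactly the bookkeeping that the sentinel version folds into a single comparison, which is why the original calls the trick a way to keep the code clean.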
mergeSort() ties the whole recursion together. Note that the array can be split evenly or at any custom split point; this implementation splits it evenly.
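As a sketch of the custom-split option, the version below takes the split rule as a parameter. `merge_sort_split`, `one_third`, and the `split` callback are hypothetical names, and `heapq.merge` stands in for the post's merge() to keep the example short. Any rule should work as long as it returns q with p <= q < r, so that both parts are non-empty and the recursion terminates.

```python
import heapq


def merge_sort_split(A, p, r, split=lambda p, r: (p + r) // 2):
    # Sort A[p..r] in place; split(p, r) must return q with p <= q < r.
    if p < r:
        q = split(p, r)
        if not p <= q < r:
            raise ValueError("split point must satisfy p <= q < r")
        merge_sort_split(A, p, q, split)
        merge_sort_split(A, q + 1, r, split)
        # Merge the two sorted parts and write the result back into A.
        A[p:r + 1] = list(heapq.merge(A[p:q + 1], A[q + 1:r + 1]))


def one_third(p, r):
    # Uneven split: roughly the first third goes to the left part.
    return p + (r - p) // 3


if __name__ == "__main__":
    B = [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
    merge_sort_split(B, 0, len(B) - 1, split=one_third)
    print(B)
```

Very uneven splits deepen the recursion and increase the total work, so the even split used in the post is usually the sensible default.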
Below is the output of the code. It tests merge() and mergeSort() in turn, and the printed trace shows how the merging and the recursion proceed:
--->Test function merge()...
original seq: [2, 4, 5, 7, 9, 1, 2, 3, 6]
L: [2, 4, 5, 7, 9, inf]
R: [1, 2, 3, 6, inf]
idxL: 0 idxR: 0
round 0 of sorting result: [1, 4, 5, 7, 9, 1, 2, 3, 6]
idxL: 0 idxR: 1
round 1 of sorting result: [1, 2, 5, 7, 9, 1, 2, 3, 6]
idxL: 1 idxR: 1
round 2 of sorting result: [1, 2, 2, 7, 9, 1, 2, 3, 6]
idxL: 1 idxR: 2
round 3 of sorting result: [1, 2, 2, 3, 9, 1, 2, 3, 6]
idxL: 1 idxR: 3
round 4 of sorting result: [1, 2, 2, 3, 4, 1, 2, 3, 6]
idxL: 2 idxR: 3
round 5 of sorting result: [1, 2, 2, 3, 4, 5, 2, 3, 6]
idxL: 3 idxR: 3
round 6 of sorting result: [1, 2, 2, 3, 4, 5, 6, 3, 6]
idxL: 3 idxR: 4
round 7 of sorting result: [1, 2, 2, 3, 4, 5, 6, 7, 6]
idxL: 4 idxR: 4
round 8 of sorting result: [1, 2, 2, 3, 4, 5, 6, 7, 9]
--->Test function mergeSort()...
L: [1, inf]
R: [3, inf]
idxL: 0 idxR: 0
round 0 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
idxL: 1 idxR: 0
round 1 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
L: [1, 3, inf]
R: [7, inf]
idxL: 0 idxR: 0
round 0 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
idxL: 1 idxR: 0
round 1 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
idxL: 2 idxR: 0
round 2 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
L: [2, inf]
R: [4, inf]
idxL: 0 idxR: 0
round 3 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
idxL: 1 idxR: 0
round 4 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
L: [2, 4, inf]
R: [9, inf]
idxL: 0 idxR: 0
round 3 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
idxL: 1 idxR: 0
round 4 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
idxL: 2 idxR: 0
round 5 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
L: [1, 3, 7, inf]
R: [2, 4, 9, inf]
idxL: 0 idxR: 0
round 0 of sorting result: [1, 3, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
idxL: 1 idxR: 0
round 1 of sorting result: [1, 2, 7, 2, 4, 9, 10, 1, 11, 12, 18, 9]
idxL: 1 idxR: 1
round 2 of sorting result: [1, 2, 3, 2, 4, 9, 10, 1, 11, 12, 18, 9]
idxL: 2 idxR: 1
round 3 of sorting result: [1, 2, 3, 4, 4, 9, 10, 1, 11, 12, 18, 9]
idxL: 2 idxR: 2
round 4 of sorting result: [1, 2, 3, 4, 7, 9, 10, 1, 11, 12, 18, 9]
idxL: 3 idxR: 2
round 5 of sorting result: [1, 2, 3, 4, 7, 9, 10, 1, 11, 12, 18, 9]
L: [10, inf]
R: [1, inf]
idxL: 0 idxR: 0
round 6 of sorting result: [1, 2, 3, 4, 7, 9, 1, 1, 11, 12, 18, 9]
idxL: 0 idxR: 1
round 7 of sorting result: [1, 2, 3, 4, 7, 9, 1, 10, 11, 12, 18, 9]
L: [1, 10, inf]
R: [11, inf]
idxL: 0 idxR: 0
round 6 of sorting result: [1, 2, 3, 4, 7, 9, 1, 10, 11, 12, 18, 9]
idxL: 1 idxR: 0
round 7 of sorting result: [1, 2, 3, 4, 7, 9, 1, 10, 11, 12, 18, 9]
idxL: 2 idxR: 0
round 8 of sorting result: [1, 2, 3, 4, 7, 9, 1, 10, 11, 12, 18, 9]
L: [12, inf]
R: [18, inf]
idxL: 0 idxR: 0
round 9 of sorting result: [1, 2, 3, 4, 7, 9, 1, 10, 11, 12, 18, 9]
idxL: 1 idxR: 0
round 10 of sorting result: [1, 2, 3, 4, 7, 9, 1, 10, 11, 12, 18, 9]
L: [12, 18, inf]
R: [9, inf]
idxL: 0 idxR: 0
round 9 of sorting result: [1, 2, 3, 4, 7, 9, 1, 10, 11, 9, 18, 9]
idxL: 0 idxR: 1
round 10 of sorting result: [1, 2, 3, 4, 7, 9, 1, 10, 11, 9, 12, 9]
idxL: 1 idxR: 1
round 11 of sorting result: [1, 2, 3, 4, 7, 9, 1, 10, 11, 9, 12, 18]
L: [1, 10, 11, inf]
R: [9, 12, 18, inf]
idxL: 0 idxR: 0
round 6 of sorting result: [1, 2, 3, 4, 7, 9, 1, 10, 11, 9, 12, 18]
idxL: 1 idxR: 0
round 7 of sorting result: [1, 2, 3, 4, 7, 9, 1, 9, 11, 9, 12, 18]
idxL: 1 idxR: 1
round 8 of sorting result: [1, 2, 3, 4, 7, 9, 1, 9, 10, 9, 12, 18]
idxL: 2 idxR: 1
round 9 of sorting result: [1, 2, 3, 4, 7, 9, 1, 9, 10, 11, 12, 18]
idxL: 3 idxR: 1
round 10 of sorting result: [1, 2, 3, 4, 7, 9, 1, 9, 10, 11, 12, 18]
idxL: 3 idxR: 2
round 11 of sorting result: [1, 2, 3, 4, 7, 9, 1, 9, 10, 11, 12, 18]
L: [1, 2, 3, 4, 7, 9, inf]
R: [1, 9, 10, 11, 12, 18, inf]
idxL: 0 idxR: 0
round 0 of sorting result: [1, 2, 3, 4, 7, 9, 1, 9, 10, 11, 12, 18]
idxL: 1 idxR: 0
round 1 of sorting result: [1, 1, 3, 4, 7, 9, 1, 9, 10, 11, 12, 18]
idxL: 1 idxR: 1
round 2 of sorting result: [1, 1, 2, 4, 7, 9, 1, 9, 10, 11, 12, 18]
idxL: 2 idxR: 1
round 3 of sorting result: [1, 1, 2, 3, 7, 9, 1, 9, 10, 11, 12, 18]
idxL: 3 idxR: 1
round 4 of sorting result: [1, 1, 2, 3, 4, 9, 1, 9, 10, 11, 12, 18]
idxL: 4 idxR: 1
round 5 of sorting result: [1, 1, 2, 3, 4, 7, 1, 9, 10, 11, 12, 18]
idxL: 5 idxR: 1
round 6 of sorting result: [1, 1, 2, 3, 4, 7, 9, 9, 10, 11, 12, 18]
idxL: 6 idxR: 1
round 7 of sorting result: [1, 1, 2, 3, 4, 7, 9, 9, 10, 11, 12, 18]
idxL: 6 idxR: 2
round 8 of sorting result: [1, 1, 2, 3, 4, 7, 9, 9, 10, 11, 12, 18]
idxL: 6 idxR: 3
round 9 of sorting result: [1, 1, 2, 3, 4, 7, 9, 9, 10, 11, 12, 18]
idxL: 6 idxR: 4
round 10 of sorting result: [1, 1, 2, 3, 4, 7, 9, 9, 10, 11, 12, 18]
idxL: 6 idxR: 5
round 11 of sorting result: [1, 1, 2, 3, 4, 7, 9, 9, 10, 11, 12, 18]