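Lesson 35: A Thorough Walkthrough of the TimSort Source Code Behind Sort Shuffle in Spark 2.1.X
TimSort in the Sort Shuffle of Spark 2.1.X:
1. Since Spark 1.6.x, the default core shuffle has been Sort Shuffle. You may have the impression that Sort Shuffle sorts the data, but that impression is misleading: write the simplest WordCount program, for example, and by default its output is not sorted. So after the move from Hash Shuffle to Sort Shuffle, what exactly gets sorted, and how?
2. TimSort is a sorting algorithm that balances several concerns well. It performs particularly well when the data already consists of many separately ordered chunks, so it is worth studying thoroughly how TimSort is implemented.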
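To recap, we traced the code from Sorter.scala into TimSort: ExternalSorter has to sort, and by default it sorts by partition ID. Sorting by partition ID does not mean that the records themselves are sorted. Sorter.scala is where TimSort is used. Reading the TimSort source, you will find that it resembles MergeSort but differs from it substantially. MergeSort splits the data into many small pieces (small files, in the external case) and finally merges the small pieces into large ones. TimSort can be regarded as an improved MergeSort.
TimSort refines MergeSort into a stable, adaptive, iterative sort. TimSort itself is an in-memory algorithm rather than a distributed one; in Spark it runs inside each task as one step of the distributed shuffle, where it brings a large efficiency gain.
A plain MergeSort starts from pieces of length 1 and builds the merged pieces up from there. TimSort instead looks for runs: consecutive stretches of data that are already ascending, or strictly descending, in which case the run is reversed in place. A run is therefore a block of already ordered data whose length depends on the input. When a natural run is shorter than the minimum run length, TimSort extends it with binary insertion sort as a local optimization. MergeSort merges in a fixed, predetermined pattern, whereas TimSort decides at run time what to merge, based on conditions on the run lengths. TimSort is used in many places, Android for example.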
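The point that sorting by partition ID does not sort the records themselves can be illustrated outside Spark. The following is a minimal sketch, not Spark code: the Rec class and its field names are hypothetical, and it relies on the JDK's Arrays.sort for object arrays, which is documented as stable and is TimSort-based in OpenJDK.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration; Rec is not a Spark class.
final class PartitionSortDemo {
  static final class Rec {
    final int partitionId;
    final String word;

    Rec(int partitionId, String word) {
      this.partitionId = partitionId;
      this.word = word;
    }

    @Override
    public String toString() {
      return partitionId + ":" + word;
    }
  }

  // Sort a copy of the input by partition ID only. The sort is stable, so
  // records with the same partition ID keep their original relative order.
  static Rec[] sortByPartition(Rec[] input) {
    Rec[] out = input.clone();
    Arrays.sort(out, Comparator.comparingInt((Rec r) -> r.partitionId));
    return out;
  }

  public static void main(String[] args) {
    Rec[] recs = {
      new Rec(1, "spark"), new Rec(0, "zebra"), new Rec(1, "hadoop"), new Rec(0, "apple")
    };
    // Prints [0:zebra, 0:apple, 1:spark, 1:hadoop]
    System.out.println(Arrays.toString(sortByPartition(recs)));
  }
}
```

The records end up grouped by partition, but within partition 0 "zebra" still precedes "apple": the words themselves were never compared.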
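TimSort.java lives in the org.apache.spark.util.collection package, which also contains a test class, TestTimSort, used for things such as creating test arrays. A good way to read TimSort.java is to start from the sort method and then connect the other members and methods to it. The sort method in TimSort.java is as follows: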
public void sort(Buffer a, int lo, int hi, Comparator<? super K> c) {
  assert c != null;

  int nRemaining = hi - lo;
  if (nRemaining < 2)
    return;  // Arrays of size 0 and 1 are always sorted

  // If array is small, do a "mini-TimSort" with no merges
  if (nRemaining < MIN_MERGE) {
    int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
    binarySort(a, lo, hi, lo + initRunLen, c);
    return;
  }

  /**
   * March over the array once, left to right, finding natural runs,
   * extending short natural runs to minRun elements, and merging runs
   * to maintain stack invariant.
   */
  SortState sortState = new SortState(a, c, hi - lo);
  int minRun = minRunLength(nRemaining);
  do {
    // Identify next run
    int runLen = countRunAndMakeAscending(a, lo, hi, c);

    // If run is short, extend to min(minRun, nRemaining)
    if (runLen < minRun) {
      int force = nRemaining <= minRun ? nRemaining : minRun;
      binarySort(a, lo, lo + force, lo + runLen, c);
      runLen = force;
    }

    // Push run onto pending-run stack, and maybe merge
    sortState.pushRun(lo, runLen);
    sortState.mergeCollapse();

    // Advance to find next run
    lo += runLen;
    nRemaining -= runLen;
  } while (nRemaining != 0);

  // Merge all remaining runs to complete sort
  assert lo == hi;
  sortState.mergeForceCollapse();
  assert sortState.stackSize == 1;
}
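In this code:
- nRemaining is the number of elements still to be sorted (hi - lo at the start); it is expressed in terms of the array range handed to sort.
- If nRemaining is less than 2, the data is already sorted and the method returns immediately: arrays of size 0 and 1 need no sorting.
- If nRemaining is less than MIN_MERGE, the method does a "mini-TimSort" with no merges. countRunAndMakeAscending returns the length of the initial run (ascending once the call returns), and binarySort, a binary insertion sort, then sorts the rest of the range.
- Next comes SortState, which holds the stack of pending runs; its constructor sets up the state of the sort in progress.
- minRunLength returns the minimum run length.
- The do-while loop first gets the length of the next run. If runLen is less than minRun, the run is extended to min(minRun, nRemaining) elements with binarySort. sortState.pushRun(lo, runLen) pushes the run onto the stack of pending runs. sortState.mergeCollapse() may merge runs, depending on conditions it checks internally. lo += runLen advances to the start of the next run.
- After the loop ends, mergeForceCollapse merges all remaining runs to complete the sort.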
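From the implementation point of view, the first key line in TimSort.java is int initRunLen = countRunAndMakeAscending(a, lo, hi, c);. countRunAndMakeAscending first finds the end of the run, testing the elements in a while loop, reverses the run if it is descending, and finally returns the length of the run.
countRunAndMakeAscending returns the length of the run that begins at the specified position and reverses the run if it is descending, so the run is always ascending when the method returns. A run is the longest ascending sequence with a[lo] <= a[lo + 1] <= a[lo + 2] <= ... or the longest descending sequence with a[lo] > a[lo + 1] > a[lo + 2] > ... For a stable merge sort the strict definition of "descending" is necessary, so that a descending sequence can be reversed safely without violating stability.
- @param a: the array in which a run is to be counted and possibly reversed.
- @param lo: the index of the first element in the run.
- @param hi: the index after the last element that may be contained in the run. It is required that {@code lo < hi}.
- @param c: the comparator used for the sort.
- @return: the length of the run.
The code of countRunAndMakeAscending is as follows: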
private int countRunAndMakeAscending(Buffer a, int lo, int hi, Comparator<? super K> c) {
  assert lo < hi;
  int runHi = lo + 1;
  if (runHi == hi)
    return 1;

  K key0 = s.newKey();
  K key1 = s.newKey();

  // Find end of run, and reverse range if descending
  if (c.compare(s.getKey(a, runHi++, key0), s.getKey(a, lo, key1)) < 0) { // Descending
    while (runHi < hi && c.compare(s.getKey(a, runHi, key0), s.getKey(a, runHi - 1, key1)) < 0)
      runHi++;
    reverseRange(a, lo, runHi);
  } else {                              // Ascending
    while (runHi < hi && c.compare(s.getKey(a, runHi, key0), s.getKey(a, runHi - 1, key1)) >= 0)
      runHi++;
  }

  return runHi - lo;
}
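To make the run detection concrete, here is a minimal sketch of the same logic on a plain int[]. It is a simplification for illustration only: the class name RunDemo is hypothetical, and the SortDataFormat and Comparator indirection of the Spark code is replaced by direct int comparisons.

```java
import java.util.Arrays;

// Hypothetical int[] simplification of countRunAndMakeAscending.
final class RunDemo {
  // Returns the length of the run starting at lo; a strictly descending run
  // is reversed in place so the run is ascending on return.
  static int countRunAndMakeAscending(int[] a, int lo, int hi) {
    int runHi = lo + 1;
    if (runHi == hi) {
      return 1;
    }
    if (a[runHi++] < a[lo]) { // strictly descending
      while (runHi < hi && a[runHi] < a[runHi - 1]) {
        runHi++;
      }
      reverseRange(a, lo, runHi);
    } else { // ascending (equal neighbours allowed)
      while (runHi < hi && a[runHi] >= a[runHi - 1]) {
        runHi++;
      }
    }
    return runHi - lo;
  }

  // Reverses a[lo, hi) in place.
  static void reverseRange(int[] a, int lo, int hi) {
    hi--;
    while (lo < hi) {
      int t = a[lo];
      a[lo++] = a[hi];
      a[hi--] = t;
    }
  }

  public static void main(String[] args) {
    int[] a = {5, 4, 3, 1, 2, 9};
    int len = countRunAndMakeAscending(a, 0, a.length);
    // Prints 4 [1, 3, 4, 5, 2, 9]
    System.out.println(len + " " + Arrays.toString(a));
  }
}
```

The first four elements form a strictly descending run, so they are reversed; the run stops at 2 because 2 is not smaller than 1.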
Back in the sort method of TimSort.java, let's look at the code of binarySort:
private void binarySort(Buffer a, int lo, int hi, int start, Comparator<? super K> c) {
  assert lo <= start && start <= hi;
  if (start == lo)
    start++;

  K key0 = s.newKey();
  K key1 = s.newKey();

  Buffer pivotStore = s.allocate(1);
  for ( ; start < hi; start++) {
    s.copyElement(a, start, pivotStore, 0);
    K pivot = s.getKey(pivotStore, 0, key0);

    // Set left (and right) to the index where a[start] (pivot) belongs
    int left = lo;
    int right = start;
    assert left <= right;
    /*
     * Invariants:
     *   pivot >= all in [lo, left).
     *   pivot <  all in [right, start).
     */
    while (left < right) {
      int mid = (left + right) >>> 1;
      if (c.compare(pivot, s.getKey(a, mid, key1)) < 0)
        right = mid;
      else
        left = mid + 1;
    }
    assert left == right;

    /*
     * The invariants still hold: pivot >= all in [lo, left) and
     * pivot < all in [left, start), so pivot belongs at left.  Note
     * that if there are elements equal to pivot, left points to the
     * first slot after them -- that's why this sort is stable.
     * Slide elements over to make room for pivot.
     */
    int n = start - left;  // The number of elements to move
    // Switch is just an optimization for arraycopy in default case
    switch (n) {
      case 2:  s.copyElement(a, left + 1, a, left + 2);
      case 1:  s.copyElement(a, left, a, left + 1);
        break;
      default: s.copyRange(a, left, a, left + 1, n);
    }
    s.copyElement(pivotStore, 0, a, left);
  }
}
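binarySort sorts the specified portion of the specified array using binary insertion sort. This is the best method for sorting small numbers of elements. It requires O(n log n) comparisons but, in the worst case, O(n^2) data movement. If the initial part of the specified range is already sorted, the method can take advantage of it: it assumes that the elements from index {@code lo}, inclusive, to {@code start}, exclusive, are already sorted.
- @param a: the array in which a range is to be sorted
- @param lo: the index of the first element in the range to be sorted
- @param hi: the index after the last element in the range to be sorted
- @param start: the index of the first element in the range that is not already known to be sorted ({@code lo <= start <= hi})
- @param c: the comparator used for the sort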
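As a small worked example of binary insertion sort with an already sorted prefix, the sketch below applies the same idea to an int[]. It is an illustrative simplification: the class name is hypothetical, and System.arraycopy stands in for the copyElement/copyRange calls of the Spark code.

```java
import java.util.Arrays;

// Hypothetical int[] simplification of binarySort.
final class BinaryInsertionDemo {
  // Sorts a[lo, hi), assuming a[lo, start) is already sorted.
  static void binarySort(int[] a, int lo, int hi, int start) {
    if (start == lo) {
      start++;
    }
    for (; start < hi; start++) {
      int pivot = a[start];
      int left = lo;
      int right = start;
      // Binary search for the first slot after all elements <= pivot,
      // which keeps equal elements in their original order (stability).
      while (left < right) {
        int mid = (left + right) >>> 1;
        if (pivot < a[mid]) {
          right = mid;
        } else {
          left = mid + 1;
        }
      }
      // Slide a[left, start) one slot to the right and drop pivot in.
      System.arraycopy(a, left, a, left + 1, start - left);
      a[left] = pivot;
    }
  }

  public static void main(String[] args) {
    int[] a = {1, 3, 4, 5, 2, 9, 0};
    binarySort(a, 0, a.length, 4); // a[0, 4) is already sorted
    // Prints [0, 1, 2, 3, 4, 5, 9]
    System.out.println(Arrays.toString(a));
  }
}
```

The binary search keeps the number of comparisons per element logarithmic, while the shifting step is what makes the data movement quadratic in the worst case.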
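Back in the sort method of TimSort.java there is a call to minRunLength, which returns the minimum run length. minRunLength consists of a while loop that runs as long as n is greater than or equal to MIN_MERGE, which is 32 (2 to the power of 5), and does some basic shift operations. The code of minRunLength: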
private int minRunLength(int n) {
  assert n >= 0;
  int r = 0;      // Becomes 1 if any 1 bits are shifted off
  while (n >= MIN_MERGE) {
    r |= (n & 1);
    n >>= 1;
  }
  return n + r;
}
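A few concrete values make the shift loop easier to follow. The sketch below copies the method into a standalone class (the class name is hypothetical) with MIN_MERGE set to 32 as stated above.

```java
// Standalone copy of minRunLength for experimentation; the class name is hypothetical.
final class MinRunDemo {
  static final int MIN_MERGE = 32;

  static int minRunLength(int n) {
    int r = 0; // becomes 1 if any 1 bits are shifted off
    while (n >= MIN_MERGE) {
      r |= (n & 1);
      n >>= 1;
    }
    return n + r;
  }

  public static void main(String[] args) {
    // Prints 31 16 17 32
    System.out.println(minRunLength(31) + " " + minRunLength(64) + " "
        + minRunLength(65) + " " + minRunLength(1000));
  }
}
```

For n below 32 the value is returned unchanged. For 64 the loop shifts twice and returns 16; for 65 a 1 bit is shifted off, so the result is rounded up to 17. For 1000 the result is 32.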
Back in the sort method of TimSort.java, look at sortState.pushRun(lo, runLen);. pushRun pushes the run onto a stack:
private void pushRun(int runBase, int runLen) {
  this.runBase[stackSize] = runBase;
  this.runLen[stackSize] = runLen;
  stackSize++;
}
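How the run stack behaves when sort calls pushRun and then sortState.mergeCollapse() can be simulated on run lengths alone. The following is a rough sketch under stated assumptions: the class RunStackDemo and its snapshot helper are hypothetical, the stack capacity of 40 is arbitrary, and a merge is modelled simply by adding two lengths without moving any data. The decision rule inside mergeCollapse is the one used by TimSort.java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical length-only simulation of the pending-run stack.
final class RunStackDemo {
  private final int[] runLen = new int[40]; // capacity chosen arbitrarily for the demo
  private int stackSize = 0;

  void pushRun(int len) {
    runLen[stackSize++] = len;
  }

  // Same decision rule as mergeCollapse; only the lengths are tracked.
  void mergeCollapse() {
    while (stackSize > 1) {
      int n = stackSize - 2;
      if ((n >= 1 && runLen[n - 1] <= runLen[n] + runLen[n + 1])
          || (n >= 2 && runLen[n - 2] <= runLen[n] + runLen[n - 1])) {
        if (runLen[n - 1] < runLen[n + 1]) {
          n--;
        }
      } else if (runLen[n] > runLen[n + 1]) {
        break; // invariant is established
      }
      mergeAt(n);
    }
  }

  // Simulated merge of runs i and i + 1: the lengths are simply added.
  private void mergeAt(int i) {
    runLen[i] += runLen[i + 1];
    if (i == stackSize - 3) {
      runLen[i + 1] = runLen[i + 2];
    }
    stackSize--;
  }

  List<Integer> snapshot() {
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < stackSize; i++) {
      out.add(runLen[i]);
    }
    return out;
  }

  public static void main(String[] args) {
    RunStackDemo s = new RunStackDemo();
    for (int len : new int[] {100, 40, 20}) {
      s.pushRun(len);
      s.mergeCollapse();
    }
    System.out.println(s.snapshot()); // prints [100, 40, 20]
    s.pushRun(25);
    s.mergeCollapse();
    System.out.println(s.snapshot()); // prints [100, 85]
  }
}
```

With lengths 100, 40, 20 nothing is merged, because each run is longer than the two above it combined. Pushing 25 breaks that condition (40 <= 20 + 25), so 20 and 25 are merged into 45, and then 40 and 45 into 85.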
Back in the sort method of TimSort.java, the next key line is sortState.mergeCollapse();. Let's look at the source of mergeCollapse:
private void mergeCollapse() {
  while (stackSize > 1) {
    int n = stackSize - 2;
    if ( (n >= 1 && runLen[n-1] <= runLen[n] + runLen[n+1])
      || (n >= 2 && runLen[n-2] <= runLen[n] + runLen[n-1])) {
      if (runLen[n - 1] < runLen[n + 1])
        n--;
    } else if (runLen[n] > runLen[n + 1]) {
      break; // Invariant is established
    }
    mergeAt(n);
  }
}

OpenJDK's implementation of mergeCollapse reportedly had a bug in which the ordering of the runs pushed onto the stack could go wrong. Spark's version has been tested thoroughly, and this mergeCollapse does not have the bug. The key call inside it is mergeAt. In mergeAt, runLen[i] records the combined length of the two runs; if i is the third run from the top of the stack, the top run is slid down into the second position. gallopRight finds where the first element of run2 belongs in run1, and the elements of run1 before that position can be ignored. gallopLeft then finds where the last element of run1 belongs in run2, and the elements of run2 after that position can be ignored. The code of mergeAt:

private void mergeAt(int i) {
  assert stackSize >= 2;
  assert i >= 0;
  assert i == stackSize - 2 || i == stackSize - 3;

  int base1 = runBase[i];
  int len1 = runLen[i];
  int base2 = runBase[i + 1];
  int len2 = runLen[i + 1];
  assert len1 > 0 && len2 > 0;
  assert base1 + len1 == base2;

  /*
   * Record the length of the combined runs; if i is the 3rd-last
   * run now, also slide over the last run (which isn't involved
   * in this merge).  The current run (i+1) goes away in any case.
   */
  runLen[i] = len1 + len2;
  if (i == stackSize - 3) {
    runBase[i + 1] = runBase[i + 2];
    runLen[i + 1] = runLen[i + 2];
  }
  stackSize--;

  K key0 = s.newKey();

  /*
   * Find where the first element of run2 goes in run1. Prior elements
   * in run1 can be ignored (because they're already in place).
   */
  int k = gallopRight(s.getKey(a, base2, key0), a, base1, len1, 0, c);
  assert k >= 0;
  base1 += k;
  len1 -= k;
  if (len1 == 0)
    return;

  /*
   * Find where the last element of run1 goes in run2. Subsequent elements
   * in run2 can be ignored (because they're already in place).
   */
  len2 = gallopLeft(s.getKey(a, base1 + len1 - 1, key0), a, base2, len2, len2 - 1, c);
  assert len2 >= 0;
  if (len2 == 0)
    return;

  // Merge remaining runs, using tmp array with min(len1, len2) elements
  if (len1 <= len2)
    mergeLo(base1, len1, base2, len2);
  else
    mergeHi(base1, len1, base2, len2);
}

One of the methods used here is gallopRight. It is like gallopLeft, except that if the range contains an element equal to key, gallopRight returns the index after the rightmost equal element.
- @param key: the key whose insertion point to search for
- @param a: the array in which to search
- @param base: the index of the first element in the range
- @param len: the length of the range; must be greater than 0
- @param hint: the index at which to begin the search, 0 <= hint < n (where n is len). The closer hint is to the result, the faster the method runs.
- @param c: the comparator used to order the range and to search
- @return: the int k, 0 <= k <= n, such that a[b + k - 1] <= key < a[b + k]
The code of gallopRight is as follows:

private int gallopRight(K key, Buffer a, int base, int len, int hint, Comparator<? super K> c) {
  assert len > 0 && hint >= 0 && hint < len;

  int ofs = 1;
  int lastOfs = 0;
  K key1 = s.newKey();

  if (c.compare(key, s.getKey(a, base + hint, key1)) < 0) {
    // Gallop left until a[b+hint - ofs] <= key < a[b+hint - lastOfs]
    int maxOfs = hint + 1;
    while (ofs < maxOfs && c.compare(key, s.getKey(a, base + hint - ofs, key1)) < 0) {
      lastOfs = ofs;
      ofs = (ofs << 1) + 1;
      if (ofs <= 0)   // int overflow
        ofs = maxOfs;
    }
    if (ofs > maxOfs)
      ofs = maxOfs;

    // Make offsets relative to b
    int tmp = lastOfs;
    lastOfs = hint - ofs;
    ofs = hint - tmp;
  } else { // a[b + hint] <= key
    // Gallop right until a[b+hint + lastOfs] <= key < a[b+hint + ofs]
    int maxOfs = len - hint;
    while (ofs < maxOfs && c.compare(key, s.getKey(a, base + hint + ofs, key1)) >= 0) {
      lastOfs = ofs;
      ofs = (ofs << 1) + 1;
      if (ofs <= 0)   // int overflow
        ofs = maxOfs;
    }
    if (ofs > maxOfs)
      ofs = maxOfs;

    // Make offsets relative to b
    lastOfs += hint;
    ofs += hint;
  }
  assert -1 <= lastOfs && lastOfs < ofs && ofs <= len;

  /*
   * Now a[b + lastOfs] <= key < a[b + ofs], so key belongs somewhere to
   * the right of lastOfs but no farther right than ofs.  Do a binary
   * search, with invariant a[b + lastOfs - 1] <= key < a[b + ofs].
   */
  lastOfs++;
  while (lastOfs < ofs) {
    int m = lastOfs + ((ofs - lastOfs) >>> 1);

    if (c.compare(key, s.getKey(a, base + m, key1)) < 0)
      ofs = m;          // key < a[b + m]
    else
      lastOfs = m + 1;  // a[b + m] <= key
  }
  assert lastOfs == ofs;    // so a[b + ofs - 1] <= key < a[b + ofs]
  return ofs;
}

The code above also uses gallopLeft, which locates the position at which to insert the specified key into the specified sorted range; if the range contains an element equal to key, it returns the index of the leftmost equal element.
- @param key: the key whose insertion point to search for
- @param a: the array in which to search
- @param base: the index of the first element in the range
- @param len: the length of the range; must be greater than 0
- @param hint: the index at which to begin the search, 0 <= hint < n (where n is len). The closer hint is to the result, the faster the method runs.
- @param c: the comparator used to order the range and to search
- @return: the int k, 0 <= k <= n, such that a[b + k - 1] < key <= a[b + k], pretending that a[b - 1] is minus infinity and a[b + n] is infinity. In other words, key belongs at index b + k: the first k elements of a should precede key, and the last n - k should follow it.
The code of gallopLeft is as follows:

private int gallopLeft(K key, Buffer a, int base, int len, int hint, Comparator<? super K> c) {
  assert len > 0 && hint >= 0 && hint < len;
  int lastOfs = 0;
  int ofs = 1;
  K key0 = s.newKey();

  if (c.compare(key, s.getKey(a, base + hint, key0)) > 0) {
    // Gallop right until a[base+hint+lastOfs] < key <= a[base+hint+ofs]
    int maxOfs = len - hint;
    while (ofs < maxOfs && c.compare(key, s.getKey(a, base + hint + ofs, key0)) > 0) {
      lastOfs = ofs;
      ofs = (ofs << 1) + 1;
      if (ofs <= 0)   // int overflow
        ofs = maxOfs;
    }
    if (ofs > maxOfs)
      ofs = maxOfs;

    // Make offsets relative to base
    lastOfs += hint;
    ofs += hint;
  } else { // key <= a[base + hint]
    // Gallop left until a[base+hint-ofs] < key <= a[base+hint-lastOfs]
    final int maxOfs = hint + 1;
    while (ofs < maxOfs && c.compare(key, s.getKey(a, base + hint - ofs, key0)) <= 0) {
      lastOfs = ofs;
      ofs = (ofs << 1) + 1;
      if (ofs <= 0)   // int overflow
        ofs = maxOfs;
    }
    if (ofs > maxOfs)
      ofs = maxOfs;

    // Make offsets relative to base
    int tmp = lastOfs;
    lastOfs = hint - ofs;
    ofs = hint - tmp;
  }
  assert -1 <= lastOfs && lastOfs < ofs && ofs <= len;

  /*
   * Now a[base+lastOfs] < key <= a[base+ofs], so key belongs somewhere
   * to the right of lastOfs but no farther right than ofs.  Do a binary
   * search, with invariant a[base + lastOfs - 1] < key <= a[base + ofs].
   */
  lastOfs++;
  while (lastOfs < ofs) {
    int m = lastOfs + ((ofs - lastOfs) >>> 1);

    if (c.compare(key, s.getKey(a, base + m, key0)) > 0)
      lastOfs = m + 1;  // a[base + m] < key
    else
      ofs = m;          // key <= a[base + m]
  }
  assert lastOfs == ofs;    // so a[base + ofs - 1] < key <= a[base + ofs]
  return ofs;
}

Back in the mergeAt method of TimSort.java, the next key call is mergeLo(base1, len1, base2, len2);. mergeLo merges two adjacent runs in place, in a stable fashion. The first element of the first run must be greater than the first element of the second run (a[base1] > a[base2]), and the last element of the first run (a[base1 + len1 - 1]) must be greater than all elements of the second run. This method should be called only when len1 <= len2; its twin, mergeHi, should be called when len1 >= len2. (Either method may be called if len1 == len2.)
- @param base1: the index of the first element in the first run to be merged
- @param len1: the length of the first run to be merged (must be greater than 0)
- @param base2: the index of the first element in the second run to be merged (must be base1 + len1; the original Javadoc writes this as aBase + aLen)
- @param len2: the length of the second run to be merged (must be greater than 0)
The code of mergeLo is as follows:

private void mergeLo(int base1, int len1, int base2, int len2) {
  assert len1 > 0 && len2 > 0 && base1 + len1 == base2;

  // Copy first run into temp array
  Buffer a = this.a; // For performance
  Buffer tmp = ensureCapacity(len1);
  s.copyRange(a, base1, tmp, 0, len1);

  int cursor1 = 0;       // Indexes into tmp array
  int cursor2 = base2;   // Indexes into a
  int dest = base1;      // Indexes into a

  // Move first element of second run and deal with degenerate cases
  s.copyElement(a, cursor2++, a, dest++);
  if (--len2 == 0) {
    s.copyRange(tmp, cursor1, a, dest, len1);
    return;
  }
  if (len1 == 1) {
    s.copyRange(a, cursor2, a, dest, len2);
    s.copyElement(tmp, cursor1, a, dest + len2); // Last elt of run 1 to end of merge
    return;
  }

  K key0 = s.newKey();
  K key1 = s.newKey();

  Comparator<? super K> c = this.c;  // Use local variable for performance
  int minGallop = this.minGallop;    //  "    "       "     "      "
outer:
  while (true) {
    int count1 = 0; // Number of times in a row that first run won
    int count2 = 0; // Number of times in a row that second run won

    /*
     * Do the straightforward thing until (if ever) one run starts
     * winning consistently.
     */
    do {
      assert len1 > 1 && len2 > 0;
      if (c.compare(s.getKey(a, cursor2, key0), s.getKey(tmp, cursor1, key1)) < 0) {
        s.copyElement(a, cursor2++, a, dest++);
        count2++;
        count1 = 0;
        if (--len2 == 0)
          break outer;
      } else {
        s.copyElement(tmp, cursor1++, a, dest++);
        count1++;
        count2 = 0;
        if (--len1 == 1)
          break outer;
      }
    } while ((count1 | count2) < minGallop);

    /*
     * One run is winning so consistently that galloping may be a
     * huge win. So try that, and continue galloping until (if ever)
     * neither run appears to be winning consistently anymore.
     */
    do {
      assert len1 > 1 && len2 > 0;
      count1 = gallopRight(s.getKey(a, cursor2, key0), tmp, cursor1, len1, 0, c);
      if (count1 != 0) {
        s.copyRange(tmp, cursor1, a, dest, count1);
        dest += count1;
        cursor1 += count1;
        len1 -= count1;
        if (len1 <= 1) // len1 == 1 || len1 == 0
          break outer;
      }
      s.copyElement(a, cursor2++, a, dest++);
      if (--len2 == 0)
        break outer;

      count2 = gallopLeft(s.getKey(tmp, cursor1, key0), a, cursor2, len2, 0, c);
      if (count2 != 0) {
        s.copyRange(a, cursor2, a, dest, count2);
        dest += count2;
        cursor2 += count2;
        len2 -= count2;
        if (len2 == 0)
          break outer;
      }
      s.copyElement(tmp, cursor1++, a, dest++);
      if (--len1 == 1)
        break outer;
      minGallop--;
    } while (count1 >= MIN_GALLOP | count2 >= MIN_GALLOP);
    if (minGallop < 0)
      minGallop = 0;
    minGallop += 2;  // Penalize for leaving gallop mode
  }  // End of "outer" loop
  this.minGallop = minGallop < 1 ? 1 : minGallop;  // Write back to field

  if (len1 == 1) {
    assert len2 > 0;
    s.copyRange(a, cursor2, a, dest, len2);
    s.copyElement(tmp, cursor1, a, dest + len2); // Last elt of run 1 to end of merge
  } else if (len1 == 0) {
    throw new IllegalArgumentException(
        "Comparison method violates its general contract!");
  } else {
    assert len2 == 0;
    assert len1 > 1;
    s.copyRange(tmp, cursor1, a, dest, len1);
  }
}

Back in the sort method of TimSort.java, there is one more key line: sortState.mergeForceCollapse();. It merges all runs on the stack until only one remains. This method is called once, to complete the sort. The mergeForceCollapse method is as follows:

private void mergeForceCollapse() {
  while (stackSize > 1) {
    int n = stackSize - 2;
    if (n > 0 && runLen[n - 1] < runLen[n + 1])
      n--;
    mergeAt(n);
  }
}

Summary:
TimSort first splits the input into runs of consecutive ascending elements and then merges those runs. When a run is shorter than the minimum run length, it is extended using binary insertion sort. Compared with MergeSort, whose merge pattern is fixed in advance, TimSort is more flexible. If the third run from the top of the stack is shorter than the top run, the second and third runs are merged first; this is how the pair of runs to merge is chosen. If the head of run1 and the tail of run2 contain parts that do not need to be merged, TimSort trims them off with a gallop search followed by a binary search, which yields the positions where the merge really has to start and end. There are also optimizations for the case where a run has length 1.
This lesson is fairly difficult; it is enough for you to get a general understanding of TimSort.
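The trimming step described in the summary, where the head of run1 that is already in place is skipped before merging, can be sketched on an int[] as follows. This is a simplified illustration and not the Spark method: the class name is hypothetical, the hint is fixed at 0 so the search only gallops to the right, and int comparisons replace the Comparator.

```java
// Hypothetical int[] simplification of gallopRight with the hint fixed at 0.
final class GallopDemo {
  // Returns k in [0, len] such that a[base + k - 1] <= key < a[base + k],
  // treating a[base - 1] as minus infinity and a[base + len] as infinity.
  static int gallopRight(int key, int[] a, int base, int len) {
    if (key < a[base]) {
      return 0;
    }
    int lastOfs = 0;
    int ofs = 1;
    // Probe offsets 1, 3, 7, 15, ... until key < a[base + ofs] or the end is reached.
    while (ofs < len && key >= a[base + ofs]) {
      lastOfs = ofs;
      ofs = (ofs << 1) + 1;
      if (ofs <= 0) { // int overflow
        ofs = len;
      }
    }
    if (ofs > len) {
      ofs = len;
    }
    // Binary search inside (lastOfs, ofs].
    lastOfs++;
    while (lastOfs < ofs) {
      int m = lastOfs + ((ofs - lastOfs) >>> 1);
      if (key < a[base + m]) {
        ofs = m;
      } else {
        lastOfs = m + 1;
      }
    }
    return ofs;
  }

  public static void main(String[] args) {
    // run1 = a[0, 8), run2 = a[8, 12); both are sorted and adjacent.
    int[] a = {1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 9, 10};
    int k = gallopRight(a[8], a, 0, 8);
    // Prints 5: the first five elements of run1 are already in their final place.
    System.out.println(k);
  }
}
```

Only a[5, 8) of run1 still has to be merged with run2 in this example; galloping found that boundary after probing offsets 1, 3 and 7 and then narrowing down with a short binary search.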