timsort

Intro
-----
This describes an adaptive, stable, natural mergesort, modestly called
timsort (hey, I earned it <wink>).  It has supernatural performance on many
kinds of partially ordered arrays (less than lg(N!) comparisons needed, and
as few as N-1), yet is as fast as Python's previous highly tuned samplesort
hybrid on random arrays.

In a nutshell, the main routine marches over the array once, left to right,
alternately identifying the next run, then merging it into the previous
runs "intelligently".  Everything else is complication for speed, and some
hard-won measure of memory efficiency.


Comparison with Python's Samplesort Hybrid
------------------------------------------
+ timsort can require a temp array containing as many as N//2 pointers,
  which means as many as 2*N extra bytes on 32-bit boxes.  It can be
  expected to require a temp array this large when sorting random data; on
  data with significant structure, it may get away without using any extra
  heap memory.  This appears to be the strongest argument against it, but
  compared to the size of an object, 2 temp bytes per element worst-case
  (also expected-case for random data) doesn't scare me much.

  It turns out that Perl is moving to a stable mergesort, and the code for
  that appears always to require a temp array with room for at least N
  pointers.  (Note that I wouldn't want to do that even if space weren't an
  issue; I believe its efforts at memory frugality also save timsort
  significant pointer-copying costs, and allow it to have a smaller working
  set.)

+ Across about four hours of generating random arrays, and sorting them
  under both methods, samplesort required about 1.5% more comparisons
  (the program is at the end of this file).

+ In real life, this may be faster or slower on random arrays than
  samplesort was, depending on platform quirks.  Since it does fewer
  comparisons on average, it can be expected to do better the more
  expensive a comparison function is.
  OTOH, it does more data movement (pointer copying) than samplesort, and
  that may negate its small comparison advantage (depending on platform
  quirks) unless comparison is very expensive.

+ On arrays with many kinds of pre-existing order, this blows samplesort out
  of the water.  It's significantly faster than samplesort even on some
  cases samplesort was special-casing the snot out of.  I believe that lists
  very often do have exploitable partial order in real life, and this is the
  strongest argument in favor of timsort (indeed, samplesort's special cases
  for extreme partial order are appreciated by real users, and timsort goes
  much deeper than those, in particular naturally covering every case where
  someone has suggested "and it would be cool if list.sort() had a special
  case for this too ... and for that ...").

+ Here are exact comparison counts across all the tests in sortperf.py,
  when run with arguments "15 20 1".

  First the trivial cases, trivial for samplesort because it special-cased
  them, and trivial for timsort because it naturally works on runs.  Within
  an "n" block, the first line gives the # of compares done by samplesort,
  the second line by timsort, and the third line is the percentage by
  which the samplesort count exceeds the timsort count:

        n   \sort   /sort   =sort
  -------  ------  ------  ------
    32768   32768   32767   32767  samplesort
            32767   32767   32767  timsort
            0.00%   0.00%   0.00%  (samplesort - timsort) / timsort

    65536   65536   65535   65535
            65535   65535   65535
            0.00%   0.00%   0.00%

   131072  131072  131071  131071
           131071  131071  131071
            0.00%   0.00%   0.00%

   262144  262144  262143  262143
           262143  262143  262143
            0.00%   0.00%   0.00%

   524288  524288  524287  524287
           524287  524287  524287
            0.00%   0.00%   0.00%

  1048576 1048576 1048575 1048575
          1048575 1048575 1048575
            0.00%   0.00%   0.00%

  The algorithms are effectively identical in these cases, except that
  timsort does one less compare in \sort.

  Now for the more interesting cases.  lg(n!)
  is the information-theoretic limit for the best any comparison-based
  sorting algorithm can do on average (across all permutations).  When a
  method gets significantly below that, it's either astronomically lucky,
  or is finding exploitable structure in the data.

        n   lg(n!)    *sort     3sort    +sort    ~sort     !sort
  -------  -------   ------  --------  -------  -------  --------
    32768   444255   453084    453604    32908   130484    469132  samplesort
                     449235     33019    33016   188720     65534  timsort
                      0.86%  1273.77%   -0.33%  -30.86%   615.86%  %ch from timsort

    65536   954037   973111    970464    65686   260019   1004597
                     963924     65767    65802   377634    131070
                      0.95%  1375.61%   -0.18%  -31.15%   666.46%

   131072  2039137  2100019   2102708   131232   555035   2161268
                    2058863    131422   131363   755476    262142
                      2.00%  1499.97%   -0.10%  -26.53%   724.46%

   262144  4340409  4461471   4442796   262314  1107826   4584316
                    4380148    262446   262466  1511174    524286
                      1.86%  1592.84%   -0.06%  -26.69%   774.39%

   524288  9205096  9448146   9368681   524468  2218562   9691553
                    9285454    524576   524626  3022584   1048574
                      1.75%  1685.95%   -0.03%  -26.60%   824.26%

  1048576 19458756 19950541  20307955  1048766  4430616  20433371
                   19621100   1048854  1048933  6045418   2097150
                      1.68%  1836.20%   -0.02%  -26.71%   874.34%

  Discussion of cases:

  *sort:  There's no structure in random data to exploit, so the
      theoretical limit is lg(n!).  Both methods get close to that, and
      timsort is hugging it (indeed, in a *marginal* sense, it's a
      spectacular improvement -- there's only about 1% left before hitting
      the wall, and timsort knows darned well it's doing compares that
      won't pay on random data -- but so does the samplesort hybrid).  For
      contrast, Hoare's original random-pivot quicksort does about 39% more
      compares than the limit, and the median-of-3 variant about 19% more.

  3sort and !sort:  No contest; there's structure in this data, but not of
      the specific kinds samplesort special-cases.  Note that structure in
      !sort wasn't put there on purpose -- it was crafted as a worst case
      for a previous quicksort implementation.
      That timsort nails it came as a surprise to me (although it's obvious
      in retrospect).

  +sort:  samplesort special-cases this data, and does a few fewer compares
      than timsort.  However, timsort runs this case significantly faster
      on all boxes we have timings for, because timsort is in the business
      of merging runs efficiently, while samplesort does much more data
      movement in this (for it) special case.

  ~sort:  samplesort's special cases for large masses of equal elements are
      extremely effective on ~sort's specific data pattern, and timsort
      just isn't going to get close to that, despite that it's clearly
      getting a great deal of benefit out of the duplicates (the # of
      compares is much less than lg(n!)).  ~sort has a perfectly uniform
      distribution of just 4 distinct values, and as the distribution gets
      more skewed, samplesort's equal-element gimmicks become less
      effective, while timsort's adaptive strategies find more to exploit;
      in a database supplied by Kevin Altis, a sort on its highly skewed
      "on which stock exchange does this company's stock trade?" field ran
      over twice as fast under timsort.

      However, despite that timsort does many more comparisons on ~sort,
      and that on several platforms ~sort runs highly significantly slower
      under timsort, on other platforms ~sort runs highly significantly
      faster under timsort.  No other kind of data has shown this wild
      x-platform behavior, and we don't have an explanation for it.  The
      only thing I can think of that could transform what "should be"
      highly significant slowdowns into highly significant speedups on some
      boxes are catastrophic cache effects in samplesort.

      But timsort "should be" slower than samplesort on ~sort, so it's hard
      to count that it isn't on some boxes as a strike against it <wink>.


A detailed description of timsort follows.

Runs
----
count_run() returns the # of elements in the next run.  A run is either
"ascending", which means non-decreasing:

    a0 <= a1 <= a2 <= ...

or "descending", which means strictly decreasing:

    a0 > a1 > a2 > ...

Note that a run is always at least 2 long, unless we start at the array's
last element.

The definition of descending is strict, because the main routine reverses
a descending run in-place, transforming a descending run into an ascending
run.  Reversal is done via the obvious fast "swap elements starting at each
end, and converge at the middle" method, and that can violate stability if
the slice contains any equal elements.  Using a strict definition of
descending ensures that a descending run contains distinct elements.

If an array is random, it's very unlikely we'll see long runs.  If a natural
run contains less than minrun elements (see next section), the main loop
artificially boosts it to minrun elements, via a stable binary insertion
sort applied to the right number of array elements following the short
natural run.  In a random array, *all* runs are likely to be minrun long as
a result.  This has two primary good effects:

1. Random data strongly tends then toward perfectly balanced (both runs have
   the same length) merges, which is the most efficient way to proceed when
   data is random.

2. Because runs are never very short, the rest of the code doesn't make
   heroic efforts to shave a few cycles off per-merge overheads.  For
   example, reasonable use of function calls is made, rather than trying to
   inline everything.  Since there are no more than N/minrun runs to begin
   with, a few "extra" function calls per merge is barely measurable.


Computing minrun
----------------
If N < 64, minrun is N.  IOW, binary insertion sort is used for the whole
array then; it's hard to beat that given the overheads of trying something
fancier.

When N is a power of 2, testing on random data showed that minrun values of
16, 32, 64 and 128 worked about equally well.  At 256 the data-movement cost
in binary insertion sort clearly hurt, and at 8 the increase in the number
of function calls clearly hurt.
Picking *some* power of 2 is important here, so that the merges end up
perfectly balanced (see next section).  We pick 32 as a good value in the
sweet range; picking a value at the low end allows the adaptive gimmicks
more opportunity to exploit shorter natural runs.

Because sortperf.py only tries powers of 2, it took a long time to notice
that 32 isn't a good choice for the general case!  Consider N=2112:

>>> divmod(2112, 32)
(66, 0)
>>>

If the data is randomly ordered, we're very likely to end up with 66 runs
each of length 32.  The first 64 of these trigger a sequence of perfectly
balanced merges (see next section), leaving runs of lengths 2048 and 64 to
merge at the end.  The adaptive gimmicks can do that with fewer than 2048+64
compares, but it's still more compares than necessary, and-- mergesort's
bugaboo relative to samplesort --a lot more data movement (O(N) copies just
to get 64 elements into place).

If we take minrun=33 in this case, then we're very likely to end up with 64
runs each of length 33, and then all merges are perfectly balanced.  Better!

What we want to avoid is picking minrun such that in

    q, r = divmod(N, minrun)

q is a power of 2 and r>0 (then the last merge only gets r elements into
place, and r<minrun is small compared to N), or r=0 and q a little larger
than a power of 2 (then we've got a case similar to "2112", again leaving
too little work for the last merge to do).

Instead we pick a minrun in range(32, 65) such that N/minrun is exactly a
power of 2, or if that isn't possible, is close to, but strictly less than,
a power of 2.  This is easier to do than it may sound:  take the first 6
bits of N, and add 1 if any of the remaining bits are set.  In fact, that
rule covers every case in this section, including small N and exact powers
of 2; merge_compute_minrun() is a deceptively simple function.
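
As a rough illustration, the "first 6 bits, plus 1 if any remaining bit is
set" rule fits in a few lines of Python.  This is a sketch of the rule as
stated above, not a copy of the C function.  It is paired with a Python
rendering of count_run() from the Runs section; the (length, descending)
return shape and the lo/hi arguments are assumptions made for this example.

```python
def merge_compute_minrun(n):
    """Sketch of the minrun rule: the top 6 bits of n, plus 1 if any of
    the bits shifted out was set.  For n < 64 this returns n itself."""
    r = 0                     # becomes 1 once a set bit is shifted out
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r


def count_run(a, lo, hi):
    """Return (length, descending) for the run starting at a[lo].

    hi is exclusive.  "Ascending" means non-decreasing; "descending"
    means strictly decreasing, so a descending run holds no equal
    elements and can be reversed without hurting stability.
    """
    if hi - lo <= 1:
        return hi - lo, False
    n = 2
    if a[lo + 1] < a[lo]:
        # Strictly descending: extend while each element is < its predecessor.
        while lo + n < hi and a[lo + n] < a[lo + n - 1]:
            n += 1
        return n, True
    # Ascending: extend while no element is < its predecessor.
    while lo + n < hi and not (a[lo + n] < a[lo + n - 1]):
        n += 1
    return n, False


if __name__ == "__main__":
    for n in (63, 64, 2048, 2112):
        minrun = merge_compute_minrun(n)
        print(n, minrun, divmod(n, minrun))
```

For N=2112 the sketch picks 33, which is the 64-runs-of-33 arrangement
discussed above.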


The Merge Pattern
-----------------
In order to exploit regularities in the data, we're merging on natural
run lengths, and they can become wildly unbalanced.  That's a Good Thing
for this sort!  It means we have to find a way to manage an assortment of
potentially very different run lengths, though.

Stability constrains permissible merging patterns.  For example, if we have
3 consecutive runs of lengths

    A:10000  B:20000  C:10000

we dare not merge A with C first, because if A, B and C happen to contain
a common element, it would get out of order wrt its occurrence(s) in B.  The
merging must be done as (A+B)+C or A+(B+C) instead.

So merging is always done on two consecutive runs at a time, and in-place,
although this may require some temp memory (more on that later).

When a run is identified, its base address and length are pushed on a stack
in the MergeState struct.  merge_collapse() is then called to see whether it
should merge it with preceding run(s).  We would like to delay merging as
long as possible in order to exploit patterns that may come up later, but we
like even more to do merging as soon as possible to exploit that the run just
found is still high in the memory hierarchy.  We also can't delay merging
"too long" because it consumes memory to remember the runs that are still
unmerged, and the stack has a fixed size.

What turned out to be a good compromise maintains two invariants on the
stack entries, where A, B and C are the lengths of the three rightmost
not-yet merged slices:

1.  A > B+C
2.  B > C

Note that, by induction, #2 implies the lengths of pending runs form a
decreasing sequence.  #1 implies that, reading the lengths right to left,
the pending-run lengths grow at least as fast as the Fibonacci numbers.
Therefore the stack can never grow larger than about log_base_phi(N) entries,
where phi = (1+sqrt(5))/2 ~= 1.618.  Thus a small # of stack slots suffice
for very large arrays.
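
As a rough check on that bound, the sketch below builds the shortest
sequence of run lengths that still satisfies both invariants, and counts how
many entries fit before their sum exceeds N.  It assumes the invariants hold
for every consecutive group of entries on the stack and that a run may be as
short as 1 element; max_pending_runs is a name invented for this
illustration.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def max_pending_runs(n):
    """Most stack entries possible when the pending run lengths satisfy
    A > B+C and B > C throughout and sum to at most n."""
    lengths = []                  # rightmost (most recent) run first
    total = 0
    while True:
        if not lengths:
            nxt = 1                                   # shortest possible run
        elif len(lengths) == 1:
            nxt = lengths[-1] + 1                     # invariant 2: B > C
        else:
            nxt = lengths[-1] + lengths[-2] + 1       # invariant 1: A > B+C
        if total + nxt > n:
            return len(lengths)
        lengths.append(nxt)
        total += nxt


if __name__ == "__main__":
    for bits in (16, 32, 64):
        n = 2 ** bits
        print(bits, max_pending_runs(n), round(math.log(n, PHI), 1))
```

Even for N = 2**64 the count stays under 100 entries, in line with the
log_base_phi(N) estimate.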

If A <= B+C, the smaller of A and C is merged with B (ties favor C, for the
freshness-in-cache reason), and the new run replaces the A,B or B,C entries;
e.g., if the last 3 entries are

    A:30  B:20  C:10

then B is merged with C, leaving

    A:30  BC:30

on the stack.  Or if they were

    A:500  B:400  C:1000

then A is merged with B, leaving

    AB:900  C:1000

on the stack.

In both examples, the stack configuration after the merge still violates
invariant #2, and merge_collapse() goes on to continue merging runs until
both invariants are satisfied.  As an extreme case, suppose we didn't do the
minrun gimmick, and natural runs were of lengths 128, 64, 32, 16, 8, 4, 2,
and 2.  Nothing would get merged until the final 2 was seen, and that would
trigger 7 perfectly balanced merges.

The thrust of these rules when they trigger merging is to balance the run
lengths as closely as possible, while keeping a low bound on the number of
runs we have to remember.  This is maximally effective for random data,
where all runs are likely to be of (artificially forced) length minrun, and
then we get a sequence of perfectly balanced merges (with, perhaps, some
oddballs at the end).

OTOH, one reason this sort is so good for partly ordered data has to do
with wildly unbalanced run lengths.


Merge Memory
------------
Merging adjacent runs of lengths A and B in-place is very difficult.
Theoretical constructions are known that can do it, but they're too difficult
and slow for practical use.  But if we have temp memory equal to min(A, B),
it's easy.

If A is smaller (function merge_lo), copy A to a temp array, leave B alone,
and then we can do the obvious merge algorithm left to right, from the temp
area and B, starting the stores into where A used to live.  There's always a
free area in the original area comprising a number of elements equal to the
number not yet merged from the temp array (trivially true at the start;
proceed by induction).
The only tricky bit is that if a comparison raises an exception, we have to
remember to copy the remaining elements back in from the temp area, lest the
array end up with duplicate entries from B.  But that's exactly the same
thing we need to do if we reach the end of B first, so the exit code is
pleasantly common to both the normal and error cases.

If B is smaller (function merge_hi, which is merge_lo's "mirror image"),
much the same, except that we need to merge right to left, copying B into a
temp array and starting the stores at the right end of where B used to live.

A refinement:  When we're about to merge adjacent runs A and B, we first do
a form of binary search (more on that later) to see where B[0] should end up
in A.  Elements in A preceding that point are already in their final
positions, effectively shrinking the size of A.  Likewise we also search to
see where A[-1] should end up in B, and elements of B after that point can
also be ignored.  This cuts the amount of temp memory needed by the same
amount.

These preliminary searches may not pay off, and can be expected *not* to
repay their cost if the data is random.  But they can win huge in all of
time, copying, and memory savings when they do pay, so this is one of the
"per-merge overheads" mentioned above that we're happy to endure because
there is at most one very short run.  It's generally true in this algorithm
that we're willing to gamble a little to win a lot, even though the net
expectation is negative for random data.


Merge Algorithms
----------------
merge_lo() and merge_hi() are where the bulk of the time is spent.  merge_lo
deals with runs where A <= B, and merge_hi where A > B.  They don't know
whether the data is clustered or uniform, but a lovely thing about merging
is that many kinds of clustering "reveal themselves" by how many times in a
row the winning merge element comes from the same run.  We'll only discuss
merge_lo here; merge_hi is exactly analogous.
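
The temp-array scheme from the Merge Memory section can be sketched in
Python as below.  This is a simplified model, not the C merge_lo():  it uses
a list slice as the temp area, takes hypothetical (a, lo, mid, hi) arguments
with A = a[lo:mid] and B = a[mid:hi], and leaves out both the preliminary
searches and galloping.  It does show the shared exit path for the normal
and exception cases.

```python
def merge_lo(a, lo, mid, hi):
    """Stably merge sorted a[lo:mid] (run A) with sorted a[mid:hi] (run B),
    in place, copying only run A to temp storage.  Meant for the case
    where A is no longer than B."""
    temp = a[lo:mid]      # temp copy of A; B stays where it is
    i = 0                 # next unmerged element of temp
    j = mid               # next unmerged element of B
    dest = lo             # next store position
    try:
        while i < len(temp) and j < hi:
            # Take from B only when it is strictly smaller, so equal
            # elements from A keep their place ahead of those from B.
            if a[j] < temp[i]:
                a[dest] = a[j]
                j += 1
            else:
                a[dest] = temp[i]
                i += 1
            dest += 1
    finally:
        # The free area a[dest:j] always has room for exactly the elements
        # not yet merged from temp.  Copy them back whether B ran out or a
        # comparison raised; if A ran out first this copies nothing.
        a[dest:dest + len(temp) - i] = temp[i:]


if __name__ == "__main__":
    data = [1, 4, 7, 9, 2, 3, 8, 10, 11]
    merge_lo(data, 0, 4, len(data))
    print(data)
```

If a comparison raises partway through, the finally clause leaves the list
holding a permutation of its original elements, with nothing duplicated or
lost.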

Merging begins in the usual, obvious way, comparing the first element of A
to the first of B, and moving B[0] to the merge area if it's less than A[0],
else moving A[0] to the merge area.  Call that the "one pair at a time"
mode.  The only twist here is keeping track of how many times in a row "the
winner" comes from the same run.

If that count reaches MIN_GALLOP, we switch to "galloping mode".  Here
we *search* B for where A[0] belongs, and move over all the B's before
that point in one chunk to the merge area, then move A[0] to the merge
area.  Then we search A for where B[0] belongs, and similarly move a
slice of A in one chunk.  Then back to searching B for where A[0] belongs,
etc.  We stay in galloping mode until both searches find slices to copy
less than MIN_GALLOP elements long, at which point we go back to
one-pair-at-a-time mode.


Galloping
---------
Still without loss of generality, assume A is the shorter run.  In galloping
mode, we first look for A[0] in B.  We do this via "galloping", comparing
A[0] in turn to B[0], B[1], B[3], B[7], ..., B[2**j - 1], ..., until finding
the k such that B[2**(k-1) - 1] < A[0] <= B[2**k - 1].  This takes at most
roughly lg(B) comparisons, and, unlike a straight binary search, favors
finding the right spot early in B (more on that later).

After finding such a k, the region of uncertainty is reduced to 2**(k-1) - 1
consecutive elements, and a straight binary search requires exactly k-1
additional comparisons to nail it.  Then we copy all the B's up to that
point in one chunk, and then copy A[0].  Note that no matter where A[0]
belongs in B, the combination of galloping + binary search finds it in no
more than about 2*lg(B) comparisons.

If we did a straight binary search, we could find it in no more than
ceiling(lg(B+1)) comparisons -- but straight binary search takes that many
comparisons no matter where A[0] belongs.
Straight binary search thus loses to galloping unless the run is quite
long, and we simply can't guess whether it is in advance.

If data is random and runs have the same length, A[0] belongs at B[0] half
the time, at B[1] a quarter of the time, and so on:  a consecutive winning
sub-run in B of length k occurs with probability 1/2**(k+1).  So long
winning sub-runs are extremely unlikely in random data, and guessing that a
winning sub-run is going to be long is a dangerous game.

OTOH, if data is lopsided or lumpy or contains many duplicates, long
stretches of winning sub-runs are very likely, and cutting the number of
comparisons needed to find one from O(B) to O(log B) is a huge win.

Galloping compromises by getting out fast if there isn't a long winning
sub-run, yet finding such very efficiently when they exist.

I first learned about the galloping strategy in a related context; see:

    "Adaptive Set Intersections, Unions, and Differences" (2000)
    Erik D. Demaine, Alejandro López-Ortiz, J. Ian Munro

and its followup(s).  An earlier paper called the same strategy
"exponential search":

    "Optimistic Sorting and Information Theoretic Complexity"
    Peter McIlroy
    SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms), pp
    467-474, Austin, Texas, 25-27 January 1993.

and it probably dates back to an earlier paper by Bentley and Yao.  The
McIlroy paper in particular has good analysis of a mergesort that's
probably strongly related to this one in its galloping strategy.


Galloping with a Broken Leg
---------------------------
So why don't we always gallop?  Because it can lose, on two counts:

1. While we're willing to endure small per-run overheads, per-comparison
   overheads are a different story.  Calling Yet Another Function per
   comparison is expensive, and gallop_left() and gallop_right() are
   too long-winded for sane inlining.

2. Ignoring function-call overhead, galloping can-- alas --require more
   comparisons than linear one-at-a-time search, depending on the data.

#2 requires details.  If A[0] belongs before B[0], galloping requires 1
compare to determine that, same as linear search, except it costs more
to call the gallop function.  If A[0] belongs right before B[1], galloping
requires 2 compares, again same as linear search.  On the third compare,
galloping checks A[0] against B[3], and if it's <=, requires one more
compare to determine whether A[0] belongs at B[2] or B[3].  That's a total
of 4 compares, but if A[0] does belong at B[2], linear search would have
discovered that in only 3 compares, and that's a huge loss!  Really.  It's
an increase of 33% in the number of compares needed, and comparisons are
expensive in Python.

index in B where    # compares linear  # gallop  # binary  gallop
A[0] belongs        search needs       compares  compares  total
----------------    -----------------  --------  --------  ------
               0                    1         1         0       1

               1                    2         2         0       2

               2                    3         3         1       4
               3                    4         3         1       4

               4                    5         4         2       6
               5                    6         4         2       6
               6                    7         4         2       6
               7                    8         4         2       6

               8                    9         5         3       8
               9                   10         5         3       8
              10                   11         5         3       8
              11                   12         5         3       8
                                        ...

In general, if A[0] belongs at B[i], linear search requires i+1 comparisons
to determine that, and galloping a total of 2*floor(lg(i))+2 comparisons
(for i >= 1).  The advantage of galloping is unbounded as i grows, but it
doesn't win at all until i=6.  Before then, it loses twice (at i=2 and i=4),
and ties at the other values.  At and after i=6, galloping always wins.

We can't guess in advance when it's going to win, though, so we do one pair
at a time until the evidence seems strong that galloping may pay.  MIN_GALLOP
is 8 as I type this, and that's pretty strong evidence.  However, if the data
is random, it simply will trigger galloping mode purely by luck every now
and again, and it's quite likely to hit one of the losing cases next.  8
favors protecting against a slowdown on random data at the expense of giving
up small wins on lightly clustered data, and tiny marginal wins on highly
clustered data (they win huge anyway, and if you're getting a factor of
10 speedup, another percent just isn't worth fighting for).
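
The table above can be reproduced with a small experiment.  The sketch below
assumes B holds distinct even integers and the key is an odd integer (or
-1), so "A[0] belongs at B[i]" is unambiguous.  gallop_counts and
linear_compares are names invented for this illustration, not functions from
the implementation.

```python
def linear_compares(key, b):
    """Compares used by one-at-a-time search to find where key belongs."""
    count = 0
    for x in b:
        count += 1
        if key <= x:
            break
    return count


def gallop_counts(key, b):
    """Return (index, gallop_compares, binary_compares) for locating key
    in sorted b by probing b[0], b[1], b[3], b[7], ... and then doing a
    binary search inside the bracket that the probes establish."""
    n = len(b)
    gallops = 0
    last, ofs = -1, 0          # b[last] < key; b[-1] is treated as -infinity
    while ofs < n:
        gallops += 1
        if key <= b[ofs]:
            break
        last = ofs
        ofs = 2 * ofs + 1      # 0, 1, 3, 7, 15, ...
    else:
        ofs = n                # ran off the end: key may belong after b[n-1]
    lo, hi = last + 1, ofs     # key belongs at some index in lo..hi
    binary = 0
    while lo < hi:
        mid = (lo + hi) // 2
        binary += 1
        if b[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo, gallops, binary


if __name__ == "__main__":
    b = list(range(0, 64, 2))          # 0, 2, 4, ..., 62
    for i in range(12):
        key = 2 * i - 1                # belongs at index i
        idx, g, s = gallop_counts(key, b)
        print(idx, linear_compares(key, b), g, s, g + s)
```

Each printed row gives the index, the linear-search compares, the gallop
compares, the binary-search compares and the galloping total, in the same
column order as the table.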


Galloping Complication
----------------------
The description above was for merge_lo.  merge_hi has to merge "from the
other end", and really needs to gallop starting at the last element in a run
instead of the first.  Galloping from the first still works, but does more
comparisons than it should (this is significant -- I timed it both ways).
For this reason, the gallop_left() and gallop_right() functions have a
"hint" argument, which is the index at which galloping should begin.  So
galloping can actually start at any index, and proceed at offsets of 1, 3,
7, 15, ... or -1, -3, -7, -15, ... from the starting index.

In the code as I type it's always called with either 0 or n-1 (where n is
the # of elements in a run).  It's tempting to try to do something fancier,
melding galloping with some form of interpolation search; for example, if
we're merging a run of length 1 with a run of length 10000, index 5000 is
probably a better guess at the final result than either 0 or 9999.  But
it's unclear how to generalize that intuition usefully, and merging of
wildly unbalanced runs already enjoys excellent performance.
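
A minimal sketch of galloping from an arbitrary hint follows.  It models the
idea of the "hint" argument rather than the C gallop_left() itself:  the
signature gallop_left_hinted(key, b, hint) is an assumption, and the final
binary search is delegated to bisect.bisect_left from the standard library.

```python
from bisect import bisect_left


def gallop_left_hinted(key, b, hint):
    """Return the leftmost index i with b[i-1] < key <= b[i] in sorted,
    non-empty b, galloping outward from index hint (0 <= hint < len(b))."""
    n = len(b)
    last, ofs = 0, 1
    if b[hint] < key:
        # Gallop right at offsets 1, 3, 7, ... until
        # b[hint + last] < key <= b[hint + ofs].
        maxofs = n - hint
        while ofs < maxofs and b[hint + ofs] < key:
            last = ofs
            ofs = 2 * ofs + 1
        ofs = min(ofs, maxofs)
        lo, hi = hint + last + 1, hint + ofs
    else:
        # Gallop left at offsets -1, -3, -7, ... until
        # b[hint - ofs] < key <= b[hint - last].
        maxofs = hint + 1
        while ofs < maxofs and not (b[hint - ofs] < key):
            last = ofs
            ofs = 2 * ofs + 1
        ofs = min(ofs, maxofs)
        lo, hi = hint - ofs + 1, hint - last
    # The answer lies in lo..hi; finish with an ordinary binary search.
    return bisect_left(b, key, lo, hi)


if __name__ == "__main__":
    b = [1, 3, 3, 3, 5, 8, 8, 13, 21, 21, 34]
    print(gallop_left_hinted(8, b, 0), gallop_left_hinted(8, b, len(b) - 1))
```

Whatever the hint, the result agrees with a plain bisect_left over the whole
run; the hint only changes how many comparisons it takes to get there.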


Comparing Average # of Compares on Random Arrays
------------------------------------------------
Here list.sort() is samplesort, and list.msort() this sort:

"""
import random
from time import clock as now

def fill(n):
    from random import random
    return [random() for i in xrange(n)]

def mycmp(x, y):
    global ncmp
    ncmp += 1
    return cmp(x, y)

def timeit(values, method):
    global ncmp
    X = values[:]
    bound = getattr(X, method)
    ncmp = 0
    t1 = now()
    bound(mycmp)
    t2 = now()
    return t2-t1, ncmp

format = "%5s %9.2f %11d"
f2 = "%5s %9.2f %11.2f"

def drive():
    count = sst = sscmp = mst = mscmp = nelts = 0
    while True:
        n = random.randrange(100000)
        nelts += n
        x = fill(n)

        t, c = timeit(x, 'sort')
        sst += t
        sscmp += c

        t, c = timeit(x, 'msort')
        mst += t
        mscmp += c

        count += 1
        if count % 10:
            continue

        print "count", count, "nelts", nelts
        print format % ("sort", sst, sscmp)
        print format % ("msort", mst, mscmp)
        print f2 % ("", (sst-mst)*1e2/mst, (sscmp-mscmp)*1e2/mscmp)

drive()
"""

I ran this on Windows and kept using the computer lightly while it was
running.  time.clock() is wall-clock time on Windows, with better than
microsecond resolution.  samplesort started with a 1.52% #-of-comparisons
disadvantage, fell quickly to 1.48%, and then fluctuated within that small
range.  Here's the last chunk of output before I killed the job:

count 2630 nelts 130906543
 sort   6110.80  1937887573
msort   6002.78  1909389381
           1.80        1.49

We've done nearly 2 billion comparisons apiece at Python speed there, and
that's enough <wink>.

For random arrays of size 2 (yes, there are only 2 interesting ones),
samplesort has a 50%(!) comparison disadvantage.  This is a consequence of
samplesort special-casing at most one ascending run at the start, then
falling back to the general case if it doesn't find an ascending run
immediately.  The consequence is that it ends up using two compares to sort
[2, 1].  Gratifyingly, timsort doesn't do any special-casing, so had to be
taught how to deal with mixtures of ascending and descending runs
efficiently in all cases.
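
For readers without an interpreter that has both list.sort() and
list.msort(), a rough way to repeat the comparison-counting part of the
experiment is sketched below.  It assumes the built-in sorted() descends
from the algorithm described in this file, which may not hold on every
Python implementation.  Counted, count_compares and lg_factorial are names
invented for this sketch, and it measures only one sort, so it cannot
reproduce the samplesort side of the figures above.

```python
import math
import random


class Counted:
    """Wraps a value and counts every < comparison made between wrappers."""
    ncmp = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.ncmp += 1
        return self.value < other.value


def count_compares(values):
    """Return how many < comparisons sorted() performs on values."""
    wrapped = [Counted(v) for v in values]
    Counted.ncmp = 0
    sorted(wrapped)
    return Counted.ncmp


def lg_factorial(n):
    """lg(n!), computed via the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)


if __name__ == "__main__":
    rng = random.Random(12345)     # fixed seed so runs are repeatable
    n = 20000
    data = [rng.random() for _ in range(n)]
    print("lg(n!)      ", round(lg_factorial(n)))
    print("random      ", count_compares(data))
    print("presorted   ", count_compares(sorted(data)))
```

On random input the count should land a little above lg(n!), and on
already-sorted input close to n-1, if the assumption about sorted() holds.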

Java算法之TimSort 持续输出... #Java 算法算法 java 排序算法
TimSort简介TimSort是一种高效的排序算法，由TimPeters于2002年设计，主要特点是结合了归并排序（MergeSort）和插入排序（InsertionSort）的优点。这种算法在很多编程语言的默认排序函数中得到应用，如Python的sort()和Java的Arrays.sort()。算法原理TimSort的工作原理如下：分解：将待排序数组分解为小的有序序列，每个序列长度为minr
java timsort_简易版的TimSort排序算法真实故事计划 java timsort
欢迎探讨，如有错误敬请指正1.简易版本TimSort排序算法原理与实现TimSort排序算法是Python和Java针对对象数组的默认排序算法。TimSort排序算法的本质是归并排序算法，只是在归并排序算法上进行了大量的优化。对于日常生活中我们需要排序的数据通常不是完全随机的，而是部分有序的，或者部分逆序的，所以TimSort充分利用已有序的部分进行归并排序。现在我们提供一个简易版本TimSort
timsort java_Java TimSort算法源码笔记汪汪汪汪妄想症 timsort java
本来准备看Java容器源码的。但是看到一开始发现Arrays这个类我不是很熟，就顺便把Arrays这个类给看了。Arrays类没有什么架构与难点，但Arrays涉及到的两个排序算法似乎很有意思。那顺便把TimSort算法和双指针快速排序也研究一下吧。首先强调一下，这是个稳定的排序算法看过代码之后觉得这个算法没有想象的那么难。逻辑很清晰，整个算法最大的特点就是充分利用数组中已经存在顺序。在归并的过程
java sort 面试题目 youyouxiong 排序算法算法
Java排序是面试中经常出现的主题，因为它不仅涉及Java集合框架中的排序方法，还涉及到基本的排序算法和性能优化。以下是一些关于Java排序的面试题目：解释Java中的Collections.sort()方法是如何工作的？Collections.sort()方法用于对List进行排序。它使用了TimSort算法，这是一种基于合并排序和插入排序的混合体，旨在提供最佳的性能。Java中的Arrays.
排序算法（4）漂流小王子
姗姗来迟的排序算法的第四篇，本介绍归并排序算法，是不是有人会问这样的问题，现在书本上学习到的排序算法都太经典了，在实际生产环境中基本上不会直接拿来使用，如果你的上司让你实现一个归并或者快排在生成环境中使用，那他一定是疯了，基于此，我介绍一种在归并排序算法基础上改进而来的Timsort算法，后者是在实际排序中经常用到的排序算法，与之详情，请往下看。归并排序归并排序的核心思想就是，将一个排序数组不断的
java面试题及答案2020最新版牛课科技
java面试题及答案2020最新版java基础以及多个“比较”1.Collections.sort排序内部原理在Java6中Arrays.sort()和Collections.sort()使用的是MergeSort，而在Java7中，内部实现换成了TimSort，其对对象间比较的实现要求更加严格2.hashMap原理，java8做的改变从结构实现来讲，HashMap是数组+链表+红黑树（JDK1.
Timsort：最快排序算法极道Jdon javascript reactjs
Timsort（泰姆排序）是一种混合排序算法，结合了合并排序（MergeSort）和插入排序（InsertionSort）的特性。它由TimPeters在2002年为Python的排序算法而设计，并在Python2.3版本中首次实现。TimSort是Python的sorted()和list.sort()函数使用的默认排序算法。自从该算法被发明以来，它已被用作Python、Java、Android平
Java自定义排序异常：Comparison method violates its general contract 啥也不知道，啥也不敢说 java java 开发语言后端
java.lang.IllegalArgumentException:Comparisonmethodviolatesitsgeneralcontract!atline781,java.base/java.util.TimSort.mergeLoatline518,java.base/java.util.TimSort.mergeAtatline448,java.base/java.util.Ti
最快的排序算法TimSort还能更快吗 pro_or_check 喜欢幻想的我算法
关于TimSort排序算法，请看这篇：另一位博主的博客本文主要讨论让TimSort更快的方法。已经产生了许多run，它们的长度是：46257用类似于霍夫曼编码的方法，找出最小的两项，相加。这里是42，他们俩相加得6，现在的数据是：6657继续选最小的两个相加，是65，得到6117继续，1311最后，24解释一下，将长度为4和6的两个run，进行归并排序，需要的时间约是4+6。采用霍夫曼编码的方式，
Python sort原理 wq_0708 Python 排序算法算法
引言sort内部实现：Timesort最坏时间复杂度：O(nlogn)O(nlogn)O(nlogn)空间复杂度：O(n)O(n)O(n)内部实现原理的回答pythonsort函数采用的排序算法_知乎：其中一个回答提到了python中的sorted排序内部实现是timsort，并没有说sort。python的sorted排序分析_Github：同样只提到了python中的sorted排序内部实现是
Exception in thread “main“ java.lang.IllegalArgumentException:解决方案猛浩异常记录及解决方法 Java java 开发语言后端
昨天遇到一个很奇怪的异常，异常信息如下：Exceptioninthread"main"java.lang.IllegalArgumentException:Comparisonmethodviolatesitsgeneralcontract!atjava.util.TimSort.mergeHi(TimSort.java:899)atjava.util.TimSort.mergeAt(TimSor
2021-01-14：timsort是什么，如何用代码实现？福大大架构师每日一题
福哥答案2021-01-14：答案来自此链接：介绍：timsort是一种混合、稳定高效的排序算法，源自合并排序和插入排序，旨在很好地处理多种真实数据。它由TimPeters于2002年实施使用在Python编程语言中。该算法查找已经排序的数据的子序列，并使用该知识更有效地对其余部分进行排序。这是通过将已识别的子序列（称为运行）与现有运行合并直到满足某些条件来完成的。从版本2.3开始，Timsort
面试：聊一聊 Java 数组默认的排序算法，我懵了 wadfdhsajd 框架后端 java java 排序算法算法
背景之前一直没关注过Java底层排序的算法，才仔细看了下Timsort。Timsort是一个混合、稳定的排序算法，简单来说就是归并排序和二分插入排序算法的混合体，号称世界上最好的排序算法。它由TimPeters在2002年提出并实现，一直是Python的标准排序算法。Java在1.7后增加了TimsortAPI，从Java中的Arrays.sort可以看出它是默认的排序算法，主要用于非原始类型数组
【Java】Java中对List进行排序 ⁢Easonhe java list 开发语言
探讨几种Java对List进行排序的方法。使用Collections.sort()方法Java中的Collections.sort()方法是对List进行排序的最常用方法。它使用TimSort算法（是一种稳定的，基于合并的排序算法，是插入排序和归并排序的混合体），具有O(nlogn)的时间复杂度。importjava.util.*;publicclassMain{publicstaticvoidm
最快的排序算法是什么 fanyamin mozilla 快速排序 regex erp wap
最快的排序算法是什么，很多人的第一反应是快排，感觉QuickSort当然应该最快了，其实并非如此，快排是不稳定的，最坏情况下，快排序并不是最优，Java7中引入的TimSort就是一个结合了插入排序和归并排序的高效算法.Timsort最早是TimPeters于2001年为Python写的排序算法。自从发明该算法以来，它已被用作Python，Java，Android平台和GNUOctave中的默认排
Comparator 之于排序 nightkidjj
java里面常用的排序接口时Arrays.sort(T[],Comparator)接口，该方法在java7及android上采用的是TimSort,一个号称比快排更快，时间复杂度介于o(n)到o(nlogn)之间。排序算法一个很重要的方面就是排序稳定性：相等元素在排序之后仍然要保持排序前的顺序。TimSort是一个稳定的算法，但这依赖与Comparator的写法。先看下Comparator的声明:
TimSort算法（JDK）晓鑫_
算法介绍JDK1.8中，对于列表的排序，java.util.List中提供了sort方法，调用的Arrays.sort(T[],Comparator)，Arrays提供的对Object的一种排序方法（这里用的是泛型T，还有Object[]对应的排序方法），在该方法中可以看到使用的是TimSort类的静态方法对数组进行排序，TimSort类的内容就是TimSort算法的实现。TimSort是一种混合
【开发经验】java list.sort的坑叁滴水 java开发 java jvm sort 排序
异常信息Format:Comparisonmethodviolatesitsgeneralcontract!Params:nullStackTrace:java.lang.IllegalArgumentException:Comparisonmethodviolatesitsgeneralcontract!atjava.util.TimSort.mergeHi(TimSort.java:899)a
世界上最快的排序算法——Timsort xlj3
转：世界上最快的排序算法——Timsort前言经过60多年的发展，科学家和工程师们发明了很多排序算法，有基本的插入算法，也有相对高效的归并排序算法等，他们各有各的特点，比如归并排序性能稳定、堆排序空间消耗小等等。但是这些算法也有自己的局限性比如快速排序最坏情况和冒泡算法一样，归并排序需要消耗的空间最多，插入排序平均情况的时间复杂度太高。在实际工程应用中，我们希望得到一款综合性能最好的排序算法，能够
TimSort——最快的排序算法 JarodYv 硬核Python 排序算法算法数据结构 python
TimSort——最快的排序算法排序算法是每个程序员绕不开的课题，无论是大学课程还是日常工作，都离不开排序算法。常见的排序算法有：冒泡排序、选择排序、插入排序、希尔排序、归并排序、快速排序、堆排序、基数排序等。下面是这些算法性能的概览：算法平均时间复杂度最好情况最差情况空间复杂度排序方式稳定性冒泡排序O(n2)O(n^2)O(n2)O(n)O(n)O(n)O(n2)O(n^2)O(n2)O(1)O
历年阿里面试题汇总深度思考中
Volatitle的特征？Volatitle的内存语义？Volatitle的重排序？内存屏障/内存栅栏？happens-before原则？手机扫二维码登录是怎么实现的？Java线程有哪些状态，这些状态之间是如何转化的？List接口、Set接口和Map接口的区别Cookie和Session的区别？Java中的equals和hashCode方法详解?Java中CAS算法?TimSort原理?compa
世界上最快的排序算法-Timsort Hello_java大师排序算法算法数据结构 sql spring cloud
Timsort是一个混合、稳定的排序算法，简单来说就是归并排序和二分插入排序算法的混合体，号称世界上最好的排序算法。Timsort一直是Python的标准排序算法。JavaSE7后添加了TimsortAPI，我们从Arrays.sort可以看出它已经是非原始类型数组的默认排序算法了。所以不管是进阶编程学习还是面试，理解Timsort是比较重要。//Listsort()defaultvoidsort
java.lang.IllegalArgumentException: Comparison method violates its general contract! 钦_79f7
问题&解决上线日志功能后，发现多了很多如下的异常信息：java.lang.IllegalArgumentException:Comparisonmethodviolatesitsgeneralcontract!atjava.util.TimSort.mergeHi(TimSort.java:899)atjava.util.TimSort.mergeAt(TimSort.java:516)atjav
Fixing java.lang.IllegalArgumentException: Comparison method violates its general contract 晖仔Milo
Today, using the Collections.sort method in a project raised an error: Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract! at java.util.TimSort.mergeHi(TimSort.java:899) at java.util.TimSort.merg
Comparison method violates its general contract! 骑着乌龟去看海
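1. Background: Yesterday, while using one of my company's platforms, I unexpectedly hit a problem: Comparison method violates its general contract! I had never seen this exception before, so I searched for it online and found that it comes from TimSort sorting; here is a brief record. 2. Reproduction + test code. JDK version: master@jiangmufeng ~$ java -version java version "1.8.0_191" Java(TM) SE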
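2019 roundup of the most common Java architecture interview questions: JVM + concurrency + locks + databases + Spring Java微服务
Java basics and several kinds of "comparison". 1. How Collections.sort works internally: in Java 6, Arrays.sort() (for object arrays) and Collections.sort() used MergeSort; in Java 7 the internal implementation was switched to TimSort, which places stricter requirements on how comparison between objects is implemented. 2. How HashMap works and what Java 8 changed: structurally, HashMap is implemented as array + linked list + red-black tree (the red-black tree part was added in JDK 1.8). Has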
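[Interview essentials] A big roundup of common Java interview questions, so good even Jack Ma would give it a like fad2aa506f5e
I. Java basics. 1. How Arrays.sort and Collections.sort are implemented. Answer: Collections.sort calls Arrays.sort underneath, and for objects both are implemented with TimSort. The TimSort algorithm finds subsequences of the data that are already sorted, sorts the remaining parts, and then merges everything. 2. Differences between foreach and while (after compilation); thread pool types, their differences and use cases. 3. Analyze how thread pools are implemented and thread sched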
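[博学谷 study log] In-depth summary, shared with care | AI: personal study notes on Python basics, list sorting 鹏晓星 study-notes python learning programming-languages
Contents: Preface; Overview; list.sort(); syntax; return value; examples; no arguments; the key parameter; the reverse parameter; sorted(); syntax; return value; examples; no arguments; the key parameter; the reverse parameter; operator.itemgetter; overview; examples; differences between list.sort and sorted; how sorted works: the Timsort algorithm; extension: how list works; data structure. Preface: After a week of study I have gained some understanding of Python basics. While studying Python's list I learned about list sorting, and so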
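Example code implementing Timsort, the world's fastest sorting algorithm, in Java
Contents: Background; prerequisites; exponential search; binary insertion sort; merge sort; Timsort execution process; ascending runs; several key thresholds; merging runs; merge conditions; merge memory overhead; merge optimizations. Background: Timsort is a hybrid, stable sorting algorithm; put simply, it is a mix of merge sort and binary insertion sort, and it is billed as the best sorting algorithm in the world. Timsort has long been Python's standard sorting algorithm. The Timsort API was added in Java SE 7, and Arrays.sort shows that it is now the default sor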
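The world's fastest sorting algorithm: Timsort java python sorting-algorithms
Background: Timsort is a hybrid, stable sorting algorithm; put simply, it is a mix of merge sort and binary insertion sort, and it is billed as the best sorting algorithm in the world. Timsort has long been Python's standard sorting algorithm. The Timsort API was added in Java SE 7, and Arrays.sort shows that it is now the default sorting algorithm for arrays of non-primitive types. So whether for advanced programming study or for interviews, understanding Timsort matters. // List sort() default void so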
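Several of the entries above describe Timsort as stable and note that this depends on the comparison being well behaved. A minimal Python sketch of what stability means in practice follows. The records and the helper name compare_scores are made up for illustration and do not come from any of the listed articles; the only library facts relied on are that sorted() is documented as stable and that functools.cmp_to_key adapts a three-way comparison function.

```python
from functools import cmp_to_key

# Hypothetical (name, score) records, invented for this illustration.
records = [("alice", 90), ("bob", 85), ("carol", 90), ("dave", 85)]

# sorted() is guaranteed stable: records with equal scores keep
# their original relative order (bob before dave, alice before carol).
by_score = sorted(records, key=lambda r: r[1])


def compare_scores(a, b):
    """Three-way comparison in the style of a Java Comparator.

    Returns a negative number, zero, or a positive number. Returning zero
    for equal scores is what lets the sort treat them as ties.
    """
    return (a[1] > b[1]) - (a[1] < b[1])


# The same ordering, expressed through a comparison function.
by_cmp = sorted(records, key=cmp_to_key(compare_scores))

print(by_score)
print(by_cmp == by_score)
```

A comparison function that never returns zero for equal elements (for example one that returns 1 or -1 only) gives inconsistent answers when its arguments are swapped. In Java, that kind of inconsistency is a common cause of the "Comparison method violates its general contract!" exception quoted in the snippets above.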
