Translated from the top-voted answer:
https://leetcode.com/problems/median-of-two-sorted-arrays/discuss/2481/Share-my-O(log(min(mn))-solution-with-explanation
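To solve this problem, we first need to understand what a median is: it divides a set into two parts of equal length such that every element in the second part is greater than or equal to every element in the first part.
First, cut A at an arbitrary position i, splitting it into two parts.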
left_A | right_A
A[0], A[1], ..., A[i-1] | A[i], A[i+1], ..., A[m-1]
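Since A has m elements, there are m+1 possible cut positions (i = 0 ~ m). After the cut, len(left_A) = i and len(right_A) = m - i. Note that left_A is empty when i = 0 and right_A is empty when i = m.
In the same way, cut B at a position j: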
left_B | right_B
B[0], B[1], ..., B[j-1] | B[j], B[j+1], ..., B[n-1]
Put left_A and left_B into one set, and right_A and right_B into another:
left_part | right_part
A[0], A[1], ..., A[i-1] | A[i], A[i+1], ..., A[m-1]
B[0], B[1], ..., B[j-1] | B[j], B[j+1], ..., B[n-1]
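As a small illustration of the two cuts, the sketch below builds left_part and right_part with list slicing. The helper name `split_parts` and the sample arrays are made up for this example and are not part of the original answer.

```python
def split_parts(A, B, i, j):
    """Cut A at position i and B at position j, then pool the halves."""
    left_part = A[:i] + B[:j]    # A[0..i-1] together with B[0..j-1]
    right_part = A[i:] + B[j:]   # A[i..m-1] together with B[j..n-1]
    return left_part, right_part


A = [1, 3, 8]
B = [2, 4, 9, 10]
left, right = split_parts(A, B, 2, 2)
print(left, right)  # [1, 3, 2, 4] [8, 9, 10]
```

With i = 2 and j = 2, every element on the left is no larger than every element on the right, which is the situation the rest of the post searches for.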
If we can ensure:
1) len(left_part) == len(right_part)
2) max(left_part) <= min(right_part)
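then we have divided all the elements of {A, B} into two parts of equal length, and every element of one part is greater than or equal to every element of the other. In that case, median = (max(left_part) + min(right_part))/2.
To ensure these two conditions, we just need to ensure: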
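A minimal sketch of the two conditions and the resulting median, assuming both parts are non-empty lists of numbers. The function names `is_valid_partition` and `median_from_partition` are hypothetical helpers introduced only for this illustration.

```python
def is_valid_partition(left_part, right_part):
    """Check condition 1 (equal length) and condition 2 (ordering)."""
    return (len(left_part) == len(right_part)
            and max(left_part) <= min(right_part))


def median_from_partition(left_part, right_part):
    """Median of the pooled elements, given a valid equal-length partition."""
    return (max(left_part) + min(right_part)) / 2


print(is_valid_partition([1, 2], [3, 4]))     # True
print(median_from_partition([1, 2], [3, 4]))  # 2.5
print(is_valid_partition([1, 3], [2, 4]))     # False
```

The last call fails condition 2, because 3 on the left is larger than 2 on the right.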
(1) i + j == m - i + n - j (or: m - i + n - j + 1)
if n >= m, we just need to set: i = 0 ~ m, j = (m + n + 1)/2 - i
(2) B[j-1] <= A[i] and A[i-1] <= B[j]
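ps.1 For simplicity, assume for now that A[i-1], B[j-1], A[i] and B[j] are always valid, even when i == 0, i == m, j == 0 or j == n. How to handle these edge cases is discussed later.
ps.2 Why n >= m? Because we need j to be non-negative. Since 0 <= i <= m and j = (m + n + 1)/2 - i, j could become negative if n < m.
So what we need to do is: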
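To make the role of n >= m concrete, the sketch below computes j for every i. The helper name `partner_cut` is an assumption of this example; it uses Python 3 floor division (`//`) for the integer division written as `/` in the text.

```python
def partner_cut(i, m, n):
    """Cut position j in B that balances a cut at position i in A."""
    return (m + n + 1) // 2 - i


# With m <= n, every i in [0, m] gives a j inside [0, n].
print([partner_cut(i, 3, 4) for i in range(3 + 1)])  # [4, 3, 2, 1]

# With m > n, a large i pushes j below zero.
print(partner_cut(4, 4, 1))  # -1
```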
Searching i in [0, m], to find an object `i` that:
B[j-1] <= A[i] and A[i-1] <= B[j], ( where j = (m + n + 1)/2 - i )
We can then use binary search to find a suitable i:
<1> Set imin = 0, imax = m, then start searching in [imin, imax]
<2> Set i = (imin + imax)/2, j = (m + n + 1)/2 - i
<3> Now we have len(left_part) == len(right_part). There are only 3 situations
    that we may encounter:
    <a> B[j-1] <= A[i] and A[i-1] <= B[j]
        Means we have found the object `i`, so stop searching.
    <b> B[j-1] > A[i]
        Means A[i] is too small. We must `adjust` i to get `B[j-1] <= A[i]`.
        Can we `increase` i?
            Yes. Because when i is increased, j will be decreased.
            So B[j-1] does not increase and A[i] does not decrease, and
            `B[j-1] <= A[i]` may be satisfied.
        Can we `decrease` i?
            `No!` Because when i is decreased, j will be increased.
            So B[j-1] does not decrease and A[i] does not increase, and
            `B[j-1] <= A[i]` will never be satisfied.
        So we must `increase` i. That is, we must adjust the searching range to
        [i+1, imax]. So, set imin = i+1, and goto <2>.
    <c> A[i-1] > B[j]
        Means A[i-1] is too big. We must `decrease` i to get `A[i-1] <= B[j]`.
        That is, we must adjust the searching range to [imin, i-1].
        So, set imax = i-1, and goto <2>.
When such an i is found, the median is:
max(A[i-1], B[j-1]) (when m + n is odd)
or (max(A[i-1], B[j-1]) + min(A[i], B[j]))/2 (when m + n is even)
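One way to try the search exactly as described so far, before any edge-case handling, is to pretend that out-of-range entries are -infinity or +infinity. This sentinel trick is a substitute technique chosen for this sketch, not what the original answer does; the names `median_with_sentinels` and `at` are hypothetical, and the inputs are assumed to be sorted lists of numbers.

```python
import math


def median_with_sentinels(A, B):
    """Binary search on the cut position i, with infinite sentinels."""
    if len(A) > len(B):
        A, B = B, A              # keep A as the shorter array so j >= 0
    m, n = len(A), len(B)
    if n == 0:
        raise ValueError("both inputs are empty")
    half_len = (m + n + 1) // 2

    def at(arr, k):
        # Out-of-range indices behave like -inf (left) or +inf (right).
        if k < 0:
            return -math.inf
        if k >= len(arr):
            return math.inf
        return arr[k]

    imin, imax = 0, m
    while imin <= imax:
        i = (imin + imax) // 2
        j = half_len - i
        if at(B, j - 1) > at(A, i):
            imin = i + 1         # i is too small
        elif at(A, i - 1) > at(B, j):
            imax = i - 1         # i is too big
        else:
            max_of_left = max(at(A, i - 1), at(B, j - 1))
            if (m + n) % 2 == 1:
                return max_of_left
            min_of_right = min(at(A, i), at(B, j))
            return (max_of_left + min_of_right) / 2
    raise ValueError("inputs must be sorted")


print(median_with_sentinels([1, 3], [2]))     # 2
print(median_with_sentinels([1, 2], [3, 4]))  # 2.5
```

The sentinels keep the loop body identical to steps <1> to <3>, at the cost of a helper call per comparison.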
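Now consider the edge cases i == 0, i == m, j == 0 or j == n, where A[i-1], B[j-1], A[i] or B[j] may not exist.
What we need is to ensure max(left_part) <= min(right_part). If one of these values does not exist, we simply skip the condition that involves it; for example, if i == 0 then A[i-1] does not exist, so there is no need to check A[i-1] <= B[j]. What we need to do now is: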
Searching i in [0, m], to find an object `i` that:
(j == 0 or i == m or B[j-1] <= A[i]) and
(i == 0 or j == n or A[i-1] <= B[j])
where j = (m + n + 1)/2 - i
In each iteration of the loop, we will encounter only one of three situations:
<a> (j == 0 or i == m or B[j-1] <= A[i]) and
    (i == 0 or j == n or A[i-1] <= B[j])
    Means i is perfect, we can stop searching.
<b> j > 0 and i < m and B[j - 1] > A[i]
    Means i is too small, we must increase it.
<c> i > 0 and j < n and A[i - 1] > B[j]
    Means i is too big, we must decrease it.
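The three situations can be sketched as a small classifier. The function name `classify_cut` and its string return values are assumptions made for this illustration; it expects sorted lists with len(A) <= len(B) and 0 <= i <= len(A).

```python
def classify_cut(A, B, i):
    """Classify cut position i as 'too small', 'too big' or 'perfect'."""
    m, n = len(A), len(B)
    j = (m + n + 1) // 2 - i
    if j > 0 and i < m and B[j - 1] > A[i]:
        return "too small"   # largest left element of B exceeds A[i]
    if i > 0 and j < n and A[i - 1] > B[j]:
        return "too big"     # largest left element of A exceeds B[j]
    return "perfect"         # both ordering conditions hold or are vacuous


A = [1, 3, 8]
B = [2, 4, 9, 10]
print([classify_cut(A, B, i) for i in range(len(A) + 1)])
# ['too small', 'too small', 'perfect', 'too big']
```

The labels change monotonically from "too small" to "too big" as i grows, which is what lets binary search home in on the perfect cut.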
Final implementation:
def median(A, B):
    m, n = len(A), len(B)
    # Make A the shorter array so that j never becomes negative.
    if m > n:
        A, B, m, n = B, A, n, m
    if n == 0:
        raise ValueError("both arrays are empty")

    # Use floor division (//) so that i and j stay integers in Python 3.
    imin, imax, half_len = 0, m, (m + n + 1) // 2
    while imin <= imax:
        i = (imin + imax) // 2
        j = half_len - i
        # Since m <= n, i < m implies j > 0 and i > 0 implies j < n,
        # so the checks on j can be omitted here.
        if i < m and B[j-1] > A[i]:
            # i is too small, must increase it
            imin = i + 1
        elif i > 0 and A[i-1] > B[j]:
            # i is too big, must decrease it
            imax = i - 1
        else:
            # i is perfect
            if i == 0: max_of_left = B[j-1]
            elif j == 0: max_of_left = A[i-1]
            else: max_of_left = max(A[i-1], B[j-1])

            if (m + n) % 2 == 1:
                return max_of_left

            if i == m: min_of_right = B[j]
            elif j == n: min_of_right = A[i]
            else: min_of_right = min(A[i], B[j])

            return (max_of_left + min_of_right) / 2.0
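A simple way to gain confidence in any implementation of this algorithm is to compare it against a brute-force reference on random sorted inputs. The sketch below uses `statistics.median` from the Python standard library as the reference; the names `reference_median` and `check_against_reference`, the value ranges and the trial count are all arbitrary choices for this example.

```python
import random
import statistics


def reference_median(A, B):
    """Brute force: pool both lists and let the standard library sort them."""
    return statistics.median(A + B)


def check_against_reference(candidate, trials=200, seed=0):
    """Return the first counterexample found, or None if all trials agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = sorted(rng.randint(-20, 20) for _ in range(rng.randint(0, 6)))
        B = sorted(rng.randint(-20, 20) for _ in range(rng.randint(1, 6)))
        expected = reference_median(A, B)
        actual = candidate(A, B)
        if actual != expected:
            return (A, B, expected, actual)
    return None


print(reference_median([1, 3], [2]))              # 2
print(check_against_reference(reference_median))  # None
```

Any two-argument function that returns the median of two sorted lists can be passed as `candidate`; a return value of None means no disagreement was found in the sampled cases, not that the candidate is proven correct.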