LeetCode Problems by Category: Binary Search



1. Basic Binary Search


1.1 A Typical Problem

35-Search Insert Position (Medium): Given a sorted array and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order. You may assume no duplicates in the array.
Here are a few examples.
[1,3,5,6], 5 → 2
[1,3,5,6], 2 → 1
[1,3,5,6], 7 → 4
[1,3,5,6], 0 → 0

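Let's start with the most classic binary search problem, which tests nothing more than the basic implementation. Because the problem is so classic, the source below is heavily commented. These comments are really assertions, which are used later to argue that the code is correct. Pay particular attention to the seemingly odd way mid is computed: why not (low+high)/2? I did LeetCode problems for a long time without noticing this, and when I saw answers written this way I assumed it was redundant, until one problem actually overflowed on me. I really had been a frog at the bottom of a well! When low and high are both very large, their sum overflows an int. You could cast both to long, do the arithmetic, and cast back, but low+(high-low)/2 is more convenient: it takes half of the difference instead, replacing the addition with a subtraction. See Google's blog post on this for details. Seen in that light, the expression looks rather elegant!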

    public int searchInsert(int[] nums, int target) {
        if (nums.length == 0) {
            return 0;
        }

        // MustBe(0,n-1)
        int low = 0, high = nums.length - 1;

        // MustBe(low,high): (1) Initialization: invariant holds
        while (low <= high) {

            // MustBe(low,high) and low <= high
            int mid = low + (high - low) / 2;

            // MustBe(low,high) and low <= mid <= high
            if (nums[mid] < target) {

                // MustBe(low,high) and nums[mid] < target <= nums[high]
                // MustBe(mid+1,high)
                low = mid + 1;

                // MustBe(low,high): (2) Preservation: invariant holds
            } else if (nums[mid] > target) {

                // MustBe(low,high) and nums[low] <= target < nums[mid]
                // MustBe(low,mid-1)
                high = mid - 1;

                // MustBe(low,high)
            } else {
                return mid;
            }
        } // (3) Termination: range shrinks, so it must terminate

        // low > high: in the last iteration mid sat on a boundary
        // 1) target < nums[mid]: mid == low, high = mid - 1, low is the insert position
        // 2) target > nums[mid]: mid == high, low = mid + 1, low is the insert position too!
        return low;
    }
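
To make the overflow concrete, here is a minimal sketch comparing the two ways of computing mid. The class and method names (MidpointDemo, naiveMid, safeMid) are made up for illustration and are not part of the solution above; the sketch assumes 0 <= low <= high, as in an array search.

```java
final class MidpointDemo {
    // Naive midpoint: low + high can exceed Integer.MAX_VALUE and wrap around.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    // Safe midpoint for 0 <= low <= high: high - low always fits in an int.
    static int safeMid(int low, int high) {
        return low + (high - low) / 2;
    }

    public static void main(String[] args) {
        int low = 1_500_000_000, high = 2_000_000_000;
        System.out.println(naiveMid(low, high)); // a negative number: the sum wrapped around
        System.out.println(safeMid(low, high));  // 1750000000
    }
}
```

With indexes this large the naive version would produce a negative index, which in an array search shows up as an ArrayIndexOutOfBoundsException.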

1.2 Proof of Correctness

I have always felt that binary search is an excellent class of problems for practicing correctness proofs, because its handling of intervals……

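Taking the basic binary search from this problem as the example, let's see why our code is correct. Later problems will not get such a laborious proof. Once this problem has taught us the approach and techniques, for the later ones we only need to grasp the key points.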

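We are searching for the position of target, so the interval formed by low and high represents the search range. The invariant is MustBe(low,high): if target is in nums at all, it must lie within [low,high]. We then iterate to shrink this interval, and the crucial thing is to keep the invariant true throughout, so that the result we end up with is guaranteed to be correct: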

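  1. Initialization: at the start, MustBe(0,nums.length-1) covers the whole array. By the definition of MustBe, target is either somewhere in the whole array or not present at all, which is certainly true. So the invariant holds initially!
  2. Preservation: this is easy for this problem. If target is in the first half, then MustBe(low,mid-1), so updating high=mid-1 keeps the invariant true. Otherwise, updating low=mid+1 does the job.
  3. Termination: the interval between low and high clearly shrinks on every iteration.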

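With these three properties, when the loop terminates we have low > high and MustBe(low,high) still true, so we know target does not exist. Personally I feel MustBe(low,high) would be more aptly named CanBe(low,high), since target may not be in the array. At the end, CanBe(low,high) together with low > high turns CanBe into MustNotBe. But perhaps CanBe sounds too weak to be an assertion? Later, in 153-Find Minimum in Rotated Sorted Array, where we look for the minimum, the name MustBe really fits!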


1.3 What the Loop Looks Like Just Before It Ends

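But we are not done with this problem, and here comes arguably the most important point: we have to return the insert position rather than just -1. That requires a careful look at the moment just before the loop stops. The loop exits only when a single update pushes low past high, which forces mid to sit on a boundary in the final iteration. If target > nums[mid] and low = mid + 1 > high, then mid = high, which with floor division means low = mid = high, and the new low is the insert position. If target < nums[mid] and high = mid - 1 < low, then mid = low (either low = high, or low = high - 1, as with [1,3] and target 0), so low is unchanged and is again the insert position. More generally, every element to the left of low is smaller than target and every element to the right of high is larger, so whatever the final iteration looks like, low is always the insert position.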
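
As a sanity check on this argument, the following minimal sketch compares the binary search against a plain linear scan for every target in a small range. The class InsertPositionCheck and the helpers linearInsert and agreeOn are hypothetical names introduced only for this check; searchInsert is the same algorithm as above with the comments stripped.

```java
final class InsertPositionCheck {
    // Same algorithm as the commented searchInsert above.
    static int searchInsert(int[] nums, int target) {
        int low = 0, high = nums.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (nums[mid] < target) {
                low = mid + 1;
            } else if (nums[mid] > target) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return low;
    }

    // Reference answer: index of the first element >= target, or nums.length.
    static int linearInsert(int[] nums, int target) {
        int i = 0;
        while (i < nums.length && nums[i] < target) {
            i++;
        }
        return i;
    }

    // True if both methods agree for every target in [from, to].
    // Assumes nums is sorted and has no duplicates, as the problem states.
    static boolean agreeOn(int[] nums, int from, int to) {
        for (int target = from; target <= to; target++) {
            if (searchInsert(nums, target) != linearInsert(nums, target)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOn(new int[]{1, 3, 5, 6}, 0, 7));
        // The case where the last iteration has low = high - 1:
        System.out.println(searchInsert(new int[]{1, 3}, 0));
    }
}
```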


1.4 Key Points

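The code for this problem is very standard and can serve as a template for the solutions to the problems below. If you are short on time, in an interview for example, you can write out the skeleton above first and then think through the key points. So what variations do binary search problems have, and which key points must we get right? Before tackling them one by one, here is a summary.

With binary search as the blueprint, this class of search problems allows quite a few variations. The most important points are: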

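  1. Initial values of low and high: mostly the array range from 0 to N-1; a few, such as 278-First Bad Version and 374-Guess Number Higher or Lower, go from 1 to N.
  2. Loop termination condition
    2.1 If the target we are looking for is guaranteed to exist, use low < high. The loop ends when low = high, and position low is the number we want. Problems of this guaranteed-to-exist kind include 69-Sqrt, 153-Find Minimum in Rotated Sorted Array, 162-Find Peak Element, 278-First Bad Version, and 374-Guess Number Higher or Lower.
    2.2 If the target may not exist, use low <= high. The loop exits when low ends up greater than high, meaning the target does not exist.
  3. Shrinking the interval
    3.1 First, how to decide the direction: the simplest case compares A[low/high] with A[mid]; a more complex one such as 33-Search in Rotated Sorted Array also has to check whether the target really lies in that interval; and 367-Valid Perfect Square compares squares.
    3.2 Second, how to decide the amount. The principle: once the direction is known, choose the update of low and high that keeps the invariant true. An ordinary search such as 35-Search Insert Position is simple: low=mid+1, high=mid-1. In 278-First Bad Version, if mid is bad then high=mid, but if mid is not bad then low=mid+1. In 69-Sqrt(x), mid squared being less than x does not rule out mid as the answer (the integer square root), e.g. 2*2<5, so low=mid; conversely, if mid squared is greater than x then mid is definitely too large, so high=mid-1. Important: high=mid is harmless, but low=mid can cause an infinite loop. When low=high-1, mid equals low, so taking the low=mid branch again leaves the search interval unchanged. By contrast, high is always greater than mid while low < high, so high=mid always shrinks the interval. The remedy: exit the loop once low=high-1 is reached, then decide between the candidates by hand.
  4. Determining the final result: read the problem carefully to see what has to be returned, whether -1, an index, or a value.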

1.5 Other Problems

278-First Bad Version (Easy): You are a product manager and currently leading a team to develop a new product. Unfortunately, the latest version of your product fails the quality check. Since each version is developed based on the previous version, all the versions after a bad version are also bad.
Suppose you have n versions [1, 2, …, n] and you want to find out the first bad one, which causes all the following ones to be bad.
You are given an API bool isBadVersion(version) which will return whether version is bad. Implement a function to find the first bad version. You should minimize the number of calls to the API.

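Hint: Yet another binary search variant. **It is no exaggeration to call binary search the most important interview algorithm, and Programming Pearls had good reason to choose it as an example. It is especially good at testing our ability to write code and prove it correct!** How should low and high move, should the loop end on low < high or on <=, and so on.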
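
A minimal sketch of one way to solve this with the low < high pattern from section 1.4. The real isBadVersion is supplied by the judge; here it is replaced by a stand-in backed by a hypothetical firstBad field, and the class name FirstBadVersionSketch is likewise invented for illustration.

```java
final class FirstBadVersionSketch {
    // Hypothetical stand-in for the judge's hidden state.
    private final int firstBad;

    FirstBadVersionSketch(int firstBad) {
        this.firstBad = firstBad;
    }

    // Stand-in for the isBadVersion API from the problem statement.
    boolean isBadVersion(int version) {
        return version >= firstBad;
    }

    int firstBadVersion(int n) {
        int low = 1, high = n;          // versions are numbered 1..n
        while (low < high) {            // a bad version is guaranteed to exist
            int mid = low + (high - low) / 2;
            if (isBadVersion(mid)) {
                high = mid;             // mid may be the first bad one, so keep it
            } else {
                low = mid + 1;          // mid is good, so the first bad one is to its right
            }
        }
        return low;                     // low == high: the first bad version
    }

    public static void main(String[] args) {
        System.out.println(new FirstBadVersionSketch(4).firstBadVersion(5)); // 4
    }
}
```

Because mid < high whenever low < high, the high = mid update always shrinks the interval, so the loop cannot stall.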

374-Guess Number Higher or Lower (Easy): We are playing the Guess Game. The game is as follows: I pick a number from 1 to n. You have to guess which number I picked. Every time you guess wrong, I’ll tell you whether the number is higher or lower. You call a pre-defined API guess(int num) which returns 3 possible results (-1, 1, or 0):
-1 : My number is lower
1 : My number is higher
0 : Congrats! You got it!
Example: n = 10, I pick 6. Return 6.

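Hint: First, since the target is guaranteed to exist, there is no need for the standard binary search check that returns -1 when low>high. Another small detail I had never noticed: mid must be computed as low plus half the difference between high and low, otherwise it overflows. None of the binary search problems I had done before had large-input cases, so I never spotted the problem. This was also a reminder to myself: after solving a problem, always read the answers, keep studying better code, find your own weaknesses, and even try for a better solution. Otherwise, no matter how many times you solve it, it does no good!!!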
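
A minimal sketch along these lines. The guess API normally comes from the judge; here it is a stand-in backed by a hypothetical picked field, and the class name GuessNumberSketch is invented for illustration.

```java
final class GuessNumberSketch {
    // Hypothetical stand-in for the number the judge picked.
    private final int picked;

    GuessNumberSketch(int picked) {
        this.picked = picked;
    }

    // Stand-in for the guess API: -1 if the picked number is lower than num,
    // 1 if it is higher, 0 if num is correct.
    int guess(int num) {
        if (picked < num) {
            return -1;
        }
        return picked > num ? 1 : 0;
    }

    int guessNumber(int n) {
        int low = 1, high = n;
        while (low < high) {                    // the answer always exists in [low, high]
            int mid = low + (high - low) / 2;   // low + high could overflow for large n
            int result = guess(mid);
            if (result == 0) {
                return mid;
            } else if (result < 0) {
                high = mid - 1;                 // picked number is lower than mid
            } else {
                low = mid + 1;                  // picked number is higher than mid
            }
        }
        return low;                             // low == high: the only candidate left
    }

    public static void main(String[] args) {
        System.out.println(new GuessNumberSketch(6).guessNumber(10)); // 6
    }
}
```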


2. Rotated Sorted Arrays


2.1 Exploiting the Sorted and Unsorted Halves

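Now the fun begins. A plain sorted array is boring; rotating it around some pivot before searching is more interesting. How would you find the minimum in a rotated sorted array? The answer: look in the unsorted part, because the minimum must be there. Only when there is no unsorted part do you look in the sorted part, although that branch is already covered by the above, since having no unsorted part amounts to mid=high. And how would you find an arbitrary number? The answer: look at the sorted part; if target falls within its range, keep searching there, otherwise search the other half. These two sentences capture the core of the next two problems. Since finding the minimum is simpler, we take finding an arbitrary number as the worked example.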

33-Search in Rotated Sorted Array (Hard): Suppose a sorted array is rotated at some pivot unknown to you beforehand. (i.e., 0 1 2 4 5 6 7 might become 4 5 6 7 0 1 2). You are given a target value to search. If found in the array return its index, otherwise return -1. You may assume no duplicate exists in the array.

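Hint: The main question is how to determine the bounds of the binary search. In a rotated array like this, after taking mid at least one half is sorted, which can be determined by comparing nums[mid] with an endpoint (the code below compares it with nums[high]). If the first half is the sorted one and target lies in [nums[low],nums[mid]], continue searching the first half; otherwise search the second half.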

    public int search2(int[] nums, int target) {
        int low = 0, high = nums.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (nums[mid] == target) {
                return mid;
            }
            if (nums[mid] < nums[high]) {
                if (nums[mid] < target && target <= nums[high]) {
                    low = mid + 1;
                } else {
                    high = mid - 1;
                }
            } else {
                if (nums[low] <= target && target < nums[mid]) {
                    high = mid - 1;
                } else {
                    low = mid + 1;
                }
            }
        }
        return -1;
    }

2.2 The Impact of Duplicates

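Now for the real difficulty. What effect do duplicates have on binary search? In short, the rule of thumb above no longer works: duplicates allow low, mid, and high to hold equal values, so we cannot tell whether the target is in the left half or the right half. Take [0,0,0,1,0]: the 1 could be "hidden" anywhere in the array. We can no longer halve the range, so in the worst case the time complexity of binary search degrades from O(logN) to O(N).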

81-Search in Rotated Sorted Array II: Follow up for “Search in Rotated Sorted Array”: What if duplicates are allowed? Would this affect the run-time complexity? How and why? Write a function to determine if a given target is in the array.

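So how do we solve it? Here is the solution, and it looks almost identical to the earlier code skeleton. But the magic is often inconspicuous: the high-- in the final else is what makes everything work. Why is it safe to discard high? First, we only use nums[mid] and nums[high] to decide which side is unsorted. Second, the final else means nums[mid]=nums[high], and at that point nums[mid] is definitely not equal to target, so neither is nums[high], and shrinking the range with high-- is perfectly safe!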

    public boolean search(int[] nums, int target) {
        int low = 0, high = nums.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (nums[mid] == target) {
                return true;
            }
            if (nums[mid] < nums[high]) {       // Second half is sorted
                if (nums[mid] < target && target <= nums[high]) {
                    low = mid + 1;
                } else {
                    high = mid - 1;
                }
            } else if (nums[mid] > nums[high]) {// First half may be sorted eg.[0,1,4,5,0] or all the same eg.[4,4,4,5,0]
                if (nums[low] <= target && target < nums[mid]) {
                    high = mid - 1;
                } else {
                    low = mid + 1;
                }
            } else {                            // nums[mid] == nums[high] and nums[mid] != target, so it's safe to shrink from the high bound
                high--;
            }
        }
        return false;
    }

2.3 Other Problems

153-Find Minimum in Rotated Sorted Array (Medium): Suppose a sorted array is rotated at some pivot unknown to you beforehand. (i.e., 0 1 2 4 5 6 7 might become 4 5 6 7 0 1 2). Find the minimum element. You may assume no duplicate exists in the array.

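Hint: The essence of binary search is to keep shrinking the interval until it pins down the position of the target, and that is also its loop invariant. For this problem the key is deciding whether the minimum is on the left or on the right. That is not hard: if the right half turns out to be sorted (checking whether the left half is sorted is useless, since the minimum could be the first element or, because of the rotation, in the right half), all of those values can be ruled out; only the midpoint and the first half can hold the minimum, so we continue searching the first half. Otherwise the right half is unsorted, which means the minimum is in it. Be careful: an update does not necessarily shrink the interval, so it is easy to fall into an infinite loop.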
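
A minimal sketch of this hint, assuming a non-empty array without duplicates as the problem states. The class name FindMinSketch is made up for illustration; the loop mirrors the pivot search used in section 3.2.

```java
final class FindMinSketch {
    static int findMin(int[] nums) {
        int low = 0, high = nums.length - 1;
        while (low < high) {                    // the minimum always exists in [low, high]
            int mid = low + (high - low) / 2;
            if (nums[mid] < nums[high]) {
                high = mid;                     // right part is sorted: the minimum is mid or left of it
            } else {
                low = mid + 1;                  // nums[mid] > nums[high]: the minimum is right of mid
            }
        }
        return nums[low];
    }

    public static void main(String[] args) {
        System.out.println(findMin(new int[]{4, 5, 6, 7, 0, 1, 2})); // 0
    }
}
```

The interval shrinks on every iteration because mid < high whenever low < high, so high = mid is a real reduction, and low = mid + 1 obviously is.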

154-Find Minimum in Rotated Sorted Array II (Hard): Follow up for “Find Minimum in Rotated Sorted Array”: What if duplicates are allowed? Would this affect the run-time complexity? How and why?

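Hint: Remember the key to problem 153? It was using whether the right half is sorted to decide how to shrink the search range. But once duplicates are allowed, that rule breaks down! For example, in [4,2,4,4,4] and [4,4,4,2,4], comparing the middle 4 with the last 4 finds them equal, and we simply cannot tell which half holds the minimum. In that case we have no choice but to search both sides, which is why the problem asks about the effect on complexity. Because some ranges need both sides searched while others need only one, recursion is convenient: there is no need to track the previous level's state by hand. So I implemented this problem recursively. The search tree of a binary search is usually very shallow, so large test cases are not a worry. Another approach is iterative, since searching both sides is not strictly necessary: if one side is found to be unsorted, halve the search range; otherwise shrink it by 1.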
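
The iterative idea mentioned at the end of the hint could look like the sketch below. This is one common way to write it, not necessarily the recursive solution the hint describes; the class name FindMinWithDuplicatesSketch is invented, and a non-empty input is assumed.

```java
final class FindMinWithDuplicatesSketch {
    static int findMin(int[] nums) {
        int low = 0, high = nums.length - 1;
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (nums[mid] > nums[high]) {
                low = mid + 1;      // right part is unsorted: the minimum is right of mid
            } else if (nums[mid] < nums[high]) {
                high = mid;         // right part is sorted: the minimum is mid or left of it
            } else {
                high--;             // nums[mid] == nums[high]: cannot tell which side, but
                                    // nums[high] has an equal copy at mid, so dropping it is safe
            }
        }
        return nums[low];
    }

    public static void main(String[] args) {
        System.out.println(findMin(new int[]{4, 2, 4, 4, 4})); // 2
        System.out.println(findMin(new int[]{4, 4, 4, 2, 4})); // 2
    }
}
```

When all elements are equal, every iteration takes the high-- branch, which is where the O(N) worst case comes from.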


3. Two-Dimensional Search


3.1 Searching for a Range

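Next comes a simple extension of the basic binary search implementation: finding a range! It sounds advanced, and the section is called two-dimensional, but it is really just a small modification of binary search. Don't be intimidated; read the analysis below carefully.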

34-Search for a Range (Medium): Given a sorted array of integers, find the starting and ending position of a given target value. Your algorithm’s runtime complexity must be in the order of O(log n). If the target is not found in the array, return [-1, -1].
For example, Given [5, 7, 7, 8, 8, 10] and target value 8, return [3, 4].

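Although it asks for a range, this problem really comes down to finding the first and last positions at which target occurs. The method: after finding target and saving its index into the range array, keep searching the remaining interval, and update the index whenever target is found again. Since we need to keep searching in both the lower and the upper direction, we use two binary searches. (A divide-and-conquer approach would also work, using something like MergeSort as the framework and repeatedly merging interval positions, but since this article is about binary search I won't go into it.)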

    public int[] searchRange(int[] nums, int target) {
        int[] range = new int[]{ -1, -1 };
        if (nums.length == 0) {
            return range;
        }

        // 1.Find first occurrence of target
        int low = 0, high = nums.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (nums[mid] < target) {
                low = mid + 1;
            } else if (nums[mid] > target) {
                high = mid - 1;
            } else {
                range[0] = mid;
                high = mid - 1;     // find target, but continue binary search on [low,mid)
            }
        }

        if (range[0] == -1) {
            return range;
        }

        // 2.Find last occurrence of target
        low = 0; high = nums.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (nums[mid] < target) {
                low = mid + 1;
            } else if (nums[mid] > target) {
                high = mid - 1;
            } else {
                range[1] = mid;
                low = mid + 1;      // find target, but continue binary search on (mid,high]
            }
        }
        return range;
    }

3.2 Index Translation

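A sorted matrix can also be searched with binary search. There is no need to overthink it: run the most ordinary binary search, and whenever the mid element has to be compared with target, translate the index once.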

74-Search a 2D Matrix (Medium): Write an efficient algorithm that searches for a value in an m x n matrix. This matrix has the following properties: Integers in each row are sorted from left to right. The first integer of each row is greater than the last integer of the previous row.
For example, Consider the following matrix:
[
[1, 3, 5, 7],
[10, 11, 16, 20],
[23, 30, 34, 50]
]
Given target = 3, return true.

    // My 2nd: O(logN)
    public boolean searchMatrix(int[][] matrix, int target) {
        if (matrix.length == 0 || matrix[0].length == 0) {
            return false;
        }

        int m = matrix.length;
        int n = matrix[0].length;
        int low = 0, high = m * n - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            int num = matrix[mid / n][mid % n];     // key: index translation
            if (num > target) {
                high = mid - 1;
            } else if (num < target) {
                low = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }

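In fact, 33-Search in Rotated Sorted Array from earlier can also be solved through index translation. The approach: first find the pivot (which is exactly problem 153, finding the minimum) to learn how far every element has shifted from its original position, then run an ordinary binary search and translate the index at each comparison.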

    public int search(int[] nums, int target) {
        int n = nums.length;

        // 1.Find pivot (smallest element position)
        // This is actually another problem 153
        int low = 0, high = n - 1;
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (nums[mid] > nums[high]) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }

        // 2.Pretend to binary search on a sorted array
        //  But transform index when we need to compare (realmid)
        int pivot = low;
        low = 0; high = n - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            int realmid = (mid + pivot) % n;  // key!!!
            if (nums[realmid] == target) {
                return realmid;
            } else if (nums[realmid] > target) {
                high = mid - 1;
            } else {
                low = mid + 1;
            }
        }
        return -1;
    }

3.3 Other Problems

378-Kth Smallest Element in a Sorted Matrix (Medium): Given a n x n matrix where each of the rows and columns are sorted in ascending order, find the kth smallest element in the matrix. Note that it is the kth smallest element in the sorted order, not the kth distinct element.
Example:
matrix = [
[ 1, 5, 9],
[10, 11, 13],
[12, 13, 15]
],
k = 8, return 13.
Note: You may assume k is always valid, 1 ≤ k ≤ n².

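Hint: At first I thought this was similar to 74-Search a 2D Matrix, only to discover that the matrix is not fully sorted. Ugh…… For now I simply solved it with a max-heap, which does not take full advantage of the matrix's ordering. I'll look for a better way later.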
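
One widely used way to exploit the ordering is to binary search on the value range instead of on indexes. The sketch below is not the max-heap solution mentioned in the hint; it is offered only as a possible direction. The names KthSmallestSketch and countLessOrEqual are invented, and the code assumes a non-empty n x n matrix whose value range (max - min) fits in an int.

```java
final class KthSmallestSketch {
    // Counts elements <= value by walking from the bottom-left corner: O(n) steps.
    static int countLessOrEqual(int[][] matrix, int value) {
        int n = matrix.length;
        int row = n - 1, col = 0, count = 0;
        while (row >= 0 && col < n) {
            if (matrix[row][col] <= value) {
                count += row + 1;   // everything in this column from row 0 to row is <= value
                col++;
            } else {
                row--;
            }
        }
        return count;
    }

    static int kthSmallest(int[][] matrix, int k) {
        int n = matrix.length;
        int low = matrix[0][0], high = matrix[n - 1][n - 1];
        // Find the smallest value v such that at least k elements are <= v.
        // That v always exists in [low, high], so the low < high pattern applies.
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (countLessOrEqual(matrix, mid) < k) {
                low = mid + 1;      // too few elements <= mid: the answer is larger
            } else {
                high = mid;         // mid may be the answer, so keep it
            }
        }
        return low;
    }

    public static void main(String[] args) {
        int[][] matrix = {{1, 5, 9}, {10, 11, 13}, {12, 13, 15}};
        System.out.println(kthSmallest(matrix, 8)); // 13
    }
}
```

The search runs over values rather than positions, so the result is always an element of the matrix: the smallest v with at least k elements <= v must itself occur in the matrix.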


4. Finding a Trend in Unsorted Data

162-Find Peak Element (Medium): A peak element is an element that is greater than its neighbors. Given an input array where num[i] ≠ num[i+1], find a peak element and return its index. The array may contain multiple peaks, in that case return the index to any one of the peaks is fine. You may imagine that num[-1] = num[n] = -∞.
For example, in array [1, 2, 3, 1], 3 is a peak element and your function should return the index number 2.
Note: Your solution should be in logarithmic complexity.

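Hint: The elegant solution is still based on binary search. You heard that right! Even an unsorted array can be tackled with the idea of binary search. Impressive! The code is below. (Divide and conquer also works, but merging the results of the left and right parts takes some care: the last element of the left part and the first element of the right part are both potential peaks and must be checked before deciding what to return to the level above. Also, returning -1 when low=high to indicate no peak is wrong when the array has only one element, and pinning down the meaning of low and high is a bit tricky, since with a single element that element is correctly the peak……)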

    public int findPeakElement(int[] nums) {
        int low = 0, high = nums.length - 1;
        while (low < high) {
            int mid1 = low + (high - low) / 2;
            int mid2 = mid1 + 1;

            // "The key mindset is to climb the rising slope."
            // It doesn't matter if you missed a peak
            if (nums[mid1] < nums[mid2]) {
                low = mid2;
            } else {
                high = mid1;
            }
        }
        return low;
    }

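When I tried to solve this with binary search myself, the problem I ran into was this: if one side is unsorted, things are easy, since the peak must be in it. But what if both sides are sorted? Which side should be chosen? Even without choosing, the interval has to shrink, so which side should shrink? [1,100,2,3,4], for example, gets stuck in exactly this dilemma. The solution above cleverly uses two mids to detect the trend: all it needs to decide is whether the slope is rising. Missing some peak along the way does not matter, because the problem statement says both ends can be treated as negative infinity, so following a rising slope must eventually reach a peak!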


5. When the Invariant Breaks

367-Valid Perfect Square (Medium): Given a positive integer num, write a function which returns True if num is a perfect square else False.
Note: Do not use any built-in library function such as sqrt.
Example 1: Input: 16. Returns: True
Example 2: Input: 14. Returns: False

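This problem looks fairly ordinary: comparing mid*mid with num always determines both the direction and the amount by which to shrink low and high. The only thing to watch is overflow; use long or num/mid to avoid overflowing an int. So there is nothing special about this problem? Right. It only serves to get a few small issues out of the way so that the next discussion stays focused. Warm-up over; on to the next problem, where the real challenge begins!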

    public boolean isPerfectSquare(int num) {
        if (num <= 0) {
            return false;
        }

        int low = 1, high = num;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            long result = num - (long) mid * mid;
            if (result > 0) {
                low = mid + 1;
            } else if (result < 0) {
                high = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }
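
The num/mid alternative mentioned above could be sketched as follows. This is only an illustration of the division-based comparison, with the hypothetical class name PerfectSquareByDivision; the solution above uses long instead.

```java
final class PerfectSquareByDivision {
    static boolean isPerfectSquare(int num) {
        if (num <= 0) {
            return false;
        }
        int low = 1, high = num;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            int quotient = num / mid;       // floor(num / mid), never overflows
            if (quotient > mid) {
                low = mid + 1;              // num >= mid * (mid + 1) > mid * mid: mid is too small
            } else if (quotient < mid) {
                high = mid - 1;             // num < mid * mid: mid is too large
            } else {
                // quotient == mid means mid*mid <= num < mid*(mid+1),
                // so num is a perfect square exactly when the division is exact.
                return num % mid == 0;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isPerfectSquare(16)); // true
        System.out.println(isPerfectSquare(14)); // false
    }
}
```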

69-Sqrt(x) (Medium): Implement int sqrt(int x). Compute and return the square root of x.

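The difficulty here: mid squared being less than x does not mean mid is below the integer square root, since mid may be the answer itself, e.g. 2*2<5. So setting low=mid+1 in the first if branch would break the MustBe invariant. Many solutions online return low-1 or high after the loop exits on low > high; with a modified meaning of MustBe those can probably be justified too. But here I will stick with code whose correctness is easier to prove. With low=mid, the interval stops shrinking only once low+1=high, so we stop the loop at that point and decide between the candidates ourselves, which keeps the invariant intact.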

    public int mySqrt(int x) {
        if (x <= 1) {
            return x;
        }

        int low = 1, high = x;

        // MustBe(low,high)
        while (low + 1 < high) {    // low+1=high -> low=mid will cause dead loop.
            int mid = low + (high - low) / 2;
            long result = x - (long) mid * mid;
            if (result > 0) {
                low = mid;          // mid*mid < x: mid may itself be int(sqrt(x)), so keep it
            } else if (result < 0) {
                high = mid - 1;     // but mid*mid>x must suggest mid>int(sqrt(x)).
            } else {
                return mid;
            }
        }
        // low + 1 >= high and MustBe(low,high)
        // => the answer is either low or high
        return ((long) high * high <= x) ? high : low;
    }
