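Several Binary Search Implementations: Code and Comparison

Below are four binary search implementations. Each one finds a matching element in a sorted array, but when the array contains duplicates they can return different indices. The code and its comments explain why.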

#include "stdafx.h"
#include <iostream>
#include <conio.h>
using namespace std;

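// Binary search, version 1
// Source: code from my company's engine
// Note: the maximum index is incremented first (nHigh++), making the range
//       half-open; the loop continues while (nLow < nHigh)
int BinSearch1(int *arry, int nDest, int nLow, int nHigh)
{
    nHigh++;  // first, move nHigh one past the last index
    int nMid;
    // The do-while probes once before testing the condition, so the caller
    // must pass a non-empty range.
    do 
    {
        nMid = (nLow + nHigh) / 2;  // nMid never reaches nHigh, so there is no out-of-bounds access
        if (nDest == arry[nMid])
            return nMid;
        else if (nDest < arry[nMid])
            nHigh = nMid;
        else
            nLow = nMid+1;
    } while (nLow < nHigh);  // note the exit condition
    return -1;
}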

// Binary search, version 2
// Source: "Programming Pearls"
// Note: the loop continues while (nLow <= nHigh)
int BinSearch2(int *arry, int nDest, int nLow, int nHigh)
{
    int nMid;
    while (nLow <= nHigh) 
    {
        nMid = (nLow + nHigh) / 2;
        if (nDest == arry[nMid])
            return nMid;
        else if (nDest < arry[nMid])
            nHigh = nMid-1;
        else
            nLow = nMid+1;
    } 
    return -1;
}

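// Binary search, version 3
// Source: "Beautiful Code", chapter 4 (originally Java; ported to C++ here)
// Note: the loop continues while (nHigh - nLow > 1)
int BinSearch3(int *arry, int nDest, int nLow, int nHigh)
{
    nLow--;   // one before the first index (-1 when searching from index 0)
    nHigh++;  // one past the last index
    int nSaveLow = nLow;  // sentinel meaning "no element <= nDest was found"
    int nMid;
    while (nHigh - nLow > 1) 
    {
        nMid = nLow + (nHigh - nLow)/2;  // avoids overflow of nLow + nHigh
        if (arry[nMid] > nDest)
            nHigh = nMid;
        else
            nLow = nMid;
    }
    // Compare against the saved sentinel rather than a hard-coded -1, so the
    // function also works when the caller's nLow is not 0.
    if (nSaveLow == nLow || arry[nLow] != nDest) 
        return -1;
    else
        return nLow;
}
// Remarks:
// When the array contains duplicates, this version returns the largest index that holds the target.
// The code can be rewritten to return the smallest such index instead;
// see the next version.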

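// Binary search, version 4
// Source: a modification of version 3 that returns the smallest index holding the target
int BinSearch4(int *arry, int nDest, int nLow, int nHigh)
{
    nLow--;   // one before the first index
    nHigh++;  // one past the last index
    int nSaveHigh = nHigh;  // sentinel: saved after the increment, so it is never a valid index
    int nMid;
    while (nHigh - nLow > 1) 
    {
        nMid = nLow + (nHigh - nLow)/2;  // avoids overflow of nLow + nHigh
        if (arry[nMid] < nDest)
            nLow = nMid;
        else
            nHigh = nMid;
    }
    // nHigh == nSaveHigh means every element is smaller than nDest; test it
    // first so that arry[nHigh] is never read out of bounds.
    if (nSaveHigh == nHigh || arry[nHigh] != nDest) 
        return -1;
    else
        return nHigh;
}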

int _tmain(int argc, _TCHAR* argv[])
{
//    int arry[] = {2,4,6,8,10};
    int arry[] = {1,1,1,2,2,2};
    int nCount = sizeof(arry)/sizeof(int);
    int (*pFun[])(int *, int , int , int ) = {
        BinSearch1, BinSearch2, BinSearch3, BinSearch4
    };

    for (int j = 0; j < sizeof(pFun)/sizeof(pFun[0]); j++)
    {
        cout<<"BinSearch"<<j+1<<":"<<endl;
        for (int i = 0; i < nCount; i++)
        {
            int nFind = pFun[j](arry, i, 0, nCount-1);
            if (nFind >= 0 && nFind <= nCount-1)
                cout<<"find "<<i<<", a["<<nFind<<"] = "<<arry[nFind]<<endl;
            else
                cout<<"find "<<i<<", not find!"<<endl;
        }
    }
    _getch();
    return 0;
}

Running the code above produces the following output:

BinSearch1:
find 0, not find!
find 1, a[1] = 1
find 2, a[3] = 2
find 3, not find!
find 4, not find!
find 5, not find!
BinSearch2:
find 0, not find!
find 1, a[2] = 1
find 2, a[4] = 2
find 3, not find!
find 4, not find!
find 5, not find!
BinSearch3:
find 0, not find!
find 1, a[2] = 1
find 2, a[5] = 2
find 3, not find!
find 4, not find!
find 5, not find!
BinSearch4:
find 0, not find!
find 1, a[0] = 1
find 2, a[3] = 2
find 3, not find!
find 4, not find!
find 5, not find!

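As the output shows, when the sorted sequence contains duplicates, different binary search implementations return different results.

Version 1 finds 1 at a[1] and 2 at a[3].

Version 2 finds 1 at a[2] and 2 at a[4].

Version 3 finds 1 at a[2] and 2 at a[5]. (In both cases, the largest index holding the value.)

Version 4 finds 1 at a[0] and 2 at a[3]. (In both cases, the smallest index holding the value.)

Combining versions 3 and 4 therefore gives both bounds of a value in a sorted sequence with duplicates: version 4 yields the first index and version 3 yields the last.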
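
To make the combination concrete, here is a minimal sketch that runs a version-4-style pass followed by a version-3-style pass and returns both indices. The name `FindRange` and its signature (array plus element count, searching from index 0) are assumptions made for this illustration, not part of the original code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not from the original post): returns the smallest and
// largest index holding nDest in the sorted array arry[0..nCount-1],
// or {-1, -1} when nDest is absent.
std::pair<int, int> FindRange(const int *arry, int nCount, int nDest)
{
    assert(nCount >= 0);

    // Pass 1 (version 4 style): nHigh ends at the first element >= nDest.
    int nLow = -1, nHigh = nCount;
    while (nHigh - nLow > 1)
    {
        int nMid = nLow + (nHigh - nLow) / 2;
        if (arry[nMid] < nDest)
            nLow = nMid;
        else
            nHigh = nMid;
    }
    if (nHigh == nCount || arry[nHigh] != nDest)
        return std::make_pair(-1, -1);
    int nFirst = nHigh;

    // Pass 2 (version 3 style): nLow ends at the last element <= nDest.
    // The target is known to sit at nFirst, so the search can start there.
    nLow = nFirst;
    nHigh = nCount;
    while (nHigh - nLow > 1)
    {
        int nMid = nLow + (nHigh - nLow) / 2;
        if (arry[nMid] > nDest)
            nHigh = nMid;
        else
            nLow = nMid;
    }
    return std::make_pair(nFirst, nLow);
}
```

For the test array {1,1,1,2,2,2}, `FindRange(arry, 6, 1)` yields (0, 2) and `FindRange(arry, 6, 2)` yields (3, 5), matching the indices reported by versions 4 and 3 above.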


Supplementary note on version 3 (excerpted from the author's explanation in chapter 4 of "Beautiful Code")

Escaping the Loop
Some look at my binary-search algorithm and ask why the loop always runs to the end
without checking whether it’s found the target. In fact, this is the correct behavior; the
math is beyond the scope of this chapter, but with a little work, you should be able to get
an intuitive feeling for it—and this is the kind of intuition I’ve observed in some of the
great programmers I’ve worked with.
Let’s think about the progress of the loop. Suppose you have n elements in the array,
where n is some really large number. The chance of finding the target the first time
through is 1/n, a really small number. The next iteration (after you divide the search set in
half) is 1/(n/2)—still small—and so on. In fact, the chance of hitting the target becomes
significant only when you’re down to 10 or 20 elements, which is to say maybe the last
four times through the loop. And in the case where the search fails (which is common in
many applications), those extra tests are pure overhead.
You could do the math to figure out when the probability of hitting the target approaches
50 percent, but qualitatively, ask yourself: does it make sense to add extra complexity to
each step of an O(log2 N) algorithm when the chances are it will save only a small number
of steps at the end?
The take-away lesson is that binary search, done properly, is a two-step process. First,
write an efficient loop that positions your low and high bounds properly, then add a simple
check to see whether you hit or missed.

Note on the excerpt: the "extra tests" it mentions are the three-way comparisons written to check for a hit on every iteration, as in versions 1 and 2 above.
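
The same two-step shape can be sketched with the C++ standard library: `std::lower_bound` positions the bound, and a single comparison afterwards decides hit or miss. The wrapper name `FindFirst` and the use of `std::vector` instead of a raw array are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper (not from the original post). Returns the smallest
// index holding nDest, or -1 if nDest is absent.
int FindFirst(const std::vector<int> &sorted, int nDest)
{
    // Precondition check only; it costs O(n) and is compiled out with NDEBUG.
    assert(std::is_sorted(sorted.begin(), sorted.end()));

    // Step 1: iterator to the first element that is not less than nDest.
    std::vector<int>::const_iterator it =
        std::lower_bound(sorted.begin(), sorted.end(), nDest);

    // Step 2: one hit-or-miss test after the search has finished.
    if (it == sorted.end() || *it != nDest)
        return -1;
    return static_cast<int>(it - sorted.begin());
}
```

Like version 4, this returns the first of several equal elements; a sketch built on `std::upper_bound` would give the position just past the last one.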


References:

1. "Programming Pearls" (《编程珠玑》)

2. "Beautiful Code", English edition

3. "Beautiful Code" (《代码之美》), Chinese edition


For reference, the Java version from "Beautiful Code":

package binary;

public class Finder {
  public static int find(String[] keys, String target) {
    int high = keys.length;
    int low = -1;
    while (high - low > 1) {
      int probe = (low + high) >>> 1;
      if (keys[probe].compareTo(target) > 0)
        high = probe;
      else
        low = probe;
    }
    if (low == -1 || keys[low].compareTo(target) != 0)
      return -1;
    else
      return low;
  }
}
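
The Java code computes the probe as `(low + high) >>> 1`, while the C++ versions 3 and 4 use `nLow + (nHigh - nLow)/2`. Both avoid relying on the signed sum `nLow + nHigh`, which can exceed the range of `int` for very large indices: with a 32-bit `int`, 2,000,000,000 + 2,100,000,000 = 4,100,000,000 is larger than the maximum of 2,147,483,647. The sketch below isolates the C++ form; the helper name `Midpoint` is an assumption made for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: midpoint of nLow and nHigh computed without forming
// nLow + nHigh. Assumes nLow <= nHigh and that nHigh - nLow fits in an int.
int Midpoint(int nLow, int nHigh)
{
    assert(nLow <= nHigh);
    return nLow + (nHigh - nLow) / 2;
}
```

For example, `Midpoint(2000000000, 2100000000)` evaluates to 2050000000 using only intermediate values that fit in a 32-bit `int`, and `Midpoint(-1, 6)` gives 2, the first probe of versions 3 and 4 on the six-element test array.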


