Random Sample Consensus (RANSAC)

Preface

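Random sample consensus (RANSAC) is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers, in such a way that the outliers do not influence the estimate. It can therefore also be viewed as an outlier detection method. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, and that probability increases as more iterations are allowed. RANSAC was first published by Fischler and Bolles in 1981 to solve the Location Determination Problem (LDP; see Appendix A for a brief description) [1][2].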

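In short, the generic RANSAC algorithm fits or estimates a highly robust model from the most probable subset of the data, the inliers, while rejecting the outliers [3].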
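The statement that the probability of a good result grows with the number of iterations can be made concrete with the standard estimate given in the Wikipedia article [1]: if w is the fraction of inliers and each sample has n points, the chance that k samples all contain at least one outlier is (1 - w^n)^k. The sketch below solves for the smallest k that reaches a desired success probability. The function name and arguments are illustrative, and treating the n draws as independent is an approximation.

```python
import math

def ransac_iterations(p_success, inlier_ratio, n):
    """Smallest k with 1 - (1 - w**n)**k >= p_success (hypothetical helper)."""
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must be strictly between 0 and 1")
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    p_clean = inlier_ratio ** n          # probability that one sample is all inliers
    if p_clean == 1.0:
        return 1                         # no outliers: a single sample suffices
    # log1p keeps precision when p_clean is tiny
    k = math.log(1.0 - p_success) / math.log1p(-p_clean)
    return math.ceil(k)

print(ransac_iterations(0.99, 0.8, 2))   # 80% inliers, 2-point samples; prints 5
```

Because w^n shrinks quickly as n grows, the same formula asks for far more iterations when each sample contains many points, which is one reason minimal samples are usually preferred.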

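RANSAC can therefore also be understood as a general idea: exclude data that may be erroneous, then estimate the model parameters or carry out some other task, such as matching image feature points. This has something in common with active learning. Active learning looks for the smallest possible set of labeled points with which to train a model, much like RANSAC's iterative search for inliers, and both processes involve identifying and handling outliers. As a body of ideas, though, active learning is considerably more elaborate.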

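Back to RANSAC: it rests on a few basic assumptions that we need to know.

  • The whole data set consists of inliers and outliers;
  • The distribution of the inliers can be explained by a parameterized model, although the inliers may be noisy;
  • The outliers do not fit the model; they come from extreme noise values, erroneous measurements, incorrect assumptions about the data, and so on.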

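Even if these assumptions do not hold for a data set, that is, it contains no outliers, RANSAC's parameter estimation is not affected. In that case the iterations can simply take the entire data set as inliers and estimate the model parameters from it. So let's look at how the RANSAC iterations actually proceed.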

Algorithm

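The description below is taken directly from Wikipedia [1], and I have also included a figure from the KTH course slides [4]. The two descriptions differ slightly, but the overall idea is the same.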

1. Description

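  1. Randomly select a subset of the data set, called the "hypothetical inliers"; it serves as the initial sample of the consensus set.
  2. Estimate or train a model that fits this subset.
  3. Using some loss function or rule, select from the remaining data the samples that agree well with the model and add them to the consensus set. For example, if the model is a line equation, any remaining sample whose distance to the line is below a threshold $TH$ is considered consistent with the model and is added to the consensus set. The points in the consensus set are the inliers; the rest are outliers.
  4. If the consensus set contains enough samples, the model estimated in step 2 is considered reasonably good.
  5. Re-estimate the model using all samples in the consensus set.
  6. Repeat the steps above, and finally return the model with the smallest error or the model with the most inliers.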
    [Image 1: figure from the KTH course slides [4]]
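Step 3 can be sketched for the line example as follows. This is a minimal illustration, assuming the line is written as y = kx + b and using the perpendicular point-to-line distance; the function name `consensus_set` and the sample values are made up for the example (the full script later in this post uses the squared vertical distance instead).

```python
import math

def consensus_set(points, k, b, th):
    """Split points into inliers/outliers by distance to the line y = k*x + b."""
    norm = math.hypot(k, 1.0)            # turns |k*x - y + b| into a true distance
    inliers, outliers = [], []
    for x, y in points:
        dist = abs(k * x - y + b) / norm
        (inliers if dist < th else outliers).append((x, y))
    return inliers, outliers

# Hypothetical data: three points near y = 2x + 1 and one gross outlier.
pts = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.0), (1.0, 10.0)]
inliers, outliers = consensus_set(pts, k=2.0, b=1.0, th=0.5)
print(len(inliers), len(outliers))       # prints: 3 1
```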

2. Pseudocode

Given:
    data – A set of observations.
    model – A model to explain observed data points.
    n – Minimum number of data points required to estimate model parameters.
    k – Maximum number of iterations allowed in the algorithm.
    t – Threshold value to determine data points that are fit well by model.
    d – Number of close data points required to assert that a model fits well to data.

Return:
    bestFit – model parameters which best fit the data (or null if no good model is found)

iterations = 0
bestFit = null
bestErr = something really large

while iterations < k do
    maybeInliers := n randomly selected values from data
    maybeModel := model parameters fitted to maybeInliers
    alsoInliers := empty set
    for every point in data not in maybeInliers do
        if point fits maybeModel with an error smaller than t
             add point to alsoInliers
    end for
    if the number of elements in alsoInliers is > d then
        // This implies that we may have found a good model
        // now test how good it is.
        betterModel := model parameters fitted to all points in maybeInliers and alsoInliers
        thisErr := a measure of how well betterModel fits these points
        if thisErr < bestErr then
            bestFit := betterModel
            bestErr := thisErr
        end if
    end if
    increment iterations
end while

return bestFit

Examples

Talk is cheap, show me the code!

1. MATLAB code by Peter Kovesi [5]

See the RANSAC fitting section of reference [5]. His other image processing code is excellent as well.

2. RANSAC line fitting (Python)

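The Python code below is adapted from the RANSAC example in the SciPy Cookbook [6]. You can also go straight to that example (code link), as it is already very well done. I have added some of my own understanding, and this version is only suitable for RANSAC fitting of a 2D line equation.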

# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

## Copyright (c) 2020, Wenquan.Zhao. All rights reserved.

def ransac(data, model, n, k, t, d, debug = False, return_all = False):
    """
    Fit model parameters to data using the RANSAC algorithm
    
    This implementation written from pseudocode found at wiki
    https://en.wikipedia.org/wiki/Random_sample_consensus

    Given:
        data  - A set of observations, one sample per row.
        model - A model to explain the observed data points; must provide fit() and distance().
        n     - Minimum number of data points required to estimate model parameters.
        k     - Maximum number of iterations allowed in the algorithm.
        t     - Threshold value to determine data points that are fit well by model.
        d     - Number of close data points required to assert that a model fits well to data.
    Return:
        bestFit - model parameters which best fit the data.
                  Raises ValueError if no model meets the acceptance criteria.
    """
    iterations = 0
    bestFit = None
    besterr = np.inf
    best_inlier_idxs = None
    
    while iterations < k:
        maybe_idxs, test_idxs = random_partition(n,data.shape[0])
        maybeinliers = data[maybe_idxs]
        test_points = data[test_idxs]
        maybemodel = model.fit(maybeinliers)
        test_err = model.distance(test_points)
        also_idxs = test_idxs[test_err < t] # select indices of rows with accepted points
        alsoinliers = data[also_idxs,:]
        if debug:
            print('test_err.min()',test_err.min())
            print('test_err.max()',test_err.max())
            print('np.mean(test_err)',np.mean(test_err))
            print('iteration %d:len(alsoinliers) = %d'%(iterations,len(alsoinliers)))
        if len(alsoinliers) > d:
            betterdata = np.concatenate((maybeinliers, alsoinliers))
            bettermodel = model.fit(betterdata)
            better_errs = model.distance(betterdata)
            thiserr = np.mean( better_errs )
            if thiserr < besterr:
                bestFit = bettermodel
                besterr = thiserr
                best_inlier_idxs = np.concatenate( (maybe_idxs, also_idxs) )
        iterations+=1
    if bestFit is None:
        raise ValueError("did not meet fit acceptance criteria")
    if return_all:
        return bestFit, {'inliers':best_inlier_idxs}
    else:
        return bestFit

def random_partition(n,n_data):
    '''shuffle the indices 0..n_data-1 and split them into the first n and the rest'''
    all_idxs = np.arange(n_data)
    np.random.shuffle(all_idxs)
    idxs1 = all_idxs[:n]
    idxs2 = all_idxs[n:]
    return idxs1, idxs2
    
class LineModel():
    '''
    A 2D line model, use lstsq to fit it.
    '''
    def __init__(self, debug=False):
        self.debug = debug
        
    def fit(self, data):
        # use numpy.linalg.lstsq to 
        # fit a line, y = kx + b
        # rewrite the line equation as y = Ap,
        # where A = [[x 1]] and p = [[k], [b]]
        x = data[:,0]
        y = data[:,1]

        A = np.vstack([x, np.ones(len(x))]).T
        k, b = np.linalg.lstsq(A, y, rcond=None)[0]
        self.params = [k, b]
        self.residual = sum(abs(x * k + b - y))
        return [k, b]
        
    def distance(self, samples):
        """
        Calculates the squared vertical distances from the samples to the line model,
        so the RANSAC threshold t is compared against squared residuals.
        """
        X = samples[:,0]
        Y = samples[:,1]
        k = self.params[0]
        b = self.params[1]
        dists = abs(X * k + b - Y)**2
        return dists

def testFitLine(disPlay1 = False, disPlay2 = False):
    # generate exact data set
    np.random.seed(1)
    n_samples = 500
    
    # y = k * x + b
    x_exact = 20*np.random.random((n_samples,1))  # x coordinates, 500 x 1
    k = 50*np.random.normal(size =(1,1))             # random slope k, 1 x 1
    b = np.random.rand(1)                         # random intercept b
    y_exact = x_exact * k + b                        # y coordinates, 500 x 1

    # add a little gaussian noise (linear least squares alone should handle this well)
    x_noisy = x_exact + np.random.normal(size=(n_samples,1)) # add Gaussian noise to x
    y_noisy = y_exact + np.random.normal(size=(n_samples,1)) # add Gaussian noise to y

    # add some outliers
    ratio = 0.2
    n_outliers = int(n_samples * ratio)         # 100 of the 500 points become outliers
    all_idxs = np.arange(n_samples)
    np.random.shuffle(all_idxs)                 # shuffle the indices
    outlier_idxs = all_idxs[:n_outliers]        
    non_outlier_idxs = all_idxs[n_outliers:]    
    x_noisy[outlier_idxs] = 20*np.random.random((n_outliers,1) )       # outlier x coordinates
    y_noisy[outlier_idxs] = 60*np.random.normal(size=(n_outliers,1) )  # outlier y coordinates
    
    if disPlay1:
        # exact data
        ax1 = plt.subplot(1,3,1)
        ax1.set_title('exact data')
        plt.plot(x_exact, y_exact, 'b.', label = 'exact_data')
        plt.legend()
        
        # noisy data
        ax2 = plt.subplot(1,3,2)
        ax2.set_title('noisy data')
        plt.plot(x_noisy, y_noisy, 'r.', label='noisy data')
        plt.legend()

        # data with outliers
        ax3 = plt.subplot(1,3,3)
        ax3.set_title('outliers data')
        plt.plot(x_noisy, y_noisy, 'r.', label='outliers data')
        plt.legend()
        plt.show()

    # setup model
    allData = np.hstack((x_noisy,y_noisy)) # stack into (x, y) coordinate pairs
    model = LineModel()
    [linear_k, linear_b] = model.fit(allData)
    # run RANSAC algorithm
    ransac_fit, ransac_data = ransac(allData, model, 50, 1000, 7e3, 300, return_all=True)
    
    if disPlay2:
        sort_idxs = np.argsort(x_exact[:,0]) # indices that sort x_exact
        x_sorted = x_exact[sort_idxs]        # sorted x values taken from the data set, convenient for plotting

        if True:
            plt.plot( x_noisy, y_noisy, 'k.', lw=3, label='data' )
            plt.plot( x_noisy[ransac_data['inliers'],0], y_noisy[ransac_data['inliers'],0], 'bx', label='RANSAC data' )

        else:
            plt.plot( x_noisy[non_outlier_idxs,0], y_noisy[non_outlier_idxs,0], 'k.', label='noisy data' )
            plt.plot( x_noisy[outlier_idxs,0], y_noisy[outlier_idxs,0], 'r.', label='outlier data' )
            
        # compare
        y_ransac = x_sorted*ransac_fit[0] + ransac_fit[1]
        y_exact  = x_sorted *k + b
        y_linear = x_sorted * linear_k + linear_b
        
        plt.plot( x_sorted,y_ransac, label='RANSAC fit, k is %.2f' %ransac_fit[0])
        plt.plot( x_sorted,y_exact,label='exact system, k is %.2f' %k.item())  # k is a 1x1 array; .item() extracts the scalar
        plt.plot( x_sorted,y_linear,label='linear fit, k is %.2f' %linear_k)
        plt.legend()
        plt.title(r'RANSAC robust fit demo', fontsize=20)
        plt.show()

if __name__=='__main__':
    testFitLine(disPlay2=True)

Running the script produces the following result:
[Image 2: output plot of the script, titled "RANSAC robust fit demo"]

Appendix A

The LDP is illustrated in the figure below [2].
[Image 3: illustration of the LDP]

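Given a set of $m$ control points (landmarks) whose 3D coordinates in some 3D coordinate system are known, and an image containing the projections of these $m$ control points, determine the location of the image (that is, of the imaging origin, or the camera) in that 3D coordinate system.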

This is in effect camera pose estimation, the classic 3D-2D problem in SLAM known as PnP.
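To make the setup concrete, the sketch below shows only the forward direction of the problem: how known 3D control points project into the image for a given camera pose. It assumes an ideal pinhole camera with focal length `f` and principal point `(cx, cy)`, none of which appear in the description above, and the function name `project` is likewise illustrative. PnP is the inverse task of recovering the rotation `R` and translation `t` from the 3D points and their observed projections.

```python
def project(points_3d, R, t, f, cx, cy):
    """Project world points into pixel coordinates with an ideal pinhole camera."""
    pixels = []
    for X in points_3d:
        # world -> camera coordinates: Xc = R @ X + t
        xc, yc, zc = (sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3))
        if zc <= 0:
            raise ValueError("point is behind the camera")
        # perspective division, then shift to the principal point
        pixels.append((f * xc / zc + cx, f * yc / zc + cy))
    return pixels

# Hypothetical pose: camera axes aligned with the world axes,
# world origin 4 units in front of the camera.
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 4.0]
pixels = project([(1.0, 2.0, 0.0), (0.0, 0.0, 4.0)], R, t, f=100.0, cx=50.0, cy=50.0)
print(pixels)   # prints [(75.0, 100.0), (50.0, 50.0)]
```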

References


  1. Random sample consensus, Wikipedia.

  2. Fischler, Martin A., and Robert C. Bolles. "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography." Communications of the ACM 24.6 (1981): 381-395.

  3. Borkar, Amol, Monson Hayes, and Mark T. Smith. "Robust lane detection and tracking with ransac and kalman filter." 2009 16th IEEE International Conference on Image Processing (ICIP). IEEE, 2009.

  4. KTH Regression course slides.

  5. MATLAB and Octave Functions for Computer Vision and Image Processing.

  6. SciPy Cookbook: RANSAC.
