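Random sample consensus (RANSAC) is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers, without letting those outliers influence the estimate. It can therefore also be viewed as an outlier detection method. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, and that probability increases as more iterations are allowed. RANSAC was first published by Fischler and Bolles in 1981, where it was used to solve the Location Determination Problem (LDP; see Appendix A for a brief description) [1][2].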
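In short, the generic RANSAC algorithm fits or estimates a highly robust model from the most plausible subset of the data, the inliers, while excluding the outliers [3].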
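RANSAC can therefore also be understood as a general idea: exclude data that is likely to be wrong, then estimate model parameters or do something else with what remains, such as matching image feature points. This has something in common with active learning. Active learning looks for as few labelled points as possible to train a model, much like RANSAC's iterative search for inliers, and both processes involve identifying and handling outliers. Active learning as a body of ideas is more elaborate, though.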
Back to RANSAC: it rests on a few basic assumptions that we need to be aware of.
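Even if these assumptions do not hold for a data set, that is, even if it contains no outliers, RANSAC's parameter estimation is unaffected. In that case the iterations can simply take the entire data set as inliers and estimate the model parameters from it. So let us look at how the RANSAC iteration actually proceeds.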
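The outlier-free case can be illustrated with a minimal sketch. The data, the sample size of 10, and the threshold `t = 1.0` are all assumptions chosen for illustration, and `np.polyfit` stands in for whatever model fit is actually used. With only mild noise and no outliers, one would expect a single hypothesis to pull essentially every point into the consensus set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Outlier-free data: points on y = 3x + 1 with mild Gaussian noise.
x = rng.uniform(0.0, 20.0, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

# One RANSAC-style hypothesis: fit a line to a random sample of 10 points.
sample = rng.choice(100, size=10, replace=False)
k, b = np.polyfit(x[sample], y[sample], deg=1)

# Consensus set: points whose vertical residual is below the threshold t.
t = 1.0  # assumed threshold, roughly 10x the noise standard deviation
inliers = np.abs(k * x + b - y) < t
print(inliers.sum(), "of", len(x), "points are inliers")
```

When every point is an inlier, the final refit uses the whole data set, so RANSAC reduces to an ordinary least-squares fit.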
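The following is taken directly from Wikipedia [1], together with a figure from the KTH lecture slides [4]. The two descriptions differ slightly, but the overall idea is the same.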
Given:
    data - A set of observations.
    model - A model to explain observed data points.
    n - Minimum number of data points required to estimate model parameters.
    k - Maximum number of iterations allowed in the algorithm.
    t - Threshold value to determine data points that are fit well by model.
    d - Number of close data points required to assert that a model fits well to data.

Return:
    bestFit - model parameters which best fit the data (or null if no good model is found)

iterations = 0
bestFit = null
bestErr = something really large

while iterations < k do
    maybeInliers := n randomly selected values from data
    maybeModel := model parameters fitted to maybeInliers
    alsoInliers := empty set
    for every point in data not in maybeInliers do
        if point fits maybeModel with an error smaller than t
            add point to alsoInliers
        end if
    end for
    if the number of elements in alsoInliers is > d then
        // This implies that we may have found a good model
        // now test how good it is.
        betterModel := model parameters fitted to all points in maybeInliers and alsoInliers
        thisErr := a measure of how well betterModel fits these points
        if thisErr < bestErr then
            bestFit := betterModel
            bestErr := thisErr
        end if
    end if
    increment iterations
end while

return bestFit
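The pseudocode leaves the choice of k open. A commonly used rule of thumb, which is not part of the pseudocode above, derives k from an assumed inlier ratio w and a desired probability p that at least one of the k samples of n points is free of outliers: k = log(1 - p) / log(1 - w^n). The sketch below assumes that the n points of a sample are drawn independently; the names `w`, `p`, and `ransac_iterations` are introduced here for illustration only.

```python
import math


def ransac_iterations(w, n, p=0.99):
    """Number of iterations so that, with probability p, at least one
    sample of n points contains only inliers (w = assumed inlier ratio)."""
    if not 0.0 < w < 1.0:
        raise ValueError("w must be strictly between 0 and 1")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Probability that a single sample of n points is all inliers: w ** n
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w ** n))


# 80% inliers, 2 points per sample (the minimum for a 2D line)
print(ransac_iterations(w=0.8, n=2))  # -> 5
```

The required number of iterations grows quickly with n, which is one reason to keep the sample size close to the minimum the model needs.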
Talk is cheap, show me the code!
See the RANSAC fitting section of reference [5]. The author's other image-processing code is also excellent.
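The Python code below is adapted from the RANSAC example in the SciPy Cookbook [6]. That example is already excellent and can be read directly (see the link to the code). I have added some of my own understanding; this version only handles RANSAC fitting of a 2D line.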
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
## Copyright (c) 2020, Wenquan.Zhao. All rights reserved.

def ransac(data, model, n, k, t, d, debug=False, return_all=False):
    """
    Fit model parameters to data using the RANSAC algorithm.

    This implementation is written from the pseudocode found at
    https://en.wikipedia.org/wiki/Random_sample_consensus

    Given:
        data - A set of observations, one row per point.
        model - A model to explain observed data points; must provide fit() and distance().
        n - Minimum number of data points required to estimate model parameters.
        k - Maximum number of iterations allowed in the algorithm.
        t - Threshold value to determine data points that are fit well by model.
        d - Number of close data points required to assert that a model fits well to data.
    Return:
        bestFit - model parameters which best fit the data
                  (a ValueError is raised if no good model is found)
    """
    iterations = 0
    bestFit = None
    besterr = np.inf
    best_inlier_idxs = None
    while iterations < k:
        # Split the row indices into a random sample of size n and the rest
        maybe_idxs, test_idxs = random_partition(n, data.shape[0])
        maybeinliers = data[maybe_idxs]
        test_points = data[test_idxs]
        # fit() stores the parameters on the model, so distance() uses this hypothesis
        maybemodel = model.fit(maybeinliers)
        test_err = model.distance(test_points)
        also_idxs = test_idxs[test_err < t]  # select indices of rows with accepted points
        alsoinliers = data[also_idxs, :]
        if debug:
            print('test_err.min()', test_err.min())
            print('test_err.max()', test_err.max())
            print('np.mean(test_err)', np.mean(test_err))
            print('iteration %d:len(alsoinliers) = %d' % (iterations, len(alsoinliers)))
        if len(alsoinliers) > d:
            # Enough support: refit on the whole consensus set and score it
            betterdata = np.concatenate((maybeinliers, alsoinliers))
            bettermodel = model.fit(betterdata)
            better_errs = model.distance(betterdata)
            thiserr = np.mean(better_errs)
            if thiserr < besterr:
                bestFit = bettermodel
                besterr = thiserr
                best_inlier_idxs = np.concatenate((maybe_idxs, also_idxs))
        iterations += 1
    if bestFit is None:
        raise ValueError("did not meet fit acceptance criteria")
    if return_all:
        return bestFit, {'inliers': best_inlier_idxs}
    else:
        return bestFit

def random_partition(n, n_data):
    '''Return n random row indices of the data, and the remaining n_data - n indices.'''
    all_idxs = np.arange(n_data)
    np.random.shuffle(all_idxs)
    idxs1 = all_idxs[:n]
    idxs2 = all_idxs[n:]
    return idxs1, idxs2

class LineModel():
    '''
    A 2D line model, use lstsq to fit it.
    '''
    def __init__(self, debug=False):
        self.debug = debug

    def fit(self, data):
        # Use numpy.linalg.lstsq to fit a line, y = kx + b.
        # Rewrite the line equation as y = Ap,
        # where A = [[x 1]] and p = [[k], [b]]
        x = data[:, 0]
        y = data[:, 1]
        A = np.vstack([x, np.ones(len(x))]).T
        k, b = np.linalg.lstsq(A, y, rcond=None)[0]
        self.params = [k, b]
        self.residual = np.sum(np.abs(x * k + b - y))
        return [k, b]

    def distance(self, samples):
        """
        Calculates the squared vertical distances from the samples
        to the line given by the most recent call to fit().
        """
        X = samples[:, 0]
        Y = samples[:, 1]
        k = self.params[0]
        b = self.params[1]
        dists = np.abs(X * k + b - Y) ** 2
        return dists

def testFitLine(disPlay1=False, disPlay2=False):
    # generate exact data set
    np.random.seed(1)
    n_samples = 500
    # y = k * x + b
    x_exact = 20 * np.random.random((n_samples, 1))  # x coordinates, 500x1
    k = 50 * np.random.normal(size=(1, 1))           # random slope k, 1x1
    b = np.random.rand(1)                            # random intercept b
    y_exact = x_exact * k + b                        # y coordinates, 500x1
    # add a little gaussian noise (linear least squares alone should handle this well)
    x_noisy = x_exact + np.random.normal(size=(n_samples, 1))  # Gaussian noise on x
    y_noisy = y_exact + np.random.normal(size=(n_samples, 1))  # Gaussian noise on y
    # keep a copy of the noisy data before outliers are injected (for plotting only)
    x_noisy_only = x_noisy.copy()
    y_noisy_only = y_noisy.copy()
    # add some outliers
    ratio = 0.2
    n_outliers = int(n_samples * ratio)  # 100 of the 500 points become outliers
    all_idxs = np.arange(n_samples)
    np.random.shuffle(all_idxs)          # shuffle the indices
    outlier_idxs = all_idxs[:n_outliers]
    non_outlier_idxs = all_idxs[n_outliers:]
    x_noisy[outlier_idxs] = 20 * np.random.random((n_outliers, 1))        # outlier x coordinates
    y_noisy[outlier_idxs] = 60 * np.random.normal(size=(n_outliers, 1))   # outlier y coordinates
    if disPlay1:
        # exact data
        ax1 = plt.subplot(1, 3, 1)
        ax1.set_title('exact data')
        plt.plot(x_exact, y_exact, 'b.', label='exact_data')
        plt.legend()
        # noisy data (before outliers were added)
        ax2 = plt.subplot(1, 3, 2)
        ax2.set_title('noisy data')
        plt.plot(x_noisy_only, y_noisy_only, 'r.', label='noisy data')
        plt.legend()
        # data with outliers
        ax3 = plt.subplot(1, 3, 3)
        ax3.set_title('outliers data')
        plt.plot(x_noisy, y_noisy, 'r.', label='outliers data')
        plt.legend()
        plt.show()
    # setup model
    allData = np.hstack((x_noisy, y_noisy))  # stack into (x, y) pairs
    model = LineModel()
    # plain least-squares fit on all data, for comparison
    [linear_k, linear_b] = model.fit(allData)
    # run RANSAC algorithm with n=50, k=1000, t=7e3 (squared distance), d=300
    ransac_fit, ransac_data = ransac(allData, model, 50, 1000, 7e3, 300, return_all=True)
    if disPlay2:
        sort_idxs = np.argsort(x_exact[:, 0])  # indices that sort x_exact
        x_sorted = x_exact[sort_idxs]          # sorted x values from the data set, for plotting
        if True:
            plt.plot(x_noisy, y_noisy, 'k.', lw=3, label='data')
            plt.plot(x_noisy[ransac_data['inliers'], 0], y_noisy[ransac_data['inliers'], 0], 'bx', label='RANSAC data')
        else:
            # alternative view: colour the points by their ground-truth labels
            plt.plot(x_noisy[non_outlier_idxs, 0], y_noisy[non_outlier_idxs, 0], 'k.', label='noisy data')
            plt.plot(x_noisy[outlier_idxs, 0], y_noisy[outlier_idxs, 0], 'r.', label='outlier data')
        # compare
        y_ransac = x_sorted * ransac_fit[0] + ransac_fit[1]
        y_exact = x_sorted * k + b
        y_linear = x_sorted * linear_k + linear_b
        plt.plot(x_sorted, y_ransac, label='RANSAC fit, k is %.2f' % ransac_fit[0])
        plt.plot(x_sorted, y_exact, label='exact system, k is %.2f' % k.item())
        plt.plot(x_sorted, y_linear, label='linear fit, k is %.2f' % linear_k)
        plt.legend()
        plt.title(r'RANSAC robust fit demo', fontsize=20)
        plt.show()


if __name__ == '__main__':
    testFitLine(disPlay2=True)
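Note that `LineModel.distance` returns squared vertical residuals, so the threshold `t = 7e3` in the call above corresponds to a vertical residual of about 83.7. For steep lines the vertical residual can be much larger than the geometric distance to the line. A possible alternative, which is not what the code above does, is the perpendicular distance. The following is a minimal sketch; `perpendicular_distance` is a hypothetical helper introduced here, not part of the SciPy Cookbook example.

```python
import numpy as np


def perpendicular_distance(samples, k, b):
    """Perpendicular (orthogonal) distance from each (x, y) row
    of samples to the line y = k*x + b."""
    x = samples[:, 0]
    y = samples[:, 1]
    # Line in implicit form: k*x - y + b = 0, with normal vector (k, -1)
    return np.abs(k * x - y + b) / np.sqrt(k * k + 1.0)


pts = np.array([[0.0, 1.0], [2.0, 2.0]])
print(perpendicular_distance(pts, k=1.0, b=0.0))  # approx. [0.7071, 0.0]
```

If a distance like this replaced the squared residual, the threshold t would have to be chosen in plain distance units rather than squared units.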
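Given a set of $m$ control points (landmarks) in a 3D coordinate system whose 3D coordinates are known, and given an image that contains the projections of those $m$ control points, determine the position, in that 3D coordinate system, of the point from which the image was taken (that is, the center of projection, or the camera).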
This is in effect camera pose estimation, the classic 3D-2D problem in SLAM: PnP (Perspective-n-Point).
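To make the setup concrete, here is a minimal sketch of the forward model that the LDP asks us to invert. It assumes an ideal pinhole camera without lens distortion; the intrinsic matrix `K`, the pose `(R, t)`, and the landmark coordinates are made-up values for illustration. Solving the LDP means recovering `R` and `t` (and hence the camera position) from the landmarks and their projections; that inverse step is not shown here.

```python
import numpy as np


def project(points_3d, K, R, t):
    """Project world points (one per row) into the image of a pinhole camera."""
    cam = points_3d @ R.T + t      # world frame -> camera frame
    uvw = cam @ K.T                # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide -> pixel coordinates


def camera_center(R, t):
    """Camera position in world coordinates for the pose x_cam = R @ x_world + t."""
    return -R.T @ t


K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])          # assumed intrinsics
R = np.eye(3)                            # assumed rotation: camera aligned with world axes
t = np.array([0.0, 0.0, 5.0])            # assumed translation
landmarks = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])  # known 3D control points

print(project(landmarks, K, R, t))  # pixels: (320, 240), (480, 240), (320, 400)
print(camera_center(R, t))          # camera sits at (0, 0, -5) in world coordinates
```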
[1] Random sample consensus, Wikipedia.
[2] Fischler, Martin A., and Robert C. Bolles. “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography.” Communications of the ACM 24.6 (1981): 381-395.
[3] Borkar, Amol, Monson Hayes, and Mark T. Smith. “Robust lane detection and tracking with ransac and kalman filter.” 2009 16th IEEE International Conference on Image Processing (ICIP). IEEE, 2009.
[4] KTH Regression lecture slides.
[5] MATLAB and Octave Functions for Computer Vision and Image Processing.
[6] SciPy Cookbook, RANSAC.