RANSAC随机采样一致算法

RANSAC简介

RANSAC(RAndom SAmple Consensus,随机采样一致)算法是从一组含有“外点”(outliers)的数据中正确估计数学模型参数的迭代算法。“外点”一般指的的数据中的噪声,比如说匹配中的误匹配和估计曲线中的离群点。所以,RANSAC也是一种“外点”检测算法。RANSAC算法是一种不确定算法,它只能在一种概率下产生结果,并且这个概率会随着迭代次数的增加而加大(之后会解释为什么这个算法是这样的)。RANSAC算最早是由Fischler和Bolles在SRI上提出用来解决LDP(Location Determination Proble)问题的。

对于RANSAC算法来说一个基本的假设就是数据是由“内点”和“外点”组成的。“内点”就是组成模型参数的数据,“外点”就是不适合模型的数据。同时RANSAC假设:在给定一组含有少部分“内点”的数据,存在一个程序可以估计出符合“内点”的模型。

算法基本思想和流程

RANSAC是通过反复选择数据集去估计出模型,一直迭代到估计出认为比较好的模型。
具体的实现步骤可以分为以下几步:

  1. 选择出可以估计出模型的最小数据集;(对于直线拟合来说就是两个点,对于计算Homography矩阵就是4个点)
  2. 使用这个数据集来计算出数据模型;
  3. 将所有数据带入这个模型,计算出“内点”的数目;(累加在一定误差范围内的适合当前迭代推出模型的数据)
  4. 比较当前模型和之前推出的最好的模型的“内点“的数量,记录最大“内点”数的模型参数和“内点”数;
  5. 重复1-4步,直到迭代结束或者当前模型已经足够好了(“内点数目大于一定数量”)。
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 30 20:07:19 2018

@author: Yuki
"""
import random
import numpy as np
from matplotlib import pyplot as plt

# Magic Numbers
# Controls the inlier range
THRESHOLD = 0.1

# Finds random potential fit lines
def RANSAC(data):
    n = len(data)

    # Another magic number
    NUM_TRIALS = n // 2

    best_in_count = 0
    for i in range(0, NUM_TRIALS):
        r = random.sample(data, 2)
        r = np.array(r)

        # linear regression on two points will just give the line through both points
        m, b = lin_reg(r)

        # finds the line with the most inliers
        in_count = 0
        for j in data:
            # if the distance between the line and point is less than or equal to THRESHOLD it is an inlier
            if abs(j[1] - ((m * j[0]) + b)) <= THRESHOLD:
                in_count = in_count + 1
        # Tracks the best fit line so far
        if in_count > best_in_count:
            best_in_count = in_count
            best_m = m
            best_b = b

    # record both inliers and outliers to make end graph pretty
    in_line = []
    out_line = []
    for j in data:
        if abs(j[1] - ((best_m * j[0]) + best_b)) <= THRESHOLD:
            in_line.append(j)
        else:
            out_line.append(j)

    # returns two lists, inliers and outliers
    return in_line, out_line

# performs the linear regression as described on the assignment sheet
def lin_reg(data):
    n = float(len(data))
    x_sum = 0.0
    y_sum = 0.0

    # averages the x and y values
    for i in data:
        x_sum = x_sum + i[0]
        y_sum = y_sum + i[1]
    x_average = x_sum / n
    y_average = y_sum / n

    # initializes slope numerator and denominator
    # note denominator should not be zero with data
    m_numerator = 0.0
    m_denominator = 0.0

    # calculates the slope
    for i in data:
        m_numerator = m_numerator + ((i[0] - x_average)*(i[1] - y_average))
        m_denominator = m_denominator + ((i[0] - x_average)*(i[0] - x_average))
    m = m_numerator / m_denominator

    # finds the intercept
    b = y_average - (m * x_average)

    # returns slope and intercept
    return m, b

def plot_best_fit(data):

    # Get our inlier and outlier points
    in_line, out_line = RANSAC(data)

    # find the best fit line for inliers
    m, b = lin_reg(in_line)
    
    # This was the hardest part
    # Could not find a function that would make a non line segment so I just covered our domain
    # Admittedly with potential error on giant domains
    x_min = 100000.0
    x_max = -100000.0
    for i in data:
        if i[0] > x_max:
            x_max = i[0]
        if i[0] < x_min:
            x_min = i[0]
    domain = [x_min, x_max]
    line_points = [m * i + b for i in domain]
    line_points_top= [m * i + 0.5 * b for i in domain]
    line_points_bottom = [m * i + 1.2 * b for i in domain]
    
    # Plot the inliers as blue dots
    in_line = np.array(in_line)
    x, y = in_line.T
    plt.scatter(x, y)

    # plot the outliers as red x's
    # if statement for if outliers is empty, which it is for the easy case
    if out_line != []:
        out_line = np.array(out_line)
        x, y = out_line.T
        plt.scatter(x, y, s=30, c='r', marker='x')

    # plot our best fit line
    plt.plot(domain, line_points, '-')    
    plt.plot(domain, line_points_bottom, '-')    
    plt.plot(domain, line_points_top, '-')    
    plt.gca().invert_yaxis()
    # show the plot
    plt.title("Road-Line-Estimation")
    plt.xlabel('1/X')
    plt.ylabel('Laser')
    plt.show()
    
    # return slope and intercept for answers
    return m, b

# ----------------------------------------------------------------------------------------------------
#测试栗子
'''

data = []

with open('noisy_data_medium.txt') as file:
    # Creates 2D array to hold the data
    for l in file:
        data.append(l.split())

    # removes comma from first entry
    for i in data:
        i[0] = float(i[0][:-1])
        i[1] = float(i[1])

# function also returns slop and intercept should you want them
m, b = plot_best_fit(data)
print(m, b)
'''

 

你可能感兴趣的:(机器学习,RSANSAC)