A Summary of SLAM Methods

SLAM Overview

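  • A SLAM pipeline generally consists of two parts, tracking and mapping. Tracking, also called the front end, estimates the camera pose. Mapping, the back end, builds depth: using the camera poses estimated by the tracking module, it computes the depth of the corresponding feature points by triangulation and reconstructs a map of the current environment. The reconstructed map in turn gives the front end a better basis for pose estimation and can also be used for tasks such as loop closure detection.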

  • Monocular SLAM can be roughly classified by the sparsity of the map it builds:
    • sparse (feature points), semi-dense, and dense methods

  • By matching method, systems divide into direct methods and feature-based methods.

           

  • By optimization strategy, systems divide into keyframe-based and filter-based methods.
 
  
  1. Strasdat H, Montiel J M M, Davison A J. Visual SLAM: why filter?[J]. Image and Vision Computing, 2012, 30(2): 65-77.  
  2. The paper's conclusion: "For all these scenarios, we conclude that keyframe bundle adjustment outperforms filtering, since it gives the most accuracy per unit of computing time."
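
The triangulation step mentioned in the overview can be sketched as follows. This is a minimal linear (DLT) triangulation of one point from two views, not the code of any system listed here; the intrinsics `K`, the two poses, and the test point are made-up values for illustration.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2: 3x4 projection matrices K [R | t] of the two camera poses.
    x1, x2: pixel coordinates (u, v) of the same feature in each image.
    Returns the 3D point in world coordinates.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector
    # belonging to the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: the second camera is shifted 0.1 units along +x.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 2.0])
X_est = triangulate_point(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free observations the recovered point matches the true one. In a monocular system the baseline between the two poses is only known up to scale, so the depth inherits that unknown scale.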

 

 

Typical monocular SLAM systems

 

  • EKF-SLAM, FastSLAM 1.0, FastSLAM 2.0 and UKF-SLAM: http://www-personal.acfr.usyd.edu.au/tbailey/software/slam_simulations.htm 

         https://github.com/yglee/FastSLAM

         ekf-slam-matlab

         EKF-SLAM TOOLBOX FOR MATLAB

 

  • SceneLib2: SLAM originally designed and implemented by Professor Andrew Davison at Imperial College London

     > MonoSLAM: Real-Time Single Camera SLAM (PDF format), Andrew J. Davison, Ian Reid, Nicholas Molton and Olivier Stasse, IEEE Trans. PAMI 2007.

 

  • PTAM: http://www.robots.ox.ac.uk/~gk/PTAM/ 

          https://github.com/Oxford-PTAM/PTAM-GPL 

          https://ewokrampage.wordpress.com/ 

          https://github.com/tum-vision/tum_ardrone 

          [Figure: PTAM class diagram]

         > Georg Klein and David Murray, "Parallel Tracking and Mapping for Small AR Workspaces", Proc. ISMAR 2007

 

  • DTSLAM: Deferred Triangulation for Robust SLAM
    > Herrera C., D., Kim, K., Kannala, J., Pulli, K., Heikkila, J., DT-SLAM: Deferred Triangulation for Robust SLAM, 3DV, 2014.

 

  • LSD-SLAM: http://vision.in.tum.de/research/vslam/lsdslam  

          A novel, direct monocular SLAM technique: Instead of using keypoints, it directly operates on image intensities both for tracking and mapping. The camera is tracked using direct image alignment, while geometry is estimated in the form of semi-dense depth maps, obtained by filtering over many pixelwise stereo comparisons. We then build a Sim(3) pose-graph of keyframes, which allows to build scale-drift corrected, large-scale maps including loop-closures. LSD-SLAM runs in real-time on a CPU, and even on a modern smartphone.

          > LSD-SLAM: Large-Scale Direct Monocular SLAM (J. Engel, T. Schöps, D. Cremers), In European Conference on Computer Vision (ECCV), 2014.
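
To make "operating directly on image intensities" concrete, here is a rough sketch of the photometric residual that direct methods minimize over the camera pose. It is not LSD-SLAM's implementation: the function name, the nearest-neighbour lookup, and the synthetic ramp image are simplifications chosen for illustration.

```python
import numpy as np

def photometric_residuals(img_ref, img_cur, depth_ref, K, R, t, pixels):
    """Intensity differences for reference pixels warped into the current image.

    img_ref, img_cur: 2D float arrays (grayscale images).
    depth_ref: depth of every reference pixel, same shape as img_ref.
    R, t: pose mapping reference-camera coordinates to current-camera coordinates.
    pixels: iterable of (u, v) integer pixel coordinates in the reference image.
    """
    K_inv = np.linalg.inv(K)
    h, w = img_cur.shape
    residuals = []
    for u, v in pixels:
        # Back-project with the known depth, move to the current frame, re-project.
        p_ref = depth_ref[v, u] * (K_inv @ np.array([u, v, 1.0]))
        p_cur = R @ p_ref + t
        if p_cur[2] <= 0:
            continue  # point is behind the current camera
        uv = K @ (p_cur / p_cur[2])
        u2, v2 = int(round(uv[0])), int(round(uv[1]))  # nearest neighbour, for brevity
        if 0 <= u2 < w and 0 <= v2 < h:
            residuals.append(img_cur[v2, u2] - img_ref[v, u])
    return np.array(residuals)

# Synthetic check: a horizontal intensity ramp seen at constant depth 2.0.
K = np.array([[100.0, 0.0, 16.0], [0.0, 100.0, 12.0], [0.0, 0.0, 1.0]])
img = np.tile(np.arange(32, dtype=float), (24, 1))
depth = np.full(img.shape, 2.0)
pixels = [(u, v) for v in range(4, 20, 4) for u in range(4, 28, 4)]
r_same = photometric_residuals(img, img, depth, K, np.eye(3), np.zeros(3), pixels)
r_moved = photometric_residuals(img, img, depth, K, np.eye(3), np.array([0.04, 0.0, 0.0]), pixels)
```

With the identity pose every residual is zero. The 0.04 translation shifts each pixel by f·tx/Z = 2 columns, so every residual becomes 2 on this ramp. A direct tracker searches for the pose that drives such residuals toward zero, typically with sub-pixel interpolation and a robust cost.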

 

  • SVO: Fast Semi-Direct Monocular Visual Odometry (ICRA 2014)

          [Figure: SVO class diagram]

          > Paper: http://rpg.ifi.uzh.ch/docs/ICRA14_Forster.pdf

 

  • ORB-SLAM2: [Figure: ORB-SLAM workflow]

          http://webdiis.unizar.es/~raulmur/orbslam/ 

          Paper translation (in Chinese): http://qiqitek.com/blog/?p=13 

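        ORB-SLAM is a visual SLAM system written by Raul Mur-Artal of the University of Zaragoza, Spain. His paper "ORB-SLAM: a versatile and accurate monocular SLAM system" was published in IEEE Transactions on Robotics in 2015. The open-source code comprises the earlier ORB-SLAM and the later ORB-SLAM2. The first version is mainly for monocular SLAM, while the second supports monocular, stereo, and RGB-D input.

        ORB-SLAM is a complete SLAM system covering visual odometry, tracking, and loop closure detection. It is a monocular SLAM system based entirely on sparse feature points, and its core idea is to use ORB (Oriented FAST and Rotated BRIEF) as the single feature type throughout the pipeline. This shows in two ways, listed first below, followed by two practical characteristics of the implementation:

  • Features are extracted and tracked with ORB. ORB extraction is very fast, which suits systems with strict real-time requirements.
  • Loop closure detection uses a bag-of-words model whose vocabulary is a large ORB dictionary.
  • The interfaces are rich: monocular, stereo, and RGB-D sensor input are all supported, and ROS is optional at build time, which makes the system easy to deploy. The cost is somewhat more complex code logic to support all these interfaces.
  • It runs in real time on a PC at about 30 ms per frame, but performs poorly on embedded platforms.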

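      It consists of three main threads: Tracking, Local Mapping (the "small map"), and Loop Closing (the "large map"). The tracking thread acts as a visual odometry front end and proceeds as follows:

  • First, extract ORB features from the raw image and compute their descriptors.
  • Match features between images based on the descriptors.
  • Estimate the camera motion from the matched feature points.
  • Decide whether the current frame is a keyframe according to the keyframe criterion.

Compared with most visual SLAM systems, which select keyframes by the amount of inter-frame motion, ORB-SLAM uses a more elaborate keyframe decision criterion.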
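
The descriptor-matching step of the tracking thread can be illustrated with a brute-force Hamming matcher for binary descriptors such as ORB's. This sketch is not ORB-SLAM's matcher, which also uses projection and vocabulary-guided search. The thresholds `max_distance` and `ratio` are illustrative assumptions, and the descriptors are random bytes rather than real ORB output.

```python
import numpy as np

def hamming_match(desc_a, desc_b, max_distance=50, ratio=0.8):
    """Brute-force matching of binary descriptors stored as uint8 rows.

    Returns (index_a, index_b, distance) for every descriptor in desc_a whose
    nearest neighbour in desc_b passes a distance threshold and a
    nearest/second-nearest ratio test.
    """
    # XOR every pair, then count set bits to get the Hamming distance matrix.
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    matches = []
    for i, row in enumerate(dist):
        order = np.argsort(row)
        best = order[0]
        if row[best] > max_distance:
            continue  # nearest neighbour is still too different
        if len(order) > 1 and row[best] > ratio * row[order[1]]:
            continue  # ambiguous: second-best is almost as close
        matches.append((i, int(best), int(row[best])))
    return matches

# Synthetic 256-bit descriptors: desc_b is a shuffled copy of desc_a with one bit flipped.
rng = np.random.default_rng(0)
desc_a = rng.integers(0, 256, size=(20, 32), dtype=np.uint8)
perm = rng.permutation(20)
desc_b = desc_a[perm].copy()
desc_b[:, 0] ^= 1
matches = hamming_match(desc_a, desc_b)
```

Each descriptor in `desc_a` is paired with its shuffled, slightly corrupted copy in `desc_b` at Hamming distance 1. The matched point pairs would then feed the motion estimation step.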

      > Raúl Mur-Artal, J. M. M. Montiel and Juan D. Tardós. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147-1163, October 2015.

      > Raúl Mur-Artal and Juan D. Tardós. Probabilistic Semi-Dense Mapping from Highly Accurate Feature-Based Monocular SLAM. Robotics: Science and Systems. Rome, Italy, July 2015.

 

Dense monocular SLAM systems

 

  • DTAM: https://github.com/anuranbaka/OpenDTAM 

          http://homes.cs.washington.edu/~newcombe/papers/newcombe_etal_iccv2011.pdf

 

  • REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time (ICRA 2014)

          http://rpg.ifi.uzh.ch/docs/ICRA14_Pizzoli.pdf 

 

  • DPPTAM: DPPTAM is a direct monocular odometry algorithm that estimates a dense reconstruction of a scene in real-time on a CPU. Highly textured image areas are mapped using standard direct mapping techniques, that minimize the photometric error across different views. We make the assumption that homogeneous-color regions belong to approximately planar areas. Related Publication:

          > Alejo Concha, Javier Civera. DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS15), Hamburg, Germany, 2015

 

Dense RGB-D SLAM systems

 

  • Elastic Fusion: Real-time dense visual SLAM system

          ElasticFusion: Dense SLAM Without A Pose Graph, T. Whelan, S. Leutenegger, R. F. Salas-Moreno, B. Glocker and A. J. Davison, RSS '15

 

  • Kintinuous: Real-time large scale dense visual SLAM system  
    • Real-time Large Scale Dense RGB-D SLAM with Volumetric Fusion, T. Whelan, M. Kaess, H. Johannsson, M.F. Fallon, J. J. Leonard and J.B. McDonald, IJRR '14 
    • Kintinuous: Spatially Extended KinectFusion, T. Whelan, M. Kaess, M.F. Fallon, H. Johannsson, J. J. Leonard and J.B. McDonald, RSS RGB-D Workshop '12

 

  • RGBDSLAMv2:  a state-of-the-art SLAM system for RGB-D cameras, e.g., the Microsoft Kinect or the Asus Xtion Pro Live. You can use it to create 3D point clouds or OctoMaps.

          > "3D Mapping with an RGB-D Camera", F. Endres, J. Hess, J. Sturm, D. Cremers, W. Burgard, IEEE Transactions on Robotics, 2014.

 

  • RTAB-Map: Real-Time Appearance-Based Mapping

          The loop closure detector uses a bag-of-words approach to determine how likely it is that a new image comes from a previous location or a new one. When a loop closure hypothesis is accepted, a new constraint is added to the map's graph, then a graph optimizer minimizes the errors in the map. A memory management approach is used to limit the number of locations used for loop closure detection and graph optimization, so that real-time constraints on large-scale environments are always respected. RTAB-Map can be used alone with a hand-held Kinect or stereo camera for 6DoF RGB-D mapping, or on a robot equipped with a laser rangefinder for 3DoF mapping.

          > M. Labbé and F. Michaud, “Online Global Loop Closure Detection for Large-Scale Multi-Session Graph-Based SLAM,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014.

 

  • DVO: Dense Visual Odometry and SLAM

          > Dense Visual SLAM for RGB-D Cameras (C. Kerl, J. Sturm, D. Cremers), In Proc. of the Int. Conf. on Intelligent Robot Systems (IROS), 2013.

          > Robust Odometry Estimation for RGB-D Cameras (C. Kerl, J. Sturm, D. Cremers), In Int. Conf. on Robotics and Automation, 2013.

 

Visual-Inertial SLAM systems

  • ROVIO: Robust Visual Inertial Odometry 

          Paper: http://dx.doi.org/10.3929/ethz-a-010566547

 

  • OKVIS: Open Keyframe-based Visual Inertial SLAM

          Stefan Leutenegger, Simon Lynen, Michael Bosse, Roland Siegwart and Paul Timothy Furgale. Keyframe-based visual–inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 2015.


Recent monocular SLAM systems

  • REBVO: Realtime Edge Based Visual Odometry for a Monocular Camera

        REBVO tracks a camera in Realtime using edges. The system is split in 2 components. An on-board part (rebvo itself) doing all the processing and sending data over UDP and an OpenGL visualizer.

        > Tarrio, J. J., & Pedre, S. (2015). Realtime Edge-Based Visual Odometry for a Monocular Camera. In Proceedings of the IEEE International Conference on Computer Vision (pp. 702-710).

 

  • Direct Sparse Odometry:  http://vision.in.tum.de/research/vslam/dso

           https://www.youtube.com/watch?v=C6-xwSOOdqQ 

          A novel direct and sparse formulation for Visual Odometry. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry - represented as inverse depth in a reference frame - and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. DSO does not depend on keypoint detectors or descriptors, thus it can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations on mostly white walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.

         > Direct Sparse Odometry (J. Engel, V. Koltun, D. Cremers), In arXiv:1607.02565, 2016.

         > A Photometrically Calibrated Benchmark For Monocular Visual Odometry (J. Engel, V. Usenko, D. Cremers), In arXiv:1607.02555, 2016.

 

  • SVO 2.0

        > C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza. SVO 2.0: Semi-direct visual odometry for monocular and multi-camera systems. IEEE Transactions on Robotics, accepted, January 2016.

        > C. Forster, M. Pizzoli, and D. Scaramuzza. SVO: Fast Semi-Direct Monocular Visual Odometry. In IEEE Intl. Conf. on Robotics and Automation (ICRA), 2014. doi:10.1109/ICRA.2014.6906584.

 

Typical stereo SLAM systems

 

  • LIBVISO2: http://www.cvlibs.net/software/libviso/

          LIBVISO2 (Library for Visual Odometry 2) is a very fast cross-platform (Linux, Windows) C++ library with MATLAB wrappers for computing the 6 DOF motion of a moving mono/stereo camera. The stereo version is based on minimizing the reprojection error of sparse feature matches and is rather general (no motion model or setup restrictions except that the input images must be rectified and calibration parameters are known). The monocular version is still very experimental and uses the 8-point algorithm for fundamental matrix estimation. It further assumes that the camera is moving at a known and fixed height over ground (for estimating the scale). Due to the 8 correspondences needed for the 8-point algorithm, many more RANSAC samples need to be drawn, which makes the monocular algorithm slower than the stereo algorithm, for which 3 correspondences are sufficient to estimate parameters. 

         > Geiger A, Ziegler J, Stiller C. Stereoscan: Dense 3d reconstruction in real-time[C]//Intelligent Vehicles Symposium (IV), 2011 IEEE. IEEE, 2011: 963-968.

         > Kitt B, Geiger A, Lategahn H. Visual odometry based on stereo image sequences with RANSAC-based outlier rejection scheme[C]//Intelligent Vehicles Symposium. 2010: 486-492.
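
The remark that 8 correspondences require far more RANSAC samples than 3 follows from the standard RANSAC iteration count, N = log(1 - p) / log(1 - w^s), where s is the sample size, w the inlier ratio, and p the desired confidence. A small sketch, with w = 0.5 and p = 0.99 picked purely as an example:

```python
import math

def ransac_iterations(sample_size, inlier_ratio, confidence=0.99):
    """Samples needed so that, with probability `confidence`,
    at least one drawn sample consists of inliers only."""
    p_all_inliers = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))

# Compare a 3-point minimal solver with the 8-point algorithm.
for s in (3, 8):
    print(s, ransac_iterations(s, inlier_ratio=0.5))
```

Under these assumed values the 3-point case needs 35 samples and the 8-point case 1177, which is why the sample size of the minimal solver dominates the running time.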

 

  • ORB-SLAM2: https://github.com/raulmur/ORB_SLAM2

          ORB-SLAM2 is a real-time SLAM library for Monocular, Stereo and RGB-D cameras that computes the camera trajectory and a sparse 3D reconstruction (in the stereo and RGB-D case with true scale). It is able to detect loops and relocalize the camera in real time. We provide examples to run the SLAM system in the KITTI dataset as stereo or monocular, and in the TUM dataset as RGB-D or monocular.

 

  • S-PTAM: Stereo Parallel Tracking and Mapping: https://github.com/lrse/sptam

          S-PTAM is a Stereo SLAM system able to compute the camera trajectory in real-time. It heavily exploits the parallel nature of the SLAM problem, separating the time-constrained pose estimation from less pressing matters such as map building and refinement tasks. On the other hand, the stereo setting allows to reconstruct a metric 3D map for each frame of stereo images, improving the accuracy of the mapping process with respect to monocular SLAM and avoiding the well-known bootstrapping problem. Also, the real scale of the environment is an essential feature for robots which have to interact with their surrounding workspace.

          > Taihú Pire, Thomas Fischer, Javier Civera, Pablo De Cristóforis and Julio Jacobo Berlles. Stereo Parallel Tracking and Mapping for Robot Localization Proc. of The International Conference on Intelligent Robots and Systems (IROS) (Accepted), Hamburg, Germany, 2015.
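
The reason a stereo rig yields a metric map from a single frame pair is the rectified-stereo relation Z = f · B / d, where f is the focal length in pixels, B the baseline, and d the disparity. A minimal sketch with made-up numbers (not S-PTAM code):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Metric depth of a point observed in a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 0.5 m baseline, 10 px disparity.
depth_m = depth_from_disparity(disparity_px=10.0, focal_px=700.0, baseline_m=0.5)
```

Here the point lies 35 m away. Because B is known in meters, the depth comes out in meters, whereas a monocular system has no such fixed baseline and recovers structure only up to scale.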

 

  • ORBSLAM_DWO: https://github.com/JzHuai0108/ORB_SLAM

          ORBSLAM_DWO is developed on top of ORB-SLAM with double window optimization by Jianzhu Huai. The major differences from ORB-SLAM are: (1) it can run with or without ROS, (2) it does not use the modified version of g2o shipped in ORB-SLAM, instead it uses the g2o from github, (3) it uses Eigen vectors and Sophus members instead of OpenCV Mat to represent pose entities, (4) it incorporates the pinhole camera model from rpg_vikit and a decay velocity motion model from Stereo PTAM, (5) currently, it supports monocular, stereo, and stereo + inertial input for SLAM; note that it does not work with monocular + inertial input.

 

  •  Faster than real time visual odometry: https://github.com/halismai/bpvo

         A library for (semi-dense) real-time visual odometry from stereo data using direct alignment of feature descriptors. Two descriptors are implemented. The first is raw intensity (no descriptor), which runs in real time or faster. The second is an implementation of the Bit-Planes descriptor, designed for robust performance under challenging illumination conditions, as described in the references linked from the repository README.

 

  • PL-StVO: Stereo Visual Odometry by combining point and line segment features 

        > Gómez-Ojeda R, González-Jiménez J. Robust Stereo Visual Odometry through a Probabilistic Combination of Points and Line Segments[J]. 2016.

 

  • ScaViSLAM

    This is a general and scalable framework for visual SLAM. It employs  "Double Window Optimization" (DWO) as described in our ICCV paper:

    > H. Strasdat, A.J. Davison, J.M.M. Montiel, and K. Konolige "Double Window Optimisation for Constant Time Visual SLAM" Proceedings of the IEEE International Conference on Computer Vision, 2011.

 

Loop closure detection

  • DLoopDetector: DLoopDetector is an open source C++ library to detect loops in a sequence of images collected by a mobile robot. It implements the algorithm presented in GalvezTRO12, based on a bag-of-words database created from image local descriptors, and temporal and geometrical constraints. The current implementation includes versions to work with SURF64 and BRIEF descriptors. DLoopDetector is based on the DBoW2 library, so that it can work with any other type of descriptor with little effort.

          > Bags of Binary Words for Fast Place Recognition in Image Sequences. D Gálvez-López, JD Tardos. IEEE Transactions on Robotics 28 (5), 1188-1197, 2012.

           > DBoW2: DBoW2 is an improved version of the DBow library, an open source C++ library for indexing and converting images into a bag-of-word representation.
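
The bag-of-words comparison behind this kind of loop detection can be sketched as follows. The code does not use the DBoW2 API: it assumes each feature has already been quantized to a visual-word id, uses a made-up 6-word vocabulary with uniform idf weights, and scores image pairs with an L1 similarity on normalized tf-idf vectors.

```python
import numpy as np

def bow_vector(word_ids, idf):
    """L1-normalized tf-idf vector built from the visual-word id of each feature."""
    counts = np.bincount(word_ids, minlength=len(idf)).astype(float)
    tf = counts / max(counts.sum(), 1.0)
    v = tf * idf
    norm = np.abs(v).sum()
    return v / norm if norm > 0 else v

def l1_score(v1, v2):
    """Similarity in [0, 1]; 1 means the normalized vectors are identical."""
    return 1.0 - 0.5 * np.abs(v1 - v2).sum()

# Hypothetical vocabulary of 6 words with uniform idf weights.
idf = np.ones(6)
img_a = bow_vector(np.array([0, 0, 1, 2, 3]), idf)  # first visit
img_b = bow_vector(np.array([0, 1, 1, 2, 3]), idf)  # revisit, slightly different view
img_c = bow_vector(np.array([4, 5, 5, 4, 4]), idf)  # unrelated place
score_revisit = l1_score(img_a, img_b)
score_other = l1_score(img_a, img_c)
```

The revisit scores 0.8 and the unrelated place 0.0. A real detector would then apply the temporal and geometrical checks mentioned above before accepting a loop.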

 

  • FAB-MAP: FAB-MAP is a Simultaneous Localisation and Mapping algorithm which operates solely in appearance space. FAB-MAP performs location matching between places that have been visited within the world as well as providing a measure of the probability of being at a new, previously unvisited location. Camera images form the sole input to the system, from which OpenCV's feature extraction methods are used to develop bag-of-words representations for the Bayesian comparison technique.

 

Optimization libraries

  • g2o: g2o is an open-source C++ framework for optimizing graph-based nonlinear error functions. g2o has been designed to be easily extensible to a wide range of problems and a new problem typically can be specified in a few lines of code. The current implementation provides solutions to several variants of SLAM and BA.
  • Ceres Solver:

    Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.

    1. Non-linear Least Squares problems with bounds constraints.
    2. General unconstrained optimization problems.
  • GTSAM: GTSAM is a library of C++ classes that implement smoothing and mapping (SAM) in robotics and vision, using factor graphs and Bayes networks as the underlying computing paradigm rather than sparse matrices. On top of the C++ library, GTSAM includes a MATLAB interface (enable GTSAM_INSTALL_MATLAB_TOOLBOX in CMake to build it). A Python interface is under development.
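
All three libraries are used to solve least-squares problems of the kind that arise in SLAM back ends. As a toy illustration that uses none of their APIs, here is a 1-D pose graph with two odometry constraints and one loop closure. The problem is linear, so a single least-squares solve is enough, whereas real pose graphs are nonlinear and are solved iteratively. The measurements are invented for the example.

```python
import numpy as np

def solve_pose_graph_1d(num_poses, constraints):
    """Least-squares estimate of 1-D poses from relative measurements.

    constraints: list of (i, j, measured), each meaning x_j - x_i should equal `measured`.
    An extra row anchors pose 0 at the origin to remove the gauge freedom.
    """
    A = np.zeros((len(constraints) + 1, num_poses))
    b = np.zeros(len(constraints) + 1)
    for row, (i, j, measured) in enumerate(constraints):
        A[row, i] = -1.0
        A[row, j] = 1.0
        b[row] = measured
    A[-1, 0] = 1.0  # anchor: x_0 = 0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Odometry says each step is 1.0; a loop closure says pose 2 is only 1.8 from pose 0.
constraints = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.8)]
poses = solve_pose_graph_1d(3, constraints)
```

The solution spreads the 0.2 disagreement over all three constraints, giving poses of roughly 0, 0.933, and 1.867. This is the same effect a loop closure has on an accumulated trajectory in a full system.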

 

Visual Odometry / SLAM Evaluation

  • Websites that benchmark the accuracy and performance of the mainstream VO and SLAM systems
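
A common accuracy measure on such benchmarks is the absolute trajectory error (ATE): the estimated trajectory is rigidly aligned to the ground truth and the RMSE of the remaining position differences is reported. The sketch below assumes the two trajectories are already associated by timestamp and have the same scale; a monocular trajectory would additionally need a scale estimate. It is not the evaluation script of any particular benchmark.

```python
import numpy as np

def align_rigid(est, gt):
    """Rotation R and translation t minimizing sum ||R @ est_i + t - gt_i||^2 (no scale)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])  # guard against returning a reflection
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """RMSE of position errors after rigid alignment; est and gt are N x 3 arrays."""
    R, t = align_rigid(est, gt)
    err = (est @ R.T + t) - gt
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))

# Synthetic check: the estimate is the ground truth in a rotated, shifted frame.
rng = np.random.default_rng(1)
gt = rng.normal(size=(50, 3))
angle = 0.3
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
est = gt @ Rz.T + np.array([1.0, -2.0, 0.5])
error_aligned = ate_rmse(est, gt)
error_noisy = ate_rmse(est + rng.normal(scale=0.05, size=est.shape), gt)
```

The first error is zero up to rounding because the two trajectories differ only by a rigid transform. The second is nonzero and reflects the added noise.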

 

SLAM datasets

  • RGB-D SLAM Dataset and Benchmark: from TUM; data recorded with a Kinect
  • TUM monoVO dataset
  • KITTI Vision Benchmark Suite: data recorded on urban roads with a platform carrying 4 cameras, high-precision GPS, and a lidar
  • Karlsruhe dataset sequence (stereo): http://www.cvlibs.net/datasets/karlsruhe_sequences/ 
  • The EuRoC MAV Dataset: from ETH; a stereo dataset recorded by a quadrotor equipped with a VI-Sensor
  • MIT Stata Center Data Set: http://projects.csail.mit.edu/stata/index.php

 

SLAM survey references

[1] Cadena, Cesar, et al. "Simultaneous Localization And Mapping: Present, Future, and the Robust-Perception Age." arXiv preprint arXiv:1606.05830 (2016).  (the latest major SLAM survey paper, by Davide Scaramuzza and co-authors, with some 300 references)

[2] Strasdat H, Montiel J M M, Davison A J. Visual SLAM: why filter?[J]. Image and Vision Computing, 2012, 30(2): 65-77. 

[3] Visual Odometry Part I The First 30 Years and Fundamentals

[4] Visual odometry Part II Matching, robustness, optimization, and applications

[5] Davide Scaramuzza: Tutorial on Visual Odometry 

[6] Factor Graphs and GTSAM: A Hands-on Introduction

[7] Aulinas J, Petillot Y R, Salvi J, et al. The SLAM problem: a survey[C]//CCIA. 2008: 363-371.

[8] Grisetti G, Kummerle R, Stachniss C, et al. A tutorial on graph-based SLAM[J]. IEEE Intelligent Transportation Systems Magazine, 2010, 2(4): 31-43.

[9] Saeedi S, Trentini M, Seto M, et al. Multiple‐Robot Simultaneous Localization and Mapping: A Review[J]. Journal of Field Robotics, 2016, 33(1): 3-46.

[10] Lowry S, Sünderhauf N, Newman P, et al. Visual place recognition: A survey[J]. IEEE Transactions on Robotics, 2016, 32(1): 1-19.

[11] Georges Younes, Daniel Asmar, Elie Shammas. A survey on non-filter-based monocular Visual SLAM systems. Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO), 2016. (organizes, module by module, the methods used by the currently open-source monocular SLAM systems PTAM, SVO, DT SLAM, LSD SLAM, ORB SLAM, and DPPTAM)

 

Major survey papers from each important stage of SLAM development
