ICRA 2022 SLAM-Related Paper Collection

Contents

  • Visual SLAM
    • Lines, Planes, and 3D Features
    • Multi-Camera Systems
    • VIO+GPS
    • Others
    • Datasets
    • 3D Reconstruction
    • Semantic SLAM
  • Special Sensors
  • LiDAR

Visual SLAM

Lines, Planes, and 3D Features

EDPLVO: Efficient Direct Point-Line Visual Odometry

Paper by Zhou Bo; an efficient way of using line features; DSO-based.

This paper introduces an efficient direct visual odometry (VO) algorithm using points and lines. Pixels on lines are generally adopted in direct methods. However, the original photometric error is only defined for points. It seems difficult to extend it to lines. In previous works, the collinear constraints for points on lines are either ignored [1] or introduce heavy computational load into the resulting optimization system [2]. This paper extends the photometric error for lines. We prove that the 3D points of the points on a 2D line are determined by the inverse depths of the endpoints of the 2D line, and derive a closed-form solution for this problem. This property can significantly reduce the number of variables to speed up the optimization, and can make the collinear constraint exactly satisfied. Furthermore, we introduce a two-step method to further accelerate the optimization, and prove the convergence of this method. The experimental results show that our algorithm outperforms the state-of-the-art direct VO algorithms.
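
A note on the endpoint property claimed above (notation ours, not the paper's): write the homogeneous endpoints of the 2D line as ū₁, ū₂ with inverse depths ρ₁, ρ₂, so the 3D endpoints are X_k = ū_k/ρ_k. A point on the 3D line projects back onto the 2D line, and a short computation shows that its inverse depth is the same interpolation as its pixel position:

```latex
\[
X(s) = (1-s)\,X_1 + s\,X_2
\quad\xrightarrow{\text{projection}}\quad
u(t) = (1-t)\,\bar{u}_1 + t\,\bar{u}_2,
\qquad
\rho(t) = (1-t)\,\rho_1 + t\,\rho_2 .
\]
```

So inverse depth is affine along the image line and two scalars determine every point on it, which is the variable reduction the abstract refers to.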

UV-SLAM: Unconstrained Line-Based SLAM Using Vanishing Points for Structural Mapping

Optimization-based method; point-line constraints plus vanishing-point constraints; code is open source.

In feature-based simultaneous localization and mapping (SLAM), line features complement the sparsity of point features, making it possible to map the surrounding environment structure. Existing approaches utilizing line features have primarily employed a measurement model that uses line re-projection. However, the direction vectors used in the 3D line mapping process cannot be corrected because the line measurement model employs only the lines' normal vectors in Plücker coordinates. As a result, problems like degeneracy that occur during the 3D line mapping process cannot be solved. To tackle the problem, this paper presents UV-SLAM, an unconstrained line-based SLAM using vanishing points for structural mapping. This paper focuses on using structural regularities without any constraints, such as the Manhattan world assumption. For this, we use the vanishing points that can be obtained from the line features. The difference between the vanishing point observation calculated through line features in the image and the vanishing point estimation calculated through the direction vector is defined as a residual and added to the cost function of optimization-based SLAM. Furthermore, through Fisher information matrix rank analysis, we prove that vanishing point measurements guarantee a unique mapping solution. Finally, we demonstrate that the localization accuracy and mapping quality are improved compared to the state-of-the-art algorithms using public datasets.
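
A plausible form of the vanishing-point residual described above (notation ours; the paper defines the exact parameterization): a world-frame line direction d projects to a vanishing point through the rotation R_cw and intrinsics K, and the residual against the vanishing point v_obs detected from image lines enters the cost alongside the usual reprojection terms:

```latex
\[
\hat{v} = \pi\!\left(K\,R_{cw}\,d\right), \qquad
r_{\mathrm{vp}} = v_{\mathrm{obs}} - \hat{v}, \qquad
\pi\big([x,\,y,\,z]^{\top}\big) = [x/z,\;y/z]^{\top} .
\]
```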

MSC-VO: Exploiting Manhattan and Structural Constraints for Visual Odometry

Line representation under the Manhattan world assumption.

Visual odometry algorithms tend to degrade when facing low-textured scenes from e.g. human-made environments, where it is often difficult to find a sufficient number of point features. Alternative geometrical visual cues, such as lines, which can often be found within these scenarios, can become particularly useful. Moreover, these scenarios typically present structural regularities, such as parallelism or orthogonality, and hold the Manhattan World assumption. Under these premises, in this work, we introduce MSC-VO, an RGB-D-based visual odometry approach that combines both point and line features and leverages, if they exist, those structural regularities and the Manhattan axes of the scene. Within our approach, these structural constraints are initially used to accurately estimate the 3D position of the extracted lines. These constraints are then combined with the estimated Manhattan axes and the reprojection errors of points and lines to refine the camera pose by means of local map optimization. Such a combination enables our approach to operate even in the absence of the aforementioned constraints, allowing the method to work in a wider variety of scenarios. Furthermore, we propose a novel multi-view Manhattan axes estimation procedure that mainly relies on line features. MSC-VO is assessed using several public datasets, outperforming other state-of-the-art solutions, and comparing favourably even with some SLAM methods.

Integrating Point and Line Features for Visual-Inertial Initialization

Point and line features used for initialization.

Accurate and robust initialization is crucial in visual-inertial systems, as it significantly affects localization accuracy. Most of the existing feature-based initialization methods rely on point features to estimate initial parameters. However, the performance of these methods often decreases in real scenes, as point features are unstable and may be discontinuously observed, especially in low-textured environments. By contrast, line features, which provide richer geometrical information than points, are also very common in man-made buildings. Therefore, in this paper, we propose a novel visual-inertial initialization method integrating both point and line features. Specifically, a closed-form method for line features is presented for initialization, which is combined with the point-based method to build an integrated linear system. Parameters including the initial velocity, gravity, point depths, and line endpoint depths can be jointly solved. Furthermore, to refine these parameters, a global optimization method is proposed, which consists of two novel nonlinear least-squares problems for points and lines respectively. Both the gravity magnitude and gyroscope bias are considered in the refinement. Extensive experimental results on both simulated and public datasets show that integrating point and line features in the initialization stage achieves higher accuracy and better robustness compared with pure point-based methods.

Unified Representation of Geometric Primitives for Graph-SLAM Optimization Using Decomposed Quadrics

A unified representation for 3D primitives: a single parameterization covers points, lines, planes, and 3D shapes, which is convenient for optimization.

In Simultaneous Localization And Mapping (SLAM) problems, high-level landmarks have the potential to build compact and informative maps compared to traditional point-based landmarks. In this work, we focus on the parameterization of frequently used geometric primitives including points, lines, planes, ellipsoids, cylinders, and cones. We first present a unified representation based on quadrics, leading to a consistent and concise formulation. Then we further study a decomposed model of quadrics that discloses the symmetric and degenerated properties of a primitive. Based on the decomposition, we develop geometrically meaningful quadric factors in the settings of a graph-SLAM problem. Then in simulation experiments, it is shown that the decomposed formulation has better efficiency and robustness to observation noise than baseline parameterizations. Finally, in real-world experiments, the proposed back-end framework is demonstrated to be capable of building compact and regularized maps.
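
For orientation, a quadric is the zero set of a symmetric 4×4 form in homogeneous coordinates (standard definition; the paper's decomposition of Q is what exposes each primitive's symmetry and degeneracy):

```latex
\[
\mathcal{Q} = \left\{\, \tilde{x} \in \mathbb{P}^3 \;:\; \tilde{x}^{\top} Q\,\tilde{x} = 0 \,\right\},
\qquad Q = Q^{\top} \in \mathbb{R}^{4\times 4},
\]
```

with the rank and eigenvalue signs of Q selecting the primitive family (ellipsoid, cylinder, cone, and the degenerate point/line/plane cases).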

Multi-Camera Systems

Globally Optimal Relative Pose Estimation for Multi-Camera Systems with Known Gravity Direction

Pose estimation for multi-camera systems.

Multiple-camera systems have been widely used in self-driving cars, robots, and smartphones. In addition, they are typically also equipped with IMUs (inertial measurement units). Using the gravity direction extracted from the IMU data, the y-axis of the body frame of the multi-camera system can be aligned with this common direction, reducing the original three-degree-of-freedom (DOF) relative rotation to a single-DOF one. This paper presents a novel globally optimal solver to compute the relative pose of a generalized camera. Existing optimal solvers based on the LM (Levenberg-Marquardt) method or SDP (semidefinite programming) are either iterative or have high computational complexity. Our proposed optimal solver is based on minimizing the algebraic residual objective function. According to our derivation, using the least-squares algorithm, the original optimization problem can be converted into a system of two polynomials with only two variables. The proposed solvers have been tested on synthetic data and the KITTI benchmark. The experimental results show that the proposed methods have competitive robustness and accuracy compared with the existing state-of-the-art solvers.
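
Concretely, once both frames' y-axes are aligned with the measured gravity (notation ours), the unknown relative rotation reduces to a single yaw angle about y, so the search space shrinks from 6 to 4 DOF:

```latex
\[
R(\theta) =
\begin{bmatrix}
\cos\theta & 0 & \sin\theta\\
0 & 1 & 0\\
-\sin\theta & 0 & \cos\theta
\end{bmatrix},
\qquad
(R,\,t)\in SO(3)\times\mathbb{R}^3
\;\longrightarrow\;
(\theta,\,t)\in\mathbb{R}\times\mathbb{R}^3 .
\]
```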

Information-Theoretic Online Multi-Camera Extrinsic Calibration

Online extrinsic calibration for multi-camera systems; the keyframe selection method is quite distinctive.

Calibration of multi-camera systems is essential for lifelong use of vision-based headsets and autonomous robots. In this work, we present an information-based framework for online extrinsic calibration of multi-camera systems. While previous work largely focuses on monocular, stereo, or strictly non-overlapping field-of-view (FoV) setups, we allow arbitrary configurations while also exploiting overlapping pairwise FoV when possible. In order to efficiently solve for the extrinsic calibration parameters, which increase linearly with the number of cameras, we propose a novel entropy-based keyframe measure and bound the backend optimization complexity by selecting informative motion segments that minimize the maximum entropy across all extrinsic parameter partitions. We validate the pipeline on three distinct platforms to demonstrate the generality of the method for resolving the extrinsics and performing downstream tasks. Our code is available at https://github.com/edexheim/info_ext_calib.
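
A minimal sketch of the max-entropy segment selection, assuming per-partition marginal covariances are already available (function names and data layout hypothetical, not the paper's API):

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance `cov`.

    Assumes `cov` is positive definite, so the slogdet sign is ignored.
    """
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

def pick_segment(candidates):
    """candidates: list of motion segments, each a list of 6x6 covariance
    blocks, one per extrinsic-parameter partition (hypothetical layout).
    Select the segment that minimizes the worst-case (max) entropy."""
    worst = [max(gaussian_entropy(c) for c in blocks) for blocks in candidates]
    return int(np.argmin(worst))
```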

Balancing the Budget: Feature Selection and Tracking for Multi-Camera Visual-Inertial Odometry

Front-end algorithm for multi-camera visual-inertial odometry.

We present a multi-camera visual-inertial odometry system based on factor graph optimization which estimates motion by using all cameras simultaneously while retaining a fixed overall feature budget. We focus on motion tracking in challenging environments, such as narrow corridors, dark spaces with aggressive motions, and abrupt lighting changes. These scenarios cause traditional monocular or stereo odometry to fail. While tracking motion with extra cameras should theoretically prevent failures, it leads to additional complexity and computational burden. To overcome these challenges, we introduce two novel methods to improve multi-camera feature tracking. First, instead of tracking features separately in each camera, we track features continuously as they move from one camera to another. This increases accuracy and achieves a more compact factor graph representation. Second, we select a fixed budget of tracked features across the cameras to reduce back-end optimization time. Our proposed method was extensively tested using a hardware-synchronized device consisting of an IMU and four cameras (a front stereo pair and two lateral) in scenarios including: an underground mine, large open spaces, and building interiors with narrow stairs and corridors. Compared to stereo-only state-of-the-art visual-inertial odometry methods, our approach reduces the drift rate, relative pose error, by up to 80% in translation and 39% in rotation.
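
A minimal sketch of the fixed-budget selection idea (data layout and function names hypothetical; the paper's selection criterion is more sophisticated): rank tracks per camera and round-robin across cameras so the budget is shared and no camera is starved.

```python
from collections import defaultdict

def select_feature_budget(tracks, budget):
    """Greedy cross-camera selection under a fixed overall feature budget.

    tracks: list of (track_id, camera_id, score) tuples, where `score` is
    any quality metric (e.g. track length); fields hypothetical.
    """
    per_cam = defaultdict(list)
    for tid, cam, score in tracks:
        per_cam[cam].append((score, tid))
    for cam in per_cam:
        per_cam[cam].sort(reverse=True)   # best tracks first
    selected, cams = [], sorted(per_cam)
    while len(selected) < budget and any(per_cam[c] for c in cams):
        for cam in cams:
            if per_cam[cam] and len(selected) < budget:
                selected.append(per_cam[cam].pop(0)[1])
    return selected
```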

VIO+GPS

Tightly-Coupled GNSS-Aided Visual-Inertial Localization

VIO tightly coupled with GPS (paper from Prof. Huang's lab).

A navigation system which can output drift-free global trajectory estimation with local consistency holds great potential for autonomous vehicles and mobile devices. We propose a tightly-coupled GNSS-aided visual-inertial navigation system (GAINS) which is able to leverage the complementary sensing modalities of a visual-inertial sensing pair, which provides high-frequency local information, and a Global Navigation Satellite System (GNSS) receiver with low-frequency global observations. Specifically, the raw GNSS measurements (including pseudorange, carrier phase changes, and Doppler frequency shift) are carefully leveraged and tightly fused within a visual-inertial framework. The proposed GAINS can accurately model the raw measurement uncertainties by canceling the atmospheric effects (e.g., ionospheric and tropospheric delays), which requires no prior model information. A robust state initialization procedure is presented to facilitate the fusion of global GNSS information with local visual-inertial odometry, and the spatiotemporal calibration between the IMU and GNSS is also optimized in the estimator. The proposed GAINS is evaluated on extensive Monte-Carlo simulations on a trajectory generated from a large-scale urban driving dataset with specific verification for each component (i.e., online calibration and system initialization). GAINS also demonstrates competitive performance against existing state-of-the-art methods on a publicly available dataset with ground truth.

GVINS: Tightly Coupled GNSS-Visual-Inertial Fusion for Smooth and Consistent State Estimation

Fuses GPS; HKUST; VINS-based.

Visual-inertial odometry (VIO) is known to suffer from drifting, especially over long-term runs. In this paper, we present GVINS, a non-linear optimization based system that tightly fuses global navigation satellite system (GNSS) raw measurements with visual and inertial information for real-time and drift-free state estimation. Our system aims to provide accurate global 6-DoF estimation under complex indoor-outdoor environments where GNSS signals may be intermittent or even inaccessible. To establish the connection between global measurements and local states, a coarse-to-fine initialization procedure is proposed to efficiently calibrate the transformation online and initialize GNSS states from only a short window of measurements. The GNSS code pseudorange and Doppler shift measurements, along with visual and inertial information, are then modelled and used to constrain the system states in a factor graph framework. For complex and GNSS-unfriendly areas, the degenerate cases are discussed and carefully handled to ensure robustness. Thanks to the tightly coupled multi-sensor approach and system design, our system fully exploits the merits of three types of sensors and is able to seamlessly cope with the transition between indoor and outdoor environments, where satellites are lost and reacquired.
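
For context, the textbook code-pseudorange model that such tightly-coupled systems fuse (symbols ours): receiver position p_r, satellite position p^s, clock offsets δt_r and δt^s, and tropospheric/ionospheric delays T and I:

```latex
\[
P_r^s = \lVert p^s - p_r \rVert + c\,(\delta t_r - \delta t^s) + T_r^s + I_r^s + \varepsilon_P .
\]
```

The Doppler measurement analogously constrains the receiver velocity along the line of sight; together these are what anchor the local VIO frame to the global frame.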

Others

Map-Based Visual-Inertial Localization: A Numerical Study

VIO localization against a feature-point map; compares EKF and SKF; MSCKF-based (paper from Prof. Huang's lab).

We revisit the problem of efficiently leveraging prior map information within a visual-inertial estimation framework. The use of traditional landmark-based maps with 2D-to-3D measurements, along with the recently introduced keyframe-based maps with 2D-to-2D measurements, is investigated. The full joint estimation of the prior map is compared within a visual-inertial simulator to the Schmidt-Kalman filter (SKF) and measurement inflation methods in terms of their computational complexity, consistency, accuracy, and memory usage. This study shows that the SKF can enable efficient and consistent estimation for small workspace scenarios, and that the use of 2D-to-3D landmark maps has the highest level of accuracy. Keyframe-based 2D-to-2D maps can reduce the required state size while still enabling accuracy gains. Finally, we show that measurement inflation methods, after tuning, can be accurate and efficient for large-scale environments if the guarantee of consistency is relaxed.

Periodic SLAM: Using Cyclic Constraints to Improve the Performance of Visual-Inertial SLAM on Legged Robots

Camera images on legged robots shake heavily; this paper proposes a VIO algorithm suited to aggressive camera motion, exploiting periodic, predictable patterns of the motion to improve localization accuracy.

Methods for state estimation that rely on visual information are challenging on legged robots due to rapid changes in the viewing angle of onboard cameras. In this work, we show that by leveraging structure in the way that the robot locomotes, the accuracy of visual-inertial SLAM in these challenging scenarios can be increased. We present a method that takes advantage of the underlying periodic predictability often present in the motion of legged robots to improve the performance of the feature tracking module within a visual-inertial SLAM system. Our method performs multi-session SLAM on a single robot, where each session is responsible for mapping during a distinct portion of the robot’s gait cycle. Our method produces lower absolute trajectory error than several state-of-the-art methods for visual-inertial SLAM in both a simulated environment and on data collected on a quadrupedal robot executing dynamic gaits. On real-world bounding gaits, our median trajectory error was less than 35% of the error of the next best estimate provided by state-of-the-art methods.
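
A toy illustration of the session-routing idea, assuming a known, constant gait period (the paper estimates the cycle from the robot's locomotion; names hypothetical):

```python
def session_for_frame(t, gait_period, n_sessions):
    """Assign a frame at time `t` to a gait-phase session (toy model).

    Each session maps only during its slice of the gait cycle, so the
    viewpoint within a session is approximately repeatable.
    """
    phase = (t % gait_period) / gait_period   # normalized phase in [0, 1)
    return int(phase * n_sessions)
```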

1D-LRF Aided Visual-Inertial Odometry for High-Altitude MAV Flight

Uses a 1D ToF range finder to aid VIO localization; worth a read.

This paper addresses the problem of visual-inertial odometry (VIO) with a downward-facing monocular camera on a micro aerial vehicle (MAV) flying at high altitude (over 100 meters). It is important to note that large scene depth makes visual motion constraints significantly less informative than in the near-sighted scenarios considered in most existing VIO methods. To cope with this challenge, we develop an efficient MSCKF-based VIO algorithm aided by a single 1D laser range finder (LRF), termed LRF-VIO, which runs in real time on an embedded system. The key idea of the proposed LRF-VIO is to fully exploit the limited metric distance information provided by the 1D LRF to disambiguate the scale during visual feature tracking, thus improving the VIO performance at high altitude. Specifically, during the MSCKF visual measurement update, we deliberately constrain the depth of those SLAM features co-planar with the single LRF measuring point. Additionally, delayed initialization of features utilizes the LRF measurements whenever possible, and online extrinsic calibration between the LRF and monocular camera is performed to further improve estimation accuracy and robustness. The proposed LRF-VIO is extensively validated in both indoor and outdoor real-world experiments, outperforming the state-of-the-art methods.

Accurate and Robust Object-Oriented SLAM with 3D Quadric Landmarks in Outdoors

Object-oriented SLAM is a popular technology in autonomous driving and robotics. In this paper, we propose a stereo visual SLAM with a robust quadric landmark representation method. The system consists of four components, including deep learning detection, quadric landmark initialization, object data association and object pose optimization. State-of-the-art quadric-based SLAM algorithms always face observation-related problems and are sensitive to observation noise, which limits their application in outdoor scenes. To solve this problem, we propose a quadric initialization method based on separating the quadric parameters, which improves the robustness to observation noise. The sufficient object data association algorithm and object-oriented optimization with multiple cues enable a highly accurate object pose estimation that is robust to local observations. Experimental results show that the proposed system is more robust to observation noise and significantly outperforms current state-of-the-art methods in outdoor environments. In addition, the proposed system demonstrates real-time performance.

FEJ2: A Consistent Visual-Inertial State Estimator Design

An improvement on FEJ, with better consistency than FEJ (paper from Prof. Huang's lab).

Continuous-Time Spline Visual-Inertial Odometry

Continuous-time VIO.

We propose a continuous-time spline-based formulation for visual-inertial odometry (VIO). Specifically, we model the poses as a cubic spline, whose temporal derivatives are used to synthesize linear acceleration and angular velocity, which are compared to the measurements from the inertial measurement unit (IMU) for optimal state estimation. The spline boundary conditions create constraints between the camera and the IMU, with which we formulate VIO as a constrained nonlinear optimization problem. Continuous-time pose representation makes it possible to address many VIO challenges, e.g., rolling shutter distortion and sensors that may lack synchronization. We conduct experiments on two publicly available datasets that demonstrate the state-of-the-art accuracy and real-time computational efficiency of our method.
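
A minimal numerical sketch of the spline idea (toy data; SciPy assumed; the paper additionally models rotation on SO(3), omitted here): fit a cubic spline to position knots and differentiate it to synthesize the quantity an accelerometer would measure.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Toy position knots along a smooth trajectory.
t_knots = np.linspace(0.0, 1.0, 11)
p_knots = np.stack([np.sin(t_knots), np.cos(t_knots), t_knots], axis=1)
spline = CubicSpline(t_knots, p_knots)   # cubic position spline p(t)

g = np.array([0.0, 0.0, -9.81])          # gravity in the world frame
t = 0.37
accel_world = spline(t, 2)               # second temporal derivative of p(t)
# World-frame specific-force prediction to compare against the IMU reading
# (the body-frame version additionally rotates by R_wb(t)^T).
specific_force_pred = accel_world - g
```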

Continuous-Time vs. Discrete-Time Vision-Based SLAM: A Comparative Study

Same topic as above, with a clearer presentation of the method.

Robotic practitioners generally approach the vision-based SLAM problem through discrete-time formulations. This has the advantage of a consolidated theory and very good understanding of success and failure cases. However, discrete-time SLAM needs tailored algorithms and simplifying assumptions when high-rate and/or asynchronous measurements, coming from different sensors, are present in the estimation process. Conversely, continuous-time SLAM, often overlooked by practitioners, does not suffer from these limitations. Indeed, it allows integrating new sensor data asynchronously without adding a new optimization variable for each new measurement. In this way, the integration of asynchronous or continuous high-rate streams of sensor data does not require tailored and highly-engineered algorithms, enabling the fusion of multiple sensor modalities in an intuitive fashion. On the down side, continuous time introduces a prior that could worsen the trajectory estimates in some unfavorable situations. In this work, we aim at systematically comparing the advantages and limitations of the two formulations in vision-based SLAM. To do so, we perform an extensive experimental analysis, varying robot type, speed of motion, and sensor modalities. Our experimental analysis suggests that, independently of the trajectory type, continuous-time SLAM is superior to its discrete counterpart whenever the sensors are not time-synchronized.

HiPE: Hierarchical Initialization for Pose Graphs

Pose graph optimization is a non-convex optimization problem encountered in many areas of robotics perception. Its convergence to an accurate solution is conditioned by two factors: the non-linearity of the cost function in use and the initial configuration of the pose variables. In this paper, we present HiPE: a novel hierarchical algorithm for pose graph initialization. Our approach exploits a coarse-grained graph that encodes an abstract representation of the problem geometry. We construct this graph by combining maximum likelihood estimates coming from local regions of the input. By leveraging the sparsity of this representation, we can initialize the pose graph in a non-linear fashion, without computational overhead compared to existing methods. The resulting initial guess can effectively bootstrap the fine-grained optimization that is used to obtain the final solution. In addition, we perform an empirical analysis on the impact of different cost functions on the final estimate. Our experimental evaluation shows that the usage of HiPE leads to a more efficient and robust optimization process, comparing favorably with state-of-the-art methods.

Constrained Visual-Inertial Localization with Application and Benchmark in Laparoscopic Surgery

We propose a novel method to tackle the visual-inertial SLAM localization problem for constrained camera movements. We use residuals from the different modalities to jointly optimize a global cost function. The residuals emerge from IMU measurements, stereoscopic feature points, and constraints on possible solutions in SE(3). In settings where image quality is low and disturbances are frequent, the residuals reduce the complexity of the problem and make localization feasible. We verify the advantages of our method in a suitable medical use case and produce a dataset capturing a minimally invasive surgery in the abdomen. Our novel clinical dataset MITI is comparable to state-of-the-art evaluation datasets, contains calibration and synchronization data, and is available at https://mediatum.ub.tum.de/1621941.

DM-VIO: Delayed Marginalization Visual-Inertial Odometry

VIO marginalization scheme: delayed marginalization.

We present DM-VIO, a monocular visual-inertial odometry system based on two novel techniques called delayed marginalization and pose graph bundle adjustment. DM-VIO performs photometric bundle adjustment with a dynamic weight for visual residuals. We adopt marginalization, which is a popular strategy to keep the update time constrained, but it cannot easily be reversed, and linearization points of connected variables have to be fixed. To overcome this we propose delayed marginalization: The idea is to maintain a second factor graph, where marginalization is delayed. This allows us to later readvance this delayed graph, yielding an updated marginalization prior with new and consistent linearization points. In addition, delayed marginalization enables us to inject IMU information into already marginalized states. This is the foundation of the proposed pose graph bundle adjustment, which we use for IMU initialization. In contrast to prior works on IMU initialization, it is able to capture the full photometric uncertainty, improving the scale estimation. We evaluate our system on the EuRoC, TUM-VI, and 4Seasons datasets, which comprise flying drone, large-scale handheld, and automotive scenarios. Thanks to the proposed IMU initialization, our system exceeds the state of the art in visual-inertial odometry, even outperforming stereo-inertial methods while using only a single camera and IMU. The code will be published at http://vision.in.tum.de/dm-vio

Leveraging Structural Information to Improve Point Line Visual-Inertial Odometry

Point-line VIO optimization; Wuhan University.

Leveraging line features to improve the accuracy of SLAM systems has been studied in many works. However, making full use of the characteristics of different line features (parallel, non-parallel) to improve the SLAM system is rarely mentioned. In this paper, we design a VIO system based on points and straight lines, which divides straight lines into structural lines (that is, lines parallel to each other) and non-structural lines. To optimize the line features effectively, we use two-parameter representations for both structural and non-structural straight lines. Furthermore, we design a stable line matching method based on frame-to-frame (2d-2d) and map-to-frame (3d-2d) strategies, which can significantly improve the trajectory accuracy of the system. We conduct ablation experiments on synthetic data and public datasets, and also compare our method with state-of-the-art algorithms. The experiments verify that combining different line features can improve the accuracy of the VIO system, and also demonstrate the effectiveness of our system.

Decoupled Right Invariant Error States for Consistent Visual-Inertial Navigation

An improvement of the IEKF for the case where features are kept in the state vector: computationally lighter while guaranteeing consistency (paper from Prof. Huang's lab).

The invariant extended Kalman filter (IEKF) is proven to preserve the observability property of visual-inertial navigation systems (VINS) and is suitable for consistent estimator design. However, if features are maintained in the state vector, the propagation of the IEKF becomes more computationally expensive because these features are involved in the covariance propagation. To address this issue, we propose two novel algorithms which preserve system consistency by leveraging the invariant state representation and ensure efficiency by decoupling features from the covariance propagation. The first algorithm combines right invariant error states with the first-estimates Jacobian (FEJ) technique, decoupling the features from the Lie group representation and utilizing FEJ for consistent estimation. The second algorithm is designed specifically for sliding-window filter-based VINS, as it associates the features to an active cloned pose for the Lie group representation. A new pseudo-anchor change algorithm is also proposed to maintain the features in the state vector longer than the window span. Both decoupled right- and left-invariant error based VINS methods are implemented for a complete comparison. Extensive Monte-Carlo simulations on three simulated trajectories and real-world evaluations on the TUM-VI datasets are provided to verify our analysis and demonstrate that the proposed algorithms achieve better accuracy than a state-of-the-art filter-based VINS algorithm using FEJ.

Multirobot Collaborative Monocular SLAM Utilizing Rendezvous

Multi-robot collaborative SLAM.

This paper proposes a collaborative monocular SLAM including a map fusion algorithm that utilizes rendezvous. Unlike existing rendezvous-based approaches that require additional sensors, the proposed system uses a monocular camera only. Our system can recognize robot rendezvous using non-static features (NSFs) without fiducial markers to identify team robots. NSFs, which are abandoned as outliers in typical SLAM systems for not supporting ego-motion, can include relative bearing measurements between robots in a rendezvous situation. The proposed pipeline consists of 1) a feature identification module that extracts the relative bearing measurements between robots from NSFs consisting of anonymous bearing vectors with false positives, and 2) a map fusion module that integrates local maps from each robot into a global map using identified measurements. The feature identification module can operate quickly using the proposed alternating minimization algorithm formulated by two sub-problems with closed-form solutions. The experimental results confirm that our multi-robot SLAM recognizes rendezvous rapidly and robustly, and fuses local maps into a global map accurately.

PixSelect: Less but Reliable Pixels for Accurate and Efficient Localization

Huawei; a feature selection strategy based on SfM heatmaps.

Accurate camera pose estimation is a fundamental requirement for numerous applications, such as autonomous driving, mobile robotics, and augmented reality. In this work, we address the problem of estimating the global 6-DoF camera pose from a single RGB image in a given environment. Previous works consider every part of the image valuable for localization. However, many image regions such as the sky, occlusions, and repetitive non-distinguishable patterns cannot be utilized for localization. In addition to adding unnecessary computation effort, extracting and matching features from such regions produces many wrong matches, which in turn degrades localization accuracy and efficiency. Our work addresses this particular issue and shows, by exploiting an interesting concept of sparse 3D models, that we can exploit discriminative environment parts and avoid useless image regions for the sake of single-image localization. Interestingly, by avoiding keypoints from non-reliable image regions such as trees, bushes, cars, pedestrians, and occlusions, our work acts naturally as an outlier filter. This makes our system highly efficient, in that a minimal set of correspondences is needed, and highly accurate, as the number of outliers is low. Our work exceeds state-of-the-art methods on the outdoor Cambridge Landmarks dataset. Relying only on a single image at inference, it outperforms, in terms of accuracy, methods that exploit pose priors and/or reference 3D models, while being much faster. By choosing as few as 100 correspondences, it surpasses similar methods that localize from thousands of correspondences, while being more efficient. In particular, it achieves, compared to these methods, an improvement in localization of 33% on the OldHospital scene. Furthermore, it outperforms direct pose regressors, even those that learn from sequences of images. Our work will be publicly available.

Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World

Learning-based method for recovering the scale of VO.

Monocular visual odometry (VO) has attracted extensive research attention by providing real-time vehicle motion from cost-effective camera images. However, state-of-the-art optimization-based monocular VO methods suffer from the scale inconsistency problem for long-term predictions. Deep learning has recently been introduced to address this issue by leveraging stereo sequences or ground-truth motions in the training dataset. However, it comes at an additional cost for data collection, and such training data may not be available in all datasets. In this work, we propose VRVO, a novel framework for retrieving the absolute scale from virtual data that can be easily obtained from modern simulation environments, whereas in the real domain no stereo or ground-truth data are required in either the training or inference phases. Specifically, we first train a scale-aware disparity network using both monocular real images and stereo virtual data. The virtual-to-real domain gap is bridged by using an adversarial training strategy to map images from both domains into a shared feature space. The resulting scale-consistent disparities are then integrated with a direct VO system by constructing a virtual stereo objective that ensures the scale consistency over long trajectories. Additionally, to address the suboptimality issue caused by the separate optimization backend and the learning process, we further propose a mutual reinforcement pipeline that allows bidirectional information flow between learning and optimization, which boosts the robustness and accuracy of each other. We demonstrate the effectiveness of our framework on the KITTI and vKITTI2 datasets.

Tightly-Coupled Magneto-Visual-Inertial Fusion for Long Term Localization in Indoor Environment

We propose in this paper a tightly-coupled fusion of visual, inertial and magnetic data for long-term localization in indoor environment. Unlike state-of-the-art Visual-Inertial SLAM (VISLAM) solutions that reuse visual map to prevent drift, we present in this paper an extension of the Multi-State Constraint Kalman Filter (MSCKF) that exploits a magnetic map. It makes our solution more robust to variations of the environment appearance. The experimental results demonstrate that the localization accuracy of the proposed approach is almost the same over time periods longer than a year.

VIP-SLAM: An Efficient Tightly-Coupled RGB-D Visual Inertial Planar SLAM

RGB-D SLAM that uses planar constraints to correct accumulated drift.

In this paper, we propose a tightly-coupled SLAM system that fuses RGB, depth, IMU, and structured plane information. Traditional sparse-points-based SLAM systems maintain a mass of map points to model the environment. A huge number of map points brings high computational complexity, making such systems difficult to deploy on mobile devices. On the other hand, planes are common structures in man-made environments, especially indoors, and a small number of planes can often represent a large scene. So the main purpose of this article is to decrease the high complexity of sparse-points-based SLAM. We build a lightweight back-end map which consists of a few planes and map points to achieve efficient bundle adjustment (BA) with equal or better accuracy. We use homography constraints to eliminate the parameters of numerous plane points in the optimization and reduce the complexity of BA. We separate the parameters and measurements in the homography and point-to-plane constraints and compress the measurement part to further improve the speed of BA. We also integrate the plane information into the whole system to realize robust planar feature extraction, data association, and globally consistent planar reconstruction. Finally, we perform an ablation study and compare our method with similar methods on simulated and real-environment data. Our system achieves obvious advantages in accuracy and efficiency. Even though the plane parameters are involved in the optimization, we effectively simplify the back-end map by using plane structures. The global bundle adjustment is nearly 2 times faster than the sparse-points-based SLAM algorithm.
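
The homography constraint referred to above is the standard plane-induced one (textbook form, symbols ours): for a plane with normal n and distance d in the first camera frame, and relative pose (R, t) between the two views, a plane point observed at homogeneous pixel ũ₁ satisfies, up to scale,

```latex
\[
\tilde{u}_2 \;\simeq\; K\!\left(R + \frac{t\,n^{\top}}{d}\right)\!K^{-1}\,\tilde{u}_1 ,
\]
```

so plane points need no per-point depth variable, which is exactly what removes them from the BA parameter vector.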

Electric Sense Based Pose Estimation and Localization for Small Underwater Robots

Accurate pose estimation and localization technology is always a challenge for small underwater robots, since underwater lighting conditions can limit the use of cameras while cramped environments restrict the use of sonars. In nature, some fishes perceive other creatures by sensing weak changes in the environmental electric field. Inspired by such passive electric sense behavior in fish, this letter presents an electro-localization scheme based on passive electric sense for short-distance, accurate pose estimation and localization of small underwater robots. Our scheme includes a hardware solution for the electric sense and a pose estimation method. Specifically, first, we design a hardware solution including an electric emitter placed in the underwater environment and an electric receiver that can be carried by a small underwater robot. Then we construct the theoretical model of the electric field generated by our designed hardware. Finally, we propose an electric-sense-based method to estimate the position and orientation of the free-swimming robot, where a particle filter is utilized to combine the odometer and electric-sense-based measurements while the dynamic model of the robot is also included. Localization experiments for the small underwater robot equipped with the electric receiver are conducted, and the experimental results demonstrate the robustness and effectiveness of our proposed electric-sense-based pose estimation approach, especially in

Improved State Propagation through AI-Based Pre-Processing and Down-Sampling of High-Speed Inertial Data

We present a novel approach to improve 6 degree-of-freedom state propagation for unmanned aerial vehicles in a classical filter through pre-processing of high-speed inertial data with AI algorithms. We evaluate both an LSTM-based approach as well as a Transformer encoder architecture. Both algorithms take as input short sequences of fixed length N of high-rate inertial data provided by an inertial measurement unit (IMU) and are trained to predict in turn one pre-processed IMU sample that minimizes the state propagation error of a classical filter across M sequences. This setup allows us to provide sufficient temporal history to the networks for good performance while maintaining a high propagation rate of pre-processed IMU samples important for later deployment on real-world systems. In addition, our network architectures are formulated to directly accept input data at variable rates thus minimizing necessary data preprocessing. The results indicate that the LSTM based architecture outperforms the Transformer encoder architecture and significantly improves the propagation error even for long IMU propagation times.

Joint Localization Based on Split Covariance Intersection on the Lie Group

This paper presents a pose fusion method that accounts for the possible correlations among measurements. The proposed method can handle data fusion problems whose uncertainty has both independent and dependent parts. Different from the existing methods, the uncertainties of the various states or measurements are modeled on the Lie algebra and projected to the manifold through the exponential map, which is more precise than modeling in the vector space. The correlation handling is based on the theory of covariance intersection, where the independent and dependent parts are split to yield a more consistent result. In this paper, we provide a novel method for the correlated pose fusion algorithm on the manifold. Theoretical derivation and analysis are detailed first, and then, the experimental results are presented to support the proposed theory. The main contributions are threefold: (1) We provide a theoretical foundation for the split covariance intersection filter performed on the manifold, where the uncertainty is associated with the Lie algebra. (2) The proposed method gives an explicit fusion formalism on SE(3) and SE(2), which covers most use cases in the field of robotics. (3) We present a localization framework that can work for both single-robot and multirobot systems, where not only the fusion with possible correlation is derived on the manifold but also the state evolution and relative pose computation are performed on the manifold. The experimental results validate the proposed method.
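
For reference, the Euclidean split covariance intersection update that the paper lifts onto the Lie algebra (standard form from the split-CI literature; each covariance is split as P_i = P_{i,d} + P_{i,i} into dependent and independent parts, and ω ∈ [0, 1] is typically chosen to minimize the trace of P):

```latex
\[
P^{-1} = \left(\tfrac{P_{1,d}}{\omega} + P_{1,i}\right)^{-1}
       + \left(\tfrac{P_{2,d}}{1-\omega} + P_{2,i}\right)^{-1},
\qquad
\hat{x} = P\left[\left(\tfrac{P_{1,d}}{\omega} + P_{1,i}\right)^{-1} x_1
              + \left(\tfrac{P_{2,d}}{1-\omega} + P_{2,i}\right)^{-1} x_2\right].
\]
```

When the independent parts vanish this reduces to ordinary covariance intersection.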

Datasets

HD Ground - a Database for Ground Texture Based Localization

Open-source ground texture dataset.

We present the HD Ground Database, a comprehensive database for ground texture based localization. It contains sequences of a variety of textures, obtained using a downward facing camera. In contrast to existing databases of ground images, the HD Ground Database is larger, has a greater variety of textures, and has a higher image resolution with less motion blur. Also, our database enables the first systematic study of how natural changes of the ground that occur over time affect localization performance, and it allows examining a teach-and-repeat navigation scenario. We use the HD Ground Database to evaluate four state-of-the-art localization approaches for global localization, localization with the approximate pose being known, and relative localization.

The Visual-Inertial-Dynamical Multirotor Dataset

UAV dataset with strong artificially injected disturbances; the motion is very shaky.

Recently, the community has witnessed numerous datasets built for developing and testing state estimators. However, for some applications such as aerial transportation or search-and-rescue, the contact force or other disturbance must be perceived for robust planning and control, which is beyond the capacity of these datasets. This paper introduces a Visual-Inertial-Dynamical (VID) dataset, not only focusing on traditional six degrees of freedom (6-DOF) pose estimation but also providing dynamical characteristics of the flight platform for external force perception or dynamics-aided estimation. The VID dataset contains hardware synchronized imagery and inertial measurements, with accurate ground truth trajectories for evaluating common visual-inertial estimators. Moreover, the proposed dataset highlights rotor speed and motor current measurements, control inputs, and ground truth 6-axis force data to evaluate external force estimation. To the best of our knowledge, the proposed VID dataset is the first public dataset containing visual-inertial and complete dynamical information in the real world for pose and external force evaluation.

M2DGR: A Multi-Sensor and Multi-Scenario SLAM Dataset for Ground Robots

Ground robot dataset.

We introduce M2DGR: a novel large-scale dataset collected by a ground robot with a full sensor-suite including six fish-eye and one sky-pointing RGB cameras, an infrared camera, an event camera, a Visual-Inertial Sensor (VI-sensor), an inertial measurement unit (IMU), a LiDAR, a consumer-grade Global Navigation Satellite System (GNSS) receiver and a GNSS-IMU navigation system with real-time kinematic (RTK) signals. All those sensors were well-calibrated and synchronized, and their data were recorded simultaneously. The ground truth trajectories were obtained by the motion capture device, a laser 3D tracker, and an RTK receiver. The dataset comprises 36 sequences (about 1TB) captured in diverse scenarios including both indoor and outdoor environments. We evaluate state-of-the-art SLAM algorithms on M2DGR. Results show that existing solutions perform poorly in some scenarios. For the benefit of the research community, we make the dataset and tools public. The webpage of our project is https://github.com/SJTU-ViSYS/M2DGR.

3D Reconstruction

Performance Guarantees for Spectral Initialization in Rotation Averaging and Pose-Graph SLAM

Rotation averaging optimization, used for 3D reconstruction.

In this work we present the first initialization methods equipped with explicit performance guarantees that are adapted to the pose-graph simultaneous localization and mapping (SLAM) and rotation averaging (RA) problems. SLAM and rotation averaging are typically formalized as large-scale nonconvex point estimation problems, with many bad local minima that can entrap the smooth optimization methods typically applied to solve them; the performance of standard SLAM and RA algorithms thus crucially depends upon the quality of the estimates used to initialize this local search. While many initialization methods for SLAM and RA have appeared in the literature, these are typically obtained as purely heuristic approximations, making it difficult to determine whether (or under what circumstances) these techniques can be reliably deployed. In contrast, in this work we study the problem of initialization through the lens of spectral relaxation. Specifically, we derive a simple spectral relaxation of SLAM and RA, the form of which enables us to exploit classical linear-algebraic techniques (eigenvector perturbation bounds) to control the distance from our spectral estimate to both the (unknown) ground-truth and the global minimizer of the estimation problem as a function of measurement noise. Our results reveal the critical role that spectral graph-theoretic properties of the measurement network play in controlling estimation accuracy; moreover, as a by-product of our analysis we obtain new bounds on the estimation error for the maximum likelihood estimators in SLAM and RA, which are likely to be of independent interest. Finally, we show experimentally that our spectral estimator is very effective in practice, producing initializations of comparable or superior quality at lower computational cost compared to existing state-of-the-art techniques.
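
A simplified sketch of what a spectral initializer for rotation averaging looks like in practice (NumPy; input layout hypothetical, and the paper's exact construction and guarantees are more refined): stack relative rotations into a symmetric block matrix, take its dominant eigenspace, and project each block onto SO(3).

```python
import numpy as np

def spectral_rotation_init(n, rel_rots):
    """Spectral initialization for rotation averaging (toy version).

    rel_rots: dict mapping pair (i, j) -> 3x3 relative rotation R_ij.
    """
    M = np.zeros((3 * n, 3 * n))
    for (i, j), R in rel_rots.items():
        M[3*i:3*i+3, 3*j:3*j+3] = R
        M[3*j:3*j+3, 3*i:3*i+3] = R.T
    _, vecs = np.linalg.eigh(M)
    V = vecs[:, -3:]                       # dominant 3-dimensional eigenspace
    rotations = []
    for i in range(n):
        U, _, Vt = np.linalg.svd(V[3*i:3*i+3, :])
        R = U @ Vt
        if np.linalg.det(R) < 0:           # stay in SO(3), not just O(3)
            R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
        rotations.append(R)
    return rotations
```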

Incremental Abstraction in Distributed Probabilistic SLAM Graphs

Plane reconstruction within 3D reconstruction.

Scene graphs represent the key components of a scene in a compact and semantically rich way, but are difficult to build during incremental SLAM operation because of the challenges of robustly identifying abstract scene elements and optimising continually changing, complex graphs. We present a distributed, graph-based SLAM framework for incrementally building scene graphs based on two novel components. First, we propose an incremental abstraction framework in which a neural network proposes abstract scene elements that are incorporated into the factor graph of a feature-based monocular SLAM system. Scene elements are confirmed or rejected through optimisation and incrementally replace the points yielding a more dense, semantic and compact representation. Second, enabled by our novel routing procedure, we use Gaussian Belief Propagation (GBP) for distributed inference on a graph processor. The time per iteration of GBP is structure agnostic and we demonstrate the speed advantages over direct methods for inference of heterogeneous factor graphs. We run our system on real indoor datasets using planar abstractions and recover the major planes with significant compression.

Crossview Mapping with Graph-Based Geolocalization on City-Scale Street Maps

From Prof. Zhang Guofeng's lab at Zhejiang University.

3D environment mapping has been actively studied recently with the development of autonomous driving and augmented reality. Although many image-based methods are proposed due to their convenience and flexibility compared to other complex sensors, few works focus on fixing the inherent scale ambiguity of image-based methods and registering the reconstructed structure to the real-world 3D map, which is very important for autonomous driving. This paper presents a low-cost mapping solution that is able to refine and align the monocular reconstructed point cloud given a public street map. Specifically, we first find the association between the street map and the reconstructed point cloud structure by a novel graph-based geolocalization method. Then, optimized with the corresponding relationship, the map accuracy is significantly improved. The rich environment information can also be associated with the point cloud by the geographical location. Experiments show that our geolocalization algorithm can locate the scene on a gigantic city-scale map (173.46 km²) in two minutes and support 3D map reconstruction with absolute scale and rich environmental information from Internet videos.

DynamicFilter: An Online Dynamic Objects Removal Framework for Highly Dynamic Environments

Emergence of massive dynamic objects will diversify spatial structures when robots navigate in urban environments. Therefore, the online removal of dynamic objects is critical. In this paper, we introduce a novel online removal framework for highly dynamic urban environments. The framework consists of the scan-to-map front-end and the map-to-map back-end modules. Both the front- and back-ends deeply integrate the visibility-based approach and map-based approach. The experiments validate the framework in highly dynamic simulation scenarios and real-world dataset.

Memory-Efficient Gaussian Fitting for Depth Images in Real Time

Computing consumes a significant portion of energy in many robotics applications, especially the ones involving energy-constrained robots. In addition, memory access accounts for a significant portion of the computing energy. For mapping a 3D environment, prior approaches reduce the map size while incurring a large memory overhead used for storing sensor measurements and temporary variables during computation. In this work, we present a memory-efficient algorithm, named Single-Pass Gaussian Fitting (SPGF), that accurately constructs a compact Gaussian Mixture Model (GMM) which approximates measurements from a depthmap generated from a depth camera. By incrementally constructing the GMM one pixel at a time in a single pass through the depthmap, SPGF achieves higher throughput and orders-of-magnitude lower memory overhead than prior multi-pass approaches. By processing the depthmap row-by-row, SPGF exploits intrinsic properties of the camera to efficiently and accurately infer surface geometries, which leads to higher precision than prior approaches while maintaining the same compactness of the GMM. Using a low-power ARM Cortex-A57 CPU on the NVIDIA Jetson TX2 platform, SPGF operates at 32fps, requires 43KB of memory overhead, and consumes only 0.11J per frame (depthmap). Thus, SPGF enables real-time mapping of large 3D environments on energy-constrained robots.
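
The single-pass flavor of SPGF can be illustrated with a Welford-style running Gaussian, the kind of constant-memory update that avoids storing pixels (a sketch; SPGF's actual per-scanline segmentation and merging logic is omitted):

```python
import numpy as np

class RunningGaussian:
    """Single-pass (Welford-style) mean/covariance estimator for one
    surface segment: each 3D point updates the statistics and is then
    discarded, so memory stays O(1) per Gaussian."""

    def __init__(self, dim=3):
        self.n = 0
        self.mean = np.zeros(dim)
        self.M2 = np.zeros((dim, dim))    # running sum of outer-product deviations

    def add(self, x):
        self.n += 1
        delta = x - self.mean             # deviation from the old mean
        self.mean += delta / self.n
        self.M2 += np.outer(delta, x - self.mean)

    @property
    def cov(self):
        return self.M2 / max(self.n - 1, 1)
```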

Semantic SLAM

Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge

In visual semantic navigation, the robot navigates to a target object with egocentric visual observations and the class label of the target is given. It is a meaningful task inspiring a surge of relevant research. However, most of the existing models are only effective for single-agent navigation, and a single agent has low efficiency and poor fault tolerance when conducting more complicated tasks. Multi-agent collaboration can improve the efficiency and has strong application potentials. In this paper, we propose the multi-agent visual semantic navigation, in which multiple agents collaborate with others to find multiple target objects. It is a challenging task that requires agents to learn reasonable collaboration strategies to perform efficient exploration under the restrictions of communication bandwidth. We develop a hierarchical decision framework based on semantic mapping, scene prior knowledge, and communication mechanism to solve this task. The experimental results in unseen scenes with both seen objects and unseen objects illustrate the higher accuracy and efficiency of the proposed model compared with the single-agent model.

LTSR: Long-Term Semantic Relocalization Based on HD Map for Autonomous Vehicles

Huawei; semantic-map-based localization for vehicles.

Highly accurate and robust relocalization or localization initialization ability is of great importance for autonomous vehicles (AVs). Traditional GNSS-based methods are not reliable enough in occlusion and multipath conditions. In this paper we propose a novel long-term semantic relocalization algorithm based on HD map and semantic features which are compact in representation. Semantic features appear widely on urban roads, and are robust to illumination, weather, view-point and appearance changes. Repeated structures, missed and false detections make data association (DA) highly ambiguous. To this end, a robust semantic feature matching method based on a new local semantic descriptor which encodes the spatial and normal relationship between semantic features is performed. Further, we introduce an accurate, efficient, yet simple outlier removal method which works by assessing the local and global geometric consistencies and temporal consistency of semantic matching pairs. The experimental results on our urban dataset demonstrate that our approach performs better in accuracy and robustness compared with the current state-of-the-art methods.

DA-LMR: A Robust Lane Marking Representation for Data Association

Representation of multi-segment polylines and the corresponding data association.

While complete localization approaches are widely studied in the literature, their data association and data representation subprocesses usually go unnoticed. However, both are a key part of the final pose estimation. In this work, we present DA-LMR (Delta-Angle Lane Marking Representation), a robust data representation in the context of localization approaches. We propose a representation of lane markings that encodes how a curve changes in each point and includes this information in an additional dimension, thus providing a more detailed geometric structure description of the data. We also propose DC-SAC (Distance-Compatible Sample Consensus), a data association method. This is a heuristic version of RANSAC that dramatically reduces the hypothesis space by distance compatibility restrictions. We compare the presented methods with some state-of-the-art data representation and data association approaches in different noisy scenarios. The DA-LMR and DC-SAC produce the most promising combination among those compared, reaching 98.1% in precision and 99.7% in recall for noisy data with 0.5 m of standard deviation.
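
A small sketch of our reading of the delta-angle idea: augment each polyline point with how much the curve turns there, as an extra coordinate (function name and weighting hypothetical):

```python
import numpy as np

def delta_angle_representation(polyline, weight=1.0):
    """Append each 2D polyline point's turn (delta) angle as a third
    dimension; endpoints get zero since they lack two neighbors."""
    pts = np.asarray(polyline, dtype=float)       # shape (N, 2)
    d = np.diff(pts, axis=0)                      # segment vectors
    headings = np.arctan2(d[:, 1], d[:, 0])
    delta = np.zeros(len(pts))
    ddh = np.diff(headings)
    delta[1:-1] = (ddh + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
    return np.hstack([pts, (weight * delta)[:, None]])  # shape (N, 3)
```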

SemLoc: Accurate and Robust Visual Localization with Semantic and Structural Constraints from Prior Maps

Localization against a prior semantic map; EM algorithm.

Semantic information and geometrical structures of a prior map can be leveraged in visual localization to bound drift errors and improve accuracy. In this paper, we propose SemLoc, a pure visual localization system, for accurate localization in a prior semantic map. To tightly couple semantic and structure information from prior maps, a hybrid constraint is presented by using the Dirichlet distribution. Then, with the local landmarks and their semantic states tracked in the frontend, the camera poses and data associations are jointly optimized through the Expectation-Maximization (EM) algorithm. We validate the effectiveness of our approach in both monocular and stereo modes on the public KITTI dataset. Experimental results demonstrate that our system can greatly reduce drift errors with satisfying real-time performance. Compared with several state-of-the-art visual localization systems, the proposed framework achieves competitive localization performance.

Robust Monocular Localization in Sparse HD Maps Leveraging Multi-Task Uncertainty Estimation

Learning-based method.

Robust localization in dense urban scenarios using a low-cost sensor setup and sparse HD maps is highly relevant for the current advances in autonomous driving, but remains a challenging topic in research. We present a novel monocular localization approach based on a sliding-window pose graph that leverages predicted uncertainties for increased precision and robustness against challenging scenarios and per-frame failures. To this end, we propose an efficient multi-task uncertainty-aware perception module, which covers semantic segmentation, as well as bounding box detection, to enable the localization of vehicles in sparse maps, containing only lane borders and traffic lights. Further, we design differentiable cost maps that are directly generated from the estimated uncertainties. This opens up the possibility to minimize the reprojection loss of amorphous map elements in an association-free and uncertainty-aware manner. Extensive evaluation on the Lyft 5 dataset shows that, despite the sparsity of the map, our approach enables robust and accurate 6D localization in challenging urban scenarios using only monocular camera images and vehicle odometry.

Object-Based Visual-Inertial Navigation System on Matrix Lie Group

Uses tracked vehicle bounding boxes to aid localization

In this paper, we propose a novel object-based visual-inertial navigation system fully embedded in a matrix Lie group and built upon invariant Kalman filtering theory. Specifically, we focus on relative pose measurements of objects and derive an error equation in the associated tangent space. We prove that the observability property does not suffer from filter inconsistency and that the nonlinear error terms are identically zero at object initialization. A thorough Monte-Carlo simulation reveals that our approach yields consistent estimates and is very robust to large initial state uncertainty. Furthermore, we demonstrate a real-world application on the KITTI dataset with a deep neural network-based 3D object detector. Experimental results report that the noise on pose measurements follows a Gaussian-like density, matching our assumption. The proposed method improves the localization and object global mapping accuracy by probabilistically accounting for inertial readings and object pose uncertainties at multiple views.

Towards Accurate Loop Closure Detection in Semantic SLAM with 3D Semantic Covisibility Graphs

Loop-closure detection for semantic SLAM

Loop closure is necessary for correcting errors accumulated in simultaneous localization and mapping (SLAM) in unknown environments. However, conventional loop closure methods based on low-level geometric or image features may cause high ambiguity by not distinguishing similar scenarios. Thus, incorrect loop closure can occur. Though semantic 2D image information is considered in some literature to detect loop closures, there is little work that compares 3D scenes as an integral part of a semantic SLAM system. This paper introduces an approach integrated into a semantic SLAM system to combine high-level 3D semantic information and low-level feature information to conduct accurate loop closure detection and effective drift reduction. The effectiveness of our approach is demonstrated in testing results.

A Right Invariant Extended Kalman Filter for Object Based SLAM

With the recent advance of deep learning based object recognition and estimation, it is possible to consider object level SLAM where the pose of each object is estimated in the SLAM process. In this paper, based on a novel Lie group structure, a right invariant extended Kalman filter (RI-EKF) for object based SLAM is proposed. The observability analysis shows that the proposed algorithm automatically maintains the correct unobservable subspace, while a standard EKF (Std-EKF) based SLAM algorithm does not. This results in better consistency for the proposed algorithm compared to Std-EKF. Finally, simulations and real world experiments validate not only the consistency and accuracy of the proposed algorithm, but also the practicability of the proposed RI-EKF for the object based SLAM problem. The MATLAB code of the algorithm is made publicly available.
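
A minimal sketch of the right-invariant error such a filter linearizes, assuming poses live on SE(3) as 4x4 homogeneous matrices; the full RI-EKF propagation and update are omitted:

```python
import numpy as np
from scipy.linalg import expm, logm

def hat(xi):
    """Map a 6-vector (omega, v) to its 4x4 twist matrix in se(3)."""
    wx, wy, wz, vx, vy, vz = xi
    return np.array([[0.0, -wz,  wy, vx],
                     [ wz, 0.0, -wx, vy],
                     [-wy,  wx, 0.0, vz],
                     [0.0, 0.0, 0.0, 0.0]])

X_true = expm(hat([0.10, -0.20, 0.30, 1.00, 0.50, -0.40]))   # ground truth
X_est  = expm(hat([0.12, -0.19, 0.31, 1.05, 0.48, -0.42]))   # estimate

# Right-invariant error: eta = X_est * X_true^{-1}. Its matrix log is the
# error twist; it is unchanged by right-multiplying both poses by any frame change.
eta = X_est @ np.linalg.inv(X_true)
print(np.round(np.real(logm(eta)), 4))
```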

LaneMatch: A Practical Real-Time Localization Method Via Lane-Matching

This paper presents LaneMatch, a localization method for use in autonomous vehicles (AVs). It utilizes lane matching to obtain an AV's lane occupancy and current pose estimation. Matching is performed on a compact low-resolution road map generated from satellite images. Our approach addresses the following challenges for AVs using this map: (1) misalignment between roads on the satellite images and their global coordinates, and (2) incomplete or incorrect lane detection outputs. First, LaneMatch estimates the offset between the AV's global pose in the global coordinate system and its local map pose on the map. Second, LaneMatch utilizes a spatio-temporal integration of a particle filter and a factor graph to resolve lane-matching ambiguities. It strategically constrains the dimensionality of variables to obtain real-time performance. We use highway experiments to evaluate the processing time, occupancy accuracy, lateral/longitudinal and position/heading errors. These experiments show that LaneMatch localizes our AV rather precisely on the road map in real-time and can be used for navigation and planning purposes even in GNSS-unfriendly areas.
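
The spatio-temporal machinery is only described at a high level, but a toy 1D particle filter conveys the flavour of lane-matching localization; the lane positions and noise levels below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lane_centers = np.array([0.0, 3.5, 7.0])      # lateral lane positions in the map (m)
N = 500
particles = rng.uniform(-2.0, 9.0, size=N)    # lateral-position hypotheses
weights = np.full(N, 1.0 / N)

def update(measured_offset, sigma=0.3):
    """Re-weight particles by how well their distance to the nearest map lane
    explains the detector's measured lateral offset, then resample."""
    global particles, weights
    predicted = np.min(np.abs(particles[:, None] - lane_centers[None, :]), axis=1)
    weights *= np.exp(-0.5 * ((predicted - measured_offset) / sigma) ** 2)
    weights /= weights.sum()
    # Systematic resampling + small diffusion keeps the filter alive.
    idx = np.searchsorted(np.cumsum(weights), (np.arange(N) + rng.random()) / N)
    particles = particles[np.minimum(idx, N - 1)] + rng.normal(0.0, 0.05, N)
    weights = np.full(N, 1.0 / N)

update(0.4)   # detector reports: 0.4 m from the nearest lane marking
print(particles.mean())
```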

Binary Graph Descriptor for Robust Relocalization on Heterogeneous Data

In this paper, we propose a novel binary graph descriptor to improve loop detection for visual SLAM systems. Our contribution is fourfold: i) a graph embedding technique for generating binary descriptors which conserve both spatial and histogram frequential information extracted from images; ii) a generic means of combining multiple layers of heterogeneous information into the proposed binary graph (BiG) descriptor, coupled with a matching and geometric checking method; iii) an implementation of our descriptor into an incremental Bag-of-Words (iBoW) structure without pre-trained vocabulary, in order to accelerate the computation process and to improve the efficiency and scalability with the help of inverted indexing techniques; and iv) a method that converts DNN (Deep Neural Network) results to our descriptor for quick loop detection tasks. We evaluate our system under synthetic and real datasets, across different weather and seasonal conditions, generically implemented on various data formats. The proposed method outperforms state-of-the-art loop detection frameworks in terms of relocalization precision and computational performance, ensures compatibility with heterogeneous information and displays high robustness against cross-condition datasets in loop detection tasks.
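
Binary descriptors are typically compared by Hamming distance on packed bit strings; a hedged sketch of such a lookup with a ratio test (not the BiG matching or geometric check itself):

```python
import numpy as np

def hamming(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hamming distances between one packed binary descriptor `a` (bytes,)
    and a stack of descriptors `b` (N, bytes), both uint8."""
    return np.unpackbits(np.bitwise_xor(a[None, :], b), axis=1).sum(axis=1)

db = np.random.randint(0, 256, size=(1000, 32), dtype=np.uint8)   # 256-bit codes
query = db[42].copy()
query[0] ^= 0b00000101                                            # flip two bits

d = hamming(query, db)
best, second = np.argsort(d)[:2]
# Accept only if clearly better than the runner-up (Lowe-style ratio test).
if d[best] < 0.7 * d[second]:
    print("loop candidate:", best, "distance:", d[best])
```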

Special sensor

DEVO: Depth-Event Camera Visual Odometry in Challenging Conditions

Event camera + depth sensor

We present a novel real-time visual odometry framework for a stereo setup of a high-resolution event camera and a depth camera. Our framework balances accuracy and robustness against computational efficiency towards strong performance in challenging scenarios. We extend conventional edge-based semi-dense visual odometry towards time-surface maps obtained from event streams. Semi-dense depth maps are generated by warping the corresponding depth values of the extrinsically calibrated depth camera. The tracking module updates the camera pose through efficient, geometric semi-dense 3D-2D edge alignment. Our approach is validated on both public and self-collected datasets captured under various conditions. We show that the proposed method performs comparably to state-of-the-art RGB-D camera-based alternatives in regular conditions, and outperforms them in challenging conditions such as high dynamics or low illumination.
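
A minimal sketch of a time-surface map, the event representation the tracker aligns edges against, assuming a 240x180 sensor and an arbitrarily chosen exponential decay constant:

```python
import numpy as np

H, W, TAU = 180, 240, 0.03   # sensor size and decay constant (s) -- assumed

last_t = np.full((H, W), -np.inf)   # timestamp of the latest event per pixel

def on_event(x: int, y: int, t: float):
    last_t[y, x] = t

def time_surface(t_now: float) -> np.ndarray:
    """Exponentially decayed recency map in [0, 1]; recently active edges stay bright."""
    return np.exp(-(t_now - last_t) / TAU)

on_event(120, 90, t=1.000)
on_event(121, 90, t=1.002)
ts = time_surface(t_now=1.010)
print(ts[90, 120], ts[90, 121])
```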

Asynchronous Optimisation for Event-Based Visual Odometry

Event camera

Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range. On the other hand, developing effective event-based vision algorithms that fully exploit the beneficial properties of event cameras remains work in progress. In this paper, we focus on event-based visual odometry (VO). While existing event-driven VO pipelines have adopted continuous-time representations to asynchronously process event data, they either assume a known map, restrict the camera to planar trajectories, or integrate other sensors into the system. Towards map-free event-only monocular VO in SE(3), we propose an asynchronous structure-from-motion optimisation back-end. Our formulation is underpinned by a principled joint optimisation problem involving non-parametric Gaussian Process motion modelling and incremental maximum a posteriori inference. A high-performance incremental computation engine is employed to reason about the camera trajectory with every incoming event. We demonstrate the robustness of our asynchronous back-end in comparison to frame-based methods which depend on accurate temporal accumulation of measurements.

360VO: Visual Odometry Using a Single 360 Camera

DSO-based, using a 360° camera

In this paper, we propose a novel direct visual odometry algorithm to take advantage of a 360-degree camera for robust localization and mapping. Our system extends direct sparse odometry by using a spherical camera model to process equirectangular images without rectification to attain omnidirectional perception. After adapting the mapping and optimization algorithms to the new model, camera parameters, including intrinsic and extrinsic parameters, and the 3D map can be jointly optimized within the local sliding window. In addition, we evaluate the proposed algorithm using both real-world and large-scale simulated scenes for qualitative and quantitative validation. The extensive experiments indicate that our system achieves state-of-the-art results.
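
The spherical camera model boils down to mapping a bearing vector to longitude/latitude and then to equirectangular pixel coordinates; a sketch under my own axis conventions (the paper's may differ):

```python
import numpy as np

def project_equirectangular(p_cam: np.ndarray, width: int, height: int):
    """Project a 3D point in the camera frame onto an equirectangular image.
    Longitude maps to the u axis, latitude to the v axis."""
    x, y, z = p_cam / np.linalg.norm(p_cam)     # point on the unit sphere
    lon = np.arctan2(x, z)                      # [-pi, pi), 0 looks along +z
    lat = np.arcsin(-y)                         # [-pi/2, pi/2], +v goes down
    u = (lon / (2 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v

print(project_equirectangular(np.array([0.0, 0.0, 1.0]), 1024, 512))  # image centre
```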

Fast-MbyM: Leveraging Translational Invariance of the Fourier Transform for Efficient and Accurate Radar Odometry

Speeds up computation by using the Fourier transform to decouple rotation and translation

Masking by Moving (MbyM) provides robust and accurate radar odometry measurements through an exhaustive correlative search across discretised pose candidates. However, this dense search creates a significant computational bottleneck which hinders real-time performance when high-end GPUs are not available. Utilising the translational invariance of the Fourier Transform, our approach, f-MbyM, decouples the search for angle and translation. By maintaining end-to-end differentiability, a neural network is used to mask scans and is trained by supervising pose prediction directly. Training faster and with less memory, the decoupled search allows f-MbyM to achieve significant run-time performance improvements on a CPU (168%) and to run in real-time on embedded devices, in stark contrast to MbyM. Throughout, our approach remains accurate and competitive with the best radar odometry variants available in the literature, achieving an end-point drift of 2.01% in translation and 6.3 deg/km on the Oxford Radar RobotCar Dataset.
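
The underlying property is that translating an image leaves its Fourier magnitude unchanged, so rotation can be scored on magnitude spectra before translation is recovered by phase correlation. A toy demonstration (integer-pixel shifts only, not the learned masking):

```python
import numpy as np

def translation(a: np.ndarray, b: np.ndarray):
    """Recover the integer-pixel shift between two images by phase correlation."""
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-9)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return dy, dx   # modulo image size

img = np.zeros((64, 64)); img[20:30, 14:40] = 1.0
shifted = np.roll(img, (5, -7), axis=(0, 1))

# Translation leaves the FFT magnitude untouched -> rotation could be
# estimated on |FFT| alone before solving for the shift.
print(np.allclose(np.abs(np.fft.fft2(img)), np.abs(np.fft.fft2(shifted))))  # True
print(translation(shifted, img))   # (5, 57) == (5, -7 mod 64)
```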

AutoPlace: Robust Place Recognition with Single-Chip Automotive Radar

This paper presents a novel place recognition approach for autonomous vehicles using low-cost, single-chip automotive radars. Aimed at improving recognition robustness and fully exploiting the rich information provided by this emerging automotive radar, our approach follows a principled pipeline that comprises (1) dynamic point removal from instant Doppler measurements, (2) spatial-temporal feature embedding on radar point clouds, and (3) retrieved candidate refinement from Radar Cross Section measurements. Extensive experimental results on the public nuScenes dataset demonstrate that existing approaches designed for visual/LiDAR/scanning radar-based place recognition are less suitable for single-chip automotive radar. In contrast, our purpose-built approach for radars consistently outperforms a variety of baseline methods via a comprehensive set of metrics, providing insights into its efficacy when used in a realistic system.

DC-Loc: Accurate Automotive Radar Based Metric Localization with Explicit Doppler Compensation

Automotive mmWave radar has been widely used in the automotive industry due to its small size, low cost, and complementary advantages to optical sensors (e.g., cameras, LiDAR, etc.) in adverse weather, e.g., fog, rain, and snow. On the other hand, its large wavelength also poses fundamental challenges for perceiving the environment. Recent advances have made breakthroughs on its inherent drawbacks, i.e., the multipath reflection and the sparsity of mmWave radar's point clouds. However, the frequency-modulated continuous wave modulation of radar signals makes it more sensitive to vehicles' mobility than optical sensors. This work focuses on the problem of frequency shift, i.e., the Doppler effect, which distorts the radar ranging measurements, and its knock-on effect on metric localization. We propose a new radar-based metric localization framework, termed DC-Loc, which can obtain more accurate location estimation by restoring the Doppler distortion. Specifically, we first design a new algorithm that explicitly compensates the Doppler distortion of radar scans and then model the measurement uncertainty of the Doppler-compensated point cloud to further optimize the metric localization. Extensive experiments using the public nuScenes dataset and CARLA simulator demonstrate that our method outperforms the state-of-the-art approach by 25.2% and 5.6% improvements in terms of translation and rotation errors, respectively.
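
A simplified view of explicit Doppler compensation: shift each return along its ray by the radial displacement implied by its instant Doppler velocity. The paper's FMCW-specific correction may differ in detail; this is only the spirit of it:

```python
import numpy as np

def compensate(points_xy: np.ndarray, radial_vel: np.ndarray, t_offset: np.ndarray):
    """Shift each radar return along its ray by the distance the target moved
    radially between its capture time and the scan reference time.
    A simplified model, not the paper's exact FMCW correction."""
    r = np.linalg.norm(points_xy, axis=1)
    unit = points_xy / r[:, None]
    r_corrected = r - radial_vel * t_offset          # undo radial displacement
    return unit * r_corrected[:, None]

pts = np.array([[10.0, 0.0], [0.0, 20.0]])
v_r = np.array([5.0, -3.0])        # m/s, from instant Doppler
dt = np.array([0.02, 0.05])        # s since the scan reference time
print(compensate(pts, v_r, dt))
```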

SAGE: SLAM with Appearance and Geometry Prior for Endoscopy

In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping system by combining the learning-based appearance and optimizable geometry priors and factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system is shown to robustly handle the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllppl920/SAGE-SLAM.git.

Lidar-related

GCLO: Ground Constrained LiDAR Odometry with Low-Drifts for GPS-Denied Indoor Environments

Ground-plane extraction and optimization for vehicle-mounted lidar

LiDAR is widely adopted in Simultaneous Localization And Mapping (SLAM) and HD map production. Thus the accuracy of LiDAR Odometry (LO) is of great importance, especially in GPS-denied environments. However, we found typical LO results are prone to drift upwards along the vertical direction in underground parking lots, leading to poor mapping results. This paper proposes a Ground Constrained LO method named GCLO, which exploits planar grounds in these specific environments to compress the vertical pose drifts. GCLO is divided into three parts. First, a sensor-centric sliding map is maintained, and the point-to-plane ICP method is implemented to perform the scan-to-map registration. Then, at each key-frame, the sliding map is recorded as a local map. Nearby ground points are segmented and modeled as a planar landmark in the form of the Closest Point (CP) parameterization. Finally, planar ground landmarks observed at different key-frames are associated. The ground landmark observation constraints are fused into the pose graph optimization framework to improve the LO performance. Experimental results on the HIK and KITTI datasets demonstrate GCLO's superior performance in terms of accuracy in indoor multi-floor parking lots and flat outdoor sites. The limitation of GCLO in adaptability to other environments is also discussed.
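
Closest Point (CP) parameterization encodes a plane as the point on it nearest the origin, a minimal three-parameter form; a small sketch of the conversion (note the well-known singularity for planes through the origin):

```python
import numpy as np

def plane_to_cp(n: np.ndarray, d: float) -> np.ndarray:
    """Plane {x : n·x = d} with unit normal n and distance d > 0 from the
    origin -> its closest point to the origin (3 parameters, no constraint)."""
    return d * n

def cp_to_plane(cp: np.ndarray):
    d = np.linalg.norm(cp)      # degenerates for planes through the origin
    return cp / d, d

n, d = np.array([0.0, 0.0, 1.0]), 1.6   # e.g. ground ~1.6 m below the sensor
cp = plane_to_cp(n, d)
print(cp, cp_to_plane(cp))
```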

Translation Invariant Global Estimation of Heading Angle Using Sinogram of LiDAR Point Cloud

Global point cloud registration is an essential module for localization, whose main difficulty lies in estimating the rotation globally without an initial value. With the aid of gravity alignment, the degrees of freedom in point cloud registration can be reduced to four, so that only the heading angle is required for rotation estimation. In this paper, we propose a fast and accurate global heading angle estimation method for gravity-aligned point clouds. Our key idea is that we generate a translation invariant representation based on the Radon Transform, allowing us to solve the decoupled heading angle globally with circular cross-correlation. Besides, for heading angle estimation between point clouds with different distributions, we implement this heading angle estimator as a differentiable module to train a feature extraction network end-to-end. The experimental results validate the effectiveness of the proposed method in heading angle estimation and show better performance compared with other methods.
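
Once each gravity-aligned cloud is reduced to a 1D translation-invariant angular signature, the heading is the argmax of a circular cross-correlation, computed in O(n log n) via the FFT; a toy sketch with a synthetic signature:

```python
import numpy as np

def heading_offset(sig_a: np.ndarray, sig_b: np.ndarray) -> float:
    """Relative heading (deg) between two 1D angular signatures of length N
    covering 360 degrees, via FFT-based circular cross-correlation."""
    n = len(sig_a)
    corr = np.fft.ifft(np.fft.fft(sig_a) * np.conj(np.fft.fft(sig_b))).real
    return np.argmax(corr) * 360.0 / n

n = 360
sig = np.abs(np.sin(np.linspace(0, 4 * np.pi, n, endpoint=False)))  # toy signature
rotated = np.roll(sig, 37)                                          # rotate by 37 deg
print(heading_offset(rotated, sig))   # ~37.0
```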

LoGG3D-Net: Locally Guided Global Descriptor Learning for 3D Place Recognition

3D point cloud relocalization via learned descriptors, evaluated on KITTI

Retrieval-based place recognition is an efficient and effective solution for re-localization within a pre-built map, or global data association for Simultaneous Localization and Mapping (SLAM). The accuracy of such an approach is heavily dependent on the quality of the extracted scene-level representation. While end-to-end solutions - which learn a global descriptor from input point clouds - have demonstrated promising results, such approaches are limited in their ability to enforce desirable properties at the local feature level. In this paper, we introduce a local consistency loss to guide the network towards learning local features which are consistent across revisits, hence leading to more repeatable global descriptors resulting in an overall improvement in 3D place recognition performance. We formulate our approach in an end-to-end trainable architecture called LoGG3D-Net. Experiments on two large-scale public benchmarks (KITTI and MulRan) show that our method achieves mean F1max scores of 0.939 and 0.968 on KITTI and MulRan respectively, achieving state-of-the-art performance while operating in near real-time. The open-source implementation is available at: https://github.com/csiro-robotics/LoGG3D-Net.

Extrinsic Calibration and Verification of Multiple Non-Overlapping Field of View Lidar Sensors

We demonstrate a multi-lidar calibration framework for large mobile platforms that jointly calibrates the extrinsic parameters of non-overlapping Field-of-View (FoV) lidar sensors, without the need for any external calibration aid. The method starts by estimating the pose of each lidar in its corresponding sensor frame between subsequent timestamps. Since the pose estimates from the lidars are not necessarily synchronous, we first align the poses using a Dual Quaternion (DQ) based Screw Linear Interpolation. Afterward, a Hand-Eye based calibration problem is solved using the DQ-based formulation to recover the extrinsics. Furthermore, we verify the extrinsics by matching chosen lidar semantic features, obtained by projecting the lidar data into the camera perspective after time alignment using vehicle kinematics. Experimental results on data collected from a Scania vehicle (~1 km sequence) demonstrate the ability of our approach to obtain better calibration parameters than the provided vehicle CAD model calibration parameters. This setup can also be scaled to any combination of multiple lidars.

LB-L2L-Calib: Accurate and Robust Extrinsic Calibration for Multiple 3D LiDARs with Long Baseline and Large Viewpoint Difference

A multi-LiDAR system is an important part of V2X (Vehicle to Everything) to enhance the perception information for unmanned vehicles. To fuse the information from multiple 3D LiDARs, accurate extrinsic calibration between the LiDARs is essential. However, the existing multi-LiDAR calibration methods mainly focus on short-baseline scenarios, where multiple LiDARs are closely mounted on a single platform (e.g., an unmanned vehicle). Besides, most methods typically use a planar target for calibration. Some of the methods require the motion of the multi-LiDAR system. The above conditions severely limit the application of these methods to V2X, where LiDARs are non-movable and the baseline and viewpoint difference between the LiDARs can be very large. In order to meet these challenges, we propose an accurate and robust extrinsic calibration method for long baseline multi-LiDAR systems, named LB-L2L-Calib (Large Baseline LiDAR to LiDAR extrinsic Calibration). (1) We use a sphere as the calibration target for multiple LiDARs with large viewpoint difference, leveraging the viewpoint-invariance of the sphere. (2) An improved sphere detection and sphere center estimation strategy is introduced to detect and extract the sphere center from a cluttered point cloud in large-scale outdoor scenarios. (3) An extrinsic parameter regression scheme is introduced. Both simulation and real experiments demonstrate that LB-L2L-Calib is highly accurate and robust. Quantitative results show that the translation and rotation errors are less than 0.01 m and 0.01 degree, respectively (in simulation, with 0.03 m Gaussian noise and a distance and viewpoint difference between two LiDARs of more than 30 m and 90 degrees).

DiSCo-SLAM: Distributed Scan Context-Enabled Multi-Robot LiDAR SLAM with Two-Stage Global-Local Graph Optimization

Multi-robot collaborative SLAM

We propose a novel framework for distributed, multi-robot SLAM intended for use with 3D LiDAR observations. The framework, DiSCo-SLAM, is the first to use the lightweight Scan Context descriptor for multi-robot SLAM, permitting a data-efficient exchange of LiDAR observations among robots. Additionally, our framework includes a two-stage global and local optimization framework for distributed multi-robot SLAM which provides stable localization results that are resilient to the unknown initial conditions that typify the search for inter-robot loop closures. We compare our proposed framework with the widely used distributed Gauss-Seidel (DGS) approach, over a variety of multi-robot datasets, quantitatively demonstrating its accuracy, stability, and data-efficiency.
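
A rough sketch of the Scan Context-style descriptor being exchanged: bin points into a polar (ring x sector) grid around the sensor and keep the maximum height per cell; the grid sizes below are typical defaults, not necessarily the paper's:

```python
import numpy as np

def scan_context(points: np.ndarray, n_rings=20, n_sectors=60, r_max=80.0):
    """Scan Context-style descriptor: a (rings, sectors) matrix holding the
    maximum point height in each polar bin around the sensor."""
    x, y, z = points.T
    r = np.hypot(x, y)
    keep = r < r_max
    ring = (r[keep] / r_max * n_rings).astype(int)
    sector = ((np.arctan2(y[keep], x[keep]) + np.pi) / (2 * np.pi) * n_sectors).astype(int)
    sector = np.clip(sector, 0, n_sectors - 1)
    desc = np.full((n_rings, n_sectors), -np.inf)
    np.maximum.at(desc, (ring, sector), z[keep])
    return np.where(np.isinf(desc), 0.0, desc)   # empty cells -> 0

cloud = np.random.randn(5000, 3) * [20, 20, 1]
print(scan_context(cloud).shape)   # (20, 60)
```

A descriptor this small is cheap to broadcast between robots, which is the data-efficiency argument the abstract makes.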

Ensemble Kalman Filter Based LiDAR Odometry for Skewed Point Clouds Using Scan Slicing

In the presence of fast motion, point clouds obtained from mechanical spinning LiDAR can be easily distorted due to the slow scanning speed of the LiDAR. Existing LiDAR-only odometry algorithms generally ignore this distortion or compensate by linearly interpolating the estimated relative motion between scans. However, when there are abrupt and nonlinear motion changes, the linear interpolation method poorly compensates for the distortions, which can cause significant drift in motion estimates. In this work, we present a LiDAR-only odometry algorithm that estimates motion by slicing LiDAR scans into shorter times to compensate more agilely for point cloud distortions. Observations from only one small scan slice inevitably lack spatial uniqueness, so the multimodal problem needs to be addressed. For LiDAR-only odometry with small scan slices, we introduce the ensemble Kalman filter, a kind of Monte Carlo-based Bayesian filter. The proposed method makes it possible to perform odometry with only a very narrow field of view (FoV), and the robustness to point cloud distortion is improved. We demonstrate the effectiveness of the proposed method through Monte Carlo simulations and several tests with fast-moving scenarios. The experimental results prove the possibility of odometry with a very narrow FoV of down to 10 degrees and robustness against motion distortion.
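
A minimal stochastic ensemble Kalman filter measurement update, the ingredient that lets the method carry a multimodal, sample-based belief; the observation model here is a placeholder, not the paper's slice-registration likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)

def enkf_update(ensemble: np.ndarray, h, z: np.ndarray, R: np.ndarray):
    """Stochastic EnKF update: the Kalman gain comes from ensemble sample
    covariances, and each member assimilates an independently perturbed
    copy of the measurement."""
    Hx = np.array([h(x) for x in ensemble])                    # (N, m)
    X = ensemble - ensemble.mean(axis=0)                       # state anomalies
    Y = Hx - Hx.mean(axis=0)                                   # predicted-obs anomalies
    N = len(ensemble)
    Pxy = X.T @ Y / (N - 1)
    Pyy = Y.T @ Y / (N - 1) + R
    K = Pxy @ np.linalg.inv(Pyy)
    z_pert = z + rng.multivariate_normal(np.zeros(len(z)), R, size=N)
    return ensemble + (z_pert - Hx) @ K.T

ens = rng.normal([0.0, 0.0], 1.0, size=(200, 2))               # toy 2D state guesses
updated = enkf_update(ens, h=lambda x: x[:1], z=np.array([0.8]), R=np.eye(1) * 0.04)
print(updated.mean(axis=0))
```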

Optimal Target Shape for LiDAR Pose Estimation

Targets are essential in problems such as object tracking in cluttered or textureless environments, camera (and multi-sensor) calibration tasks, and simultaneous localization and mapping (SLAM). Target shapes for these tasks typically are symmetric (square, rectangular, or circular) and work well for structured, dense sensor data such as pixel arrays (i.e., image). However, symmetric shapes lead to pose ambiguity when using sparse sensor data such as LiDAR point clouds and suffer from the quantization uncertainty of the LiDAR. This paper introduces the concept of optimizing target shape to remove pose ambiguity for LiDAR point clouds. A target is designed to induce large gradients at edge points under rotation and translation relative to the LiDAR to ameliorate the quantization uncertainty associated with point cloud sparseness. Moreover, given a target shape, we present a means that leverages the target’s geometry to estimate the target’s vertices while globally estimating the pose. Both the simulation and the experimental results (verified by a motion capture system) confirm that by using the optimal shape and the global solver, we achieve centimeter error in translation and a few degrees in rotation even when a partially illuminated target is placed 30 meters away. All the implementations and datasets are available at https://github.com/UMich-BipedLab/global_pose_estimation_for_optimal_shape.

FP-Loc: Lightweight and Drift-Free Floor Plan-Assisted LiDAR Localization

We present a novel framework for floor plan-based, full six degree-of-freedom LiDAR localization. Our approach relies on robust ceiling and ground plane detection, which solves part of the pose and supports the segmentation of vertical structure elements such as walls and pillars. Our core contribution is a novel nearest neighbour data structure for an efficient look-up of nearest vertical structure elements from the floor plan. The registration is realized as a pair-wise regularized windowed pose graph optimization. Highly efficient, accurate and drift-free long-term localization is demonstrated on multiple scenes.

LLOL: Low-Latency Odometry for Spinning Lidars

In this paper, we present a low-latency odometry system designed for spinning lidars. Many existing lidar odometry methods wait for an entire sweep from the lidar before processing the data. This introduces a large delay between the first laser firing and its pose estimate. To reduce this latency, we treat the spinning lidar as a streaming sensor and process packets as they arrive. This effectively distributes expensive operations across time, resulting in a very fast and lightweight system with a much higher throughput and lower latency. Our open source implementation is available at https://github.com/versatran01/llol.

MinkLoc3D-SI: 3D LiDAR Place Recognition with Sparse Convolutions, Spherical Coordinates, and Intensity

3D LiDAR place recognition aims to estimate a coarse localization in a previously seen environment based on a single scan from a rotating 3D LiDAR sensor. The existing solutions to this problem include hand-crafted point cloud descriptors (e.g., ScanContext, M2DP, LiDAR IRIS) and deep learning-based solutions (e.g., PointNetVLAD, PCAN, LPD-Net, DAGC, MinkLoc3D), which are often only evaluated on accumulated 2D scans from the Oxford RobotCar dataset. We introduce MinkLoc3D-SI, a sparse convolution-based solution that utilizes spherical coordinates of 3D points and processes the intensity of 3D LiDAR measurements, improving the performance when a single 3D LiDAR scan is used. Our method integrates the improvements typical for hand-crafted descriptors (like ScanContext) with the most efficient 3D sparse convolutions (MinkLoc3D). Our experiments show improved results on single scans from 3D LiDARs (USyd Campus dataset) and great generalization ability (KITTI dataset). Using intensity information on accumulated 2D scans (RobotCar Intensity dataset) improves the performance, even though the spherical representation doesn't produce a noticeable improvement there. As a result, MinkLoc3D-SI is suited for single scans obtained from a 3D LiDAR, making it applicable in autonomous vehicles.
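
The input conversion implied by the abstract: map Cartesian lidar points to range/azimuth/elevation so that cell density matches a spinning lidar's native sampling, carrying intensity along as a feature channel; a hedged sketch:

```python
import numpy as np

def to_spherical_with_intensity(points: np.ndarray) -> np.ndarray:
    """(N, 4) lidar points [x, y, z, intensity] -> [r, azimuth, elevation, intensity].
    Spherical coordinates better match a spinning lidar's sampling pattern;
    intensity rides along as a per-point feature."""
    x, y, z, i = points.T
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)
    elevation = np.arcsin(z / np.maximum(r, 1e-9))
    return np.stack([r, azimuth, elevation, i], axis=1)

pts = np.array([[10.0, 0.0, -1.5, 0.3], [0.0, 25.0, 2.0, 0.9]])
print(to_spherical_with_intensity(pts))
```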

Fast and Robust Registration of Partially Overlapping Point Clouds

Real-time registration of partially overlapping point clouds has emerging applications in cooperative perception for autonomous vehicles and multi-agent SLAM. The relative translation between point clouds in these applications is higher than in traditional SLAM and odometry applications, which challenges the identification of correspondences and a successful registration. In this paper, we propose a novel registration method for partially overlapping point clouds where correspondences are learned using an efficient point-wise feature encoder, and refined using a graph-based attention network. This attention network exploits geometrical relationships between key points to improve the matching in point clouds with low overlap. At inference time, the relative pose transformation is obtained by robustly fitting the correspondences through sample consensus. The evaluation is performed on the KITTI dataset and a novel synthetic dataset including low-overlapping point clouds with displacements of up to 30m. The proposed method achieves on-par performance with state-of-the-art methods on the KITTI dataset, and outperforms existing methods for low overlapping point clouds. Additionally, the proposed method achieves significantly faster inference times, as low as 410ms, between 5 and 35 times faster than competing methods. Our code and dataset are available at https://github.com/eduardohenriquearnold/fastreg

OpenStreetMap-Based LiDAR Global Localization in Urban Environment without a Prior LiDAR Map

Using publicly accessible maps, we propose a novel vehicle localization method that can be applied without using prior light detection and ranging (LiDAR) maps. Our method generates OSM descriptors by calculating the distances to buildings from a location in OpenStreetMap at regular angles, and LiDAR descriptors by calculating the shortest distances to building points from the current location at the same regular angles. Comparing the OSM descriptors and LiDAR descriptors yields a highly accurate vehicle localization result. Compared to methods that use prior LiDAR maps, our method presents two main advantages: (1) vehicle localization is not limited to only places with previously acquired LiDAR maps, and (2) our method is comparable to LiDAR map-based methods, and especially outperforms the other methods with respect to the top-one candidate on KITTI dataset sequence 00.
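
A rough reconstruction of the descriptor comparison: cast rays at regular angles, record the distance to the nearest building point per angular bin, and score candidate headings by circularly shifting one descriptor against the other (the bin counts and ranges below are invented):

```python
import numpy as np

def ray_descriptor(building_pts: np.ndarray, origin: np.ndarray,
                   n_angles=72, r_max=100.0) -> np.ndarray:
    """For each of n_angles directions, the distance from `origin` to the
    nearest building point whose bearing falls in that angular bin."""
    d = building_pts - origin
    rng_ = np.hypot(d[:, 0], d[:, 1])
    bearing = (np.arctan2(d[:, 1], d[:, 0]) + 2 * np.pi) % (2 * np.pi)
    bins = (bearing / (2 * np.pi) * n_angles).astype(int) % n_angles
    desc = np.full(n_angles, r_max)
    np.minimum.at(desc, bins, rng_)
    return desc

def best_match(osm_desc: np.ndarray, lidar_desc: np.ndarray):
    """Score a candidate location: try every heading (circular shift) and keep
    the smallest mean absolute difference between the two descriptors."""
    costs = [np.abs(np.roll(lidar_desc, -s) - osm_desc).mean()
             for s in range(len(osm_desc))]
    return int(np.argmin(costs)), float(np.min(costs))

buildings = np.random.uniform(-80, 80, size=(400, 2))
desc = ray_descriptor(buildings, origin=np.zeros(2))
print(best_match(desc, np.roll(desc, 10)))   # heading shift recovered: (10, 0.0)
```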

CT-ICP: Real-Time Elastic LiDAR Odometry with Loop Closure

Multi-beam LiDAR sensors are increasingly used in robotics, particularly with autonomous cars for localization and perception tasks, both relying on the ability to build a precise map of the environment. For this, we propose a new real-time LiDAR-only odometry method called CT-ICP (for Continuous-Time ICP), completed into a full SLAM with a novel loop detection procedure. The core of this method is the introduction of combined continuity within the scan matching and discontinuity between scans. This allows both the elastic distortion of the scan during registration for increased precision, and increased robustness to high-frequency motions from the discontinuity. We build a complete SLAM on top of this odometry, using a fast pure-LiDAR loop detection based on elevation-image 2D matching, providing a pose graph with loop constraints. To show the robustness of the method, we tested it on seven datasets: KITTI, KITTI-raw, KITTI-360, KITTI-CARLA, ParisLuco, Newer College, and NCLT in driving and high-frequency motion scenarios. Both the CT-ICP odometry and the loop detection are made available online. CT-ICP is currently first, among those giving access to a public code, on the KITTI odometry leaderboard, with an average Relative Translation Error (RTE) of 0.59% and an average time per scan of 60 ms on a CPU with a single thread.
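
The continuous-time idea can be illustrated by de-skewing: give every point its own pose, interpolated between the sweep's begin and end poses at the point's timestamp. A sketch using SLERP for rotations (a simplification of CT-ICP's elastic formulation):

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew(points, stamps, pose_begin, pose_end):
    """Transform each point with a pose interpolated at its own timestamp
    (stamps normalized to [0, 1] across the sweep), instead of one pose per scan."""
    (R0, t0), (R1, t1) = pose_begin, pose_end
    slerp = Slerp([0.0, 1.0],
                  Rotation.from_quat(np.vstack([R0.as_quat(), R1.as_quat()])))
    Rs = slerp(stamps)                                        # one rotation per point
    ts = (1.0 - stamps)[:, None] * t0 + stamps[:, None] * t1  # lerped translations
    return Rs.apply(points) + ts

pts = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
stamps = np.array([0.0, 1.0])                 # first and last firing of the sweep
begin = (Rotation.identity(), np.zeros(3))
end = (Rotation.from_euler("z", 5, degrees=True), np.array([1.0, 0.0, 0.0]))
print(deskew(pts, stamps, begin, end))
```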

Globally Consistent and Tightly Coupled 3D LiDAR Inertial Mapping

This paper presents a real-time globally consistent mapping framework based on LiDAR-IMU tight coupling. The proposed framework comprises a preprocessing module and three estimation modules: odometry estimation, local mapping, and global mapping, which are all based on the tight coupling of the GPU-accelerated voxelized GICP matching cost factor and the IMU preintegration factor. The odometry estimation module employs a keyframe-based fixed-lag smoothing approach for efficient and low-drift trajectory estimation, with a bounded computation cost. The global mapping module constructs a factor graph that minimizes the global registration error over the entire map with the support of IMU constraints, ensuring robust optimization in feature-less environments. The evaluation results on the Newer College dataset and KAIST urban dataset show that the proposed framework enables accurate and robust localization and mapping in challenging environments.

ART-SLAM: Accurate Real-Time 6DoF LiDAR SLAM

Real-time six degrees-of-freedom pose estimation with ground vehicles represents a relevant and well-studied topic in robotics due to its many applications such as autonomous driving and 3D mapping. Although some systems already exist, they are either not accurate or they struggle in real-time settings. In this paper, we propose a fast, accurate and modular LiDAR SLAM system for both batch and online estimation. We first apply downsampling and outlier removal, to filter out noise and reduce the size of the input point clouds. Filtered clouds are then used for pose tracking, possibly aided by a pre-tracking module, and for floor detection, used to ground-constrain the estimated trajectory. Efficient multi-step loop closure and pose optimization, achieved through a g2o pose graph, are the last steps of the proposed SLAM pipeline. We compare the performance of our system with state-of-the-art point cloud-based methods, LOAM, LeGO-LOAM, A-LOAM, LeGO-LOAM-BOR, LIO-SAM and HDL, and show that the proposed system achieves equal or better accuracy and can easily handle even cases without loops. The comparison is done by evaluating the estimated trajectory displacement using the KITTI (urban driving) and Chilean (underground mine) datasets.

Interval-Based Visual-Inertial LiDAR SLAM with Anchoring Poses

We present a novel interval-based visual-inertial LiDAR SLAM (i-VIL SLAM) method that solely assumes sensor errors to be bounded and propagates the error from the input sources to the estimated map and trajectory using interval analysis. The method allows us to restrict the solution set of the robot poses and the positions of the landmarks to the set that is consistent with the measurements. If the error limits are not violated, it is guaranteed that the estimated set contains the true solution. The accumulation of the uncertainty is stabilized by anchoring poses derived from GNSS/INS data. Furthermore, for the first time we compare confidence ellipses determined by a classical SLAM graph optimization approach with the interval estimates of the robot poses provided by our method. In this work, we experimentally show that the marginal covariances computed by the classical SLAM graph optimization are overconfident and underestimate the uncertainty of the poses. While the 99.9%-ellipsoids derived from the marginal covariances of the poses only enclose less than 64% of the ground truth in the worst case, our method provides interval bounds for the pose parameters that enclose the ground truth for more than 96% of all frames.

LT-Mapper: A Modular Framework for LiDAR-Based Lifelong Mapping

Long-term 3D map management is a fundamental capability required by a robot to reliably navigate in the non-stationary real-world. This paper develops open-source, modular, and readily available LiDAR-based lifelong mapping for urban sites. This is achieved by dividing the problem into successive subproblems: multi-session SLAM (MSS), high/low dynamic change detection, and positive/negative change management. The proposed method leverages MSS and minimizes potential trajectory error; thus, a manual or good initial alignment is not required for change detection. Our change management scheme preserves efficacy in both memory and computation costs, providing automatic object segregation from a large-scale point cloud map. We verify the framework’s reliability and applicability even under permanent year-level variation, through extensive real-world experiments with multiple temporal gaps (from day to year).
