Remember the last time you visited somewhere new? If you are reading this in 2020, you might have to rack your brain for those happy days. It may have been a holiday trip to an exotic place. You are excited to explore the area and check out all the nice restaurants and attractions. What would you do?
Yes, you read a map! (…or so it used to be)
Life has become all too simple nowadays with the intelligent little devices in our pockets, but let's just imagine we are back in the Stone Age. What if your phone has died and your only hope is the tourist guidebook in your backpack?
The first challenge is to find where you are. If you don't have a GPS, you can't just follow the blue blinking dot. You will probably look for street names or distinctive landmarks nearby, and eventually locate yourself. In robotics terms, this is called localisation. Once you know where you are and where you are going, you're all sorted.
Well, what if you don’t have a map to begin with?
In theory, we can always explore the area to figure out where everything is relative to everything else and memorise it, constructing a mental map. Even if we don't have a ruler and a protractor, we roughly know how long our strides are and what a 90-degree turn looks like. As we move, we know how far we have moved by identifying landmarks we have previously observed. Furthermore, as soon as we see new landmarks and scenes, we can add those to our mental map to expand our territory of known places.
Many things are happening here. We used both (i) the estimate of our motion dynamics, and (ii) triangulation (broadly speaking) from known landmarks. Moreover, we (iii) updated our map as we obtained new observations. (i) tells us the change in our position, i.e. odometry. If we have a good idea of where we were previously, we can use the estimated odometry to figure out where we are now. Most robots operate on wheels, so it is fairly easy to measure how much they've moved. Some robots even have sensors called inertial measurement units (IMUs) that measure their body motion. Even so, we cannot rely on odometry alone to know where we are. That is like walking with our eyes closed. At some point, we have to open our eyes and correct for the accumulated error. That is what (ii) does. For (ii), we already need a half-built map to correct our localisation. But in (iii), we use our current best guess of where we are in order to update the map. This is why localisation and mapping have to happen simultaneously.
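To make (i) concrete, here is a minimal Python sketch of dead reckoning, i.e. integrating odometry alone. The motion and noise values are invented for illustration; the point is that without the corrections of (ii), the error can only grow:

```python
import math
import random

# A robot integrates each odometry reading (distance travelled, change in
# heading) into its 2D pose estimate (x, y, theta). The noise levels here
# are made up for illustration.
x, y, theta = 0.0, 0.0, 0.0

for step in range(100):
    d_true, dtheta_true = 0.1, 0.02                # what the robot actually did
    d = d_true + random.gauss(0, 0.005)            # wheel encoders are noisy...
    dtheta = dtheta_true + random.gauss(0, 0.002)  # ...and so are gyros

    theta += dtheta                                # dead reckoning: just integrate
    x += d * math.cos(theta)
    y += d * math.sin(theta)

# With no landmark corrections, the small per-step errors compound, so the
# final pose estimate drifts away from the truth.
print(f"estimated pose after 100 steps: x={x:.2f}, y={y:.2f}, theta={theta:.2f}")
```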
Even if we take carefully measured strides and construct the map with care, it is impossible to rule out all sources of noise. After a couple of turns, the error accumulates and we are no longer certain of our location. How do we recover, then?
This is where loop closure comes into play. Assuming that buildings and trees don’t move, we know when we do a full loop and come back to the same place we have been before (loop detection). We can then correct our mental map and adjust for the noise that had accumulated (loop closure).
The above is the core idea behind Simultaneous Localisation and Mapping (SLAM), which is used very widely in any kind of robotics application that requires the robot to move around a new environment. It doesn't have to be just robots — it can be used for smartphones and other devices with a camera too. Augmented Reality (AR), for instance, requires the virtual world to be mapped onto the real scenery with precision, so some advanced use cases would require SLAM as part of the pipeline.
Now that you have constructed a map, you are armed with knowledge about the environment, ready to be used for future navigation. If you ever need to return to a place you've previously visited, you don't even have to take the same route. You can take shortcuts and plan strategically (similarly to how you optimise your shopping route).
Unlike humans, different robots have different sensors and use cases, and there are many approaches to solving the same problem. That is why there are so many variations of SLAM systems out there. Here, I will try to present a high-level picture of some of the major approaches and how they differ from each other. For a more detailed, comprehensive and accurate description, please refer to some of the survey papers [1, 2, 3].
Perfection or efficiency? — SLAM, SfM or VO
Even before we get started with different approaches in SLAM, let's talk about the siblings of SLAM: Structure from Motion (SfM) [4] and Visual Odometry (VO) [1]. All these fields address the same problem — for a collection of images, estimate where the camera was when each image was taken, so that the images tell a consistent story when they are reverse-projected back into 3D world space. They share many procedural steps, such as feature detection, feature matching, and optimising the estimated pose based on a consistency metric. However, the constraints they assume are slightly different.
Structure from Motion is the problem of reconstructing the 3D structure of the environment from a collection of 2D images. The images could be ordered or unordered, hence SfM cannot rely on odometry. Visual SLAM poses a more specific problem in which the images are captured sequentially by a camera that is moving through an environment, following some kind of trajectory. Some SfM algorithms focus more on accuracy and global consistency than on time constraints, and can work at a very large scale if need be. For instance, SfM can be applied to reconstruct a subset of the globe in 3D from Google Street View images. Obviously, this takes a huge amount of time and resources, so such algorithms aren't expected to run in real time. Hence, these are called offline methods. In contrast, the main purpose of SLAM is to aid the robot in its exploration and navigation, so real-time execution is crucial. Hence, SLAM falls under the category of online methods. (However, this distinction has become blurred with Incremental SfM, which operates online, and many SLAM methods have borrowed ideas from Incremental SfM.)
Visual Odometry simplifies the SLAM problem, focussing more on computational and resource efficiency. Rather than ensuring global consistency of the map it creates, as SLAM does, VO's main concern is local consistency. Its goal is to estimate the odometry (the difference in camera pose between two images taken one after another) without having to rely on sensors such as an IMU or ground-truth odometry measurements. If we don't care about global consistency, the loop closure step can be taken away, saving us some computation and making the system simpler.
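As a rough sketch of what chaining these relative poses means, the snippet below (with made-up frame-to-frame motions) composes 2D rigid transforms to recover a trajectory; an error in any one link shifts every pose after it, which is why VO is only locally consistent:

```python
import numpy as np

def se2(dx, dy, dtheta):
    """3x3 homogeneous transform for a 2D rigid motion."""
    c, s = np.cos(dtheta), np.sin(dtheta)
    return np.array([[c, -s, dx],
                     [s,  c, dy],
                     [0., 0., 1.]])

# Hypothetical frame-to-frame motions that a VO front end might estimate.
relative_motions = [se2(0.5, 0.0, 0.10),
                    se2(0.4, 0.1, 0.05),
                    se2(0.5, -0.1, 0.00)]

pose = np.eye(3)                       # world pose of the first camera frame
for T in relative_motions:
    pose = pose @ T                    # chain each relative motion
    print(f"x={pose[0, 2]:.2f}, y={pose[1, 2]:.2f}")
```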
Why do we have two eyes? — Stereo, Monocular, or Depth Camera
Even if we were able to localise the camera, it would be a struggle to reconstruct the surroundings without knowing the depth of each pixel of the image, i.e. how far the surface is from the camera. Humans and most animals have two eyes, because that allows us to figure out where objects are relative to us using triangulation. Similarly, the depth can be inferred using stereo cameras, with some extra work of finding correspondences between the two images by feature matching. There also exist depth cameras, which provide depth information in addition to RGB, either by measuring the time of flight or the pattern density of the infrared light emitted from the camera. There is one disadvantage though — unlike stereo cameras, depth cameras only work over short distances (e.g. indoors).
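For a calibrated, rectified stereo pair the triangulation boils down to one formula: depth is inversely proportional to disparity. The numbers below are illustrative rather than from a real calibration:

```python
# A point appears at column x_left in the left image and x_right in the
# right image. For a rectified pair, disparity d = x_left - x_right and
# depth Z = f * B / d, with f the focal length in pixels and B the baseline
# (distance between the two cameras) in metres.
f = 700.0                       # focal length in pixels (illustrative)
B = 0.12                        # baseline in metres (illustrative)
x_left, x_right = 420.0, 392.0

disparity = x_left - x_right
Z = f * B / disparity
print(f"depth: {Z:.2f} m")      # near surfaces: large disparity; far: small
```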
Even with just a single camera, the depth can be inferred by comparing frames taken at different camera positions. In this case, the scale of the reconstructed map is unknown unless we use some other means to infer the scale.
Some other SLAM systems are developed for laser or LiDAR scanners.
How do we find correspondences and consistency between frames? — Feature-based Method vs Direct Method
As we explore, we identify salient features such as landmarks and objects, which we use to match pieces of observations together and construct a consistent mental map. We now need to invent a convenient definition of “salient features” that our system can detect and match.
While landmarks such as buildings and signposts work well for humans, it is much easier for machines to identify and match low-level features such as corners, edges and blobs. More sophisticated feature definitions, together with detection algorithms and descriptors (a distinct representation of each feature), have been invented, such as the Scale-Invariant Feature Transform (SIFT) [5], Speeded Up Robust Features (SURF) [6] and Oriented FAST and Rotated BRIEF (ORB) [7, 8]. These features are designed to be robust to translation, rotation, and variations in scale, viewpoint, lighting, etc.
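As a taste of what a feature-based front end looks like, here is a minimal sketch using OpenCV's ORB implementation; the image filenames are placeholders:

```python
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # placeholder filenames
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)   # keypoints + binary descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# ORB descriptors are binary strings, so Hamming distance is the natural metric.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} putative correspondences between the two frames")
```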
A limitation of the feature-based approach is that, once the features are extracted, all other information contained in the image is disregarded. This could be problematic when feature matching alone cannot offer robust reconstruction, e.g. in environments with too many or very few salient features, or with repetitive textures.
An alternative approach to feature-based matching is minimising the photometric (light intensity) error, which is called the direct method. Well-known examples are LSD-SLAM [9] and Direct Sparse Odometry [10]. These methods bypass the computationally expensive process of feature extraction and matching, and can benefit from the rich information in the environment. However, they are sensitive to changes in lighting and to large motions.
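The objective a direct method optimises is easy to write down, even though real systems evaluate it far more cleverly. Below is a toy sketch, where warp stands in for the true reprojection of each pixel under a candidate camera motion and depth estimate:

```python
import numpy as np

def photometric_error(ref_img, cur_img, warp):
    """Sum of squared intensity differences under a candidate warp."""
    err = 0.0
    h, w = ref_img.shape
    for v in range(h):
        for u in range(w):
            u2, v2 = warp(u, v)                        # where pixel (u, v) lands
            u2, v2 = int(round(u2)), int(round(v2))
            if 0 <= u2 < w and 0 <= v2 < h:
                diff = float(ref_img[v, u]) - float(cur_img[v2, u2])
                err += diff * diff
    return err

# A direct method searches over candidate camera motions for the warp that
# minimises this error, rather than matching a sparse set of features.
```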
How do we adjust camera poses? — RANSAC, Iterative Closest Point
Now that we have identified some correspondences between successive frames, it is time to find the camera motion that best explains those matches. While something as simple as least squares would work well if all our matches were correct, we have to bear in mind that our correspondences are noisy and inaccurate. RANSAC [11] is a method widely used to estimate a transform from one set of data points to another. It randomly samples the minimum number of paired coordinates (Xᵢ, Yᵢ) required to estimate a transformation from set X to set Y. For every estimated transformation, the transformation is applied to all remaining data points in set X to see if the transformed points are close enough to their corresponding matches in set Y (these are called inliers). This process is repeated, and the best transform, i.e. the one that yielded the most inliers, is kept as the final estimate.
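Here is a minimal RANSAC sketch in Python. To keep the minimal sample to a single pair, it estimates a pure 2D translation on synthetic data; real systems estimate full rigid or projective transforms with exactly the same loop:

```python
import random
import numpy as np

def ransac_translation(X, Y, iters=200, thresh=0.1):
    best_t, best_inliers = None, 0
    for _ in range(iters):
        i = random.randrange(len(X))
        t = Y[i] - X[i]                  # hypothesis from a minimal sample
        inliers = int(np.sum(np.linalg.norm(X + t - Y, axis=1) < thresh))
        if inliers > best_inliers:
            best_t, best_inliers = t, inliers
    return best_t, best_inliers

# Synthetic matches: the true translation is (1, 2); a quarter are outliers.
X = np.random.rand(40, 2)
Y = X + np.array([1.0, 2.0])
Y[:10] = np.random.rand(10, 2)           # bad matches
t, n = ransac_translation(X, Y)
print(f"estimated translation {t}, supported by {n} inliers")
```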
Once we apply RANSAC, we have a fairly good estimate of the camera motion between two frames X and Y. Note, though, that this was estimated only from the feature matches we have, which do not utilise all the information obtainable from the images. We can further refine our estimate by reverse-projecting the images of each frame into 3D as point clouds, and trying to align these point clouds together.
Iterative Closest Point (ICP) is a common technique in computer vision for aligning 3D points obtained from different frames. Using the estimate given by RANSAC as the initial transform, ICP matches the closest points in the two point clouds using a fast nearest-neighbour search, and re-estimates the transformation to minimise the distances between the matches. This process is performed iteratively until convergence.
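A compact sketch of the ICP loop for 2D point clouds is shown below; the closed-form alignment step is the classic Kabsch/Procrustes solution, and real systems replace the brute-force matching with a k-d tree:

```python
import numpy as np

def icp(P, Q, iters=20):
    """Align source points P to target points Q by iterating match-and-solve."""
    P = P.copy()
    for _ in range(iters):
        # associate each source point with its nearest target point
        idx = np.argmin(((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1), axis=1)
        Qm = Q[idx]
        # closed-form rigid alignment of the matched sets (Kabsch)
        cp, cq = P.mean(axis=0), Qm.mean(axis=0)
        U, _, Vt = np.linalg.svd((P - cp).T @ (Qm - cq))
        if np.linalg.det(U @ Vt) < 0:    # guard against reflections
            Vt[-1] *= -1
        R = (U @ Vt).T
        P = (P - cp) @ R.T + cq          # apply the estimated transform
    return P
```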
How do we correct the map? — Filter Methods vs Bundle Adjustment
As the robot moves around, we may want to adjust our previous estimates of the camera poses, especially when loop closing happens, i.e. when the robot revisits somewhere we've seen previously. To make matters more complicated, adjusting a camera pose also means amending the part of the map we built using that camera pose. Again, this is why localisation and mapping is a chicken-and-egg problem.
[Figure: T = camera trajectory, x = features]

There are two main approaches to correcting the camera pose and map estimates — filter methods and bundle adjustment (BA) methods. The major difference is that, whereas bundle adjustment in its most basic form re-evaluates all the camera poses and feature positions in the map from scratch at every time step as a maximum likelihood estimate, filter methods represent their estimates of the camera and feature poses as a probability density function, allowing those estimates to be incrementally updated as new observations are taken into account. The Extended Kalman Filter (EKF), which gives a Bayesian solution to the state estimate update given new observations, is typically used. A multivariate Gaussian is a common choice to model the probabilities for the state estimate and the observation noise. Particle Filters, which do not assume a shape for the probability density function, are another option within the family of filter methods.
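A generic EKF step looks like the sketch below; in a filter-based SLAM the state mu stacks the camera pose and all landmark positions, f and h are the motion and observation models, and F and H are their Jacobians evaluated at the current estimate (all supplied by the caller in this sketch):

```python
import numpy as np

def ekf_step(mu, Sigma, u, z, f, h, F, H, Q, R):
    """One EKF predict + update cycle (a generic textbook sketch)."""
    # predict: push the state and its uncertainty through the motion model
    mu_pred = f(mu, u)
    Sigma_pred = F @ Sigma @ F.T + Q             # Q: motion noise covariance
    # update: correct the prediction using the new observation z
    S = H @ Sigma_pred @ H.T + R                 # innovation covariance
    K = Sigma_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    mu_new = mu_pred + K @ (z - h(mu_pred))
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_pred
    return mu_new, Sigma_new
```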
While filter methods seem to have an advantage over BA methods in terms of reusing previous estimates, there is a hidden cost. Because filter methods only retain the most recent camera pose estimate together with the feature nodes in the map, when it comes to updating the map based on accumulated error, we have to compute how every feature node affects every other feature node. This is captured by the covariance matrix of the multivariate Gaussian, which grows as more and more features get added to the map.
A more computationally efficient approach to BA would be to evaluate the optimal solution on a subset of past camera poses and features. Typically, keyframes with distinct features are chosen at regular intervals, and only these are used to compute the optimal poses, in order to reduce computational complexity. Further techniques have been developed to reduce the computational complexity of real-time BA, such as Pose Graph Optimisation. For a more in-depth comparison, see [12].
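The quantity BA minimises is the reprojection error, summed over every (camera, landmark) observation. The naive evaluation below is just to pin down that objective; production solvers exploit the sparsity of the problem rather than looping like this, and the focal length is an illustrative placeholder:

```python
import numpy as np

def project(R, t, X, f=700.0):
    """Pinhole projection of world point X into a camera with pose (R, t)."""
    Xc = R @ X + t                    # world point in camera coordinates
    return f * Xc[:2] / Xc[2]         # perspective division to pixels

def reprojection_error(poses, points, observations):
    # observations: list of (camera index, point index, measured pixel (u, v))
    err = 0.0
    for cam_idx, pt_idx, uv in observations:
        R, t = poses[cam_idx]
        err += np.sum((project(R, t, points[pt_idx]) - uv) ** 2)
    return err

# BA searches over all poses and points to minimise this error; keyframe-based
# systems restrict the search to a subset of poses to keep it tractable.
```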
When do we close the loop? — Place Recognition
As we observed at the beginning of this article, loop closure is key to ensuring global consistency of the map that we construct. This aspect also distinguishes SLAM from other methods such as Visual Odometry.
There are three ways to find out whether you came back to the same place. You can either compare the current image frame to previous frames (image-to-image), compare the current frame against the map (image-to-map), or compare the current location in the map to other regions in the map (map-to-map).
For image-to-image comparison, Bag of Words (BoW) is an effective way to quickly narrow down candidates for corresponding frames. Bag of Words is a rather lazy way of describing images — basically, it only cares about what kinds of local features appear in the image, and throws away all the high-level spatial context. However, this is effective for quickly looking up images that may have been taken from the same place.
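A minimal sketch of the BoW idea: quantise every local descriptor to its nearest "visual word" (a cluster centre learned offline, e.g. by k-means over many training descriptors) and summarise the whole image as a histogram of word counts:

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """descriptors: (n, d) local features; vocabulary: (k, d) visual words."""
    dists = np.linalg.norm(
        descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)       # nearest visual word per feature
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / (hist.sum() + 1e-9)      # normalise so image sizes compare

def similarity(h1, h2):
    """Cosine similarity: near 1 suggests the two images share a place."""
    return float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2) + 1e-9))
```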
Challenges in SLAM
We have barely scratched the surface of research in SLAM. We have yet to discuss some of the typical challenges in SLAM and how some systems overcome them. The ultimate goal is to make SLAM handle environments with too many or very few salient features, large-scale environments, dynamic environments (i.e. with lots of moving objects or people), non-smooth movements of the camera, occlusions, transparency and reflection, while meeting computation time, memory and sensory device constraints.
[1] K. Yousif et al. An Overview to Visual Odometry and Visual SLAM: Applications to Mobile Robotics (2015), Intelligent Industrial Systems
[2] J. Fuentes-Pacheco et al. Visual Simultaneous Localization and Mapping: A Survey (2015), Artificial Intelligence Review
[3] M. R. U. Saputra et al. Visual SLAM and Structure from Motion in Dynamic Environments: A Survey (2018), ACM Computing Surveys
[4] O. Ozyesil et al. A Survey of Structure from Motion (2017), Acta Numerica
[5] D. G. Lowe Distinctive Image Features from Scale-Invariant Keypoints (2004), International Journal of Computer Vision
[6] H. Bay et al. SURF: Speeded Up Robust Features (2006), ECCV
[7] E. Rublee et al. ORB: an efficient alternative to SIFT or SURF (2011), ICCV
[8] R. Mur-Artal et al. ORB-SLAM: A Versatile and Accurate Monocular SLAM System (2015), IEEE Transactions on Robotics
[9] J. Engel et al. LSD-SLAM: Large-Scale Direct Monocular SLAM (2014), ECCV
[10] J. Engel et al. Direct Sparse Odometry (2018), IEEE Transactions on Pattern Analysis and Machine Intelligence
[11] M. Fischler et al. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography (1981), Communications of the ACM
[12] H. Strasdat et al. Visual SLAM: Why Filter? (2012), Image and Vision Computing
Translated from: https://medium.com/swlh/how-robots-make-maps-an-intro-to-slam-simultaneous-localisation-and-mapping-37370c3e7dfe