喜乐boy

VIO系列——从滤波和优化的角度研究视觉惯性里程计

A review of visual inertial odometry from filtering and optimisation perspectives

摘要：

Visual inertial odometry (VIO) is a technique to estimate the change of a mobile platform in position and orientation overtime using the measurements from on-board cameras and IMU sensor. Recently, VIO attracts significant attentions from large number of researchers and is gaining the popularity in various potential applications due to the miniaturisation in size and low cost in price of two sensing modularities. However, it is very challenging in both of technical development and engineering implementation when accuracy, real-time performance, robustness and operation scale are taken into consideration. This survey is to report the state of the art VIO techniques from the perspectives of filtering and optimisation-based approaches, which are two dominated approaches adopted in the research area. To do so, various representations of 3D rigid motion body are illustrated. Then filtering-based approaches are reviewed, and followed by optimisation-based approaches. The links between these two approaches will be clarified via a framework of the Bayesian Maximum A Posterior. Other features, such as observability and self calibration, will be discussed.
视觉惯性里程计（VIO）是一种使用车载摄像头和 IMU 传感器的测量值估计移动平台随时间推移的位置和方向变化的技术。最近，VIO引起了大量研究人员的极大关注，并且由于两类传感模块的尺寸小型化和低成本的价格，在各种潜在应用中越来越受欢迎。然而，在技术开发和工程实施中，当考虑到准确性、实时性能、鲁棒性和操作规模时，这是非常具有挑战性的。本文将从研究领域采用的两种主要方法，即滤波和基于优化的方法的角度报告最先进的VIO技术。为此，列举了 3D 刚性运动体的各种表示形式。而后调研基于滤波的方法和基于优化的方法。这两种方法之间的联系将通过贝叶斯最大值A后验框架来阐明。其他特点也将被讨论，例如可观测性和自校准。

Introduction

Localisation and mapping are two fundamental problems in the research area of robot navigation and control. This is evidenced by the fact that simultaneous localisation and mapping (SLAM) techniques and structure from motion (SFM)
techniques [1] have been one of major topics in robotic and computer vision research communities, respectively, for many years. With the increased interest in applying these techniques to small-sized platforms, such as small-sized unmanned aerial vehicles (UAVs) or handhold mobile devices, the research focus of localisation and mapping is shifted towards the use of cameras and inertial measurement unit (IMU) sensor. These sensors are made available nowadays with high accuracy, miniaturised size, and low cost because of the fast-developing manufacturing of chips and micro-electromechanical systems (MEMS) devices. And they are complimentary with one another in a way which would be able to compensate for the errors made by each of them via the redundant information they provided. Furthermore, the evidence from biological studies shows that the navigation of human beings and some of animals is partly depending on various forms of the combination between motion sensing modalities and vision.[2–4]
定位和建图是机器人导航与控制研究领域的两个基本问题。多年来，同时定位与建图（SLAM）技术和运动结构恢复（SFM）技术[1]一直是机器人和计算机视觉研究界的主要课题之一。随着人们越来越关注将这些技术应用于小型平台，例如小型无人机（UAV）或手持移动设备，定位和建图的研究重点转向使用相机和惯性测量单元（IMU）传感器。由于芯片和微机电系统（MEMS）器件的快速发展，这些传感器如今具有高精度、小型化尺寸和低成本。它们彼此互补，能够通过他们提供的冗余信息来弥补它们各自的误差。此外，生物学研究的证据表明，人类和一些动物的导航部分取决于运动传感模式和视觉之间的各种形式的组合。[2–4]

Localisation and mapping problem is a state estimation problem in robotic community. Stochastic estimation algorithms for dynamic systems given noisy measurements, such as extended Kalman filter (EKF) or particle filters(PF), are the main stream tools being used. Some proprioceptive sensors provide the measurements on change in pose overtime and are formulated as a data-driven dynamic model of mobile platforms. The examples include optical encoders in ground mobile robots and IMUs in flying robots. They are well known for the accumulated errors and biased measurements. Some of exteroceptive sensors, such as cameras or laser ranging finders, are able to provide angular or range measurements. Based on triangulation or trilateration methods, they are able to estimate the position or orientation of mobile platforms. The estimation reliability heavily depends on environment conditions and the results are sensitive to the noise of measurements. However, they can be formulated as the measurement models in state estimation problems. Then, EKF or PF is able to propagate the estimated distribution from previous step to current step by fusing the distribution in dynamic model and the distribution in measurement models. In most cases, EKF is used instead of PF due to the large number of states to be estimated. Then, the means and covariances of Gaussian distributions are estimated.
定位和建图问题是机器人社区中的状态估计问题。给定噪声测量的动态系统的随机估计算法，例如扩展卡尔曼滤波器（EKF）或粒子滤波器（PF），是正在使用的主流工具。一些本体感受传感器提供姿态随时间变化的测量，并被表述为移动平台的数据驱动动态模型。这些示例包括地面移动机器人中的光学编码器和飞行机器人中的IMU。它们的累积误差和偏置测量众所周知。一些外部传感器，如相机或激光测距仪，能够提供角度或距离测量。基于三角测量或三边测量方法，他们能够估计移动平台的位置或方向。估计可靠性在很大程度上取决于环境条件，并且结果对测量噪声很敏感。但是，它们可以表述为状态估计问题中的测量模型。然后，EKF 或 PF 能够通过融合动态模型中的分布和测量模型中的分布，将估计分布从上一步传播到当前步骤。在大多数情况下，由于需要估计的状态数量很多，因此使用 EKF 代替 PF。然后，估计高斯分布的均值和协方差。

Motion estimation from vision is an optimisation problem in computer vision community.[5] The change in camera pose is iteratively computed via image alignment.[6,7] Normally, features are extracted from one image and reprojected to another image captured from a different view. Then, the error cost between features and reprojected features is minimised to find the pose change of camera and the structures of environment. The optimisation-based algorithms are mainly gradient based, such as Newton’s method or Gauss–Newton’s method. In most cases, the sparse structure of matrix is exploited to increase the computational efficiency. Only the values of estimated states are provided while the information on estimation distributions is not available.
视觉运动估计是计算机视觉社区中的一个优化问题。[5] 相机姿态的变化是通过图像对齐迭代计算的。[6，7] 通常，特征是从一张图像中提取的，然后重新投影到从不同视角捕获的另一张图像上。然后，最小化特征和重新投影特征之间的误差，以找到相机的姿势变化和环境结构。基于优化的算法主要是基于梯度的，例如牛顿方法或高斯-牛顿方法。在大多数情况下，利用矩阵的稀疏结构来提高计算效率。仅提供估计状态的值，而没有有关估计分布的信息。

The link between filtering and optimisation-based approaches can be built up within the framework of Bayesian inference. The optimisation-based approaches are viewed as a maximum likelihood (ML) formula where the state for which the total probability of measurements is highest is iteratively found. The filtering-based approaches are viewed as a Maximum A Posterior (MAP) formula, where the prior distribution of platform pose is constructed from the measurements of proprioceptive sensors and the likelihood distribution is built up with the measurements of exteroceptive sensors. For non-linear dynamic model and/or non-linear measurement model, iterated EKF is equivalent to the optimisation-based approaches where iterative updates on each single step in EKF are conducted. The optimisation-based approaches could be reformulated as a MAP problem from a ML problem by adding a regularisation term or prior term from the measurement of pro-prioceptive sensors or other sources. These are on-line or ‘causal’algorithms where the current estimation depends on current and previous measurements. The off-line or ‘non-casual’ algorithms are batch processing procedures where the current estimation depends on full data-set or it can be said they depend on not only current and previous measurements but also future measurements. In optimisation-based approaches, the batch processing is melt down to solving a group of linear algebraic equations. In filtering-based approaches, the Kalman smoother is able to find the posterior Gaussian distribution via a forward pass and a backward pass.
滤波和基于优化的方法之间的联系可以在贝叶斯推理的框架内建立起来。基于优化的方法被视为最大似然（ML）公式，其中迭代找到测量总概率最高的状态。基于滤波的方法被视为最大后验（MAP）公式，其中平台姿势的先验分布是根据本体感觉传感器的测量值构建的，可能性分布是通过外感受传感器的测量值建立的。对于非线性动态模型和/或非线性测量模型，迭代 EKF 等效于基于优化的方法，其中对 EKF 中的每个步骤进行迭代更新。基于优化的方法可以通过添加正则化项或来自亲预感受传感器或其他来源测量的先前项，从 ML 问题重新表述为 MAP 问题。这些是在线或“因果”算法，其中电流估计取决于当前和以前的测量。离线或“非随意”算法是批处理过程，其中当前的估计取决于完整的数据集，或者可以说它们不仅取决于当前和以前的测量，还取决于未来的测量。在基于优化的方法中，批处理被熔化为求解一组线性代数方程。在基于滤波的方法中，卡尔曼平滑器能够通过正向传递和向后传递找到后高斯分布。

There is a long list of state variables and parameters in a VIO problem, which needs to be estimated by given measurements from IMU and camera. Some of them are not observable or identifiable, leading to an error growing performance. The analysis of observability and identifiability of states and parameters provides a clear understanding of the estimation results, and potentially provides a guide to sufficiently exciting the platform via specialised motion patterns.
VIO 问题中有一长串状态变量和参数，需要通过 IMU 和相机的给定测量值来估计。其中一些是不可观察或识别的，导致误差增长表现。对状态和参数的可观测性和可识别性的分析提供了对估计结果的清晰理解，并可能为通过专门的运动模式充分激发平台提供指导。

The popular SLAM techniques are able to provide the estimated results on pose of platform and structure of environment. A process called loop closure detection is required to bound the accumulated errors caused by estimation algorithms.[8–10] However, the loop closure is not explicitly pursued in VIO techniques. Therefore, the estimated errors in VIO would be accumulated and could not be bounded. Due to no loop closure detection, long time and large-scale localisation become possible within small-sized device. Then, the research question is shifted towards how to exploit the techniques that could slow down the error increasing.
流行的SLAM技术能够提供平台姿态和环境结构的估计结果。需要一个称为闭环检测的过程来绑定估计算法引起的累积误差。[8–10]但是，在 VIO 技术中并未明确追求闭环。因此，VIO中的估计误差将累积并且无法限制。由于没有闭环检测，因此可以在小型设备中进行长时间和大规模定位。然后，研究问题转向如何利用可以减缓误差增加的技术。

This paper will present a survey of recent developments and advances of VIO techniques. It is organised from both filtering and optimisation perspectives, illustrating their fundamental models, algorithms and recent results. The survey will also highlight the links between two approaches and contribute an in-depth view of VIO techniques. The discussion on state observability and parameter identifiability will be provided. Our vision on this research area is summarised. In the following, filtering-based approaches are introduced in Section 2, followed by optimisation-based approaches in Section 3. The links between them are summarised in Section 4. The analysis of observability and identifiability will be addressed in Section 5. The conclusions and future work are briefly provided in Section 6.
本文将介绍VIO技术的最新发展和进展。它从过滤和优化的角度进行组织，说明了它们的基本模型、算法和最近的结果。该调查还将强调两种方法之间的联系，并深入探讨VIO技术。将提供关于状态可观测性和参数可识别性的讨论。总结了我们对这一研究领域的愿景。在下文中，第 2 节介绍了基于过滤的方法，然后在第 3 节中介绍了基于优化的方法。第4节总结了它们之间的联系。可观测性和可识别性的分析将在第5节中讨论。结论和今后的工作在第6节中简要介绍。

Filtering-based approaches

An EKF framework generally consists of a prediction step and an updating step. For a filtering-based VIO approach, inertial sensors are able to provide acceleration and rotational velocity measurements in three axes, which serve as the data-driven dynamic model or prior distribution for a 3D rigid body motion and make the motion prediction in prediction step. Cameras are able to provide the angular and ranging measurements between features and the mobile platform, which serve as the measurement model or likelihood distribution and update the prediction results in updating step.
EKF 框架通常由预测步骤和更新步骤组成。对于基于滤波的VIO方法，惯性传感器能够在三个轴上提供加速度和旋转速度测量，作为数据驱动的动态模型或3D刚体运动的先验分布，并在预测步骤中进行运动预测。相机能够提供要素和移动平台之间的角度和测距测量值，作为测量模型或似然分布，并在更新步骤中更新预测结果。

We assume a mobile platform, which is only equipped with a camera and an IMU, moves in an unknown environment. The spatial relationship between the camera and IMU is fixed and can be expressed as known position and attitude. The aim of an EKF based VIO algorithm is to provide the information of position and orientation of the platform using inertial measurements and visual observations of unknown environment. Next, we will present a full description of the EKF framework based on the work in [11], which includes the state presentation, IMU data-driven dynamics and the visual observations. This will pave a way for further analysis and discussion in the following sections.
我们假设一个仅配备摄像头和IMU的移动平台在未知环境中移动。相机和IMU之间的空间关系是固定的，可以表示为已知的位置和姿态。基于EKF的VIO算法的目的是使用惯性测量和对未知环境的视觉观察来提供平台的位置和方向信息。接下来，我们将基于 [11] 中的工作对 EKF 框架进行全面描述，其中包括状态呈现、IMU 数据驱动的动态和视觉观察。这将为以下各节的进一步分析和讨论铺平道路。

2.1. IMU data driven dynamic model

An IMU state vector of 3D rigid body at any time instant can be defined by a 16 × 1 vector, where IW q̄ is the unit quaternion describing the rotation from world frame {W} to IMU frame {I}, W pI and W v I are the position and velocity with respect to {W}, bg and ba are 3×1 vectors that describe the biases affecting the gyroscope and accelerometer measurements, respectively. The spatial relations between frames are shown in Figure 1.
3D 刚体在任何时间时刻的 IMU 状态向量都可以由 16 × 1 向量定义，其中 IW q̄ 是描述从世界帧 {W} 到 IMU 帧 {I} 的旋转的单位四元数，W pI 和 W v I 是相对于 {W} 的位置和速度，bg 和 ba 是 3×1 个向量，用于描述影响陀螺仪和加速度计测量的偏差，分别。帧之间的空间关系如图 1 所示。

Assuming that the inertial measurements contain noises with zero-mean Gaussian models, denoted as ng and na , the real angular velocity ω and the real acceleration a are related with gyroscope and accelerometer measurements in the following form:
假设惯性测量包含零均值高斯模型的噪声，表示为ng和na，实际角速度ω和实际加速度a与陀螺仪和加速度计测量相关，形式如下：

The data-driven dynamic model is a combination of 3D rigid body dynamics and the above IMU measurements, which can be represented by the following form:
数据驱动的动态模型是3D刚体动力学和上述IMU测量的组合，可以用以下形式表示：

where CIW q̄ denotes a rotational matrix described by IW q̄ and g as the gravity vector in world frame {W}. ω = ωx ω y ωz
is the angular velocityexpressed in IMU frame {I}, while is the quaternion kinematic matrix 0−ωT with ω× representing the skew-symmetric matrix. The IMU biases are modelled as random walk process, driven by the Gaussian noise, nwg and nwa . Let unit quaternion be q̄ := q0 , q T and its corresponding rotational matrix be Cq̄ . Two orientation representations can be linked via the equation below:
其中 CIW q̄ 表示由 IW q̄ 和 g 描述为世界帧 {W} 中的重力矢量的旋转矩阵。ω = ωx ω y ωz
是用 IMU 框架 {I} 表示的角速度，而四元数运动矩阵 0−ωT 与 ω× 表示斜对称矩阵。IMU偏差被建模为随机游走过程，由高斯噪声，nwg和nwa驱动。设单位四元数为 q̄ ：= q0 ， q T 及其对应的旋转矩阵为 Cq̄ 。两个方向表示可以通过以下等式链接：

Apart from the current IMU state x I mentioned above, a widely used state vector for filtering-based approaches includes T scale λ (slow drift) and current camera a spatial pose WC q̄ T W pTC . Here, we present a system state only containing a camera pose and it can be expressed as a 24-element vector. However, in various methods, the whole system state may include more than one past camera pose and keeps a moving window to limit them during state updating.
除了上面提到的当前IMU状态x之外，用于基于滤波方法的广泛使用的状态向量包括T尺度λ（慢漂移）和当前相机的空间姿态WC q̄ T W pTC。在这里，我们呈现一个仅包含相机姿势的系统状态，它可以表示为 24 个元素向量。但是，在各种方法中，整个系统状态可能包括多个过去的相机姿势，并在状态更新期间保持移动窗口以限制它们。

with other three differential equations,
与其他三个微分方程，

Applying the expectation operator in above equations, we obtain the prediction results using the IMU data-driven dynamic model:
应用上述方程中的期望运算符，我们使用IMU数据驱动的动态模型获得预测结果：

This can be abstracted as a non-linear system function with camera measurement z and a process noise term n p ∼ N (0, Q),
这可以抽象为具有相机测量z和过程噪声项n p ∼ N （0， Q）的非线性系统函数，

2.2. Error state representation and updating

For the position, velocity and bias state variables, the arithmetic difference can be applied (i.e. the error in the estimate x̂ of a quantity x is defined as x̃ = x − x̂), but the error quaternion should be defined under the assumption as local minimal situation.[12] If q̄ˆ is the estimated value of quaternion q̄, then the orientation error is described by the error quaternion δ q̄, which is defined by the relation q̄ = δ q̄ ⊗ q̄ˆ ⇒ δ q̄ = q̄ ⊗ q̄ˆ −1 . In this expression, the symbol ⊗? denotes quaternion multiplication. Intuitively, the quaternion δ q̄ describes a small rotation that causes the true and estimated attitude to coincide. Since attitude corresponds to three Degree of Freedom (DoF), δθ is used to describe the attitude errors, which is a minimal represen- tation. The error quaternion δ q̄ can be written as
对于位置、速度和偏置状态变量，可以应用算术差（即数量 x 的估计 x̂ 中的误差定义为 x̃ = x − x̂），但误差四元数应在假设下定义为局部最小情况。[12] 如果 q̄ˆ 是四元数 q̄ 的估计值，则取向误差由误差四元数 δ q̄ 描述，该误差由关系 q̄ = δ q̄ ⊗ q̄ˆ ⇒ δ q̄ = q̄ ⊗ q̄ˆ −1 定义。在这个表达式中，符号⊗？表示四元数乘法。直观地说，四元数 δ q̄ 描述了一个小的旋转，导致真实和估计的姿态重合。由于姿态对应于三个自由度（DoF），因此δθ用于描述姿态误差，这是一个最小的反驳。q̄ δ误差四元数可以写为

Thus, the error state vector containing 22 elements is
因此，包含 22 个元素的错误状态向量为

The differential equations for the continuous time error state are
连续时间误差状态的微分方程为

with ω̂ = ωm − b̂g and â = am − b̂a . By stacking the differential equations for error state, the linearised continuous time error state equation can be formed,
ω̂ = ωm − b̂g 和 â = am − b̂a .通过将误差状态的微分方程堆叠起来，可以形成线性化的连续时间误差状态方程，

with the noise vector n = naT , nTwa , nTg , nTwg . And the covariance matrix of n depends on the noise characteristics of IMU, Q c = diag σ 2na , σ 2nwa , σ 2ng , σ 2nwg . In practical, the inertial measurements for state propagation are obtained from IMU in discrete form, thus we assume the signals from gyroscopes and accelerometers are sampled with time interval t, and the state estimate is propagated using numerical integration like Runge–Kutta methods. Moreover, the covariance matrix of EKF filter is defined as
噪声向量 n = naT ， nTwa ， nTg ， nTwg .而n的协方差矩阵取决于IMU的噪声特性，Q c = 诊断σ 2na，σ 2nwa，σ 2ng，σ 2nwg。实际上，状态传播的惯性测量值是以离散形式从IMU获得的，因此我们假设来自陀螺仪和加速度计的信号以时间间隔t采样，并且状态估计是使用Runge-Kutta方法等数值积分传播的。此外，EKF滤波器的协方差矩阵定义为

where P I I k|k is the covariance matrix of the IMU state, P C C k|k is the 6N × 6N covariance matrix of the camera pose estimates and P I C k|k is the correlation between errors in IMU state and camera pose estimates. With this notation, the covariance matrix can be propagated by
其中 P I |k 是 IMU 状态的协方差矩阵，P C C k|k 是相机姿态估计值的 6N × 6N 协方差矩阵，P I C k|k 是 IMU 状态误差与相机姿态估计值之间的相关性。使用此表示法，协方差矩阵可以通过以下方式传播：

where the state-transition matrix can be calculated by assuming F c and G c to be constant over the integration time interval between two consecutive steps,
其中，状态转移矩阵可以通过假设F c和G c在两个连续步骤之间的积分时间间隔内恒定来计算，

and the discrete-time covariance matrix Q d can also be derived through numerical integration,
离散时间协方差矩阵Q d也可以通过数值积分推导出，

Thus, the mean and covariance propagation process as of the EKF-based VIO framework is summarized as follows,
因此，基于 EKF 的 VIO 框架的均值和协方差传播过程总结如下：

(a) when IMU data, ωm and am , in a certain sample frequency, are available to the filter, the state vector is propagated using numerical integration on Equation (1).
(b) calculate F d and Q d according to (4) and (5) respectively.
© the propagated state covariance matrix is com-puted from (3).
（a）当特定采样频率的 IMU 数据 ωm 和 am 可供滤波器使用时，状态矢量使用公式（1）上的数值积分传播。
（b）分别根据（4）和（5）计算F d和Q d。
（c）传播状态协方差矩阵由（3）计算而成。

2.3. Visual measurement model and updating

Due to the biases and noises in IMU data, the prediction results from prediction step become worse and worse with time increasing. The measurements from visual sensors would be able to provide key information to bound the increased errors. To do so in an EKF framework, key information extracted from images should be cast into measurement equations. There are various methods to build a measurement model. For example, the loosely coupled methods where image alignment is used to directly obtain the position and orientation changes fuse two estimated results together via an EKF framework.[13] The so-called tightly coupled methods advocate the use of key information extracted from images. The key information could be the features extracted from images via feature detectors,[14] direct light intensity with depth information (point cloud),[15,16] or semi-direct light intensity with depth information.[17] The key information is modelled as the measurement equation so that an analytic relationship between the key information and the state variables is provided. In general, a non-linear algebraic equation can be viewed as the measurement equation:
由于IMU数据中的偏差和噪声，预测步骤的预测结果随着时间的增加变得越来越差。来自视觉传感器的测量将能够提供关键信息来限制增加的误差。要在EKF框架中做到这一点，从图像中提取的关键信息应转换为测量方程。有多种方法可以构建测量模型。例如，使用图像对齐直接获取位置和方向变化的松散耦合方法通过EKF框架将两个估计结果融合在一起。[13]所谓的紧密耦合方法提倡使用从图像中提取的关键信息。关键信息可能是通过特征检测器从图像中提取的特征，[14]具有深度信息的直接光强度（点云）[15，16]或具有深度信息的半直射光强度。[17] 关键信息被建模为测量方程，以便提供关键信息和状态变量之间的分析关系。通常，非线性代数方程可以视为测量方程：

where nm models the Gaussian noise with zero mean and covariance R in visual measurement. After linearisation, the measurement error is expressed in a linear form:
其中nm在视觉测量中对均值为零和协方差R的高斯噪声进行建模。线性化后，测量误差以线性形式表示：

where H is the Jacobian matrix, and the noise term is Gaussian distributed and uncorrelated to the state error. ẑ and z represent the prediction and real measurement, respectively.
其中 H 是雅可比矩阵，噪声项是高斯分布，与状态误差无关。ẑ 和 z 分别表示预测和实际测量。

For using features as the visual information, the most straightforward way is to exploit the pinhole model to build up the measurement equation in camera frame {C}. A point u = [u v]T in an image is extracted to represent the key information. First, the spatial relationship between a point u in the camera frame and a point W pu in the world frame {W} is established as following equation:
对于使用特征作为视觉信息，最直接的方法是利用针孔模型在相机帧{C}中建立测量方程。提取图像中的点u = [u v]T以表示关键信息。首先，相机帧中的点 u 与世界帧 {W} 中的点 W pu 之间的空间关系建立为以下等式：

Strictly speaking, the visual world {V} is different to the real world {W}, but they are linked by initialising fixed translation W pV and rotation W q̄ V as shown in Figure 1. Here, for simplicity, we only use the world frame {W}.
严格来说，视觉世界 {V} 与现实世界 {W} 不同，但它们通过初始化固定平移 W pV 和旋转 W q̄ V 进行链接，如图 1 所示。在这里，为简单起见，我们只使用世界框架 {W}。

The position in the image frame is linked to the state variables W pC and W q̄ C in the world frame via the camera pinhole model, described in Equation (10). The measurement model thus can be expressed briefly as
图像帧中的位置通过相机针孔模型链接到世界帧中的状态变量 W pC 和 W q̄ C，如公式（10）中所述。因此，测量模型可以简单地表示为

And the estimation of this point in the world frame W p̂u can be calculated by iterative minimisation methods beforehand.[18]
而世界坐标系W p̂u中该点的估计可以通过迭代最小化方法预先计算出来。[18]

For other key visual information, such as pixel intensity or gradient, the measurement model should build up the link between a measurement and the state variables, then the non-linear measurement equation is linearised to obtain the Jacobian matrix.
对于其他关键的视觉信息，如像素强度或梯度，测量模型应该在测量和状态变量之间建立联系，然后线性化非线性测量方程以获得雅可比矩阵。

At this point, the updating step is ready for updating the prediction results made in prediction step. The Kalman gain is calculated as
此时，更新步骤已准备好更新在预测步骤中所做的预测结果。卡尔曼增益计算公式为

and final correction x̃ k+1 = K · z̃. After the correction, we can get the updated state estimate x k . Lastly, the error state covariance is updated as
和最终校正 x̃ k+1 = K ·z̃。修正后，我们可以得到更新的状态估计 x k 。最后，误差状态协方差更新为

The full update process is summarised as follows:
完整的更新过程总结如下：

(a) when visual data, normally raw image, in a certain sample frequency are available, some image processing procedures are adopted to extract the key information.
(b) calculate the residual as (7).
© compute the Kalman gain as (8).
(d) update the state by adding the correction.
(e) update the error state covariance as (9).
（a）当具有一定采样频率的视觉数据（通常是原始图像）可用时，采用一些图像处理程序来提取关键信息。
（b）将残差计算为（7）。
（c）计算卡尔曼增益为（8）。
（d）通过添加更正来更新状态。
（e）将误差状态协方差更新为（9）。

Optimisation-based approaches

The optimisation-based approaches mainly rely on the techniques of image processing for feature extraction and optimisation for image alignment, while inertial measurement is treated as prior, regularisation terms or totally ignored. In most cases, there are two stages in an optimisation-based approach: mapping and tracking.[15] In mapping stage, features, such as corners, edges or other landmarks in 3D space, are extracted from an image via various features detectors. Then a reprojected error is defined between two images for all the features detected. The error is used as a cost function to be optimised in order to find the coordinates of features or landmarks.[7] In tracking stage, the coordinates of features and landmarks in the map are used to define a reprojected error between two images, and an optimisation algorithm is applied again to find the changes in position and orientation of the mobile platform. The idea of separating the estimation problem into two stages is to obtain a fast tracking result while the mapping processing is time consuming.[14,15] Simultaneously optimising a cost defined using reprojected error between two images against coordinates of 3D features and pose changes of the mobile platform is possible [19] while using the concept of keyframes is able to marginalise old states to maintain a bounded optimisation window for real-time operation.
基于优化的方法主要依赖于图像处理技术进行特征提取和优化图像对齐，而惯性测量则被视为先验、正则化项或完全忽略。在大多数情况下，基于优化的方法有两个阶段：映射和跟踪。[15] 在映射阶段，通过各种特征检测器从图像中提取特征，例如 3D 空间中的角落、边缘或其他地标。然后，针对检测到的所有特征在两个图像之间定义重新投影误差。该误差用作要优化的成本函数，以便查找要素或地标的坐标。[7] 在跟踪阶段，地图中特征和地标的坐标用于定义两个图像之间的重新投影误差，并再次应用优化算法来查找移动平台的位置和方向的变化。将估计问题分为两个阶段的想法是在映射处理耗时的情况下获得快速跟踪结果。[14，15] 同时优化使用两个图像之间根据移动平台的3D特征坐标和姿势变化的重新投影误差定义的成本是可能的[19]，而使用关键帧的概念能够边缘化旧状态，以保持实时操作的有限优化窗口。

3.1. Feature alignment

Iterative non-linear optimisation is formulated to find the camera pose changes and/or feature coordinates by minimising a reprojection error of observed regions in images. Normally, a map consists of features identified in a number of keyframes in which significant features are found. The map is presented by a series of 3D coordinate vectors of features. When a new image is obtained, a decision should be made about whether or not it is a keyframe. If so, the coordinates of new features found in this new image is computed via an image alignment algorithm and added to the map together with current camera pose. Otherwise, the map keeps unchanged.
迭代非线性优化旨在通过最小化图像中观察到区域的重投影误差来查找相机姿势变化和/或特征坐标。通常，地图由在多个关键帧中标识的要素组成，在这些关键帧中可以找到重要要素。该地图由一系列要素的 3D 坐标矢量表示。获取新图像时，应决定它是否为关键帧。如果是这样，则通过图像对齐算法计算在此新图像中找到的新要素的坐标，并与当前相机姿势一起添加到地图中。否则，地图将保持不变。

As shown in Figure 2, a 3D rigid body transformation T ∈ S E(3) denotes rotation and translation in 3D:
如图 2 所示，3D 刚体变换 T ∈ S E（3）表示 3D 中的旋转和平移：

The optimisation purpose in image alignment is to find the transformation T in each time step, i.e. T is regarded as the camera pose. A minimal representation for camera pose is better for optimisation purpose. The Lie algebra se(3) corresponding to the tangent space of S E(3) at the identity is used as the minimal representation. The algebra element is called twisted coordinates ξ = [ωT υ T ]T ∈ R6 . The map from Lie algebra se(3) to Lie group S E(3) is the exponential map T (ξ ) = exp(ψ (ξ )) and its inverse map is the logarithm map ψ (ξ ) = log T (ξ ), where ψ (ξ ) is the wedge operator,
图像对齐的优化目的是在每个时间步长中找到变换T，即T被视为相机姿势。相机姿势的最小表示形式更适合优化目的。对应于恒等式 S E（3）的切空间的李代数 se（3）用作最小表示。代数元素称为扭曲坐标 ξ = [ωT υ T ]T ∈ R6 。从李代数 se（3）到李群 S E（3）的映射是指数映射 T （ξ） = exp（ψ （ξ ）），它的逆映射是对数映射 ψ （ξ） = log T （ξ ），其中 ψ （ξ）是楔形算子，

A 3D point with homogeneous vector C pu in the camera frame maps to the image coordinate u via the pinhole camera projection model:
相机帧中具有均匀矢量 C pu 的 3D 点通过针孔相机投影模型映射到图像坐标 u：

where u 0 , v0 and f u , f v are the principal point and focal length, respectively, representing camera intrinsic parameters which can be calibrated beforehand. Given a depth information du for a point u, the 3D point in the camera frame can be recovered from an image coordinate:
其中U 0、V0和F U、F V分别是主点和焦距，代表可以事先校准的相机内在参数。给定点 u 的深度信息 du，可以从图像坐标中恢复相机帧中的 3D 点：

The mapping from a point W pu in the world frame to a point in the camera frame is C pu = exp(ψ (ξ ))W pu . For monocular camera, the visual measurement model becomes z = [u v]T = u(ξ ,W pu ). The depth information of a pixel is obtained by triangulating two consecutive keyframes using a set of independent filter.[14,20] For stereo cameras, z = [u l vl u r ]T , which is also a function of ξ and W pu . The depth information is obtained via stereo vision techniques. The reprojection error is defined as the difference between a measurement z and its estimate ẑ(ξ̂ ,W p̂u ):
从世界帧中的点 W pu 到相机帧中的点的映射为 C pu = exp（ψ （ξ ））W pu 。对于单目相机，视觉测量模型变为 z = [u v]T = u（ξ ，W pu ）。像素的深度信息是通过使用一组独立的滤波器对两个连续的关键帧进行三角测量而获得的。[14，20] 对于立体相机，z = [u l vl you are ]T ，这也是 ξ 和 W pu 的函数。深度信息是通过立体视觉技术获得的。重投影误差定义为测量值 z 与其估计值 ẑ（ξ̂ ，W p̂u ）之间的差值：

The cost function η ξ̂ ,W p̂u is the sum of all squared errors z̃ with a weighting matrix W :
成本函数 η ξ̂ ，W p̂u 是所有平方误差 z̃ 与加权矩阵 W 的总和：

where j from 1 to m is the index of points within a frame, and i is the number of frames indexing a set with size n. The optimisation problem is defined as
其中 j 从 1 到 m 是帧内点的索引，i 是索引大小为 n 的集合的帧数。优化问题定义为

The method used to solve the above problem is called Bundle Adjustment.[7]
用于解决上述问题的方法称为束调整。[7]

In PTAM,[14] the optimisation process is separated into two parallel threads: mapping and tracking. Given the tracking results ξ̂ , the optimisation problem in the mapping thread is:
在PTAM中，[14]优化过程分为两个并行线程：映射和跟踪。给定跟踪结果 ξ̂ ，映射线程中的优化问题是：

Given the mapping results W p̂u , the optimisation problem in the tracking thread is:
给定映射结果 W p̂u ，跟踪线程中的优化问题是：

In Levenberg–Marquardt (LM) algorithm, the key is to find an increment δ in each iterative step, then update the optimised state variables. The solution to δ is found by:
在Levenberg-Marquardt（LM）算法中，关键是在每个迭代步骤中找到增量δ，然后更新优化的状态变量。δ的解决方案如下：

where H is the Jacobian of z and α is the LM damping parameter.
其中 H 是 z 的雅可比矩阵，α 是 LM 阻尼参数。

In most cases, only the features in sparse keyframes are maintained in the computer in order to limit the optimisation complexity as the same scale as filtering-based approaches.[19]
在大多数情况下，计算机中仅保留稀疏关键帧中的特征，以限制优化复杂性，与基于过滤的方法相同。[19]

3.2. Dense alignment

The reprojection error is defined using the coordinates of features or landmarks within images in the above discussion. However, it requires a feature detection process, which ignores most parts of an image as the features are only extracted sparsely. The feature extraction process is often badly conditioned, noisy and not robust therefore relying on higher level robust estimation techniques. Since all these estimation steps are not on the level of raw image measurements (intensities), they systematically propagate feature extraction errors and accumulate drifts.
重投影误差是使用上述讨论中影像中要素或地标的坐标定义的。但是，它需要一个特征检测过程，该过程忽略图像的大部分部分，因为特征只是稀疏地提取。特征提取过程通常条件恶劣、噪声大且不鲁棒，因此依赖于更高水平的鲁棒估计技术。由于所有这些估计步骤都不在原始图像测量（强度）的水平上，因此它们系统地传播特征提取误差并累积漂移。

Appearance and optical flow-based techniques, on the other hand, are image-based and minimise an error directly based on raw image measurements,[21] i.e. the photometric(brightness or intensity) function is used, and therefore are called direct or dense methods. Dense methods aim at using the whole image for alignments. Non-linear optimisation techniques are used to find the transformation between two scenes. It is increasingly clear that it is possible to get more complete, accurate and robust results using dense methods for both mapping and tracking. The work in [22,23] minimises the photometric error via image alignment for visual odometry using Kinect RGB-D cameras. The work in [24] includes both of photometric error and depth error to minimise.
另一方面，基于外观和光流的技术是基于图像的，并且直接基于原始图像测量来最小化误差，[21]即使用光度（亮度或强度）功能，因此称为直接或密集方法。密集方法旨在使用整个图像进行对齐。非线性优化技术用于查找两个场景之间的转换。越来越清楚的是，使用密集的映射和跟踪方法可以获得更完整、更准确和更可靠的结果。[22，23] 中的工作通过使用 Kinect RGB-D 相机进行视觉测程的图像对齐，最大限度地减少了光度误差。[24]中的工作包括光度误差和深度误差，以尽量减少。

With RGB-D camera available, such as Kinect sensors, the depth information of a pixel makes the alignment of multiple scans possible by minimising distance measures between all of the data in each image, rather than limited number of features or landmarks. Such dense scan is able to reconstruct surface scenes in the environment and track the pose. Given two successive dense depth measurements, surfaces or 3D point clouds of the same static scene observed from different viewpoints, one can find the change in pose of the camera by obtaining the rigid transformation that best maps one point cloud onto the other.
借助可用的 RGB-D 相机（如 Kinect 传感器），像素的深度信息通过最小化每个图像中所有数据之间的距离测量值（而不是限制数量的要素或地标），使多次扫描的对齐成为可能。这种密集扫描能够重建环境中的表面场景并跟踪姿势。给定从不同视点观察的同一静态场景的两个连续密集深度测量、表面或 3D 点云，可以通过获得将一个点云最好地映射到另一个点云的刚性变换来找到相机姿势的变化。

The Iterative Closest Point (ICP) is popular algorithm to match the scans through optimising the rigid transformation.[25] The ICP works in this way: given two corresponding point sets: β = {β1 , · · · , βn } and γ = {γ1 , · · · , γn } and the translation and rotation between them are t and R, respectively, t and R are found by minimising the sum of squared errors:
迭代最近点（ICP）是一种流行的算法，通过优化刚性变换来匹配扫描。[25] ICP 以这种方式工作：给定两个相应的点集：β = {β1 ， · · · ， βn } 和 γ = {γ1 ， · · · ， γn } 并且它们之间的平移和旋转分别为 t 和 R，t 和 R 是通过最小化平方误差的总和找到的：

If the correct correspondences are known, the rotation and translation can be calculated in a closed form. If correct correspondences are unknown as in most cases, it is generally impossible to find the optimal relative rotation and translation in one step. The iterating to find the alignment [25] is sought via finding the closest points’correspondence. The ICP converges if starting positions are close enough. One work using ICP for visual odometry is reported in [26].
如果已知正确的对应关系，则可以以封闭形式计算旋转和平移。如果在大多数情况下正确对应是未知的，通常不可能一步找到最佳的相对旋转和平移。通过查找最近点的对应关系来寻求查找对齐 [25] 的迭代。如果起始位置足够接近，则 ICP 会收敛。[26]报道了1项使用ICP进行视觉里程计的工作。

The dense alignment can also be done by matching current image against a scene model. KinectFusion in [27] is the one which provides a real-time solution for dense alignment via separated mapping and tracking threads in GPU where a truncated signed distance function is employed for scene model. However, performing ICP on the full point cloud is computationally expensive, and does not provide a real-time solution on a general PC.
也可以通过将当前图像与场景模型进行匹配来完成密集对齐。[27] 中的 KinectFusion 通过 GPU 中的分离映射和跟踪线程为密集对齐提供了实时解决方案，其中场景模型采用截断的有符号距离函数。但是，在整个点云上执行ICP的计算成本很高，并且无法在普通PC上提供实时解决方案。

For a monocular camera, the depth information is not available. An inverse depth map is estimated in [15] by minimising a photometric error via non-convex optimisation algorithm in the mapping thread. In the tracking thread, the image alignment is used. Estimating the depth information in a monocular camera is also conducted via a Bayesian filter followed by an optimisation process to smooth the depth map in [28].
对于单眼相机，深度信息不可用。在 [15] 中，通过在映射线程中通过非凸优化算法最小化光度误差来估计反深度图。在跟踪线程中，使用图像对齐。单目相机中的深度信息估计也通过贝叶斯滤波器进行，然后进行优化过程以平滑[28]中的深度图。

To further simplify the computational complexity and maintain the accuracy of dense methods, semi-dense or semi-direct methods have been proposed recently [29,30] for monocular camera. A semi-dense depth map covering all image regions with non-negligible gradients. The inverse depth map is estimated using the Bayesian filter while the tracking is obtained by directly minimising the dense photometric error. As stated in [30], semi-direct or semi-dense methods use hundreds of small patches to increase the robustness and allow for neglecting the patch normals.
为了进一步简化计算复杂度并保持密集方法的准确性，最近提出了用于单目相机的半密集或半直接方法[29，30]。覆盖所有图像区域的半密集深度图，梯度不可忽略。使用贝叶斯滤波器估计反深度图，同时通过直接最小化密集的光度误差来获得跟踪。如[30]所述，半直接或半密集方法使用数百个小斑块来提高鲁棒性并允许忽略贴片法线。

3.3. Inertial measurement term

The measurement from an IMU sensor is the data source to the data-driven 3D rigid motion dynamics in filtering-based approaches, and is fused with the measurement from cameras via Kalman filter. This is tightly coupled as the cross variances between two parts are taken into consideration.[11] The loosely coupled fusion is to maintain a constant processing time by fusing the already estimated pose from visual sensor with the predicted pose from IMU via an EKF.[31] In optimisation-based approaches, the fusion between two parts can also be made tightly, i.e. no explicit pose estimates from camera is required. The predicted result from IMU driven dynamics in Equation (1) is viewed as a Gaussian distribution. The error between predicted results and true state is cast as the square error weighted by the covariance, then added to the cost function as a regularisation term.[19] In term of the Bayesian inference, the regularisation term is viewed as the prior while the image alignment term as the likelihood. The optimised result is a posteriori distribution, which is the one resulted from the ML estimation result of pure image alignment smoothed by the IMU prior.
IMU传感器的测量是基于滤波的方法中数据驱动的3D刚性运动动力学的数据源，并通过卡尔曼滤波器与相机的测量融合在一起。这是紧密耦合的，因为考虑了两个零件之间的交叉方差。[11] 松散耦合融合是通过 EKF 将视觉传感器的已估计姿势与来自 IMU 的预测姿势融合来保持恒定的处理时间。[31] 在基于优化的方法中，两个部件之间的融合也可以紧密进行，即不需要从相机进行明确的姿态估计。公式（1）中IMU驱动动力学的预测结果被视为高斯分布。预测结果与真实状态之间的误差被转换为由协方差加权的平方误差，然后作为正则化项添加到成本函数中。[19] 就贝叶斯推理而言，正则化项被视为先验项，而图像对齐项被视为可能性。优化结果是后验分布，这是由 IMU 先验平滑的纯图像对齐的 ML 估计结果得出的结果。

Links between filtering and optimisation based approaches

Both of filtering-based and optimisation-based approaches can be formed under the Bayesian inference. When the succession of approximation linearisation is necessary, their link can be made explicitly via the iterated EKF. When the approximation linearisation is just a single step, the smoother-based approaches which include a forward pass and a backward pass are equivalent to the optimisation-based approaches which are solved via the Cholesky decomposition of information matrix of least square problems. To reduce the computational complexity, reducing the state variables to be estimated is implemented by maintaining only keyframes or a sliding window. In particular, the sliding window scheme or moving horizon estimation divides the cost into two parts, and one popular way to implement two parts is optimising the result in one part and marginalising out oldest states in another part with an EKF.
基于过滤和基于优化的方法都可以在贝叶斯推理下形成。当需要连续的近似线性化时，可以通过迭代的EKF显式建立它们的链接。当近似线性化只是一个步骤时，基于平滑的方法（包括正向传递和反向传递）等效于通过最小二乘问题的信息矩阵的Cholesky分解求解的基于优化的方法。为了降低计算复杂度，通过仅维护关键帧或滑动窗口来实现减少要估计的状态变量。特别是，滑动窗口方案或移动地平线估计将成本分为两部分，实现两部分的一种流行方法是优化一部分的结果，并使用 EKF 边缘化另一部分的旧状态。

4.1. Iterated EKF update
The core of filtering based approaches is the Kalman filter and the core of optimisation-based approaches is the Gauss–Newton method. The link between them is the iterated EKF (IEKF).[32] An EKF has two steps: prediction and update. Let the result of prediction step is x̂ k ∼ N (x k , P k ) at current time k. The difference between EKF and IEKF is that there is an iterative loop in the update step of IEKF while a single loop is executed in the update step of EKF. It is the iterative loop of IEKF which is able to drive the error caused by model linearisation as close as possible to the counterpart in optimisation-based approaches.
基于滤波的方法的核心是卡尔曼滤波器，基于优化的方法的核心是高斯-牛顿方法。它们之间的链接是迭代的EKF（IEKF）。[32] EKF有两个步骤：预测和更新。设预测步长的结果是当前时间 k 的 x̂ k ∼ N （x k， P k）。EKF 和 IEKF 的区别在于，在 IEKF 的更新步骤中有一个迭代循环，而在 EKF 的更新步骤中执行单个循环。IEKF的迭代循环能够在基于优化的方法中尽可能接近模型线性化引起的误差。

In the following, we will show the equivalence between the iterative loop in update step of IEKF and the Gauss–Newton method of optimisation approaches from maximising likelihood (ML).
在下文中，我们将展示IEKF更新步骤中的迭代循环与最大化似然（ML）优化方法的高斯-牛顿方法之间的等价性。

At time k, the IEKF has x k , x̂ k|k−1 and z k as the current state, the current state estimate and the measurement, respectively. The measurement model is the same as Equation (6) and x̂ k|k−1 ∼ N (x k , P k|k−1 ).
在时间 k 处，IEKF 分别将 x k、x̂ k|k−1 和 z k 作为当前状态、当前状态估计和测量值。测量模型与公式（6）和x̂ k|k−1 ∼ N （x k， P k|k−1）相同。

Define an error vector as in quadratic cost function with a free variable μ.
定义一个误差向量，如在二次成本函数中，使用自由变量μ。

……

In summary, the above iterative loop of Gauss–Newton method is viewed as the update step of IEKF, i.e.
综上所述，上述高斯-牛顿方法的迭代循环被视为IEKF的更新步骤，即

When only one iterative loop is executed, the above is the update step of EKF. Their relationship can be viewed clearly in Figure 3.
当只执行一个迭代循环时，以上是 EKF 的更新步骤。它们之间的关系可以在图3中清楚地看到。

4.2. Smoothing-based approaches
The filtering-based approaches are a sequence iterative process where the current estimated results depend on the current measurements and the past estimated results (the past measurements are condensed into the past estimated results), i.e. future measurements do not make any contributions to the current estimated results. If this is not the case, i.e. future measurements do make contributions to the current estimated results, the smoothing-based approaches should be applied. This is more like a batch algorithm where the estimation is made when all the measurements are available.[33]
基于过滤的方法是一个序列迭代过程，其中当前估计结果取决于当前测量值和过去的估计结果（过去的测量值被压缩为过去的估计结果），即未来的测量值对当前估计结果没有任何贡献。如果不是这种情况，即未来的测量确实对当前的估计结果有所贡献，则应应用基于平滑的方法。这更像是一种批处理算法，当所有测量值都可用时进行估计。[33]

The IMU data-driven dynamics (2) and the camera measurement Equation (6) are still adopted. The full probability is
仍然采用IMU数据驱动的动态（2）和相机测量公式（6）。完全概率是

The MAP problem is
MAP问题是

Linearising f (·) and h(·), the above minimisation problem is rewritten as a group of linear equations in the general form
直线化f（·）和h（·），上述最小化问题被改写为一组一般形式的线性方程

This can be solved by Cholesky decomposition of AT A.[33] This is the perspective of optimisation-based approaches to the batch processing.
这可以通过AT A的Cholesky分解来解决[33]这是基于优化的批处理方法的观点。

On the other hand, the batch processing can be interpreted from the perspective of filtering-based approaches, which is based on stochastic treatment.
另一方面，批处理可以从基于随机处理的基于过滤的方法的角度进行解释。

The Kalman smoother experiences forward and backward passes, as shown in Figure 3. The forward pass computes p(x k |z 0:k ) given the data available so far. The backward pass computes p(x k |z 0 , · · · , z n ) given all the data.
卡尔曼平滑器可体验向前和向后传递，如图 3 所示。前向传递根据目前可用的数据计算 p（x k |z 0：k）。向后传递计算给定所有数据的 p（x k |z 0 ， · · · ， z n）。

Traditionally, the Kalman filter is expressed as
传统上，卡尔曼滤波器表示为

After linearising, it becomes:
线性化后，它变成：

The p(x k |z 0:k ) only takes into account the past information relative to x k . If we incorporate it with the future observations, more refined state estimates can be found. Estimators that take into account both past and future are often called smoother, which can be written as following.
p（x k |z 0：k）只考虑相对于 x k 的过去信息。如果我们将其与未来的观测相结合，可以找到更精细的状态估计。同时考虑过去和未来的估计器通常称为更平滑，可以写成如下。

As an extension to Kalman smoother, it is possible to use an EM algorithm to learn the parameters of the system. The parameters include the noise covariances Q, R and the F, H.
作为卡尔曼平滑的扩展，可以使用EM算法来学习系统的参数。参数包括噪声协方差 Q、R 和 F、H。

Although the above algorithm goes through two passes, there is no attempt to solve a succession of linear approximation to the non-linear problem. If there is no good linearisation point, the bad result would be expected. Also as the trajectory goes longer and longer, the scale of the linearised equations is unmanaged.
尽管上述算法经历了两次传递，但没有尝试求解非线性问题的一系列线性近似。如果没有好的线性化点，则会出现不好的结果。此外，随着轨迹越来越长，线性方程的尺度是不受管理的。

4.3. Marginalisation to keyframes

If all the state variables and features encountered during the course of operation are maintained, like the above scenario, the computational complexity becomes larger and larger with the increasing of exploring trajectory. However, not all the sensory data are useful. There are some of them which contain key information on tracking and mapping, while others could be redundant or contain trivial information which could be ignored. There are two ways to select useful information from the past historical data. One is a fixed window scheme in which only recent N data measurements are kept while all the data before N measurements are simply marginalised out. Another one is to select N data measurements from all the data-set according to some criteria, which is illustrated in Figure 4. The most popular criteria are the critical information contained in the data so that those keyframes are maintained during the entire processing.[19,34]
如果像上述场景一样，保持运行过程中遇到的所有状态变量和特征，计算复杂度随着探索轨迹的增加而变得越来越大。然而，并非所有的感官数据都是有用的。其中一些包含有关跟踪和映射的关键信息，而另一些可能是多余的或包含可以忽略的琐碎信息。有两种方法可以从过去的历史数据中选择有用的信息。一种是固定窗口方案，其中仅保留最近的N个数据测量值，而N个测量值之前的所有数据都被边缘化。另一种方法是根据某些标准从所有数据集中选择 N 个数据测量值，如图 4 所示。最常用的标准是数据中包含的关键信息，以便在整个处理过程中维护这些关键帧。[19,34]

Marginalising out states is equivalent to applying the Schur compliment to the least squared problem. For example,
边缘化状态等同于将舒尔的赞美应用于最小二乘问题。例如

Changing to
更改为

After this forward substitution step, the smaller system is
在此前向替换步骤之后，较小的系统是

Marginalising out the state x 1 will induce dependencies between other states that are dependent on x 1 like A11 and A12 . And marginalising out the oldest pose from the full solution may cause fill-in in three places [35]:
边缘化状态 x 1 将导致依赖于 x 1 的其他状态之间的依赖关系，例如 A11 和 A12 .从完整解决方案中边缘化最古老的姿势可能会导致三个地方的填充[35]：

• between any landmarks that are visible from that pose;
• between the states of the next oldest pose;
• between the next oldest pose and all landmarks seen by the removed pose.
• 从该姿势可见的任何地标之间;
•在下一个最古老姿势的状态之间;
•在下一个最旧的姿势和移除的姿势看到的所有地标之间。

The matrices in the problem solution would become dense and affect the computational efficiency.
问题解中的矩阵会变得密集并影响计算效率。

4.4. Moving horizon estimation
Moving horizon estimation (MHE) approaches separates the MAP cost into two parts. Only recent N terms from step k to κ = k − N + 1 are optimised each step, while the oldest terms are summarised into an arrival cost, which is approximated by an EKF, as shown in Figure 4.
移动地平线估计（MHE）方法将 MAP 成本分为两部分。每一步仅优化从步骤 k 到 κ = k − N + 1 的最近 N 项，而最旧的项汇总为到达成本，该成本由 EKF 近似，如图 4 所示。

Let the time intervals be [0, k − N ] and [κ, k]. The MAP problem is
设时间间隔为 [0， k − N ] 和 [κ， k]。MAP问题是

With the sum of first N terms is an arrival cost, the MHE problem is
前 N 项的总和是到达成本，MHE 问题是

where κ refers to the starting time of MHE window, φκ is the arrival cost at time τ ∈ [κ, k]. The arrival cost can be approximated by an EKF:
其中 κ 是指 MHE 窗口的开始时间，φκ 是时间 τ ∈ [κ， k] 的到达成本。到达费用可以用EKF近似计算：

where the covariance P is propagated by
其中协方差 P 传播者为

State observability and parameter identifiability

The problem which VIO is tackling with is to recover the platform trajectory in the global frame on-line given the measurements from inertial and visual sensors. However, this problem is solved not straightforward in terms of practical implementation. Apart from the complexity of filteringor optimisation-based approaches, some parameters play a crucial role in the success of dynamic state estimation representing the platform trajectory. These parameters include:
VIO正在解决的问题是根据惯性和视觉传感器的测量结果，在线恢复全局框架中的平台轨迹。但是，就实际实施而言，这个问题的解决并不简单。除了基于过滤或优化的方法的复杂性之外，一些参数在表示平台轨迹的动态状态估计的成功中起着至关重要的作用。这些参数包括：

• Camera intrinsic parameters: focal length, principal points and lens distortion.
• IMU parameters: acceleration and gyroscope biases.
• Spatial parameters: the transform between IMU and camera.
• Temporal parameter: the time delay between IMU and camera measurements
• 相机固有参数：焦距、主点和镜头畸变。
• IMU 参数：加速度和陀螺仪偏置。
• 空间参数：IMU 和相机之间的转换。
• 时间参数：IMU 和相机测量之间的时间延迟

They are treated as time-invariant variables except the IMU biases and called parameters of the system. In contrast, the platform trajectory is represented by time-variant variables: states.
它们被视为除IMU偏差之外的时不变变量，并称为系统参数。相比之下，平台轨迹由时变变量表示：状态。

Given only the measurements from IMU and camera, the question whether these states and parameters can be recovered can be answered from the analysis of observability and identifiability. Traditionally, states are dynamic variables and their estimation problem is analysed using observability while parameters are static variables and their calibration is analysed using identifiability. Modelling the parameters with random walk processes is able to analyse them using observability, like the IMU biases.
仅给定来自IMU和相机的测量值，是否可以从可观测性和可识别性分析中回答是否可以恢复这些状态和参数的问题。传统上，状态是动态变量，它们的估计问题使用可观测性进行分析，而参数是静态变量，它们的校准使用可识别性进行分析。使用随机游走过程对参数进行建模能够使用可观测性（如 IMU 偏差）对其进行分析。

5.1. State observability

Observability is a fundamental property which reflects the possibility of estimating states on the basis of input/output data. Let y (t, t0 , x 0 , z(t)) denotes the output trajectory from an initial state x 0 , initial time t0 and measurement reading z(t) for the continuous time system. Two initial states x 10 and 2 x 0 are defined to be indistinguishable if y t, t0 , x 10 , z(t) = y t, t0 , x 20 , z(t) for t0 < t < t N , and all admissible measurement z(t). If states are distinguishable, they can be estimated from the outputs and the known measurements. If states are indistinguishable, they are called unobservable and their corresponding variance will grow without bound and blow up.
可观测性是一个基本属性，它反映了根据输入/输出数据估计状态的可能性。设 y （t， t0， x 0， z（t））表示连续时间系统从初始状态 x 0 、初始时间 t0 和测量读数 z（t）开始的输出轨迹。Two initial states x 10 and 2 x 0 are defined to be indistinguishable if y t, t0 , x 10 , z(t) = y t, t0 , x 20 , z(t) for t0 < t < t N , and all admissible measurement z(t).如果状态是可区分的，则可以根据输出和已知测量值来估计它们。如果状态是不可区分的，它们被称为不可观察的，它们相应的方差将不受限制地增长并爆炸。

Observability of a linear system is a global property that can be determined either from the rank of the observability matrix or from the rank of Gramian matrix. However, observability of a non-linear system is determined locally about a given state.[36] The local observability is stronger than the observability. The local observability distinguishes states from their neighbours. The local weak observability instantaneously distinguishes states from their neighbours without large excursions.[37]
线性系统的可观测性是一个全局属性，可以从可观测性矩阵的秩或格莱米矩阵的秩确定。但是，非线性系统的可观测性是围绕给定状态局部确定的。[36] 局部可观测性强于可观测性。本地可观测性将国家与其邻国区分开来。局部较弱的可观测性立即将国家与其邻国区分开来，而无需进行大型偏移。[37]

The advantage of local weak observability is the availability of Lie derivative algebraic test. For a non-linear system composed by Equations (2) and (6). The Lie derivative of h(·) with respect to f (·) is
局部弱可观测性的优点是李导数代数检验的可用性。对于由方程（2）和（6）组成的非线性系统。h（·）关于 f（·）的李导数是

The Lie derivatives can be defined recursively,
李导数可以递归定义，

The zero-order Lie derivative of any function is the function itself L 0 h(x) = h(x).
任何函数的零阶李导数是函数本身 L 0 h（x） = h（x）。

The observability matrix O is formed by stacking the Lie derivatives of h(x) as its rows.
可观测性矩阵 O 是通过将 h（x）的李导数堆叠为其行而形成的。

The system is locally weakly observable at x 0 if O has full column rank at x 0 . As the state and measurement equations are infinitely smooth, the observability matrix can have infinite number of rows. If sufficient number of rows which are linearly independent can be found, O is full rank and the system is locally weakly observable.
如果 O 在 x 0 处具有完整的列秩，则系统在 x 0 处局部弱可观测。由于状态和测量方程是无限平滑的，因此可观测性矩阵可以有无限行数。如果可以找到足够数量的线性独立行，则 O 是全秩，系统是局部弱可观测的。

Based on the observability rank condition, there are a number of publications which show that the platform pose in global frame is unobservable and the rotation around gravity vector (yaw) is unobservable. Some other states are observable, which include the platform pose in the initial camera frame, the gravity vector in the initial camera frame and the features in the camera frame. The intuitive interpretation is that the visual camera is a bearing only sensor, and the IMU is only a double integrator of pose, which are not able to provide the pose and yaw information in the global frame.
基于可观测性秩条件，有许多出版物表明平台在全局框架中的姿态是不可观测的，围绕重力矢量（yaw）的旋转是不可观测的。其他一些状态是可观察到的，其中包括初始相机帧中的平台姿势、初始相机帧中的重力矢量以及相机帧中的特征。直观的解释是，视觉相机只是一个方位传感器，而IMU只是姿势的双重积分器，它们无法提供全局帧中的姿态和偏航信息。

The observability analysis allows us to find ways to sufficiently excite the system, such as using three non co-linear features with given world coordinates, or pseudo-measurement equations to make some unobservables observable.[1] Further, it has the potential to provide clues on how to make the unobservables blow-up as slowly as possible.
可观测性分析允许我们找到充分激励系统的方法，例如使用具有给定世界坐标的三个非线性特征，或伪测量方程使一些不可观测值可观测。[1]此外，它有可能提供有关如何使不可观察的东西尽可能缓慢地爆炸的线索。

For filtering-based approaches, the linearised system is employed in EKF. However, the observability of linearised system is different to the corresponding non-linear system. More specifically, the yaw is observable in the linearised systems. This unexpected observable DOF leads to an in-consistent result: the estimated covariance is better than actual measurement result. One direction for improving the consistency is to impose a constraint in the erroneously observable direction due to model linearisation.[38]
对于基于滤波的方法，EKF中采用线性化系统。然而，线性化系统的可观测性与相应的非线性系统不同。更具体地说，偏航是线性化系统中可以观察到的。这种意外的可观察自由度会导致不一致的结果：估计的协方差优于实际测量结果。提高一致性的一个方向是由于模型线性化而在错误可观察的方向上施加约束。[38]

The observability was also analysed using a concept of continuous symmetries, which is able to find the analytical derivation of observable state and identifiable parameters.[39]
还使用连续对称性的概念分析了可观测性，该概念能够找到可观测状态和可识别参数的分析推导。[39]

5.2. Parameter identiﬁability

In most of visual related localisation techniques, the camera intrinsic parameters are known in advance. They can also be calibrated on-line, such as the methods introduced in [40].
在大多数视觉相关的定位技术中，相机的内在参数是事先知道的。它们也可以在线校准，例如[40]中介绍的方法。

The IMU biases vary with time, and are modelled as time-variant states in most cases. The observability rank condition shows they are observable.
IMU 偏差随时间变化，在大多数情况下被建模为时变状态。可观测性排名条件显示它们是可观测的。

The spatial parameters between IMU and camera are six DOF transformation and must be known precisely to stabilise the estimation results. They are modelled as random walk processes and treated as time-variant states. The analysis of observability rank condition shows that six DOF camera IMU transformation, along with IMU biases, gravity vector and the metric scene structure are all observable. Full observability requires sufficient excitation, i.e. the platform at least undergoes rotation and acceleration along two IMU axes.[41–43]
IMU和相机之间的空间参数是六自由度变换，必须精确地知道以稳定估计结果。它们被建模为随机游走过程，并被视为时变状态。可观测性等级条件分析表明，6自由度相机IMU变换、IMU偏差、重力矢量和度量场景结构均可观测。完全可观测性需要足够的激励，即平台至少沿两个 IMU 轴进行旋转和加速。[41–43]

The temporal parameter between inertial and visual sensor measurements is the time delay. The local identifiability of time delay is established by constructing constraint equations which involve the time delay and other quantities. The rank of the Jacobian matrices of the constraints is checked for local identifiability. The result shows that the time delay is locally identifiable in general trajectories. The critical trajectories that cause loss of identifiability are characterised.[44] The time delay between inertial and visual sensor measurements is also estimated by a variant of ICP algorithm in [45].
惯性和视觉传感器测量之间的时间参数是时间延迟。通过构建涉及时间延迟和其他量的约束方程来建立时间延迟的局部可识别性。检查约束的雅可比矩阵的秩的局部可识别性。结果表明，在一般轨迹中，时间延迟是局部可识别的。描述了导致可识别性丧失的关键轨迹。[44] 惯性和视觉传感器测量之间的时间延迟也是通过 [45] 中 ICP 算法的变体估计的。

Conclusions

This paper has presented an overview of the state-of-the art visual inertial odometry methods. It was presented from two perspectives: filtering-based and optimisation-based approaches. It was also described by handling visual images with two ways: feature based and dense based. The broad knowledge of EKF and image alignment has been unified into the same framework. The links between two approaches have been characterised by iterated EKF, smoother, and marginalisation, and a deep insight into two approaches has been unveiled. The state observability and parameter identifiability are analysed. The paper has also summarised a range of techniques used in this research area, including EKF, MAP, IEKF, BA, ICP, MHE, etc. In the future, exploring the computational complexity of these variants would provide more intuitive guide to the practitioners in this area. More implementation details should be provided as an integrated part of this work.
本文概述了最先进的视觉惯性里程计方法。它从两个角度提出：基于过滤和基于优化的方法。它还通过两种方式处理视觉图像来描述：基于特征和基于密集。EKF和图像对齐的广泛知识已统一到同一框架中。两种方法之间的联系的特点是迭代的EKF，更平滑和边缘化，并且已经揭示了对两种方法的深刻见解。分析了状态可观测性和参数可识别性。该论文还总结了该研究领域使用的一系列技术，包括EKF，MAP，IEKF，BA，ICP，MHE等。未来，探索这些变体的计算复杂性将为该领域的从业者提供更直观的指导。应提供更多的实施细节，作为这项工作的一个组成部分。

你可能感兴趣的:(SLAM,计算机视觉)

Halcon 和 opencv比有什么区别与优劣 yuanpan opencv 人工智能计算机视觉
Halcon和OpenCV都是机器视觉领域的重要工具，但它们的设计目标、功能特点和适用场景有所不同。以下是两者的详细对比：1.定位与目标用户Halcon：定位：商业机器视觉软件，专注于工业应用。目标用户：工业自动化、质量控制、机器人引导等领域的专业开发者。OpenCV：定位：开源计算机视觉库，适用于通用图像处理和计算机视觉任务。目标用户：学术研究、教育、初创公司以及需要低成本解决方案的开发者。2.
【产品小白】什么是AI产品经理百事不可口y 产品经理的一步一步人工智能产品经理学习产品运营内容运营用户运营
一、AI产品经理的定义与角色定位AI产品经理是人工智能技术与商业应用之间的核心桥梁，负责将复杂的AI技术转化为满足市场需求的产品。需同时具备技术理解力、商业洞察力和用户思维，既要参与算法选型与数据建模，又要定义产品功能与市场策略，是贯穿产品全生命周期的关键角色。与传统互联网产品经理相比，AI产品经理的独特之处在于：技术深度参与：需理解机器学习、自然语言处理（NLP）、计算机视觉等技术原理，并参与数
使用TensorFlow、OpenCV和Pygame实现图像处理与游戏开发 UwoiGit tensorflow opencv pygame
在本篇文章中，我们将介绍如何结合使用TensorFlow、OpenCV和Pygame来进行图像处理和游戏开发。这三个工具在机器学习、计算机视觉和游戏开发领域都非常流行，并且它们的结合可以提供强大的功能和无限的创造力。我们将逐步介绍如何安装和配置这些工具，并提供相关的源代码示例。安装TensorFlowTensorFlow是一个基于数据流图的开源机器学习框架，提供了丰富的工具和库来构建和训练各种深度
智能形状匹配技术全解析：从经典算法到深度学习与神经形态计算【超级详细版】 AI筑梦师计算机视觉算法深度学习人工智能机器学习计算机视觉 python
智能形状匹配技术全解析：从经典算法到深度学习与神经形态计算1.引言1.1研究背景在计算机视觉、模式识别、医学影像分析和自动驾驶等领域，形状匹配是核心任务之一。然而，现实世界的形状往往存在可变性（Variability），主要体现在以下几个方面：形变（Deformation）：物体可能由于柔性材料、外力作用或生物运动发生非刚性形变。尺度变化（ScaleVariation）：目标形状在不同场景下可能大
基于PyTorch和ResNet18的花卉识别实战（附完整代码）意.远 pytorch 人工智能 python 深度学习
一、项目背景与效果花卉分类是计算机视觉的经典任务。本文使用PyTorch框架，基于ResNet18模型实现了102种花卉的分类任务。完整代码可直接复制运行，最终验证集准确率达8.2%，文中同步分析性能瓶颈与优化方案。二、环境配置与数据准备1.环境要求#主要依赖库importtorchfromtorchimportnn,optimfromtorchvisionimporttransforms,dat
【论文精读】PatchTST-基于分块及通道独立机制的Transformer模型打酱油的葫芦娃时序预测算法时序预测 PatchTST Transformer 预训练微调表征学习
《ATIMESERIESISWORTH64WORDS:LONG-TERMFORECASTINGWITHTRANSFORMERS》的作者团队来自PrincetonUniversity和IBMResearch，发表在ICLR2023会议上。动机Transformer模型因其自注意力机制在处理序列数据方面的优势，在自然语言处理（NLP）、计算机视觉（CV）、语音等多个领域取得了巨大成功。这种机制使得模型
消融实验（Ablation Study） xwhking 深度学习机器学习深度学习消融实验
消融实验（AblationStudy）定义：消融实验是一种科学研究方法，通过逐步移除模型、算法或系统中的某个组件（如模块、层、特征、数据等），观察其对整体性能的影响，从而验证该组件的必要性和有效性。其名称来源于医学领域的“消融术”（切除部分组织以研究功能），在计算机视觉、机器学习和深度学习中被广泛用于分析模型设计。为什么要做消融实验？1.验证组件的有效性核心目的：确认模型中某个设计（如注意力机制、
RAG 在多模态数据处理中的应用探索：结合图像与文本生成 hy098543 AIGC
目录引言多模态数据处理的挑战与需求数据异质性与融合难题多样化应用场景的需求RAG在图像与文本生成中的应用架构图像检索与文本生成协同跨模态特征融合与生成关键技术与实现细节图像特征提取与表示文本检索与语义理解跨模态生成模型训练应用案例分析智能设计辅助医疗影像报告生成结论引言随着信息技术的飞速发展，数据呈现出多模态的特性，即包含文本、图像、音频、视频等多种形式。在自然语言处理（NLP）和计算机视觉（CV
OpenCV多分辨率模板匹配与容错优化实战指南追寻向上 opencv 人工智能计算机视觉
第一章：问题背景与挑战1.1传统模板匹配的局限性模板匹配（TemplateMatching）是计算机视觉中基础且广泛使用的技术，其核心思想是通过滑动窗口在目标图像中寻找与模板最相似的位置。然而，传统方法（如OpenCV的cv2.matchTemplate）在实际应用中存在以下问题：尺寸敏感性当目标的实际尺寸与模板不一致时，匹配结果会严重偏离。例如，在工业检测中，摄像头与物体的距离变化会导致目标缩放
AI API：快速集成智能化功能的开发利器桂花饼 AIGC AI API 人工智能 AIGC 语言模型 AI作画
AIAPI（ArtificialIntelligenceApplicationProgrammingInterface，人工智能应用程序接口）是应用程序接口的一种，专门用于提供人工智能相关功能的开发接口。它允许开发者利用现有的AI模型、工具或服务，将这些功能集成到自己的应用程序中，并为用户带来智能化的体验。AIAPI的核心功能主要与AI技术相关，比如自然语言处理（NLP）、计算机视觉、语音处理、机
图像分割基础：使用Python和scikit-image库 0dayNu1L 机器学习项目实战 python 人工智能机器学习
大家好，今天我们将一起探讨图像分割的基础知识，并使用Python编程语言以及scikit-image库来实现一个简单的图像分割示例。图像分割是图像处理中的一项重要技术，它允许我们将图像划分为多个部分或对象，这对于图像分析和计算机视觉任务至关重要。0dayNu1L-CSDN博客目录一、环境准备二、图像分割示例1.导入必要的库2.读取并显示图像3.创建标签数组并进行阈值分割4.使用颜色表示标签三、结果
【人工智能基础2】Tramsformer架构、自然语言处理基础、计算机视觉总结 roman_日积跬步-终至千里人工智能习题人工智能自然语言处理计算机视觉
文章目录七、Transformer架构1.替代LSTM的原因2.Transformer架构：编码器-解码器架构3.Transformer架构原理八、自然语言处理基础1.语言模型基本概念2.向量语义3.预训练语言模型的基本原理与方法4.DeepSeek基本原理九、计算机视觉七、Transformer架构1.替代LSTM的原因处理极长序列时，效率下降：虽然LSTM设计的初衷是解决长期依赖问题，即让模型
【北上广深杭大厂AI算法面试题】计算机视觉篇...详解目标检测中的多尺度训练和测试? 努力毕业的小土博^_^ AI算法题库人工智能计算机视觉算法深度学习神经网络目标检测
【北上广深杭大厂AI算法面试题】计算机视觉篇…详解目标检测中的多尺度训练和测试?【北上广深杭大厂AI算法面试题】计算机视觉篇…详解目标检测中的多尺度训练和测试?文章目录【北上广深杭大厂AI算法面试题】计算机视觉篇...详解目标检测中的多尺度训练和测试?前言多尺度训练核心思想：优点与注意点：多尺度测试核心思想：优点与注意点：综合作用参考示例总结欢迎铁子们点赞、关注、收藏！祝大家逢考必过！逢投必中！上
AI 大模型应用数据中心建设：高性能计算与存储架构 AI智能涌现深度研究 AI大模型应用入门实战与进阶 java python javascript kotlin golang 架构人工智能
AI大模型、数据中心、高性能计算、存储架构、分布式训练、GPU加速、数据管理1.背景介绍近年来，人工智能（AI）技术取得了飞速发展，特别是深度学习模型的突破性进展，催生了一系列基于大规模数据训练的强大AI模型，例如GPT-3、BERT、DALL-E等。这些AI大模型在自然语言处理、计算机视觉、语音识别等领域展现出强大的应用潜力，但也对计算资源和数据存储提出了极高的要求。传统的计算架构难以满足AI大
AI大模型学习路线：从入门到精通的完整指南【2025最新】 AI大模型-大飞人工智能学习大模型 LLM AI 程序员大模型学习
引言近年来，以GPT、BERT、LLaMA等为代表的AI大模型彻底改变了人工智能领域的技术格局。它们不仅在自然语言处理（NLP）任务中表现卓越，还在计算机视觉、多模态交互等领域展现出巨大潜力。本文旨在为开发者、研究者和技术爱好者提供一条清晰的学习路径，帮助读者逐步掌握大模型的核心技术并实现实际应用。一、基础阶段：构建知识体系数学与理论基础线性代数：矩阵运算、特征值与奇异值分解是大模型参数优化的基础
探索未来架构：基于AWS的响应式微服务框架柏赢安Simona
探索未来架构：基于AWS的响应式微服务框架reactive-refarch-cloudformationReactiveMicroservicesArchitectureswithAmazonECS,AWSLambda,AmazonKinesisStreams,AmazonElastiCache,andAmazonDynamoDB项目地址:https://gitcode.com/gh_mirror
YOLOv5+UI界面在车辆检测中的应用与实现深度学习&目标检测实战项目 YOLOv5实战项目 YOLO ui 分类数据挖掘目标跟踪人工智能
1.引言随着智能交通系统（ITS）的快速发展，车辆检测已成为计算机视觉领域的重要研究方向。车辆检测技术广泛应用于交通流量监控、车辆违章抓拍、无人驾驶等场景中。近年来，深度学习技术的突破，特别是卷积神经网络（CNN）的崛起，使得目标检测技术取得了显著进展。其中，YOLO（YouOnlyLookOnce）系列模型以其高效的实时检测能力和出色的性能成为车辆检测领域的首选方法之一。在本文中，我们将基于YO
热门AI创作助手推荐【第一期】量子星澜文心一言 AI写作 chatgpt
星游AI创作助手人工智能在现代科技中的应用非常广泛，涵盖了诸多领域，包括但不限于以下几个方面：1.语音识别和自然语言处理：人工智能技术被广泛应用于语音识别和自然语言处理领域，例如智能助手、翻译系统、语音交互系统等。2.机器学习和数据分析：人工智能的机器学习算法被用于数据分析、预测建模、用户个性化推荐等领域，帮助企业做出更准确的商业决策。3.计算机视觉：人工智能在计算机视觉领域的应用包括图像识别、视
计算机视觉入门 109702008 人工智能 #深度学习计算机视觉人工智能
计算机视觉（ComputerVision）是一门涉及使机器能够从图像或者多维数据中提取信息，解释、理解并对物体或场景进行处理的学科。以下是一个基本的计算机视觉入门学习路线，旨在为刚刚接触这一领域的学习者提供指导。1.基础知识储备数学基础：线性代数、概率论和数理统计、微积分、优化理论。编程语言：掌握至少一门编程语言，Python是目前在计算机视觉领域最流行的语言，其次是C++。2.计算机视觉基础数字
计算机视觉（Computer Vision, CV）的入门到实践的详细学习路线云梦优选计算机数据库大数据计算机视觉学习人工智能
一、基础准备1.数学基础线性代数深入矩阵运算，理解矩阵乘法、转置、逆等基本概念。掌握特征值与特征向量的几何意义，理解其在图像压缩、特征提取中的应用。学习奇异值分解（SVD）及其在降维和数据压缩中的具体应用。概率与统计熟悉贝叶斯定理及其在分类任务中的应用，如朴素贝叶斯分类器。理解常见概率分布（如正态分布、二项分布）及其性质。学习统计推断方法，如假设检验、置信区间估计，以评估模型性能。微积分掌握梯度、
机器视觉|手势识别：基于YOLOv5的手部检测与MediaPipe的关键点估计 RockLiu@805 机器视觉 YOLO
手势识别：基于YOLOv5的手部检测与MediaPipe的关键点估计在实时计算机视觉应用中，手部检测与关键点估计是实现手势识别的重要基础。本文将介绍一种基于深度学习的手势识别技术方案，通过结合YOLOv5物体检测网络和MediaPipe关键点检测框架，实现实时的手部定位与关键点提取。技术背景gesturerecognition作为计算机视觉领域的重要研究方向，在HCI（人机交互）、遥控行为分析、虚
Python 在人工智能领域的实际6大案例 Solomon_肖哥弹架构人工智能机器学习 python
Python作为一种功能强大且易于学习的编程语言，在人工智能（AI）领域得到了广泛的应用。从机器学习到深度学习，从自然语言处理到计算机视觉，Python提供了丰富的库和框架，使得开发者能够快速实现各种AI应用。本文将通过多个实际案例，展示Python在人工智能领域的强大功能和应用前景。二、案例一：手写数字识别（MNIST）1.背景介绍手写数字识别是机器学习领域的经典入门项目，MNIST数据集包含了
《Python深度学习》第四讲：计算机视觉中的深度学习 earthzhang2021 2025讲书课专栏 python 深度学习计算机视觉 1024程序员节 numpy 算法人工智能
计算机视觉是深度学习中最酷的应用之一，它让计算机能够像人类一样“看”和理解图像。想象一下，计算机可以自动识别照片中的物体、人脸，甚至可以读懂交通标志。这一切听起来是不是很神奇？其实，这一切都离不开深度学习中的卷积神经网络（CNN）。今天，我们就来深入了解一下CNN是如何工作的。5.1卷积神经网络简介先来看下卷积神经网络（CNN）是什么。CNN是一种专门用于处理图像数据的神经网络。它的灵感来源于人类
解决OpenCV读取目标图像，cv2.imshow出现闪退的问题写python的鑫哥 OpenCV入门与进阶 opencv 人工智能计算机视觉 python 图像显示闪退
前言本文是该专栏的第17篇，后面将持续分享OpenCV计算机视觉的干货知识，记得关注。最近有粉丝朋友询问到OpenCV读取目标图像出现的一个问题，在基于python语言“使用OpenCV读取目标图像的时候，利用cv2.imshow函数出现闪退”的情况。而本文，笔者将详细介绍针对上述问题，给出一个详细的应对思路以及解决方法。废话不多说，具体的细节部分以及详细的解决方案，跟着笔者直接往下看正文详细内容
模型蒸馏：从复杂到精简，AI技术的“瘦身”秘籍 lmtealily 人工智能
引言在人工智能的浪潮中，大型模型如BERT、GPT系列等在自然语言处理（NLP）、计算机视觉（CV）等领域取得了显著的成果。然而，这些“庞然大物”通常拥有数十亿甚至数千亿个参数，计算和存储成本极高，难以部署到资源受限的设备上。为了解决这一问题，模型蒸馏技术应运而生。模型蒸馏是一种将大型复杂模型的知识迁移到小型简单模型的技术，旨在保持高性能的同时大幅减少模型的参数量和计算复杂度。本文将带你深入了解模
OpenCV 深度学习模块 cv2.dnn 与其他深度学习框架的优缺点对比及适用场景白.夜深度学习 opencv
OpenCV提供了一个深度学习模块cv2.dnn，让开发者能够在计算机视觉项目中轻松加载和推理深度学习模型。相比于TensorFlow、PyTorch等其他深度学习框架，cv2.dnn有其独特的优点与缺点，适用于不同的应用场景。在这篇文章中，我们将详细分析cv2.dnn的优缺点，并讨论它的适用场景。一、cv2.dnn的优点1.简单易用cv2.dnn提供了一个相对简单且易于使用的接口，适合已经在使用
计算机视觉算法实战——驾驶员玩手机检测（主页有源码）喵了个AI 计算机视觉实战项目计算机视觉算法智能手机
✨个人主页欢迎您的访问✨期待您的三连✨✨个人主页欢迎您的访问✨期待您的三连✨✨个人主页欢迎您的访问✨期待您的三连✨1.领域简介：玩手机检测的重要性与技术挑战驾驶员玩手机检测是智能交通安全领域的核心课题。根据NHTSA数据，美国每年因手机使用导致的交通事故超过3000起，中国公安部的统计显示开车使用手机的事故率是正常驾驶的23倍。该技术通过实时监测驾驶员手部动作和视线方向，识别非法使用手机行为，在以
目标检测中衡量模型速度和精度的指标：FPS和mAP asdfg1258963 目标检测_ai 目标检测人工智能
“FPS”和“mAP”分别衡量了模型的速度和精度。FPS（FramesPerSecond）定义：FPS是“每秒传输帧数”的缩写，用于衡量计算机视觉系统（如目标检测、图像识别等）的实时性能。它表示系统每秒钟能够处理的图像或视频帧的数量。重要性：在实时应用中，如自动驾驶、视频监控等，FPS是一个关键指标。高FPS意味着系统能够快速处理输入的图像数据，实现实时响应。计算方式：FPS可以通过以下公式计算：
CIR-DFENet：结合跨模态图像表示和双流特征增强网络进行活动识别是Dream呀神经网络计算机视觉人工智能神经网络深度学习
前言：零基础学Python：Python从0到100最新最全教程。想做这件事情很久了，这次我更新了自己所写过的所有博客，汇集成了Python从0到100，共一百节课，帮助大家一个月时间里从零基础到学习Python基础语法、Python爬虫、Web开发、计算机视觉、机器学习、神经网络以及人工智能相关知识，成为学业升学和求职工作的先行者！【优惠信息】•新专栏订阅前200名享9.9元优惠•订阅量破200
cv君独家视角 | AI内幕系列七：EfficientViT模型：基于多尺度线性注意力模块，实现高效的高分辨率密集预测 cv君 cv君独家视角 AI内幕系列原创项目级实战项目深度学习与计算机视觉精品 1024程序员节 EfficientViT 高分辨率密集预测任务高分辨率视觉模型 Transformer 人工智能计算机视觉
专题概况cv君独家视角|AI内幕系列是一个专注于人工智能领域的深度专题，旨在为读者揭开AI所有领域技术的神秘面纱，展示其背后的科学原理和实际应用。通过一系列精心策划的文章，我们将带您深入了解AI的各个领域，从计算机视觉到文本语音等多模态领域，从基础理论到前沿技术，从行业应用到未来趋势。无论您是AI领域的工程师或者专家，还是对这一领域充满好奇的读者，这个系列都将为您提供高价值的见解和启发，为您带来横
java线程Thread和Runnable区别和联系 zx_code java jvm thread 多线程 Runnable
我们都晓得java实现线程2种方式，一个是继承Thread，另一个是实现Runnable。模拟窗口买票，第一例子继承thread，代码如下 package thread; public class ThreadTest { public static void main(String[] args) { Thread1 t1 = new Thread1(
【转】JSON与XML的区别比较丁_新 json xml
1.定义介绍 (1).XML定义扩展标记语言 (Extensible Markup Language, XML) ，用于标记电子文件使其具有结构性的标记语言，可以用来标记数据、定义数据类型，是一种允许用户对自己的标记语言进行定义的源语言。 XML使用DTD(document type definition)文档类型定义来组织数据;格式统一，跨平台和语言，早已成为业界公认的标准。 XML是标
c++ 实现五种基础的排序算法 CrazyMizzz C++c 算法
#include<iostream> using namespace std; //辅助函数，交换两数之值 template<class T> void mySwap(T &x, T &y){ T temp = x; x = y; y = temp; } const int size = 10; //一、用直接插入排
我的软件麦田的设计者我的软件音乐类娱乐放松
这是我写的一款app软件，耗时三个月，是一个根据央视节目开门大吉改变的，提供音调，猜歌曲名。1、手机拥有者在android手机市场下载本APP，同意权限，安装到手机上。2、游客初次进入时会有引导页面提醒用户注册。（同时软件自动播放背景音乐）。3、用户登录到主页后，会有五个模块。a、点击不胫而走，用户得到开门大吉首页部分新闻，点击进入有新闻详情。b、
linux awk命令详解被触发 linux awk
awk是行处理器: 相比较屏幕处理的优点，在处理庞大文件时不会出现内存溢出或是处理缓慢的问题，通常用来格式化文本信息 awk处理过程: 依次对每一行进行处理，然后输出 awk命令形式: awk [-F|-f|-v] ‘BEGIN{} //{command1; command2} END{}’ file [-F|-f|-v]大参数，-F指定分隔符，-f调用脚本，-v定义变量 var=val
各种语言比较 _wy_ 编程语言
Java Ruby PHP 擅长领域
oracle 中数据类型为clob的编辑知了ing oracle clob
public void updateKpiStatus(String kpiStatus,String taskId){ Connection dbc=null; Statement stmt=null; PreparedStatement ps=null; try { dbc = new DBConn().getNewConnection(); //stmt = db
分布式服务框架 Zookeeper -- 管理分布式环境中的数据矮蛋蛋 zookeeper
原文地址： http://www.ibm.com/developerworks/cn/opensource/os-cn-zookeeper/ 安装和配置详解本文介绍的 Zookeeper 是以 3.2.2 这个稳定版本为基础，最新的版本可以通过官网 http://hadoop.apache.org/zookeeper/来获取，Zookeeper 的安装非常简单，下面将从单机模式和集群模式两
tomcat数据源 alafqq tomcat
数据库 JNDI(Java Naming and Directory Interface，Java命名和目录接口)是一组在Java应用中访问命名和目录服务的API。没有使用JNDI时我用要这样连接数据库： 03. Class.forName("com.mysql.jdbc.Driver"); 04. conn
遍历的方法百合不是茶遍历
遍历在java的泛
linux查看硬件信息的命令 bijian1013 linux
linux查看硬件信息的命令一.查看CPU： cat /proc/cpuinfo 二.查看内存： free 三.查看硬盘： df linux下查看硬件信息 1、lspci 列出所有PCI 设备； lspci - list all PCI devices:列出机器中的PCI设备（声卡、显卡、Modem、网卡、USB、主板集成设备也能
java常见的ClassNotFoundException bijian1013 java
1.java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory 添加包common-logging.jar2.java.lang.ClassNotFoundException: javax.transaction.Synchronization
【Gson五】日期对象的序列化和反序列化 bit1129 反序列化
对日期类型的数据进行序列化和反序列化时，需要考虑如下问题： 1. 序列化时，Date对象序列化的字符串日期格式如何 2. 反序列化时，把日期字符串序列化为Date对象，也需要考虑日期格式问题 3. Date A -> str -> Date B,A和B对象是否equals 默认序列化和反序列化 import com
【Spark八十六】Spark Streaming之DStream vs. InputDStream bit1129 Stream
1. DStream的类说明文档： /** * A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous * sequence of RDDs (of the same type) representing a continuous st
通过nginx获取header信息 ronin47 nginx header
1. 提取整个的Cookies内容到一个变量，然后可以在需要时引用，比如记录到日志里面， if ( $http_cookie ~* "(.*)$") { set $all_cookie $1; } 变量$all_cookie就获得了cookie的值，可以用于运算了
java-65.输入数字n，按顺序输出从1最大的n位10进制数。比如输入3，则输出1、2、3一直到最大的3位数即999 bylijinnan java
参考了网上的http://blog.csdn.net/peasking_dd/article/details/6342984 写了个java版的： public class Print_1_To_NDigit { /** * Q65.输入数字n，按顺序输出从1最大的n位10进制数。比如输入3，则输出1、2、3一直到最大的3位数即999 * 1.使用字符串
Netty源码学习-ReplayingDecoder bylijinnan java netty
ReplayingDecoder是FrameDecoder的子类，不熟悉FrameDecoder的，可以先看看 http://bylijinnan.iteye.com/blog/1982618 API说，ReplayingDecoder简化了操作，比如： FrameDecoder在decode时，需要判断数据是否接收完全： public class IntegerH
js特殊字符过滤 cngolon js特殊字符 js特殊字符过滤
1.js中用正则表达式过滤特殊字符, 校验所有输入域是否含有特殊符号function stripscript(s) { var pattern = new RegExp("[`~!@#$^&*()=|{}':;',\\[\\].<>/?~！@#￥……&*（）——|{}【】‘；：”“'。，、？]"
hibernate使用sql查询 ctrain Hibernate
import java.util.Iterator; import java.util.List; import java.util.Map; import org.hibernate.Hibernate; import org.hibernate.SQLQuery; import org.hibernate.Session; import org.hibernate.Transa
linux shell脚本中切换用户执行命令方法 daizj linux shell 命令切换用户
经常在写shell脚本时，会碰到要以另外一个用户来执行相关命令，其方法简单记下： 1、执行单个命令：su - user -c "command" 如：下面命令是以test用户在/data目录下创建test123目录 [root@slave19 /data]# su - test -c "mkdir /data/test123"
好的代码里只要一个 return 语句 dcj3sjt126com return
别再这样写了：public boolean foo() { if (true) { return true; } else { return false;
Android动画效果学习 dcj3sjt126com android
1、透明动画效果方法一：代码实现 public View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) { View rootView = inflater.inflate(R.layout.fragment_main, container, fals
linux复习笔记之bash shell (4)管道命令 eksliang linux管道命令汇总 linux管道命令 linux常用管道命令
转载请出自出处： http://eksliang.iteye.com/blog/2105461 bash命令执行的完毕以后，通常这个命令都会有返回结果，怎么对这个返回的结果做一些操作呢？那就得用管道命令‘|’。上面那段话，简单说了下管道命令的作用，那什么事管道命令呢？答：非常的经典的一句话，记住了，何为管
Android系统中自定义按键的短按、双击、长按事件 gqdy365 android
在项目中碰到这样的问题：由于系统中的按键在底层做了重新定义或者新增了按键，此时需要在APP层对按键事件（keyevent）做分解处理，模拟Android系统做法，把keyevent分解成： 1、单击事件：就是普通key的单击； 2、双击事件：500ms内同一按键单击两次； 3、长按事件：同一按键长按超过1000ms（系统中长按事件为500ms）； 4、组合按键：两个以上按键同时按住；
asp.net获取站点根目录下子目录的名称 hvt .net C#asp.net hovertree Web Forms
使用Visual Studio建立一个.aspx文件(Web Forms)，例如hovertree.aspx,在页面上加入一个ListBox代码如下： <asp:ListBox runat="server" ID="lbKeleyiFolder" /> 那么在页面上显示根目录子文件夹的代码如下： string[] m_sub
Eclipse程序员要掌握的常用快捷键 justjavac java eclipse 快捷键 ide
判断一个人的编程水平，就看他用键盘多，还是鼠标多。用键盘一是为了输入代码（当然了，也包括注释），再有就是熟练使用快捷键。曾有人在豆瓣评《卓有成效的程序员》：“人有多大懒，才有多大闲”。之前我整理了一个程序员图书列表，目的也就是通过读书，让程序员变懒。写道程序员作为特殊的群体，有的人可以这么懒，懒到事情都交给机器去做，而有的人又可
c++编程随记 lx.asymmetric C++笔记
为了字体更好看，改变了格式…… &&运算符： #include<iostream> using namespace std; int main(){ int a=-1,b=4,k; k=(++a<0)&&!(b--
linux标准IO缓冲机制研究音频数据 linux
一、什么是缓存I/O(Buffered I/O)缓存I/O又被称作标准I/O,大多数文件系统默认I/O操作都是缓存I/O。在Linux的缓存I/O机制中，操作系统会将I/O的数据缓存在文件系统的页缓存(page cache)中，也就是说，数据会先被拷贝到操作系统内核的缓冲区中，然后才会从操作系统内核的缓冲区拷贝到应用程序的地址空间。1.缓存I/O有以下优点:A.缓存I/O使用了操作系统内核缓冲区，
随想生活暗黑小菠萝生活
其实账户之前就申请了，但是决定要自己更新一些东西看也是最近。从毕业到现在已经一年了。没有进步是假的，但是有多大的进步可能只有我自己知道。毕业的时候班里12个女生，真正最后做到软件开发的只要两个包括我，PS：我不是说测试不好。当时因为考研完全放弃找工作，考研失败，我想这只是我的借口。那个时候才想到为什么大学的时候不能好好的学习技术，增强自己的实战能力，以至于后来找工作比较费劲。我
我认为POJO是一个错误的概念 windshome java POJO 编程 J2EE 设计
这篇内容其实没有经过太多的深思熟虑，只是个人一时的感觉。从个人风格上来讲，我倾向简单质朴的设计开发理念；从方法论上，我更加倾向自顶向下的设计；从做事情的目标上来看，我追求质量优先，更愿意使用较为保守和稳妥的理念和方法。 &