Contents
0. Paper information
1. Overview
2. A. Robust Visual SLAM
2.1. Motion Segmentation
2.1.1. Background/Foreground Initialization
2.1.2. Geometric Constraints (corresponds to flip detection)
2.1.3. Optical Flow
2.1.4. Ego-Motion Constraints
2.1.5. Deep Learning
2.2. Localization and 3D Reconstruction
2.2.1. Feature Based
2.2.2. Deep Learning
3. B. Dynamic Object Segmentation and 3D Tracking
3.1. Dynamic Object Segmentation
3.1.1. Statistical Model Selection (corresponds to flipped foreground and foreground tracking)
3.1.2. Subspace Clustering
3.1.3. Geometry
3.1.4. Deep Learning
3.2. 3D Tracking of Dynamic Objects
3.2.1. Trajectory Triangulation
3.2.2. Particle Filter
4. Joint Motion Segmentation and Reconstruction
4.1. Factorization
4.1.1. Multibody Structure from Motion (MBSfM)
4.1.2. Nonrigid Structure from Motion (NRSfM) (omitted)
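Saputra, Muhamad R. U., Markham, et al. Visual SLAM and Structure from Motion in Dynamic Environments: A Survey.
Published in 2018; of the papers on handling dynamic scenes (SLAM and 3D reconstruction) that I have found so far, it is one of the most highly cited.
1. Overall framework
2. Evaluation table of related methods
3. Note: the annotations mark each method's key points, advantages, and disadvantages.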
This approach works well if static features are in the majority. When dynamic objects dominate the view in front of the camera, or the scene is occluded by a large moving object, these approaches may fail.
An inertial measurement unit (IMU) can provide the ego-motion.
Prior knowledge: background/foreground (B/F) models
Real time: no (moving objects must be matched exhaustively)
F (foreground): tracking-by-detection [14, 89]
B (background): background subtraction; initialization requires frames without foreground
Temporarily stationary objects: can be re-tracked when they move again
degenerate motion
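A minimal sketch of the background-subtraction idea above, assuming grayscale frames stored as NumPy arrays and a running-average background model. The learning rate `alpha`, the threshold, and the toy sequence are illustrative choices, not taken from the survey.

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Exponential running average: B <- (1 - alpha) * B + alpha * I."""
    return (1.0 - alpha) * background + alpha * frame

def foreground_mask(background, frame, threshold=25.0):
    """Foreground = pixels whose absolute difference to the background exceeds threshold."""
    return np.abs(frame.astype(np.float64) - background) > threshold

def make_frame(t, size=32):
    """Hypothetical toy frame: grey background (100) with a bright 4x4 object (200)
    that moves one pixel to the right per frame."""
    frame = np.full((size, size), 100.0)
    frame[10:14, 5 + t:9 + t] = 200.0
    return frame

# Initialization without foreground: average a few object-free frames.
background = np.mean([np.full((32, 32), 100.0) for _ in range(5)], axis=0)

masks = []
for t in range(10):
    frame = make_frame(t)
    mask = foreground_mask(background, frame)
    masks.append(mask)
    # Update the model only at background pixels so the object is not absorbed into it.
    background = np.where(mask, background, update_background(background, frame))
```

Updating only the background pixels is one simple way to keep a slow or briefly stopped object from being blended into the model; real systems use more elaborate background models.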
1. Four types of geometric constraints:
Epipolar: a static feature should lie on its corresponding epipolar line
Geometric model (F): model similarity
Triangulation: the back-projected rays from the tracked features do not meet (noise must also be accounted for); related to removing the background via stereo matching
Reprojection error: pixel distance, appearance differences
2. Degenerate motion: an object moving along the epipolar line cannot be detected; addressed by the Flow Vector Bound (FVB)
3. There is no additional computational burden in performing the segmentation, so real-time implementations are common.
4. Temporary stopping: not handled (only motion is detected)
5. The approach cannot distinguish residual error caused by a moving object from residual error caused by false correspondences (outliers), since both produce high geometric error.
6. Degenerate motion: not handled
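The epipolar constraint can be sketched as a point-to-line distance test. This assumes a known fundamental matrix built from hypothetical intrinsics `K` and a pure sideways camera translation, and a 1-pixel threshold chosen for illustration. The third point moves along its epipolar line, which shows the degenerate case from item 2: the test does not flag it.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def epipolar_distance(F, x1, x2):
    """Pixel distance from each x2 to the epipolar line l = F x1; x1, x2: (N, 2)."""
    h1 = np.column_stack([x1, np.ones(len(x1))])
    h2 = np.column_stack([x2, np.ones(len(x2))])
    lines = h1 @ F.T                               # row i is F @ h1[i]
    return np.abs(np.sum(lines * h2, axis=1)) / np.linalg.norm(lines[:, :2], axis=1)

def classify_dynamic(F, x1, x2, threshold=1.0):
    """Flag correspondences that violate the epipolar constraint."""
    return epipolar_distance(F, x1, x2) > threshold

# Hypothetical calibrated setup: camera 2 sees X2 = R X + t.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([1.0, 0.0, 0.0])
K_inv = np.linalg.inv(K)
F = K_inv.T @ skew(t) @ R @ K_inv

def project(X):
    p = X @ K.T
    return p[:, :2] / p[:, 2:3]

X = np.array([[0.5, 0.2, 5.0], [-1.0, 0.4, 6.0], [0.3, -0.5, 4.0]])
own_motion = np.array([[0.0, 0.0, 0.0],    # static point
                       [0.0, 0.2, 0.0],    # object moving across the epipolar line
                       [0.3, 0.0, 0.0]])   # object moving along the epipolar line
x1 = project(X)
x2 = project(X @ R.T + t + own_motion)
flags = classify_dynamic(F, x1, x2)
```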
Input: consecutive images
A motion metric is computed from the optical flow.
The graph-cut algorithm is used to segment the moving objects based on the motion metric.
Scene flow (the 3D version of optical flow): the residual is measured with the Mahalanobis distance; a low residual indicates a static object.
Real time: yes
Relies on the brightness constancy assumption, which is sensitive to changes in lighting conditions [62].
Sensitive to large pixel displacements
Degenerate motion: not handled
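A minimal sketch of the scene-flow residual test, assuming the camera motion (`R`, `t`) and a per-point flow covariance are already known. The isotropic 1 cm covariance, the sample points, and the chi-square gate at 95% are illustrative assumptions; the survey does not prescribe these values.

```python
import numpy as np

CHI2_3DOF_95 = 7.815  # 95th percentile of the chi-square distribution, 3 DOF

def mahalanobis_sq(measured, predicted, cov):
    """Squared Mahalanobis distance r^T C^{-1} r of each scene-flow residual.
    measured, predicted: (N, 3); cov: (N, 3, 3)."""
    r = measured - predicted
    sol = np.linalg.solve(cov, r[..., None])[..., 0]
    return np.sum(r * sol, axis=1)

# Hypothetical data: 3D points (camera frame) and a known camera motion X' = R X + t.
X = np.array([[0.5, 0.2, 5.0], [-1.0, 0.4, 6.0], [0.3, -0.5, 4.0]])
R, t = np.eye(3), np.array([0.0, 0.0, -0.5])
predicted = X @ R.T + t - X                  # scene flow a static point would have
measured = predicted + np.array([[0.005, -0.004, 0.003],
                                 [-0.002, 0.006, 0.0],
                                 [0.3, 0.0, 0.0]])   # third point moves on its own
cov = np.tile((0.01 ** 2) * np.eye(3), (3, 1, 1))
d2 = mahalanobis_sq(measured, predicted, cov)
is_static = d2 < CHI2_3DOF_95                # low residual -> static
```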
Assumes that the camera moves according to a particular parameterization (e.g., planar or circular motion)
Temporary stopping: not handled
Static features are classified by checking which feature points fit the camera motion constraints.
Real time: yes
Degenerate motion: handled
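As a sketch of classifying features against a parameterized camera motion, the following assumes planar motion (yaw `theta` about the camera y-axis, heading `phi` in the x-z plane) whose two parameters are already known, e.g. from odometry. The helper `planar_motion`, the points, and the threshold are hypothetical. Features whose algebraic epipolar residual under this motion is large are treated as dynamic.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def planar_motion(theta, phi):
    """Planar camera motion: yaw theta about the camera y-axis and a unit
    translation in the x-z plane with heading phi. Returns R, t, E = [t]_x R."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    t = np.array([np.sin(phi), 0.0, np.cos(phi)])
    return R, t, skew(t) @ R

def ego_motion_residuals(E, x1, x2):
    """|x2^T E x1| for normalized homogeneous image points of shape (N, 3)."""
    return np.abs(np.einsum("ij,jk,ik->i", x2, E, x1))

R, t, E = planar_motion(theta=0.1, phi=0.05)   # assumed known, e.g. from odometry
X = np.array([[0.5, 0.2, 5.0], [-1.0, 0.4, 6.0], [0.3, -0.5, 4.0]])
X2 = X @ R.T + 0.5 * t                          # static points after camera motion
X2[0] += np.array([0.0, 0.3, 0.0])              # point 0 belongs to a moving object
x1, x2 = X / X[:, 2:3], X2 / X2[:, 2:3]
dynamic = ego_motion_residuals(E, x1, x2) > 1e-4
```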
Used to estimate optical flow
Scene flow estimation from stereo images
Geometric and spatiotemporal features cannot yet be learned
1. Only the static features produced by the techniques described in Section 3.1 of the survey are used. All dynamic features are regarded as outliers and excluded from the computation.
2. Matching:
Short baselines: optical-flow-based techniques
Long baselines: feature descriptors, e.g., SIFT [99], SURF [8], BRIEF [17], BRISK [91], etc.
Outlier rejection: e.g., RANSAC [37], PROSAC [22], MLESAC [158], etc.
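For long-baseline matching with binary descriptors (such as BRIEF or BRISK), a minimal sketch is brute-force matching by Hamming distance plus a Lowe-style ratio test. The descriptors below are random synthetic 256-bit strings and the 0.8 ratio is an assumed value.

```python
import numpy as np

def hamming_matrix(d1, d2):
    """Pairwise Hamming distances between binary descriptors stored as packed
    uint8 arrays of shape (N1, B) and (N2, B)."""
    xor = np.bitwise_xor(d1[:, None, :], d2[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)

def match_binary(d1, d2, ratio=0.8):
    """Brute-force nearest-neighbour matching with a ratio test."""
    dist = hamming_matrix(d1, d2)
    order = np.argsort(dist, axis=1)
    matches = []
    for i in range(len(d1)):
        best, second = order[i, 0], order[i, 1]
        if dist[i, best] < ratio * dist[i, second]:
            matches.append((i, int(best)))
    return matches

# Synthetic 256-bit descriptors: frame 2 sees the same features in a shuffled
# order, each with one flipped bit.
rng = np.random.default_rng(0)
d1 = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)
perm = rng.permutation(50)
d2 = d1[perm].copy()
d2[:, 0] ^= 1
matches = match_binary(d1, d2)
```

Surviving matches would then go to an outlier-rejection step such as RANSAC.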
3. Optimization:
Midpoint method [9] or least-squares-based methods: suffer from drift
BA: Gauss-Newton method
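A minimal Gauss-Newton sketch of the reprojection-error minimization behind BA. To stay short, it is structure-only: camera poses are held fixed and one 3D point is refined, with a normalized pinhole camera (no intrinsics). Full BA would also optimize the poses, and many systems use Levenberg-Marquardt instead. The poses and initial guess are hypothetical.

```python
import numpy as np

def residual_and_jacobian(X, R, t, z):
    """Reprojection residual (normalized camera) and its Jacobian w.r.t. X."""
    Xc = R @ X + t
    x, y, w = Xc
    r = np.array([x / w, y / w]) - z
    J_proj = np.array([[1.0 / w, 0.0, -x / w ** 2],
                       [0.0, 1.0 / w, -y / w ** 2]])
    return r, J_proj @ R

def gauss_newton_point(X0, poses, observations, iters=20, tol=1e-12):
    """Refine one 3D point by Gauss-Newton on the summed squared reprojection error,
    with the camera poses held fixed (structure-only bundle adjustment)."""
    X = np.asarray(X0, dtype=float).copy()
    for _ in range(iters):
        H, g = np.zeros((3, 3)), np.zeros(3)
        for (R, t), z in zip(poses, observations):
            r, J = residual_and_jacobian(X, R, t, z)
            H += J.T @ J          # Gauss-Newton approximation of the Hessian
            g += J.T @ r          # gradient of 0.5 * sum ||r||^2
        step = np.linalg.solve(H, -g)
        X += step
        if np.linalg.norm(step) < tol:
            break
    return X

X_true = np.array([0.5, 0.2, 5.0])
poses = [(np.eye(3), np.array([-0.3 * k, 0.0, 0.0])) for k in range(4)]
observations = [(R @ X_true + t)[:2] / (R @ X_true + t)[2] for R, t in poses]
X_est = gauss_newton_point(X_true + np.array([0.1, -0.1, 0.5]), poses, observations)
```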
Optimization methods commonly used in V-SLAM:
Local bundle adjustment
PTAM: keyframe selection; tracking and mapping run in separate threads
Binary descriptors
Metric-topological mapping, so that large-scale mapping can run in real time
ORB-SLAM [113] (parallel computing, ORB features [131], statistical model selection [155], loop closure based on bag-of-words place recognition [26, 41], local bundle adjustment [111], and graph optimization [81])
End-to-end pose estimation: yes
(1) Supervised learning
Formulated as a classification problem over the discretized space of camera translation and rotation
Regression networks
Optical-flow-based networks
(2) Unsupervised learning
No ground-truth poses are needed; instead, the network learns to predict the camera pose by minimizing the photometric error, similar to direct methods.
End-to-end 3D reconstruction: no; depth estimation: yes
Clusters feature correspondences into groups according to their motion and tracks their trajectories in 3D.
Dynamic object segmentation (also known as multibody motion segmentation [73, 132, 153] or eorumotion segmentation [133]) clusters all feature correspondences into n different object motions.
To estimate the motion of an object, its features must be clustered first; conversely, clustering the features requires the motion models of all moving objects. The problem is compounded by noise, outliers, and missing feature correspondences caused by occlusion, motion blur, or lost feature tracks.
degenerate motion
consecutive images
Motion models can be based on one of the following categories: fundamental matrix (F), affine fundamental matrix (FA), essential matrix (E), homography/projectivity (H), or affinity (A).
Procedure: sample, fit, and keep the best model; then sample again from the remaining features (repeat).
(1) Sampling iterations: RANSAC [37] or Monte Carlo sampling
(2) Find the Best Model:
Akaike's Information Criterion (AIC): balances likelihood against the number of parameters
Bayes Information Criterion (BIC)
Others:
Minimum Description Length (MDL)
Geometric Information Criterion (G-AIC, or in some literature called GIC)
Geometrically Robust Information Criterion (GRIC)
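A sketch of criterion-based model selection between a homography H and a fundamental matrix F. GRIC is written in its commonly stated form from Torr's work (r = 4 for two-view correspondences, model dimension d, parameter count k, λ1 = ln r, λ2 = ln(rn), λ3 = 2). AIC and BIC are shown in a simplified Gaussian form with known sigma. The residual arrays are hypothetical.

```python
import numpy as np

def aic(residuals, sigma, k):
    """AIC = 2k - 2 ln L for Gaussian residuals with known sigma (constants dropped)."""
    return np.sum((residuals / sigma) ** 2) + 2.0 * k

def bic(residuals, sigma, k):
    """BIC = k ln n - 2 ln L under the same Gaussian assumption."""
    return np.sum((residuals / sigma) ** 2) + k * np.log(residuals.size)

def gric(residuals, sigma, d, k, r=4):
    """GRIC = sum rho(e^2) + lambda1 * d * n + lambda2 * k, with the robust
    rho(e^2) = min(e^2 / sigma^2, lambda3 * (r - d))."""
    n = residuals.size
    lam1, lam2, lam3 = np.log(r), np.log(r * n), 2.0
    rho = np.minimum((residuals / sigma) ** 2, lam3 * (r - d))
    return np.sum(rho) + lam1 * d * n + lam2 * k

# Hypothetical residuals (pixels) of 100 correspondences under each model.
# Two-view models: homography H (d=2, k=8) vs. fundamental matrix F (d=3, k=7).
planar_H, planar_F = np.full(100, 0.5), np.full(100, 0.5)     # planar scene
general_H, general_F = np.full(100, 5.0), np.full(100, 0.5)   # general 3D scene

for name, rH, rF in [("planar", planar_H, planar_F), ("general", general_H, general_F)]:
    best = "H" if gric(rH, 1.0, d=2, k=8) < gric(rF, 1.0, d=3, k=7) else "F"
    print(name, "->", best)   # prints "planar -> H", then "general -> F"
```

In this simplified AIC only the parameter count k is penalized, so on the planar data it would prefer F (k = 7). GRIC also penalizes the dimension d of the model manifold, which lets the lower-dimensional H win when both models fit equally well.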
4. Uses several perspective images (vs. two-image sequences)
Temporal coherence is enforced by connecting only essential matrices with similar inlier sets.
5. Degenerate motion: handled
6. The number of moving objects is determined automatically once the whole data set is described by n different motion models.
7. Noise and outliers are handled automatically.
8. Real time: no
9. Fitting motion models to randomly sampled data is computationally expensive.
10. Finally, dependent motion remains a challenging problem for statistical model selection, since a group of features can be part of two different motion models.
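The sample-fit-remove loop can be sketched as sequential RANSAC. To keep it minimal, the motion model here is a 2D translation of each feature's flow vector (one vector is a minimal sample); this is a deliberately simple stand-in for the F, E, H, or affine models listed above. Thresholds and data are assumed. The number of motions is not given in advance: the loop stops when the remaining features are too few to support another model.

```python
import numpy as np

def ransac_translation(flow, thresh, iters, rng):
    """RANSAC for a single 2D translation model; returns the best inlier mask.
    The minimal sample for a translation is one flow vector."""
    best = np.zeros(len(flow), dtype=bool)
    for _ in range(iters):
        model = flow[rng.integers(len(flow))]
        inliers = np.linalg.norm(flow - model, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return best

def sequential_ransac(flow, thresh=0.5, iters=200, min_inliers=10, seed=0):
    """Fit the best model, remove its inliers, repeat on the remaining features.
    Returns one label per feature; -1 marks features explained by no model."""
    rng = np.random.default_rng(seed)
    labels = np.full(len(flow), -1)
    remaining = np.arange(len(flow))
    n_models = 0
    while len(remaining) >= min_inliers:
        inliers = ransac_translation(flow[remaining], thresh, iters, rng)
        if inliers.sum() < min_inliers:
            break
        labels[remaining[inliers]] = n_models
        remaining = remaining[~inliers]
        n_models += 1
    return labels

# Hypothetical flow vectors: 60 background features, 40 on one moving object,
# and 5 gross outliers.
rng = np.random.default_rng(1)
background = np.array([1.0, 0.0]) + rng.normal(scale=0.03, size=(60, 2))
moving = np.array([-2.0, 1.0]) + rng.normal(scale=0.03, size=(40, 2))
outliers = np.array([[8.0, 8.0], [-7.0, 5.0], [6.0, -9.0], [-9.0, -8.0], [4.0, 7.0]])
labels = sequential_ransac(np.vstack([background, moving, outliers]))
```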
Estimating the subspace parameters and clustering the data into subspaces must be done simultaneously.
The trajectories of each independent rigid-body motion lie in a linear subspace (rank constraint), so each linear subspace can be recovered.
Similar to statistical model selection, but a subspace is fitted instead of a motion model; AIC decides whether to merge two subspaces into one group.
Multiple linear subspaces: Generalized Principal Component Analysis (GPCA)
A single linear subspace: PCA
Fitting: find the normal vectors of each subspace; segmentation: compute the similarity between normal vectors, then apply spectral clustering
Local Subspace Affinity (LSA): handles independent, articulated, rigid, nonrigid, degenerate, and nondegenerate motions
Projection into a lower-dimensional subspace is also carried out before the subspace is estimated.
Computationally cheaper
Dependent motions
Cannot run sequentially (except [161, 185]) or in real time, since the whole sequence must be available before processing (batch mode).
Requires information about the number of motions in the scene or the subspace dimension
Assumes an affine camera model; perspective effects are not handled, since under perspective projection a motion may lie on a nonlinear manifold
noise [35, 177], outliers [35, 185], and missing data
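The rank constraint behind subspace clustering can be checked numerically. In this sketch, each rigid object is observed through random 2x4 affine projections, one per frame, standing in for camera motion composed with the object's own motion. Its trajectory matrix then has rank 4, and two independent objects together span rank 8. The helper names and sizes are hypothetical.

```python
import numpy as np

def rigid_measurements(n_frames, n_points, rng):
    """W = M S for one rigidly moving object seen by an affine camera:
    M (2F x 4) stacks one random 2x4 affine projection per frame, which stands in
    for camera motion composed with the object's own motion; S = [X; 1] (4 x P)."""
    M = rng.normal(size=(2 * n_frames, 4))
    S = np.vstack([rng.normal(size=(3, n_points)), np.ones((1, n_points))])
    return M @ S

def numerical_rank(A, rel_tol=1e-8):
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(0)
W1 = rigid_measurements(10, 30, rng)       # object 1: 30 trajectories
W2 = rigid_measurements(10, 25, rng)       # object 2: 25 trajectories
W = np.hstack([W1, W2])                    # columns = trajectories in R^(2F)
print(numerical_rank(W1), numerical_rank(W2), numerical_rank(W))  # 4 4 8
```

Subspace clustering methods such as GPCA or LSA try to recover this grouping of columns without knowing which trajectory belongs to which object.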
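There is a set of fundamental matrices {F_i}, one associated with each moving object, such that every correspondence (x_1, x_2) satisfies the following multibody epipolar constraint:
$$\prod_{i=1}^{n} \left( \mathbf{x}_2^\top F_i \, \mathbf{x}_1 \right) = 0$$
(1) The polynomial equation is turned back into a bilinear problem by mapping each point into a vector of $M_n = (n+1)(n+2)/2$ monomials with the Veronese map $\nu_n$, giving $\nu_n(\mathbf{x}_2)^\top \tilde{F} \, \nu_n(\mathbf{x}_1) = 0$, where $\tilde{F}$ is the multibody fundamental matrix.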
(2) If n is known, the constraint is rearranged into a linear system in the entries of the multibody fundamental matrix, which can be estimated by least squares.
(3) Subsequently, motion segmentation of the dynamic features is done by assigning each feature correspondence to the correct fundamental matrix [167].
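Steps (1) and (2) can be sketched for n = 2 with synthetic normalized correspondences from two rigid motions. Each correspondence contributes one row `kron(nu(x2), nu(x1))`. The multibody fundamental matrix is the right singular vector with the smallest singular value, reshaped to 6x6. Step (3), recovering each F_i and assigning features, is omitted. The motions and points are hypothetical.

```python
import itertools
import numpy as np

def veronese(x, n):
    """Degree-n Veronese map of a 3-vector: its M_n = (n+1)(n+2)/2 monomials."""
    return np.array([np.prod([x[i] for i in idx])
                     for idx in itertools.combinations_with_replacement(range(3), n)])

def estimate_multibody_F(x1, x2, n):
    """Linear least-squares estimate of the multibody fundamental matrix from
    homogeneous correspondences x1, x2 (N, 3): nu(x2)^T F nu(x1) = 0 is linear in
    the entries of F, so stack one row per correspondence and take the right
    singular vector with the smallest singular value."""
    A = np.array([np.kron(veronese(b, n), veronese(a, n)) for a, b in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    m = veronese(x1[0], n).size
    return Vt[-1].reshape(m, m), A

def rotation(axis, angle):
    """Rodrigues' formula for a rotation about `axis`."""
    a = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    K = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * K @ K

def correspondences(R, t, n_pts, rng):
    """Normalized image points of 3D points before and after X' = R X + t."""
    X = np.column_stack([rng.uniform(-1, 1, n_pts), rng.uniform(-1, 1, n_pts),
                         rng.uniform(4, 8, n_pts)])
    X2 = X @ R.T + t
    return X / X[:, 2:3], X2 / X2[:, 2:3]

rng = np.random.default_rng(1)
a1, b1 = correspondences(rotation([0, 1, 0], 0.1), np.array([0.5, 0.0, 0.1]), 40, rng)
a2, b2 = correspondences(rotation([1, 0, 0], -0.05), np.array([0.0, 0.4, -0.2]), 40, rng)
x1, x2 = np.vstack([a1, a2]), np.vstack([b1, b2])   # two motions, unlabeled
F_mb, A = estimate_multibody_F(x1, x2, n=2)          # 6 x 6 for n = 2
residuals = np.abs(A @ F_mb.ravel())
```

The linear system needs at least M_n² - 1 = 35 correspondences for n = 2; its size grows quickly with n, consistent with the complexity remark in item 6 of the list that follows.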
3. The multibody SfM formulation has been extended from two views to three views by introducing the multibody trilinear constraint and the multibody trifocal tensor (little difference: F becomes a tensor).
4. Uses the perspective camera model
5. Degenerate motion: not handled; works on image pairs
6. Complexity grows exponentially with the number of motions
Requires a predefined number of rigid-body motions
Produces dense object masks
The object trajectory is assumed known or to follow a parametric form (e.g., an unknown 3D line); reconstruction then amounts to finding a 3D line that intersects the projection rays from t views.
Outliers and missing data: not handled
Rigid-body motion
Other trajectory assumptions: a conic section or a general curve
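For the straight-line case, trajectory triangulation can be sketched with Plücker coordinates. A line (d, m) meets a viewing ray (d_k, m_k) iff d·m_k + m·d_k = 0, which is linear in the unknown 6-vector [d, m]. With five or more rays from a moving camera, the line is the null vector of the stacked constraints. This formulation is one standard way to write the problem, not necessarily the one used in the cited works. Camera centers are assumed known, and all data are synthetic.

```python
import numpy as np

def plucker_ray(center, direction):
    """Plücker coordinates (d, m) of the ray through `center` along `direction`."""
    d = direction / np.linalg.norm(direction)
    return d, np.cross(center, d)

def triangulate_line(centers, directions):
    """Find the 3D line meeting every viewing ray (needs >= 5 rays in general)."""
    rows = []
    for c, r in zip(centers, directions):
        dk, mk = plucker_ray(c, r)
        rows.append(np.concatenate([mk, dk]))   # row . [d, m] = d . m_k + m . d_k
    _, _, Vt = np.linalg.svd(np.array(rows))
    L = Vt[-1]
    d, m = L[:3], L[3:]
    scale = np.linalg.norm(d)
    return d / scale, m / scale

# Hypothetical scene: a point moving along p0 + s * v, seen from 8 camera centres.
rng = np.random.default_rng(2)
p0 = np.array([0.0, 0.5, 6.0])
v = np.array([1.0, 0.0, 0.2]) / np.linalg.norm([1.0, 0.0, 0.2])
object_pos = p0 + np.linspace(0.0, 2.0, 8)[:, None] * v
centers = rng.normal(scale=1.0, size=(8, 3))     # moving camera
d, m = triangulate_line(centers, object_pos - centers)
```

With a static camera all rays share one center and the line is not determined, which is why camera motion is needed here.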
Prior knowledge about the camera motion is not needed, although some approaches [5, 6] assume that the camera pose is available
The particles are spread along the ray of projection and are constrained by the estimated/predefined ground plane and maximum/minimum allowed depth value.
(By default, static objects serve as the global coordinate reference.)
Bearing-Only Tracking (BOT) problem (monocular camera)
The moving object is first segmented; with a single view, particles are spread uniformly along the projection ray.
the weight of the particle is updated by projecting each particle into the current frame and computing the projection error compared to the actual feature position.
Particle filters are probably the only technique for doing 3D reconstruction and tracking of dynamic objects that can work in real time so far
Prior knowledge of the object trajectory is not needed.
Nonrigid or articulated reconstruction: not handled
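A minimal bearing-only particle filter for one tracked point, following the steps above. Particles are spread uniformly along the first projection ray between a minimum and maximum depth. They are weighted by a Gaussian likelihood of the projection error in each new frame and then resampled. Assumptions: known camera centers with identity rotation, a normalized pinhole camera, a random-walk motion model, and the noise levels shown. For a simple check the point is static.

```python
import numpy as np

def project(points, center):
    """Normalized pinhole projection for a camera at `center` with identity rotation."""
    p = points - center
    return p[..., :2] / p[..., 2:3]

def particle_filter(obs, centers, n=500, depth_range=(1.0, 10.0),
                    sigma=0.01, jitter=0.02, seed=0):
    """Bearing-only particle filter for one tracked point."""
    rng = np.random.default_rng(seed)
    ray = np.append(obs[0], 1.0)                          # bearing in the first view
    depths = rng.uniform(depth_range[0], depth_range[1], size=n)
    particles = centers[0] + depths[:, None] * ray        # uniform along the ray
    for z, c in zip(obs[1:], centers[1:]):
        # Random-walk motion model (lets the estimate follow a moving object).
        particles = particles + rng.normal(scale=jitter, size=particles.shape)
        # Weight by the projection error in the current frame.
        err2 = np.sum((project(particles, c) - z) ** 2, axis=1)
        logw = -err2 / (2.0 * sigma ** 2)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Systematic resampling.
        cum = np.cumsum(w)
        cum[-1] = 1.0
        idx = np.searchsorted(cum, (rng.random() + np.arange(n)) / n)
        particles = particles[idx]
    return particles.mean(axis=0)

X_true = np.array([0.5, 0.2, 5.0])                            # static point, for simplicity
centers = np.array([[0.2 * k, 0.0, 0.0] for k in range(10)])  # known camera centres
obs = np.array([project(X_true, c) for c in centers])
estimate = particle_filter(obs, centers)
```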
Factorization can perform motion segmentation and reconstruction simultaneously.
For short sequences of static scenes, the measurement matrix (a matrix containing all tracked feature points across all frames) has rank at most four (or three under the orthographic projection model in Euclidean coordinates).
W = MS, where W: feature measurements, M: motion, S: shape (3D structure)
(SVD decomposition, O(fp²) complexity)
(If the scene contains n motions: without noise, each W_i, where i = {1, 2, ..., n}, lies in a subspace of rank at most four [25], so each W_i can be factorized into a motion matrix and a shape matrix.)
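A sketch of the factorization step and of one way multibody factorization groups trajectories. A rank-4 truncated SVD gives W ≈ MS for a single rigid motion; M and S are only defined up to an invertible 4x4 transform, and no metric upgrade is done here. For several independent motions, the shape interaction matrix Q = V_r V_r^T (from the top right singular vectors, as in Costeira and Kanade) has zero entries between trajectories of different objects. The data are synthetic affine-camera measurements, and the helper names are hypothetical.

```python
import numpy as np

def factorize(W, rank=4):
    """Rank-constrained factorization W ~= M @ S by truncated SVD (affine camera).
    M and S are only defined up to an invertible 4x4 transform."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:rank])
    return U[:, :rank] * root, root[:, None] * Vt[:rank]

def shape_interaction(W, rank):
    """Shape interaction matrix Q = V_r V_r^T from the top-r right singular vectors."""
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    V = Vt[:rank].T
    return V @ V.T

def rigid_measurements(n_frames, n_points, rng):
    """Hypothetical affine-camera data W = M S for one rigid object."""
    M = rng.normal(size=(2 * n_frames, 4))
    S = np.vstack([rng.normal(size=(3, n_points)), np.ones((1, n_points))])
    return M @ S

rng = np.random.default_rng(0)
W1, W2 = rigid_measurements(10, 30, rng), rigid_measurements(10, 25, rng)
M, S = factorize(W1)                                  # single rigid motion: rank 4
Q = shape_interaction(np.hstack([W1, W2]), rank=8)    # two independent motions: 4 + 4
```

The block structure of Q only holds for independent motions without noise; with noise, outliers, or dependent motions it degrades, matching the limitations noted below.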
Prior knowledge of camera motion is not needed
Assumes orthography or the affine camera model (perspective effects: not handled)
Real time: no
Requires prior knowledge, such as the number of moving objects in the scene, the rank of the measurement matrix, or the dimension of the object
Sensitive to noise and outliers
Missing data
Same as above (factorization of W)
Clusters the structure by maximizing the sum of squared entries of a block-diagonal matrix, subject to the constraint that each block represents a physical object
Projective cameras: trickier