Updated: 2018.8
The key to loop closure detection is re-recognizing the same place; this research area is called place recognition.
This article mainly summarizes place recognition under large viewpoint changes and condition (scene appearance) changes.
A place recognition module has three parts: image processing; representation of the known map; and a belief/decision module (deciding whether a given image belongs to that map/environment).
scene recognition: identifying the category of a scene (to a large extent, essentially a classification problem); the same scene category can occur at different places.
image retrieval (image recognition): also known as content-based image retrieval (CBIR).
place recognition: a place denotes a region, e.g., a kitchen; the task is to recognize whether a place has been visited before (essentially an image retrieval problem).
object recognition: an object denotes a thing with clear boundaries, e.g., a desk or a monitor.
topological information: the nodes and the relations between nodes; only whether two nodes are connected matters, not their physical positions.
metric information: distance, direction, or both on map edges; the physical positions of nodes and their relative spatial relations matter.
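As a minimal illustration of the two map flavors (all names and numbers below are hypothetical, not from any cited system), a topological map can be a plain adjacency structure, while a metric map annotates edges with distances and directions:

```python
# Minimal sketch contrasting topological vs. metric map storage.
# All names and values here are illustrative, not from any specific system.

# Topological map: only node connectivity matters.
topological_map = {
    "kitchen": {"hallway"},
    "hallway": {"kitchen", "office"},
    "office":  {"hallway"},
}

# Metric map: edges additionally carry distance (m) and heading (rad).
metric_map = {
    ("kitchen", "hallway"): {"distance": 3.2, "direction": 1.57},
    ("hallway", "office"):  {"distance": 5.0, "direction": 0.0},
}

def connected(a, b):
    """Topological query: is there a direct edge between a and b?"""
    return b in topological_map.get(a, set())

print(connected("kitchen", "office"))      # False: no direct edge
print(metric_map[("kitchen", "hallway")])  # metric edge attributes
```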
Main challenges for loop closure recognition:
- viewpoint change (little overlap between views)
- condition change (different appearance)
- efficiency requirements
Current methods that target large viewpoint changes and condition changes fall into three categories:
- using better learned features;
- combining high-level and local-level features (learned + hand-crafted; semantic + intermediate network-layer outputs);
- adding 3D information to synthesize/render an image under the estimated viewpoint.
Place description falls into two categories (see the sketch after this list):
- selectively describing parts of the image content (BoW);
- treating the whole image as one vector (HOG, Gist).
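A minimal sketch of the two styles, assuming OpenCV and scikit-learn are available; the vocabulary is trained on the query image itself purely for illustration (a real BoW system trains it offline on a corpus), and the file path is hypothetical:

```python
# Sketch of the two place-description styles.
import cv2
import numpy as np
from sklearn.cluster import KMeans

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path

# (1) Selective description: local keypoints -> BoW histogram.
orb = cv2.ORB_create(nfeatures=500)
_, desc = orb.detectAndCompute(img, None)            # desc: N x 32 (uint8)
vocab = KMeans(n_clusters=32, n_init=10).fit(desc.astype(np.float32))
words = vocab.predict(desc.astype(np.float32))
bow = np.bincount(words, minlength=32).astype(np.float32)
bow /= np.linalg.norm(bow)                           # L2-normalized histogram

# (2) Whole-image description: one global vector (a crude Gist stand-in).
small = cv2.resize(img, (32, 32)).astype(np.float32)
global_vec = small.flatten() / np.linalg.norm(small)

print(bow.shape, global_vec.shape)                   # (32,) (1024,)
```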
Other libraries: FAB-MAP (SIFT, SURF; vocabulary larger than 10,000), SeqSLAM.
Among these:
Robust features include: U-SURF, edge-based features, and CNN features.
Local feature methods:
- Bag-of-Words (BoW)
- Fisher Vectors
- VLAD (Vector of Locally Aggregated Descriptors; computationally cheaper)
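A minimal numpy sketch of VLAD aggregation on toy data; normalization choices vary across papers, so treat this as one plausible variant:

```python
# VLAD: sum the residuals of local descriptors to their nearest visual word.
import numpy as np

def vlad(descriptors, centroids):
    """descriptors: (N, D); centroids: (K, D) -> VLAD vector of size K*D."""
    K, D = centroids.shape
    # Assign each descriptor to its nearest visual word.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    v = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            v[k] = (members - centroids[k]).sum(axis=0)  # residual sum
    v = np.sign(v) * np.sqrt(np.abs(v))                  # power normalization
    return (v / (np.linalg.norm(v) + 1e-12)).ravel()     # L2 normalization

# Toy usage: 100 random SIFT-like descriptors, 8-word vocabulary.
rng = np.random.default_rng(0)
print(vlad(rng.normal(size=(100, 128)), rng.normal(size=(8, 128))).shape)
```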
Global feature methods:
- GIST (2006)
- CNN-based: VGG-16, NetVLAD, and PoseNet
BoW methods (e.g., BoW, FAB-MAP) are relatively insensitive to viewpoint change (pose invariant).
Global features (semantic or whole-image features) are more sensitive to viewpoint (i.e., the observer/camera pose) than local features, while local features are more sensitive to illumination.
But in general:
local features are more robust to translation and rotation;
global features are more robust to illumination change, image blur, and scale change.
Hence there are methods that combine global and local features.
There are three kinds of map representation: pure image retrieval, topological maps, and topological-metric maps.
Retrieval is roughly done in two ways: pure image retrieval, or retrieval based on geometric position (fetching keyframe images close to the current localization estimate).
Usually one must ensure both spatial geometric consistency (the images truly match) and temporal consistency (several consecutive frames are also consistent); a sketch of these checks follows.
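A hedged numpy sketch of retrieval plus a temporal-consistency check; the geometric verification step (typically RANSAC over feature matches) is omitted, and all names and thresholds are illustrative:

```python
# Retrieval with a temporal-consistency check over consecutive queries.
import numpy as np

def best_match(query_vec, db):
    """Cosine-similarity retrieval over an (M, D) descriptor database."""
    sims = db @ query_vec / (
        np.linalg.norm(db, axis=1) * np.linalg.norm(query_vec) + 1e-12)
    return int(sims.argmax()), float(sims.max())

def temporally_consistent(match_ids, max_jump=2):
    """Consecutive queries should hit roughly consecutive database frames."""
    diffs = np.abs(np.diff(match_ids))
    return bool((diffs <= max_jump).all())

# Toy usage: 3 consecutive query frames against a 100-frame database.
rng = np.random.default_rng(1)
db = rng.normal(size=(100, 64))
queries = db[[40, 41, 42]] + 0.05 * rng.normal(size=(3, 64))  # noisy revisit
ids = [best_match(q, db)[0] for q in queries]
print(ids, temporally_consistent(ids))   # e.g. [40, 41, 42] True
```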
Machine learning approaches for recognition under condition change: take all possible appearance variations into account (morning vs. afternoon, illumination, viewpoint), e.g., using a neural network to model the whole-image mapping between the appearances before and after the change.
Mid-level CNN features are more robust to appearance changes, while high-level features are more robust to viewpoint changes.
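To make the layer distinction concrete, here is a sketch that taps a mid-level (conv3) and a high-level (conv5) activation from torchvision's VGG-16; the exact layer indices are an assumption of this sketch, not a prescription from the papers above:

```python
# Extract a mid-level and a high-level feature map from VGG-16
# (assumes a recent torchvision with the weights API).
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
img = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed image

with torch.no_grad():
    mid = vgg.features[:16](img)    # conv3_3 output: mid-level, reported
                                    # more robust to appearance change
    high = vgg.features(img)        # conv5/pool5 output: high-level, reported
                                    # more robust to viewpoint change
print(mid.shape, high.shape)        # (1, 256, 56, 56) (1, 512, 7, 7)
```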
These CNN features yield a high matching quality but are rather high-dimensional, i.e., comparisons are computationally expensive.
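A common remedy, sketched below with scikit-learn (dimensions and data are made up), is to PCA-compress, and optionally whiten, the descriptors before matching:

```python
# PCA compression of high-dimensional CNN descriptors before matching.
import numpy as np
from sklearn.decomposition import PCA

feats = np.random.randn(1000, 4096).astype(np.float32)  # e.g. fc-layer features

pca = PCA(n_components=128, whiten=True).fit(feats)
compressed = pca.transform(feats)                        # 4096-D -> 128-D
compressed /= np.linalg.norm(compressed, axis=1, keepdims=True)

# Comparisons are now ~32x cheaper; cosine similarity is a dot product.
sims = compressed @ compressed[0]
print(sims.shape, sims[0])                               # (1000,) 1.0 (self)
```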
Traditional image description methodologies exploit techniques from the image retrieval field. Recently, rapid advances in related fields such as object detection and image classification have inspired a new technique for improving visual place recognition systems: convolutional neural networks (CNNs).
Runtime is typically around 1 s, e.g., the ETH semantic visual localization paper; DenseVLAD takes 1.4 s and Camera Pose Voting 3.7 s, while PoseNet is very fast at 0.005 s. These numbers refer to retrieval time.
C++:
- DisLoc (BoW paradigm with Hamming embedding; improved in 2016)
- DBoW2
- FAB-MAP, OpenFABMAP
- DenseVLAD (24/7 place recognition by view synthesis; based on RootSIFT)
- ACG-Localizer
MATLAB:
- 2018 LoST: Visual Place Recognition using Visual Semantics for Opposite Viewpoints across Day and Night
- SeqSLAM, OpenSeqSLAM (C++), Fast SeqSLAM: appearance-robust methods like SeqSLAM are invariant to challenging environmental conditions, but at the cost of viewpoint dependence and velocity sensitivity (see the sequence-matching sketch after this list)
- bcnn: Bilinear CNN Models for Fine-grained Visual Recognition, 8 frames/sec
- Lightweight, Viewpoint-Invariant Visual Place Recognition in Changing Environments (BoW + VLAD)
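A minimal numpy sketch of SeqSLAM-style sequence matching: sum the image-difference matrix along a constant-velocity diagonal ending at a candidate match (the matrix and indices here are toy values):

```python
# SeqSLAM-flavored scoring: a revisit shows up as a low-cost diagonal
# in the query-vs-database image-difference matrix D.
import numpy as np

def sequence_score(D, q_end, d_end, length=5, v=1.0):
    """D[i, j]: difference between query frame i and database frame j.
    Sums D along the trajectory ending at (q_end, d_end) with slope v."""
    score = 0.0
    for k in range(length):
        i = q_end - k
        j = int(round(d_end - v * k))
        if i < 0 or j < 0:
            return np.inf
        score += D[i, j]
    return score / length

# Toy usage: plant a matching 5-frame sequence, then score two candidates.
rng = np.random.default_rng(2)
D = rng.uniform(0.5, 1.0, size=(50, 200))
for k in range(5):
    D[45 - k, 120 - k] = 0.05
print(sequence_score(D, 45, 120), sequence_score(D, 45, 60))  # low vs. high
```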
ConvNet methods:
- 1. 2018 calc: Convolutional Autoencoder for Loop Closure (HOG features; Caffe; 1.47 ms)
- 2. 2017 place-recognition: Deep Learning Features at Scale for Visual Place Recognition (PyTorch; only a 10% increase)
- 3. 2017 vpr_relocalization; 30-80 ms
- NetVLAD (MATLAB), NetVLAD (TensorFlow). NetVLAD is faster than DenseVLAD (0.8 h vs. 2.5 h). A faster follow-up: Appearance-invariant place recognition by discriminatively training a convolutional neural network. (A sketch of the NetVLAD aggregation layer follows this list.)
- places365
- retrieval-2016-icmr
- 2016 deep-retrieval (a Caffe implementation that extracts illumination- and viewpoint-invariant features)
- 2016 tinghuiz: A deep learning framework for synthesizing novel views of objects and scenes (Caffe)
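A hedged PyTorch sketch of the NetVLAD aggregation layer. It is simplified: the published layer initializes the soft-assignment weights and centroids from a clustered vocabulary, which is skipped here.

```python
# NetVLAD layer: soft-assign local CNN features to K centroids and
# aggregate the residuals into one global descriptor.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    def __init__(self, num_clusters=64, dim=512):
        super().__init__()
        self.conv = nn.Conv2d(dim, num_clusters, kernel_size=1)  # soft assignment
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x):                                  # x: (B, D, H, W)
        soft = F.softmax(self.conv(x).flatten(2), dim=1)   # (B, K, H*W)
        feats = x.flatten(2)                               # (B, D, H*W)
        # Residuals of every local feature to every centroid, soft-weighted.
        resid = feats.unsqueeze(1) - self.centroids[None, :, :, None]
        vlad = (soft.unsqueeze(2) * resid).sum(-1)         # (B, K, D)
        vlad = F.normalize(vlad, dim=2)                    # intra-normalization
        return F.normalize(vlad.flatten(1), dim=1)         # (B, K*D)

# Toy usage on random conv5-like features.
print(NetVLAD()(torch.randn(2, 512, 7, 7)).shape)          # torch.Size([2, 32768])
```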
Torsten Sattler: author of Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization, InLoc, Image-based localization using LSTMs for structured feature correlation, and evaluations of traditional vs. deep-learned features.
place challenge
Traditional approaches either focus on the use of 2D-3D matches, known as structure-based pose estimation, or solely on 2D-2D matches (structure-less pose estimation).
The F1 score (F1-measure) is the harmonic mean of precision and recall, i.e.

$$\frac{2}{F_1} = \frac{1}{P} + \frac{1}{R}$$

Max-F1 is the maximum F1 value over all tested operating points.
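A tiny Python helper matching these definitions (the precision/recall values are toy numbers):

```python
# F1 as the harmonic mean of precision and recall, and Max-F1 over a PR curve.
import numpy as np

def f1(p, r):
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

def max_f1(precisions, recalls):
    """Max-F1: the best F1 over all operating points of a PR curve."""
    return max(f1(p, r) for p, r in zip(precisions, recalls))

print(f1(0.8, 0.6))                              # 0.6857...
print(max_f1([1.0, 0.9, 0.5], [0.2, 0.6, 0.9]))  # 0.72
```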