Reading notes: 《Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion》

Please credit the source when reposting.

Authors: Peng Sun, Wenhu Zhang, Huanyu Wang, Songyuan Li, Xi Li

Paper: [2103.11832] Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (arxiv.org)

CVPR 2021

The authors propose a deep RGB-D saliency detection network with depth-sensitive attention and automatic multi-modal fusion. Its two main contributions are a depth-sensitive RGB feature modeling scheme that exploits the depth-wise geometric prior of salient objects, and an automatic architecture search approach for multi-modal feature fusion.

The overall network architecture is designed as follows:

[Figure 1: overall network architecture]

The RGB branch is based on VGG-19, and the depth branch is a lightweight depth network.

The network design follows three principles:

1) Features from different modalities at the same scale are always fused, while features at different scales are selectively fused.

2) Low-level features are always combined with high-level features before the final prediction, as low-level features are rich in spatial details but lack semantic information, and vice versa.

3) An attention mechanism is necessary when fusing features from different modalities.

Depth-Sensitive Attention

The paper proposes a depth-sensitive RGB feature modeling scheme, consisting of depth decomposition and a depth-sensitive attention module (DSAM).

For depth decomposition, the raw depth map is decomposed into T+1 regions as follows. First, the raw depth map is quantized into a depth histogram, and the T largest depth distribution modes of the histogram (corresponding to T depth interval windows) are chosen. Then, using these depth interval windows, the raw depth map is decomposed into T regions, and the remaining part of the histogram naturally forms the last region, as shown in Fig. 3(a). Finally, each region is normalized into [0, 1] to serve as a spatial attention mask for the subsequent process. (In short: the depth map is split into T+1 regions according to its histogram, and each region is normalized into a spatial attention mask.)
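The decomposition above can be sketched in NumPy. This is a simplified illustration under one assumption: the T "depth distribution modes" are approximated by the T most populated histogram bins, whereas the paper's mode/window selection may be more elaborate.

```python
import numpy as np

def depth_decomposition(depth, T=3, bins=64):
    """Decompose a raw depth map into T+1 spatial attention masks.

    Assumption (sketch only): each of the T depth modes is taken to be
    one of the T most populated histogram bins.
    """
    hist, edges = np.histogram(depth, bins=bins)
    top = np.argsort(hist)[-T:]                 # T largest depth modes
    masks, covered = [], np.zeros(depth.shape, dtype=bool)
    for b in top:
        region = (depth >= edges[b]) & (depth <= edges[b + 1])
        region &= ~covered                      # avoid double-assigning edge pixels
        covered |= region
        masks.append(_normalize(depth, region))
    # The remaining part of the histogram forms the last region.
    masks.append(_normalize(depth, ~covered))
    return masks                                # T + 1 spatial attention masks

def _normalize(depth, region):
    """Normalize depth values inside the region into [0, 1]; zero elsewhere."""
    if not region.any():
        return np.zeros(depth.shape, dtype=np.float32)
    v = depth[region]
    norm = (depth - v.min()) / (v.max() - v.min() + 1e-8)
    return np.where(region, norm, 0.0).astype(np.float32)
```

Each returned mask is zero outside its depth region and in [0, 1] inside it, so it can be applied directly as spatial attention.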

The depth-sensitive attention module is shown in Fig. 3(b).

[Figure 3: (a) depth decomposition, (b) the depth-sensitive attention module]

Here, Pooling is a max-pooling operation that aligns the masks to the size of the RGB feature F_k^rgb.

In this way, DSAM not only provides depth-wise geometric priors for the RGB features, but also suppresses troublesome background distraction (e.g., cluttered objects or similar textures).
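A minimal sketch of DSAM in NumPy: each depth-region mask is max-pooled to the feature resolution and applied as spatial attention. One assumption is labeled in the code: the attended features from all regions are summed and added back to the input as a residual, which may differ from the paper's exact fusion.

```python
import numpy as np

def max_pool_mask(mask, out_h, out_w):
    """Max-pool a full-resolution mask down to (out_h, out_w)."""
    h, w = mask.shape
    sh, sw = h // out_h, w // out_w
    return mask[:sh * out_h, :sw * out_w].reshape(out_h, sh, out_w, sw).max(axis=(1, 3))

def dsam(f_rgb, masks):
    """Depth-sensitive attention module (sketch).

    f_rgb: (C, H, W) RGB feature map at stage k.
    masks: list of full-resolution depth-region attention masks.
    Assumption: per-region attended features are summed and combined
    with the input via a residual connection.
    """
    c, h, w = f_rgb.shape
    out = f_rgb.copy()
    for m in masks:
        pooled = max_pool_mask(m, h, w)      # align mask to feature size
        out += pooled[None, :, :] * f_rgb    # spatial attention per depth region
    return out
```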

 

Auto Multi-Modal Multi-Scale Feature Fusion

The paper designs four types of cells: multi-modal fusion (MM), multi-scale fusion (MS), global context aggregation (GA), and spatial information restoration (SR) cells. First, MM cells directly perform multi-modal feature fusion between the RGB and depth branches. Second, MS cells perform dense multi-scale feature fusion. Third, a GA cell seamlessly aggregates the outputs of the MS cells to capture global context. Finally, SR cells combine low-level and high-level features to remedy the spatial detail loss caused by downsampling.

[Figure: the four cell types and the multi-modal multi-scale fusion architecture]
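To illustrate the "automatic" aspect, here is a toy sketch of a searchable fusion cell in the style of differentiable architecture search: the cell mixes candidate fusion operations with softmax-normalized architecture weights, and after the search only the highest-weight operation would be kept. The candidate operation set here is hypothetical, not the paper's actual search space.

```python
import numpy as np

# Hypothetical candidate fusion operations for a searchable cell;
# the paper defines its own operation space.
OPS = {
    "sum": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "max": lambda a, b: np.maximum(a, b),
}

def mm_cell(f_rgb, f_depth, arch_weights):
    """Multi-modal fusion cell (sketch): softmax-weighted mixture of
    candidate ops over the two modality features."""
    w = np.exp(arch_weights - arch_weights.max())
    w /= w.sum()
    return sum(wi * op(f_rgb, f_depth) for wi, op in zip(w, OPS.values()))
```

During search, the architecture weights are learned jointly with the network; at the end, `argmax` over them selects the discrete operation for deployment.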

Experiments

Comparison with State-of-the-art

[Images: comparison with state-of-the-art methods]

Ablation Analysis

[Images: ablation study results]
