Stereo Matching with CNNs (Part 1): [LW-CNN] Look Wider to Match Image Patches with Convolutional Neural Networks

  • Abstract
  • Introduction
  • Related Work
  • Method
    • A. Per-pixel Pyramid Pooling (4P)
    • B. Proposed Model
  • Experiments
  • Open Questions
  • References

Abstract

IEEE SPL 2016; at the time of writing, ranked second on the Middlebury benchmark.

  • Proposes a new CNN architecture that learns a matching cost from larger windows. Unlike conventional pooling layers (with strides), the proposed per-pixel pyramid pooling layer can cover a larger area without a loss of resolution and detail, so the learned cost function can exploit information from a wider region while avoiding the fattening effect.
  • Novelty: similar in spirit to SPP, it adds multi-scale pooling and fuses the resulting information, yielding feature maps that do not lose fine detail.
  • Improvement over MC-CNN: with the 4P module, a much larger window is taken into account in weakly-textured regions.

1 Introduction

To address the unreliability of window-based matching near disparity discontinuities, one approach is to make the window-based matcher adaptive to its input patterns [10], [11], [12]: the shape of the matching template is made adaptive so that it can discard information from pixels that are irrelevant to the target pixel.
However, knowing which pixels belong to the background before the actual matching is difficult.
Existing methods are based on AlexNet or VGG, which were designed for recognition rather than matching. The difficulty with such CNNs lies in enlarging the patch size.
The effective patch size is directly tied to the spatial extent of the receptive field, which can be enlarged by:
1) inserting a few strided pooling/convolution layers;
2) using larger convolution kernels in each layer;
3) adding more layers.
However, strided pooling or convolution layers downsample the output and lose detail. Although the resolution can be recovered by applying fractionally-strided convolution [17], reconstructing small or thin structures is still difficult once they have been lost to downsampling. (A small worked example of the receptive-field arithmetic behind these options follows below.)
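To make that arithmetic concrete, here is a minimal, self-contained sketch; it uses the standard receptive-field recurrence rather than anything from the paper, and the layer configurations are only illustrative:

```python
def receptive_field(layers):
    """Receptive field of a chain of conv/pool layers given as (kernel, stride).

    Standard recurrence: rf += (k - 1) * jump, jump *= stride.
    """
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Five 3x3 convolutions at stride 1: an 11x11 field, matching the window
# size quoted for MC-CNN [13], but the field only grows linearly with depth.
print(receptive_field([(3, 1)] * 5))                               # -> 11

# Interleaving two stride-2 poolings widens the field faster (18 here),
# at the price of a downsampled, detail-poor output map.
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]))   # -> 18
```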

2 Related Work

On learning the matching cost: [13], [14], [22].
[13] MC-CNN: an 11×11 window without any pooling; the resulting cost is fairly noisy, so it is followed by cross-based cost aggregation and SGM.
[14] (learning to compare image patches) uses multiple pooling layers and spatial pyramid pooling (SPP) [24] to process larger patches.
However, the result suffers from the fattening effect, caused by the information lost during pooling.
The contribution of this paper: a new pooling scheme that operates over a larger receptive field without losing detail.
Similar attempts have appeared in semantic segmentation [25], [26], [27]: these methods combine high-level and low-level information so that object-level information can be refined to the pixel level.
They work well on large objects but fail on small ones.
FlowNet [28] upsamples the low-resolution flow back to the original size.
The closest work to this paper is [24] (He et al.'s SPP).
Here, the pooling layers between the convolution layers are removed; instead, the output of the stacked convolution layers is pooled, so that high-level and mid-level information is used to compute highly non-linear feature maps.
Although [14] also uses SPP, it still keeps pooling layers sandwiched between convolution layers and therefore still loses information.

3 Method

Input: two image patches
Output: a matching cost

A. Per-pixel Pyramid Pooling (4P)

Pooling layers, as is well known, shrink the feature maps exponentially; the drawback is that the larger receptive field comes at the cost of lost detail.

The idea is to replace a small strided pooling window with a large pooling window, achieving the same receptive field size.
Pooling is performed with several different window sizes, and the outputs are concatenated into new feature maps.
Note that this multi-scale pooling operation is applied at every pixel, with stride = 1 (a code sketch follows the figure).

[Figure 1]
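A minimal sketch of the 4P operation described above, assuming PyTorch, max pooling, and zero padding chosen to preserve the map size; the module name and these choices are mine, not necessarily the paper's exact implementation:

```python
import torch
import torch.nn as nn

class PerPixelPyramidPooling(nn.Module):
    """Per-pixel pyramid pooling (4P): pool with several window sizes at
    stride 1 and concatenate the results along the channel dimension."""

    def __init__(self, sizes=(27, 9, 3, 1)):
        super().__init__()
        # stride=1 pools at every pixel; padding=(s-1)//2 keeps the map size
        # (the window sizes are odd, so symmetric padding works out exactly)
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=s, stride=1, padding=(s - 1) // 2)
             for s in sizes]
        )

    def forward(self, x):
        # every pooled map keeps the input H x W; channels grow by len(sizes)
        return torch.cat([pool(x) for pool in self.pools], dim=1)

# usage: a 64-channel map becomes 64 * 4 = 256 channels, same resolution
feat = torch.randn(1, 64, 37, 37)
print(PerPixelPyramidPooling()(feat).shape)  # torch.Size([1, 256, 37, 37])
```

Because the stride is 1 and the padding preserves the spatial size, nothing is lost to downsampling; the cost of the wider view is only a larger channel count.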

B. Proposed Model

Pooling window sizes: s = [27, 9, 3, 1].
The baseline chosen for comparison is MC-CNN (a code sketch follows the figure).
[Figure 2]
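To show where the window sizes s = [27, 9, 3, 1] enter, here is a hedged sketch of one feature branch with the 4P layer on top, reusing PerPixelPyramidPooling from above; the layer count and channel width are placeholders, not the paper's exact architecture:

```python
class MatchingBranch(nn.Module):
    """Small convolutional tower followed by the 4P layer (illustrative only)."""

    def __init__(self, channels=64, pool_sizes=(27, 9, 3, 1)):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.pool4p = PerPixelPyramidPooling(pool_sizes)

    def forward(self, patch):
        # (N, 1, H, W) -> (N, channels * len(pool_sizes), H, W)
        return self.pool4p(self.convs(patch))

# both patches of a pair go through the (shared) branch
left = torch.randn(1, 1, 37, 37)
right = torch.randn(1, 1, 37, 37)
branch = MatchingBranch()
print(branch(left).shape, branch(right).shape)  # (1, 256, 37, 37) each
```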

4 Experiments

The framework follows MC-CNN, with the following differences:
1) patch size: 37×37;
2) only the last three 1×1 convolution layers are fine-tuned, which works better than training them from random initialization (a sketch of this setup follows the list);
3) learning rate: 0.003 → 0.0003;
4) post-processing is identical to MC-CNN.
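A hedged sketch of points 2) and 3), continuing from the modules above: the feature branch is kept frozen and only a hypothetical decision head of three 1×1 convolutions is optimized, with the learning rate dropped from 0.003 to 0.0003; the head's channel widths and the decay milestone are assumptions:

```python
# Hypothetical decision head: three 1x1 convolutions on the concatenated
# features of the two branches (2 x 256 = 512 input channels assumed).
decision = nn.Sequential(
    nn.Conv2d(512, 256, 1), nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, 1), nn.ReLU(inplace=True),
    nn.Conv2d(256, 1, 1),
)

branch = MatchingBranch()
for p in branch.parameters():          # freeze the feature extractor
    p.requires_grad = False

# Fine-tune only the 1x1 layers; decay the learning rate 0.003 -> 0.0003.
optimizer = torch.optim.SGD(decision.parameters(), lr=0.003, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10], gamma=0.1)
# call scheduler.step() once per epoch during fine-tuning
```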

5 Open Questions

How the method maintains accuracy at disparity discontinuities while using a large window. (To be sorted out.)

References

[10] K. Wang, “Adaptive stereo matching algorithm based on edge detection,” in ICIP, vol. 2. IEEE, 2004, pp. 1345–1348.
[11] K.-J. Yoon and I. S. Kweon, “Adaptive support-weight approach for correspondence search,” PAMI, vol. 28, no. 4, pp. 650–656, 2006.
[12] F. Tombari, S. Mattoccia, L. D. Stefano, and E. Addimanda, “Classification and evaluation of cost aggregation methods for stereo correspondence,” in CVPR. IEEE, 2008, pp. 1–8.
[13] J. Žbontar and Y. LeCun, “Stereo matching by training a convolutional neural network to compare image patches,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2287–2318, 2016.
[14] S. Zagoruyko and N. Komodakis, “Learning to compare image patches via convolutional neural networks,” in CVPR, June 2015, pp. 4353–4361.
[17] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
[18] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Object detectors emerge in deep scene cnns,” arXiv preprint arXiv:1412.6856, 2014.
[22] L. Ladický, C. Häne, and M. Pollefeys, “Learning the matching function,” arXiv preprint arXiv:1502.00652, 2015.
[24] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in ECCV. Springer, 2014, pp. 346–361.
[25] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015, pp. 3431–3440.
[26] B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik, “Hypercolumns for object segmentation and fine-grained localization,” in CVPR, June 2015, pp. 447–456.
[27] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” arXiv preprint arXiv:1505.04366, 2015.
[28] A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazirbas, V. Golkov, P. van der Smagt, D. Cremers, and T. Brox, “FlowNet: Learning optical flow with convolutional networks,” in ICCV. IEEE, 2015.
