[Paper] Fully Convolutional Networks for Semantic Segmentation
[Year] CVPR 2015
[Authors] Jonathan Long, Evan Shelhamer, Trevor Darrell
[Pages]
https://github.com/shelhamer/fcn.berkeleyvision.org (official)
https://github.com/MarvinTeichmann/tensorflow-fcn (tensorflow)
https://github.com/wkentaro/pytorch-fcn (pytorch)
[Description]
1) Arguably the first (?) end-to-end CNN for semantic segmentation. The paper notes that an FCN is equivalent to extracting patches and classifying them pixel by pixel, but adjacent patches in an FCN share computation, which greatly improves efficiency.
2) Fully connected layers are viewed as convolutions.
3) Feature maps are upsampled back to the input resolution by deconvolution, initialized as bilinear interpolation (see the sketch after this list).
4) Skip connections are used to refine the coarse segmentation maps.
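A minimal PyTorch sketch (my own, not the authors' Caffe code) of points 2 and 3: a 1x1 convolution standing in for the "fully connected as convolution" classifier, and a transposed convolution whose weights are initialized to bilinear interpolation. Channel sizes and strides are illustrative.
```python
import torch
import torch.nn as nn

def bilinear_kernel(channels, kernel_size):
    """Build a (channels, channels, k, k) weight tensor performing
    per-channel bilinear upsampling."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size).float()
    filt = 1 - torch.abs(og - center) / factor
    kernel2d = filt[:, None] * filt[None, :]
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = kernel2d
    return weight

num_classes = 21  # e.g. PASCAL VOC

# "Fully connected as convolution": a 1x1 conv acts as the per-pixel classifier.
score = nn.Conv2d(512, num_classes, kernel_size=1)

# Upsampling by deconvolution, initialized as bilinear interpolation
# (FCN-32s uses stride 32; stride 2 keeps this example small).
upsample = nn.ConvTranspose2d(num_classes, num_classes,
                              kernel_size=4, stride=2, padding=1, bias=False)
with torch.no_grad():
    upsample.weight.copy_(bilinear_kernel(num_classes, 4))

x = torch.randn(1, 512, 16, 16)   # a coarse feature map
out = upsample(score(x))          # (1, 21, 32, 32)
print(out.shape)
```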
[Paper] U-Net: Convolutional Networks for Biomedical Image Segmentation
[Year] MICCAI 2015
[Authors] Olaf Ronneberger, Philipp Fischer, Thomas Brox
[Pages]
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
https://github.com/orobix/retina-unet
[Description]
1) Encoder-decoder architecture; the encoder follows the FCN design, and at each decoder stage the corresponding encoder feature map is concatenated with the up-conv output (see the sketch after this list).
2) Designed for medical image segmentation with small datasets, so heavy data augmentation is applied and the architecture is kept relatively simple.
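A minimal sketch of one decoder step from point 1, not the paper's exact configuration (the original U-Net uses unpadded convolutions and crops the encoder features; padded convolutions are used here so the sizes line up). Channel sizes are illustrative.
```python
import torch
import torch.nn as nn

# One U-Net-style decoder step: up-conv, concatenate the skip connection, convolve.
up = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)   # "up-conv"
conv = nn.Sequential(
    nn.Conv2d(128 + 128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)

enc_feat = torch.randn(1, 128, 64, 64)   # feature map saved from the encoder
dec_feat = torch.randn(1, 256, 32, 32)   # feature map coming up the decoder

x = up(dec_feat)                         # (1, 128, 64, 64)
x = torch.cat([enc_feat, x], dim=1)      # skip connection by concatenation
x = conv(x)                              # (1, 128, 64, 64)
```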
[Paper] Feedforward semantic segmentation with zoom-out features
[Year] CVPR 2015
[Authors] Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich
[Pages] https://bitbucket.org/m_mostajabi/zoom-out-release
[Description]
1) Uses superpixels as the smallest unit and progressively zooms out to capture information at larger scales; the zoom-out features come from different layers of a CNN.
2) Features are average-pooled within each superpixel, and the features from different levels are concatenated into the superpixel's final feature vector. The loss is weighted by the inverse frequency of each class in the training set.
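A minimal sketch of the inverse-class-frequency loss weighting in point 2, assuming the per-superpixel feature vectors have already been pooled into classifier logits; the class counts are illustrative.
```python
import torch
import torch.nn as nn

# Illustrative class frequencies counted over the training set.
class_counts = torch.tensor([5000., 300., 120., 80.])
weights = 1.0 / class_counts                        # inverse-frequency weighting
weights = weights / weights.sum() * len(weights)    # optional renormalization

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(16, 4)          # 16 superpixel feature vectors -> 4 class scores
labels = torch.randint(0, 4, (16,))
loss = criterion(logits, labels)
```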
[Paper] Multi-Scale Context Aggregation By Dilated Convolutions
[Year] ICLR 2016
[Authors] Fisher Yu , Vladlen Koltun
[Pages] https://github.com/fyu/dilation
[Description]
1) Systematic use of dilated convolution; the implementation has been merged into Caffe.
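A minimal PyTorch sketch of dilated convolution (the paper's context module stacks such layers with roughly exponentially growing dilation); channel counts and rates here are illustrative, not the paper's exact configuration.
```python
import torch
import torch.nn as nn

# A 3x3 convolution with dilation 2 covers a 5x5 receptive field with only
# 9 weights; stacking rates 1, 2, 4, ... grows the receptive field
# exponentially without any pooling, so resolution is preserved.
ctx = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, dilation=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=4, dilation=4), nn.ReLU(inplace=True),
)

x = torch.randn(1, 64, 128, 128)
print(ctx(x).shape)   # spatial size preserved: (1, 64, 128, 128)
```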
[Paper] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
[Year] ICLR 2015
[Authors] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille
[Pages] https://bitbucket.org/deeplab/deeplab-public
[Description]
1) Produces dense features while keeping the receptive field large: the stride of the last two pooling layers of VGG16 is set to 1, and the hole algorithm (i.e. dilated convolution) controls the receptive field.
2) The output is post-processed with a fully connected CRF: the unary term is each pixel's class probability, and the pairwise term is the similarity between the current pixel and every other pixel in the image, computed from color and position with Gaussian kernels (see the sketch after this list). The fully connected CRF follows Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials.
3) Like FCN, multi-scale prediction is also used.
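A minimal sketch of the CRF post-processing in point 2, using the third-party pydensecrf package rather than the authors' code; kernel parameters are illustrative.
```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_postprocess(img, probs, n_iters=5):
    """img: (H, W, 3) uint8 image; probs: (n_classes, H, W) softmax output."""
    n_classes, H, W = probs.shape
    d = dcrf.DenseCRF2D(W, H, n_classes)
    d.setUnaryEnergy(unary_from_softmax(probs))   # unary term: -log p(class)
    # Pairwise terms: a position-only Gaussian kernel and a
    # color+position (bilateral) kernel, as in DeepLab.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(img), compat=10)
    Q = d.inference(n_iters)
    return np.argmax(Q, axis=0).reshape(H, W)
```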
[Paper] Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation
[Year] ICCV 2015
[Authors] George Papandreou, Liang-Chieh Chen, Kevin Murphy, Alan L. Yuille
[Paper] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
[Year] arXiv 2016
[Authors] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille
[Pages]
http://liangchiehchen.com/projects/DeepLab.html
https://github.com/DrSleep/tensorflow-deeplab-resnet (tensorflow)
https://github.com/isht7/pytorch-deeplab-resnet (pytorch)
[Description]
1) Differences from V1: a different learning-rate policy, atrous spatial pyramid pooling (ASPP), a deeper network, and multi-scale inputs. ASPP applies dilated convolutions with different dilation rates (not strides) in parallel to the same feature map (see the sketch after this list).
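A minimal sketch of an ASPP-style head: parallel 3x3 convolutions with different dilation rates over the same feature map, fused by summation as in DeepLab v2. The rates follow the paper's large-FOV setting; channel sizes are illustrative and the per-branch fc6/fc7 layers are omitted.
```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel dilated convolutions with different rates, summed."""
    def __init__(self, in_ch, n_classes, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, n_classes, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)

x = torch.randn(1, 512, 40, 40)
print(ASPP(512, 21)(x).shape)   # (1, 21, 40, 40)
```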
[Paper] Rethinking Atrous Convolution for Semantic Image Segmentation
[Year] arXiv 1706
[Authors] Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam
[Pages] https://github.com/tensorflow/models/tree/master/research/deeplab
[Description]
1) Uses cascaded and parallel atrous convolutions together with batch normalization and structural refinements, reaching state-of-the-art accuracy (080116).
[Paper] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
[Year] arXiv 2018
[Authors] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam
[Pages] https://github.com/tensorflow/models/tree/master/research/deeplab
[Description]
1) On top of DeepLab-V3 as the encoder, a simple decoder is added instead of direct upsampling; Xception is adopted as the backbone.
2) Achieves state-of-the-art results on the VOC segmentation task (0800314); works well.
[Paper] Conditional Random Fields as Recurrent Neural Networks
[Year] ICCV 2015
[Authors] Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr
[Pages] http://www.robots.ox.ac.uk/~szheng/CRFasRNN.html
[Description]
1) The CRF mean-field inference steps are replaced with differentiable modules such as convolution and softmax, and the iterations are unrolled as an RNN-style recursion, so the CRF is approximated by an RNN-like structure. The whole model can be optimized end-to-end.
2) The fully connected CRF and its inference are built on Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials; that paper deserves a careful read after studying CRFs in more depth.
[Paper] Learning Deconvolution Network for Semantic Segmentation
[Year] ICCV 2015
[Authors] Hyeonwoo Noh, Seunghoon Hong, Bohyung Han
[Pages]
http://cvlab.postech.ac.kr/research/deconvnet/
https://github.com/fabianbormann/Tensorflow-DeconvNet-Segmentation (tensorflow)
[Description]
1) One of the representative encoder-decoder models: conv-pool layers extract features, and unpool-deconv layers recover the resolution.
[Paper] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling
[Year] arXiv 2015
[Authors] Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla
[Pages] http://mi.eng.cam.ac.uk/projects/segnet/
[Paper] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
[Year] PAMI 2017
[Authors] Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla
[Description]
1) One of the representative encoder-decoder models. Its key feature is that the pooling indices from the encoder are saved; during upsampling the decoder uses these indices to produce a sparse feature map, which a trainable conv then turns into a dense feature map (see the sketch below).
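A minimal sketch of SegNet-style unpooling with saved max-pooling indices; layer counts and channel sizes are illustrative, not the full SegNet architecture.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc_conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
dec_conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)   # densifies the sparse map

x = torch.randn(1, 3, 64, 64)
f = F.relu(enc_conv(x))

# Encoder: max-pool and remember which locations held the maxima.
pooled, indices = F.max_pool2d(f, kernel_size=2, stride=2, return_indices=True)

# Decoder: unpool with the saved indices -> sparse feature map,
# then a trainable convolution produces a dense feature map.
sparse = F.max_unpool2d(pooled, indices, kernel_size=2, stride=2)
dense = F.relu(dec_conv(sparse))
print(dense.shape)   # (1, 64, 64, 64)
```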
[Paper] Efficient piecewise training of deep structured models for semantic segmentation
[Year] CVPR 2016
[Authors] Guosheng Lin, Chunhua Shen, Anton van den Hengel, Ian Reid
[Pages]
[Description]
1) Skimmed; did not fully understand the CRF part.
2) FeatMap-Net takes multi-scale inputs and produces feature maps; the CRF unary and pairwise potentials are built on these feature maps, with the pairwise term modeling two kinds of context: surrounding and above/below.
3) For CRF training, a piecewise-learning-based method is proposed.
[Paper] ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
[Year] arXiv 1606
[Authors] Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello
[Pages] https://github.com/e-lab/ENet-training
[Description]
1) A fast encoder-decoder segmentation network.
2) Large encoder, small decoder; PReLU instead of ReLU; 1xn and nx1 convolutions instead of nxn convolutions (see the sketch after this list).
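A minimal sketch of the factorized convolution in point 2: a 5x5 kernel replaced by a 5x1 followed by a 1x5, cutting parameters roughly from 25*C*C to 10*C*C. Sizes are illustrative; ENet's bottleneck structure around this is omitted.
```python
import torch
import torch.nn as nn

channels, k = 64, 5

# Standard kxk convolution ...
full = nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)

# ... replaced by an asymmetric kx1 + 1xk pair (ENet-style factorization).
asym = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(k, 1), padding=(k // 2, 0)),
    nn.Conv2d(channels, channels, kernel_size=(1, k), padding=(0, k // 2)),
)

x = torch.randn(1, channels, 32, 32)
print(full(x).shape, asym(x).shape)   # same spatial size, far fewer parameters in asym
```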
[Paper] ParseNet: Looking Wider to See Better
[Year] ICLR 2016
[Authors] Wei Liu, Andrew Rabinovich, Alexander C. Berg
[Pages] https://github.com/weiliu89/caffe/tree/fcn
[Description]
1) A simple way to add global context: global-pool the feature map and L2-normalize the result, unpool the vector back to the feature map's spatial size, and concatenate it with the (also L2-normalized) original feature map (see the sketch after this list).
2) Based on simple experiments, the paper argues that the effective receptive field is often much smaller than the theoretical one. Many papers cite this kind of claim, but it still feels short on theoretical justification.
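A minimal sketch of the global-context path in point 1; the learnable per-channel scaling that ParseNet applies after L2 normalization is omitted, and shapes are illustrative.
```python
import torch
import torch.nn.functional as F

def parsenet_context(feat):
    """feat: (N, C, H, W). Returns (N, 2C, H, W) with a global-context branch."""
    n, c, h, w = feat.shape
    g = F.adaptive_avg_pool2d(feat, 1)        # global average pooling -> (N, C, 1, 1)
    g = F.normalize(g, p=2, dim=1)            # L2 normalization
    g = g.expand(n, c, h, w)                  # "unpool" back to H x W
    local = F.normalize(feat, p=2, dim=1)     # L2-normalize the local features too
    return torch.cat([local, g], dim=1)

x = torch.randn(1, 256, 32, 32)
print(parsenet_context(x).shape)              # (1, 512, 32, 32)
```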
[Paper] FoveaNet: Perspective-aware Urban Scene Parsing
[Year] ICCV 2017 Oral
[Authors] Xin Li, Zequn Jie, Wei Wang, Changsong Liu, Jimei Yang, Xiaohui Shen, Zhe Lin, Qiang Chen, Shuicheng Yan, Jiashi Feng
[Pages]
[Description]
1) Proposes a perspective-aware parsing network to handle heterogeneous object scales: it improves segmentation of small distant objects and reduces the "broken-down" artifacts on large nearby objects.
2) To better parse objects near the vanishing point (i.e. far from the image plane), a perspective estimation network (PEN) is proposed. PEN produces a distance heatmap, from which a fovea region containing most of the small objects is derived. The fovea region is enlarged and parsed by the network in parallel with the original image, and the result is pasted back into the original image.
3) To address the "broken-down" problem for nearby objects, a perspective-aware CRF is proposed. Combining PEN's heatmap with object detection, pixels belonging to nearby objects get larger pairwise potentials and pixels of distant objects get smaller ones, which effectively alleviates "broken-down" artifacts and over-smoothing.
[Paper] Pyramid Scene Parsing Network
[Year] CVPR 2017
[Authors] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
[Pages] https://hszhao.github.io/projects/pspnet/
[Description]
1) Proposes a pyramid pooling module to combine context information at different scales. PSPNet pools the feature map at several scales (similar to spatial pyramid pooling), then resizes all outputs to the same size and concatenates them (see the sketch after this list).
2) An auxiliary loss is attached after res4b22; a ResNet is used as the backbone.
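A minimal sketch of the pyramid pooling module in point 1, using the paper's bin sizes (1, 2, 3, 6); the BN/ReLU inside each branch and other details are simplified.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)
        self.branches = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, out_ch, kernel_size=1))
            for b in bins
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [x]
        for branch in self.branches:
            y = branch(x)                                 # pool to b x b, reduce channels
            outs.append(F.interpolate(y, size=(h, w),
                                      mode='bilinear', align_corners=False))
        return torch.cat(outs, dim=1)                     # concat original + all scales

x = torch.randn(1, 2048, 60, 60)
print(PyramidPooling(2048)(x).shape)   # (1, 4096, 60, 60)
```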
[Paper] RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation
[Year] CVPR 2017
[Authors] Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid
[Pages] https://github.com/guosheng/refinenet
[Description]
1) The encoder is four groups of residual blocks with progressively decreasing resolution; the decoder is the proposed RefineNet. The authors argue the model preserves the fine details of high-resolution images better.
2) The first half of a RefineNet block is multi-resolution fusion: similar to U-Net, every decoder stage uses information from the corresponding encoder stage.
3) The second half is chained residual pooling, whose goal is to "capture background context from a large image region".
[Paper] Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network
[Year] CVPR 2017
[Authors] Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun
[Pages]
[Description]
1) The paper argues that segmentation consists of localization and classification: classification needs global information, while localization requires keeping the feature map resolution for spatial accuracy, so the two goals conflict. The proposed remedy is large kernels, which preserve resolution while approximating dense connections between feature maps and per-pixel classifiers.
2) The large k*k kernel is replaced by k*1+1*k and 1*k+k*1 branches (see the sketch after this list). A boundary refinement (BR) module with a residual structure is added to capture boundary information.
3) That the proposed model outperforms k*k kernels and stacks of small kernels is shown only by experiments, without theoretical support.
4) One thing I do not understand: why can the proposed residual-structure BR model the boundary alignment?
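A minimal sketch of the Global Convolutional Network block from point 2: two factorized branches, k*1 followed by 1*k and 1*k followed by k*1, summed. The kernel size and channel counts are illustrative.
```python
import torch
import torch.nn as nn

class GCNBlock(nn.Module):
    """Approximates a large kxk convolution with two factorized branches."""
    def __init__(self, in_ch, out_ch, k=15):
        super().__init__()
        p = k // 2
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=(k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, kernel_size=(1, k), padding=(0, p)),
        )
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=(1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, kernel_size=(k, 1), padding=(p, 0)),
        )

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)

x = torch.randn(1, 256, 32, 32)
print(GCNBlock(256, 21)(x).shape)   # (1, 21, 32, 32)
```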
[Paper] PixelNet: Representation of the pixels, by the pixels, and for the pixels
[Year] TPAMI 2017
[Authors] Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
[Pages] http://www.cs.cmu.edu/~aayushb/pixelNet/
[Description]
1) Skimmed. Uses the hypercolumn idea and is fast. Applicable to many problems from low-level to high-level, such as segmentation, edge detection, and normal estimation.
2) Hypercolumn: for a given pixel, the features at its corresponding location in every feature map are concatenated into one vector, and an MLP classifies that vector (see the sketch after this list).
3) The paper argues that, during training, just sampling a small number of pixels per image is sufficient for learning; a mini-batch can therefore sample from many images, which increases diversity.
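A minimal sketch of the hypercolumn in point 2: for a few sampled pixel locations, features from several layers are bilinearly sampled at the corresponding (normalized) positions and concatenated. Layer choices and shapes are illustrative, not the paper's architecture.
```python
import torch
import torch.nn.functional as F

def hypercolumn(feature_maps, coords, img_size):
    """feature_maps: list of (1, C_i, H_i, W_i); coords: (K, 2) pixel (x, y)
    in the original image; returns (K, sum_i C_i) hypercolumn vectors."""
    h, w = img_size
    grid = coords.clone().float()
    grid[:, 0] = grid[:, 0] / (w - 1) * 2 - 1      # normalize x to [-1, 1]
    grid[:, 1] = grid[:, 1] / (h - 1) * 2 - 1      # normalize y to [-1, 1]
    grid = grid.view(1, 1, -1, 2)                  # (1, 1, K, 2)
    cols = [F.grid_sample(fm, grid, mode='bilinear', align_corners=True)
            for fm in feature_maps]                # each (1, C_i, 1, K)
    return torch.cat(cols, dim=1).squeeze(0).squeeze(1).t()   # (K, sum C_i)

fmaps = [torch.randn(1, 64, 128, 128), torch.randn(1, 256, 32, 32)]
pixels = torch.tensor([[10, 20], [100, 200]])      # (x, y) in a 256x256 image
print(hypercolumn(fmaps, pixels, (256, 256)).shape)   # (2, 320)
```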
[Paper] LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation
[Year] arXiv 1707
[Authors] Abhishek Chaurasia, Eugenio Culurciello
[Pages] https://codeac29.github.io/projects/linknet/
[Description]
1) Not read yet; roughly a U-Net-like architecture with an emphasis on speed.
[Paper] Stacked Deconvolutional Network for Semantic Segmentation
[Year] arXiv 1708
[Authors] Jun Fu, Jing Liu, Yuhang Wang, Hanqing Lu
[Pages]
[Description]
1) Skimmed. Strong results; not open-sourced.
2) Builds a stacked encoder-decoder model on a DenseNet base, which the paper argues captures multi-scale context better. The network is full of inter-unit and intra-unit connections, and hierarchical supervision is added so that the very deep SDN can be trained successfully.
[Paper] From Image-level to Pixel-level Labeling with Convolutional Networks
[Year] CVPR 2015
[Authors] Pedro O. Pinheiro, Ronan Collobert
[Pages]
[Description]
1) A weakly supervised method that trains a segmentation model from image-level class labels: each class's segmentation score map is aggregated by log-sum-exp into that class's classification probability, and the segmentation model is optimized by minimizing the classification loss (see the sketch after this list).
2) At inference, two segmentation priors are used to suppress false positives: an image-level prior (classification probabilities weight the segmentation) and a smoothing prior (superpixels, bounding box candidates, and unsupervised MCG segmentation).
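A minimal sketch of the log-sum-exp aggregation in point 1: per-class segmentation score maps are pooled into image-level class scores by a smooth max, so an image-level loss can train the segmenter. The smoothing parameter r and the BCE loss used here are illustrative.
```python
import math
import torch
import torch.nn.functional as F

def lse_pool(score_maps, r=5.0):
    """score_maps: (N, C, H, W) per-class segmentation scores.
    Smooth-max aggregation: s_c = (1/r) * log(mean_ij exp(r * s_c(i, j)))."""
    n, c, h, w = score_maps.shape
    flat = score_maps.view(n, c, h * w)
    return (torch.logsumexp(r * flat, dim=2) - math.log(h * w)) / r

# Illustrative training step: only image-level labels are available.
seg_scores = torch.randn(2, 21, 32, 32)          # per-pixel class scores
image_labels = torch.zeros(2, 21)
image_labels[:, 3] = 1.0                         # classes present in each image
loss = F.binary_cross_entropy_with_logits(lse_pool(seg_scores), image_labels)
```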
[Paper] BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
[Year] ICCV 2015
[Authors] Jifeng Dai, Kaiming He, Jian Sun
[Pages]
[Description]
1) Weakly supervised semantic segmentation: bounding boxes combined with region proposals (MCG) generate the initial ground-truth masks, and the segmentation results and the masks are then updated alternately.
[Paper] Mix-and-Match Tuning for Self-Supervised Semantic Segmentation
[Year] AAAI 2018
[Authors] Xiaohang Zhan, Ziwei Liu, Ping Luo, Xiaoou Tang, Chen Change Loy
[Pages] http://mmlab.ie.cuhk.edu.hk/projects/M&M/
[Description]
1) Self-supervision has two stages, a proxy stage and a fine-tuning stage: first pre-train with a proxy task that requires no labels (e.g. image colorization) to learn some semantic features, then fine-tune with a small amount of labeled data. Because of the semantic gap between the proxy task and the target task, self-supervised methods perform clearly worse than supervised ones.
2) The paper proposes a "mix-and-match" strategy that uses a small amount of labeled data to improve the self-supervised pre-trained network. Mix step: randomly extract patches from different images. Match step: build a graph on-the-fly during training and generate triplets consisting of an anchor, a positive, and a negative patch. These define a triplet loss that encourages patches of the same class to be more similar and patches of different classes to be more distinct (see the sketch after this list).
3) My understanding of self-supervision is still shallow; reading the code should help. The hypercolumn method used in the segmentation part does not seem to be described in detail in the paper and is worth revisiting.
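A minimal sketch of a triplet objective on patch embeddings, as in point 2, using PyTorch's built-in margin-based triplet loss rather than the paper's exact formulation; the embedding dimension and margin are illustrative.
```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)

# Embeddings of sampled patches (e.g. pooled CNN features), 128-D each.
anchor   = torch.randn(32, 128)
positive = torch.randn(32, 128)   # patches of the same class as the anchor
negative = torch.randn(32, 128)   # patches of a different class

# Pull same-class patches together, push different-class patches apart.
loss = triplet(anchor, positive, negative)
```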
[Paper] Convolutional Oriented Boundaries
[Year] ECCV 2016
[Authors] K.K. Maninis, J. Pont-Tuset, P. Arbeláez, L. Van Gool
[Pages] http://www.vision.ee.ethz.ch/~cvlsegmentation/cob/index.html
[Description]
1) Obtains segmentation from boundary probabilities; the overall pipeline follows Berkeley's gPb-owt-ucm, with the probability-map stage replaced by a CNN.
2) The CNN part uses a multi-scale model to predict coarse and fine boundary probabilities in 8 orientations.
3) The UCM part introduces a sparse boundaries representation, which speeds up the computation.
[Paper] Contour Detection and Hierarchical Image Segmentation
[Year] TPAMI 2011
[Authors] Pablo Arbelaez, Michael Maire, Charless Fowlkes, Jitendra Malik
[Pages] https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html
[Reference] http://blog.csdn.net/nature_XD/article/details/53375344?locationNum=9&fps=1
[Description]
1) gPb (Global Probability of Boundary): composed of mPb and sPb.
2) OWT: recomputes gPb for the pixels on the arcs produced by the watershed transform, based on their orientation.
3) UCM: seems similar to MST-based clustering?
4) Have not yet understood sPb.
[Datasets]
VOC2012
MSCOCO
ADE20K
MIT Scene Parsing
Cityscapes
PASCAL VOC
ILSVRC2016
[Resources]
https://handong1587.github.io/deep_learning/2015/10/09/segmentation.html
https://github.com/mrgloom/awesome-semantic-segmentation
https://blog.csdn.net/zziahgf/article/details/72639791