CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes


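For understanding complex, crowded scenes, the paper proposes CSRNet. The network has two main parts: a convolutional network as the front end for 2D feature extraction, and a dilated CNN as the back end. It achieves good results on several public crowd density estimation datasets.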

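Crowded-scene analysis has developed from simple crowd counting to crowd density map estimation. A density map provides additional information, because the same number of people can be distributed across different locations.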


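A person in an image does not occupy only a single pixel, so the density map needs to preserve continuity within local neighborhoods.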
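The point about local continuity can be illustrated with a minimal sketch of how a density map is commonly built: each annotated head position is spread out with a normalized Gaussian, so the map stays smooth and still sums to the number of people. The function name `density_map`, the fixed `sigma`, and the sample coordinates are assumptions made for illustration; they are not taken from the paper or the linked repository.

```python
import numpy as np

def density_map(shape, heads, sigma=4.0):
    """Sum one normalized 2D Gaussian per annotated head position.

    shape: (height, width) of the image
    heads: iterable of (row, col) head coordinates
    sigma: fixed Gaussian spread in pixels (an assumed value)
    """
    h, w = shape
    rows = np.arange(h, dtype=np.float64)[:, None]
    cols = np.arange(w, dtype=np.float64)[None, :]
    dmap = np.zeros((h, w), dtype=np.float64)
    for r, c in heads:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        dmap += g / g.sum()  # each person contributes exactly 1 to the total
    return dmap

# Hypothetical annotations: three heads in a 64x64 image.
heads = [(20, 30), (22, 34), (50, 10)]
dmap = density_map((64, 64), heads)
print(dmap.shape, dmap.sum())  # the sum of the map approximates the head count
```

Because every Gaussian is normalized over the image, integrating the map recovers the count, while the spatial layout shows where the people are.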

Previous CNN-based crowd density estimation methods mainly adopted multi-scale architectures, which have two problems when the network becomes deeper: a large amount of training time and a non-effective branch structure. The alternative is a deeper, regular network with a similar number of parameters.

 

Crowd density estimation methods:

  1. detection-based methods
  2. regression-based methods
  3. density estimation-based methods

The fundamental idea of the proposed design is to deploy a deeper CNN for capturing high-level features with larger receptive fields and generating high-quality density maps without brutally expanding network complexity.

 

CSRNet architecture

The front end uses VGG-16.

Abstract: The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN for the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations.

 

1. Introduction

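It’s challenging to generate accurate distribution patterns. One major difficulty comes from the prediction manner. Multi-scale methods are not good enough: the paper identifies two main problems with them and discusses deeper models as the alternative. The results coming out of the three branches are similar, which defeats the original purpose of the multi-scale design, namely learning different features.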

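In this paper, we design a deeper network called CSRNet for counting crowds and generating high-quality density maps. It uses the first ten layers of VGG-16 as the front end and dilated convolutions as the back end, to enlarge the receptive field and extract deep features without losing resolution (pooling operations are not used there).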
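As a rough illustration of the front end, the sketch below lists the channel widths of the first ten convolutional layers of VGG-16 with the three max-pooling layers that sit between them, and computes the spatial size of the resulting feature map. The names `FRONT_END` and `front_end_output_size` and the 768x1024 input are hypothetical; the sketch assumes 3x3 convolutions with padding 1 and 2x2 pooling with stride 2, as in standard VGG-16.

```python
# Channel widths of the first ten convolutional layers of VGG-16;
# "M" marks a 2x2 max-pooling layer with stride 2.
FRONT_END = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512]

def front_end_output_size(height, width, cfg=FRONT_END):
    """Spatial size after the front end.

    Assumes 3x3 convolutions with padding 1 (size-preserving) and
    2x2 stride-2 pooling (each side is halved, rounding down).
    """
    for layer in cfg:
        if layer == "M":
            height, width = height // 2, width // 2
    return height, width

n_convs = sum(1 for layer in FRONT_END if layer != "M")
out_h, out_w = front_end_output_size(768, 1024)  # hypothetical input size
print(n_convs, out_h, out_w)
```

Three pooling layers shrink each side by a factor of 8, so the feature map covers 1/64 of the input area; the dilated back end then works at that resolution without reducing it further.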

2. Related work

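Multi-scale architectures do not effectively extract features at different levels. On the contrary, most of the extracted features are similar, so a multi-scale design is not recommended.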

3. Proposed solution

3.1 CSRNet architecture

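The front end uses VGG, and its output is 1/64 the size of the original image (1/8 along each side). The back end uses dilated convolution. For preserving resolution, dilated convolution shows a clear advantage over the convolution + pooling + deconvolution scheme. The output of a dilated convolution has the same dimensions as its input (meaning pooling and deconvolutional layers are not required). Most importantly, the output from dilated convolution contains more detailed information.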
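To make the resolution argument concrete, here is a minimal NumPy sketch of a single-channel dilated convolution with zero padding. It is not the paper's or the repository's implementation; `dilated_conv2d_same` and `effective_kernel_size` are hypothetical helper names, and a square, odd-sized kernel is assumed.

```python
import numpy as np

def dilated_conv2d_same(x, kernel, rate):
    """2D dilated convolution (cross-correlation, as in CNNs) with zero
    padding chosen so the output has the same shape as the input."""
    k = kernel.shape[0]                  # square, odd-sized kernel assumed
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad, mode="constant")
    out = np.zeros_like(x, dtype=np.float64)
    h, w = x.shape
    for i in range(k):
        for j in range(k):
            # Kernel taps are spaced `rate` pixels apart.
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out

def effective_kernel_size(k, rate):
    """Side length covered by a k x k kernel with dilation `rate`."""
    return k + (k - 1) * (rate - 1)

x = np.arange(64, dtype=np.float64).reshape(8, 8)
y = dilated_conv2d_same(x, np.ones((3, 3)), rate=2)
print(y.shape, effective_kernel_size(3, 2))
```

With rate 2, a 3x3 kernel covers a 5x5 window while still using nine weights, and the output keeps the 8x8 shape of the input, so no pooling or deconvolution step is needed to enlarge the receptive field.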

3.2 Training methods

Metrics: MAE, MSE
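The two metrics can be sketched as follows, computed over per-image counts (each count being the sum of a density map). This assumes the convention common in crowd counting, where "MSE" denotes the square root of the mean squared count error; the function names and the sample counts are made up for illustration.

```python
import numpy as np

def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    pred = np.asarray(pred_counts, dtype=np.float64)
    true = np.asarray(true_counts, dtype=np.float64)
    return float(np.mean(np.abs(pred - true)))

def mse(pred_counts, true_counts):
    """Square root of the mean squared count error (assumed convention)."""
    pred = np.asarray(pred_counts, dtype=np.float64)
    true = np.asarray(true_counts, dtype=np.float64)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

# Hypothetical counts for three test images.
pred = [105.0, 48.0, 310.0]
true = [100.0, 50.0, 300.0]
mae_value = mae(pred, true)
mse_value = mse(pred, true)
print(mae_value, mse_value)
```

MAE reflects the average accuracy of the estimates, while the squared term in MSE penalizes large errors on individual images more heavily.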

 


https://github.com/tlokeshkumar/CSRNet-tf

The input pipeline we use is an efficient Data API pipeline for parsing tfrecords files.

 

 

 

 
