Crowd Counting: DRSAN (Crowd Counting using Deep Recurrent Spatial-Aware Network)

Goal:

Estimating the total number of people in unconstrained crowded scenes.

**Highlight:**
Crowd counting faces two main difficulties, both caused by the camera's perspective: huge appearance variations in people's scales and in their rotations. This paper addresses both.
We propose a unified neural network framework, named Deep Recurrent Spatial-Aware Network (DRSAN), which adaptively addresses the two issues with a learnable spatial transform module and a region-wise refinement process.

**Specifically:** our framework incorporates a Recurrent Spatial-Aware Refinement (RSAR) module that iteratively alternates between two components:

i) a Spatial Transformer Network that dynamically locates an attentional region in the crowd density map and transforms it to a suitable scale and rotation for optimal crowd estimation;

ii) a Local Refinement Network that refines the density map of the attended region with residual learning.
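The spatial-transformer step can be illustrated with a minimal pure-Python sketch. It assumes the common parameterization of the 2×3 affine matrix by an isotropic scale s, a rotation angle α and a translation (t_x, t_y), together with the inverse-warping and bilinear-sampling scheme of the original Spatial Transformer Network. In DRSAN these parameters are predicted by the network; here they are passed in by hand, and the function names (`affine_theta`, `bilinear`, `spatial_transform`) are hypothetical.

```python
import math

def affine_theta(scale, angle, tx, ty):
    """2x3 affine matrix: isotropic scale, rotation by `angle` (radians), translation."""
    c, s = math.cos(angle), math.sin(angle)
    return [[scale * c, -scale * s, tx],
            [scale * s, scale * c, ty]]

def bilinear(img, x, y):
    """Bilinearly sample img (list of rows) at pixel coords (x, y); zero outside."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    val = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                val += (1 - abs(x - xi)) * (1 - abs(y - yi)) * img[yi][xi]
    return val

def spatial_transform(img, theta, out_h, out_w):
    """Inverse warping: each output pixel looks up its source location via theta."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        yt = -1 + 2 * i / (out_h - 1) if out_h > 1 else 0.0  # normalized target y
        row = []
        for j in range(out_w):
            xt = -1 + 2 * j / (out_w - 1) if out_w > 1 else 0.0  # normalized target x
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            # normalized source coords in [-1, 1] -> pixel coords [0, w-1] x [0, h-1]
            row.append(bilinear(img, (xs + 1) * (w - 1) / 2, (ys + 1) * (h - 1) / 2))
        out.append(row)
    return out

# Usage: zoom 2x into the centre of a 4x4 map (scale 0.5, no rotation/translation).
density = [[float(4 * r + c) for c in range(4)] for r in range(4)]
patch = spatial_transform(density, affine_theta(0.5, 0.0, 0.0, 0.0), 2, 2)
```

With scale 0.5 the output grid covers only the central half of the input in each direction, i.e. a 2× zoom on that region; a nonzero angle rotates the sampling grid. Plain resampling does not preserve the sum of a density map, which matters if counts are read directly off a warped region.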

**Contributions:**
• We provide an adaptive way to simultaneously handle both scale and rotation variations by introducing a spatial transform module for crowd counting. To the best of our knowledge, we are the first to address rotation variation in this task.

• We propose a novel deep recurrent spatial-aware network framework that recurrently selects a region (with learnable scale and rotation parameters) from an initial density map and refines it, based on feature warping and residual learning.

**Architecture:**
The framework consists of a Global Feature Embedding (GFE) module and a Recurrent Spatial-Aware Refinement (RSAR) module. The GFE module takes the whole image as input for global feature extraction, and the extracted features are used to estimate an initial crowd density map. The RSAR module then iteratively locates image regions with a spatial transformer-based attention mechanism and refines the attended density map region with residual learning.

[Figure 1: overall DRSAN architecture (GFE and RSAR modules)]

The architecture therefore has two modules: GFE and RSAR.
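The overall data flow can be written as a short skeleton. It assumes, as in density-map-based counting generally, that the estimated count is the sum of the final density map. `gfe` and `rsar_step` are hypothetical callables standing in for the GFE module (with its 1 × 1 convolution) and one RSAR iteration; the real RSAR also consumes the global features, which this sketch omits, and the toy stand-ins below are not learned models.

```python
from typing import Callable, List

DensityMap = List[List[float]]

def drsan_forward(image,
                  gfe: Callable[[object], DensityMap],
                  rsar_step: Callable[[DensityMap, int], DensityMap],
                  n_iters: int):
    """Data flow only: M_0 = gfe(image); M_t = rsar_step(M_{t-1}, t); count = sum(M_n)."""
    density = gfe(image)              # initial density map M_0
    maps = [density]
    for t in range(1, n_iters + 1):   # n recurrent refinement steps
        density = rsar_step(density, t)
        maps.append(density)
    count = sum(sum(row) for row in density)  # count = sum over the final density map
    return count, maps

# Toy stand-ins (hypothetical):
def toy_gfe(image):
    return [[0.5 for _ in row] for row in image]

def toy_rsar_step(density, t):
    refined = [row[:] for row in density]  # copy so earlier maps stay intact
    refined[0][0] += 0.25                  # pretend the attended region was under-counted
    return refined

count, maps = drsan_forward([[0, 0], [0, 0]], toy_gfe, toy_rsar_step, n_iters=2)
```

Keeping every intermediate map makes it easy to compare M_0 against M_n and see how much each refinement step changed the count.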

Global Feature Embedding
Goal: transform the input image into high-dimensional feature maps, which are then used to generate an initial crowd density map of the image.

The GFE module is composed of three CNN columns, each of which has seven convolutional layers with different kernel sizes and channel numbers, as well as three max-pooling layers.
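The pooling parameters are not given here. Assuming each of the three max-pooling layers uses a 2 × 2 window with stride 2 and the convolutions use size-preserving padding (common choices, but assumptions), every column, and therefore the initial density map, ends up at 1/8 of the input resolution in each dimension. A small sketch with hypothetical helper names:

```python
def max_pool_2x2(fm):
    """2x2 max-pooling with stride 2; an odd trailing row/column is dropped."""
    h, w = len(fm) // 2, len(fm[0]) // 2
    return [[max(fm[2 * i][2 * j], fm[2 * i][2 * j + 1],
                 fm[2 * i + 1][2 * j], fm[2 * i + 1][2 * j + 1])
             for j in range(w)] for i in range(h)]

def column_output_size(h, w, n_pools=3):
    """Spatial size after n_pools stride-2 poolings (convolutions assumed size-preserving)."""
    for _ in range(n_pools):
        h, w = h // 2, w // 2
    return h, w

fm = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(max_pool_2x2(fm))               # [[5.0, 7.0], [13.0, 15.0]]
print(column_output_size(768, 1024))  # (96, 128)
```

Because all three columns apply the same number of poolings, their outputs share one spatial size even though their kernels differ, which is what allows them to be concatenated channel-wise.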
Given an image I, we extract its global feature g by feeding it into the GFE module and concatenating the outputs of all the columns. We then generate the initial crowd density map M_0 of image I from g using a convolutional layer with a 1 × 1 kernel.
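Concatenating the column outputs and applying a 1 × 1 convolution amounts to stacking all channels and then taking a learned weighted sum over channels at every pixel. The toy sketch below uses made-up weights and only two tiny columns for brevity (GFE has three, and the real channel counts are not given here); `concat_channels` and `conv1x1` are hypothetical helpers.

```python
def concat_channels(*columns):
    """Concatenate column outputs along the channel axis.
    Each column is a list of channels; each channel is an H x W list of rows."""
    return [channel for column in columns for channel in column]

def conv1x1(features, weights, bias=0.0):
    """1x1 convolution with one output channel: a per-pixel weighted sum over channels."""
    if len(features) != len(weights):
        raise ValueError("need one weight per input channel")
    h, w = len(features[0]), len(features[0][0])
    return [[bias + sum(wk * ch[i][j] for wk, ch in zip(weights, features))
             for j in range(w)] for i in range(h)]

col_a = [[[1.0, 2.0], [3.0, 4.0]]]                            # 1 channel
col_b = [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]]  # 2 channels
g = concat_channels(col_a, col_b)                              # 3 channels
m0 = conv1x1(g, weights=[1.0, 0.5, -1.0])
print(m0)  # [[1.0, 3.0], [4.0, 4.0]]
```

The 1 × 1 kernel never mixes neighbouring pixels, so M_0 keeps the spatial resolution of g and each density value depends only on the feature vector at that location.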

Recurrent Spatial-Aware Refinement

The Recurrent Spatial-Aware Refinement (RSAR) module iteratively refines the crowd density map. It consists of two alternately performed components:
i) a Spatial Transformer Network dynamically locates an attentional region from the crowd density map;
ii) a Local Refinement Network refines the density map of the selected region with residual learning.
After n refinement iterations, the result is a high-quality crowd density map with an accurately estimated crowd count.
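The iterative attend-and-refine loop can be sketched as follows. To keep it short, this sketch replaces the spatial transformer's scale/rotation warp with an axis-aligned crop, and `select_box` and `residual_fn` are hypothetical callables standing in for the learned Spatial Transformer Network and Local Refinement Network. What it does show is the residual update: only the attended region changes, by adding a predicted correction to the previous map rather than predicting a new map from scratch.

```python
def refine_region(density, box, residual_fn):
    """One simplified RSAR step: crop `box`, add a predicted residual, paste back.
    box = (top, left, height, width); the affine warp is replaced by an axis-aligned crop."""
    top, left, h, w = box
    crop = [row[left:left + w] for row in density[top:top + h]]
    residual = residual_fn(crop)              # stands in for the Local Refinement Network
    refined = [row[:] for row in density]     # copy so M_{t-1} is left unchanged
    for i in range(h):
        for j in range(w):
            refined[top + i][left + j] = crop[i][j] + residual[i][j]  # residual update
    return refined

def recurrent_refine(density, select_box, residual_fn, n_iters):
    """Run n_iters steps of attend (select_box) then refine, producing M_n from M_0."""
    for t in range(n_iters):
        density = refine_region(density, select_box(density, t), residual_fn)
    return density

# Usage with hypothetical stand-ins: a fixed box schedule and a constant residual.
m0 = [[0.0] * 4 for _ in range(4)]
mn = recurrent_refine(m0,
                      select_box=lambda d, t: (t, t, 2, 2),
                      residual_fn=lambda crop: [[0.25] * len(crop[0]) for _ in crop],
                      n_iters=2)
```

Here the two attended 2 × 2 windows overlap at one cell, which is corrected twice; cells never attended keep their initial values.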

Architectures of the two components:
[Figure 2: architectures of the Spatial Transformer Network and the Local Refinement Network]
