Crowd counting: Switch-CNN--Switching Convolutional Neural Network for Crowd Counting

**Innovation point:**

It is now widely accepted that the best-suited network should be chosen for each image, but the reason behind this choice is rarely examined in depth. Switch-CNN succeeds because it goes one step further and chooses the best network for each patch.

First, the input image is divided into 9 non-overlapping patches. Each patch is passed to a switch classifier, which decides what kind of patch it is and then relays the patch to the corresponding regressor.
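The patch-wise routing described above can be sketched roughly as follows. This is a minimal illustration using plain Python lists, not the authors' implementation: the names `split_into_patches` and `count_crowd`, the nested-list image representation, and the `switch` / `regressors` callables are all hypothetical stand-ins for the trained networks.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: a grayscale image as rows of pixel values.
Image = List[List[float]]


def split_into_patches(image: Image, grid: int = 3) -> List[Image]:
    """Split an image into grid x grid non-overlapping patches (row-major order)."""
    height, width = len(image), len(image[0])
    ph, pw = height // grid, width // grid  # leftover rows/columns are dropped
    patches = []
    for gy in range(grid):
        for gx in range(grid):
            rows = image[gy * ph:(gy + 1) * ph]
            patches.append([row[gx * pw:(gx + 1) * pw] for row in rows])
    return patches


def count_crowd(
    image: Image,
    switch: Callable[[Image], int],
    regressors: Sequence[Callable[[Image], Image]],
    grid: int = 3,
) -> float:
    """Route each patch to the regressor picked by the switch and sum the counts."""
    total = 0.0
    for patch in split_into_patches(image, grid):
        k = switch(patch)                    # index of the chosen regressor
        density_map = regressors[k](patch)   # predicted density map for the patch
        total += sum(sum(row) for row in density_map)  # count = integral of density
    return total
```

With `grid=3` the image is cut into the 9 patches mentioned above; the count for the whole image is the sum of the per-patch density maps.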

In this way, Switch-CNN leverages the variation of crowd density within an image to improve the quality and localization of the predicted crowd count, and it is robust to large scale and perspective variations.

**Goal**

Propose a model that maps a given crowd scene to its density.

**Contribution**

• A novel generic CNN architecture, Switch-CNN, trained end-to-end to predict crowd density for a crowd scene.

• Switch-CNN maps crowd patches from a crowd scene to independent CNN regressors to minimize count error and improve density localization, exploiting the density variation within a scene.

**Architecture**
[Figure: Switch-CNN architecture]

An example makes this easier to understand. In Figure 2, the switch classifier relays the patch highlighted in red to regressor R3. The patch has a very high crowd density, and R3 has a smaller receptive field, which is ideal for detecting the blob-like abstractions characteristic of patches with high crowd density.

**The Switch layer:**

We use an adaptation of the VGG16 [14] network as the switch classifier to perform 3-way classification. The fully-connected layers in VGG16 are removed. We use global average pooling (GAP) on the Conv5 features to remove the spatial information and aggregate discriminative features. GAP is followed by a smaller fully connected layer and a 3-class softmax classifier corresponding to the three regressor networks in Switch-CNN.
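To make the classifier head concrete, here is a rough sketch of the GAP, fully connected, and softmax steps in plain Python. It assumes the Conv5 feature maps are already computed and given as nested lists, and it collapses the "smaller fully connected layer" and the classifier into a single linear layer for brevity; the function names and the weight layout are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List

# Hypothetical layout: features[channel][row][col]
FeatureMaps = List[List[List[float]]]


def global_average_pool(features: FeatureMaps) -> List[float]:
    """Average each channel over its spatial positions, discarding spatial layout."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one row of weights per output class."""
    return [sum(w * v for w, v in zip(w_row, x)) + b for w_row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_head(features: FeatureMaps, weights: List[List[float]], bias: List[float]) -> int:
    """Return the index (0, 1 or 2) of the regressor with the highest probability."""
    probs = softmax(linear(global_average_pool(features), weights, bias))
    return max(range(len(probs)), key=probs.__getitem__)
```

Because GAP reduces every channel to a single number, the head does not depend on the spatial size of the patch, only on the number of channels.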

Density maps are generated in the same way as in MCNN.
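As a rough illustration of this kind of density-map generation, the sketch below places a Gaussian kernel, normalized to sum to one, at every annotated head position, so that the map sums to the number of people. The fixed kernel size and `sigma` are assumptions made for simplicity (MCNN also describes geometry-adaptive kernels, which are not shown here), and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_kernel(size: int, sigma: float) -> List[List[float]]:
    """Square Gaussian kernel of odd side `size`, normalized to sum to one."""
    c = size // 2
    kernel = [
        [math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2)) for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(
    heads: Sequence[Tuple[int, int]],  # (row, col) head annotations
    height: int,
    width: int,
    size: int = 5,
    sigma: float = 1.0,
) -> List[List[float]]:
    """Accumulate one normalized Gaussian per head annotation."""
    kernel = gaussian_kernel(size, sigma)
    c = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for hy, hx in heads:
        for ky in range(size):
            for kx in range(size):
                y, x = hy + ky - c, hx + kx - c
                if 0 <= y < height and 0 <= x < width:  # mass outside the image is dropped
                    dmap[y][x] += kernel[ky][kx]
    return dmap
```

For heads far enough from the image border, summing the map recovers the head count; kernels that overlap the border lose a little mass in this simplified version.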

**Training:**

The three CNN regressors R1 through R3 are pretrained separately to regress density maps. The loss is a typical L2 (Euclidean) loss between the predicted and the ground-truth density maps.
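For reference, an L2 loss over density maps is commonly written as

$$L = \frac{1}{2N} \sum_{i=1}^{N} \left\| \hat{D}_i - D_i \right\|_2^2$$

where $N$ is the number of training samples, $\hat{D}_i$ is the predicted density map and $D_i$ the ground-truth one. The notation and the $\frac{1}{2N}$ scaling here are assumptions about the usual form, not copied from the original post. A minimal plain-Python sketch of this quantity, with the hypothetical name `l2_loss`:

```python
from typing import List

DensityMap = List[List[float]]


def l2_loss(predicted: List[DensityMap], ground_truth: List[DensityMap]) -> float:
    """Sum of squared pixel-wise differences over all samples, scaled by 1 / (2N)."""
    n = len(predicted)
    total = 0.0
    for p_map, g_map in zip(predicted, ground_truth):
        for p_row, g_row in zip(p_map, g_map):
            total += sum((p - g) ** 2 for p, g in zip(p_row, g_row))
    return total / (2 * n)
```

Each regressor would minimize this value over its own parameters during pretraining.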
