MobileNet v1
In this article, I will explain the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications from Google. The authors developed a class of efficient models called MobileNets that focuses mainly on mobile and embedded vision applications.
In short, the main focus of their model is to increase the efficiency of the network by decreasing the number of parameters without compromising performance.
Topics
Depthwise Separable Convolution
Network Architecture
Width Multiplier
Resolution Multiplier
Performance Comparison
1. Depthwise Separable Convolution
This is the core building block of the MobileNet paper: a depthwise convolution followed by a pointwise convolution. Before getting to depthwise and pointwise convolutions, let us understand how a standard convolution works.
How is a standard convolution done?
Here we have an input image of size 12x12x3. If we do a convolution using a 5x5x3 kernel with stride 1 and no padding, we get an output of size 8x8x1. Usually, during a convolution operation we specify that we need N channels in the output; the same operation is then simply repeated N times with different kernels. Suppose N = 10. Each of the 8x8 output positions needs 5x5x3 multiplications, so the total computational cost becomes 8 x 8 x 5 x 5 x 3 x 10 = 48,000.
In general, a standard convolutional layer takes a Df x Df x M feature map as input and produces a Df x Df x N output feature map, where Df is the spatial width and height of the square feature map (the paper assumes stride 1 and padding, so the output keeps the input's spatial size), M is the number of input channels, and N is the number of output channels. The layer is parameterized by a convolution kernel K of size Dk x Dk x M x N, so the total computational cost is Dk x Dk x M x N x Df x Df.
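As a quick sanity check on these numbers, here is a minimal PyTorch sketch (the layer and variable names are mine, purely illustrative) that builds the 5x5 standard convolution from the example and evaluates the Dk x Dk x M x N x Df x Df cost formula:

import torch
import torch.nn as nn

# Dimensions from the example above; Df is the 8x8 output spatial size
Df, Dk, M, N = 8, 5, 3, 10

# A standard convolution: one Dk x Dk x M kernel per output channel
standard_conv = nn.Conv2d(in_channels=M, out_channels=N, kernel_size=Dk, stride=1)

x = torch.randn(1, M, 12, 12)         # the 12x12x3 input image
print(standard_conv(x).shape)          # torch.Size([1, 10, 8, 8])

# Multiplications performed by the standard convolution
print(Dk * Dk * M * N * Df * Df)       # 48000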
Depthwise Separable Convolution
What if we could split this convolution procedure along the depth dimension? A depthwise separable convolution consists of two parts:
Depthwise convolution
Pointwise convolution
Here we have 3 input channels. Consider that we have three 5x5x1 kernels. The first 5x5x1 kernel iterates over the first channel of the input image to produce an 8x8x1 output; each 5x5x1 kernel operates only on the corresponding channel of the input image. We then stack all three outputs to get an 8x8x3 output. This is how depthwise convolution works.
The next part is the pointwise convolution. We do a convolution using a 1x1x3 kernel on the 8x8x3 output obtained above, which produces a single feature map. We repeat this with 10 different 1x1x3 kernels to produce 10 feature maps and stack them together.
Here the total number of computations is 8 x 8 x 5 x 5 x 3 + 8 x 8 x 3 x 10 = 4,800 + 1,920 = 6,720.
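A depthwise separable convolution is easy to sketch in PyTorch: the depthwise step is a Conv2d with groups equal to the number of input channels, so each 5x5 kernel sees only its own channel, and the pointwise step is a plain 1x1 convolution. The module below is a minimal sketch using the example's sizes, not the paper's exact code:

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels=3, out_channels=10, kernel_size=5):
        super().__init__()
        # Depthwise: one kernel_size x kernel_size x 1 filter per input channel
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size, groups=in_channels)
        # Pointwise: a 1x1 convolution that mixes the channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 3, 12, 12)                  # the 12x12x3 example input
print(DepthwiseSeparableConv()(x).shape)        # torch.Size([1, 10, 8, 8])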
In general, the total computational cost of a depthwise separable convolution is Dk x Dk x M x Df x Df + M x N x Df x Df (the depthwise part plus the pointwise part).
Thus the reduction in computation relative to a standard convolution is (Dk x Dk x M x Df x Df + M x N x Df x Df) / (Dk x Dk x M x N x Df x Df) = 1/N + 1/Dk².
Note: MobileNet uses 3x3 depthwise separable convolutions, which by the 1/N + 1/Dk² formula need about 8 to 9 times less computation than standard convolutions. In our 5x5 example the reduction is 48,000 / 6,720 ≈ 7.1.
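The reduction can be checked in a couple of lines; the helper below (illustrative, not from the paper) simply evaluates the two cost formulas and their ratio:

def conv_costs(Df, Dk, M, N):
    # Returns (standard cost, depthwise separable cost) in multiplications
    standard = Dk * Dk * M * N * Df * Df
    separable = Dk * Dk * M * Df * Df + M * N * Df * Df
    return standard, separable

std, sep = conv_costs(Df=8, Dk=5, M=3, N=10)
print(std, sep, std / sep)        # 48000 6720 ~7.14  (the 5x5 example above)

std, sep = conv_costs(Df=14, Dk=3, M=512, N=512)
print(std / sep)                  # ~8.8, i.e. 8 to 9 times less computation with 3x3 kernels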
2. Network Architecture
- The network has 28 layers in total: 27 convolutional layers (counting each depthwise and pointwise convolution separately) followed by average pooling, a fully connected layer, and a softmax layer.
- Batch normalization and ReLU are applied after each convolution (both the depthwise and the pointwise layers).
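Below is a minimal PyTorch sketch of the two blocks the architecture repeats, following the common open-source MobileNet v1 pattern rather than the paper's exact code: a standard convolution block and a depthwise separable block, each convolution followed by batch normalization and ReLU.

import torch
import torch.nn as nn

def conv_bn(in_ch, out_ch, stride):
    # Standard 3x3 convolution + BN + ReLU (used only for the first layer)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def conv_dw(in_ch, out_ch, stride):
    # Depthwise 3x3 convolution + BN + ReLU, then pointwise 1x1 convolution + BN + ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

x = torch.randn(1, 3, 224, 224)
y = conv_dw(32, 64, 1)(conv_bn(3, 32, 2)(x))
print(y.shape)                     # torch.Size([1, 64, 112, 112])

Stacking conv_bn(3, 32, 2), the 13 depthwise separable blocks from Table 1 of the paper, average pooling, and a 1024-to-1000 fully connected layer reproduces the 28-layer network described above.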
3. Width Multiplier
A width multiplier α is introduced to further thin the network and reduce computational cost: the number of input channels M becomes αM and the number of output channels N becomes αN. The depthwise separable computational cost then becomes Dk x Dk x αM x Df x Df + αM x αN x Df x Df,
where α is in (0, 1], with typical values of 1, 0.75, 0.5, and 0.25. When α = 1 we have the baseline MobileNet.
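In code, the width multiplier simply scales every channel count before the layers are built. The helper below is a sketch; rounding the thinned width to a multiple of 8 is a common implementation convention, not something the paper prescribes.

def scale_channels(channels, alpha=1.0, divisor=8):
    # M becomes alpha * M, rounded to a hardware-friendly multiple of `divisor`
    return max(divisor, int(channels * alpha + divisor / 2) // divisor * divisor)

print(scale_channels(64, alpha=0.5))     # 32
print(scale_channels(512, alpha=0.25))   # 128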
4. Resolution Multiplier
A resolution multiplier ρ is introduced to control the input image resolution (and hence the internal feature map resolution) of the network. With ρ the computational cost becomes Dk x Dk x αM x ρDf x ρDf + αM x αN x ρDf x ρDf,
where ρ is in (0, 1]. In practice ρ is set implicitly by choosing an input resolution of 224, 192, 160, or 128. When ρ = 1, it is the baseline MobileNet.
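The resolution multiplier needs no change to the model itself: because the network is fully convolutional up to the pooling layer, ρ is applied implicitly by resizing the input image. A tiny sketch with an illustrative depthwise layer:

import torch
import torch.nn as nn

depthwise = nn.Conv2d(3, 3, 3, padding=1, groups=3, bias=False)   # illustrative layer

for side in (224, 192, 160, 128):        # rho = 1.0, 0.857, 0.714, 0.571
    x = torch.randn(1, 3, side, side)
    print(side, depthwise(x).shape)       # feature maps, and hence cost, shrink with rho squared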
5. Performance Comparison
- MobileNet-224 outperforms GoogLeNet (winner of ILSVRC 2014) and comes very close to VGG16 (the runner-up), while using far fewer parameters and multiply-adds.
- When the smaller 0.50 MobileNet-160 is used, it outperforms both AlexNet and SqueezeNet while using far fewer multiply-adds (and far fewer parameters than AlexNet).
- For object detection tasks with MobileNet as the backbone (under the Faster-RCNN and SSD frameworks), the paper reports results comparable to VGG and Inception V2 backbones with only a fraction of the computational cost and model size.
You can check my notebook for a PyTorch implementation of MobileNet v1.
https://arxiv.org/pdf/1704.04861.pdf
https://towardsdatascience.com/review-mobilenetv1-depthwise-separable-convolution-light-weight-model-a382df364b69
https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728
Source: https://medium.com/datadriveninvestor/review-on-mobilenet-v1-abec7888f438