Original article: Network In Network - Home (kobiso.github.io)
The NIN network structure is not complicated, so beginners in deep learning can start with it. A PyTorch implementation is attached here:
"""
NIN - Network In Network
**To use an MLP on the channels for each pixel separately.**
The idea behind NiN is to apply a fully-connected layer at each pixel location (for each height and width).
If we tie the weights across each spatial location, we could think of this as a 1×1 convolutional layer
or as a fully-connected layer acting independently on each pixel location. Another way to view this is to think
of each element in the spatial dimension (height and width) as equivalent to an example and a channel as equivalent
to a feature.
NIN introduced the 1x1 convolution.
A smaller batch size results in better performance, even though training is slower.
"""
import torch.nn as nn


class NIN(nn.Module):
    def __init__(self, input_channel, n_classes):
        super().__init__()

        # An mlpconv block: one standard convolution followed by two 1x1
        # convolutions, which act as a per-pixel MLP across the channels.
        def NINBlock(input_channel, out_channel, kernel_size, strides, padding):
            return nn.Sequential(
                nn.Conv2d(input_channel, out_channel, kernel_size=kernel_size,
                          stride=strides, padding=padding),
                nn.ReLU(),
                nn.Conv2d(out_channel, out_channel, kernel_size=1),
                nn.ReLU(),
                nn.Conv2d(out_channel, out_channel, kernel_size=1),
                nn.ReLU())

        # Three mlpconv blocks with AlexNet-style filter sizes, then a final
        # block that outputs one feature map per class, followed by global
        # average pooling in place of fully connected layers.
        self.layers = nn.Sequential(
            NINBlock(input_channel, 96, kernel_size=11, strides=4, padding=0),
            nn.MaxPool2d(3, stride=2),
            NINBlock(96, 256, kernel_size=5, strides=1, padding=2),
            nn.MaxPool2d(3, stride=2),
            NINBlock(256, 384, kernel_size=3, strides=1, padding=1),
            nn.MaxPool2d(3, stride=2),
            nn.Dropout(0.5),
            NINBlock(384, n_classes, kernel_size=3, strides=1, padding=1),
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten())
        self.layers.apply(self.init_weights)

    def init_weights(self, layer):
        # Xavier initialization for every convolutional (and linear) layer.
        if type(layer) in (nn.Linear, nn.Conv2d):
            nn.init.xavier_uniform_(layer.weight)

    def forward(self, x):
        return self.layers(x)
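A quick sanity check of the module above (a minimal sketch, assuming 224x224 RGB inputs, which the AlexNet-style layer sizes expect):

import torch

model = NIN(input_channel=3, n_classes=10)
x = torch.randn(2, 3, 224, 224)   # batch of 2 images
logits = model(x)                 # shape: (2, 10)
print(logits.shape)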
The original article follows:
"Network In Network" is one of the most important studies related to convolutional neural networks because of the concepts of 1x1 convolution and global average pooling. It was presented at the International Conference on Learning Representations (ICLR) 2014 by Min Lin, Qiang Chen, and Shuicheng Yan.
Classic convolutional neural networks consist of alternately stacked convolutional layers and spatial pooling layers. The convolutional layers generate feature maps with linear convolutional filters followed by nonlinear activation functions.
Using the linear rectifier as an example, the feature map can be calculated as follows:

$$f_{i,j,k} = \max(w_k^T x_{i,j},\; 0)$$

where $(i, j)$ is the pixel index in the feature map, $x_{i,j}$ stands for the input patch centered at location $(i, j)$, and $k$ is used to index the channels of the feature map.
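As an illustration (a minimal sketch, not from the original post), the formula can be checked numerically against a convolution followed by ReLU; the pixel index and filter index used here are arbitrary:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)                       # one 3-channel 8x8 input
conv = nn.Conv2d(3, 4, kernel_size=3, bias=True)  # 4 linear filters w_k
feature_map = torch.relu(conv(x))                 # shape: (1, 4, 6, 6)

# Recompute the value at output location (i, j) = (2, 5) for channel k = 1 by hand.
i, j, k = 2, 5, 1
patch = x[0, :, i:i + 3, j:j + 3]                 # input patch at location (i, j)
manual = torch.relu((conv.weight[k] * patch).sum() + conv.bias[k])
assert torch.allclose(manual, feature_map[0, k, i, j], atol=1e-5)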
In a conventional CNN, linear convolution is not sufficient for abstraction, and representations that achieve good abstraction are generally highly nonlinear functions of the input data. So NIN proposes integrating a micro network into the CNN structure in pursuit of better abstractions for all levels of features.
This paper suggests a new type of layer called mlpconv, in which an MLP replaces the GLM (generalized linear model) to convolve over the input. The multilayer perceptron is chosen because it is compatible with the back-propagation training of CNNs and can itself be a deep model, which is consistent with the spirit of feature re-use.
Figure 1: Comparison of linear convolution layer and mlpconv layer.
When $n$ is the number of layers in the multilayer perceptron, the mlpconv layer can be calculated as follows:

$$f^1_{i,j,k_1} = \max\!\left({w^1_{k_1}}^T x_{i,j} + b_{k_1},\; 0\right)$$
$$\vdots$$
$$f^n_{i,j,k_n} = \max\!\left({w^n_{k_n}}^T f^{\,n-1}_{i,j} + b_{k_n},\; 0\right)$$
Mlpconv can be explained as a convolution layer with a 1x1 convolution kernel. What the 1x1 convolution kernel does is compute a learned linear combination of the input channels at each pixel location (cross-channel parametric pooling), which mixes information across channels and changes their number without touching the spatial dimensions; see the sketch after Figure 2.
Figure 2: Convolution with kernel of size 3x3 (left) vs. Convolution with kernel of size 1x1 (right)
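A minimal sketch (not part of the original post) showing that a 1x1 convolution performs the same computation as a fully connected layer applied independently at every pixel location; the channel sizes are arbitrary:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 96, 14, 14)               # 96 input channels
conv1x1 = nn.Conv2d(96, 256, kernel_size=1)  # cross-channel mixing only

fc = nn.Linear(96, 256)
fc.weight.data = conv1x1.weight.data.view(256, 96)   # share the same weights
fc.bias.data = conv1x1.bias.data

out_conv = conv1x1(x)                                            # (1, 256, 14, 14)
out_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)           # per-pixel Linear
assert torch.allclose(out_conv, out_fc, atol=1e-5)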
In a conventional convolutional neural network, the feature maps of the last convolutional layer are vectorized and fed into fully connected layers followed by a softmax logistic regression layer.
Problem of the fully connected layers: they are prone to overfitting and depend heavily on dropout regularization, which hampers the generalization ability of the overall network.
This paper proposes global average pooling (GAP) to replace the traditional fully connected layers in a CNN. The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. We take the average of each feature map, and the resulting vector is fed directly into the softmax layer.
Figure 3: Example of global average pooling (GAP)
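A minimal sketch (not from the original post) of how GAP turns the last mlpconv output, one feature map per category, into class confidences with no extra parameters:

import torch
import torch.nn as nn

n_classes = 10
feature_maps = torch.randn(4, n_classes, 6, 6)   # last mlpconv output, batch of 4

gap = nn.AdaptiveAvgPool2d((1, 1))               # average each map down to 1x1
logits = gap(feature_maps).flatten(1)            # shape: (4, n_classes)
probs = torch.softmax(logits, dim=1)             # fed directly into softmax

# Equivalent to averaging each feature map by hand:
assert torch.allclose(logits, feature_maps.mean(dim=(2, 3)))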
Advantages of GAP over the fully connected layers: it is more native to the convolution structure, enforcing a correspondence between feature maps and categories; it has no parameters to optimize, so overfitting is avoided at this layer; and it sums out the spatial information, making the model more robust to spatial translations of the input.
As shown in Figure 4, the overall structure of NIN is a stack of mlpconv layers, on top of which lie the global average pooling and the objective cost layer. The number of layers in both NIN and the micro networks is flexible and can be tuned for specific tasks.
Figure 4: The overall structure of Network In Network. In this paper, the NINs include the stacking of three mlpconv layers and one global average pooling layer.