1.38. Classic Convolutional Networks
1.38.1. LeNet
1.38.1.1. Introduction
1.38.1.2. Network Structure
1.38.1.3. Code Implementation
1.38.2. AlexNet
1.38.2.1. Introduction
1.38.2.2. Network Structure
1.38.2.3. Code Implementation
1.38.3. VGG
1.38.3.1. Introduction
1.38.3.2. Network Structure: VGG-16
1.38.3.3. Code Implementation
1.38.4. GoogLeNet
1.38.4.1. Introduction
1.38.4.2. Network Structure
1.38.4.3. Code Implementation
1.38.5. ResNet
1.38.5.1. Introduction
1.38.5.2. Network Structure
1.38.5.3. Code Implementation
1.38.6. NIN
1.38.6.1. NiN Block
1.38.6.2. Replacing FC Layers with Global Average Pooling for Classification
1.38.6.3. Code Example
1.38.7. DenseNet
1.38.7.1. Dense Block
1.38.7.2. Transition Layer
1.38.7.3. Model Structure
1.38.7.4. Code Example
1.38.7.5. Summary
1.38.8. EfficientNet
1.38.9. MobileNetV1
1.38.10. MobileNetV2
1.38.11. MobileNetV3
1.38.12. SENet
1.38.13. ShuffleNetV1
1.38.14. ShuffleNetV2
1.38.15. References
LeNet-5: one of the earliest convolutional neural networks, used for handwritten digit recognition.
LeNet was proposed in 1998 by LeCun, the pioneer of convolutional neural networks, to solve the visual task of handwritten digit recognition. It has 7 layers in total: 2 convolutional layers alternating with 2 pooling layers, followed by 3 fully connected layers that produce the final output. (The simplified implementation below adds no separate activation layers.)
This fixed the most basic CNN architecture that followed: convolutional layers, pooling layers, and fully connected layers. The LeNet used in today's deep learning frameworks is a simplified and improved LeNet-5 (the "-5" indicates 5 layers), which differs slightly from the original, for example by switching the activation function to the now-common ReLU.
LeNet-5 is not quite the modern conv->ReLU->pool recipe: it simply uses conv1->pool1->conv2->pool2 followed by fully connected layers, but the pattern of a pooling layer immediately following a convolutional layer is already there.
(Figure: the LeNet-5 network, about 99.2% accuracy, 5/6 layers.)
The original LeNet-5 network is trained on grayscale images, with an input size of 32 * 32 * 1; the implementation below takes 3-channel 32 * 32 inputs.
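To see where the Linear(400, 120) in the code below comes from, here is a quick shape trace for a 3-channel 32 * 32 input (a sketch whose layer settings mirror the implementation below):
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)
x = nn.Conv2d(3, 6, 5, padding=1)(x)   # (32 + 2 - 5) + 1 = 30  -> [1, 6, 30, 30]
x = nn.MaxPool2d(2, 2)(x)              # 30 / 2 = 15            -> [1, 6, 15, 15]
x = nn.Conv2d(6, 16, 5)(x)             # 15 - 5 + 1 = 11        -> [1, 16, 11, 11]
x = nn.MaxPool2d(2, 2)(x)              # floor(11 / 2) = 5      -> [1, 16, 5, 5]
print(x.view(1, -1).shape)             # 16 * 5 * 5 = 400 features feed fc1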
# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn
class Lenet(nn.Module):
def __init__(self):
super(Lenet, self).__init__()
layer1 = nn.Sequential()
layer1.add_module('conv1', nn.Conv2d(3, 6, 5, padding=1))
layer1.add_module('pool1', nn.MaxPool2d(2, 2))
self.layer1 = layer1
layer2 = nn.Sequential()
layer2.add_module('conv2', nn.Conv2d(6, 16, 5))
layer2.add_module('pool2', nn.MaxPool2d(2, 2))
self.layer2 = layer2
layer3 = nn.Sequential()
layer3.add_module('fc1', nn.Linear(400, 120))
layer3.add_module('fc2', nn.Linear(120, 84))
layer3.add_module('fc3', nn.Linear(84, 10))
self.layer3 = layer3
def forward(self, x):
assert x.size(-1) == 32
x = self.layer1(x)
x = self.layer2(x)
x = x.view(x.size(0), -1)
x = self.layer3(x)
return x
def main():
net = Lenet()
x = torch.randn(1, 3, 32, 32)
print(net(x))
if __name__ == '__main__':
main()
Output:
tensor([[-0.0687, 0.1565, 0.0494, -0.1422, 0.0449, -0.0102, -0.0630, -0.0782,
0.1001, -0.2641]], grad_fn=<AddmmBackward>)
AlexNet: an 8-layer network and the 2012 ImageNet champion; it brought neural networks back into the spotlight.
It popularized max pooling and the ReLU activation, used dropout, and employed relatively large convolution kernels.
In 2012 AlexNet shone at the ImageNet competition, winning with an accuracy about 10% ahead of the runner-up and setting off the wave of convolutional neural networks in computer vision. Compared with LeNet, AlexNet is deeper, introduced the ReLU activation layer, and adds Dropout in the fully connected layers to prevent overfitting.
1. Contribution: winner of ILSVRC 2012. It demonstrated the striking performance of deep CNNs on image tasks, set off the wave of CNN research, and is an important reason for today's rapid progress in deep learning and AI. The ImageNet competition gave Hinton, who had kept working on neural networks all along, a stage to show results; AlexNet was published by Hinton and two of his students, and before it deep learning had been quiet for a long time.
2. Network structure: as shown in the figure below, an 8-layer network with roughly 60 million parameters. It uses the ReLU activation and applies dropout with p=0.5 to the first two fully connected layers. It also uses LRN and overlapping pooling; LRN has since fallen out of use, and BN is now the usual choice for normalization. Training was done on multiple GPUs.
3. Preprocessing: each image is first down-sampled so that its shorter side is 256, the central 256x256 region is cropped, and the mean (computed over the training set) is subtracted for normalization. During training, data augmentation randomly extracts 227x227 crops and their horizontally mirrored versions from each image. Besides this augmentation, PCA-based perturbation of the RGB pixels is used to further ease overfitting.
4. Prediction: for each image, five patches (the four corners plus the center) and their horizontal mirrors are extracted, ten patches in total, and the ten predictions are averaged as the final prediction (a 10-crop sketch appears just before the code below).
5. Hyperparameters: SGD with learning rate 0.01, batch size 128, momentum 0.9, and weight decay 0.0005 (the paper gives the weight-update formula); whenever the validation error stops decreasing, the learning rate is divided by 10. Weights are initialized from a (0, 0.01) Gaussian; the biases of the second, fourth, and fifth convolutional layers and of the fully connected layers are initialized to 1 (feeding positive values into ReLU to speed up early training), and the remaining biases are initialized to 0 (a minimal optimizer sketch follows this list).
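A minimal sketch of that training setup in PyTorch, assuming a model net and a per-epoch validation loss val_loss (both hypothetical here); ReduceLROnPlateau stands in for the paper's manual "divide the learning rate by 10 when validation error stops improving" rule, and the patience value is arbitrary:
import torch

# net and val_loss are assumed to exist (hypothetical model and validation metric)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                       factor=0.1, patience=5)
for epoch in range(90):
    # ... run one training epoch with batch size 128 and compute val_loss ...
    scheduler.step(val_loss)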
AlexNet Architecture
Per-layer hyperparameters and parameter counts of AlexNet
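The 10-crop evaluation from point 4 can be written with torchvision (a sketch; the 256/224 sizes follow the common torchvision convention rather than the paper's 227, and model / pil_image are assumed to exist):
import torch
from torchvision import transforms

ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),  # four corners + center, each plus its horizontal mirror
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])
# crops = ten_crop(pil_image)                      # [10, 3, 224, 224]
# probs = model(crops).softmax(dim=1).mean(dim=0)  # average the ten predictions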
# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn
class Alexnet(nn.Module):
def __init__(self, n_classes):
super(Alexnet, self).__init__()
self.features = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(64, 192, kernel_size=5, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(192, 384, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(384, 256, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
)
self.classifier = nn.Sequential(
nn.Dropout(p=0.1),
nn.Linear(256 * 6 * 6, 4096),
nn.ReLU(inplace=True),
nn.Dropout(p=0.1),
nn.Linear(4096, 4096),
nn.ReLU(inplace=True),
nn.Linear(4096, n_classes),
)
def forward(self, x):
assert x.size(-1) == 224
x = self.features(x)
x = x.view(x.size(0), -1)
x = self.classifier(x)
return x
def main():
x = torch.randn(1, 3, 224, 224)
net = Alexnet(10)
print(net(x))
if __name__ == '__main__':
main()
Output:
tensor([[-0.0016, 0.0054, -0.0021, -0.0123, 0.0113, 0.0088, 0.0171, 0.0032,
-0.0075, 0.0074]], grad_fn=<AddmmBackward>)
VGG: a deeper network; it showed that small 1 * 1 and 3 * 3 kernels use fewer parameters and give better results.
VGGNet was the runner-up of ImageNet 2014. It uses smaller filters together with a deeper network. VGG does little more than keep stacking layers, without much structural innovation, but increasing the depth does improve the model to a certain extent.
AlexNet has only 8 layers, whereas VGGNet comes in 16- and 19-layer variants. AlexNet uses large 11 x 11 filters, while VGGNet uses 3 x 3 convolution filters and 2 x 2 pooling layers. (Figure: comparison of AlexNet and VGGNet.)
As the model shows, VGG16 is considerably deeper than AlexNet-style models. By repeatedly stacking 3 x 3 convolutional layers and 2 x 2 pooling layers, VGG16 builds a deep network in which every convolution uses the same 3 x 3 kernel size and every max pooling layer the same 2 x 2 window. The main differences from AlexNet are:
VGG16 has 16 layers, while AlexNet has only 8.
Multi-scale data augmentation is used at both training and test time.
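A quick check of why stacking 3 x 3 convolutions is attractive: two stacked 3 x 3 layers cover the same 5 x 5 receptive field as a single 5 x 5 layer but need fewer weights (the channel count 64 below is arbitrary):
import torch.nn as nn

c = 64
two_3x3 = nn.Sequential(nn.Conv2d(c, c, 3, padding=1, bias=False),
                        nn.Conv2d(c, c, 3, padding=1, bias=False))
one_5x5 = nn.Conv2d(c, c, 5, padding=2, bias=False)
print(sum(p.numel() for p in two_3x3.parameters()))  # 2 * 3*3*64*64 = 73728
print(sum(p.numel() for p in one_5x5.parameters()))  # 5*5*64*64     = 102400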
# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn
class VGGNet(nn.Module):
def __init__(self, n_classes, n_layers, in_channels=3):
super(VGGNet, self).__init__()
self.architecture = {
16: [64, 64, 'Pooling', 128, 128, 'Pooling', 256, 256, 256,
'Pooling', 512, 512, 512, 'Pooling', 512, 512, 512, 'Pooling'],
19: [64, 64, 'Pooling', 128, 128, 'Pooling', 256, 256, 256, 256,
'Pooling', 512, 512, 512, 512, 'Pooling', 512, 512, 512, 512, 'Pooling'],
}
self.features = self._constructNet(self.architecture[n_layers], in_channels)
self.classifier = nn.Linear(512, n_classes)
def _constructNet(self, arch, in_channels):
net = []
for out_channels in arch:
if out_channels == 'Pooling':
net += [nn.MaxPool2d(kernel_size=2, stride=2)]
else:
net += [
nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True)
]
in_channels = out_channels
net += [nn.AvgPool2d(kernel_size=1, stride=1)]
return nn.Sequential(*net)
def forward(self, x):
assert x.size(-1) == 32
x = self.features(x)
x = x.view(x.size(0), -1)
x = self.classifier(x)
return x
def main():
x = torch.randn(1, 3, 32, 32)
net = VGGNet(10, 19)
print(net(x))
if __name__ == '__main__':
main()
GoogLeNet, also called InceptionNet, was proposed in 2014. GoogLeNet is deeper than VGGNet, with 22 layers in total, yet it has about 12 times fewer parameters than AlexNet. It is also very computationally efficient, because it uses the highly effective Inception module and drops the large fully connected layers (keeping only the final classifier). It won the 2014 competition.
The naive Inception module:
This structure applies the convolutions commonly used in CNNs (1x1, 3x3, 5x5) and a 3x3 pooling operation in parallel and stacks their results (the spatial sizes after convolution and pooling are kept the same, and the outputs are concatenated along the channel dimension). This widens the network and at the same time improves its adaptability to different scales.
The smaller convolutions in the module extract fine detail from the input, while the 5x5 filter covers most of the receptive field. A pooling operation can also be added to reduce the spatial size and curb overfitting. A ReLU follows every convolution to add non-linearity. In this original Inception version, however, every filter operates on all output channels of the previous layer, so the 5x5 convolutions become far too expensive and the concatenated feature maps get very thick.
The improved Inception module: to avoid this, 1x1 convolutions are inserted before the 3x3 and 5x5 convolutions and after the max pooling, reducing the channel depth of the feature maps. This yields the Inception v1 structure shown below:
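A quick count of what the 1x1 reduction saves, using the channel sizes of the 5x5 branch in the first Inception block of the code below (192 input channels, reduced to 16, then expanded to 32; the code factorizes the 5x5 into two 3x3 convolutions, but the point of the reduction is the same):
import torch.nn as nn

def n_weights(m):
    return sum(p.numel() for p in m.parameters())

direct_5x5 = nn.Conv2d(192, 32, kernel_size=5, padding=2, bias=False)
reduced = nn.Sequential(nn.Conv2d(192, 16, kernel_size=1, bias=False),
                        nn.Conv2d(16, 32, kernel_size=5, padding=2, bias=False))
print(n_weights(direct_5x5))  # 5*5*192*32         = 153600
print(n_weights(reduced))     # 192*16 + 5*5*16*32 = 15872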
The whole GoogLeNet is built from these Inception modules. The code first defines a basic convolution block, then uses it to build the 1x1, 3x3, and 5x5 branches and a pooling branch, and finally concatenates them along the channel dimension with torch.cat() to produce the output.
# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn
class Inception(nn.Module):
    def __init__(self, in_channels, out_1x1, out_3x3_1, out_3x3_2, out_5x5_1, out_5x5_2, pool_size):
super(Inception, self).__init__()
# 1x1 conv branch
self.b1 = nn.Sequential(
nn.Conv2d(in_channels, out_1x1, kernel_size=1),
nn.BatchNorm2d(out_1x1),
nn.ReLU(True),
)
# 1x1 conv -> 3x3 conv branch
self.b2 = nn.Sequential(
nn.Conv2d(in_channels, out_3x3_1, kernel_size=1),
nn.BatchNorm2d(out_3x3_1),
nn.ReLU(True),
            nn.Conv2d(out_3x3_1, out_3x3_2, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_3x3_2),
nn.ReLU(True),
)
        # 1x1 conv -> 5x5 conv branch (the 5x5 is factorized into two stacked 3x3 convs here)
self.b3 = nn.Sequential(
nn.Conv2d(in_channels, out_5x5_1, kernel_size=1),
nn.BatchNorm2d(out_5x5_1),
nn.ReLU(True),
nn.Conv2d(out_5x5_1, out_5x5_2, kernel_size=3, padding=1),
nn.BatchNorm2d(out_5x5_2),
nn.ReLU(True),
nn.Conv2d(out_5x5_2, out_5x5_2, kernel_size=3, padding=1),
nn.BatchNorm2d(out_5x5_2),
nn.ReLU(True),
)
# 3x3 pool -> 1x1 conv branch
self.b4 = nn.Sequential(
nn.MaxPool2d(3, stride=1, padding=1),
nn.Conv2d(in_channels, pool_size, kernel_size=1),
nn.BatchNorm2d(pool_size),
nn.ReLU(True),
)
def forward(self, x):
y1 = self.b1(x)
y2 = self.b2(x)
y3 = self.b3(x)
y4 = self.b4(x)
return torch.cat([y1,y2,y3,y4], 1)
class Googlenet(nn.Module):
def __init__(self):
super(Googlenet, self).__init__()
self.pre_layers = nn.Sequential(
nn.Conv2d(3, 192, kernel_size=3, padding=1),
nn.BatchNorm2d(192),
nn.ReLU(True),
)
self.a3 = Inception(192, 64, 96, 128, 16, 32, 32)
self.b3 = Inception(256, 128, 128, 192, 32, 96, 64)
self.maxpool = nn.MaxPool2d(3, stride=2, padding=1)
self.a4 = Inception(480, 192, 96, 208, 16, 48, 64)
self.b4 = Inception(512, 160, 112, 224, 24, 64, 64)
self.c4 = Inception(512, 128, 128, 256, 24, 64, 64)
self.d4 = Inception(512, 112, 144, 288, 32, 64, 64)
self.e4 = Inception(528, 256, 160, 320, 32, 128, 128)
self.a5 = Inception(832, 256, 160, 320, 32, 128, 128)
self.b5 = Inception(832, 384, 192, 384, 48, 128, 128)
self.avgpool = nn.AvgPool2d(8, stride=1)
self.linear = nn.Linear(1024, 10)
def forward(self, x):
x = self.pre_layers(x)
x = self.a3(x)
x = self.b3(x)
x = self.maxpool(x)
x = self.a4(x)
x = self.b4(x)
x = self.c4(x)
x = self.d4(x)
x = self.e4(x)
x = self.maxpool(x)
x = self.a5(x)
x = self.b5(x)
x = self.avgpool(x)
x = x.view(x.size(0), -1)
x = self.linear(x)
return x
def main():
x = torch.randn(1, 3, 32, 32)
net = Googlenet()
print(net(x))
if __name__ == '__main__':
main()
ResNet won the 2015 ImageNet competition. Proposed by Microsoft Research, it no longer simply piles up layers: with residual modules it can successfully train neural networks as deep as 152 layers. ResNet's design was originally motivated by this question:
As network depth increases, accuracy should increase accordingly (keeping overfitting in mind). But one problem with added depth is that the signal for updating the parameters weakens: gradients propagate from the back to the front, so after adding depth the gradients reaching the earlier layers become very small. Those layers essentially stop learning; this is the vanishing-gradient problem.
The second problem with deep networks is training itself: a deeper network means a larger parameter space and a harder optimization problem, so simply adding depth can actually lead to higher training error. As a network keeps getting deeper, a degradation phenomenon appears: accuracy first rises, then saturates, and then drops if depth keeps increasing. In the figure from the paper, for example, a 56-layer network performs worse than a 20-layer one, and this is not overfitting (the training error is also high); it is the degradation problem. ResNet designs a residual module that lets us train much deeper networks.
The residual unit is analyzed in detail here to get at the essence of ResNet.
As the diagram shows, data flows along two routes: the regular route and a shortcut, a direct identity-mapping connection somewhat like a "short circuit" in an electrical circuit. Experiments show that this shortcut structure copes well with the degradation problem. If we write the input-output relation of a module as y = H(x), then fitting H(x) directly by gradient methods runs into the degradation problem above. With the shortcut structure, the learnable part no longer has to fit H(x): if F(x) denotes the part to be optimized, then H(x) = F(x) + x, i.e. F(x) = H(x) - x. Under the identity-mapping assumption, y = x plays the role of the observation, so F(x) corresponds to the residual, hence the name residual network. Why do it this way? Because the authors argue that learning the residual F(x) is simpler than learning H(x) directly: we only need to learn the difference between input and output, turning an absolute quantity into a relative one (H(x) - x is how much the output changes relative to the input), which is much easier to optimize.
Since the dimensions of x and F(x) may not match, they have to be aligned. The paper uses two methods for this (there is actually a third, but experiments showed it makes performance drop sharply, so it is not used):
zero_padding: pad the identity branch with zeros to complete the dimensions; this adds no extra parameters.
projection: use a 1x1 convolution on the identity branch to increase the dimensions; this adds extra parameters.
The figure shows the two forms of the residual module: on the left the regular residual module, composed of two 3x3 convolutions; as the network grows deeper, this structure turns out not to be very effective in practice. The "bottleneck residual block" on the right works better in that regime: it stacks 1x1, 3x3, and 1x1 convolutions in sequence, where the 1x1 convolutions reduce and then restore the dimensionality, so the 3x3 convolution operates on a relatively low-dimensional input, improving computational efficiency.
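A quick weight count for the two forms at 256 channels (the setting used in the paper): the bottleneck squeezes 256 channels down to 64 with a 1x1 convolution, applies the 3x3 convolution there, then expands back to 256:
import torch.nn as nn

def n_weights(m):
    return sum(p.numel() for p in m.parameters())

basic = nn.Sequential(nn.Conv2d(256, 256, 3, padding=1, bias=False),
                      nn.Conv2d(256, 256, 3, padding=1, bias=False))
bottleneck = nn.Sequential(nn.Conv2d(256, 64, 1, bias=False),
                           nn.Conv2d(64, 64, 3, padding=1, bias=False),
                           nn.Conv2d(64, 256, 1, bias=False))
print(n_weights(basic))       # 2 * 3*3*256*256             = 1179648
print(n_weights(bottleneck))  # 256*64 + 3*3*64*64 + 64*256 = 69632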
# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn
import torch.nn.functional as F
class basicBlock(nn.Module):
expansion = 1
def __init__(self, in_channels, out_channels, stride=1):
super(basicBlock, self).__init__()
self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(out_channels)
self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(out_channels)
# shortcut is a convolution layer with BatchNormalization
self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != self.expansion * out_channels:
self.shortcut = nn.Sequential(
nn.Conv2d(in_channels, self.expansion * out_channels, kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(self.expansion * out_channels)
)
def forward(self, input):
x = F.relu(self.bn1(self.conv1(input)))
x = self.bn2(self.conv2(x))
x += self.shortcut(input)
x = F.relu(x)
return x
class bottleneckBlock(nn.Module):
expansion = 4
def __init__(self, in_channels, out_channels, stride=1):
super(bottleneckBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # the 3x3 conv carries the stride and needs padding=1 to preserve the spatial size
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, self.expansion * out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion * out_channels)
        # identity shortcut by default; use a 1x1 projection when the shape changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != self.expansion * out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, self.expansion * out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * out_channels)
            )
def forward(self, input):
x = F.relu(self.bn1(self.conv1(input)))
x = F.relu(self.bn2(self.conv2(x)))
x = self.bn3(self.conv3(x))
x += self.shortcut(input)
x = F.relu(x)
return x
class Resnet(nn.Module):
def __init__(self, block, num_blocks, num_classes=10):
super(Resnet, self).__init__()
self.in_channels = 64
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
self.linear = nn.Linear(512 * block.expansion, num_classes)
def _make_layer(self, block, out_channels, num_blocks, stride):
strides = [stride] + [1] * (num_blocks - 1)
layers = []
for stride in strides:
layers.append(block(self.in_channels, out_channels, stride))
self.in_channels = out_channels * block.expansion
return nn.Sequential(*layers)
def forward(self, x):
x = F.relu(self.bn1(self.conv1(x)))
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = F.avg_pool2d(x, 4)
x = x.view(x.size(0), -1)
x = self.linear(x)
return x
def ResNet18():
return Resnet(basicBlock, [2, 2, 2, 2])
def ResNet152():
return Resnet(bottleneckBlock, [3, 8, 36, 3])
def main():
x = torch.randn(1, 3, 32, 32)
net = ResNet18()
print(net(x))
if __name__ == '__main__':
main()
NIN stands for Network in Network; as the name suggests, the idea is to stack small networks in a particular way.
(Figure: the left panel shows part of the AlexNet/VGG structure, the right panel the corresponding part of the NiN structure.)
The NiN block is the basic building block of NiN. It consists of one ordinary convolutional layer followed by two 1x1 convolutional layers that act as fully connected layers. The hyperparameters of the first convolutional layer can be chosen freely, while those of the second and third are usually fixed.
If we want to stack small models, fully connected layers seem unavoidable, yet converting convolutional feature maps into fully connected dimensions costs a great deal of resources. A 1x1 convolutional layer is exactly the replacement: it can be viewed as a fully connected layer in which each element of the spatial dimensions (height and width) plays the role of a sample and the channels play the role of features. NIN therefore uses 1x1 convolutional layers instead of fully connected layers, so that spatial information can flow naturally into the later layers.
NiN also differs markedly from AlexNet in another design choice: NiN removes AlexNet's final three fully connected layers and instead uses a NiN block whose number of output channels equals the number of classes, followed by a global average pooling layer that averages all elements of each channel and uses the result directly for classification.
The global average pooling layer here is simply an average pooling layer whose window equals the spatial shape of the input.
The benefit of this design is a markedly smaller model, which helps against overfitting; the drawback is that it can sometimes increase the training time needed to reach a good model. (A quick shape check follows the code below.)
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAvgPool2d(nn.Module):
def __init__(self):
super(GlobalAvgPool2d, self).__init__()
def forward(self, x):
return F.avg_pool2d(x, kernel_size=x.size()[2:])
class FlattenLayer(nn.Module):
def __init__(self):
super(FlattenLayer, self).__init__()
def forward(self, X):
return X.view(X.shape[0], -1)
class NIN(nn.Module):
def __init__(self, classes):
self.classes = classes
super(NIN, self).__init__()
self.net = nn.Sequential(
self.nin_block(3, 96, kernel_size=11, stride=4, padding=0),
nn.MaxPool2d(kernel_size=3, stride=2),
self.nin_block(96, 256, kernel_size=5, stride=1, padding=2),
nn.MaxPool2d(kernel_size=3, stride=2),
self.nin_block(256, 384, kernel_size=3, stride=1, padding=1),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Dropout(0.5),
self.nin_block(384, self.classes, kernel_size=3, stride=1, padding=1),
GlobalAvgPool2d(),
FlattenLayer()
)
def nin_block(self, in_channels, out_channels, kernel_size, stride, padding):
block = nn.Sequential(
nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),
nn.ReLU(),
nn.Conv2d(out_channels, out_channels, kernel_size=1),
nn.ReLU(),
nn.Conv2d(out_channels, out_channels, kernel_size=1),
nn.ReLU()
)
return block
def forward(self, img):
return self.net(img)
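A quick sanity check of the design described above, using the NIN class just defined: the final NiN block emits one channel per class, and global average pooling turns those channels directly into class scores (an input size of 224 is assumed, matching the 11x11/stride-4 first layer):
import torch

net = NIN(classes=10)
x = torch.randn(1, 3, 224, 224)
print(net(x).shape)  # torch.Size([1, 10]) -- one score per class, with no fully connected layer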
DenseNet differs from ResNet mainly in that the element-wise addition in ResNet's residual block is replaced by concatenation along the channel dimension.
DenseNet uses the improved "BN -> ReLU -> conv" ordering from the ResNet variants, which forms its basic convolution block conv_block.
A dense block consists of several conv_blocks, each producing the same number of output channels. In the forward pass, the input and output of every block are concatenated along the channel dimension.
Because every dense block increases the number of channels, using too many of them leads to an overly complex model. The transition layer is used to control model complexity: it reduces the number of channels with a 1x1 convolutional layer and halves the height and width with an average pooling layer of stride 2, further reducing model complexity. (A small shape check follows the code below.)
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlattenLayer(nn.Module):
def __init__(self):
super(FlattenLayer, self).__init__()
def forward(self, X):
return X.view(X.shape[0], -1)
class GlobalAvgPool2d(nn.Module):
def __init__(self):
super(GlobalAvgPool2d, self).__init__()
def forward(self, x):
return F.avg_pool2d(x, kernel_size=x.size()[2:])
class DenseBlock(nn.Module):
def __init__(self, in_channels, out_channels, num_conv):
super(DenseBlock, self).__init__()
self.out_channels = out_channels
layers = []
for i in range(num_conv):
in_c = in_channels + i * self.out_channels
layers.append(self.conv_block(in_c, self.out_channels))
self.net = nn.ModuleList(layers)
self.out_channels = in_channels + num_conv * out_channels
def conv_block(self, in_channels, out_channels):
block = nn.Sequential(
nn.BatchNorm2d(in_channels),
nn.ReLU(),
nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
)
return block
def forward(self, X):
for layer in self.net:
Y = layer(X)
X = torch.cat((X, Y), dim=1)
return X
class DenseNet(nn.Module):
def __init__(self, classes):
super(DenseNet, self).__init__()
self.classes = classes
self.net = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
nn.BatchNorm2d(64),
nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2)
)
        num_channels, growth_rate = 64, 32  # current channel count is 64; each conv_block adds 32 channels (the growth rate)
        num_convs_in_dense_block = [4, 4, 4, 4]  # number of conv_blocks in each dense block
for i, num_convs in enumerate(num_convs_in_dense_block):
block = DenseBlock(num_channels, growth_rate, num_convs)
self.net.add_module("dense_block_%d" % i, block)
            num_channels = block.out_channels  # output channel count of the previous dense block
if i != len(num_convs_in_dense_block) - 1:
self.net.add_module("trainsition_block_%d" % i, self.transition_block(num_channels, num_channels // 2))
num_channels = num_channels // 2
self.net.add_module("BN", nn.BatchNorm2d(num_channels))
self.net.add_module("relu", nn.ReLU())
self.net.add_module("global_avg_pool", GlobalAvgPool2d())
self.net.add_module("flatten", FlattenLayer())
self.net.add_module("fc", nn.Linear(num_channels, self.classes))
def forward(self, X):
return self.net(X)
def transition_block(self, in_channels, out_channels):
block = nn.Sequential(
nn.BatchNorm2d(in_channels),
nn.ReLU(),
nn.Conv2d(in_channels, out_channels, kernel_size=1),
nn.AvgPool2d(kernel_size=2, stride=2)
)
return block
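A small shape check for the dense block defined above: with 64 input channels, 4 conv_blocks, and growth rate 32 (the same configuration the DenseNet class uses), the output has 64 + 4 * 32 = 192 channels, which a transition layer would then halve to 96:
import torch

blk = DenseBlock(in_channels=64, out_channels=32, num_conv=4)
x = torch.randn(1, 64, 8, 8)
print(blk(x).shape)      # torch.Size([1, 192, 8, 8])
print(blk.out_channels)  # 192, the value the DenseNet constructor reads back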
For the skip connections, instead of adding input and output as ResNet does, DenseNet concatenates them along the channel dimension.
DenseNet's main building blocks are the dense block and the transition layer.
Code:
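# EfficientNet (B0-style): a stack of MBConv (inverted-residual) blocks with squeeze-and-excitation and the Swish activation, scaled by width/depth coefficients.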
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
def conv_bn_act(in_, out_, kernel_size,
stride=1, groups=1, bias=True,
eps=1e-3, momentum=0.01):
return nn.Sequential(
SamePadConv2d(in_, out_, kernel_size, stride, groups=groups, bias=bias),
nn.BatchNorm2d(out_, eps, momentum),
Swish()
)
class SamePadConv2d(nn.Conv2d):
def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1, groups=1, bias=True):
super(SamePadConv2d, self).__init__(in_channels, out_channels, kernel_size, stride, 0, dilation, groups, bias)
def get_pad_odd(self, in_, weight, stride, dilation):
effective_filter_size_rows = (weight - 1) * dilation + 1
out_rows = (in_ + stride - 1) // stride
padding_needed = max(0, (out_rows - 1) * stride + effective_filter_size_rows - in_)
padding_rows = max(0, (out_rows - 1) * stride + (weight - 1) * dilation + 1 - in_)
rows_odd = (padding_rows % 2 != 0)
return padding_rows, rows_odd
def forward(self, x):
padding_rows, rows_odd = self.get_pad_odd(x.shape[2], self.weight.shape[2], self.stride[0], self.dilation[0])
padding_cols, cols_odd = self.get_pad_odd(x.shape[3], self.weight.shape[3], self.stride[1], self.dilation[1])
if rows_odd or cols_odd:
x = F.pad(x, [0, int(cols_odd), 0, int(rows_odd)])
return F.conv2d(x, self.weight, self.bias, self.stride,
padding=(padding_rows // 2, padding_cols // 2),
dilation=self.dilation, groups=self.groups)
class Swish(nn.Module):
def forward(self, x):
return x * torch.sigmoid(x)
class Flatten(nn.Module):
def forward(self, x):
return x.view(x.shape[0], -1)
class SEModule(nn.Module):
def __init__(self, in_, squeeze_ch):
super(SEModule, self).__init__()
self.se = nn.Sequential(
nn.AdaptiveAvgPool2d(1),
nn.Conv2d(in_, squeeze_ch, kernel_size=1, stride=1, padding=0, bias=True),
Swish(),
nn.Conv2d(squeeze_ch, in_, kernel_size=1, stride=1, padding=0, bias=True),
)
def forward(self, x):
return x * torch.sigmoid(self.se(x))
class DropConnect(nn.Module):
def __init__(self, ratio):
super().__init__()
self.ratio = 1.0 - ratio
def forward(self, x):
if not self.training:
return x
random_tensor = self.ratio
random_tensor += torch.rand([x.shape[0], 1, 1, 1], dtype=torch.float, device=x.device)
random_tensor.requires_grad_(False)
return x / self.ratio * random_tensor.floor()
class MBConv(nn.Module):
def __init__(self, in_, out_, expand,
kernel_size, stride, skip,
se_ratio, dc_ratio=0.2):
super().__init__()
mid_ = in_ * expand
self.expand_conv = conv_bn_act(in_, mid_, kernel_size=1, bias=False) if expand != 1 else nn.Identity()
self.depth_wise_conv = conv_bn_act(mid_, mid_,
kernel_size=kernel_size, stride=stride,
groups=mid_, bias=False)
self.se = SEModule(mid_, int(in_ * se_ratio)) if se_ratio > 0 else nn.Identity()
self.project_conv = nn.Sequential(
SamePadConv2d(mid_, out_, kernel_size=1, stride=1, bias=False),
nn.BatchNorm2d(out_, 1e-3, 0.01)
)
# if _block_args.id_skip:
# and all(s == 1 for s in self._block_args.strides)
# and self._block_args.input_filters == self._block_args.output_filters:
self.skip = skip and (stride == 1) and (in_ == out_)
# DropConnect
# self.dropconnect = DropConnect(dc_ratio) if dc_ratio > 0 else nn.Identity()
# Original TF Repo not using drop_rate
# https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_model.py#L408
self.dropconnect = nn.Identity()
def forward(self, inputs):
expand = self.expand_conv(inputs)
x = self.depth_wise_conv(expand)
x = self.se(x)
x = self.project_conv(x)
if self.skip:
x = self.dropconnect(x)
x = x + inputs
return x
class MBBlock(nn.Module):
def __init__(self, in_, out_, expand, kernel, stride, num_repeat, skip, se_ratio, drop_connect_ratio=0.2):
super().__init__()
layers = [MBConv(in_, out_, expand, kernel, stride, skip, se_ratio, drop_connect_ratio)]
for i in range(1, num_repeat):
layers.append(MBConv(out_, out_, expand, kernel, 1, skip, se_ratio, drop_connect_ratio))
self.layers = nn.Sequential(*layers)
def forward(self, x):
return self.layers(x)
class EfficientNet(nn.Module):
def __init__(self, width_coeff, depth_coeff,
depth_div=8, min_depth=None,
dropout_rate=0.2, drop_connect_rate=0.2,
num_classes=1000):
super().__init__()
min_depth = min_depth or depth_div
self.stem = conv_bn_act(3, 32, kernel_size=3, stride=2, bias=False)
def renew_ch(x):
if not width_coeff:
return x
x *= width_coeff
new_x = max(min_depth, int(x + depth_div / 2) // depth_div * depth_div)
if new_x < 0.9 * x:
new_x += depth_div
return int(new_x)
def renew_repeat(x):
return int(np.ceil(x * depth_coeff))
self.blocks = nn.Sequential(
# input channel output expand k s skip se
MBBlock(renew_ch(32), renew_ch(16), 1, 3, 1, renew_repeat(1), True, 0.25, drop_connect_rate),
MBBlock(renew_ch(16), renew_ch(24), 6, 3, 2, renew_repeat(2), True, 0.25, drop_connect_rate),
MBBlock(renew_ch(24), renew_ch(40), 6, 5, 2, renew_repeat(2), True, 0.25, drop_connect_rate),
MBBlock(renew_ch(40), renew_ch(80), 6, 3, 2, renew_repeat(3), True, 0.25, drop_connect_rate),
MBBlock(renew_ch(80), renew_ch(112), 6, 5, 1, renew_repeat(3), True, 0.25, drop_connect_rate),
MBBlock(renew_ch(112), renew_ch(192), 6, 5, 2, renew_repeat(4), True, 0.25, drop_connect_rate),
MBBlock(renew_ch(192), renew_ch(320), 6, 3, 1, renew_repeat(1), True, 0.25, drop_connect_rate)
)
self.head = nn.Sequential(
*conv_bn_act(renew_ch(320), renew_ch(1280), kernel_size=1, bias=False),
nn.AdaptiveAvgPool2d(1),
nn.Dropout2d(dropout_rate, True) if dropout_rate > 0 else nn.Identity(),
Flatten(),
nn.Linear(renew_ch(1280), num_classes)
)
self.init_weights()
def init_weights(self):
for m in self.modules():
if isinstance(m, SamePadConv2d):
nn.init.kaiming_normal_(m.weight, mode="fan_out")
elif isinstance(m, nn.Linear):
init_range = 1.0 / np.sqrt(m.weight.shape[1])
nn.init.uniform_(m.weight, -init_range, init_range)
def forward(self, inputs):
stem = self.stem(inputs)
x = self.blocks(stem)
head = self.head(x)
return head
def main():
x = torch.randn(1, 3, 224, 224)
net = EfficientNet(1, 1)
print(net(x))
if __name__ == '__main__':
main()
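# MobileNetV1: a plain stack of depthwise-separable convolutions (3x3 depthwise conv followed by 1x1 pointwise conv).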
import torch
import torch.nn as nn
class MobileNet(nn.Module):
def __init__(self):
super(MobileNet, self).__init__()
def conv_bn(inp, oup, stride):
return nn.Sequential(
nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU(inplace=True))
def conv_dw(inp, oup, stride):
return nn.Sequential(
nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
nn.BatchNorm2d(inp),
nn.ReLU(inplace=True),
nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU(inplace=True))
self.feature = nn.Sequential(
conv_bn(3, 32, 2),
conv_dw(32, 64, 1),
conv_dw(64, 128, 2),
conv_dw(128, 128, 1),
conv_dw(128, 256, 2),
conv_dw(256, 256, 1),
conv_dw(256, 512, 2),
conv_dw(512, 512, 1),
conv_dw(512, 512, 1),
conv_dw(512, 512, 1),
conv_dw(512, 512, 1),
conv_dw(512, 512, 1),
conv_dw(512, 1024, 2),
conv_dw(1024, 1024, 1),
nn.AvgPool2d(7) )
self.fc = nn.Linear(1024, 1000)
def forward(self, x):
x = self.feature(x)
x = x.view(x.size(0), -1)
x = self.fc(x)
return x
def main():
x = torch.randn(1, 3, 224, 224)
net = MobileNet()
print(net(x))
if __name__ == '__main__':
main()
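# MobileNetV2: inverted residual blocks with linear bottlenecks (1x1 expand -> 3x3 depthwise -> 1x1 project, using ReLU6).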
import torch
import torch.nn as nn
import numpy as np
class InvertedResidual(nn.Module):
def __init__(self, inp, oup, stride, expand_ratio):
super(InvertedResidual, self).__init__()
self.stride = stride
assert stride in [1, 2]
hidden_dim = int(inp * expand_ratio)
self.use_res_connect = self.stride == 1 and inp == oup
if expand_ratio == 1:
self.conv = nn.Sequential(
# dw
nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
nn.BatchNorm2d(hidden_dim),
nn.ReLU6(inplace=True),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
)
else:
self.conv = nn.Sequential(
# pw
nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
nn.BatchNorm2d(hidden_dim),
nn.ReLU6(inplace=True),
# dw
nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
nn.BatchNorm2d(hidden_dim),
nn.ReLU6(inplace=True),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
)
def forward(self, x):
if self.use_res_connect:
return x + self.conv(x)
else:
return self.conv(x)
class MobileNet(nn.Module):
def __init__(self, n_class=15, input_size=224, width_mult=1.):
super(MobileNet, self).__init__()
block = InvertedResidual
input_channel = 32
last_channel = 1280
interverted_residual_setting = [
# t, c, n, s
[1, 16, 1, 1],
[6, 24, 2, 2],
[6, 32, 3, 2],
[6, 64, 4, 2],
[6, 96, 3, 1],
[6, 160, 3, 2],
[6, 320, 1, 1],
]
# building first layer
assert input_size % 32 == 0
# input_channel = make_divisible(input_channel * width_mult) # first channel is always 32!
self.last_channel = self.make_divisible(last_channel * width_mult) if width_mult > 1.0 else last_channel
self.features = [self.conv_bn(3, input_channel, 2)]
# building inverted residual blocks
for t, c, n, s in interverted_residual_setting:
output_channel = self.make_divisible(c * width_mult) if t > 1 else c
for i in range(n):
if i == 0:
self.features.append(block(input_channel, output_channel, s, expand_ratio=t))
else:
self.features.append(block(input_channel, output_channel, 1, expand_ratio=t))
input_channel = output_channel
# building last several layers
self.features.append(self.conv_1x1_bn(input_channel, self.last_channel))
# make it nn.Sequential
self.features = nn.Sequential(*self.features)
# building classifier
self.classifier = nn.Linear(self.last_channel, n_class)
self._initialize_weights()
def make_divisible(self, x, divisible_by=8):
import numpy as np
return int(np.ceil(x * 1. / divisible_by) * divisible_by)
def conv_bn(self, inp, oup, stride):
return nn.Sequential(
nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU6(inplace=True)
)
def conv_1x1_bn(self, inp, oup):
return nn.Sequential(
nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU6(inplace=True)
)
def forward(self, x):
x = self.features(x)
x = x.mean(3).mean(2)
x = self.classifier(x)
return x
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, np.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
elif isinstance(m, nn.Linear):
n = m.weight.size(1)
m.weight.data.normal_(0, 0.01)
m.bias.data.zero_()
def main():
x = torch.randn(1, 3, 224, 224)
net = MobileNet()
print(net)
print(net(x))
if __name__ == '__main__':
main()
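# MobileNetV3: inverted residual blocks extended with squeeze-and-excitation and the h-swish activation (LARGE and SMALL configurations).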
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
def get_model_parameters(model):
total_parameters = 0
for layer in list(model.parameters()):
layer_parameter = 1
for l in list(layer.size()):
layer_parameter *= l
total_parameters += layer_parameter
return total_parameters
class h_sigmoid(nn.Module):
def __init__(self, inplace=True):
super(h_sigmoid, self).__init__()
self.inplace = inplace
def forward(self, x):
return F.relu6(x + 3., inplace=self.inplace) / 6.
class h_swish(nn.Module):
def __init__(self, inplace=True):
super(h_swish, self).__init__()
self.inplace = inplace
def forward(self, x):
out = F.relu6(x + 3., inplace=self.inplace) / 6.
return out * x
class SqueezeBlock(nn.Module):
def __init__(self, exp_size, divide=4):
super(SqueezeBlock, self).__init__()
self.dense = nn.Sequential(
nn.Linear(exp_size, exp_size // divide),
nn.ReLU(inplace=True),
nn.Linear(exp_size // divide, exp_size),
h_sigmoid()
)
def forward(self, x):
batch, channels, height, width = x.size()
out = F.avg_pool2d(x, kernel_size=[height, width]).view(batch, -1)
out = self.dense(out)
out = out.view(batch, channels, 1, 1)
# out = hard_sigmoid(out)
return out * x
class MobileBlock(nn.Module):
def __init__(self, in_channels, out_channels, kernal_size, stride, nonLinear, SE, exp_size):
super(MobileBlock, self).__init__()
self.out_channels = out_channels
self.nonLinear = nonLinear
self.SE = SE
padding = (kernal_size - 1) // 2
self.use_connect = stride == 1 and in_channels == out_channels
if self.nonLinear == "RE":
activation = nn.ReLU
else:
activation = h_swish
self.conv = nn.Sequential(
nn.Conv2d(in_channels, exp_size, kernel_size=1, stride=1, padding=0, bias=False),
nn.BatchNorm2d(exp_size),
activation(inplace=True)
)
self.depth_conv = nn.Sequential(
nn.Conv2d(exp_size, exp_size, kernel_size=kernal_size, stride=stride, padding=padding, groups=exp_size),
nn.BatchNorm2d(exp_size),
)
if self.SE:
self.squeeze_block = SqueezeBlock(exp_size)
self.point_conv = nn.Sequential(
nn.Conv2d(exp_size, out_channels, kernel_size=1, stride=1, padding=0),
nn.BatchNorm2d(out_channels),
activation(inplace=True)
)
def forward(self, x):
# MobileNetV2
out = self.conv(x)
out = self.depth_conv(out)
# Squeeze and Excite
if self.SE:
out = self.squeeze_block(out)
# point-wise conv
out = self.point_conv(out)
# connection
if self.use_connect:
return x + out
else:
return out
class MobileNet(nn.Module):
def __init__(self, model_mode="SMALL", num_classes=1000, multiplier=1.0, dropout_rate=0.0):
super(MobileNet, self).__init__()
self.num_classes = num_classes
if model_mode == "LARGE":
layers = [
[16, 16, 3, 1, "RE", False, 16],
[16, 24, 3, 2, "RE", False, 64],
[24, 24, 3, 1, "RE", False, 72],
[24, 40, 5, 2, "RE", True, 72],
[40, 40, 5, 1, "RE", True, 120],
[40, 40, 5, 1, "RE", True, 120],
[40, 80, 3, 2, "HS", False, 240],
[80, 80, 3, 1, "HS", False, 200],
[80, 80, 3, 1, "HS", False, 184],
[80, 80, 3, 1, "HS", False, 184],
[80, 112, 3, 1, "HS", True, 480],
[112, 112, 3, 1, "HS", True, 672],
[112, 160, 5, 1, "HS", True, 672],
[160, 160, 5, 2, "HS", True, 672],
[160, 160, 5, 1, "HS", True, 960],
]
init_conv_out = self._make_divisible(16 * multiplier)
self.init_conv = nn.Sequential(
nn.Conv2d(in_channels=3, out_channels=init_conv_out, kernel_size=3, stride=2, padding=1),
nn.BatchNorm2d(init_conv_out),
h_swish(inplace=True),
)
self.block = []
for in_channels, out_channels, kernal_size, stride, nonlinear, se, exp_size in layers:
in_channels = self._make_divisible(in_channels * multiplier)
out_channels = self._make_divisible(out_channels * multiplier)
exp_size = self._make_divisible(exp_size * multiplier)
self.block.append(MobileBlock(in_channels, out_channels, kernal_size, stride, nonlinear, se, exp_size))
self.block = nn.Sequential(*self.block)
out_conv1_in = self._make_divisible(160 * multiplier)
out_conv1_out = self._make_divisible(960 * multiplier)
self.out_conv1 = nn.Sequential(
nn.Conv2d(out_conv1_in, out_conv1_out, kernel_size=1, stride=1),
nn.BatchNorm2d(out_conv1_out),
h_swish(inplace=True),
)
out_conv2_in = self._make_divisible(960 * multiplier)
out_conv2_out = self._make_divisible(1280 * multiplier)
self.out_conv2 = nn.Sequential(
nn.Conv2d(out_conv2_in, out_conv2_out, kernel_size=1, stride=1),
h_swish(inplace=True),
nn.Dropout(dropout_rate),
nn.Conv2d(out_conv2_out, self.num_classes, kernel_size=1, stride=1),
)
elif model_mode == "SMALL":
layers = [
[16, 16, 3, 2, "RE", True, 16],
[16, 24, 3, 2, "RE", False, 72],
[24, 24, 3, 1, "RE", False, 88],
[24, 40, 5, 2, "RE", True, 96],
[40, 40, 5, 1, "RE", True, 240],
[40, 40, 5, 1, "RE", True, 240],
[40, 48, 5, 1, "HS", True, 120],
[48, 48, 5, 1, "HS", True, 144],
[48, 96, 5, 2, "HS", True, 288],
[96, 96, 5, 1, "HS", True, 576],
[96, 96, 5, 1, "HS", True, 576],
]
init_conv_out = self._make_divisible(16 * multiplier)
self.init_conv = nn.Sequential(
nn.Conv2d(in_channels=3, out_channels=init_conv_out, kernel_size=3, stride=2, padding=1),
nn.BatchNorm2d(init_conv_out),
h_swish(inplace=True),
)
self.block = []
for in_channels, out_channels, kernal_size, stride, nonlinear, se, exp_size in layers:
in_channels = self._make_divisible(in_channels * multiplier)
out_channels = self._make_divisible(out_channels * multiplier)
exp_size = self._make_divisible(exp_size * multiplier)
self.block.append(MobileBlock(in_channels, out_channels, kernal_size, stride, nonlinear, se, exp_size))
self.block = nn.Sequential(*self.block)
out_conv1_in = self._make_divisible(96 * multiplier)
out_conv1_out = self._make_divisible(576 * multiplier)
self.out_conv1 = nn.Sequential(
nn.Conv2d(out_conv1_in, out_conv1_out, kernel_size=1, stride=1),
SqueezeBlock(out_conv1_out),
nn.BatchNorm2d(out_conv1_out),
h_swish(inplace=True),
)
out_conv2_in = self._make_divisible(576 * multiplier)
out_conv2_out = self._make_divisible(1280 * multiplier)
self.out_conv2 = nn.Sequential(
nn.Conv2d(out_conv2_in, out_conv2_out, kernel_size=1, stride=1),
h_swish(inplace=True),
nn.Dropout(dropout_rate),
nn.Conv2d(out_conv2_out, self.num_classes, kernel_size=1, stride=1),
)
self._initialize_weights()
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, np.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
elif isinstance(m, nn.Linear):
n = m.weight.size(1)
m.weight.data.normal_(0, 0.01)
m.bias.data.zero_()
def _make_divisible(self, v, divisor=8, min_value=None):
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_v < 0.9 * v:
new_v += divisor
return new_v
def forward(self, x):
out = self.init_conv(x)
out = self.block(out)
out = self.out_conv1(out)
batch, channels, height, width = out.size()
out = F.avg_pool2d(out, kernel_size=[height, width])
out = self.out_conv2(out).view(batch, -1)
return out
def main():
x = torch.randn(1, 3, 224, 224)
net = MobileNet()
print(net(x))
if __name__ == '__main__':
main()
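# SENet: residual blocks augmented with squeeze-and-excitation channel attention (global average pooling -> two 1x1 convs -> sigmoid gate).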
import torch
import torch.nn as nn
import torch.nn.functional as F
class SEBlock(nn.Module):
def __init__(self, in_planes, planes, stride=1):
super(SEBlock, self).__init__()
self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.shortcut = nn.Sequential()
if stride != 1 or in_planes != planes:
self.shortcut = nn.Sequential(
nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(planes)
)
# SE layers
self.fc1 = nn.Conv2d(planes, planes//16, kernel_size=1) # Use nn.Conv2d instead of nn.Linear
self.fc2 = nn.Conv2d(planes//16, planes, kernel_size=1)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = self.bn2(self.conv2(out))
# Squeeze
w = F.avg_pool2d(out, out.size(2))
w = F.relu(self.fc1(w))
        w = torch.sigmoid(self.fc2(w))
# Excitation
out = out * w # New broadcasting feature from v0.2!
out += self.shortcut(x)
out = F.relu(out)
return out
class PreActBlock(nn.Module):
def __init__(self, in_planes, planes, stride=1):
super(PreActBlock, self).__init__()
self.bn1 = nn.BatchNorm2d(in_planes)
self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
if stride != 1 or in_planes != planes:
self.shortcut = nn.Sequential(
nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride, bias=False)
)
# SE layers
self.fc1 = nn.Conv2d(planes, planes//16, kernel_size=1)
self.fc2 = nn.Conv2d(planes//16, planes, kernel_size=1)
def forward(self, x):
out = F.relu(self.bn1(x))
shortcut = self.shortcut(out) if hasattr(self, 'shortcut') else x
out = self.conv1(out)
out = self.conv2(F.relu(self.bn2(out)))
# Squeeze
w = F.avg_pool2d(out, out.size(2))
w = F.relu(self.fc1(w))
        w = torch.sigmoid(self.fc2(w))
# Excitation
out = out * w
out += shortcut
return out
class SqueezeExcitationNet(nn.Module):
def __init__(self, block, num_blocks, num_classes=1000):
super(SqueezeExcitationNet, self).__init__()
self.in_planes = 64
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
self.linear = nn.Linear(25088, num_classes)
def _make_layer(self, block, planes, num_blocks, stride):
strides = [stride] + [1]*(num_blocks-1)
layers = []
for stride in strides:
layers.append(block(self.in_planes, planes, stride))
self.in_planes = planes
return nn.Sequential(*layers)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = self.layer1(out)
out = self.layer2(out)
out = self.layer3(out)
out = self.layer4(out)
out = F.avg_pool2d(out, 4)
out = out.view(out.size(0), -1)
out = self.linear(out)
return out
def SENet18():
return SqueezeExcitationNet(PreActBlock, [2,2,2,2])
def main():
x = torch.randn(1, 3, 224, 224)
net = SENet18()
print(net(x))
if __name__ == '__main__':
main()
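# ShuffleNetV1: grouped 1x1 convolutions plus a channel shuffle so information can flow across groups.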
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict
from torch.nn import init
def channel_shuffle(x, groups):
batchsize, num_channels, height, width = x.data.size()
channels_per_group = num_channels // groups
# reshape
x = x.view(batchsize, groups,
channels_per_group, height, width)
# transpose
# - contiguous() required if transpose() is used before view().
# See https://github.com/pytorch/pytorch/issues/764
x = torch.transpose(x, 1, 2).contiguous()
# flatten
x = x.view(batchsize, -1, height, width)
return x
class ShuffleUnit(nn.Module):
def __init__(self, in_channels, out_channels, groups=3,
grouped_conv=True, combine='add'):
super(ShuffleUnit, self).__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.grouped_conv = grouped_conv
self.combine = combine
self.groups = groups
self.bottleneck_channels = self.out_channels // 4
# define the type of ShuffleUnit
if self.combine == 'add':
# ShuffleUnit Figure 2b
self.depthwise_stride = 1
self._combine_func = self._add
elif self.combine == 'concat':
# ShuffleUnit Figure 2c
self.depthwise_stride = 2
self._combine_func = self._concat
# ensure output of concat has the same channels as
# original output channels.
self.out_channels -= self.in_channels
else:
raise ValueError("Cannot combine tensors with \"{}\"" \
"Only \"add\" and \"concat\" are" \
"supported".format(self.combine))
# Use a 1x1 grouped or non-grouped convolution to reduce input channels
# to bottleneck channels, as in a ResNet bottleneck module.
# NOTE: Do not use group convolution for the first conv1x1 in Stage 2.
self.first_1x1_groups = self.groups if grouped_conv else 1
self.g_conv_1x1_compress = self._make_grouped_conv1x1(
self.in_channels,
self.bottleneck_channels,
self.first_1x1_groups,
batch_norm=True,
relu=True
)
# 3x3 depthwise convolution followed by batch normalization
self.depthwise_conv3x3 = nn.Conv2d(
self.bottleneck_channels, self.bottleneck_channels, kernel_size=3, padding=1,
stride=self.depthwise_stride, groups=self.bottleneck_channels, bias=False)
self.bn_after_depthwise = nn.BatchNorm2d(self.bottleneck_channels)
# Use 1x1 grouped convolution to expand from
# bottleneck_channels to out_channels
self.g_conv_1x1_expand = self._make_grouped_conv1x1(
self.bottleneck_channels,
self.out_channels,
self.groups,
batch_norm=True,
relu=False
)
@staticmethod
def _add(x, out):
# residual connection
return x + out
@staticmethod
def _concat(x, out):
# concatenate along channel axis
return torch.cat((x, out), 1)
def _make_grouped_conv1x1(self, in_channels, out_channels, groups,
batch_norm=True, relu=False):
modules = OrderedDict()
conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, groups=groups)
modules['conv1x1'] = conv
if batch_norm:
modules['batch_norm'] = nn.BatchNorm2d(out_channels)
if relu:
modules['relu'] = nn.ReLU()
if len(modules) > 1:
return nn.Sequential(modules)
else:
return conv
def forward(self, x):
# save for combining later with output
residual = x
if self.combine == 'concat':
residual = F.avg_pool2d(residual, kernel_size=3,
stride=2, padding=1)
out = self.g_conv_1x1_compress(x)
out = channel_shuffle(out, self.groups)
out = self.depthwise_conv3x3(out)
out = self.bn_after_depthwise(out)
out = self.g_conv_1x1_expand(out)
out = self._combine_func(residual, out)
return F.relu(out)
class ShuffleNet(nn.Module):
"""ShuffleNet implementation.
"""
def __init__(self, groups=3, in_channels=3, num_classes=1000):
"""ShuffleNet constructor.
Arguments:
groups (int, optional): number of groups to be used in grouped
1x1 convolutions in each ShuffleUnit. Default is 3 for best
performance according to original paper.
in_channels (int, optional): number of channels in the input tensor.
Default is 3 for RGB image inputs.
num_classes (int, optional): number of classes to predict. Default
is 1000 for ImageNet.
"""
super(ShuffleNet, self).__init__()
self.groups = groups
self.stage_repeats = [3, 7, 3]
self.in_channels = in_channels
self.num_classes = num_classes
# index 0 is invalid and should never be called.
# only used for indexing convenience.
if groups == 1:
self.stage_out_channels = [-1, 24, 144, 288, 567]
elif groups == 2:
self.stage_out_channels = [-1, 24, 200, 400, 800]
elif groups == 3:
self.stage_out_channels = [-1, 24, 240, 480, 960]
elif groups == 4:
self.stage_out_channels = [-1, 24, 272, 544, 1088]
elif groups == 8:
self.stage_out_channels = [-1, 24, 384, 768, 1536]
else:
raise ValueError(
"""{} groups is not supported for
1x1 Grouped Convolutions""".format(self.groups))
# Stage 1 always has 24 output channels
self.conv1 = nn.Conv2d(self.in_channels, self.stage_out_channels[1], kernel_size=3, padding=1, groups=1, stride=2)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
# Stage 2
self.stage2 = self._make_stage(2)
# Stage 3
self.stage3 = self._make_stage(3)
# Stage 4
self.stage4 = self._make_stage(4)
# Global pooling:
# Undefined as PyTorch's functional API can be used for on-the-fly
# shape inference if input size is not ImageNet's 224x224
# Fully-connected classification layer
num_inputs = self.stage_out_channels[-1]
self.fc = nn.Linear(num_inputs, self.num_classes)
self.init_params()
def init_params(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal_(m.weight, mode='fan_out')
if m.bias is not None:
init.constant_(m.bias, 0)
elif isinstance(m, nn.BatchNorm2d):
init.constant_(m.weight, 1)
init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
init.normal_(m.weight, std=0.001)
if m.bias is not None:
init.constant_(m.bias, 0)
def _make_stage(self, stage):
modules = OrderedDict()
stage_name = "ShuffleUnit_Stage{}".format(stage)
# First ShuffleUnit in the stage
# 1. non-grouped 1x1 convolution (i.e. pointwise convolution)
# is used in Stage 2. Group convolutions used everywhere else.
grouped_conv = stage > 2
# 2. concatenation unit is always used.
first_module = ShuffleUnit(
self.stage_out_channels[stage - 1],
self.stage_out_channels[stage],
groups=self.groups,
grouped_conv=grouped_conv,
combine='concat'
)
modules[stage_name + "_0"] = first_module
# add more ShuffleUnits depending on pre-defined number of repeats
for i in range(self.stage_repeats[stage - 2]):
name = stage_name + "_{}".format(i + 1)
module = ShuffleUnit(
self.stage_out_channels[stage],
self.stage_out_channels[stage],
groups=self.groups,
grouped_conv=True,
combine='add'
)
modules[name] = module
return nn.Sequential(modules)
def forward(self, x):
x = self.conv1(x)
x = self.maxpool(x)
x = self.stage2(x)
x = self.stage3(x)
x = self.stage4(x)
# global average pooling layer
x = F.avg_pool2d(x, x.data.size()[-2:])
# flatten for input to fully-connected layer
x = x.view(x.size(0), -1)
x = self.fc(x)
return F.log_softmax(x, dim=1)
def main():
x = torch.randn(1, 3, 224, 224)
net = ShuffleNet()
print(net(x))
if __name__ == '__main__':
main()
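# ShuffleNetV2: channel split plus channel shuffle; stride-2 units process both branches and concatenate them.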
import torch
import torch.nn as nn
def channel_shuffle(x, groups):
batchsize, num_channels, height, width = x.data.size()
channels_per_group = num_channels // groups
# reshape
x = x.view(batchsize, groups,
channels_per_group, height, width)
x = torch.transpose(x, 1, 2).contiguous()
# flatten
x = x.view(batchsize, -1, height, width)
return x
class InvertedResidual(nn.Module):
def __init__(self, inp, oup, stride, benchmodel):
super(InvertedResidual, self).__init__()
self.benchmodel = benchmodel
self.stride = stride
assert stride in [1, 2]
oup_inc = oup // 2
if self.benchmodel == 1:
# assert inp == oup_inc
self.banch2 = nn.Sequential(
# pw
nn.Conv2d(oup_inc, oup_inc, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup_inc),
nn.ReLU(inplace=True),
# dw
nn.Conv2d(oup_inc, oup_inc, 3, stride, 1, groups=oup_inc, bias=False),
nn.BatchNorm2d(oup_inc),
# pw-linear
nn.Conv2d(oup_inc, oup_inc, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup_inc),
nn.ReLU(inplace=True),
)
else:
self.banch1 = nn.Sequential(
# dw
nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
nn.BatchNorm2d(inp),
# pw-linear
nn.Conv2d(inp, oup_inc, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup_inc),
nn.ReLU(inplace=True),
)
self.banch2 = nn.Sequential(
# pw
nn.Conv2d(inp, oup_inc, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup_inc),
nn.ReLU(inplace=True),
# dw
nn.Conv2d(oup_inc, oup_inc, 3, stride, 1, groups=oup_inc, bias=False),
nn.BatchNorm2d(oup_inc),
# pw-linear
nn.Conv2d(oup_inc, oup_inc, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup_inc),
nn.ReLU(inplace=True),
)
@staticmethod
def _concat(x, out):
# concatenate along channel axis
return torch.cat((x, out), 1)
def forward(self, x):
if 1 == self.benchmodel:
x1 = x[:, :(x.shape[1] // 2), :, :]
x2 = x[:, (x.shape[1] // 2):, :, :]
out = self._concat(x1, self.banch2(x2))
elif 2 == self.benchmodel:
out = self._concat(self.banch1(x), self.banch2(x))
else:
return
return channel_shuffle(out, 2)
class ShuffleNetV2(nn.Module):
def __init__(self, n_class=1000, input_size=224, width_mult=1.):
super(ShuffleNetV2, self).__init__()
assert input_size % 32 == 0
self.stage_repeats = [4, 8, 4]
# index 0 is invalid and should never be called.
# only used for indexing convenience.
if width_mult == 0.5:
self.stage_out_channels = [-1, 24, 48, 96, 192, 1024]
elif width_mult == 1.0:
self.stage_out_channels = [-1, 24, 116, 232, 464, 1024]
elif width_mult == 1.5:
self.stage_out_channels = [-1, 24, 176, 352, 704, 1024]
elif width_mult == 2.0:
self.stage_out_channels = [-1, 24, 224, 488, 976, 2048]
else:
            raise ValueError(
                "width_mult {} is not supported".format(width_mult))
# building first layer
input_channel = self.stage_out_channels[1]
self.conv1 = self._conv_bn(3, input_channel, 2)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.features = []
# building inverted residual blocks
for idxstage in range(len(self.stage_repeats)):
numrepeat = self.stage_repeats[idxstage]
output_channel = self.stage_out_channels[idxstage + 2]
for i in range(numrepeat):
if i == 0:
# inp, oup, stride, benchmodel):
self.features.append(InvertedResidual(input_channel, output_channel, 2, 2))
else:
self.features.append(InvertedResidual(input_channel, output_channel, 1, 1))
input_channel = output_channel
# make it nn.Sequential
self.features = nn.Sequential(*self.features)
# building last several layers
self.conv_last = self._conv_1x1_bn(input_channel, self.stage_out_channels[-1])
self.globalpool = nn.Sequential(nn.AvgPool2d(int(input_size / 32)))
# building classifier
self.classifier = nn.Sequential(nn.Linear(self.stage_out_channels[-1], n_class))
def _conv_1x1_bn(self, inp, oup):
return nn.Sequential(
nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU(inplace=True)
)
def _conv_bn(self, inp, oup, stride):
return nn.Sequential(
nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU(inplace=True)
)
def forward(self, x):
x = self.conv1(x)
x = self.maxpool(x)
x = self.features(x)
x = self.conv_last(x)
x = self.globalpool(x)
x = x.view(-1, self.stage_out_channels[-1])
x = self.classifier(x)
return x
def main():
x = torch.randn(1, 3, 224, 224)
net = ShuffleNetV2()
print(net)
if __name__ == '__main__':
main()
https://blog.csdn.net/weixin_44388679/article/details/103059143
http://ishero.net/CNN%E7%B3%BB%E5%88%97%E4%B9%8B%E5%9B%BE%E5%83%8F%E5%88%86%E7%B1%BB.html