Previous installment:
CV + Deep Learning: PyTorch Network Architecture Reproduction Series: classification (Part 1: LeNet5, VGG, AlexNet, ResNet) https://blog.csdn.net/XiaoyYidiaodiao/article/details/125505058?spm=1001.2014.3001.5501
Introduction: this series focuses on reproducing the classic deep learning network models of computer vision (classification, object detection, semantic segmentation) so that beginners can use them (easing in shallow, coming out deep)!
All of the code runs without errors!!
We start by reproducing the classic deep learning classification network modules. Backbones built specifically for object detection (10., 11.) are also included here, since their main purpose is feature extraction:
1. LeNet5 (√, previous post)
2. VGG (√, previous post)
3. AlexNet (√, previous post)
4. ResNet (√, previous post)
5. ResNeXt (√)
6. GoogLeNet (√)
7. MobileNet (√)
8. ShuffleNet
9. EfficientNet
10. VovNet
11. DarkNet
...
Note:
The complete code is uploaded to my GitHub. The environment setup, the classification datasets (ImageNet or CIFAR10), and the project file structure were all covered in the previous post. Also, because GoogLeNet uses fully connected layers, the image size cannot be changed, so for this architecture the image size must be fixed during preprocessing.
https://github.com/HanXiaoyiGitHub/Simple-CV-Pytorch-master
Figure 1.
As shown in Figure 1., ResNeXt differs from ResNet (covered in the previous post!) in its block. To understand the ResNeXt architecture you have to know what group convolution is (Group Conv; AlexNet used it too). Figure 2.1) below shows an ordinary convolution, and Figure 2.2) a group convolution.
Figure 2.1).
Figure 2.2).
The biggest benefit of this is a drop in the number of parameters, which saves computation.
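As a quick, hedged sanity check of that claim (a minimal sketch with arbitrarily chosen channel counts, not part of the project code):
import torch.nn as nn

# ordinary 3x3 conv, 256 -> 256 channels: 256 * 256 * 3 * 3 = 589,824 weights
normal = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)
# the same conv split into 32 groups: 32 * (8 * 8 * 3 * 3) = 18,432 weights
grouped = nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=32, bias=False)
print(sum(p.numel() for p in normal.parameters()))   # 589824
print(sum(p.numel() for p in grouped.parameters()))  # 18432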
The three blocks (a), (b), (c) in Figure 2.3) below are equivalent, that is, equivalent to the block on the right of Figure 1.; see Figure 2.4) to understand why.
Figure 2.3).
Figure 2.4).
To reproduce the whole of ResNeXt, start from ResNeXt-50 (32x4d), shown in Figure 2.5) below: 32 means the convolutions in each block are split into 32 groups, and 4 means each group in conv2 uses 4 kernels (so 128 = 32 * 4). The 18-layer and 34-layer variants are not reproduced because their blocks have fewer than 3 layers; grouping them would not lower the error rate, so there is no point.
Figure 2.5).
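A quick check of the width formula used in the Bottleneck code below, plugging in the conv2-stage numbers of ResNeXt-50 (32x4d):
# width = int(out_channels * (width_per_group / 64.)) * groups
out_channels, width_per_group, groups = 64, 4, 32
width = int(out_channels * (width_per_group / 64.)) * groups
print(width)  # 128 = 32 groups * 4 kernels per group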
Compared with ResNet, the code changes amount to adding groups and width_per_group to both Bottleneck and the resnet class.
Bottleneck
# 50-layer, 101-layer, 152-layer
class Bottleneck(nn.Module):
"""
self.conv1(kernel_size=1,stride=2)
self.conv2(kernel_size=3,stride=1)
to
self.conv1(kernel_size=1,stride=1)
self.conv2(kernel_size=3,stride=2)
acc: up 0.5%
"""
expansion = 4
def __init__(self, in_channels, out_channels, stride=1, downsample=None,
groups=1, width_per_group=64):
super(Bottleneck, self).__init__()
width = int(out_channels * (width_per_group / 64.)) * groups
self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=width, kernel_size=1,
stride=1, bias=False)
self.bn1 = nn.BatchNorm2d(width)
self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, kernel_size=3,
groups=groups, stride=stride, bias=False, padding=1)
self.bn2 = nn.BatchNorm2d(width)
self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channels * self.expansion, kernel_size=1,
stride=1, bias=False)
self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
self.relu = nn.ReLU()
self.downsample = downsample
@autocast()
def forward(self, x):
identity = x
if self.downsample is not None:
identity = self.downsample(x)
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
out += identity
out = self.relu(out)
return out
ResNet
class resnet(nn.Module):
def __init__(self, block, blocks_num, num_classes=1000,
include_top=True, groups=1, width_per_group=64):
super(resnet, self).__init__()
self.include_top = include_top
self.in_channels = 64
self.groups = groups
self.width_per_group = width_per_group
self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.in_channels, kernel_size=7, stride=2,
padding=3, bias=False)
self.bn1 = nn.BatchNorm2d(self.in_channels)
self.relu = nn.ReLU()
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, blocks_num[0])
self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
if self.include_top:
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.flatten = nn.Flatten()
self.fc = nn.Linear(512 * block.expansion, num_classes)
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
def _make_layer(self, block, channels, block_num, stride=1):
downsample = None
if stride != 1 or self.in_channels != channels * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(in_channels=self.in_channels, out_channels=channels * block.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(channels * block.expansion))
layers = []
layers.append(block(in_channels=self.in_channels, out_channels=channels, downsample=downsample,
stride=stride, groups=self.groups, width_per_group=self.width_per_group))
self.in_channels = channels * block.expansion
for _ in range(1, block_num):
layers.append(
block(in_channels=self.in_channels, out_channels=channels,
groups=self.groups, width_per_group=self.width_per_group))
return nn.Sequential(*layers)
@autocast()
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
if self.include_top:
x = self.avgpool(x)
x = self.flatten(x)
x = self.fc(x)
return x
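The complete file for ResNet and ResNeXt, with the imports, weight paths and factory functions: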
import torch
import torch.nn as nn
from utils.path import CheckPoints
from torch.cuda.amp import autocast
__all__ = [
'resnet18',
'resnet34',
'resnet50',
'resnet101',
'resnet152',
'resnext50_32x4d',
'resnext101_32x8d'
]
# if your network connection is limited, you can download the weights manually and put them into CheckPoints (in my project: Simple-CV-Pytorch-master/checkpoints/).
model_urls = {
# 'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
'resnet18': '{}/resnet18-5c106cde.pth'.format(CheckPoints),
# 'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
'resnet34': '{}/resnet34-333f7ec4.pth'.format(CheckPoints),
# 'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
'resnet50': '{}/resnet50-19c8e357.pth'.format(CheckPoints),
# 'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
'resnet101': '{}/resnet101-5d3b4d8f.pth'.format(CheckPoints),
# 'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
'resnet152': '{}/resnet152-b121ed2d.pth'.format(CheckPoints),
# 'resnext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth',
'resnext50_32x4d': '{}/resnext50_32x4d-7cdf4587.pth'.format(CheckPoints),
# 'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth',
'resnext101_32x8d': '{}/resnext101_32x8d-8ba56ff5.pth'.format(CheckPoints)
}
def resnet_(arch, block, block_num, num_classes, pretrained, include_top, **kwargs):
model = resnet(block=block, blocks_num=block_num, num_classes=num_classes, include_top=include_top, **kwargs)
# if you're training from scratch, no pretrained weights are required!
if pretrained:
# if you want to use the cpu, modify map_location to torch.device("cpu")
pretrained_models = torch.load(model_urls[arch], map_location=torch.device("cuda:0"))
# transfer learning
# if you want to train your own dataset
# del pretrained_models['module.classifier.bias']
model.load_state_dict(pretrained_models, strict=False)
return model
# 18-layer, 34-layer
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, in_channels, out_channels, stride=1, downsample=None,
groups=1, width_per_group=64):
super(BasicBlock, self).__init__()
if groups != 1 or width_per_group != 64:
raise ValueError("BasicBlock only supports groups=1 and base_width=64")
self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3, stride=stride,
padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(out_channels)
self.relu = nn.ReLU()
self.conv2 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=3, stride=1, padding=1,
bias=False)
self.bn2 = nn.BatchNorm2d(out_channels)
self.downsample = downsample
@autocast()
def forward(self, x):
identity = x
if self.downsample is not None:
identity = self.downsample(x)
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out += identity
out = self.relu(out)
return out
# 50-layer, 101-layer, 152-layer
class Bottleneck(nn.Module):
"""
self.conv1(kernel_size=1,stride=2)
self.conv2(kernel_size=3,stride=1)
to
self.conv1(kernel_size=1,stride=1)
self.conv2(kernel_size=3,stride=2)
acc: up 0.5%
"""
expansion = 4
def __init__(self, in_channels, out_channels, stride=1, downsample=None,
groups=1, width_per_group=64):
super(Bottleneck, self).__init__()
width = int(out_channels * (width_per_group / 64.)) * groups
self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=width, kernel_size=1,
stride=1, bias=False)
self.bn1 = nn.BatchNorm2d(width)
self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, kernel_size=3,
groups=groups, stride=stride, bias=False, padding=1)
self.bn2 = nn.BatchNorm2d(width)
self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channels * self.expansion, kernel_size=1,
stride=1, bias=False)
self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
self.relu = nn.ReLU()
self.downsample = downsample
@autocast()
def forward(self, x):
identity = x
if self.downsample is not None:
identity = self.downsample(x)
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
out += identity
out = self.relu(out)
return out
class resnet(nn.Module):
def __init__(self, block, blocks_num, num_classes=1000,
include_top=True, groups=1, width_per_group=64):
super(resnet, self).__init__()
self.include_top = include_top
self.in_channels = 64
self.groups = groups
self.width_per_group = width_per_group
self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.in_channels, kernel_size=7, stride=2,
padding=3, bias=False)
self.bn1 = nn.BatchNorm2d(self.in_channels)
self.relu = nn.ReLU()
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, blocks_num[0])
self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
if self.include_top:
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.flatten = nn.Flatten()
self.fc = nn.Linear(512 * block.expansion, num_classes)
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
def _make_layer(self, block, channels, block_num, stride=1):
downsample = None
if stride != 1 or self.in_channels != channels * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(in_channels=self.in_channels, out_channels=channels * block.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(channels * block.expansion))
layers = []
layers.append(block(in_channels=self.in_channels, out_channels=channels, downsample=downsample,
stride=stride, groups=self.groups, width_per_group=self.width_per_group))
self.in_channels = channels * block.expansion
for _ in range(1, block_num):
layers.append(
block(in_channels=self.in_channels, out_channels=channels,
groups=self.groups, width_per_group=self.width_per_group))
return nn.Sequential(*layers)
@autocast()
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
if self.include_top:
x = self.avgpool(x)
x = self.flatten(x)
x = self.fc(x)
return x
def resnet18(num_classes=1000, pretrained=False, include_top=True):
return resnet_('resnet18', BasicBlock, [2, 2, 2, 2], num_classes, pretrained, include_top)
def resnet34(num_classes=1000, pretrained=False, include_top=True):
return resnet_('resnet34', BasicBlock, [3, 4, 6, 3], num_classes, pretrained, include_top)
def resnet50(num_classes=1000, pretrained=False, include_top=True):
return resnet_('resnet50', Bottleneck, [3, 4, 6, 3], num_classes, pretrained, include_top)
def resnet101(num_classes=1000, pretrained=False, include_top=True):
return resnet_('resnet101', Bottleneck, [3, 4, 23, 3], num_classes, pretrained, include_top)
def resnet152(num_classes=1000, pretrained=False, include_top=True):
return resnet_('resnet152', Bottleneck, [3, 8, 36, 3], num_classes, pretrained, include_top)
def resnext50_32x4d(num_classes=1000, pretrained=False, include_top=True):
groups = 32
width_per_group = 4
return resnet_('resnext50_32x4d', Bottleneck, [3, 4, 6, 3],
num_classes=num_classes, pretrained=pretrained,
groups=groups, width_per_group=width_per_group,
include_top=include_top)
def resnext101_32x8d(num_classes=1000, pretrained=False, include_top=True):
groups = 32
width_per_group = 8
return resnet_('resnext101_32x8d', Bottleneck, [3, 4, 23, 3],
num_classes=num_classes, pretrained=pretrained,
groups=groups, width_per_group=width_per_group,
include_top=include_top)
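A minimal smoke test of the factory functions above (a sketch assuming this file's imports; pretrained=False so no checkpoint is needed, and since forward is wrapped in torch.cuda.amp.autocast it is best run on a GPU):
import torch

model = resnext50_32x4d(num_classes=1000, pretrained=False)
model.eval()
with torch.no_grad():
    y = model(torch.randn(1, 3, 224, 224))
print(y.shape)  # torch.Size([1, 1000])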
Google named the network GoogLeNet as a tribute to LeNet. Yet GoogLeNet never saw use as wide as the contemporaneous VGG, because its two auxiliary classifiers make the architecture less easy to extend.
Figure 3.
GoogLeNet is reproduced as laid out in Figure 3.
First, reproduce the small blocks. Each one is a convolution followed by a ReLU activation, combined into class BasicConv2d:
class BasicConv2d(nn.Module):
def __init__(self, in_channels, out_channels, **kwargs):
super(BasicConv2d, self).__init__()
self.conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, **kwargs)
self.relu = nn.ReLU()
@autocast()
def forward(self, x):
x = self.conv(x)
x = self.relu(x)
return x
Next, reproduce the most important part of GoogLeNet, the Inception module. Comparing Figure 3. (split out into Figure 4. below) and Figure 5. with the code:
Code -> Figure 4. -> Figure 5.
self.branch1 (ch1x1) -> #1x1 -> 1x1 convolutions
self.branch2 (ch3x3red, ch3x3) -> #3x3 reduce, #3x3 -> 1x1 convolutions, 3x3 convolutions
self.branch3 (ch5x5red, ch5x5) -> #5x5 reduce, #5x5 -> 1x1 convolutions, 5x5 convolutions
self.branch4 (maxpool, 1x1 conv [pool proj]) -> pool proj -> max pooling, 1x1 convolutions
where
ch1x1 is the number of output channels of the 1x1 conv in branch1,
ch3x3red is the output channels of the 1x1 conv in branch2, and ch3x3 is the output channels of the 3x3 conv in branch2,
ch5x5red is the output channels of the 1x1 conv in branch3, and ch5x5 is the output channels of the 5x5 conv in branch3,
pool_proj is the output channels of the 1x1 conv in branch4; the max pool before it leaves the channel count unchanged.
class Inception(nn.Module):
def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
super(Inception, self).__init__()
self.branch1 = BasicConv2d(in_channels=in_channels, out_channels=ch1x1, kernel_size=1)
self.branch2 = nn.Sequential(
BasicConv2d(in_channels=in_channels, out_channels=ch3x3red, kernel_size=1),
BasicConv2d(in_channels=ch3x3red, out_channels=ch3x3, kernel_size=3, padding=1)
)
self.branch3 = nn.Sequential(
BasicConv2d(in_channels=in_channels, out_channels=ch5x5red, kernel_size=1),
BasicConv2d(in_channels=ch5x5red, out_channels=ch5x5, kernel_size=5, padding=2)
)
self.branch4 = nn.Sequential(
nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
BasicConv2d(in_channels=in_channels, out_channels=pool_proj, kernel_size=1)
)
@autocast()
def forward(self, x):
branch1 = self.branch1(x)
branch2 = self.branch2(x)
branch3 = self.branch3(x)
branch4 = self.branch4(x)
outputs = [branch1, branch2, branch3, branch4]
return torch.cat(outputs, 1)
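As a hedged sanity check (assuming torch and the Inception class above are in scope), the four branches of inception3a should concatenate to 64 + 128 + 32 + 32 = 256 channels:
inc = Inception(in_channels=192, ch1x1=64, ch3x3red=96, ch3x3=128, ch5x5red=16, ch5x5=32, pool_proj=32)
out = inc(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])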
Figure 4.
Figure 5.
Then reproduce the auxiliary classifiers, aux1 and aux2. Both share the same architecture, shown in Figure 6. (one average-pooling layer, one BasicConv2d, and two fully connected layers); only the input channels differ: 512 for aux1 and 528 for aux2.
Figure 6.
class InceptionAux(nn.Module):
def __init__(self, in_channels, num_classes):
super(InceptionAux, self).__init__()
# aux1 input: 14 * 14 * 512 -> 4 * 4 * 512
# aux2 input: 14 * 14 * 528 -> 4 * 4 * 528
self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
# aux1 4 * 4 * 512 -> 4 * 4 * 128
# aux2 4 * 4 * 528 -> 4 * 4 * 128
self.conv = BasicConv2d(in_channels=in_channels, out_channels=128, kernel_size=1)
self.fc1 = nn.Linear(2048, 1024)
self.fc2 = nn.Linear(1024, num_classes)
@autocast()
def forward(self, x):
x = self.averagePool(x)
x = self.conv(x)
x = torch.flatten(x, 1)
x = F.dropout(x, 0.5, training=self.training)
x = F.relu(self.fc1(x))
x = F.dropout(x, 0.5, training=self.training)
x = self.fc2(x)
return x
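A quick shape check (a sketch, assuming the imports of the complete file further below): fc1's input of 2048 is exactly the flattened 4 x 4 x 128 tensor.
aux = InceptionAux(in_channels=512, num_classes=1000)
aux.eval()
print(aux(torch.randn(1, 512, 14, 14)).shape)  # torch.Size([1, 1000])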
Finally, reproduce the whole of GoogLeNet, as in Figure 7.
Figure 7.
First look at conv1 and maxpool1, combining Figure 3. and Figure 7.:
init
...
# input 224 * 224 * 3 -> 112 * 112 * 64
self.conv1 = BasicConv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
# 112 * 112 * 64 -> 56 * 56 * 64
self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
...
forward
...
x = self.conv1(x)
x = self.maxpool1(x)
...
Since adding LRN makes little visible difference, this reproduction skips it. If you do want it, you can call it like this (the hyperparameters shown are the classic AlexNet-paper values):
import torch.nn.functional as F
F.local_response_norm(x, size=5, alpha=1e-4, beta=0.75, k=2.0)
Next look at conv2, conv3 and maxpool2, combining Figure 3. and Figure 7.:
init
...
# 56 * 56 * 64 -> 56 * 56 * 64
self.conv2 = BasicConv2d(in_channels=64, out_channels=64, kernel_size=1)
# 56 * 56 * 64 -> 56 * 56 * 192
self.conv3 = BasicConv2d(in_channels=64, out_channels=192, kernel_size=3, padding=1)
# 56 * 56 * 192 -> 28 * 28 * 192
self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
...
forward
...
x = self.conv2(x)
x = self.conv3(x)
x = self.maxpool2(x)
...
Then reproduce Inception 3a, 3b and maxpool3, combining Figure 3. and Figure 7.:
init
...
# 28 * 28 * 192 -> 28 * 28 * 256
self.inception3a = Inception(in_channels=192, ch1x1=64, ch3x3red=96, ch3x3=128, ch5x5red=16, ch5x5=32,
pool_proj=32)
# 28 * 28 * 256 -> 28 * 28 * 480
self.inception3b = Inception(in_channels=256, ch1x1=128, ch3x3red=128, ch3x3=192, ch5x5red=32, ch5x5=96,
pool_proj=64)
# 28 * 28 * 480 -> 14 * 14 * 480
self.maxpool3 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
...
forward
...
x = self.inception3a(x)
x = self.inception3b(x)
x = self.maxpool3(x)
...
After that, reproduce Inception 4a, 4b, 4c, 4d, 4e, and insert aux1 and aux2 on the outputs of 4a (before 4b) and 4d (before 4e) respectively, combining Figure 3. and Figure 7.:
init
...
# 14 * 14 * 480 -> 14 * 14 * 512
self.inception4a = Inception(in_channels=480, ch1x1=192, ch3x3red=96, ch3x3=208, ch5x5red=16, ch5x5=48,
pool_proj=64)
# 14 * 14 * 512 -> 14 * 14 * 512
self.inception4b = Inception(in_channels=512, ch1x1=160, ch3x3red=112, ch3x3=224, ch5x5red=24, ch5x5=64,
pool_proj=64)
# 14 * 14 * 512 -> 14 * 14 * 512
self.inception4c = Inception(in_channels=512, ch1x1=128, ch3x3red=128, ch3x3=256, ch5x5red=24, ch5x5=64,
pool_proj=64)
# 14 * 14 * 512 -> 14 * 14 * 528
self.inception4d = Inception(in_channels=512, ch1x1=112, ch3x3red=144, ch3x3=288, ch5x5red=32, ch5x5=64,
pool_proj=64)
# 14 * 14 * 528 -> 14 * 14 * 832
self.inception4e = Inception(in_channels=528, ch1x1=256, ch3x3red=160, ch3x3=320, ch5x5red=32, ch5x5=128,
pool_proj=128)
# 14 * 14 * 832 -> 7 * 7 * 832
self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
if self.aux_logits:
self.aux1 = InceptionAux(in_channels=512, num_classes=num_classes)
self.aux2 = InceptionAux(in_channels=528, num_classes=num_classes)
...
forward
...
x = self.inception4a(x)
if self.training and self.aux_logits:
aux1 = self.aux1(x)
x = self.inception4b(x)
x = self.inception4c(x)
x = self.inception4d(x)
if self.training and self.aux_logits:
aux2 = self.aux2(x)
x = self.inception4e(x)
x = self.maxpool4(x)
...
Reproduce Inception 5a and 5b, combining Figure 3. and Figure 7.:
init
...
# 7 * 7 * 832 -> 7 * 7 * 832
self.inception5a = Inception(in_channels=832, ch1x1=256, ch3x3red=160, ch3x3=320, ch5x5red=32, ch5x5=128,
pool_proj=128)
# 7 * 7 * 832 -> 7 * 7 * 1024
self.inception5b = Inception(in_channels=832, ch1x1=384, ch3x3red=192, ch3x3=384, ch5x5red=48, ch5x5=128,
pool_proj=128)
...
forward
...
x = self.inception5a(x)
x = self.inception5b(x)
...
Then reproduce the average-pooling layer, the dropout and the fully connected layer, combining Figure 3. and Figure 7.:
init
...
# 1 * 1 * 1024
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.dropout = nn.Dropout(0.4)
self.fc = nn.Linear(1024, num_classes)
...
forward
...
x = self.avgpool(x)
x = torch.flatten(x, 1)
x = self.dropout(x)
x = self.fc(x)
...
The complete googlenet class:
class googlenet(nn.Module):
def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
super(googlenet, self).__init__()
self.aux_logits = aux_logits
# input 224 * 224 * 3 -> 112 * 112 * 64
self.conv1 = BasicConv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
# 112 * 112 * 64 -> 56 * 56 * 64
self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
# 56 * 56 * 64 -> 56 * 56 * 64
self.conv2 = BasicConv2d(in_channels=64, out_channels=64, kernel_size=1)
# 56 * 56 * 64 -> 56 * 56 * 192
self.conv3 = BasicConv2d(in_channels=64, out_channels=192, kernel_size=3, padding=1)
# 56 * 56 * 192 -> 28 * 28 * 192
self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
# 28 * 28 * 192 -> 28 * 28 * 256
self.inception3a = Inception(in_channels=192, ch1x1=64, ch3x3red=96, ch3x3=128, ch5x5red=16, ch5x5=32,
pool_proj=32)
# 28 * 28 * 256 -> 28 * 28 * 480
self.inception3b = Inception(in_channels=256, ch1x1=128, ch3x3red=128, ch3x3=192, ch5x5red=32, ch5x5=96,
pool_proj=64)
# 28 * 28 * 480 -> 14 * 14 * 480
self.maxpool3 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
# 14 * 14 * 480 -> 14 * 14 * 512
self.inception4a = Inception(in_channels=480, ch1x1=192, ch3x3red=96, ch3x3=208, ch5x5red=16, ch5x5=48,
pool_proj=64)
# 14 * 14 * 512 -> 14 * 14 * 512
self.inception4b = Inception(in_channels=512, ch1x1=160, ch3x3red=112, ch3x3=224, ch5x5red=24, ch5x5=64,
pool_proj=64)
# 14 * 14 * 512 -> 14 * 14 * 512
self.inception4c = Inception(in_channels=512, ch1x1=128, ch3x3red=128, ch3x3=256, ch5x5red=24, ch5x5=64,
pool_proj=64)
# 14 * 14 * 512 -> 14 * 14 * 528
self.inception4d = Inception(in_channels=512, ch1x1=112, ch3x3red=144, ch3x3=288, ch5x5red=32, ch5x5=64,
pool_proj=64)
# 14 * 14 * 528 -> 14 * 14 * 832
self.inception4e = Inception(in_channels=528, ch1x1=256, ch3x3red=160, ch3x3=320, ch5x5red=32, ch5x5=128,
pool_proj=128)
# 14 * 14 * 832 -> 7 * 7 * 832
self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
# 7 * 7 * 832 -> 7 * 7 * 832
self.inception5a = Inception(in_channels=832, ch1x1=256, ch3x3red=160, ch3x3=320, ch5x5red=32, ch5x5=128,
pool_proj=128)
# 7 * 7 * 832 -> 7 * 7 * 1024
self.inception5b = Inception(in_channels=832, ch1x1=384, ch3x3red=192, ch3x3=384, ch5x5red=48, ch5x5=128,
pool_proj=128)
if self.aux_logits:
self.aux1 = InceptionAux(in_channels=512, num_classes=num_classes)
self.aux2 = InceptionAux(in_channels=528, num_classes=num_classes)
# 1 * 1 * 1024
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.dropout = nn.Dropout(0.4)
self.fc = nn.Linear(1024, num_classes)
if init_weights:
self._initialize_weights()
@autocast()
def forward(self, x):
x = self.conv1(x)
x = self.maxpool1(x)
x = self.conv2(x)
x = self.conv3(x)
x = self.maxpool2(x)
x = self.inception3a(x)
x = self.inception3b(x)
x = self.maxpool3(x)
x = self.inception4a(x)
if self.training and self.aux_logits:
aux1 = self.aux1(x)
x = self.inception4b(x)
x = self.inception4c(x)
x = self.inception4d(x)
if self.training and self.aux_logits:
aux2 = self.aux2(x)
x = self.inception4e(x)
x = self.maxpool4(x)
x = self.inception5a(x)
x = self.inception5b(x)
x = self.avgpool(x)
x = torch.flatten(x, 1)
x = self.dropout(x)
x = self.fc(x)
if self.training and self.aux_logits:
return x, aux2, aux1
return x
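The complete GoogLeNet file, with the imports, the weight path and the pretrained-weight loader: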
import torch
import torch.nn as nn
import torch.nn.functional as F
from utils.path import CheckPoints
from torch.cuda.amp import autocast
__all__ = ['googlenet']
models_urls = {
# 'googlenet': 'https://download.pytorch.org/models/googlenet-1378be20.pth',
'googlenet': '{}/googlenet-1378be20.pth'.format(CheckPoints),
}
def GoogLeNet(num_classes, pretrained, aux_logits=True, init_weights=True, **kwargs):
model = googlenet(num_classes=num_classes, aux_logits=aux_logits, init_weights=init_weights, **kwargs)
if pretrained:
# if you want to use the cpu, modify map_location to torch.device("cpu")
pretrained_models = torch.load(models_urls['googlenet'], map_location=torch.device("cuda:0"))
# transfer learning
# the torchvision pretrained weights use Inception.branch3.1.conv with kernel_size=3,
# while this implementation uses kernel_size=5, so those tensors must be dropped:
del pretrained_models['inception3a.branch3.1.conv.weight']
del pretrained_models['inception3b.branch3.1.conv.weight']
del pretrained_models['inception4a.branch3.1.conv.weight']
del pretrained_models['inception4b.branch3.1.conv.weight']
del pretrained_models['inception4c.branch3.1.conv.weight']
del pretrained_models['inception4d.branch3.1.conv.weight']
del pretrained_models['inception4e.branch3.1.conv.weight']
del pretrained_models['inception5a.branch3.1.conv.weight']
del pretrained_models['inception5b.branch3.1.conv.weight']
model.load_state_dict(pretrained_models, strict=False)
return model
class BasicConv2d(nn.Module):
def __init__(self, in_channels, out_channels, **kwargs):
super(BasicConv2d, self).__init__()
self.conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, **kwargs)
self.relu = nn.ReLU()
@autocast()
def forward(self, x):
x = self.conv(x)
x = self.relu(x)
return x
class Inception(nn.Module):
def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
super(Inception, self).__init__()
self.branch1 = BasicConv2d(in_channels=in_channels, out_channels=ch1x1, kernel_size=1)
self.branch2 = nn.Sequential(
BasicConv2d(in_channels=in_channels, out_channels=ch3x3red, kernel_size=1),
BasicConv2d(in_channels=ch3x3red, out_channels=ch3x3, kernel_size=3, padding=1)
)
self.branch3 = nn.Sequential(
BasicConv2d(in_channels=in_channels, out_channels=ch5x5red, kernel_size=1),
BasicConv2d(in_channels=ch5x5red, out_channels=ch5x5, kernel_size=5, padding=2)
)
self.branch4 = nn.Sequential(
nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
BasicConv2d(in_channels=in_channels, out_channels=pool_proj, kernel_size=1)
)
@autocast()
def forward(self, x):
branch1 = self.branch1(x)
branch2 = self.branch2(x)
branch3 = self.branch3(x)
branch4 = self.branch4(x)
outputs = [branch1, branch2, branch3, branch4]
return torch.cat(outputs, 1)
class InceptionAux(nn.Module):
def __init__(self, in_channels, num_classes):
super(InceptionAux, self).__init__()
# aux1 input: 14 * 14 * 512 -> 4 * 4 * 512
# aux2 input: 14 * 14 * 528 -> 4 * 4 * 528
self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
# aux1 4 * 4 * 512 -> 4 * 4 * 128
# aux2 4 * 4 * 528 -> 4 * 4 * 128
self.conv = BasicConv2d(in_channels=in_channels, out_channels=128, kernel_size=1)
self.fc1 = nn.Linear(2048, 1024)
self.fc2 = nn.Linear(1024, num_classes)
@autocast()
def forward(self, x):
x = self.averagePool(x)
x = self.conv(x)
x = torch.flatten(x, 1)
x = F.dropout(x, 0.5, training=self.training)
x = F.relu(self.fc1(x))
x = F.dropout(x, 0.5, training=self.training)
x = self.fc2(x)
return x
class googlenet(nn.Module):
def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
super(googlenet, self).__init__()
self.aux_logits = aux_logits
# input 224 * 224 * 3 -> 112 * 112 * 64
self.conv1 = BasicConv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
# 112 * 112 * 64 -> 56 * 56 * 64
self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
# 56 * 56 * 64 -> 56 * 56 * 64
self.conv2 = BasicConv2d(in_channels=64, out_channels=64, kernel_size=1)
# 56 * 56 * 64 -> 56 * 56 * 192
self.conv3 = BasicConv2d(in_channels=64, out_channels=192, kernel_size=3, padding=1)
# 56 * 56 * 192 -> 28 * 28 * 192
self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
# 28 * 28 * 192 -> 28 * 28 * 256
self.inception3a = Inception(in_channels=192, ch1x1=64, ch3x3red=96, ch3x3=128, ch5x5red=16, ch5x5=32,
pool_proj=32)
# 28 * 28 * 256 -> 28 * 28 * 480
self.inception3b = Inception(in_channels=256, ch1x1=128, ch3x3red=128, ch3x3=192, ch5x5red=32, ch5x5=96,
pool_proj=64)
# 28 * 28 * 480 -> 14 * 14 * 480
self.maxpool3 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
# 14 * 14 * 480 -> 14 * 14 * 512
self.inception4a = Inception(in_channels=480, ch1x1=192, ch3x3red=96, ch3x3=208, ch5x5red=16, ch5x5=48,
pool_proj=64)
# 14 * 14 * 512 -> 14 * 14 * 512
self.inception4b = Inception(in_channels=512, ch1x1=160, ch3x3red=112, ch3x3=224, ch5x5red=24, ch5x5=64,
pool_proj=64)
# 14 * 14 * 512 -> 14 * 14 * 512
self.inception4c = Inception(in_channels=512, ch1x1=128, ch3x3red=128, ch3x3=256, ch5x5red=24, ch5x5=64,
pool_proj=64)
# 14 * 14 * 512 -> 14 * 14 * 528
self.inception4d = Inception(in_channels=512, ch1x1=112, ch3x3red=144, ch3x3=288, ch5x5red=32, ch5x5=64,
pool_proj=64)
# 14 * 14 * 528 -> 14 * 14 * 832
self.inception4e = Inception(in_channels=528, ch1x1=256, ch3x3red=160, ch3x3=320, ch5x5red=32, ch5x5=128,
pool_proj=128)
# 14 * 14 * 832 -> 7 * 7 * 832
self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
# 7 * 7 * 832 -> 7 * 7 * 832
self.inception5a = Inception(in_channels=832, ch1x1=256, ch3x3red=160, ch3x3=320, ch5x5red=32, ch5x5=128,
pool_proj=128)
# 7 * 7 * 832 -> 7 * 7 * 1024
self.inception5b = Inception(in_channels=832, ch1x1=384, ch3x3red=192, ch3x3=384, ch5x5red=48, ch5x5=128,
pool_proj=128)
if self.aux_logits:
self.aux1 = InceptionAux(in_channels=512, num_classes=num_classes)
self.aux2 = InceptionAux(in_channels=528, num_classes=num_classes)
# 1 * 1 * 1024
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.dropout = nn.Dropout(0.4)
self.fc = nn.Linear(1024, num_classes)
if init_weights:
self._initialize_weights()
@autocast()
def forward(self, x):
x = self.conv1(x)
x = self.maxpool1(x)
x = self.conv2(x)
x = self.conv3(x)
x = self.maxpool2(x)
x = self.inception3a(x)
x = self.inception3b(x)
x = self.maxpool3(x)
x = self.inception4a(x)
if self.training and self.aux_logits:
aux1 = self.aux1(x)
x = self.inception4b(x)
x = self.inception4c(x)
x = self.inception4d(x)
if self.training and self.aux_logits:
aux2 = self.aux2(x)
x = self.inception4e(x)
x = self.maxpool4(x)
x = self.inception5a(x)
x = self.inception5b(x)
x = self.avgpool(x)
x = torch.flatten(x, 1)
x = self.dropout(x)
x = self.fc(x)
if self.training and self.aux_logits:
return x, aux2, aux1
return x
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
if m.bias is not None:
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
nn.init.constant_(m.bias, 0)
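A minimal smoke test (a sketch assuming this file's imports): in training mode with aux_logits=True the forward returns three sets of logits, main output first.
net = googlenet(num_classes=1000, aux_logits=True, init_weights=True)
net.train()
main, aux2, aux1 = net(torch.randn(2, 3, 224, 224))
print(main.shape, aux2.shape, aux1.shape)  # each torch.Size([2, 1000])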
Figure 8.
Figure 8. shows the MobileNet v1 architecture. Before tackling MobileNet v1 you need to understand depthwise convolution and pointwise convolution. If you want the output channel count of a feature map to differ from its input after convolution, an ordinary convolution and a depthwise + pointwise convolution go about it differently, yet both can produce the same number of output channels, and depthwise + pointwise needs far fewer parameters than an ordinary convolution.
Figure 9. shows an ordinary convolution, where the number of output channels equals the number of kernels used. (For example, convolving a 32x32x3 feature map with kernel_size 1: in_channels is the feature map's channel count and out_channels is how many kernels are used; with 4 kernels the result is 32x32x4. It is just a 1x1 conv changing the channel count; ignore stride, padding and so on, no need to overcomplicate it.)
Figure 9.
Figure 10.
As in Figure 11., pointwise convolution (roughly speaking) takes the feature map produced by the preceding depthwise step, passes it through four 1x1xM convolutions that recombine it along the depth dimension, and yields the final feature map of shape ?_size x ?_size x 4.
Figure 11.
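To make the savings concrete, a hedged sketch comparing parameter counts (the channel numbers are arbitrary, chosen only for illustration):
import torch.nn as nn

# ordinary 3x3 conv, 32 -> 64 channels: 64 * 32 * 3 * 3 = 18,432 weights
std = nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False)
# depthwise 3x3 (one filter per input channel): 32 * 3 * 3 = 288 weights
dw = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32, bias=False)
# pointwise 1x1, 32 -> 64 channels: 64 * 32 = 2,048 weights
pw = nn.Conv2d(32, 64, kernel_size=1, bias=False)
separable = sum(p.numel() for m in (dw, pw) for p in m.parameters())
print(sum(p.numel() for p in std.parameters()), separable)  # 18432 2336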
In MobileNet v1, most of the depthwise kernel weights turn out to be zero after training, meaning the depthwise convolutions barely contribute; this led to the v2 version. So v1 is not reproduced here, and we go straight to v2.
Figure 12.
MobileNet v2: inverted residuals and linear bottlenecks. The highlights are right in the name: inverted residuals and linear bottlenecks. Figure 12. shows the MobileNet v2 architecture (t is the expansion factor that scales the number of kernels; c is the number of output channels; n is how many times the bottleneck [inverted residual] is repeated; s is the stride of the first layer of each block).
First, reproduce the inverted residual structure, as in Figure 13. (t again being the expansion factor) and Figure 14. (the left diagram has a shortcut, which exists if and only if stride == 1 and the input feature map shape equals the output feature map shape). The steps are:
1x1 conv to expand the channels
3x3 depthwise conv
1x1 conv to project the channels back down
One ConvBNReLU layer (inheriting from nn.Sequential):
class ConvBNReLU(nn.Sequential):
def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, groups=1):
padding = (kernel_size - 1) // 2
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
kernel_size=kernel_size, stride=stride,
padding=padding, groups=groups, bias=False),
nn.BatchNorm2d(out_channels),
nn.ReLU6())
Figure 13.
Figure 14.
The whole inverted residual, combining Figure 13. and Figure 14.:
class InvertedResidual(nn.Module):
def __init__(self, in_channels, out_channels, stride, expand_ratio):
super(InvertedResidual, self).__init__()
hidden_channels = in_channels * expand_ratio
self.use_shortcut = stride == 1 and in_channels == out_channels
layers = []
if expand_ratio != 1:
# 1x1 pointwise conv
layers.append(ConvBNReLU(in_channels=in_channels, out_channels=hidden_channels, kernel_size=1))
layers.extend([
# 3x3 depthwise conv
ConvBNReLU(in_channels=hidden_channels, out_channels=hidden_channels, stride=stride,
groups=hidden_channels),
# 1x1 pointwise conv(liner)
nn.Conv2d(in_channels=hidden_channels, out_channels=out_channels, kernel_size=1, bias=False),
nn.BatchNorm2d(out_channels)
])
self.conv = nn.Sequential(*layers)
@autocast()
def forward(self, x):
if self.use_shortcut:
return x + self.conv(x)
else:
return self.conv(x)
Here expand_ratio is the expansion factor t and in_channels is k, so the hidden layer's channel count hidden_channels is t * k:
hidden_channels = in_channels * expand_ratio
Next, decide whether to use the shortcut (self.use_shortcut), which needs the two conditions stated earlier: 1. stride == 1; 2. in_channels == out_channels;
self.use_shortcut = stride == 1 and in_channels == out_channels
Then check whether the expansion factor is 1: when expand_ratio == 1 there is no 1x1 conv (the one that changes the channel count); otherwise there is.
if expand_ratio != 1:
# 1x1 pointwise conv
layers.append(ConvBNReLU(in_channels=in_channels, out_channels=hidden_channels, kernel_size=1))
Now the MobileNet v2 architecture, following Figure 12. First the initialization: alpha is a width multiplier controlling how many kernels are used, and _make_divisible(params, round_nearest) rounds the channel counts to a multiple of round_nearest.
class MobileNetV2(nn.Module):
def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
super(MobileNetV2, self).__init__()
block = InvertedResidual
input_channels = _make_divisible(32 * alpha, round_nearest)
last_channels = _make_divisible(1280 * alpha, round_nearest)
The _make_divisible function (ch -> 32 * alpha or 1280 * alpha, divisor -> round_nearest):
def _make_divisible(ch, divisor=8, min_ch=None):
if min_ch is None:
min_ch = divisor
new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_ch < 0.9 * ch:
new_ch += divisor
return new_ch
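A couple of hedged examples of what this rounding does (values chosen for illustration):
print(_make_divisible(32 * 1.0, 8))     # 32  (already a multiple of 8)
print(_make_divisible(1280 * 0.75, 8))  # 960
print(_make_divisible(90, 8))           # 88  (rounded down, still within 90% of 90)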
Next, the inverted residual settings, following Figure 12.:
inverted_residual_setting = [
# t,c,n,s
[1, 16, 1, 1],
[6, 24, 2, 2],
[6, 32, 3, 2],
[6, 64, 4, 2],
[6, 96, 3, 1],
[6, 160, 3, 2],
[6, 320, 1, 1],
]
The first conv layer, following Figure 12. (the snippet below is taken from the complete class):
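features = []
# conv1 layer (copied from the complete mobilenet_v2 class below)
features.append(ConvBNReLU(in_channels=3, out_channels=input_channels, stride=2))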
Build the bottlenecks, following Figure 12. (again: t is the expansion factor scaling the kernel count; c the output channels; n the number of repeats of the bottleneck [inverted residual]; s the stride of the first layer of each block); a worked example follows the loop below.
# building inverted residual residual blockes
for t, c, n, s in inverted_residual_setting:
output_channels = _make_divisible(c * alpha, round_nearest)
for i in range(n):
stride = s if i == 0 else 1
features.append(
block(in_channels=input_channels, out_channels=output_channels, stride=stride, expand_ratio=t))
input_channels = output_channels
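For instance, a worked expansion of one settings row (after the first row [1, 16, 1, 1], input_channels is 16):
# the row [6, 24, 2, 2] expands to two InvertedResidual blocks:
# block(in_channels=16, out_channels=24, stride=2, expand_ratio=6)  # i == 0 uses s
# block(in_channels=24, out_channels=24, stride=1, expand_ratio=6)  # repeats use stride 1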
Following Figure 12., define the remaining layers:
# building last several layers
features.append(ConvBNReLU(in_channels=input_channels, out_channels=last_channels, kernel_size=1))
# combine feature layers
self.features = nn.Sequential(*features)
# building classifier
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.classifier = nn.Sequential(
nn.Dropout(0.2),
nn.Linear(last_channels, num_classes)
)
The code for the whole MobileNet v2 architecture:
class mobilenet_v2(nn.Module):
def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8, init_weights=False):
super(mobilenet_v2, self).__init__()
block = InvertedResidual
input_channels = _make_divisible(32 * alpha, round_nearest)
last_channels = _make_divisible(1280 * alpha, round_nearest)
inverted_residual_setting = [
# t,c,n,s
[1, 16, 1, 1],
[6, 24, 2, 2],
[6, 32, 3, 2],
[6, 64, 4, 2],
[6, 96, 3, 1],
[6, 160, 3, 2],
[6, 320, 1, 1],
]
features = []
# conv1 layer
features.append(ConvBNReLU(in_channels=3, out_channels=input_channels, stride=2))
# building inverted residual residual blockes
for t, c, n, s in inverted_residual_setting:
output_channels = _make_divisible(c * alpha, round_nearest)
for i in range(n):
stride = s if i == 0 else 1
features.append(
block(in_channels=input_channels, out_channels=output_channels, stride=stride, expand_ratio=t))
input_channels = output_channels
# building last several layers
features.append(ConvBNReLU(in_channels=input_channels, out_channels=last_channels, kernel_size=1))
# combine feature layers
self.features = nn.Sequential(*features)
# building classifier
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.classifier = nn.Sequential(
nn.Dropout(0.2),
nn.Linear(last_channels, num_classes)
)
if init_weights:
self._initialize_weights()
def _initialize_weights(self):
# weight initialization
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out')
if m.bias is not None:
nn.init.zeros_(m.bias)
elif isinstance(m, nn.BatchNorm2d):
nn.init.ones_(m.weight)
nn.init.zeros_(m.bias)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
nn.init.zeros_(m.bias)
@autocast()
def forward(self, x):
x = self.features(x)
x = self.avgpool(x)
x = torch.flatten(x, 1)
x = self.classifier(x)
return x
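The complete MobileNet v2 file, with the imports, the weight path and the pretrained-weight loader: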
import torch
import torch.nn as nn
from utils.path import CheckPoints
from torch.cuda.amp import autocast
__all__ = ['mobilenet_v2']
models_urls = {
# "mobilenet_v2": "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth",
'mobilenet_v2': '{}/mobilenet_v2-b0353104.pth'.format(CheckPoints),
}
def MobileNet_v2(num_classes=1000, pretrained=False, init_weights=False, **kwargs):
model = mobilenet_v2(num_classes=num_classes, init_weights=init_weights, **kwargs)
if pretrained:
# if you want to use the cpu, modify map_location to torch.device("cpu")
pretrained_models = torch.load(models_urls['mobilenet_v2'], map_location=torch.device("cuda:0"))
model.load_state_dict(pretrained_models, strict=False)
return model
class ConvBNReLU(nn.Sequential):
def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, groups=1):
padding = (kernel_size - 1) // 2
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
kernel_size=kernel_size, stride=stride,
padding=padding, groups=groups, bias=False),
nn.BatchNorm2d(out_channels),
nn.ReLU6())
class InvertedResidual(nn.Module):
def __init__(self, in_channels, out_channels, stride, expand_ratio):
super(InvertedResidual, self).__init__()
hidden_channels = in_channels * expand_ratio
self.use_shortcut = stride == 1 and in_channels == out_channels
layers = []
if expand_ratio != 1:
# 1x1 pointwise conv
layers.append(ConvBNReLU(in_channels=in_channels, out_channels=hidden_channels, kernel_size=1))
layers.extend([
# 3x3 depthwise conv
ConvBNReLU(in_channels=hidden_channels, out_channels=hidden_channels, stride=stride,
groups=hidden_channels),
# 1x1 pointwise conv(liner)
nn.Conv2d(in_channels=hidden_channels, out_channels=out_channels, kernel_size=1, bias=False),
nn.BatchNorm2d(out_channels)
])
self.conv = nn.Sequential(*layers)
@autocast()
def forward(self, x):
if self.use_shortcut:
return x + self.conv(x)
else:
return self.conv(x)
def _make_divisible(ch, divisor=8, min_ch=None):
if min_ch is None:
min_ch = divisor
new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_ch < 0.9 * ch:
new_ch += divisor
return new_ch
class mobilenet_v2(nn.Module):
def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8, init_weights=False):
super(mobilenet_v2, self).__init__()
block = InvertedResidual
input_channels = _make_divisible(32 * alpha, round_nearest)
last_channels = _make_divisible(1280 * alpha, round_nearest)
inverted_residual_setting = [
# t,c,n,s
[1, 16, 1, 1],
[6, 24, 2, 2],
[6, 32, 3, 2],
[6, 64, 4, 2],
[6, 96, 3, 1],
[6, 160, 3, 2],
[6, 320, 1, 1],
]
features = []
# conv1 layer
features.append(ConvBNReLU(in_channels=3, out_channels=input_channels, stride=2))
# building inverted residual residual blockes
for t, c, n, s in inverted_residual_setting:
output_channels = _make_divisible(c * alpha, round_nearest)
for i in range(n):
stride = s if i == 0 else 1
features.append(
block(in_channels=input_channels, out_channels=output_channels, stride=stride, expand_ratio=t))
input_channels = output_channels
# building last several layers
features.append(ConvBNReLU(in_channels=input_channels, out_channels=last_channels, kernel_size=1))
# combine feature layers
self.features = nn.Sequential(*features)
# building classifier
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.classifier = nn.Sequential(
nn.Dropout(0.2),
nn.Linear(last_channels, num_classes)
)
if init_weights:
self._initialize_weights()
def _initialize_weights(self):
# weight initialization
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out')
if m.bias is not None:
nn.init.zeros_(m.bias)
elif isinstance(m, nn.BatchNorm2d):
nn.init.ones_(m.weight)
nn.init.zeros_(m.bias)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
nn.init.zeros_(m.bias)
@autocast()
def forward(self, x):
x = self.features(x)
x = self.avgpool(x)
x = torch.flatten(x, 1)
x = self.classifier(x)
return x
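A minimal smoke test of the factory above (a sketch assuming this file's imports; pretrained=False so no checkpoint is needed):
net = MobileNet_v2(num_classes=1000, pretrained=False)
net.eval()
print(net(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])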
Large
small
Keeping you in suspense on these; to be continued in the next installment...
Finally, the complete training and testing scripts. First, the training script:
import os
import logging
import argparse
import warnings
warnings.filterwarnings('ignore')
import sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)
import time
import torch
from data import *
import torchvision
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
from torchvision import transforms
from utils.accuracy import accuracy
from torch.utils.data import DataLoader
from utils.get_logger import get_logger
from models.basenets.lenet5 import lenet5
from models.basenets.alexnet import alexnet
from utils.AverageMeter import AverageMeter
from torch.cuda.amp import autocast, GradScaler
from models.basenets.googlenet import googlenet, GoogLeNet
from models.basenets.vgg import vgg11, vgg13, vgg16, vgg19
from models.basenets.mobilenet_v2 import mobilenet_v2, MobileNet_v2
from models.basenets.resnet import resnet18, resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, resnext101_32x8d
def parse_args():
parser = argparse.ArgumentParser(description='PyTorch Classification Training')
parser.add_mutually_exclusive_group()
parser.add_argument('--dataset',
type=str,
default='ImageNet',
choices=['ImageNet', 'CIFAR'],
help='ImageNet, CIFAR')
parser.add_argument('--dataset_root',
type=str,
default=ImageNet_Train_ROOT,
choices=[ImageNet_Train_ROOT, CIFAR_ROOT],
help='Dataset root directory path')
parser.add_argument('--basenet',
type=str,
default='resnext',
choices=['resnet', 'vgg', 'lenet', 'alexnet', 'googlenet', 'mobilenet', 'resnext'],
help='Pretrained base model')
parser.add_argument('--depth',
type=int,
default=50,
help='BaseNet depth, including: LeNet of 5, AlexNet of 0, VGG of 11, 13, 16, 19, ResNet of 18, 34, 50, 101, 152, GoogLeNet of 0, MobileNet of 2, 3, ResNeXt of 50, 101')
parser.add_argument('--batch_size',
type=int,
default=32,
help='Batch size for training')
parser.add_argument('--resume',
type=str,
default=None,
help='Checkpoint state_dict file to resume training from')
parser.add_argument('--num_workers',
type=int,
default=8,
help='Number of workers used in dataloading')
parser.add_argument('--cuda',
type=str,
default=True,
help='Use CUDA to train model')
parser.add_argument('--accumulation_steps',
type=int,
default=1,
help='Gradient accumulation steps')
parser.add_argument('--save_folder',
type=str,
default=config.checkpoint_path,
help='Directory for saving checkpoint models')
parser.add_argument('--tensorboard',
type=str,
default=False,
help='Use tensorboard for loss visualization')
parser.add_argument('--log_folder',
type=str,
default=config.log,
help='Log Folder')
parser.add_argument('--log_name',
type=str,
default=config.classification_train_log,
help='Log Name')
parser.add_argument('--tensorboard_log',
type=str,
default=config.tensorboard_log,
help='Use tensorboard for loss visualization')
parser.add_argument('--lr',
type=float,
default=1e-3,
help='learning rate')
parser.add_argument('--epochs',
type=int,
default=30,
help='Number of epochs')
parser.add_argument('--num_classes',
type=int,
default=1000,
help='the number of classes, like ImageNet: 1000, CIFAR: 10')
parser.add_argument('--image_size',
type=int,
default=224,
help='image size, like ImageNet:224, cifar:32')
parser.add_argument('--pretrained',
type=str,
default=True,
help='Models was pretrained')
parser.add_argument('--init_weights',
type=str,
default=False,
help='Init Weights')
parser.add_argument('--patience',
type=int,
default=2,
help='patience of ReduceLROnPlateau')
parser.add_argument('--weight_decay',
type=float,
default=1e-4,
help='weight decay')
parser.add_argument('--momentum',
type=float,
default=0.9,
help='Momentum value for optim')
return parser.parse_args()
args = parse_args()
# 1. Log
get_logger(args.log_folder, args.log_name)
logger = logging.getLogger(args.log_name)
# 2. Torch choose cuda or cpu
if torch.cuda.is_available():
if args.cuda:
torch.set_default_tensor_type('torch.cuda.FloatTensor')
if not args.cuda:
print("WARNING: It looks like you have a CUDA device, but you aren't using it" +
"\n You can set the parameter of cuda to True.")
torch.set_default_tensor_type('torch.FloatTensor')
else:
torch.set_default_tensor_type('torch.FloatTensor')
if not os.path.exists(args.save_folder):
os.mkdir(args.save_folder)
def train():
# 3. Create SummaryWriter
if args.tensorboard:
from torch.utils.tensorboard import SummaryWriter
# tensorboard loss
writer = SummaryWriter(args.tensorboard_log)
# vgg, alexnet, googlenet and lenet need fixed image sizes because of their fc layers.
if args.basenet == 'vgg' or args.basenet == 'alexnet' or args.basenet == 'googlenet':
args.image_size = 224
elif args.basenet == 'lenet':
args.image_size = 32
# 4. Ready dataset
if args.dataset == 'ImageNet':
if args.dataset_root == CIFAR_ROOT:
raise ValueError('Must specify dataset_root if specifying dataset ImageNet2012.')
elif not os.path.exists(ImageNet_Train_ROOT):
raise ValueError("The default ImageNet2012 dataset_root does not exist; " +
"please specify --dataset_root.")
dataset = torchvision.datasets.ImageFolder(
root=args.dataset_root,
transform=torchvision.transforms.Compose([
transforms.Resize((args.image_size,
args.image_size)),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
]))
elif args.dataset == 'CIFAR':
if args.dataset_root == ImageNet_Train_ROOT:
raise ValueError('Must specify dataset_root if specifying dataset CIFAR10.')
elif args.dataset_root is None:
raise ValueError("Must provide --dataset_root when training on CIFAR10.")
dataset = torchvision.datasets.CIFAR10(root=args.dataset_root, train=True,
transform=torchvision.transforms.Compose([
transforms.Resize((args.image_size,
args.image_size)),
torchvision.transforms.ToTensor()]))
else:
raise ValueError('Dataset type not understood (must be ImageNet or CIFAR), exiting.')
dataloader = torch.utils.data.DataLoader(dataset=dataset, batch_size=args.batch_size,
shuffle=True, num_workers=args.num_workers,
pin_memory=False, generator=torch.Generator(device='cuda'))
top1 = AverageMeter()
top5 = AverageMeter()
losses = AverageMeter()
# 5. Define train model
# Unfortunately, LeNet5 and AlexNet don't provide a pretrained model.
if args.basenet == 'lenet':
if args.depth == 5:
model = lenet5(num_classes=args.num_classes,
init_weights=args.init_weights)
else:
raise ValueError('Unsupported LeNet depth!')
elif args.basenet == 'alexnet':
if args.depth == 0:
model = alexnet(num_classes=args.num_classes,
init_weights=args.init_weights)
else:
raise ValueError('Unsupported AlexNet depth!')
elif args.basenet == 'googlenet':
if args.depth == 0:
model = GoogLeNet(num_classes=args.num_classes,
pretrained=args.pretrained,
aux_logits=True,
init_weights=args.init_weights)
else:
raise ValueError('Unsupported GoogLeNet depth!')
elif args.basenet == 'vgg':
if args.depth == 11:
model = vgg11(pretrained=args.pretrained,
num_classes=args.num_classes,
init_weights=args.init_weights)
elif args.depth == 13:
model = vgg13(pretrained=args.pretrained,
num_classes=args.num_classes,
init_weights=args.init_weights)
elif args.depth == 16:
model = vgg16(pretrained=args.pretrained,
num_classes=args.num_classes,
init_weights=args.init_weights)
elif args.depth == 19:
model = vgg19(pretrained=args.pretrained,
num_classes=args.num_classes,
init_weights=args.init_weights)
else:
raise ValueError('Unsupported VGG depth!')
# For my resnet there is no init_weights option, because it will later serve as the backbone for object detection
elif args.basenet == 'resnet':
if args.depth == 18:
model = resnet18(pretrained=args.pretrained,
num_classes=args.num_classes)
elif args.depth == 34:
model = resnet34(pretrained=args.pretrained,
num_classes=args.num_classes)
elif args.depth == 50:
model = resnet50(pretrained=args.pretrained,
num_classes=args.num_classes)  # pretrained=False means training from scratch
elif args.depth == 101:
model = resnet101(pretrained=args.pretrained,
num_classes=args.num_classes)
elif args.depth == 152:
model = resnet152(pretrained=args.pretrained,
num_classes=args.num_classes)
else:
raise ValueError('Unsupported ResNet depth!')
elif args.basenet == 'resnext':
if args.depth == 50:
model = resnext50_32x4d(pretrained=args.pretrained,
num_classes=args.num_classes)
elif args.depth == 101:
model = resnext101_32x8d(pretrained=args.pretrained,
num_classes=args.num_classes)
else:
raise ValueError('Unsupported ResNeXt depth!')
elif args.basenet == 'mobilenet':
if args.depth == 2:
model = MobileNet_v2(pretrained=args.pretrained,
num_classes=args.num_classes,
init_weights=args.init_weights)
else:
raise ValueError('Unsupported MobileNet depth!')
else:
raise ValueError('Unsupported model type!')
if args.cuda:
if torch.cuda.is_available():
model = model.cuda()
model = torch.nn.DataParallel(model).cuda()
else:
model = torch.nn.DataParallel(model)
# 6. Loading weights
if args.resume:
other, ext = os.path.splitext(args.resume)
if ext in ('.pkl', '.pth'):
print('Loading weights into state dict...')
model_load = os.path.join(args.save_folder, args.resume)
model.load_state_dict(torch.load(model_load))
else:
print('Sorry only .pth and .pkl files supported.')
if args.init_weights:
# initialize newly added models' weights with xavier method
if args.basenet == 'resnet':
print("There is no set init_weight, because I'm going to set object detection algorithm.")
else:
print("Initializing weights...")
else:
print("Not Initializing weights...")
if args.pretrained:
if args.basenet == 'lenet' or args.basenet == 'alexnet':
print("There is no available pretrained model on the website. ")
else:
print("Models was pretrained...")
else:
print("Pretrained models is False...")
model.train()
iteration = 0
# 7. Optimizer
optimizer = optim.SGD(model.parameters(), lr=args.lr,
momentum=args.momentum, weight_decay=args.weight_decay)
criterion = nn.CrossEntropyLoss()
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
optimizer, patience=args.patience, verbose=True)
scaler = GradScaler()
# 8. Length
iter_size = len(dataset) // args.batch_size
print("len(dataset): {}, iter_size: {}".format(len(dataset), iter_size))
logger.info(f"args - {args}")
t0 = time.time()
# 9. Create batch iterator
for epoch in range(args.epochs):
t1 = time.time()
model.train()
torch.cuda.empty_cache()
# 10. Load train data
for data in dataloader:
iteration += 1
images, targets = data
# 11. Backward
optimizer.zero_grad()
if args.cuda:
images, targets = images.cuda(), targets.cuda()
criterion = criterion.cuda()
# 12. Forward
with autocast():
if args.basenet == 'googlenet':
outputs, aux2_output, aux1_output = model(images)
loss1 = criterion(outputs, targets)
loss_aux2 = criterion(aux2_output, targets)
loss_aux1 = criterion(aux1_output, targets)
loss = loss1 + loss_aux2 * 0.3 + loss_aux1 * 0.3
else:
outputs = model(images)
loss = criterion(outputs, targets)
loss = loss / args.accumulation_steps
if args.tensorboard:
writer.add_scalar("train_classification_loss", loss.item(), iteration)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
# 13. Measure accuracy and record loss
acc1, acc5 = accuracy(outputs, targets, topk=(1, 5))
top1.update(acc1.item(), images.size(0))
top5.update(acc5.item(), images.size(0))
losses.update(loss.item(), images.size(0))
if iteration % 100 == 0:
logger.info(
f"- epoch: {epoch}, iteration: {iteration}, lr: {optimizer.param_groups[0]['lr']}, "
f"top1 acc: {acc1.item():.2f}%, top5 acc: {acc5.item():.2f}%, "
f"loss: {loss.item():.3f}, (losses.avg): {losses.avg:3f} "
)
scheduler.step(losses.avg)
t2 = time.time()
h_time = (t2 - t1) // 3600
m_time = ((t2 - t1) % 3600) // 60
s_time = ((t2 - t1) % 3600) % 60
print("epoch {} is finished, and the time is {}h{}min{}s".format(epoch, int(h_time), int(m_time), int(s_time)))
# 14. Save train model
if epoch != 0 and epoch % 10 == 0:
print('Saving state, iter:', epoch)
torch.save(model.state_dict(),
args.save_folder + '/' + args.dataset +
'_' + args.basenet + str(args.depth) + '_' + repr(epoch) + '.pth')
torch.save(model.state_dict(),
args.save_folder + '/' + args.dataset + "_" + args.basenet + str(args.depth) + '.pth')
if args.tensorboard:
writer.close()
t3 = time.time()
h = (t3 - t0) // 3600
m = ((t3 - t0) % 3600) // 60
s = ((t3 - t0) % 3600) % 60
print("The Finished Time is {}h{}m{}s".format(int(h), int(m), int(s)))
return top1.avg, top5.avg, losses.avg
if __name__ == '__main__':
torch.multiprocessing.set_start_method('spawn')
logger.info("Program started")
top1, top5, loss = train()
print("top1 acc: {}, top5 acc: {}, loss:{}".format(top1, top5, loss))
logger.info("Done!")
import logging
import os
import argparse
import warnings
warnings.filterwarnings('ignore')
import sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)
import time
from data import *
from PIL import Image
import torch.nn.parallel
from torchvision import transforms
from utils.get_logger import get_logger
from models.basenets.lenet5 import lenet5
from models.basenets.alexnet import alexnet
from models.basenets.vgg import vgg11, vgg13, vgg16, vgg19
from models.basenets.googlenet import googlenet, GoogLeNet
from models.basenets.mobilenet_v2 import mobilenet_v2, MobileNet_v2
from models.basenets.resnet import resnet18, resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, resnext101_32x8d
def parse_args():
parser = argparse.ArgumentParser(description='PyTorch Classification Testing')
parser.add_mutually_exclusive_group()
parser.add_argument('--dataset',
type=str,
default='ImageNet',
choices=['ImageNet', 'CIFAR'],
help='ImageNet, CIFAR')
parser.add_argument('--images_root',
type=str,
default=config.images_cls_root,
help='Dataset root directory path')
parser.add_argument('--basenet',
type=str,
default='resnext',
choices=['resnet', 'vgg', 'lenet', 'alexnet', 'googlenet', 'mobilenet', 'resnext'],
help='Pretrained base model')
parser.add_argument('--depth',
type=int,
default=50,
help='BaseNet depth, including: LeNet of 5, AlexNet of 0, VGG of 11, 13, 16, 19, ResNet of 18, 34, 50, 101, 152, GoogLeNet of 0, MobileNet of 2, 3, ResNeXt of 50, 101')
parser.add_argument('--evaluate',
type=str,
default=config.classification_evaluate,
help='Checkpoint state_dict file to evaluate training from')
parser.add_argument('--save_folder',
type=str,
default=config.checkpoint_path,
help='Directory for saving checkpoint models')
parser.add_argument('--log_folder',
type=str,
default=config.log,
help='Log Folder')
parser.add_argument('--log_name',
type=str,
default=config.classification_test_log,
help='Log Name')
parser.add_argument('--cuda',
type=str,
default=True,
help='Use CUDA to train model')
parser.add_argument('--num_classes',
type=int,
default=1000,
help='the number of classes, like ImageNet: 1000, CIFAR: 10')
parser.add_argument('--image_size',
type=int,
default=224,
help='image size, like ImageNet:224, cifar:32')
return parser.parse_args()
args = parse_args()
# 1. Torch choose cuda or cpu
if torch.cuda.is_available():
if args.cuda:
torch.set_default_tensor_type('torch.cuda.FloatTensor')
if not args.cuda:
print("WARNING: It looks like you have a CUDA device, but you aren't using it" +
"\n You can set the parameter of cuda to True.")
torch.set_default_tensor_type('torch.FloatTensor')
else:
torch.set_default_tensor_type('torch.FloatTensor')
if not os.path.exists(args.save_folder):
os.mkdir(args.save_folder)
# 2. Log
get_logger(args.log_folder, args.log_name)
logger = logging.getLogger(args.log_name)
def get_label_file(filename):
if not os.path.exists(filename):
print("The dataset label file does not exist; creating an empty one.")
open(filename, 'a').close()
return filename
def dataset_labels_results(filename, output):
filename = os.path.join(BASE_DIR, 'data', filename + '_labels.txt')
get_label_file(filename=filename)
with open(file=filename, mode='r') as f:
labels = f.readlines()
output = output.cpu().numpy()
output = output[0]
output = labels[output]
return output
def test():
# vgg, alexnet, googlenet and lenet need fixed image sizes because of their fc layers.
if args.basenet == 'vgg' or args.basenet == 'alexnet' or args.basenet == 'googlenet':
args.image_size = 224
elif args.basenet == 'lenet':
args.image_size = 32
# 3. Ready image
if args.images_root is None:
raise ValueError("The images is None, you should load image!")
image = Image.open(args.images_root)
transform = transforms.Compose([
transforms.Resize((args.image_size,
args.image_size)),
transforms.ToTensor()])
image = transform(image)
image = image.reshape(1, 3, args.image_size, args.image_size)
# 4. Define the model
if args.basenet == 'lenet':
if args.depth == 5:
model = lenet5(num_classes=args.num_classes)
else:
raise ValueError('Unsupported LeNet depth!')
elif args.basenet == 'alexnet':
if args.depth == 0:
model = alexnet(num_classes=args.num_classes)
else:
raise ValueError('Unsupported AlexNet depth!')
elif args.basenet == 'googlenet':
if args.depth == 0:
model = googlenet(num_classes=args.num_classes,
aux_logits=False)
else:
raise ValueError('Unsupported GoogLeNet depth!')
elif args.basenet == 'vgg':
if args.depth == 11:
model = vgg11(num_classes=args.num_classes)
elif args.depth == 13:
model = vgg13(num_classes=args.num_classes)
elif args.depth == 16:
model = vgg16(num_classes=args.num_classes)
elif args.depth == 19:
model = vgg19(num_classes=args.num_classes)
else:
raise ValueError('Unsupported VGG depth!')
elif args.basenet == 'resnet':
if args.depth == 18:
model = resnet18(num_classes=args.num_classes)
elif args.depth == 34:
model = resnet34(num_classes=args.num_classes)
elif args.depth == 50:
            model = resnet50(num_classes=args.num_classes)
elif args.depth == 101:
model = resnet101(num_classes=args.num_classes)
elif args.depth == 152:
model = resnet152(num_classes=args.num_classes)
else:
raise ValueError('Unsupported ResNet depth!')
elif args.basenet == 'resnext':
if args.depth == 50:
model = resnext50_32x4d(num_classes=args.num_classes)
elif args.depth == 101:
model = resnext101_32x8d(num_classes=args.num_classes)
else:
raise ValueError('Unsupported ResNeXt depth!')
elif args.basenet == 'mobilenet':
if args.depth == 2:
model = mobilenet_v2(num_classes=args.num_classes)
else:
raise ValueError('Unsupported MobileNet depth!')
else:
raise ValueError('Unsupported model type!')
if args.cuda:
model = model.cuda()
model = torch.nn.DataParallel(model).cuda()
else:
model = torch.nn.DataParallel(model)
# 5. Loading model
    if args.evaluate:
        _, ext = os.path.splitext(args.evaluate)
        if ext in ('.pkl', '.pth'):  # note: `ext == '.pkl' or '.pth'` would always be True
            print('Loading weights into state dict...')
            model_evaluate_load = os.path.join(args.save_folder, args.evaluate)
            model_evaluate_load = torch.load(model_evaluate_load)
            if args.basenet == 'googlenet':
                # drop the auxiliary classifiers, which are only used during training
                model_evaluate_load = {k: v for k, v in model_evaluate_load.items() if "aux" not in k}
            model.load_state_dict(model_evaluate_load)
        else:
            print('Sorry, only .pth and .pkl files are supported.')
    else:
        raise ValueError("Sorry, you should load weights!")
model.eval()
# 6. print
logger.info(f"args - {args}")
# 7. Test
with torch.no_grad():
t0 = time.time()
# 8. Forward
if args.cuda:
image = image.cuda()
output = model(image)
output = output.argmax(1)
t1 = time.time()
m = (t1 - t0) // 60
s = (t1 - t0) % 60
folder_name = args.dataset
output = dataset_labels_results(filename=folder_name, output=output)
logger.info(f"output: {output}")
print("It took a total of {}m{}s to complete the testing.".format(int(m), int(s)))
return output
if __name__ == '__main__':
torch.multiprocessing.set_start_method('spawn')
logger.info("Program started")
output = test()
logger.info("Done!")
import os
import logging
import argparse
import warnings
warnings.filterwarnings('ignore')
import sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)
import time
import torch
from data import *
import torchvision
import torch.nn.parallel
from torchvision import transforms
from utils.accuracy import accuracy
from utils.get_logger import get_logger
from torch.utils.data import DataLoader
from models.basenets.lenet5 import lenet5
from models.basenets.alexnet import alexnet
from utils.AverageMeter import AverageMeter
from models.basenets.vgg import vgg11, vgg13, vgg16, vgg19
from models.basenets.googlenet import googlenet, GoogLeNet
from models.basenets.mobilenet_v2 import mobilenet_v2, MobileNet_v2
from models.basenets.resnet import resnet18, resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, resnext101_32x8d
def parse_args():
parser = argparse.ArgumentParser(description='PyTorch Classification Evaluation')
parser.add_mutually_exclusive_group()
parser.add_argument('--dataset',
type=str,
default='ImageNet',
choices=['ImageNet', 'CIFAR'],
help='ImageNet,CIFAR')
parser.add_argument('--dataset_root',
type=str,
default=ImageNet_Eval_ROOT,
choices=[ImageNet_Eval_ROOT, CIFAR_ROOT],
help='Dataset root directory path')
parser.add_argument('--basenet',
type=str,
default='resnext',
choices=['resnet', 'vgg', 'lenet', 'alexnet', 'googlenet', 'mobilenet', 'resnext'],
help='Pretrained base model')
parser.add_argument('--depth',
type=int,
default=50,
help='BaseNet depth, including: LeNet of 5, AlexNet of 0, VGG of 11, 13, 16, 19, ResNet of 18, 34, 50, 101, 152, GoogLeNet of 0, MobileNet of 2, ResNeXt of 50, 101')
parser.add_argument('--batch_size',
type=int,
default=32,
help='Batch size for training')
parser.add_argument('--evaluate',
type=str,
default=config.classification_evaluate,
help='Checkpoint state_dict file to evaluate training from')
parser.add_argument('--num_workers',
type=int,
default=8,
help='Number of workers used in dataloading')
parser.add_argument('--cuda',
type=str,
default=True,
help='Use CUDA to eval model')
parser.add_argument('--save_folder',
type=str,
default=config.checkpoint_path,
help='Directory for saving checkpoint models')
parser.add_argument('--log_folder',
type=str,
default=config.log,
help='Log Folder')
parser.add_argument('--log_name',
type=str,
default=config.classification_eval_log,
help='Log Name')
parser.add_argument('--num_classes',
type=int,
default=1000,
help='the number of classes, e.g. ImageNet: 1000, CIFAR: 10')
parser.add_argument('--image_size',
type=int,
default=224,
help='image size, e.g. ImageNet: 224, CIFAR: 32')
return parser.parse_args()
args = parse_args()
# 1. Torch choose cuda or cpu
if torch.cuda.is_available():
if args.cuda:
torch.set_default_tensor_type('torch.cuda.FloatTensor')
if not args.cuda:
print("WARNING: It looks like you have a CUDA device, but you aren't using it" +
"\n You can set the parameter of cuda to True.")
torch.set_default_tensor_type('torch.FloatTensor')
else:
torch.set_default_tensor_type('torch.FloatTensor')
if not os.path.exists(args.save_folder):
os.mkdir(args.save_folder)
# 2. Log
get_logger(args.log_folder, args.log_name)
logger = logging.getLogger(args.log_name)
def eval():
    # vgg, alexnet and googlenet contain fully connected layers, so their
    # input size is fixed at 224; lenet5 expects 32.
    if args.basenet in ('vgg', 'alexnet', 'googlenet'):
        args.image_size = 224
    elif args.basenet == 'lenet':
        args.image_size = 32
    # 3. Prepare the dataset
    if args.dataset == 'ImageNet':
        if args.dataset_root == CIFAR_ROOT:
            raise ValueError("Must specify dataset_root if specifying dataset ImageNet")
        elif not os.path.exists(ImageNet_Eval_ROOT):
            raise ValueError("The default ImageNet dataset_root does not exist, " +
                             "please specify a valid --dataset_root.")
dataset = torchvision.datasets.ImageFolder(
root=args.dataset_root,
transform=torchvision.transforms.Compose([
transforms.Resize((args.image_size,
args.image_size)),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
]))
elif args.dataset == 'CIFAR':
if args.dataset_root == ImageNet_Eval_ROOT:
raise ValueError('Must specify dataset_root if specifying dataset CIFAR')
        elif args.dataset_root is None:
            raise ValueError("Must provide --dataset_root when evaluating on CIFAR")
dataset = torchvision.datasets.CIFAR10(
root=args.dataset_root, train=False,
transform=torchvision.transforms.Compose([
transforms.Resize((args.image_size,
args.image_size)),
torchvision.transforms.ToTensor()]))
else:
raise ValueError('Dataset type not understood (must be ImageNet or CIFAR), exiting.')
    dataloader = torch.utils.data.DataLoader(dataset=dataset, batch_size=args.batch_size,
                                             shuffle=True, num_workers=args.num_workers,
                                             pin_memory=False,
                                             # the CUDA generator matches the CUDA default tensor type set above
                                             generator=torch.Generator(device='cuda'))
top1 = AverageMeter()
top5 = AverageMeter()
    # 4. Build the model
if args.basenet == 'lenet':
if args.depth == 5:
model = lenet5(num_classes=args.num_classes)
else:
raise ValueError('Unsupported LeNet depth!')
elif args.basenet == 'alexnet':
if args.depth == 0:
model = alexnet(num_classes=args.num_classes)
else:
raise ValueError('Unsupported AlexNet depth!')
elif args.basenet == 'googlenet':
if args.depth == 0:
model = googlenet(num_classes=args.num_classes,
aux_logits=False)
else:
raise ValueError('Unsupported GoogLeNet depth!')
elif args.basenet == 'vgg':
if args.depth == 11:
model = vgg11(num_classes=args.num_classes)
elif args.depth == 13:
model = vgg13(num_classes=args.num_classes)
elif args.depth == 16:
model = vgg16(num_classes=args.num_classes)
elif args.depth == 19:
model = vgg19(num_classes=args.num_classes)
else:
raise ValueError('Unsupported VGG depth!')
elif args.basenet == 'resnet':
if args.depth == 18:
model = resnet18(num_classes=args.num_classes)
elif args.depth == 34:
model = resnet34(num_classes=args.num_classes)
elif args.depth == 50:
            model = resnet50(num_classes=args.num_classes)
elif args.depth == 101:
model = resnet101(num_classes=args.num_classes)
elif args.depth == 152:
model = resnet152(num_classes=args.num_classes)
else:
raise ValueError('Unsupported ResNet depth!')
elif args.basenet == 'resnext':
if args.depth == 50:
model = resnext50_32x4d(num_classes=args.num_classes)
elif args.depth == 101:
model = resnext101_32x8d(num_classes=args.num_classes)
else:
raise ValueError('Unsupported ResNeXt depth!')
elif args.basenet == 'mobilenet':
if args.depth == 2:
model = mobilenet_v2(num_classes=args.num_classes)
else:
raise ValueError('Unsupported MobileNet depth!')
else:
raise ValueError('Unsupported model type!')
if args.cuda:
if torch.cuda.is_available():
model = model.cuda()
model = torch.nn.DataParallel(model).cuda()
else:
model = torch.nn.DataParallel(model)
# 5. Loading model
    if args.evaluate:
        _, ext = os.path.splitext(args.evaluate)
        if ext in ('.pkl', '.pth'):  # note: `ext == '.pkl' or '.pth'` would always be True
            print('Loading weights into state dict...')
            model_evaluate_load = os.path.join(args.save_folder, args.evaluate)
            model_evaluate_load = torch.load(model_evaluate_load)
            if args.basenet == 'googlenet':
                # drop the auxiliary classifiers, which are only used during training
                model_evaluate_load = {k: v for k, v in model_evaluate_load.items() if "aux" not in k}
            model.load_state_dict(model_evaluate_load)
        else:
            print('Sorry, only .pth and .pkl files are supported.')
    else:
        raise ValueError("Sorry, you should load weights!")
model.eval()
# 6. Length
iter_size = len(dataset) // args.batch_size
print("len(dataset): {}, iter_size: {}".format(len(dataset), iter_size))
logger.info(f"args - {args}")
t0 = time.time()
iteration = 0
# 7. Test
with torch.no_grad():
torch.cuda.empty_cache()
# 8. Load test data
for data in dataloader:
iteration += 1
images, targets = data
if args.cuda:
images, targets = images.cuda(), targets.cuda()
# 9. Forward
outputs = model(images)
            # 10. Measure top-1/top-5 accuracy and record it
acc1, acc5 = accuracy(outputs, targets, topk=(1, 5))
top1.update(acc1.item(), images.size(0))
top5.update(acc5.item(), images.size(0))
logger.info(
f"iteration: {iteration}, top1 acc: {acc1.item():.2f}%, top5 acc: {acc5.item():.2f}%. ")
t1 = time.time()
m = (t1 - t0) // 60
s = (t1 - t0) % 60
print("It took a total of {}m{}s to complete the evaluating.".format(int(m), int(s)))
return top1.avg, top5.avg
if __name__ == '__main__':
    # 'spawn' is required here: the default tensor type may be CUDA, and
    # DataLoader workers cannot fork from a CUDA-initialized process
    torch.multiprocessing.set_start_method('spawn')
logger.info("Program started")
top1, top5 = eval()
print("top1 acc: {}, top5 acc: {}".format(top1, top5))
logger.info("Done!")
5.ResNeXt
basenet: resnext50_32x4d
dataset: ImageNet
batch_size: 32
optim: SGD
lr: 0.001
momentum: 0.9
weight_decay: 1e-4
scheduler: ReduceLROnPlateau
patience: 2
epoch: 30
pretrained: True
No.epoch | times/epoch | top1 acc (%) | top5 acc (%) |
---|---|---|---|
7 | 4h5min16s | 72.28 | 91.56 |
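A minimal sketch of the optimizer/scheduler setup listed above (SGD with lr=0.001, momentum=0.9, weight_decay=1e-4, plus ReduceLROnPlateau with patience=2); `train_one_epoch` is a hypothetical helper standing in for the training loop:
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = optim.SGD(model.parameters(), lr=0.001,
                      momentum=0.9, weight_decay=1e-4)
# factor defaults to 0.1; lr is reduced after `patience` epochs without improvement
scheduler = ReduceLROnPlateau(optimizer, mode='min', patience=2)
for epoch in range(30):
    val_loss = train_one_epoch(model, optimizer)  # hypothetical training helper
    scheduler.step(val_loss)  # lowers lr when val_loss plateaus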
6.GoogLeNet
basenet: GoogLeNet
dataset: ImageNet
batch_size: 32
optim: SGD
lr: 0.01
momentum: 0.9
weight_decay: 1e-4
scheduler: MultiStepLR
milestones: [15, 20, 30]
gamma: 0.1
epoch: 30
No.epoch | times/epoch | top1 acc (%) | top5 acc (%) |
---|---|---|---|
5 | 3h59min21s | 37.50 | 65.62 |
or
basenet: GoogLeNet
dataset: ImageNet
batch_size: 32
optim: SGD
lr: 0.01
momentum: 0.9
weight_decay: 1e-4
scheduler: ReduceLROnPlateau
patience: 2
epoch: 30
pretrained: True
No.epoch | times/epoch | top1 acc (%) | top5 acc (%) |
---|---|---|---|
5 | 3h54min31s | 42.70 | 69.34 |
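The first of the two GoogLeNet configurations above uses fixed milestones instead of a plateau criterion; a minimal sketch of that MultiStepLR setup (lr=0.01, milestones=[15, 20, 30], gamma=0.1), again with a hypothetical `train_one_epoch`:
import torch.optim as optim
from torch.optim.lr_scheduler import MultiStepLR

optimizer = optim.SGD(model.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[15, 20, 30], gamma=0.1)
for epoch in range(30):
    train_one_epoch(model, optimizer)  # hypothetical training helper
    scheduler.step()  # multiplies lr by 0.1 at milestones 15, 20, 30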
7.MobileNet
7.1.MobileNet_v2
basenet: MobileNet_v2
dataset: ImageNet
batch_size: 32
optim: SGD
lr: 0.001
momentum: 0.9
weight_decay: 1e-4
scheduler: ReduceLROnPlateau
patience: 2
epoch: 30
pretrained: True
No.epoch | times/epoch | top1 acc (%) | top5 acc (%) |
---|---|---|---|
5 | 3h58min3s | 66.90 | 88.19 |
Next chapter
CV+Deep Learning——网络架构Pytorch复现系列——classification(三:MobileNet,ShuffleNet)https://blog.csdn.net/XiaoyYidiaodiao/article/details/126228934
[5] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1492-1500.
[6] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.
[7] Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[8] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.
[9] Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1314-1324.