VGG is considered a classic because it was the first architecture to make networks truly "deep", at 16-19 weight layers, while using very "small" 3×3 convolution kernels. Its main lesson is simply that networks can be made deeper, provided overfitting is kept in check along the way. Where AlexNet used large kernels such as 11×11, VGG uses small 3×3 kernels; smaller kernels capture finer-grained detail, and stacking them covers the same receptive field with fewer parameters.
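As a quick check of the small-kernel argument, the sketch below (channel counts are illustrative) compares one 5×5 convolution against two stacked 3×3 convolutions, which cover the same 5×5 receptive field with fewer parameters:

```python
import torch.nn as nn

def n_params(m):
    # Total number of learnable parameters in a module
    return sum(p.numel() for p in m.parameters())

in_ch = out_ch = 64
# One 5x5 convolution:
conv5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2, bias=False)
# Two stacked 3x3 convolutions (same 5x5 receptive field):
stack3 = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
)
print(n_params(conv5))   # 64*64*5*5 = 102400
print(n_params(stack3))  # 2 * 64*64*3*3 = 73728
```

The stack also inserts an extra nonlinearity between the two 3×3 convolutions, which is part of why VGG's deep small-kernel design works well.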
The difference between a convolutional layer and a fully connected layer: a convolutional layer is locally connected, while a fully connected layer uses the global information of the image. Note that the largest possible "local" region is the whole image, which already suggests that a fully connected layer can be replaced by a convolutional one.
e.g. if the input has 50×4×4 = 800 neurons and the output has 500 nodes, a fully connected layer needs 800×500 = 400,000 weight parameters W plus 500 bias parameters b.
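This count can be verified directly in PyTorch:

```python
import torch.nn as nn

# Fully connected layer from 50x4x4 = 800 inputs to 500 outputs
fc = nn.Linear(50 * 4 * 4, 500)
print(fc.weight.numel())  # 400000 weight parameters
print(fc.bias.numel())    # 500 bias parameters
```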
Both a convolutional layer and a fully connected layer compute a dot product, so they share the same functional form, and a fully connected layer can therefore be converted into an equivalent convolutional layer: simply make the kernel the same size (h, w) as the input feature map, which gives the convolution exactly as many parameters as the fully connected layer.
For example, in VGG16 the first fully connected layer takes a 7×7×512 input and produces 4096 outputs. It can be expressed equivalently as a convolutional layer with a 7×7 kernel, stride 1, no padding, and 4096 output channels; its output is 1×1×4096, identical to the fully connected layer's. The subsequent fully connected layers can likewise be replaced by 1×1 convolutions.
In short, the rule for converting a fully connected layer into a convolutional one is: set the kernel size to the spatial size of the input.
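A minimal sketch of the conversion (with smaller channel counts than VGG's 512→4096, purely to keep the example light): copy the FC weights into a conv kernel of the same spatial size as the input, then confirm the outputs match:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# An FC layer taking a flattened 32x7x7 feature map to 64 outputs...
fc = nn.Linear(32 * 7 * 7, 64)
# ...is equivalent to a 7x7 convolution with 64 output channels.
conv = nn.Conv2d(32, 64, kernel_size=7)
# Copy the FC weights into the conv kernel (same parameters, reshaped)
conv.weight.data = fc.weight.data.view(64, 32, 7, 7)
conv.bias.data = fc.bias.data

x = torch.randn(1, 32, 7, 7)
out_fc = fc(x.flatten(1))       # shape (1, 64)
out_conv = conv(x).flatten(1)   # shape (1, 64)
print(torch.allclose(out_fc, out_conv, atol=1e-3))  # True
```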
1×1 convolutions can also raise or lower the channel dimension: the number of kernels directly controls the number of output channels, whereas a pooling layer can only change the height and width, never the channel count.
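A quick illustration (shapes are arbitrary) that 1×1 convolutions change only the channel dimension, leaving the spatial size untouched:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 14, 14)            # 256 channels, 14x14 spatial
reduce_ch = nn.Conv2d(256, 64, kernel_size=1)   # channel reduction
expand_ch = nn.Conv2d(64, 512, kernel_size=1)   # channel expansion

y = reduce_ch(x)
z = expand_ch(y)
print(tuple(y.shape))  # (1, 64, 14, 14)  -- spatial size unchanged
print(tuple(z.shape))  # (1, 512, 14, 14)
```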
import torch.nn as nn

class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super(VGG16, self).__init__()
        # VGG16 has five convolutional blocks.
        self.features = nn.Sequential(
            # Block 1: two conv layers, each with 64 3x3 kernels
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 2: two conv layers, each with 128 3x3 kernels
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 3: three conv layers, each with 256 3x3 kernels
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 4: three conv layers, each with 512 3x3 kernels
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 5: three more conv layers, each with 512 3x3 kernels
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # The three fully connected layers of VGG16
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),  # 7x7 is the feature-map size for a 224x224 input
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten the tensor
        x = self.classifier(x)
        return x

# Instantiate the VGG16 model
model = VGG16()
print(model)
Residual net (ResNet): the output of an earlier layer is routed past several intermediate layers and fed directly into the input of a later layer.
Residual unit: suppose a stretch of the network has input x and desired output H(x). If we pass x straight through to the output as the starting point, what remains to be learned is F(x) = H(x) - x. This is a residual unit: the learning target changes from the full mapping H(x) to only the difference between output and input, H(x) - x, i.e. the residual.
ResNet50 is built from two basic blocks, the Conv Block and the Identity Block. The Conv Block's input and output dimensions differ, so it cannot be stacked back-to-back; its job is to change the network's dimensions. The Identity Block's input and output dimensions match, so it can be stacked repeatedly to deepen the network.
Each convolution in the implementation below is followed by BatchNormalization:
import torch
import torch.nn as nn

class IdentityBlock(nn.Module):
    def __init__(self, in_channels, filters, kernel_size, stride):
        super(IdentityBlock, self).__init__()
        filters1, filters2, filters3 = filters
        # First part of the main path
        self.conv1 = nn.Conv2d(in_channels, filters1, (1, 1))
        self.bn1 = nn.BatchNorm2d(filters1)
        self.relu1 = nn.ReLU()
        # Second part of the main path
        self.conv2 = nn.Conv2d(filters1, filters2, kernel_size, stride=stride, padding=1)
        self.bn2 = nn.BatchNorm2d(filters2)
        self.relu2 = nn.ReLU()
        # Third part of the main path
        self.conv3 = nn.Conv2d(filters2, filters3, (1, 1))
        self.bn3 = nn.BatchNorm2d(filters3)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu2(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x += identity
        x = self.relu(x)
        return x

class ConvBlock(nn.Module):
    def __init__(self, in_channels, filters, kernel_size, stride):
        super(ConvBlock, self).__init__()
        filters1, filters2, filters3 = filters
        # First part of the main path
        self.conv1 = nn.Conv2d(in_channels, filters1, (1, 1), stride=stride)
        self.bn1 = nn.BatchNorm2d(filters1)
        self.relu1 = nn.ReLU()
        # Second part of the main path
        self.conv2 = nn.Conv2d(filters1, filters2, kernel_size, padding=1)
        self.bn2 = nn.BatchNorm2d(filters2)
        self.relu2 = nn.ReLU()
        # Third part of the main path
        self.conv3 = nn.Conv2d(filters2, filters3, (1, 1))
        self.bn3 = nn.BatchNorm2d(filters3)
        # Shortcut path
        self.shortcut_conv = nn.Conv2d(in_channels, filters3, (1, 1), stride=stride)
        self.shortcut_bn = nn.BatchNorm2d(filters3)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = self.shortcut_conv(x)
        identity = self.shortcut_bn(identity)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu2(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x += identity
        x = self.relu(x)
        return x

class ResNet50(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet50, self).__init__()
        self.pad = nn.ZeroPad2d((3, 3, 3, 3))
        self.conv1 = nn.Conv2d(3, 64, (7, 7), stride=(2, 2))
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d((3, 3), stride=(2, 2))
        # Stage 2
        self.stage2a = ConvBlock(64, [64, 64, 256], 3, stride=(1, 1))
        self.stage2b = IdentityBlock(256, [64, 64, 256], 3, stride=(1, 1))
        self.stage2c = IdentityBlock(256, [64, 64, 256], 3, stride=(1, 1))
        # Stage 3
        self.stage3a = ConvBlock(256, [128, 128, 512], 3, stride=(2, 2))
        self.stage3b = IdentityBlock(512, [128, 128, 512], 3, stride=(1, 1))
        # ... continue with the remaining stages
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(2048, num_classes)

    def forward(self, x):
        x = self.pad(x)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        # Stage 2
        x = self.stage2a(x)
        x = self.stage2b(x)
        x = self.stage2c(x)
        # Stage 3
        x = self.stage3a(x)
        x = self.stage3b(x)
        # ... continue with the remaining stages
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

model = ResNet50()
print(model)
Two common transfer-learning scenarios:
The Inception network is an important milestone in the history of CNNs. Before Inception, most popular CNNs simply stacked more and more convolutional layers, making the network deeper and deeper in the hope of better performance. But deeper stacks bring problems: far more parameters (prone to overfitting and expensive to train) and sharply increased computation.
The solution: why not run filters of several sizes at the same level? The network then becomes somewhat "wider" rather than "deeper". This is the Inception module the authors designed.
Inception module: it convolves the input with three different filter sizes (1×1, 3×3, 5×5) in parallel and additionally performs max pooling. The outputs of all branches are concatenated and passed on to the next Inception module.
This widens the network on the one hand and, on the other, makes it adapt better to multiple scales.
The dimension-reducing Inception module: as noted above, deep neural networks are computationally expensive. To reduce the cost, the authors add an extra 1×1 convolution before the 3×3 and 5×5 convolutions to limit the number of input channels. Adding an extra convolution may seem counter-intuitive, but a 1×1 convolution is far cheaper than a 5×5 one, and shrinking the input channel count first also lowers the compute.
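The saving can be made concrete with a parameter count (the 192→32 channel numbers below are illustrative, loosely echoing a GoogLeNet 5×5 branch):

```python
import torch.nn as nn

def n_params(m):
    # Total number of learnable parameters in a module
    return sum(p.numel() for p in m.parameters())

# 5x5 conv applied directly to 192 input channels:
direct = nn.Conv2d(192, 32, kernel_size=5, padding=2, bias=False)
# 1x1 reduction to 16 channels first, then the 5x5 conv:
bottleneck = nn.Sequential(
    nn.Conv2d(192, 16, kernel_size=1, bias=False),
    nn.Conv2d(16, 32, kernel_size=5, padding=2, bias=False),
)
print(n_params(direct))      # 192*32*5*5 = 153600
print(n_params(bottleneck))  # 192*16 + 16*32*5*5 = 15872
```

Since the number of multiply-adds scales with the parameter count at a fixed spatial size, the compute drops by roughly the same factor.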
Inception V2 adds BatchNormalization at the input. In addition, the authors factorize the n×n convolution into a 1×n convolution followed by an n×1 convolution.
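A quick parameter count for this factorization, using n = 7 and an illustrative channel count of 192:

```python
import torch.nn as nn

def n_params(m):
    # Total number of learnable parameters in a module
    return sum(p.numel() for p in m.parameters())

# A full 7x7 convolution vs. its 1x7 + 7x1 factorization:
full = nn.Conv2d(192, 192, kernel_size=7, padding=3, bias=False)
factored = nn.Sequential(
    nn.Conv2d(192, 192, kernel_size=(1, 7), padding=(0, 3), bias=False),
    nn.Conv2d(192, 192, kernel_size=(7, 1), padding=(3, 0), bias=False),
)
print(n_params(full))      # 192*192*7*7 = 1806336
print(n_params(factored))  # 2 * 192*192*7 = 516096
```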
The preceding principles are used to build three different types of Inception modules.
Inception V3 integrates all of the upgrades introduced in Inception v2 above and additionally factorizes the 7×7 convolutions into 1×7 and 7×1 pairs. Inception v4 largely keeps the Inception-module structure of v2/v3, but makes it cleaner and more uniform, uses more Inception modules, and achieves better experimental results. Overall, the Inception design gains width (multi-scale parallel branches) while keeping compute in check (1×1 bottlenecks).
#-------------------------------------------------------------#
#   The InceptionV3 network
#-------------------------------------------------------------#
from __future__ import print_function
from __future__ import absolute_import

import numpy as np
from keras.models import Model
from keras import layers
from keras.layers import (Activation, Dense, Input, BatchNormalization, Conv2D,
                          MaxPooling2D, AveragePooling2D, GlobalAveragePooling2D)
from keras.applications.imagenet_utils import decode_predictions
from keras.preprocessing import image

def conv2d_bn(x,
              filters,
              num_row,
              num_col,
              strides=(1, 1),
              padding='same',
              name=None):
    if name is not None:
        bn_name = name + '_bn'
        conv_name = name + '_conv'
    else:
        bn_name = None
        conv_name = None
    x = Conv2D(
        filters, (num_row, num_col),
        strides=strides,
        padding=padding,
        use_bias=False,
        name=conv_name)(x)
    x = BatchNormalization(scale=False, name=bn_name)(x)
    x = Activation('relu', name=name)(x)
    return x
def InceptionV3(input_shape=[299, 299, 3],
                classes=1000):
    img_input = Input(shape=input_shape)

    x = conv2d_bn(img_input, 32, 3, 3, strides=(2, 2), padding='valid')
    x = conv2d_bn(x, 32, 3, 3, padding='valid')
    x = conv2d_bn(x, 64, 3, 3)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)

    x = conv2d_bn(x, 80, 1, 1, padding='valid')
    x = conv2d_bn(x, 192, 3, 3, padding='valid')
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)

    #--------------------------------#
    #   Block1 35x35
    #--------------------------------#
    # Block1 part1
    # 35 x 35 x 192 -> 35 x 35 x 256
    branch1x1 = conv2d_bn(x, 64, 1, 1)

    branch5x5 = conv2d_bn(x, 48, 1, 1)
    branch5x5 = conv2d_bn(branch5x5, 64, 5, 5)

    branch3x3dbl = conv2d_bn(x, 64, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)

    branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 32, 1, 1)

    # 64 + 64 + 96 + 32 = 256 channels (NHWC, so channels are axis 3)
    x = layers.concatenate(
        [branch1x1, branch5x5, branch3x3dbl, branch_pool],
        axis=3,
        name='mixed0')

    # Block1 part2
    # 35 x 35 x 256 -> 35 x 35 x 288
    branch1x1 = conv2d_bn(x, 64, 1, 1)

    branch5x5 = conv2d_bn(x, 48, 1, 1)
    branch5x5 = conv2d_bn(branch5x5, 64, 5, 5)

    branch3x3dbl = conv2d_bn(x, 64, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)

    branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 64, 1, 1)

    # 64 + 64 + 96 + 64 = 288
    x = layers.concatenate(
        [branch1x1, branch5x5, branch3x3dbl, branch_pool],
        axis=3,
        name='mixed1')

    # Block1 part3
    # 35 x 35 x 288 -> 35 x 35 x 288
    branch1x1 = conv2d_bn(x, 64, 1, 1)

    branch5x5 = conv2d_bn(x, 48, 1, 1)
    branch5x5 = conv2d_bn(branch5x5, 64, 5, 5)

    branch3x3dbl = conv2d_bn(x, 64, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)

    branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 64, 1, 1)

    # 64 + 64 + 96 + 64 = 288
    x = layers.concatenate(
        [branch1x1, branch5x5, branch3x3dbl, branch_pool],
        axis=3,
        name='mixed2')

    #--------------------------------#
    #   Block2 17x17
    #--------------------------------#
    # Block2 part1
    # 35 x 35 x 288 -> 17 x 17 x 768
    branch3x3 = conv2d_bn(x, 384, 3, 3, strides=(2, 2), padding='valid')

    branch3x3dbl = conv2d_bn(x, 64, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch3x3dbl = conv2d_bn(
        branch3x3dbl, 96, 3, 3, strides=(2, 2), padding='valid')

    branch_pool = MaxPooling2D((3, 3), strides=(2, 2))(x)
    x = layers.concatenate(
        [branch3x3, branch3x3dbl, branch_pool], axis=3, name='mixed3')

    # Block2 part2
    # 17 x 17 x 768 -> 17 x 17 x 768
    branch1x1 = conv2d_bn(x, 192, 1, 1)

    branch7x7 = conv2d_bn(x, 128, 1, 1)
    branch7x7 = conv2d_bn(branch7x7, 128, 1, 7)
    branch7x7 = conv2d_bn(branch7x7, 192, 7, 1)

    branch7x7dbl = conv2d_bn(x, 128, 1, 1)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 7, 1)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 1, 7)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 7, 1)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7)

    branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
    x = layers.concatenate(
        [branch1x1, branch7x7, branch7x7dbl, branch_pool],
        axis=3,
        name='mixed4')

    # Block2 part3 and part4
    # 17 x 17 x 768 -> 17 x 17 x 768 -> 17 x 17 x 768
    for i in range(2):
        branch1x1 = conv2d_bn(x, 192, 1, 1)

        branch7x7 = conv2d_bn(x, 160, 1, 1)
        branch7x7 = conv2d_bn(branch7x7, 160, 1, 7)
        branch7x7 = conv2d_bn(branch7x7, 192, 7, 1)

        branch7x7dbl = conv2d_bn(x, 160, 1, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 7, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 1, 7)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 7, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7)

        branch_pool = AveragePooling2D(
            (3, 3), strides=(1, 1), padding='same')(x)
        branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
        x = layers.concatenate(
            [branch1x1, branch7x7, branch7x7dbl, branch_pool],
            axis=3,
            name='mixed' + str(5 + i))

    # Block2 part5
    # 17 x 17 x 768 -> 17 x 17 x 768
    branch1x1 = conv2d_bn(x, 192, 1, 1)

    branch7x7 = conv2d_bn(x, 192, 1, 1)
    branch7x7 = conv2d_bn(branch7x7, 192, 1, 7)
    branch7x7 = conv2d_bn(branch7x7, 192, 7, 1)

    branch7x7dbl = conv2d_bn(x, 192, 1, 1)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 7, 1)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 7, 1)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7)

    branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
    x = layers.concatenate(
        [branch1x1, branch7x7, branch7x7dbl, branch_pool],
        axis=3,
        name='mixed7')

    #--------------------------------#
    #   Block3 8x8
    #--------------------------------#
    # Block3 part1
    # 17 x 17 x 768 -> 8 x 8 x 1280
    branch3x3 = conv2d_bn(x, 192, 1, 1)
    branch3x3 = conv2d_bn(branch3x3, 320, 3, 3,
                          strides=(2, 2), padding='valid')

    branch7x7x3 = conv2d_bn(x, 192, 1, 1)
    branch7x7x3 = conv2d_bn(branch7x7x3, 192, 1, 7)
    branch7x7x3 = conv2d_bn(branch7x7x3, 192, 7, 1)
    branch7x7x3 = conv2d_bn(
        branch7x7x3, 192, 3, 3, strides=(2, 2), padding='valid')

    branch_pool = MaxPooling2D((3, 3), strides=(2, 2))(x)
    x = layers.concatenate(
        [branch3x3, branch7x7x3, branch_pool], axis=3, name='mixed8')

    # Block3 part2 and part3
    # 8 x 8 x 1280 -> 8 x 8 x 2048 -> 8 x 8 x 2048
    for i in range(2):
        branch1x1 = conv2d_bn(x, 320, 1, 1)

        branch3x3 = conv2d_bn(x, 384, 1, 1)
        branch3x3_1 = conv2d_bn(branch3x3, 384, 1, 3)
        branch3x3_2 = conv2d_bn(branch3x3, 384, 3, 1)
        branch3x3 = layers.concatenate(
            [branch3x3_1, branch3x3_2], axis=3, name='mixed9_' + str(i))

        branch3x3dbl = conv2d_bn(x, 448, 1, 1)
        branch3x3dbl = conv2d_bn(branch3x3dbl, 384, 3, 3)
        branch3x3dbl_1 = conv2d_bn(branch3x3dbl, 384, 1, 3)
        branch3x3dbl_2 = conv2d_bn(branch3x3dbl, 384, 3, 1)
        branch3x3dbl = layers.concatenate(
            [branch3x3dbl_1, branch3x3dbl_2], axis=3)

        branch_pool = AveragePooling2D(
            (3, 3), strides=(1, 1), padding='same')(x)
        branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
        x = layers.concatenate(
            [branch1x1, branch3x3, branch3x3dbl, branch_pool],
            axis=3,
            name='mixed' + str(9 + i))

    # Global average pooling followed by the fully connected classifier
    x = GlobalAveragePooling2D(name='avg_pool')(x)
    x = Dense(classes, activation='softmax', name='predictions')(x)

    inputs = img_input
    model = Model(inputs, x, name='inception_v3')
    return model
def preprocess_input(x):
    # Scale pixels from [0, 255] to [-1, 1]
    x /= 255.
    x -= 0.5
    x *= 2.
    return x

if __name__ == '__main__':
    model = InceptionV3()
    model.load_weights("inception_v3_weights_tf_dim_ordering_tf_kernels.h5")

    img_path = 'elephant.jpg'
    img = image.load_img(img_path, target_size=(299, 299))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)

    preds = model.predict(x)
    print('Predicted:', decode_predictions(preds))
MobileNet is a lightweight deep neural network that Google proposed for mobile phones and other embedded devices. Its core idea is the depthwise separable convolution.
Consider a single 3×3 convolutional layer with 16 input channels and 32 output channels. A standard convolution slides 32 kernels of size 3×3 across all 16 input channels to produce the 32 output channels, which takes 16×32×3×3 = 4608 parameters. With a depthwise separable convolution, 16 kernels of size 3×3 first filter the 16 channels one-to-one, yielding 16 feature maps; then 32 kernels of size 1×1 combine those 16 maps, for a total of 16×3×3 + 16×32×1×1 = 656 parameters.
Clearly, depthwise separable convolution greatly reduces the model's parameter count.
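The 4608-vs-656 arithmetic above can be reproduced with PyTorch's grouped convolutions (`groups=in_channels` gives the depthwise step):

```python
import torch.nn as nn

def n_params(m):
    # Total number of learnable parameters in a module
    return sum(p.numel() for p in m.parameters())

# Standard 3x3 convolution: 16 -> 32 channels
standard = nn.Conv2d(16, 32, kernel_size=3, padding=1, bias=False)
# Depthwise separable: per-channel 3x3 (groups=16), then a 1x1 pointwise conv
separable = nn.Sequential(
    nn.Conv2d(16, 16, kernel_size=3, padding=1, groups=16, bias=False),
    nn.Conv2d(16, 32, kernel_size=1, bias=False),
)
print(n_params(standard))   # 16*32*3*3 = 4608
print(n_params(separable))  # 16*3*3 + 16*32 = 656
```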
#-------------------------------------------------------------#
#   The MobileNet network
#-------------------------------------------------------------#
import numpy as np
from keras.preprocessing import image
from keras.models import Model
from keras.layers import (DepthwiseConv2D, Input, Activation, Dropout, Reshape,
                          BatchNormalization, GlobalAveragePooling2D, Conv2D)
from keras.applications.imagenet_utils import decode_predictions
from keras import backend as K
def MobileNet(input_shape=[224, 224, 3],
              depth_multiplier=1,
              dropout=1e-3,
              classes=1000):
    img_input = Input(shape=input_shape)

    # 224,224,3 -> 112,112,32
    x = _conv_block(img_input, 32, strides=(2, 2))
    # 112,112,32 -> 112,112,64
    x = _depthwise_conv_block(x, 64, depth_multiplier, block_id=1)
    # 112,112,64 -> 56,56,128
    x = _depthwise_conv_block(x, 128, depth_multiplier,
                              strides=(2, 2), block_id=2)
    # 56,56,128 -> 56,56,128
    x = _depthwise_conv_block(x, 128, depth_multiplier, block_id=3)
    # 56,56,128 -> 28,28,256
    x = _depthwise_conv_block(x, 256, depth_multiplier,
                              strides=(2, 2), block_id=4)
    # 28,28,256 -> 28,28,256
    x = _depthwise_conv_block(x, 256, depth_multiplier, block_id=5)
    # 28,28,256 -> 14,14,512
    x = _depthwise_conv_block(x, 512, depth_multiplier,
                              strides=(2, 2), block_id=6)
    # 14,14,512 -> 14,14,512 (five blocks)
    x = _depthwise_conv_block(x, 512, depth_multiplier, block_id=7)
    x = _depthwise_conv_block(x, 512, depth_multiplier, block_id=8)
    x = _depthwise_conv_block(x, 512, depth_multiplier, block_id=9)
    x = _depthwise_conv_block(x, 512, depth_multiplier, block_id=10)
    x = _depthwise_conv_block(x, 512, depth_multiplier, block_id=11)
    # 14,14,512 -> 7,7,1024
    x = _depthwise_conv_block(x, 1024, depth_multiplier,
                              strides=(2, 2), block_id=12)
    x = _depthwise_conv_block(x, 1024, depth_multiplier, block_id=13)
    # 7,7,1024 -> 1,1,1024
    x = GlobalAveragePooling2D()(x)
    x = Reshape((1, 1, 1024), name='reshape_1')(x)
    x = Dropout(dropout, name='dropout')(x)
    x = Conv2D(classes, (1, 1), padding='same', name='conv_preds')(x)
    x = Activation('softmax', name='act_softmax')(x)
    x = Reshape((classes,), name='reshape_2')(x)

    inputs = img_input
    model = Model(inputs, x, name='mobilenet_1_0_224_tf')
    model_name = 'mobilenet_1_0_224_tf.h5'
    model.load_weights(model_name)
    return model
def _conv_block(inputs, filters, kernel=(3, 3), strides=(1, 1)):
    x = Conv2D(filters, kernel,
               padding='same',
               use_bias=False,
               strides=strides,
               name='conv1')(inputs)
    x = BatchNormalization(name='conv1_bn')(x)
    return Activation(relu6, name='conv1_relu')(x)

def _depthwise_conv_block(inputs, pointwise_conv_filters,
                          depth_multiplier=1, strides=(1, 1), block_id=1):
    # Depthwise 3x3 convolution (one filter per input channel)
    x = DepthwiseConv2D((3, 3),
                        padding='same',
                        depth_multiplier=depth_multiplier,
                        strides=strides,
                        use_bias=False,
                        name='conv_dw_%d' % block_id)(inputs)
    x = BatchNormalization(name='conv_dw_%d_bn' % block_id)(x)
    x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)
    # Pointwise 1x1 convolution to mix channels
    x = Conv2D(pointwise_conv_filters, (1, 1),
               padding='same',
               use_bias=False,
               strides=(1, 1),
               name='conv_pw_%d' % block_id)(x)
    x = BatchNormalization(name='conv_pw_%d_bn' % block_id)(x)
    return Activation(relu6, name='conv_pw_%d_relu' % block_id)(x)

def relu6(x):
    # ReLU clipped at 6, as used by MobileNet
    return K.relu(x, max_value=6)
def preprocess_input(x):
    # Scale pixels from [0, 255] to [-1, 1]
    x /= 255.
    x -= 0.5
    x *= 2.
    return x

if __name__ == '__main__':
    model = MobileNet(input_shape=(224, 224, 3))

    img_path = 'elephant.jpg'
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    print('Input image shape:', x.shape)

    preds = model.predict(x)
    print(np.argmax(preds))
    print('Predicted:', decode_predictions(preds, 1))  # show top-1 only
Finally, some practical guidelines for designing CNN architectures:
1) Architecture follows application
You may be drawn to the dazzling new models invented by imaginative labs like Google Brain or DeepMind, but many of them are impossible or impractical for your needs. Use the model that makes the most sense for your specific application; it may be very simple yet still powerful, e.g. VGG.
2) Proliferation of paths
Each year's ImageNet Challenge winner uses a deeper network than the previous year's. From AlexNet to Inception to ResNets, there is also a trend of "the number of paths through the network multiplying".
3) Strive for simplicity
Bigger is not necessarily better.
4) Increase symmetry
In architecture as in biology, symmetry is taken as a mark of quality and craftsmanship.
5) Pyramid shape
There is always a trade-off between representational power and removing redundant or useless information. CNNs typically downsample the activations while increasing the number of channels from the input layer to the final layers.
6) Over-training
Another trade-off is training accuracy versus generalization. Improving generalization with regularization methods such as drop-out or drop-path is an important strength of neural networks. Train the network on a problem harder than the actual use case to improve generalization.
7) Cover the problem space
To enlarge the training data and improve generalization, add noise and artificially grow the training set, e.g. with random rotation, cropping, and other image augmentations.
8) Incremental functional structure
As architectures become successful, they simplify the "job" of each layer. In very deep neural networks, each layer modifies its input only incrementally; in ResNets, each layer's output is likely similar to its input. In practice, therefore, prefer short skip lengths in a ResNet.
9) Normalize layer inputs
Normalization is a shortcut that makes the layers' work easier and in practice improves training accuracy. It puts all of a layer's input samples on an equal footing (much like a unit conversion), which lets backpropagation train more effectively.
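A small demonstration (toy shapes) of what such normalization does to a layer's inputs, using BatchNorm as in the networks above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(8)                  # one statistic per channel
x = torch.randn(4, 8, 16, 16) * 5 + 3   # inputs with large mean and variance
y = bn(x)                               # standardized per channel
print(round(y.mean().item(), 3))        # ~0.0
print(round(y.std().item(), 2))         # ~1.0
```

Whatever scale the raw activations arrive at, the layer after the norm always sees roughly zero-mean, unit-variance inputs.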
10) Use fine-tuned pretrained networks
Mike Tung, CEO of the machine-learning company Diffbot, says: "If your visual data is similar to ImageNet, then a pretrained network will help you learn faster." Low-level CNN layers can usually be reused, because they mostly detect common patterns like lines and edges. For example, replace the classification layer with one of your own design, and train only the last few layers on your own data.
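A minimal sketch of that recipe, with a hypothetical toy backbone standing in for the pretrained network: freeze the reused layers and train only a new classifier head sized for the target task.

```python
import torch.nn as nn

# Hypothetical stand-in for a pretrained feature extractor
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False          # reuse low-level features as-is

head = nn.Linear(64, 10)             # new classifier for a 10-class task
model = nn.Sequential(backbone, head)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # only the new head trains: 64*10 + 10 = 650
```

With `torchvision`, the same pattern is typically `model.fc = nn.Linear(model.fc.in_features, num_classes)` on a pretrained ResNet.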
11) Use cyclical learning rates
Experimenting with learning rates costs a lot of time and invites mistakes. Adaptive learning rates can be computationally expensive; cyclical learning rates are not. With a cyclical learning rate you set an upper and a lower bound and vary the rate between them.
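A sketch using PyTorch's built-in CyclicLR scheduler (the bounds and step size below are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
# Cycle the learning rate between a lower and an upper bound
sched = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=0.001, max_lr=0.1, step_size_up=4)

lrs = []
for _ in range(8):          # one full triangular cycle: 4 steps up, 4 down
    opt.step()
    sched.step()
    lrs.append(round(opt.param_groups[0]['lr'], 4))
print(lrs)                  # rises toward max_lr, then falls back to base_lr
```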