TensorFlow 2.0 and tf.keras provide three ways to implement a neural network model:
Sequential API
Functional API
Model subclassing
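Below I build a deep learning model with each of the three approaches, train each one on the CIFAR-10 dataset, and save a plot of the training process as an image.
Code structure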
├── output
│ ├── class.png
│ ├── functional.png
│ └── sequential.png
├── model
│ ├── __init__.py
│ └── models.py
└── train.py
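models.py contains the three model implementations, one for each approach (two functions and one class). train.py trains the selected model on the CIFAR-10 dataset and then writes a plot of the training accuracy and loss to the output folder.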
Sequential API
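The Keras Sequential API is the simplest way to implement a Keras model. As its name suggests, it builds the model as a sequence: the layers are lined up one after another, like a queue, to form the complete model. The Sequential API has some limitations, however. It cannot create model structures with:
Shared layers
Branches
Multiple input branches
Multiple output branches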
Below I use Keras in TensorFlow 2.0 to implement a small neural network. First, import the necessary packages in models.py:
# import the necessary packages
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import concatenate
Next, the shallow model is implemented with the Keras Sequential API as follows. This is a small custom model; the model.add() method appends the components to the model in order.
def shallownet_sequential(width, height, depth, classes):
    # initialize the model along with the input shape to be
    # "channels last" ordering
    model = Sequential()
    inputShape = (height, width, depth)
    # define the first (and only) CONV => RELU layer
    model.add(Conv2D(32, (3, 3), padding="same",
        input_shape=inputShape))
    model.add(Activation("relu"))
    # softmax classifier
    model.add(Flatten())
    model.add(Dense(classes))
    model.add(Activation("softmax"))
    # return the constructed network architecture
    return model
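As a rough sanity check on the size of this network, the arithmetic below counts its trainable parameters for CIFAR-10 inputs (32×32×3 images, 10 classes). It is a back-of-the-envelope sketch in plain Python, not output from Keras; the helper names conv2d_params and dense_params are made up for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    # one weight per (input, unit) pair plus one bias per unit
    return (in_features + 1) * units


height, width, depth, classes = 32, 32, 3, 10

conv = conv2d_params(3, 3, depth, 32)
# padding="same" keeps the 32x32 spatial size, so Flatten yields 32*32*32 values
flat = height * width * 32
dense = dense_params(flat, classes)
total = conv + dense

print(conv, flat, dense, total)  # prints 896 32768 327690 328586
```

Nearly all of the weights sit in the final Dense layer, because Flatten hands it every activation of the 32×32×32 feature map.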
Functional API
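The Functional API is the more commonly used way to implement models, and it is more flexible than the Sequential API. With the Functional API we can create more complex models that:
Have multiple inputs or multiple outputs
Define branches in the architecture
Share layers
Arrange layers as an arbitrary directed acyclic graph (DAG)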
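To make these capabilities concrete, here is a minimal sketch of a toy model with two inputs, one shared layer, and a merge. It is not part of the article's models.py; the layer sizes and the names toy_model, shared, input_a and input_b are arbitrary choices for illustration.

```python
from tensorflow.keras.layers import Dense, Input, concatenate
from tensorflow.keras.models import Model

# two separate inputs with the same feature size
input_a = Input(shape=(4,))
input_b = Input(shape=(4,))

# a single Dense layer object applied to both inputs, so its weights are shared
shared = Dense(8, activation="relu")
branch_a = shared(input_a)
branch_b = shared(input_b)

# merge the two branches and attach one output
merged = concatenate([branch_a, branch_b])
output = Dense(1, activation="sigmoid")(merged)

toy_model = Model(inputs=[input_a, input_b], outputs=output)
n_params = toy_model.count_params()
print(n_params)
```

Because the Dense(8) layer is shared, its 4 * 8 + 8 = 40 weights are counted only once; together with the 16 + 1 = 17 weights of the output layer the model has 57 parameters. A Sequential model cannot express this structure, since it has neither a second input nor a way to reuse a layer on a different tensor.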
In addition, any model implemented with the Sequential API can also be implemented with the Functional API. Below I use the Keras Functional API in TensorFlow 2.0 to implement a miniature GoogLeNet.
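MiniGoogLeNet contains three kinds of network modules:
conv_module: applies a convolution to its input, then batch normalization, then a ReLU activation. We call this CONV => BN => RELU pattern the conv_module.
inception_module: applies two (or more) convolution branches to the same input. The first branch uses a 1×1 convolution and the second a 3×3 convolution, with both producing outputs of the same spatial size. The outputs of the two branches are then concatenated along the channel dimension.
downsample_module: reduces the spatial size of its input. It also has two branches: the first uses a 3×3 convolution with stride 2, and the second uses 3×3 max pooling with stride 2. The outputs of the two branches are then concatenated along the channel dimension.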
I first implement these three modules and then use them to build MiniGoogLeNet.
The code for MiniGoogLeNet is as follows:
def minigooglenet_functional(width, height, depth, classes):
    def conv_module(x, K, kX, kY, stride, chanDim, padding="same"):
        # define a CONV => BN => RELU pattern
        x = Conv2D(K, (kX, kY), strides=stride, padding=padding)(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = Activation("relu")(x)
        # return the block
        return x

    def inception_module(x, numK1x1, numK3x3, chanDim):
        # define two CONV modules, then concatenate across the
        # channel dimension
        conv_1x1 = conv_module(x, numK1x1, 1, 1, (1, 1), chanDim)
        conv_3x3 = conv_module(x, numK3x3, 3, 3, (1, 1), chanDim)
        x = concatenate([conv_1x1, conv_3x3], axis=chanDim)
        # return the block
        return x

    def downsample_module(x, K, chanDim):
        # define the CONV module and POOL, then concatenate
        # across the channel dimensions
        conv_3x3 = conv_module(x, K, 3, 3, (2, 2), chanDim,
            padding="valid")
        pool = MaxPooling2D((3, 3), strides=(2, 2))(x)
        x = concatenate([conv_3x3, pool], axis=chanDim)
        # return the block
        return x

    # initialize the input shape to be "channels last" and the
    # channels dimension itself
    inputShape = (height, width, depth)
    chanDim = -1
    # define the model input and first CONV module
    inputs = Input(shape=inputShape)
    x = conv_module(inputs, 96, 3, 3, (1, 1), chanDim)
    # two Inception modules followed by a downsample module
    x = inception_module(x, 32, 32, chanDim)
    x = inception_module(x, 32, 48, chanDim)
    x = downsample_module(x, 80, chanDim)
    # four Inception modules followed by a downsample module
    x = inception_module(x, 112, 48, chanDim)
    x = inception_module(x, 96, 64, chanDim)
    x = inception_module(x, 80, 80, chanDim)
    x = inception_module(x, 48, 96, chanDim)
    x = downsample_module(x, 96, chanDim)
    # two Inception modules followed by global POOL and dropout
    x = inception_module(x, 176, 160, chanDim)
    x = inception_module(x, 176, 160, chanDim)
    x = AveragePooling2D((7, 7))(x)
    x = Dropout(0.5)(x)
    # softmax classifier
    x = Flatten()(x)
    x = Dense(classes)(x)
    x = Activation("softmax")(x)
    # create the model
    model = Model(inputs, x, name="minigooglenet")
    # return the constructed network architecture
    return model
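The 7×7 window in AveragePooling2D only makes sense if the feature map is 7×7 at that point. The sketch below traces the tensor shape through the network by hand for a 32×32 input, in plain Python and without TensorFlow. The helper names valid_out, inception and downsample are made up, and the sketch assumes the Keras defaults of "valid" padding for MaxPooling2D and AveragePooling2D.

```python
def valid_out(size, kernel, stride):
    # output length along one axis for a "valid" convolution or pooling window
    return (size - kernel) // stride + 1


def inception(shape, k1x1, k3x3):
    # both branches use stride 1 with "same" padding, so only channels change
    h, w, _ = shape
    return (h, w, k1x1 + k3x3)


def downsample(shape, k):
    # 3x3 stride-2 "valid" conv (k channels) concatenated with a
    # 3x3 stride-2 max pool (which keeps the incoming channel count)
    h, w, c = shape
    return (valid_out(h, 3, 2), valid_out(w, 3, 2), k + c)


shape = (32, 32, 96)  # after the first conv_module on a 32x32x3 image
for k1, k3 in [(32, 32), (32, 48)]:
    shape = inception(shape, k1, k3)
shape = downsample(shape, 80)
after_first_downsample = shape

for k1, k3 in [(112, 48), (96, 64), (80, 80), (48, 96)]:
    shape = inception(shape, k1, k3)
shape = downsample(shape, 96)

for k1, k3 in [(176, 160), (176, 160)]:
    shape = inception(shape, k1, k3)
before_pool = shape

# 7x7 average pooling with the default stride equal to the pool size
pooled = (valid_out(shape[0], 7, 7), valid_out(shape[1], 7, 7), shape[2])
print(after_first_downsample, before_pool, pooled)
```

The spatial size goes 32 → 15 → 7, so the 7×7 average pool acts as a global pool and leaves a 1×1×336 tensor for Flatten. For a different input size the pool window would need to change accordingly.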
Model subclassing
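Model subclassing, as the name suggests, means inheriting from the Model class. This is ordinary object-oriented programming; in fact, every Keras model inherits from Model. With this approach we can write the model exactly as we wish and use our own custom layers, custom loss functions, custom activation functions, and so on. When TensorFlow and Keras do not provide the layer you need, you can implement your model with Model subclassing. (You will probably use it often if you write papers or publish open-source code.)
When using Model subclassing we override the __init__ and call methods of the Model class. __init__ defines the layers we will use, which can include Keras's built-in layers. call implements the order of the model's forward pass (inference).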
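Before the full network, a minimal sketch of the __init__/call pattern may help. The class TinyClassifier and its layer sizes are hypothetical and are not part of the article's code.

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model


class TinyClassifier(Model):  # hypothetical name, for illustration only
    def __init__(self, classes):
        super().__init__()
        # layers are created once, in the constructor
        self.hidden = Dense(16, activation="relu")
        self.out = Dense(classes, activation="softmax")

    def call(self, inputs):
        # the forward pass chains the layers in whatever order we choose
        x = self.hidden(inputs)
        return self.out(x)


tiny = TinyClassifier(classes=3)
# calling the model on a batch runs call() and creates the weights
probs = tiny(tf.zeros((2, 5))).numpy()
print(probs.shape)
```

Note that the input shape is never declared: a subclassed model only learns it when it is first called on data, which is one practical difference from the Sequential and Functional versions above.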
Below I use Model subclassing to implement a small VGG network:
class MiniVGGNetModel(Model):
    def __init__(self, classes, chanDim=-1):
        # call the parent constructor
        super(MiniVGGNetModel, self).__init__()
        # initialize the layers in the first (CONV => RELU) * 2 => POOL
        # layer set
        self.conv1A = Conv2D(32, (3, 3), padding="same")
        self.act1A = Activation("relu")
        self.bn1A = BatchNormalization(axis=chanDim)
        self.conv1B = Conv2D(32, (3, 3), padding="same")
        self.act1B = Activation("relu")
        self.bn1B = BatchNormalization(axis=chanDim)
        self.pool1 = MaxPooling2D(pool_size=(2, 2))
        # initialize the layers in the second (CONV => RELU) * 2 => POOL
        # layer set
        self.conv2A = Conv2D(32, (3, 3), padding="same")
        self.act2A = Activation("relu")
        self.bn2A = BatchNormalization(axis=chanDim)
        self.conv2B = Conv2D(32, (3, 3), padding="same")
        self.act2B = Activation("relu")
        self.bn2B = BatchNormalization(axis=chanDim)
        self.pool2 = MaxPooling2D(pool_size=(2, 2))
        # initialize the layers in our fully-connected layer set
        self.flatten = Flatten()
        self.dense3 = Dense(512)
        self.act3 = Activation("relu")
        self.bn3 = BatchNormalization()
        self.do3 = Dropout(0.5)
        # initialize the layers in the softmax classifier layer set
        self.dense4 = Dense(classes)
        self.softmax = Activation("softmax")

    def call(self, inputs):
        # build the first (CONV => RELU) * 2 => POOL layer set
        x = self.conv1A(inputs)
        x = self.act1A(x)
        x = self.bn1A(x)
        x = self.conv1B(x)
        x = self.act1B(x)
        x = self.bn1B(x)
        x = self.pool1(x)
        # build the second (CONV => RELU) * 2 => POOL layer set
        x = self.conv2A(x)
        x = self.act2A(x)
        x = self.bn2A(x)
        x = self.conv2B(x)
        x = self.act2B(x)
        x = self.bn2B(x)
        x = self.pool2(x)
        # build our FC layer set
        x = self.flatten(x)
        x = self.dense3(x)
        x = self.act3(x)
        x = self.bn3(x)
        x = self.do3(x)
        # build the softmax classifier
        x = self.dense4(x)
        x = self.softmax(x)
        # return the constructed model
        return x
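Training the model
train.py trains the selected model on the CIFAR-10 dataset and then saves a plot of the training process to an image. The code is as follows: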
# USAGE
# python train.py --model sequential --plot output/sequential.png
# python train.py --model functional --plot output/functional.png
# python train.py --model class --plot output/class.png
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
import tensorflow as tf
physical_device = tf.config.experimental.list_physical_devices("GPU")
tf.config.experimental.set_memory_growth(physical_device[0], True)
# there seems to be an issue with TensorFlow 2.0 throwing non-critical
# warnings regarding gradients when using the model sub-classing
# feature -- I found that by setting the logging level I can suppress
# the warnings from showing up (likely won't be required in future
# releases of TensorFlow)
import logging
logging.getLogger("tensorflow").setLevel(logging.CRITICAL)
# import the necessary packages
from model.models import MiniVGGNetModel
from model.models import minigooglenet_functional
from model.models import shallownet_sequential
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
import argparse
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, default="sequential",
    choices=["sequential", "functional", "class"],
    help="type of model architecture")
ap.add_argument("-p", "--plot", type=str, required=True,
    help="path to output plot file")
args = vars(ap.parse_args())
# args={"model":"sequential","plot":"output/sequential.png"}
# initialize the initial learning rate, batch size, and number of
# epochs to train for
INIT_LR = 1e-2
BATCH_SIZE = 128
NUM_EPOCHS = 60
# initialize the label names for the CIFAR-10 dataset
labelNames = ["airplane", "automobile", "bird", "cat", "deer", "dog",
    "frog", "horse", "ship", "truck"]
# load the CIFAR-10 dataset
print("[INFO] loading CIFAR-10 dataset...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
# scale the data to the range [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0
# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=18, zoom_range=0.15,
    width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
    horizontal_flip=True, fill_mode="nearest")
# check to see if we are using a Keras Sequential model
if args["model"] == "sequential":
    # instantiate a Keras Sequential model
    print("[INFO] using sequential model...")
    model = shallownet_sequential(32, 32, 3, len(labelNames))
# check to see if we are using a Keras Functional model
elif args["model"] == "functional":
    # instantiate a Keras Functional model
    print("[INFO] using functional model...")
    model = minigooglenet_functional(32, 32, 3, len(labelNames))
# check to see if we are using a Keras Model class
elif args["model"] == "class":
    # instantiate a Keras Model sub-class model
    print("[INFO] using model sub-classing...")
    model = MiniVGGNetModel(len(labelNames))
# initialize the optimizer and compile the model
opt = SGD(lr=INIT_LR, momentum=0.9, decay=INIT_LR / NUM_EPOCHS)
print("[INFO] training network...")
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])
# train the network
H = model.fit_generator(
    aug.flow(trainX, trainY, batch_size=BATCH_SIZE),
    validation_data=(testX, testY),
    steps_per_epoch=trainX.shape[0] // BATCH_SIZE,
    epochs=NUM_EPOCHS,
    verbose=1)
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=BATCH_SIZE)
print(classification_report(testY.argmax(axis=1),
    predictions.argmax(axis=1), target_names=labelNames))
# determine the number of epochs and then construct the plot title
N = np.arange(0, NUM_EPOCHS)
title = "Training Loss and Accuracy on CIFAR-10 ({})".format(
    args["model"])
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")
plt.title(title)
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["plot"])
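The decay argument passed to SGD deserves a short note. In the Keras optimizers of the TensorFlow 2.0 era, to my understanding, decay applies a time-based schedule that is updated on every batch: lr = INIT_LR / (1 + decay * iterations). The sketch below reproduces that formula in plain Python with the constants from train.py and the 50,000 images of the CIFAR-10 training set. It does not call TensorFlow, and the function name decayed_lr is made up for illustration.

```python
INIT_LR = 1e-2
BATCH_SIZE = 128
NUM_EPOCHS = 60
NUM_TRAIN = 50000  # size of the CIFAR-10 training set

# same expressions as in train.py
steps_per_epoch = NUM_TRAIN // BATCH_SIZE
decay = INIT_LR / NUM_EPOCHS


def decayed_lr(iterations):
    # time-based decay: the learning rate shrinks with every batch update
    return INIT_LR / (1.0 + decay * iterations)


for epoch in (0, 1, 30, 60):
    print(epoch, round(decayed_lr(epoch * steps_per_epoch), 5))
```

With 390 updates per epoch, the denominator reaches about 4.9 after 60 epochs, so under this assumption the learning rate ends at roughly one fifth of its initial value.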
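The script downloads the CIFAR-10 dataset, and the download may fail on a poor network connection. In that case, download the archive first with a download manager such as Xunlei, rename it to cifar-10-batches-py.tar.gz, and place it in the C:\Users\<username>\.keras\datasets folder (~/.keras/datasets on Ubuntu).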
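A quick way to confirm that the archive is where Keras expects it is sketched below. It assumes the default cache location described above; if the Keras home directory has been relocated on your machine, the path will differ.

```python
from pathlib import Path

# default Keras dataset cache; Path.home() resolves to C:\Users\<username>
# on Windows and to ~ on Linux
archive = Path.home() / ".keras" / "datasets" / "cifar-10-batches-py.tar.gz"

if archive.is_file():
    print("found local copy:", archive)
else:
    print("no local copy at:", archive)
```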
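If training then fails with the error: tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
this typically indicates that there is not enough GPU memory available. In that case, add the following two lines at the start of the code so that GPU memory is allocated dynamically: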
physical_device = tf.config.experimental.list_physical_devices("GPU")
tf.config.experimental.set_memory_growth(physical_device[0], True)
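One caveat: physical_device[0] raises an IndexError on a machine where TensorFlow sees no GPU. A slightly more defensive variant, sketched here, loops over whatever GPUs are found, so it is simply a no-op on CPU-only machines. As with the two lines above, it has to run before TensorFlow initializes the GPU, so it belongs at the top of the script.

```python
import tensorflow as tf

# enable memory growth on every visible GPU; with no GPU the loop does nothing
gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

print("GPUs configured:", len(gpus))
```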
First, use the Sequential API. Run from the command line:
python train.py --model sequential --plot output/sequential.png
The training accuracy and loss curves are saved to output/sequential.png.
Next, use the Functional API. Run from the command line:
python train.py --model functional --plot output/functional.png
The resulting plot is saved to output/functional.png.
Finally, use Model subclassing. Run from the command line:
python train.py --model class --plot output/class.png
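The resulting plot is saved to output/class.png.
Summary
The Sequential API is the simplest to use but also the most limited: many model structures cannot be expressed with it.
The Functional API is easy to use while still offering a good deal of flexibility, and in most cases it meets our needs.
Model subclassing is the most flexible and can implement any model structure we need. When you have to implement a custom network structure, loss function, or activation, Model subclassing is the way to do it.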