These notes cover Chapter 5 (Deep learning for computer vision), Section 2 (Training a convnet from scratch on a small dataset) of "Deep Learning with Python".
Key topics: dropout and data augmentation.
Dataset download link: https://www.kaggle.com/c/dogs-vs-cats/data
The dataset contains 25,000 images of cats and dogs (12,500 per class).
Only a small subset is used here: 1,000 samples per class for training, 500 per class for validation, and 500 per class for testing.
Unpacking the original download yields sampleSubmission.csv, test1.zip, and train.zip. Unzipping test1.zip produces test1/, which holds 12,500 unlabeled cat and dog images; unzipping train.zip yields the 25,000 labeled cat and dog images.
The resulting dataset split:
train: cat(1000), dog(1000)
val: cat(500), dog(500)
test: cat(500), dog(500)
import os

## Copy the first 1,000 cat and dog images into the training directories
for i in range(1000):
    os.system("cp ./dogs-vs-cats/train/cat.{}.jpg ./dogs-vs-cats/small_dt/train/train_cats/".format(i))
    os.system("cp ./dogs-vs-cats/train/dog.{}.jpg ./dogs-vs-cats/small_dt/train/train_dogs/".format(i))

## Copy images 1000-1499 into the validation directories
for j in range(1000, 1500):
    os.system("cp ./dogs-vs-cats/train/cat.{}.jpg ./dogs-vs-cats/small_dt/validation/validation_cats/".format(j))
    os.system("cp ./dogs-vs-cats/train/dog.{}.jpg ./dogs-vs-cats/small_dt/validation/validation_dogs/".format(j))

## Copy images 1500-1999 into the test directories
for k in range(1500, 2000):
    os.system("cp ./dogs-vs-cats/train/cat.{}.jpg ./dogs-vs-cats/small_dt/test/test_cats/".format(k))
    os.system("cp ./dogs-vs-cats/train/dog.{}.jpg ./dogs-vs-cats/small_dt/test/test_dogs/".format(k))
Preprocessing steps (a manual sketch follows the list):
read the image files;
decode the JPEG files into grids of RGB pixels;
convert these pixel grids into floating-point tensors;
rescale the pixel values from [0, 255] to the [0, 1] interval (small values are easier for the network to process).
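ImageDataGenerator below automates all four of these steps, but they can also be done by hand. A minimal sketch with PIL and NumPy, using one of the images copied above:

from PIL import Image
import numpy as np

img = Image.open("./dogs-vs-cats/small_dt/train/train_cats/cat.0.jpg")  ## 1. read the image file
img = img.convert("RGB").resize((150, 150))                             ## 2. decode to an RGB pixel grid
x = np.array(img, dtype="float32")                                      ## 3. convert to a float tensor, shape (150, 150, 3)
x /= 255.0                                                              ## 4. rescale [0, 255] to [0, 1]
print(x.shape, x.min(), x.max())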
Note that the directory structure must be as follows (a sketch that creates it comes right after):
train/
train_cats/
train_dogs/
validation/
validation_cats/
validation_dogs/
test/
test_cats/
test_dogs/
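The cp commands above assume these target directories already exist. A minimal sketch that builds the tree first (same paths as the copy script):

import os

base = "./dogs-vs-cats/small_dt"
for split, cls in [("train", "train_cats"), ("train", "train_dogs"),
                   ("validation", "validation_cats"), ("validation", "validation_dogs"),
                   ("test", "test_cats"), ("test", "test_dogs")]:
    ## exist_ok avoids an error if the directory is already there
    os.makedirs(os.path.join(base, split, cls), exist_ok=True)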
from keras.preprocessing.image import ImageDataGenerator  ## automatically turns image files on disk into batches of preprocessed tensors

## Multiply every image by 1/255 to rescale the pixel values
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

## Create the batch generators
train_generator = train_datagen.flow_from_directory(directory="./dogs-vs-cats/small_dt/train/",  ## directory holding the training images
                                                    target_size=(150, 150),  ## resize all images to 150x150
                                                    batch_size=20,           ## batch size
                                                    class_mode="binary")     ## binary labels (0/1) for a binary classification problem
validation_generator = test_datagen.flow_from_directory(directory="./dogs-vs-cats/small_dt/validation/",  ## directory holding the validation images
                                                        target_size=(150, 150),
                                                        batch_size=20,
                                                        class_mode="binary")
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
## The generator yields batches endlessly, so break after inspecting the first one
for data_batch, labels_batch in train_generator:
    print(data_batch.shape)
    print(labels_batch.shape)
    break
(20, 150, 150, 3)
(20,)
Compared with MNIST, the images here are larger and the problem more complex, so the convnet is correspondingly bigger.
Like the MNIST model, it alternates Conv2D layers (relu activation) with MaxPooling2D layers.
The added depth both increases network capacity and further shrinks the feature maps, so they are not too large by the time they reach the Flatten layer.
The depth of the feature maps grows progressively through the network (from 32 to 128) while their spatial size shrinks (from 150x150 down to 7x7); this pattern shows up in almost every convnet.
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3,3), activation="relu", input_shape=(150, 150, 3)))  ## conv1 (32 filters, 3x3 kernel)
model.add(layers.MaxPool2D((2,2)))  ## maxpooling 1
model.add(layers.Conv2D(64, (3,3), activation="relu"))  ## conv2
model.add(layers.MaxPool2D((2,2)))  ## maxpooling 2
model.add(layers.Conv2D(128, (3,3), activation="relu"))  ## conv3
model.add(layers.MaxPool2D((2,2)))  ## maxpooling 3
model.add(layers.Conv2D(128, (3,3), activation="relu"))  ## conv4
model.add(layers.MaxPool2D((2,2)))  ## maxpooling 4
model.add(layers.Flatten())  ## Flatten
model.add(layers.Dense(512, activation="relu"))  ## FC1 (512 hidden units)
model.add(layers.Dense(1, activation="sigmoid"))  ## output layer (1 unit; sigmoid for binary classification)
Metal device set to: Apple M1
systemMemory: 8.00 GB
maxCacheSize: 2.67 GB
Inspect how the feature map dimensions change from layer to layer:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                     Output Shape          Param #
=================================================================
 conv2d (Conv2D)                  (None, 148, 148, 32)  896
 max_pooling2d (MaxPooling2D)     (None, 74, 74, 32)    0
 conv2d_1 (Conv2D)                (None, 72, 72, 64)    18496
 max_pooling2d_1 (MaxPooling2D)   (None, 36, 36, 64)    0
 conv2d_2 (Conv2D)                (None, 34, 34, 128)   73856
 max_pooling2d_2 (MaxPooling2D)   (None, 17, 17, 128)   0
 conv2d_3 (Conv2D)                (None, 15, 15, 128)   147584
 max_pooling2d_3 (MaxPooling2D)   (None, 7, 7, 128)     0
 flatten (Flatten)                (None, 6272)          0
 dense (Dense)                    (None, 512)           3211776
 dense_1 (Dense)                  (None, 1)             513
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________
About the Param # column (taking the first value, 896, as an example): params(896) = number of filters(32) × kernel size(3×3×3) + number of filters(32), where the final term counts one bias per filter.
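A quick sanity check of this formula against every Conv2D row of the summary above:

## params = filters * (kernel height * kernel width * input channels) + filters (one bias per filter)
for filters, in_channels in [(32, 3), (64, 32), (128, 64), (128, 128)]:
    print(filters * 3 * 3 * in_channels + filters)
## -> 896, 18496, 73856, 147584, matching the Param # column.
## The Dense layers follow the same pattern: 6272*512 + 512 = 3211776 and 512*1 + 1 = 513.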
## Configure and compile the model
from keras import optimizers

model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-4),
              loss="binary_crossentropy",  ## binary cross-entropy loss for a binary classification problem
              metrics=["acc"])
Fit the model to the data with the fit_generator() method:

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,  ## 2000 training samples at a batch size of 20 -> 100 batches per epoch
    epochs=30,            ## train for 30 epochs
    validation_data=validation_generator,
    validation_steps=50   ## 1000 validation samples at a batch size of 20 -> 50 batches
)
Epoch 1/30
/var/folders/0w/m6x2g_g94sqfmg3k8dldpwgm0000gn/T/ipykernel_32812/2996549298.py:1: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
history = model.fit_generator(
2023-06-25 13:34:09.683332: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
100/100 [==============================] - 8s 69ms/step - loss: 0.6898 - acc: 0.5255 - val_loss: 0.6737 - val_acc: 0.5600
Epoch 2/30
100/100 [==============================] - 7s 67ms/step - loss: 0.6615 - acc: 0.6030 - val_loss: 0.6755 - val_acc: 0.5670
Epoch 3/30
100/100 [==============================] - 7s 66ms/step - loss: 0.6243 - acc: 0.6525 - val_loss: 0.6122 - val_acc: 0.6680
...
Epoch 29/30
100/100 [==============================] - 7s 67ms/step - loss: 0.0574 - acc: 0.9845 - val_loss: 0.9850 - val_acc: 0.7270
Epoch 30/30
100/100 [==============================] - 7s 67ms/step - loss: 0.0482 - acc: 0.9875 - val_loss: 0.9249 - val_acc: 0.7320
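As the deprecation warning in the output above says, fit_generator is slated for removal; in recent Keras versions the equivalent call goes through Model.fit, which accepts generators directly:

## Equivalent, non-deprecated call (recent Keras versions):
history = model.fit(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50
)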
model.save("cats_and_dogs_small_1.h5")
## Training loss and validation loss
import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict["loss"]
val_loss_values = history_dict["val_loss"]
epochs = range(1, len(loss_values)+1)
plt.plot(epochs, loss_values, "bo", label="Training loss")       ## "bo" = blue dots
plt.plot(epochs, val_loss_values, "b", label="Validation loss")  ## "b" = solid blue line
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()
## Training accuracy and validation accuracy
acc_values = history_dict["acc"]
val_acc_values = history_dict["val_acc"]
plt.plot(epochs, acc_values, "bo", label="Training accuracy")      ## "bo" = blue dots
plt.plot(epochs, val_acc_values, "b", label="Validation accuracy") ## "b" = solid blue line
plt.title("Training and validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
With so few training samples, the model is very likely to overfit; the curves above show training accuracy nearing 99% while validation accuracy plateaus around 73%. Besides the methods covered earlier, dropout and L2 regularization, there is another technique specific to computer vision: data augmentation.
Data augmentation generates more training data from the existing samples by applying a range of random transformations that still produce believable images (flips, rotations, and so on), so the model never sees exactly the same image twice during training.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,       ## range of random rotations, in degrees (0-180)
    width_shift_range=0.2,   ## range of random horizontal shifts (fraction of total width)
    height_shift_range=0.2,  ## range of random vertical shifts (fraction of total height)
    shear_range=0.2,         ## range of random shearing transformations
    zoom_range=0.2,          ## range of random zooms
    horizontal_flip=True     ## randomly flip half of the images horizontally
)
test_datagen = ImageDataGenerator(rescale=1./255)  ## validation data must not be augmented

## Create the batch generators
train_generator = train_datagen.flow_from_directory(directory="./dogs-vs-cats/small_dt/train/",  ## directory holding the training images
                                                    target_size=(150, 150),  ## resize all images to 150x150
                                                    batch_size=20,           ## batch size
                                                    class_mode="binary")     ## binary labels (0/1) for a binary classification problem
validation_generator = test_datagen.flow_from_directory(directory="./dogs-vs-cats/small_dt/validation/",  ## directory holding the validation images
                                                        target_size=(150, 150),
                                                        batch_size=20,
                                                        class_mode="binary")
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
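To see what the augmentation actually produces, push a single training image through the generator a few times. A minimal sketch (any image from the training split works; cat.3.jpg is among the files copied earlier):

from keras.preprocessing import image
import matplotlib.pyplot as plt

## Load one training image and add a batch dimension
img = image.load_img("./dogs-vs-cats/small_dt/train/train_cats/cat.3.jpg", target_size=(150, 150))
x = image.img_to_array(img).reshape((1, 150, 150, 3))

## Draw four random augmentations of the same image;
## the generator's rescale keeps pixel values in [0, 1], which imshow handles directly
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, batch in zip(axes, train_datagen.flow(x, batch_size=1)):
    ax.imshow(batch[0])
    ax.axis("off")
plt.show()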
A convnet with dropout added:
## Define the model architecture
model = models.Sequential()
model.add(layers.Conv2D(32, (3,3), activation="relu", input_shape=(150, 150, 3)))
model.add(layers.MaxPool2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation="relu"))
model.add(layers.MaxPool2D((2,2)))
model.add(layers.Conv2D(128, (3,3), activation="relu"))
model.add(layers.MaxPool2D((2,2)))
model.add(layers.Conv2D(128, (3,3), activation="relu"))
model.add(layers.MaxPool2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))  ## randomly zero out 50% of the flattened features during training
model.add(layers.Dense(512, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))
## Configure and compile the model
model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-4),
              loss="binary_crossentropy",
              metrics=["acc"])
history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=50
)
model.save("cats_and_dogs_small_2.h5")
Epoch 1/50
/var/folders/0w/m6x2g_g94sqfmg3k8dldpwgm0000gn/T/ipykernel_32812/2595555689.py:1: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
history = model.fit_generator(
100/100 [==============================] - 8s 73ms/step - loss: 0.6944 - acc: 0.5055 - val_loss: 0.6940 - val_acc: 0.5000
Epoch 2/50
100/100 [==============================] - 8s 77ms/step - loss: 0.6901 - acc: 0.5235 - val_loss: 0.6918 - val_acc: 0.5000
Epoch 3/50
100/100 [==============================] - 8s 79ms/step - loss: 0.6780 - acc: 0.5765 - val_loss: 0.6721 - val_acc: 0.5440
...
Epoch 48/50
100/100 [==============================] - 11s 112ms/step - loss: 0.4791 - acc: 0.7715 - val_loss: 0.5140 - val_acc: 0.7340
Epoch 49/50
100/100 [==============================] - 10s 102ms/step - loss: 0.4683 - acc: 0.7725 - val_loss: 0.4514 - val_acc: 0.7720
Epoch 50/50
100/100 [==============================] - 11s 106ms/step - loss: 0.4825 - acc: 0.7790 - val_loss: 0.4511 - val_acc: 0.7810
## Training loss and validation loss
import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict["loss"]
val_loss_values = history_dict["val_loss"]
epochs = range(1, len(loss_values)+1)
plt.plot(epochs, loss_values, "bo", label="Training loss")       ## "bo" = blue dots
plt.plot(epochs, val_loss_values, "b", label="Validation loss")  ## "b" = solid blue line
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()
## Training accuracy and validation accuracy
acc_values = history_dict["acc"]
val_acc_values = history_dict["val_acc"]
plt.plot(epochs, acc_values, "bo", label="Training accuracy")      ## "bo" = blue dots
plt.plot(epochs, val_acc_values, "b", label="Validation accuracy") ## "b" = solid blue line
plt.title("Training and validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
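With augmentation and dropout, validation accuracy climbs to roughly 78% (versus roughly 73% before), and the validation loss now tracks the training loss far more closely. The test split created at the start has not been touched yet; a minimal sketch of evaluating the saved model on it, assuming the same directory layout and a recent Keras where model.evaluate accepts generators:

## Evaluate on the held-out test split (1000 images the model never saw)
test_generator = test_datagen.flow_from_directory(
    directory="./dogs-vs-cats/small_dt/test/",
    target_size=(150, 150),
    batch_size=20,
    class_mode="binary")
test_loss, test_acc = model.evaluate(test_generator, steps=50)  ## 1000 images / batch size 20
print("test acc:", test_acc)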