Our deep learning dataset consists of 1,191 images of Pokémon (animal-like creatures that exist in the world of Pokémon, the popular TV show, video game, and trading card series).
Our goal is to train a convolutional neural network with Keras and deep learning to recognize and classify these Pokémon.
The Pokémon we will recognize include:
Bulbasaur (234 images)
Charmander (238 images)
Squirtle (223 images)
Pikachu (234 images)
Mewtwo (239 images)
Project structure
├── dataset
│ ├── bulbasaur [234 entries]
│ ├── charmander [238 entries]
│ ├── mewtwo [239 entries]
│ ├── pikachu [234 entries]
│ └── squirtle [223 entries]
├── examples [6 entries]
├── pyimagesearch
│ ├── __init__.py
│ └── smallervggnet.py
├── plot.png
├── lb.pickle
├── pokedex.model
├── classify.py
└── train.py
There are 3 directories:
dataset: contains five classes; each class has its own subdirectory, making it easy to parse class labels (see the sketch after this list).
examples: contains the images we will use to test our CNN.
pyimagesearch module: contains our SmallerVGGNet model class.
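As a quick illustration, here is a minimal sketch of that label-parsing pattern (the same split-on-path trick train.py uses later; the filename here is made up for the example):
# hypothetical example: recover the class label from a dataset image path
import os
imagePath = os.path.sep.join(["dataset", "charmander", "00000001.jpg"])
# the class label is the second-to-last path component
label = imagePath.split(os.path.sep)[-2]
print(label)  # -> charmander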
There are 5 files in the root directory:
plot.png: the training/testing accuracy and loss plot generated once the training script has run.
lb.pickle: our serialized LabelBinarizer object file - it contains the class index to class name lookup mechanism.
pokedex.model: our serialized Keras convolutional neural network model file (i.e., the "weights file").
train.py: we will use this script to train our Keras CNN, plot the accuracy/loss, and then serialize the CNN and label binarizer to disk.
classify.py: our testing script.
The CNN architecture we are using is a smaller, more compact variant of the VGGNet network, introduced by Simonyan and Zisserman in their 2014 paper, "Very Deep Convolutional Networks for Large Scale Image Recognition".
VGGNet-like architectures are characterized by:
Using only 3×3 convolutional layers stacked on top of each other to increase depth
Reducing volume size by max pooling
Fully connected layers at the end of the network prior to a softmax classifier
smallervggnet.py
# import the necessary packages
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.core import Dense
from keras import backend as K
class SmallerVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # CONV => RELU => POOL
        model.add(Conv2D(32, (3, 3), padding="same",
            input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(3, 3)))
        model.add(Dropout(0.25))
        # (CONV => RELU) * 2 => POOL
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # (CONV => RELU) * 2 => POOL
        model.add(Conv2D(128, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(128, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # first (and only) set of FC => RELU layers
        model.add(Flatten())
        model.add(Dense(1024))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        # softmax classifier
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
width: the width dimension of the image.
height: the height dimension of the image.
depth: the depth of the image - also known as the number of channels.
classes: the number of classes in our dataset (this will affect the last layer of our model). We use 5 Pokémon classes here, but you could work with all 807 Pokémon species if you downloaded enough example images for each one.
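As a quick sanity check of these parameters (assuming the module path shown in the project structure above), you can build the network and print a layer summary:
# hypothetical sanity check: instantiate the network and inspect its layers
from pyimagesearch.smallervggnet import SmallerVGGNet
model = SmallerVGGNet.build(width=96, height=96, depth=3, classes=5)
model.summary()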
We adopt a CONV => RELU => POOL block.
Our convolutional layer has 32 filters with a 3×3 kernel. We use the RELU activation function followed by batch normalization.
Our POOL layer uses a 3×3 pool size (and, since Keras defaults the stride to the pool size, an implied 3×3 stride) to quickly reduce the spatial dimensions from 96×96 to 32×32 (we will train our network with 96×96×3 input images, as we will see in the next section).
As you can see from the code block, we also use dropout in the network architecture. Dropout works by randomly disconnecting nodes from the current layer to the next layer. This process of randomly disconnecting nodes during training batches helps naturally introduce redundancy into the model - no single node in a layer is solely responsible for predicting a certain class, object, edge, or corner.
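For intuition only, here is a minimal NumPy sketch of (inverted) dropout - not Keras internals, just the idea of masking and rescaling activations at training time:
# toy illustration of inverted dropout; Keras handles this internally
import numpy as np
rng = np.random.RandomState(42)
activations = rng.rand(1, 8)  # hypothetical layer outputs
rate = 0.25  # fraction of nodes to drop, as in Dropout(0.25)
mask = rng.binomial(1, 1.0 - rate, size=activations.shape)
dropped = activations * mask / (1.0 - rate)  # rescale to keep the expected value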
train.py
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.preprocessing.image import img_to_array
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from pyimagesearch.smallervggnet import SmallerVGGNet
import matplotlib.pyplot as plt
from imutils import paths
import numpy as np
import argparse
import random
import pickle
import cv2
import os
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
help="path to input dataset (i.e., directory of images)")
ap.add_argument("-m", "--model", required=True,
help="path to output model")
ap.add_argument("-l", "--labelbin", required=True,
help="path to output label binarizer")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
help="path to output accuracy/loss plot")
args = vars(ap.parse_args())
# initialize the number of epochs to train for, initial learning rate,
# batch size, and image dimensions
EPOCHS = 100
INIT_LR = 1e-3
BS = 32
IMAGE_DIMS = (96, 96, 3)
# initialize the data and labels
data = []
labels = []
# grab the image paths and randomly shuffle them
print("[INFO] loading images...")
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)
# loop over the input images
for imagePath in imagePaths:
    # load the image, pre-process it, and store it in the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
    image = img_to_array(image)
    data.append(image)
    # extract the class label from the image path and update the
    # labels list
    label = imagePath.split(os.path.sep)[-2]
    labels.append(label)
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
print("[INFO] data matrix: {:.2f}MB".format(
data.nbytes / (1024 * 1000.0)))
# binarize the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data,
    labels, test_size=0.2, random_state=42)
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1,
    height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
    horizontal_flip=True, fill_mode="nearest")
# initialize the model
print("[INFO] compiling model...")
model = SmallerVGGNet.build(width=IMAGE_DIMS[1], height=IMAGE_DIMS[0],
    depth=IMAGE_DIMS[2], classes=len(lb.classes_))
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])
# train the network
print("[INFO] training network...")
H = model.fit_generator(
    aug.flow(trainX, trainY, batch_size=BS),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // BS,
    epochs=EPOCHS, verbose=1)
# save the model to disk
print("[INFO] serializing network...")
model.save(args["model"])
# save the label binarizer to disk
print("[INFO] serializing label binarizer...")
f = open(args["labelbin"], "wb")
f.write(pickle.dumps(lb))
f.close()
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
N = EPOCHS
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="upper left")
plt.savefig(args["plot"])
python train.py --dataset dataset --model pokedex.model --labelbin lb.pickle
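After training finishes, a quick check (assuming the lb.pickle path above) confirms the label binarizer was serialized correctly - classify.py will load it the same way:
# hypothetical check: load the serialized LabelBinarizer and list its classes
import pickle
lb = pickle.loads(open("lb.pickle", "rb").read())
print(lb.classes_)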
classify.py
# import the necessary packages
from keras.preprocessing.image import img_to_array
from keras.models import load_model
import numpy as np
import argparse
import imutils
import pickle
import cv2
import os
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
help="path to trained model model")
ap.add_argument("-l", "--labelbin", required=True,
help="path to label binarizer")
ap.add_argument("-i", "--image", required=True,
help="path to input image")
args = vars(ap.parse_args())
# load the image
image = cv2.imread(args["image"])
output = image.copy()
# pre-process the image for classification
image = cv2.resize(image, (96, 96))
image = image.astype("float") / 255.0
image = img_to_array(image)
image = np.expand_dims(image, axis=0)
# load the trained convolutional neural network and the label
# binarizer
print("[INFO] loading network...")
model = load_model(args["model"])
lb = pickle.loads(open(args["labelbin"], "rb").read())
# classify the input image
print("[INFO] classifying image...")
proba = model.predict(image)[0]
idx = np.argmax(proba)
label = lb.classes_[idx]
# we'll mark our prediction as "correct" if the input image filename
# contains the predicted label text (obviously this makes the
# assumption that you have named your testing image files this way)
filename = args["image"][args["image"].rfind(os.path.sep) + 1:]
correct = "correct" if filename.rfind(label) != -1 else "incorrect"
# build the label and draw the label on the image
label = "{}: {:.2f}% ({})".format(label, proba[idx] * 100, correct)
output = imutils.resize(output, width=400)
cv2.putText(output, label, (10, 25), cv2.FONT_HERSHEY_SIMPLEX,
    0.7, (0, 255, 0), 2)
# show the output image
print("[INFO] {}".format(label))
cv2.imshow("Output", output)
cv2.waitKey(0)
python classify.py --model pokedex.model --labelbin lb.pickle \
--image examples/charmander_counter.png
One of the primary limitations of this model is the small amount of training data. I tested it on various images, and at times the classifications were incorrect. When this happened, I examined the input image + network more closely and found that the colors most dominant in the image influence the classification dramatically.
For example, lots of red and orange in an image will likely return "Charmander" as the label. Similarly, lots of yellow in an image will normally result in a "Pikachu" label.
This is partially due to our input data. Most of the photos are fan illustrations or stills from the movie/TV show. Furthermore, each class only has a limited amount of data (~225-250 images).
Ideally, we should have at least 500-1,000 images per class when training a convolutional neural network.
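If you expand the dataset, a small sketch like the following (assuming the dataset directory layout shown above) can verify how many images each class actually contains:
# hypothetical helper: count images per class subdirectory
import os
for cls in sorted(os.listdir("dataset")):
    clsDir = os.path.join("dataset", cls)
    if os.path.isdir(clsDir):
        print("{}: {} images".format(cls, len(os.listdir(clsDir))))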