This tutorial is adapted from Google's official course "Introduction to TensorFlow" (TensorFlow 入门实操课程), licensed to China University MOOC (中国大学MOOC).
Chapter 2: Computer Vision
2.1 Computer Vision
2.2 Loading Fashion MNIST
Calling load_data on this object will give you two tuples, each containing two lists: the training and testing values for the graphics that contain the clothing items, and their labels.
# Load the Fashion MNIST dataset
from tensorflow import keras
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
print(train_images.shape) # Inspect the shape of the training data
print(train_images[0])    # Inspect the grayscale values of the first image
The dataset contains 60,000 images, each 28x28 pixels.
print(train_labels.shape) # Inspect the shape of the labels
print(train_labels[:5])   # Inspect the first 5 labels
import matplotlib.pyplot as plt
plt.imshow(train_images[0]) # Display the first image
You'll notice that all of the values are between 0 and 255. If we are training a neural network, for various reasons it's easier if we treat all values as between 0 and 1, a process called 'normalizing'... and fortunately in Python it's easy to normalize an array like this without looping. You do it like this:
train_images = train_images / 255.0
test_images = test_images / 255.0
2.3 Building the Neural Network Model
# Build the neural network model
import tensorflow as tf
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),       # Input layer: flattens each 28x28 image
    keras.layers.Dense(128, activation=tf.nn.relu),   # Hidden layer: 128 neurons with relu activation
    keras.layers.Dense(10, activation=tf.nn.softmax)  # Output layer: 10 classes with softmax activation
])
# Inspect the model structure
model.summary()
Sequential: That defines a SEQUENCE of layers in the neural network. A Sequential model is the usual starting point when first learning about neural networks.
Flatten: Remember earlier where our images were a square, when you printed them out? Flatten just takes that square and turns it into a 1-dimensional array.
Dense: Adds a layer of neurons.
Each layer of neurons needs an activation function to tell them what to do. There are lots of options, but just use these two (relu and softmax) for now.
Relu effectively means "If X>0 return X, else return 0" -- so all it does is pass values of 0 or greater to the next layer in the network; negative values are treated as 0.
Softmax takes a set of values and effectively picks the biggest one. So, for example, if the output of the last layer looks like [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05], it saves you from fishing through it looking for the biggest value, and turns it into [0,0,0,0,1,0,0,0,0] -- the goal is to save a lot of coding!
Since each Dense layer also has a bias term, the parameter count of the first Dense layer is (784+1)*128 = 100480.
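As a quick sanity check of that arithmetic (an illustrative snippet, not part of the course code):

# Each of the 128 neurons has 28*28 = 784 weights (one per input pixel) plus 1 bias.
first_dense_params = (28 * 28 + 1) * 128
print(first_dense_params)  # 100480, matching model.summary()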
2.4 Training and Evaluating the Model
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=['accuracy']) # Specify the optimizer and loss function
model.fit(train_images, train_labels, epochs=5) # Train the model
model.evaluate(test_images, test_labels)        # Evaluate the model
You can see that the generalization error after training is slightly larger than the training error, so the model is acceptable.
2.5 Stopping Training Automatically
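A common way to do this is with a custom Keras callback that checks a metric at the end of each epoch and stops training once it crosses a threshold. A minimal sketch (the 95% accuracy threshold is an arbitrary choice for illustration):

import tensorflow as tf

class StopAtAccuracy(tf.keras.callbacks.Callback):
    # Keras calls this at the end of every epoch, passing the epoch's metrics in logs.
    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get('accuracy', 0) > 0.95:
            print("\nReached 95% accuracy, stopping training.")
            self.model.stop_training = True

# Pass the callback to fit; training ends early once the condition is met.
model.fit(train_images, train_labels, epochs=50, callbacks=[StopAtAccuracy()])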
Chapter 3: Introducing Convolutions
3.1 Convolutional Neural Networks
In short, you take an array (usually 3x3 or 5x5) and pass it over the image. By changing the underlying pixels based on the formula within that matrix, you can do things like edge detection. So, for example, if you look at the above link, you'll see a 3x3 matrix that is defined for edge detection, where the middle cell is 8 and all of its neighbors are -1. In this case, for each pixel, you would multiply its value by 8, then subtract the value of each neighbor (since each neighbor is multiplied by -1). Do this for every pixel, and you'll end up with a new image that has the edges enhanced.
This is perfect for computer vision, because often it's features that can get highlighted like this that distinguish one item from another, and the amount of information needed is then much less... because you'll just train on the highlighted features.
That's the concept of Convolutional Neural Networks. Add some layers to do convolution before you have the dense layers, and then the information going to the dense layers is more focused, and possibly more accurate.
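To make the filter arithmetic concrete, here is a tiny standalone NumPy sketch (for illustration only, not part of the course code) of that 3x3 edge-detection kernel applied at one pixel position:

import numpy as np

# 3x3 edge-detection kernel: center cell is 8, all neighbors are -1.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

# A 3x3 patch of grayscale values centered on one pixel.
patch = np.array([[10, 10, 10],
                  [10, 50, 10],
                  [10, 10, 10]])

# New value of the center pixel: elementwise multiply, then sum.
print((kernel * patch).sum())  # 50*8 - 8*10 = 320: a strong edge response

A convolution layer learns the values of many such kernels during training instead of using hand-designed ones.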
3.2 A Convolutional Network Program
The convolutional model simply adds convolution and pooling layers in front of the fully-connected model. Because convolutions are used, the training data must also be reshaped to add a channel dimension.
# Build the convolutional network model
import tensorflow as tf
from tensorflow import keras
# Load the data
mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
# Define the convolutional model structure
model = keras.Sequential([
    tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)), # Convolution layer; parameters explained below
    tf.keras.layers.MaxPooling2D(2, 2), # Pooling layer
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'), # Repeat the convolution once more
    tf.keras.layers.MaxPooling2D(2,2),
    keras.layers.Flatten(), # Flatten the output before the fully-connected layers
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
# Inspect the model structure
model.summary()
# Train the model
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=['accuracy']) # Specify the optimizer and loss function
model.fit(train_images.reshape(-1,28,28,1), train_labels, epochs=5) # Training images must be reshaped to 4 dimensions
# Evaluate the model
model.evaluate(test_images.reshape(-1,28,28,1), test_labels)
output
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_10 (Conv2D) (None, 26, 26, 64) 640
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 13, 13, 64) 0
_________________________________________________________________
conv2d_11 (Conv2D) (None, 11, 11, 64) 36928
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 5, 5, 64) 0
_________________________________________________________________
flatten_5 (Flatten) (None, 1600) 0
_________________________________________________________________
dense_10 (Dense) (None, 128) 204928
_________________________________________________________________
dense_11 (Dense) (None, 10) 1290
=================================================================
Total params: 243,786
Trainable params: 243,786
Non-trainable params: 0
====================================================================================================
Epoch 1/5
1875/1875 [==============================] - 45s 24ms/step - loss: 0.4329 - accuracy: 0.8428
Epoch 2/5
1875/1875 [==============================] - 45s 24ms/step - loss: 0.2897 - accuracy: 0.8941
Epoch 3/5
1875/1875 [==============================] - 44s 23ms/step - loss: 0.2463 - accuracy: 0.9095
Epoch 4/5
1875/1875 [==============================] - 44s 24ms/step - loss: 0.2134 - accuracy: 0.9212
Epoch 5/5
1875/1875 [==============================] - 46s 24ms/step - loss: 0.1875 - accuracy: 0.9297
====================================================================================================
313/313 [==============================] - 6s 18ms/step - loss: 0.2539 - accuracy: 0.9071
[0.2538785934448242, 0.9071000218391418]
Now instead of the input layer at the top, you're going to add a Convolution. The parameters are:
1. The number of convolutions (filters) you want to generate. Purely arbitrary, but good to start with something in the order of 32.
2. The size of the convolution (the filter), in this case a 3x3 grid -- the most commonly used size.
3. The activation function to use -- in this case we'll use relu, which you might recall is the equivalent of returning x when x>0, else returning 0.
4. In the first layer only, the shape of the input data.
You'll follow the Convolution with a MaxPooling layer, which is designed to compress the image while maintaining the content of the features that were highlighted by the convolution. By specifying (2,2) for the MaxPooling, the effect is to quarter the size of the image. Without going into too much detail here, the idea is that it creates a 2x2 array of pixels and picks the biggest one, thus turning 4 pixels into 1. It repeats this across the image, halving the number of horizontal pixels and halving the number of vertical pixels, effectively reducing the image to a quarter of its original size.
You can call model.summary() to see the size and shape of the network, and you'll notice that after every MaxPooling layer, the image size is reduced in this way; see the small sketch below.
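To see what (2,2) max pooling does to the numbers, here is a tiny standalone sketch (for illustration only, not part of the course code) on a 4x4 array:

import numpy as np

img = np.array([[1, 3, 2, 4],
                [5, 6, 1, 2],
                [7, 2, 9, 1],
                [3, 4, 2, 8]])

# Split into non-overlapping 2x2 blocks and keep only the largest value in each.
pooled = img.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 4]
               #  [7 9]] -- 4x4 reduced to 2x2, a quarter of the pixels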
3.3 Visualizing the Convolution and Pooling Results
print(test_labels[:100])
output
[9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0 2 5 7 9 1 4 6 0 9 3 8 8 3 3 8 0 7
5 7 9 6 1 3 7 6 7 2 1 2 2 4 4 5 8 2 2 8 4 8 0 7 7 8 5 1 1 2 3 9 8 7 0 2 6
2 3 1 2 8 4 1 8 5 9 5 0 3 2 0 6 5 3 6 7 1 8 0 1 4 2]
import matplotlib.pyplot as plt
f, axarr = plt.subplots(3,4)
FIRST_IMAGE = 0
SECOND_IMAGE = 7
THIRD_IMAGE = 26
CONVOLUTION_NUMBER = 1
# Build a model that outputs the activations of every layer of the trained model
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs=model.input, outputs=layer_outputs)
for x in range(0, 4):
    f1 = activation_model.predict(test_images[FIRST_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[0,x].imshow(f1[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[0,x].grid(False)
    f2 = activation_model.predict(test_images[SECOND_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[1,x].imshow(f2[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[1,x].grid(False)
    f3 = activation_model.predict(test_images[THIRD_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[2,x].imshow(f3[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[2,x].grid(False)
This code will show us the convolutions graphically. The print(test_labels[:100]) shows us the first 100 labels in the test set, and you can see that the ones at index 0, index 7 and index 26 are all the same value (9). They're all shoes. Let's take a look at the result of running the convolution on each, and you'll begin to see common features between them emerge. Now, when the DNN is training on that data, it's working with a lot less information, and it's perhaps finding a commonality between shoes based on this convolution/pooling combination.
Chapter 4: More Complex Image Applications
4.1 Project Overview
The goal of this project is to train a model that recognizes pictures of horses and humans.
The first step is to prepare the training data, downloaded into the tmp directory with wget.
# Training data
!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/horse-or-human.zip \
    -O /tmp/horse-or-human.zip
# Validation data (saved under its own name so it does not overwrite the training zip)
!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/validation-horse-or-human.zip \
    -O /tmp/validation-horse-or-human.zip
output
--2020-11-27 11:30:24-- https://storage.googleapis.com/laurencemoroney-blog.appspot.com/horse-or-human.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.27.144, 216.58.200.240, 172.217.160.112, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.27.144|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 149574867 (143M) [application/zip]
Saving to: ‘/tmp/horse-or-human.zip’
/tmp/horse-or-human 100%[===================>] 142.65M 9.09MB/s in 11s
2020-11-27 11:30:37 (12.4 MB/s) - ‘/tmp/horse-or-human.zip’ saved [149574867/149574867]
Next, unzip the images with zipfile.
import os
import zipfile
# Unzip the training set
local_zip = '/tmp/horse-or-human.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp/horse-or-human')
zip_ref.close()
# Unzip the validation set (used later in section 4.4)
local_zip = '/tmp/validation-horse-or-human.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp/validation-horse-or-human')
zip_ref.close()
4.2 ImageDataGenerator
One thing to pay attention to in this sample: we do not explicitly label the images as horses or humans. If you remember the handwriting example earlier, we had labelled 'this is a 1', 'this is a 7', etc. Later you'll see something called an ImageDataGenerator being used -- and this is coded to read images from subdirectories and automatically label them from the name of that subdirectory. So, for example, you will have a 'training' directory containing a 'horses' directory and a 'humans' one. ImageDataGenerator will label the images appropriately for you, reducing a coding step. (It is not only more convenient to program, but it also avoids loading all the training data into memory at once and running out of memory.)
Let's define each of these directories:
# Directory with our training horse pictures
train_horse_dir = os.path.join('/tmp/horse-or-human/horses')
# Directory with our training human pictures
train_human_dir = os.path.join('/tmp/horse-or-human/humans')
Now, let's see what the filenames look like in the horses and humans training directories:
train_horse_names = os.listdir(train_horse_dir)
print(train_horse_names[:10])
train_human_names = os.listdir(train_human_dir)
print(train_human_names[:10])
output
['horse16-9.png', 'horse50-4.png', 'horse50-9.png', 'horse23-5.png', 'horse16-1.png', 'horse32-3.png', 'horse26-9.png', 'horse50-2.png', 'horse11-6.png', 'horse44-9.png']
['human16-24.png', 'human06-27.png', 'human14-21.png', 'human06-06.png', 'human15-17.png', 'human10-19.png', 'human07-01.png', 'human16-20.png', 'human01-07.png', 'human02-06.png']
Let's find out the total number of horse and human images in the directories:
print('total training horse images:', len(os.listdir(train_horse_dir)))
print('total training human images:', len(os.listdir(train_human_dir)))
output
total training horse images: 500
total training human images: 527
Now let's take a look at a few pictures to get a better sense of what they look like. First, configure the matplotlib parameters:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# Parameters for our graph; we'll output images in a 4x4 configuration
nrows = 4
ncols = 4
# Index for iterating over images
pic_index = 0
Now, display a batch of 8 horse and 8 human pictures. You can rerun the cell to see a fresh batch (another 8 horses and 8 humans) each time:
# Set up matplotlib fig, and size it to fit 4x4 pics
fig = plt.gcf()
fig.set_size_inches(ncols * 4, nrows * 4)
pic_index += 8
next_horse_pix = [os.path.join(train_horse_dir, fname)
                  for fname in train_horse_names[pic_index-8:pic_index]]
next_human_pix = [os.path.join(train_human_dir, fname)
                  for fname in train_human_names[pic_index-8:pic_index]]
for i, img_path in enumerate(next_horse_pix + next_human_pix):
    # Set up subplot; subplot indices start at 1
    sp = plt.subplot(nrows, ncols, i + 1)
    sp.axis('Off')  # Don't show axes (or gridlines)
    img = mpimg.imread(img_path)
    plt.imshow(img)
plt.show()
Let's set up data generators that will read pictures in our source folders, convert them to float32 tensors, and feed them (with their labels) to our network. We'll have one generator for the training images and one for the validation images. Our generators will yield batches of 150x150 images (the size configured below) and their labels (binary).
As you may already know, data that goes into neural networks should usually be normalized in some way to make it more amenable to processing by the network. (It is uncommon to feed raw pixels into a convnet.) In our case, we will preprocess our images by normalizing the pixel values to be in the [0, 1] range (originally all values are in the [0, 255] range).
In Keras this can be done via the keras.preprocessing.image.ImageDataGenerator class using the rescale parameter. This ImageDataGenerator class allows you to instantiate generators of augmented image batches (and their labels) via .flow(data, labels) or .flow_from_directory(directory). These generators can then be used with the Keras model methods that accept data generators as inputs: fit_generator, evaluate_generator, and predict_generator.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1/255)
# Flow training images in batches of 32 using the train_datagen generator
train_generator = train_datagen.flow_from_directory(
    '/tmp/horse-or-human/',  # This is the source directory for training images
    target_size=(150, 150),  # All images will be resized to 150x150
    batch_size=32,
    # Since we use binary_crossentropy loss, we need binary labels
    class_mode='binary')
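The validation generator mentioned above is set up the same way; a minimal sketch, assuming the validation set was unzipped to /tmp/validation-horse-or-human/ as in section 4.1:

# Validation images get the same rescaling, but are never used for weight updates.
validation_datagen = ImageDataGenerator(rescale=1/255)
validation_generator = validation_datagen.flow_from_directory(
    '/tmp/validation-horse-or-human/',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')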
4.3 Building and Training the Model
Let's start defining the model:
Step 1 will be to import tensorflow.
We then add convolutional layers as in the previous example, and flatten the final result to feed into the densely connected layers.
Finally we add the densely connected layers.
Note that because we are facing a two-class classification problem, i.e. a binary classification problem, we will end our network with a sigmoid activation, so that the output of our network will be a single scalar between 0 and 1, encoding the probability that the current image is class 1 (as opposed to class 0).
import tensorflow as tf
model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image, 150x150 with 3 bytes color
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Only 1 output neuron, holding a value from 0 to 1, where 0 is one class ('horses') and 1 the other ('humans')
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Next, we'll configure the specifications for model training. We will train our model with the binary_crossentropy loss, because it's a binary classification problem and our final activation is a sigmoid. (For a refresher on loss metrics, see the Machine Learning Crash Course.) We will use the rmsprop optimizer with a learning rate of 0.001. During training, we will want to monitor classification accuracy.
NOTE: In this case, using the RMSprop optimization algorithm is preferable to stochastic gradient descent (SGD), because RMSprop automates learning-rate tuning for us. (Other optimizers, such as Adam and Adagrad, also automatically adapt the learning rate during training, and would work equally well here.)
from tensorflow.keras.optimizers import RMSprop
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['acc'])
Let's train for 15 epochs -- this may take a few minutes to run.
Do note the values per epoch.
The Loss and Accuracy are a great indication of the progress of training. The model makes a guess as to the classification of the training data, then measures it against the known label and calculates the result. Accuracy is the portion of correct guesses.
history = model.fit(
    train_generator,
    steps_per_epoch=8,
    epochs=15,
    verbose=1)
output
Epoch 1/15
8/8 [==============================] - 2s 277ms/step - loss: 0.6836 - acc: 0.5781
Epoch 2/15
8/8 [==============================] - 2s 279ms/step - loss: 0.6074 - acc: 0.7031
Epoch 3/15
8/8 [==============================] - 2s 272ms/step - loss: 0.3751 - acc: 0.8555
Epoch 4/15
8/8 [==============================] - 2s 264ms/step - loss: 0.7221 - acc: 0.8722
Epoch 5/15
8/8 [==============================] - 2s 266ms/step - loss: 0.2367 - acc: 0.9023
Epoch 6/15
8/8 [==============================] - 2s 231ms/step - loss: 0.1246 - acc: 0.9559
Epoch 7/15
8/8 [==============================] - 2s 264ms/step - loss: 0.1748 - acc: 0.9414
Epoch 8/15
8/8 [==============================] - 2s 274ms/step - loss: 0.1481 - acc: 0.9414
Epoch 9/15
8/8 [==============================] - 2s 287ms/step - loss: 0.2387 - acc: 0.9163
Epoch 10/15
8/8 [==============================] - 2s 258ms/step - loss: 0.1319 - acc: 0.9648
Epoch 11/15
8/8 [==============================] - 2s 267ms/step - loss: 0.1084 - acc: 0.9609
Epoch 12/15
8/8 [==============================] - 2s 269ms/step - loss: 0.1240 - acc: 0.9570
Epoch 13/15
8/8 [==============================] - 2s 267ms/step - loss: 0.1534 - acc: 0.9258
Epoch 14/15
8/8 [==============================] - 2s 242ms/step - loss: 0.1095 - acc: 0.9780
Epoch 15/15
8/8 [==============================] - 2s 270ms/step - loss: 0.0511 - acc: 0.9883
Let's now take a look at actually running a prediction using the model. This code will allow you to choose one or more files from your file system; it will then upload them and run them through the model, giving an indication of whether the object is a horse or a human.
import numpy as np
from google.colab import files
from tensorflow.keras.preprocessing import image
uploaded = files.upload()
for fn in uploaded.keys():
    # Predicting images
    path = '/content/' + fn
    img = image.load_img(path, target_size=(150, 150))  # must match the model's input size
    x = image.img_to_array(img)
    x = x / 255  # rescale to [0, 1], as was done for the training images
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    classes = model.predict(images, batch_size=10)
    print(classes[0])
    if classes[0] > 0.5:
        print(fn + " is a human")
    else:
        print(fn + " is a horse")
4.4 Optimizing the Model Parameters
To get a feel for what kind of features our convnet has learned, one fun thing to do is to visualize how an input gets transformed as it goes through the convnet.
Let's pick a random image from the training set, and then generate a figure where each row is the output of a layer, and each image in the row is a specific filter in that output feature map. Rerun this cell to generate intermediate representations for a variety of training images.
import numpy as np
import random
from tensorflow.keras.preprocessing.image import img_to_array, load_img
# Let's define a new Model that will take an image as input, and will output
# intermediate representations for all layers in the previous model after
# the first.
successive_outputs = [layer.output for layer in model.layers[1:]]
visualization_model = tf.keras.models.Model(inputs=model.input, outputs=successive_outputs)
# Let's prepare a random input image from the training set.
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)
img = load_img(img_path, target_size=(150, 150))  # this is a PIL image; size must match the model input
x = img_to_array(img)          # Numpy array with shape (150, 150, 3)
x = x.reshape((1,) + x.shape)  # Numpy array with shape (1, 150, 150, 3)
# Rescale by 1/255
x /= 255
# Let's run our image through our network, thus obtaining all
# intermediate representations for this image.
successive_feature_maps = visualization_model.predict(x)
# These are the names of the layers, so we can have them as part of our plot
# (skip the first layer to stay aligned with successive_outputs)
layer_names = [layer.name for layer in model.layers[1:]]
# Now let's display our representations
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
    if len(feature_map.shape) == 4:
        # Just do this for the conv / maxpool layers, not the fully-connected layers
        n_features = feature_map.shape[-1]  # number of features in the feature map
        # The feature map has shape (1, size, size, n_features)
        size = feature_map.shape[1]
        # We will tile our images in this matrix
        display_grid = np.zeros((size, size * n_features))
        for i in range(n_features):
            # Postprocess the feature to make it visually palatable
            x = feature_map[0, :, :, i]
            x -= x.mean()
            x /= x.std()
            x *= 64
            x += 128
            x = np.clip(x, 0, 255).astype('uint8')
            # We'll tile each filter into this big horizontal grid
            display_grid[:, i * size : (i + 1) * size] = x
        # Display the grid
        scale = 20. / n_features
        plt.figure(figsize=(scale * n_features, scale))
        plt.title(layer_name)
        plt.grid(False)
        plt.imshow(display_grid, aspect='auto', cmap='viridis')
Before running the next exercise, run the following cell to terminate the kernel and free memory resources; do this when compute resources run low:
import os, signal
os.kill(os.getpid(), signal.SIGKILL)
When building a neural network model, you inevitably have to decide: how many convolutional layers? How many filters per layer? How many neurons in the fully-connected layers?
The first idea that comes to mind is to change those parameters by hand and watch the training results (loss and accuracy) to judge whether the settings are reasonable. But that is tedious: there are many parameter combinations, and training takes a long time. A step further is to write loops that search the parameter space by brute force. Better still is to use a dedicated framework for the search: it is less error-prone and works better than the first two approaches.
Keras Tuner is a library that automatically searches for model training parameters. The basic idea is to insert a special object (with a specified parameter range) wherever a parameter needs tuning, and then call a search method that behaves much like training.
First, prepare the training data and the libraries to load.
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import RMSprop
train_datagen = ImageDataGenerator(rescale=1/255)
validation_datagen = ImageDataGenerator(rescale=1/255)
train_generator = train_datagen.flow_from_directory('/tmp/horse-or-human/',
    target_size=(150, 150), batch_size=32, class_mode='binary')
validation_generator = validation_datagen.flow_from_directory('/tmp/validation-horse-or-human/',
    target_size=(150, 150), batch_size=32, class_mode='binary')
from kerastuner.tuners import Hyperband
from kerastuner.engine.hyperparameters import HyperParameters
import tensorflow as tf
Next, create a HyperParameters object, then insert Choice, Int and similar tuning objects into the model definition.
hp = HyperParameters()
def build_model(hp):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Conv2D(hp.Choice('num_filters_top_layer', values=[16,64], default=16), (3,3),
                                     activation='relu', input_shape=(150, 150, 3)))
    model.add(tf.keras.layers.MaxPooling2D(2, 2))
    for i in range(hp.Int("num_conv_layers", 1, 3)):
        model.add(tf.keras.layers.Conv2D(hp.Choice(f'num_filters_layer{i}', values=[16,64], default=16), (3,3), activation='relu'))
        model.add(tf.keras.layers.MaxPooling2D(2,2))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(hp.Int("hidden_units", 128, 512, step=32), activation='relu'))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=RMSprop(lr=0.001), metrics=['acc'])
    return model
Then create a Hyperband object. Hyperband is one of the four tuning algorithms that Keras Tuner supports; its advantage is that it searches the parameter space relatively quickly. See the Keras Tuner website for details.
Finally, call the search method.
tuner = Hyperband(
    build_model,
    objective='val_acc',
    max_epochs=10,
    directory='horse_human_params',
    hyperparameters=hp,
    project_name='my_horse_human_project'
)
tuner.search(train_generator,epochs=10,validation_data=validation_generator)
Once the search has found the best parameters, the following code uses the tuner object to extract them and rebuild the neural network model, then calls summary to inspect the optimized network structure.
best_hps = tuner.get_best_hyperparameters(1)[0]
print(best_hps.values)
model = tuner.hypermodel.build(best_hps)
model.summary()
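After rebuilding, the tuned model still needs to be trained before it can be used; a minimal sketch, reusing the generators defined above:

# Train the rebuilt model with the best-found hyperparameters.
model.fit(train_generator, epochs=10, validation_data=validation_generator)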