3.1 Anatomy of a neural network
3.1.1 Layers: the building blocks of deep learning
3.1.2 Models: networks of layers
3.1.3 Loss functions and optimizers: keys to configuring the learning process
3.2 Introduction to Keras
3.2.1 Keras, TensorFlow, Theano, and CNTK
3.3 Setting up a deep-learning workstation
Jupyter
3.4 Classifying movie reviews: a binary classification example
3.4.1 The IMDB dataset
3.4.2 Preparing the data
3.4.3 Building your network
3.4.4 Validating your approach
3.4.5 Using a trained network to generate predictions on new data
3.4.6 Further experiments
3.5 Classifying newswires: a multiclass classification example
3.5.1 The Reuters dataset
3.5.2 Preparing the data
3.5.3 Building your network
3.5.4 Validating your approach (full code)
3.5.5 Generating predictions on new data
3.5.6 A different way to handle the labels and the loss
3.5.7 The importance of having sufficiently large intermediate layers
3.5.8 Further experiments
3.5.9 Wrapping up
3.6 Predicting house prices: a regression example
3.6.1 The Boston Housing Price dataset
3.6.2 Preparing the data
3.6.3 Building your network
3.6.4 Validating your approach using K-fold validation
Training a neural network revolves around the following objects:
Layers, which are combined into a network (or model)
The input data and the corresponding targets
The loss function
The optimizer
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(32, input_shape=(784,)))
model.add(layers.Dense(32))
Two-branch networks (see the sketch below)
Multihead networks
Inception modules
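As a preview of what the functional API (introduced below) makes possible, here is a minimal sketch of a two-branch model; the input shapes and layer sizes are invented purely for illustration:
from keras import models
from keras import layers

# Each branch processes its own input independently...
branch_a_input = layers.Input(shape=(64,))
branch_b_input = layers.Input(shape=(32,))
a = layers.Dense(16, activation='relu')(branch_a_input)
b = layers.Dense(16, activation='relu')(branch_b_input)
# ...and the branches are merged before the final prediction
merged = layers.concatenate([a, b])
output = layers.Dense(1, activation='sigmoid')(merged)
model = models.Model(inputs=[branch_a_input, branch_b_input], outputs=output)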
Loss function (objective function): the quantity to be minimized during training; it measures whether the task at hand is being solved successfully.
Optimizer: determines how the network is updated based on the loss function.
For a binary classification problem, use the binary crossentropy (binary_crossentropy) loss;
for a multiclass classification problem, the categorical crossentropy (categorical_crossentropy) loss;
for a sequence-learning problem, the connectionist temporal classification (CTC) loss. These pairings are summarized in the sketch below.
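As a quick reference (my own summary, not book code), these pairings look as follows in compile() calls; the model below is a throwaway placeholder:
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(100,)))
model.add(layers.Dense(1, activation='sigmoid'))
# Binary classification: sigmoid output + binary crossentropy
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
# A multiclass model would instead end in Dense(num_classes, activation='softmax')
# and compile with loss='categorical_crossentropy'; for sequence problems Keras
# exposes CTC via keras.backend.ctc_batch_cost.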
Key features:
Keras currently has three backend implementations: the TensorFlow backend (Google), the Theano backend (Université de Montréal), and the CNTK backend (Microsoft).
When running on GPU, TensorFlow itself wraps NVIDIA's CUDA Deep Neural Network library (cuDNN), a set of highly optimized deep-learning primitives.
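To confirm which backend a given installation is using, a one-line check with the standard Keras 2.x API:
from keras import backend as K
print(K.backend())  # prints 'tensorflow', 'theano', or 'cntk'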
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(32, activation = 'relu', input_shape=(784,)))
model.add(layers.Dense(10, activation = 'softmax'))
# The same model defined with the functional API
input_tensor = layers.Input(shape = (784, ))
x = layers.Dense(32, activation = 'relu')(input_tensor)
output_tensor = layers.Dense(10, activation = 'softmax')(x)
model = models.Model(inputs = input_tensor, outputs = output_tensor)
Once the model is defined, the learning process is configured in the compilation step, where you specify the optimizer and loss function the model will use.
from keras import optimizers
model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss='mse',
              metrics=['accuracy'])
Finally, the learning process consists of passing Numpy arrays of input data (and the corresponding targets) to the model via the fit() method, much as in Scikit-Learn.
model.fit(input_tensor, target_tensor, batch_size = 128, epochs = 10)
For GPU setup I recommend my other blog post: win10 + Anaconda3 + tensorflow(gpu) + cuda9.0 + cudnn7.1 + IDE (Sublime), plus a handy scientific-package configuration.
TensorFlow is the backend I recommend first; for details, see my other set of study notes on Google's framework, TensorFlow.
The IMDB dataset: 50,000 highly polarized reviews from the Internet Movie Database, split into 25,000 for training and 25,000 for testing.
# 3-1 Loading the IMDB dataset
from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
print(train_data[0])
print(train_labels[0])
# Quickly decode a review back into English words
word_index = imdb.get_word_index()
reverse_word_index = dict(
    [(value, key) for (key, value) in word_index.items()])
# Indices are offset by 3 because 0, 1, and 2 are reserved for
# "padding", "start of sequence", and "unknown"
decoded_review = ' '.join(
    [reverse_word_index.get(i - 3, '?') for i in train_data[0]])
# Output:
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
1
# 3-2 Encoding the integer sequences into a binary matrix
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set the indices appearing in the sequence to 1
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
print(x_train[0])
# Vectorize the labels as well
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
# Output: [0. 1. 1. ... 0. 0. 0.]
# 3-3 The model definition
from keras import models
from keras import layers

model = models.Sequential()
# Two intermediate layers with 16 hidden units each
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
# The final layer uses a sigmoid activation to output a probability between 0 and 1
model.add(layers.Dense(1, activation='sigmoid'))
At the compilation step you choose an optimizer and a loss function; here the model is configured with the rmsprop optimizer and the binary_crossentropy loss.
# 3-4 Compiling the model
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
You can also pass function objects of your own to the loss and metrics arguments.
# 3-5 Configuring the optimizer
from keras import optimizers
model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# 3-6 Using custom losses and metrics
from keras import losses
from keras import metrics
model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss=losses.binary_crossentropy,
              metrics=[metrics.binary_accuracy])
To monitor the model's accuracy on data it has never seen during training, set aside 10,000 samples from the original training data as a validation set.
# 3-7 Setting aside a validation set
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
# 3-8 Training the model
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
# history records the loss and metric values for every epoch
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))
history_dict = history.history
history_dict.keys()
# dict_keys(['val_acc', 'acc', 'val_loss', 'loss'])
# 3-9 Plotting the training and validation loss
import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, 'bo', label='Training loss')
plt.plot(epochs, val_loss_values, 'b', label='Validation loss')
plt.title('Training and Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()  # without this call the line labels are never displayed
plt.show()
# 3-10 Plotting the training and validation accuracy
plt.clf()
acc = history_dict['acc']
val_acc = history_dict['val_acc']
plt.plot(epochs, acc, 'bo', label = 'Training acc')
plt.plot(epochs, val_acc, 'b', label = 'Validation acc')
plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
# 3-11 Retraining a model from scratch (4 epochs)
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=4, batch_size=512)
results = model.evaluate(x_test, y_test)
print(results)
# [0.28807684420585633, 0.88504]
model.predict(x_test)
array([[0.23583308],
[0.99974436],
[0.85348594],
...,
[0.12692562],
[0.06978337],
[0.6001638 ]], dtype=float32)
# Further experiment: using three hidden layers
model1 = models.Sequential()
model1.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model1.add(layers.Dense(16, activation='relu'))
model1.add(layers.Dense(16, activation='relu'))
model1.add(layers.Dense(1, activation='sigmoid'))
model1.compile(optimizer='rmsprop',
               loss='binary_crossentropy',
               metrics=['accuracy'])
model1.fit(x_train, y_train, epochs=4, batch_size=512)
results1 = model1.evaluate(x_test, y_test)
print(results1)
# Further experiment: using a single hidden layer
model2 = models.Sequential()
model2.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model2.add(layers.Dense(1, activation='sigmoid'))
model2.compile(optimizer='rmsprop',
               loss='binary_crossentropy',
               metrics=['accuracy'])
model2.fit(x_train, y_train, epochs=4, batch_size=512)
results2 = model2.evaluate(x_test, y_test)
print(results2)
# Further experiment: using 32 hidden units per layer
model1 = models.Sequential()
model1.add(layers.Dense(32, activation='relu', input_shape=(10000,)))
model1.add(layers.Dense(32, activation='relu'))
model1.add(layers.Dense(1, activation='sigmoid'))
model1.compile(optimizer='rmsprop',
               loss='binary_crossentropy',
               metrics=['accuracy'])
model1.fit(x_train, y_train, epochs=4, batch_size=512)
results1 = model1.evaluate(x_test, y_test)
print(results1)
# Further experiment: using 64 hidden units per layer
model2 = models.Sequential()
model2.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model2.add(layers.Dense(64, activation='relu'))
model2.add(layers.Dense(1, activation='sigmoid'))
model2.compile(optimizer='rmsprop',
               loss='binary_crossentropy',
               metrics=['accuracy'])
model2.fit(x_train, y_train, epochs=4, batch_size=512)
results2 = model2.evaluate(x_test, y_test)
print(results2)
# Further experiment: using mse loss instead of binary_crossentropy
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=4, batch_size=512)
results = model.evaluate(x_test, y_test)
print(results)
# Further experiment: using tanh activations instead of relu
model = models.Sequential()
model.add(layers.Dense(16, activation='tanh', input_shape=(10000,)))
model.add(layers.Dense(16, activation='tanh'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=4, batch_size=512)
results = model.evaluate(x_test, y_test)
print(results)
The Reuters dataset consists of many short newswires and their topics (46 classes).
# 3-12 Loading the Reuters dataset
from keras.datasets import reuters
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
print(len(train_data))
# 8982
print(len(test_data))
# 2246
# 3-13 Decoding newswires back to text
word_index = reuters.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
decoded_newswire = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])
Vectorize the training and test data.
# 3-14 Encoding the data
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
One-hot encoding: vectorizing the labels.
def to_one_hot(labels, dimension=46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results

# One-hot (categorical) encoding of the label lists
one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)

# Equivalently, vectorize the labels with Keras's built-in to_categorical
'''
from keras.utils.np_utils import to_categorical
one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)
'''
Moving from IMDB's binary classification to 46 output classes, each intermediate layer now uses 64 units; 16-dimensional layers would be too limited to separate 46 classes.
# 3-15 The model definition
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))

# 3-16 Compiling the model
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# 3-17 Setting aside a validation set
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]
# 3-18 Training the model
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))
# 3-19 Plotting the training and validation loss
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label = 'Training loss')
plt.plot(epochs, val_loss, 'b', label = 'Validation loss')
plt.title('Training and Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
# 3-20 Plotting the training and validation accuracy
plt.clf()  # clear the previous figure
acc = history.history['acc']
val_acc = history.history['val_acc']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
# 3-21 Retraining a model from scratch
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(partial_x_train,
          partial_y_train,
          epochs=9,
          batch_size=512,
          validation_data=(x_val, y_val))
results = model.evaluate(x_test, one_hot_test_labels)
Use the predict() method to generate topic predictions on new data.
# 3-22 Generating predictions on new data
predictions = model.predict(x_test)
print(predictions[0].shape)
# (46,)
print(np.sum(predictions[0]))
# 0.9999998
print(np.argmax(predictions[0]))
# 3
# A different way to handle the labels: cast them to an integer tensor
# and use the sparse_categorical_crossentropy loss
y_train = np.array(train_labels)
y_test = np.array(test_labels)
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])
Intermediate layers with far fewer units than the number of output classes (here 46) create an information bottleneck that permanently drops relevant information.
# 3-23 A model with an information bottleneck
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(4, activation='relu'))  # 4-unit bottleneck
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(partial_x_train,
          partial_y_train,
          epochs=20,
          batch_size=128,
          validation_data=(x_val, y_val))
# acc: 0.8582 - val_loss: 1.5473 - val_acc: 0.7200
Validation accuracy is now about 72%, an absolute drop of nearly 10 percentage points.
There are two ways to handle the labels: one-hot encode them and use categorical_crossentropy as the loss,
or encode them as integer tensors and use sparse_categorical_crossentropy. A small sketch of converting between the two encodings follows.
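As a quick sanity check (my own sketch, not book code; assumes train_labels from the Reuters load above), the two encodings are interchangeable:
import numpy as np
from keras.utils.np_utils import to_categorical

int_labels = np.array(train_labels)       # integer encoding, pairs with sparse_categorical_crossentropy
one_hot = to_categorical(int_labels, 46)  # one-hot encoding, pairs with categorical_crossentropy
# argmax inverts the one-hot encoding, recovering the integer labels
assert (np.argmax(one_hot, axis=1) == int_labels).all()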
Regression predicts a continuous value rather than a discrete label.
Note that each feature of the input data has a different range of values.
# 3-24 Loading the Boston housing dataset
from keras.datasets import boston_housing
(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()
print(train_data.shape)
# (404, 13)
print(test_data.shape)
# (102, 13)
print(train_targets)
Normalize the features: for each feature of the input data, subtract the mean and divide by the standard deviation.
# 3-25 Normalizing the data
mean = train_data.mean(axis=0)
train_data -= mean
std = train_data.std(axis=0)
train_data /= std
# Note: the test data is normalized with quantities computed on the training data
test_data -= mean
test_data /= std
Because so few samples are available, use a small network to limit overfitting.
# 3-26 Model definition
from keras import models
from keras import layers

def build_model():
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=(train_data.shape[1],)))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1))  # no activation: a linear layer for scalar regression
    # monitor mean absolute error (MAE) alongside the mse loss
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model
K-fold cross-validation: split the data into K partitions; for each of K runs, train on K-1 partitions and evaluate on the remaining one. The final validation score is the average of the K scores.
# 3-27 K-fold validation
import numpy as np
k = 4
num_val_samples = len(train_data) // k
num_epochs = 100
all_scores = []
for i in range(k):
    print('processing fold #', i)
    # data from partition #i becomes the validation set
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
    # concatenate() joins the remaining partitions into the training set
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)
    model = build_model()
    model.fit(partial_train_data, partial_train_targets,
              epochs=num_epochs, batch_size=1, verbose=0)
    val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)
    all_scores.append(val_mae)
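The validation score to report is the average of the per-fold scores, which the book computes with np.mean:
print(all_scores)           # one validation MAE per fold
print(np.mean(all_scores))  # the K-fold validation MAE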
Save the validation score at each epoch. (Note: the history key for the validation MAE depends on the Keras version and the metrics passed to compile(); older Keras releases expose it as 'val_mean_absolute_error', newer ones as 'val_mae'. Check history.history.keys() if the lookup below fails; 'val_loss', the MSE, is always available as a fallback.)
# 3-28 Saving the validation logs at each fold
import numpy as np
k = 4
num_val_samples = len(train_data) // k
num_epochs = 500
all_mae_histories = []
for i in range(k):
    print('processing fold #', i)
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
    # concatenate() joins the remaining partitions into the training set
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)
    model = build_model()
    history = model.fit(partial_train_data, partial_train_targets,
                        validation_data=(val_data, val_targets),
                        epochs=num_epochs, batch_size=1, verbose=0)
    mae_history = history.history['val_mean_absolute_error']  # see the note above on this key name
    all_mae_histories.append(mae_history)
# 3-29 Building the history of successive mean K-fold validation scores
average_mae_history = [
    np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]
# 3-30 Plotting the validation scores
import matplotlib.pyplot as plt
plt.plot(range(1, len(average_mae_history) + 1), average_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
Because the y-axis range is dominated by the first few epochs, omit the first 10 data points and replot the curve with exponential smoothing.
# 3-31 Plotting validation scores, excluding the first 10 data points
def smooth_curve(points, factor=0.9):
    # exponential moving average: each point is a blend of the previous
    # smoothed value and the current value
    smoothed_points = []
    for point in points:
        if smoothed_points:
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)
    return smoothed_points

smooth_mae_history = smooth_curve(average_mae_history[10:])
plt.plot(range(1, len(smooth_mae_history) + 1), smooth_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
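To turn the curve into a concrete epoch count, one option (my own sketch, not book code) is to take the epoch with the lowest mean validation MAE; from this plot the chapter settles on roughly 80 epochs:
best_epoch = int(np.argmin(average_mae_history)) + 1  # epochs are 1-indexed
print('epoch with lowest mean validation MAE:', best_epoch)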
With the hyperparameters settled (about 80 epochs), train a final production model on all of the training data and evaluate it on the test set.
# 3-32 Training the final model
model = build_model()
model.fit(train_data, train_targets, epochs=80, batch_size=16, verbose=0)
test_mse_score, test_mae_score = model.evaluate(test_data, test_targets)
print(test_mae_score)
# Output below is from a run compiled with metrics=['mse'], so the figure is the
# test MSE; with metrics=['mae'] the printed value is the test MAE in thousands
# of dollars (roughly 2.5-3, i.e. an average error of about $2,500-$3,000):
# 102/102 [==============================] - 1s 5ms/step
# 19.081563463398055
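As in the earlier examples, the trained model can then generate predictions on new data; for this dataset each prediction is a price in thousands of dollars (a minimal sketch):
predictions = model.predict(test_data)
print(predictions[0])  # predicted price for the first test sample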