!pip3 install tensorflow==2.0.0a0
%matplotlib inline
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: tensorflow==2.0.0a0 in /usr/local/lib/python3.7/site-packages (2.0.0a0)
Requirement already satisfied: protobuf>=3.6.1 in /usr/local/lib/python3.7/site-packages (from tensorflow==2.0.0a0) (3.7.0)
Requirement already satisfied: termcolor>=1.1.0 in ./Library/Python/3.7/lib/python/site-packages (from tensorflow==2.0.0a0) (1.1.0)
Requirement already satisfied: tb-nightly<1.14.0a20190302,>=1.14.0a20190301 in /usr/local/lib/python3.7/site-packages (from tensorflow==2.0.0a0) (1.14.0a20190301)
Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.7/site-packages (from tensorflow==2.0.0a0) (1.19.0)
Requirement already satisfied: absl-py>=0.7.0 in ./Library/Python/3.7/lib/python/site-packages (from tensorflow==2.0.0a0) (0.7.0)
Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.7/site-packages (from tensorflow==2.0.0a0) (0.33.1)
Requirement already satisfied: numpy<2.0,>=1.14.5 in /usr/local/lib/python3.7/site-packages (from tensorflow==2.0.0a0) (1.16.2)
Requirement already satisfied: astor>=0.6.0 in ./Library/Python/3.7/lib/python/site-packages (from tensorflow==2.0.0a0) (0.7.1)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.7/site-packages (from tensorflow==2.0.0a0) (1.0.9)
Requirement already satisfied: tf-estimator-nightly<1.14.0.dev2019030116,>=1.14.0.dev2019030115 in /usr/local/lib/python3.7/site-packages (from tensorflow==2.0.0a0) (1.14.0.dev2019030115)
Requirement already satisfied: google-pasta>=0.1.2 in /usr/local/lib/python3.7/site-packages (from tensorflow==2.0.0a0) (0.1.4)
Requirement already satisfied: keras-applications>=1.0.6 in /usr/local/lib/python3.7/site-packages (from tensorflow==2.0.0a0) (1.0.7)
Requirement already satisfied: gast>=0.2.0 in ./Library/Python/3.7/lib/python/site-packages (from tensorflow==2.0.0a0) (0.2.2)
Requirement already satisfied: six>=1.10.0 in ./Library/Python/3.7/lib/python/site-packages (from tensorflow==2.0.0a0) (1.12.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from protobuf>=3.6.1->tensorflow==2.0.0a0) (40.8.0)
Requirement already satisfied: markdown>=2.6.8 in ./Library/Python/3.7/lib/python/site-packages (from tb-nightly<1.14.0a20190302,>=1.14.0a20190301->tensorflow==2.0.0a0) (3.0.1)
Requirement already satisfied: werkzeug>=0.11.15 in ./Library/Python/3.7/lib/python/site-packages (from tb-nightly<1.14.0a20190302,>=1.14.0a20190301->tensorflow==2.0.0a0) (0.14.1)
Requirement already satisfied: h5py in ./Library/Python/3.7/lib/python/site-packages (from keras-applications>=1.0.6->tensorflow==2.0.0a0) (2.9.0)
tf.keras
进行训练和评估本教程包含了两种方式使用Tensorflow 2.0
对模型进行训练、评估和推理。
使用内建的API函数进行训练和验证,如如model.fit()
、model.evaluate()
和model.predict()
,该部分包含在“使用内建的方法训练和评估”目录下
使用GradientTape
和eager
执行器从头开始编写自定义的循环,该部分包含在“从头开始编写训练和评估循环”目录下
无论你采用上述的那种方式,无论对于Sequential
模型、功能API创建的模型亦或是编写的继承的模型,模型的训练和评估都是严格按照相同的方式进行。
该教程不包含分布式训练 。
当使用模型内建的方法进行训练的时候,你必须为模型提供Numpy
类型的数据或者tf.data.Dataset
类。在下面的例子中,我们会使用mnist
的数据来示范如何使用optimizers
,losses
和metrics
。
我们使用下面的模型(这里使用的是功能API创建的,也可以使用Sequential
或者子类继承的方式创建)
from tensorflow import keras
import tensorflow as tf
inputs = keras.Input(shape=(784, ), name='digits')
x = keras.layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = keras.layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = keras.layers.Dense(10, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
这是一个典型的端到端的模型,包含训练、对训练数据进行分割用于验证以及使用测试数据进行评估。
# 加载mnist数据
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# 对数据进行处理(这里是numpy类型的数据)
x_train = x_train.reshape(60000, 784).astype('float32')/255.0
x_test = x_test.reshape(10000, 784).astype('float32')/255.0
# 拆分出验证集
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
# 配置训练参数(包括optimizer, loss, metrics)
model.compile(optimizer=keras.optimizers.RMSprop(),
loss=keras.losses.SparseCategoricalCrossentropy(),
metrics=[keras.metrics.SparseCategoricalAccuracy()])
# 训练,输出会保留在history中
print("Fit model on training data")
history = model.fit(x_train, y_train, batch_size=64, epochs=3,
# 使用验证数据进行验证
validation_data=(x_val, y_val))
# 输出
print('history dict: ', history.history)
# 测试
print('\nEvaluate on test data')
results = model.evaluate(x_test, y_test, batch_size=128)
print('test loss, test acc: ', results)
# 推理
print('\nGenerate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions shape: ', predictions.shape)
Fit model on training data
Train on 50000 samples, validate on 10000 samples
Epoch 1/3
50000/50000 [==============================] - 3s 51us/sample - loss: 0.3450 - sparse_categorical_accuracy: 0.9020 - val_loss: 0.1878 - val_sparse_categorical_accuracy: 0.9446
Epoch 2/3
50000/50000 [==============================] - 2s 42us/sample - loss: 0.1640 - sparse_categorical_accuracy: 0.9505 - val_loss: 0.1512 - val_sparse_categorical_accuracy: 0.9560
Epoch 3/3
50000/50000 [==============================] - 2s 43us/sample - loss: 0.1200 - sparse_categorical_accuracy: 0.9630 - val_loss: 0.1267 - val_sparse_categorical_accuracy: 0.9611
history dict: {'loss': [0.34502086482048033, 0.16395757224321367, 0.12004149877429009], 'sparse_categorical_accuracy': [0.902, 0.95052, 0.963], 'val_loss': [0.1877518826007843, 0.1512176885008812, 0.12673389554023742], 'val_sparse_categorical_accuracy': [0.9446, 0.956, 0.9611]}
Evaluate on test data
10000/10000 [==============================] - 0s 14us/sample - loss: 0.1220 - sparse_categorical_accuracy: 0.9625
test loss, test acc: [0.12199614572525025, 0.9625]
Generate predictions for 3 samples
predictions shape: (3, 10)
optimizer
、loss
和metrics
你必须指定优化器、损失函数以及用于观测的量(可选)才能够对模型进行训练。这些都是在compile
方法中进行指定。
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(),
metrics=[keras.metrics.SparseCategoricalAccuracy()])
metrics
必须是一个列表,你可以为模型指定任意多个metrics
。
如果你的模型有多个输出,那么你可以为每个输出指定不同的loss
和metrics
,以及每个输出的loss
所占的权重,你可以在多输入多输出模型的数据载入
那一段找到更多关于此的细节。
值得注意的是,很多情况下,loss
和metrics
可以通过简化的字符型名字来指定。
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3),
loss='sparse_categorical_crossentropy',
metrics=['sparse_categorical_accuracy'])
为了后面简化代码,我这里把模型的定义和配置写作函数。
def get_uncompiled_model():
inputs = keras.Input(shape=(784, ), name='digits')
x = keras.layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = keras.layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = keras.layers.Dense(10, activation='softmax', name='predictions')(x)
return keras.Model(inputs=inputs, outputs=outputs)
def get_compiled_model():
model = get_uncompiled_model()
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3),
loss='sparse_categorical_crossentropy',
metrics=['sparse_categorical_accuracy'])
return model
optimizer
、loss
和metrics
一般情况下,你无需自己从头开始编写optimizer
、loss
和metrics
,因为tf.keras
已经内建了这些函数。
Optimizers
:SGD()
(包含动量或者不包含)、RMSprop()
、Adam()
等。
Losses
:MeanSquaredError()
、KLDIvergence()
、CosineSimilarity()
等。
Metrics
:AUC()
、Precision()
、Recall()
等。
loss
和metrics
如果你需要自己编写metrics
,那么你可以通过继承Metric
类来实现,你必须实现下面4个方法:
__init__(self)
,该函数中你可以创建相应的状态变量update_state(self, y_true, y_pred, sample_weight=None)
,使用真实值和预测值来更新状态变量result(self)
,通过状态变量来计算最终的输出结果reset_states(self)
,来为metrics
进行复位状态的更新和计算结果需要分开进行(update_state
函数和result
函数分别负责),这是因为在有些例子中,计算结果的代价十分昂贵,只能周期的进行。
下面是一个实现CategoricalTruePositives
的例子,用于统计多少样本被正确分类了:
class CategoricalTruePositives(keras.metrics.Metric):
def __init__(self, name='binary_true_positives', **kwargs):
super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
self.true_positives = self.add_weight(name='tp', initializer='zeros')
def update_state(self, y_true, y_pred, sample_weight=None):
y_pred = tf.argmax(y_pred)
values = tf.cast(tf.equal(tf.cast(y_pred, tf.int32), tf.cast(y_true, tf.int32)), tf.float32)
if sample_weight is not None:
sample_weight = tf.cast(sample_weight, tf.float32)
values = tf.multiply(values, sample_weight)
return self.true_positives.assign_add(tf.reduce_sum(values))
def result(self):
return tf.identity(self.true_positives)
def reset_state(self):
return self.true_positives.assign(0.)
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3),
loss='sparse_categorical_crossentropy',
metrics=[CategoricalTruePositives()])
model.fit(x_train, y_train, batch_size=64, epochs=3)
Epoch 1/3
50000/50000 [==============================] - 2s 44us/sample - loss: 0.0951 - binary_true_positives: 8108.0000
Epoch 2/3
50000/50000 [==============================] - 2s 37us/sample - loss: 0.0769 - binary_true_positives: 8154.0000
Epoch 3/3
50000/50000 [==============================] - 2s 37us/sample - loss: 0.0659 - binary_true_positives: 8378.0000
loss
和metrics
虽然大部分的loss
和metrics
可以使用真实值以及模型的输出计算,但是这并不是绝对的。例如正则化损失函数就仅需要激活层,但激活层并不一定是模型的输出。
这种情况下,可以在自定义的层的call
方法中使用self.add_loss(loss_value)
函数来添加损失函数。下面是一个添加激活正则化的例子(在tf.keras
的所有内建层中,激活正则化均已经实现,这里只是为了演示)
class ActivityRegularizationLayer(keras.layers.Layer):
def call(self, inputs):
self.add_loss(tf.reduce_sum(inputs) * 0.1)
return inputs
inputs = keras.Input(shape=(784, ), name='digits')
x = keras.layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = ActivityRegularizationLayer()(x)
x = keras.layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = keras.layers.Dense(10, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3),
loss='sparse_categorical_crossentropy')
model.fit(x_train, y_train, batch_size=64, epochs=1)
50000/50000 [==============================] - 2s 45us/sample - loss: 2.5083
你可以对metrics
采用同样的添加方式。
class MetricsLoggingLayer(keras.layers.Layer):
def call(self, inputs):
self.add_metric(keras.backend.std(inputs), name='std_of_activation', aggregation='mean')
return inputs
inputs = keras.Input(shape=(784, ), name='digits')
x = keras.layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = MetricsLoggingLayer()(x)
x = keras.layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = keras.layers.Dense(10, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3),
loss='sparse_categorical_crossentropy')
model.fit(x_train, y_train, batch_size=64, epochs=1)
50000/50000 [==============================] - 2s 45us/sample - loss: 0.3379 - std_of_activation: 0.9304
对于使用功能API的情况,你可以调用模型的add_loss(loss_tensor)
或者add_metric(metric_tensor, name, aggregation)
函数来添加。
下面是一个例子:
inputs = keras.Input(shape=(784, ), name='digits')
x1 = keras.layers.Dense(64, activation='relu', name='dense_1')(inputs)
x2 = keras.layers.Dense(64, activation='relu', name='dense_2')(x1)
outputs = keras.layers.Dense(10, activation='softmax', name='predictions')(x2)
model = keras.Model(inputs=inputs, outputs=outputs)
model.add_loss(tf.reduce_sum(x1) * 0.1)
model.add_metric(keras.backend.std(x1), name='std_of_activation', aggregation='mean')
model.compile(optimizer=keras.optimizers.RMSprop(lr=1e-3),
loss='sparse_categorical_crossentropy')
model.fit(x_train, y_train, batch_size=64, epochs=1)
50000/50000 [==============================] - 2s 44us/sample - loss: 2.5026 - std_of_activation: 0.0020
在上面介绍的端到端的模型中,我们使用validation_data
参数向模型传递Numpy
类型的验证数据,该验证过程在每一个epoch
结束的时候自动进行,并计算loss
和metrics
。
不过我们也有其他实现该功能的方式:validation_split
参数,该参数允许你自动从训练集中拆分出验证集,它反映了验证集占训练集的比例,因此是一个介于0~1
之间的数字。例如validation_split=0.2
表示使用20%
的数据进行验证,validation_split=0.6
表示使用60%
的数据进行验证。
验证集的拆分是根据fit
方法接受的训练数据,并且在乱序之前进行的。
你可以在使用Numpy
类型训练数据时单独validation_split
参数。
model = get_compiled_model()
model.fit(x_train, y_train, batch_size=64, epochs=3, validation_split=0.2)
Train on 40000 samples, validate on 10000 samples
Epoch 1/3
40000/40000 [==============================] - 2s 55us/sample - loss: 0.3716 - sparse_categorical_accuracy: 0.8937 - val_loss: 0.2320 - val_sparse_categorical_accuracy: 0.9297
Epoch 2/3
40000/40000 [==============================] - 2s 45us/sample - loss: 0.1752 - sparse_categorical_accuracy: 0.9484 - val_loss: 0.1841 - val_sparse_categorical_accuracy: 0.9429
Epoch 3/3
40000/40000 [==============================] - 2s 42us/sample - loss: 0.1259 - sparse_categorical_accuracy: 0.9631 - val_loss: 0.1617 - val_sparse_categorical_accuracy: 0.9528
tf.data
的数据集进行训练和验证在上面的几段中,已经介绍了在使用Numpy
数据的情况下,如何指定loss
、metrics
和optimizer
,以及如何使用validation_data
参数和validation_split
参数进行验证。
下面开始介绍如何在使用tf.data
的数据集的情况下,实现上述的功能。
TensorFlow 2.0
提供了一组函数API用于实现快速、可扩展的数据载入和处理方式。
更详细关于tf.data
的介绍,可以查阅相关的官方文档。
你可以直接向fit()
、evaluate()
和predict()
直接传递tf.data
对象。
model = get_compiled_model()
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(64)
model.fit(train_dataset, epochs=3)
print('\n Evaluate: ')
model.evaluate(test_dataset)
Epoch 1/3
782/782 [==============================] - 4s 5ms/step - loss: 0.3378 - sparse_categorical_accuracy: 0.9052A: 1s - loss: 0.4109 - sparse_categor
Epoch 2/3
782/782 [==============================] - 4s 5ms/step - loss: 0.1578 - sparse_categorical_accuracy: 0.9532
Epoch 3/3
782/782 [==============================] - 3s 3ms/step - loss: 0.1137 - sparse_categorical_accuracy: 0.9652
Evaluate:
157/157 [==============================] - 0s 3ms/step - loss: 0.1341 - sparse_categorical_accuracy: 0.9602
[0.13405862184698178, 0.9602]
可以看到,数据集在每个epoch
结束的时候被重置了,是可以重复使用的。
如果需要仅训练指定的批次数,那么可以通过指定steps_per_epoch
来实现。指定该参数后,在每个epoch
仅仅会训练指定的批次数,然后就会切换到下个epoch
。
如果设置的steps_per_epoch
大于每个epoch
的最大批次数,数据集不会在每个epoch
的最后进行重置,而是尝试获取下一个批次的数据,但是数据集还是会耗尽(除非数据集中的数据是无限的),这样就会抛出异常。
model = get_compiled_model()
model.fit(train_dataset, epochs=3, steps_per_epoch=100)
Epoch 1/3
100/100 [==============================] - 1s 11ms/step - loss: 0.7947 - sparse_categorical_accuracy: 0.7967
Epoch 2/3
100/100 [==============================] - 0s 4ms/step - loss: 0.3605 - sparse_categorical_accuracy: 0.8981
Epoch 3/3
100/100 [==============================] - 0s 5ms/step - loss: 0.3116 - sparse_categorical_accuracy: 0.9095
model = get_compiled_model()
model.fit(train_dataset, epochs=3, steps_per_epoch=1000)
Epoch 1/3
782/1000 [======================>.......] - ETA: 0s - loss: 0.3400 - sparse_categorical_accuracy: 0.9035
WARNING: Logging before flag parsing goes to stderr.
W0401 16:30:26.257521 4726285760 training_generator.py:228] Your dataset ran out of data; interrupting training. Make sure that your dataset can generate at least `steps_per_epoch * epochs` batches (in this case, 3000 batches). You may need to use the repeat() function when building your dataset.
可以通过validation_data
参数传递一个tf.data
数据集对象来进行验证。
model = get_compiled_model()
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)
model.fit(train_dataset, epochs=3, validation_data=val_dataset)
Epoch 1/3
782/782 [==============================] - 4s 5ms/step - loss: 0.3431 - sparse_categorical_accuracy: 0.9030 - val_loss: 0.2100 - val_sparse_categorical_accuracy: 0.9361
Epoch 2/3
782/782 [==============================] - 3s 4ms/step - loss: 0.1580 - sparse_categorical_accuracy: 0.9530 - val_loss: 0.1550 - val_sparse_categorical_accuracy: 0.9532
Epoch 3/3
782/782 [==============================] - 3s 4ms/step - loss: 0.1163 - sparse_categorical_accuracy: 0.9649 - val_loss: 0.1329 - val_sparse_categorical_accuracy: 0.9590
在每个epoch
的末尾,模型会访问验证数据集,来计算验证loss
以及metrics
。
如果仅需要使用部分批次的验证集数据进行验证,那么可以指定validation_steps
参数,该参数会在切换到下个批次前使用指定的验证集批次进行验证。
model = get_compiled_model()
model.fit(train_dataset, epochs=3, validation_data=val_dataset, validation_steps=10) #使用验证集的前10个批次进行验证
Epoch 1/3
782/782 [==============================] - 4s 5ms/step - loss: 0.3367 - sparse_categorical_accuracy: 0.9052 - val_loss: 0.3116 - val_sparse_categorical_accuracy: 0.9109
Epoch 2/3
782/782 [==============================] - 3s 4ms/step - loss: 0.1521 - sparse_categorical_accuracy: 0.9547 - val_loss: 0.2327 - val_sparse_categorical_accuracy: 0.9281
Epoch 3/3
782/782 [==============================] - 3s 4ms/step - loss: 0.1096 - sparse_categorical_accuracy: 0.9672 - val_loss: 0.1995 - val_sparse_categorical_accuracy: 0.9469
需要注意的是,验证集在每次使用后都会被重置(所以在每个epoch
结束时使用的都是相同的批次进行验证)。
当使用tf.data
数据集进行训练时,参数validation_split
是不能使用的。
除了Numpy
数据和tf.data
数据集外,Panda
数据结构以及Python
的生成表达式数据也是支持的。
一般情况下,在数据集比较小时(可以放入内存),推荐使用Numpy
的格式,否则推荐使用tf.data
数据格式。
除了输入数据和标签外,在训练时,可能还需要传入样本权重和类权重。
Numpy
类型数据时,通过sample_weight
和class_weight
参数tf.data
类型数据时,需要tf.data
对象返回一个元组,包括(input_batch, target_batch, sample_weight_batch)
sample_weight
是一个数字集合(list
),该集合标记了在计算loss
时,每个样本所占的权重。常用于非平衡的分类问题中(虽然十分少见),当权重设置为0或1时,也可以作为一个样本的筛选(指定哪些样本对loss
没有贡献)。
class_weight
则是一个字典,它指定了每个类别在计算loss
时所占的权重。比如说类别0
的权重是类别1
的两倍,那么可以这么指定class_weight={0:1.0, 1:0.5}
。
下面是一个Numpy
类型数据的例子,我们通过sample_weight
和class_weight
为第五个类别(数字5
)设定更高的权重。
import numpy as np
model = get_compiled_model()
class_weight = {0:1., 1:1., 2:1., 3:1., 4:1., 5:2., 6:1., 7:1., 8:1., 9:1.}
model.fit(x_train, y_train, epochs=4, batch_size=64, class_weight=class_weight)
model = get_compiled_model()
sample_weight = np.ones(shape=len(y_train))
sample_weight[y_train == 5] = 2.
model.fit(x_train, y_train, epochs=4, batch_size=64, sample_weight=sample_weight)
Epoch 1/4
50000/50000 [==============================] - 2s 43us/sample - loss: 0.3731 - sparse_categorical_accuracy: 0.9009
Epoch 2/4
50000/50000 [==============================] - 2s 39us/sample - loss: 0.1743 - sparse_categorical_accuracy: 0.9511
Epoch 3/4
50000/50000 [==============================] - 2s 39us/sample - loss: 0.1279 - sparse_categorical_accuracy: 0.9639
Epoch 4/4
50000/50000 [==============================] - 2s 39us/sample - loss: 0.1013 - sparse_categorical_accuracy: 0.9716
Epoch 1/4
50000/50000 [==============================] - 2s 45us/sample - loss: 0.3715 - sparse_categorical_accuracy: 0.9017
Epoch 2/4
50000/50000 [==============================] - 2s 39us/sample - loss: 0.1742 - sparse_categorical_accuracy: 0.9521
Epoch 3/4
50000/50000 [==============================] - 2s 37us/sample - loss: 0.1298 - sparse_categorical_accuracy: 0.9642
Epoch 4/4
50000/50000 [==============================] - 2s 38us/sample - loss: 0.1039 - sparse_categorical_accuracy: 0.97031s - loss: 0.0905 - spa
这里是一个相对应的使用tf.data
数据集的例子.
model = get_compiled_model()
sample_weight = np.ones(shape=len(y_train))
sample_weight[y_train == 5] = 2.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train, sample_weight))
train_dataset = train_dataset.shuffle(buffer_size=1000).batch(64)
model.fit(train_dataset, epochs=3)
Epoch 1/3
782/782 [==============================] - 4s 5ms/step - loss: 0.3678 - sparse_categorical_accuracy: 0.9034
Epoch 2/3
782/782 [==============================] - 3s 3ms/step - loss: 0.1743 - sparse_categorical_accuracy: 0.9526
Epoch 3/3
782/782 [==============================] - 3s 3ms/step - loss: 0.1286 - sparse_categorical_accuracy: 0.9643
在上面的例子中,模型只是单输入(尺寸为(764, )的张量)和单输出(尺寸为(10, )的张量),但是如果模型有多个输入或输出该如何处理?
考虑下面一个模型,模型的一个输入时图片(32, 32, 3)
,另一个输入是一个时间序列(None, 10)
(分别对应时间和特征)。模型的输出为两个:评分(1, )
和基于5类别分类的置信度(5, )
,模型的结构如下:
image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')
x1 = keras.layers.Conv2D(3, 3)(image_input)
x1 = keras.layers.GlobalMaxPooling2D()(x1)
x2 = keras.layers.Conv1D(3, 3)(timeseries_input)
x2 = keras.layers.GlobalMaxPooling1D()(x2)
x = keras.layers.concatenate([x1, x2])
score_output = keras.layers.Dense(1, name='score_output')(x)
class_output = keras.layers.Dense(5, name='class_output')(x)
model = keras.Model(inputs=[image_input, timeseries_input], outputs=[score_output, class_output])
下面我们将该模型的结构输出成图片。
keras.utils.plot_model(model, 'model.png', show_shapes=True)
在设置模型参数时,我们可以为每个输出指定不同的loss
计算方法。
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
loss=[keras.losses.MeanSquaredError(), keras.losses.CategoricalCrossentropy()])
如果只指定一个loss
,那么所有的输出都会使用这个loss
的计算方法。
对于metrics
也是类似。
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
loss=[keras.losses.MeanSquaredError(), keras.losses.CategoricalCrossentropy()],
metrics=[[keras.metrics.MeanAbsolutePercentageError(), keras.metrics.MeanAbsoluteError()],
[keras.metrics.CategoricalAccuracy()]])
由于我们为每一个输出指定了名字,那么我们就可以通过字典的方式传递loss
和metrics
。
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
loss={'score_output': keras.losses.MeanSquaredError(),
'class_output': keras.losses.CategoricalCrossentropy()},
metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(), keras.metrics.MeanAbsoluteError()],
'class_output': keras.metrics.CategoricalAccuracy()})
如果你的输出多于两个,那么推荐使用通过名称和字典的方式进行传递。
如果想自定义设置每个输出在loss
中的权重,可以通过loss_weight
参数实现。比如说给予score_output
2倍的权重,增大score_output
对loss
的影响。
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
loss={'score_output': keras.losses.MeanSquaredError(),
'class_output': keras.losses.CategoricalCrossentropy()},
metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(), keras.metrics.MeanAbsoluteError()],
'class_output': keras.metrics.CategoricalAccuracy()},
loss_weight={'score_output': 2., 'class_weight': 1.})
当然,如果一个输出对loss
没有贡献,也可以在计算的时候忽略该输出。
# 列表方式传递
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
loss=[None, keras.losses.CategoricalCrossentropy()])
# 或者字典方式
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
loss={'class_output': keras.losses.CategoricalCrossentropy()})
W0401 16:31:39.905513 4726285760 training_utils.py:1152] Output score_output missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to score_output.
在使用fit
对模型进行训练时,参数传递的方法和compile
类似,既可以通过列表的方式传递Numpy
数据,也可以根据指定的输入名通过字典的方式进行传递。
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
loss=[keras.losses.MeanSquaredError(), keras.losses.CategoricalCrossentropy()])
img_data = np.random.random_sample(size=(100, 32, 32, 3))
ts_data = np.random.random_sample(size=(100, 20, 10))
score_targets = np.random.random_sample(size=(100, 1))
class_targets = np.random.random_sample(size=(100, 5))
# 使用列表
model.fit([img_data, ts_data], [score_targets, class_targets], batch_size=32, epochs=3)
# 使用字典
model.fit({'img_input': img_data, 'ts_input': ts_data}, {'score_output': score_targets, 'class_output': class_targets}, batch_size=32, epochs=3)
Epoch 1/3
100/100 [==============================] - 0s 3ms/sample - loss: 10.5777 - score_output_loss: 0.1472 - class_output_loss: 10.4305
Epoch 2/3
100/100 [==============================] - 0s 376us/sample - loss: 18.5945 - score_output_loss: 0.2185 - class_output_loss: 18.3760
Epoch 3/3
100/100 [==============================] - 0s 386us/sample - loss: 21.3233 - score_output_loss: 0.2348 - class_output_loss: 21.0885
Epoch 1/3
100/100 [==============================] - 0s 1ms/sample - loss: 20.0684 - score_output_loss: 0.2963 - class_output_loss: 19.7721
Epoch 2/3
100/100 [==============================] - 0s 1ms/sample - loss: 18.5125 - score_output_loss: 0.2869 - class_output_loss: 18.2257
Epoch 3/3
100/100 [==============================] - 0s 1ms/sample - loss: 18.1863 - score_output_loss: 0.2465 - class_output_loss: 17.9398
下面是一个在使用tf.data
数据集时候,采用名称字典的做法。
train_data = tf.data.Dataset.from_tensor_slices(({'img_input': img_data, 'ts_input': ts_data}, {'score_output': score_targets, 'class_output': class_targets}))
train_data = train_data.shuffle(buffer_size=1024).batch(64)
model.fit(train_data, epochs=3)
Epoch 1/3
2/2 [==============================] - 0s 42ms/step - loss: 17.8938 - score_output_loss: 0.2505 - class_output_loss: 17.8035
Epoch 2/3
2/2 [==============================] - 0s 50ms/step - loss: 17.5434 - score_output_loss: 0.2285 - class_output_loss: 17.4333
Epoch 3/3
2/2 [==============================] - 0s 56ms/step - loss: 17.4054 - score_output_loss: 0.2073 - class_output_loss: 17.3286
但是个人不推荐这种方式。
tf.keras
提供了一系列的回调函数对象,这些对象可以在训练过程的不同时间点被调用(epoch
的开始和结束、批的结束等等),通过这些调用,可以执行如下的操作:
epoch
后进行验证)fit
方法时,使用列表的方式通过callbacks
参数指定。model = get_compiled_model()
callbacks = [keras.callbacks.EarlyStopping(
# 监测val_loss的值
monitor='val_loss',
# 如果val_loss的提升小于1e-2,那么就停止训练
min_delta=1e-2,
# 这种情况需要至少位置2个epochs才会停止
patience=2, verbose=1)]
model.fit(x_train, y_train, batch_size=64, epochs=20, callbacks=callbacks, validation_split=0.2)
Train on 40000 samples, validate on 10000 samples
Epoch 1/20
40000/40000 [==============================] - 2s 53us/sample - loss: 0.3721 - sparse_categorical_accuracy: 0.8964 - val_loss: 0.2575 - val_sparse_categorical_accuracy: 0.9213
Epoch 2/20
40000/40000 [==============================] - 2s 47us/sample - loss: 0.1726 - sparse_categorical_accuracy: 0.9493 - val_loss: 0.1694 - val_sparse_categorical_accuracy: 0.9483
Epoch 3/20
40000/40000 [==============================] - 2s 42us/sample - loss: 0.1247 - sparse_categorical_accuracy: 0.9631 - val_loss: 0.1645 - val_sparse_categorical_accuracy: 0.9507
Epoch 4/20
40000/40000 [==============================] - 2s 44us/sample - loss: 0.0987 - sparse_categorical_accuracy: 0.9707 - val_loss: 0.1579 - val_sparse_categorical_accuracy: 0.9541
Epoch 5/20
40000/40000 [==============================] - 2s 43us/sample - loss: 0.0811 - sparse_categorical_accuracy: 0.9761 - val_loss: 0.1279 - val_sparse_categorical_accuracy: 0.9638
Epoch 6/20
40000/40000 [==============================] - 2s 43us/sample - loss: 0.0684 - sparse_categorical_accuracy: 0.9801 - val_loss: 0.1444 - val_sparse_categorical_accuracy: 0.9567
Epoch 7/20
40000/40000 [==============================] - 2s 48us/sample - loss: 0.0579 - sparse_categorical_accuracy: 0.9830 - val_loss: 0.1375 - val_sparse_categorical_accuracy: 0.9620
Epoch 00007: early stopping
ModelChekpoint
,保存模型EarlyStopping
,当验证的参数不再提升时,自动停止训练TensorBoard
,周期性的写日志,以便可以使用TensorBoard
进行可视化CSVLogger
,保存训练的loss
和观测值到CSV
文件可以通过继承keras.callbacks.Callback
类来实现自己的回调函数类,该类通过self.model
变量来访问关联的模型。
下面是一个简单的保存loss
的回调:
class LossHistory(keras.callbacks.Callback):
def on_train_begin(self, logs):
self.losses = []
def on_batch_end(self, batch, logs):
self.losses.append(logs.get('loss'))
如果你在一个较大的数据集上训练模型,那么周期性的保存模型快照就很有必要。
最简单的保存模型快照的方法就是使用ModelCheckpoint
回调。
model = get_compiled_model()
callbacks = [keras.callbacks.ModelCheckpoint(
# 保存文件的路径名称
filepath='model_{epoch}.h5',
# 通过监测cal_loss来保存表现最好的模型
save_best_only=True,
monitor='val_loss', verbose=1)]
model.fit(x_train, y_train, batch_size=64, epochs=3, callbacks=callbacks, validation_split=0.2)
Train on 40000 samples, validate on 10000 samples
Epoch 1/3
38976/40000 [============================>.] - ETA: 0s - loss: 0.3799 - sparse_categorical_accuracy: 0.8924
Epoch 00001: val_loss improved from inf to 0.22875, saving model to model_1.h5
40000/40000 [==============================] - 2s 52us/sample - loss: 0.3753 - sparse_categorical_accuracy: 0.8936 - val_loss: 0.2287 - val_sparse_categorical_accuracy: 0.9311
Epoch 2/3
39872/40000 [============================>.] - ETA: 0s - loss: 0.1769 - sparse_categorical_accuracy: 0.9474
Epoch 00002: val_loss improved from 0.22875 to 0.18618, saving model to model_2.h5
40000/40000 [==============================] - 2s 50us/sample - loss: 0.1770 - sparse_categorical_accuracy: 0.9474 - val_loss: 0.1862 - val_sparse_categorical_accuracy: 0.9436
Epoch 3/3
39552/40000 [============================>.] - ETA: 0s - loss: 0.1284 - sparse_categorical_accuracy: 0.9620- ETA: 0s - loss: 0.1354 - sparse_categorical_
Epoch 00003: val_loss improved from 0.18618 to 0.15636, saving model to model_3.h5
40000/40000 [==============================] - 2s 48us/sample - loss: 0.1280 - sparse_categorical_accuracy: 0.9621 - val_loss: 0.1564 - val_sparse_categorical_accuracy: 0.9534
当然,你也可以自己实现保存模型的回调。
当训练比较深的网络时,一种常见的技巧就是逐渐减小学习率,一般称为学习率衰减。
学习率衰减可以是静态的(根据一定的算法,每个epoch
或者批进行衰减),也可以是动态的(根据验证过程loss
的值动态进行调整)。
可以通过传递一个学习率衰减方法的对象给优化器,来实现静态学习率衰减。
initial_learning_rate = 0.1
lr_schedule = keras.optimizers.schedules.ExponentialDecay(initial_learning_rate, decay_steps=100000, decay_rate=.96, staircase=True)
optimizer = keras.optimizers.RMSprop(learning_rate=lr_schedule)
keras
还提供了其他内建的学习率衰减方法:ExponentialDecay
(指数衰减)、PiecewiseConstantDecay
(分段常数)、PolynomialDecay
(多项式衰减)、InverseTimeDecay
(逆时间衰减)等。
由于上面的学习率衰减方法对象无法访问到验证时监测的参数,因此不能实现动态学习率衰减(比如说,可以在验证loss
不再提升的时候进行学习率衰减)。
但是回调是可以访问到所有的监测参数的,因此可以通过回调来修改优化器的学习率。事实上,tf.keras
已经内建了类似ReduceLROnPlateau
的方法。
loss
的可视化在训练的过程中,最好监测训练过程的办法是使用TensorBoard
,这是一个基于网页的可视化工具,你可以在本地运行它,能够提供如下的功能:
loss
和监测参数如果你使用pip
安装的tensorflow
,那么就可以使用下面的命令启动TensorBoard
。
tensorboard --logdir=/full_path_to_your_logs
TensorBoard
回调最简单使用TensorBoard
的方法就是在调用fit
方法时使用相应的回调,此时仅需要填写日志文件的路径就可以。
tensorboard_cbk = keras.callbacks.TensorBoard(log_dir='/full_path_to_your_logs')
model.fit(train_data, epochs=10, callbacks=[tensorboard_cbk])
TensorBoard
回调提供了很多有用的参数,除了可以指定日志的路径,还可以指定是否记录嵌入空间以及矩形图,还可以指定多久记录一次。
keras.callbacks.TensorBoard(log_dir='/full_path_to_your_logs',
embeddings_freq=0, # 记录嵌入的频率
histogram_freq=0, # 矩形图的频率
update_freq='epoch' # 多久记录一次,默认是每个epoch一次)
在训练和验证的过程中,如果你想使用更加底层的API,而不是使用模型提供的fit
和evaluate
方法,那么你可以自己进行编写。这也很容易,但是你需要有做很多的调试工作的准备。
GradientTape
的端到端的例子在GradientTape
域内部调用模型,可以根据损失函数来检索可训练参数(可以通过model.trainable_variables
获取)的梯度。配合上优化器,就可以更新这些可训练参数。
我们使用上面提到的mnist
的例子,然后使用梯度下降自己编写训练过程。
model = get_uncompiled_model() # 获取模型
optimizer = keras.optimizers.SGD(1e-3) # 使用SGD优化
loss_fn = keras.losses.SparseCategoricalCrossentropy() # loss
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
for epoch in range(3):
print('Start of Epoch %s' % epoch)
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
with tf.GradientTape() as tape:
logits = model(x_batch_train) # 获取前向传播的结果
loss_value = loss_fn(y_batch_train, logits) # 计算loss
grads = tape.gradient(loss_value, model.trainable_variables) # 根据loss计算可训练参数的梯度
optimizer.apply_gradients(zip(grads, model.trainable_variables))
if 0 == step % 200:
print('Train loss (one batch) at step %s: %s' % (step, float(loss_value)))
print('Seen so far %s samples' % ((step + 1) * batch_size))
Start of Epoch 0
Train loss (one batch) at step 0: 2.3442137241363525
Seen so far 64 samples
Train loss (one batch) at step 200: 2.271237850189209
Seen so far 12864 samples
Train loss (one batch) at step 400: 2.2097158432006836
Seen so far 25664 samples
Train loss (one batch) at step 600: 2.095700979232788
Seen so far 38464 samples
Start of Epoch 1
Train loss (one batch) at step 0: 2.0597667694091797
Seen so far 64 samples
Train loss (one batch) at step 200: 1.9625885486602783
Seen so far 12864 samples
Train loss (one batch) at step 400: 1.8829933404922485
Seen so far 25664 samples
Train loss (one batch) at step 600: 1.707170844078064
Seen so far 38464 samples
Start of Epoch 2
Train loss (one batch) at step 0: 1.686545729637146
Seen so far 64 samples
Train loss (one batch) at step 200: 1.5554258823394775
Seen so far 12864 samples
Train loss (one batch) at step 400: 1.4583208560943604
Seen so far 25664 samples
Train loss (one batch) at step 600: 1.3026995658874512
Seen so far 38464 samples
metrics
的底层API现在我们来加入metrics
,可以直接在训练循环中复用内建的metrics
(当然也可以重头编写),需要遵循下面几点:
metrics
对象。metrics.update_state()
方法。metrics
数值时,调用metrics.result()
方法。metrics
数值时(一般在每个epoch
结束),调用metrics.reset_states()
方法。下面就根据这几点来实现在每个epoch
的结束,在验证的过程中计算SparseCategoricalAccuary
:
model = get_uncompiled_model() # 获取模型
optimizer = keras.optimizers.SGD(1e-3) # 使用SGD优化
loss_fn = keras.losses.SparseCategoricalCrossentropy() # loss
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
for epoch in range(3):
print('Start of Epoch %s' % epoch)
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
with tf.GradientTape() as tape:
logits = model(x_batch_train) # 获取前向传播的结果
loss_value = loss_fn(y_batch_train, logits) # 计算loss
grads = tape.gradient(loss_value, model.trainable_variables) # 根据loss计算可训练参数的梯度
optimizer.apply_gradients(zip(grads, model.trainable_variables))
train_acc_metric(y_batch_train, logits) # 更新在训练集中metric的状态
if 0 == step % 200:
print('Train loss (one batch) at step %s: %s' % (step, float(loss_value)))
print('Seen so far %s samples' % ((step + 1) * batch_size))
train_acc = train_acc_metric.result() # 获取在训练集中准确度的结果
print('Training acc over epoch: %s' % float(train_acc))
for x_batch_val, y_batch_val in val_dataset:
val_logits = model(x_batch_val)
val_acc_metric(y_batch_val, val_logits) # 更新在验证集中metric的状态
val_acc = val_acc_metric.result() # 获取结果
print('Validation acc over epoch: %s' % float(val_acc))
Start of Epoch 0
Train loss (one batch) at step 0: 2.326174259185791
Seen so far 64 samples
Train loss (one batch) at step 200: 2.2674777507781982
Seen so far 12864 samples
Train loss (one batch) at step 400: 2.1468617916107178
Seen so far 25664 samples
Train loss (one batch) at step 600: 2.054070472717285
Seen so far 38464 samples
Training acc over epoch: 0.299560010433197
Validation acc over epoch: 0.46230000257492065
Start of Epoch 1
Train loss (one batch) at step 0: 1.9417946338653564
Seen so far 64 samples
Train loss (one batch) at step 200: 1.9358477592468262
Seen so far 12864 samples
Train loss (one batch) at step 400: 1.7922141551971436
Seen so far 25664 samples
Train loss (one batch) at step 600: 1.668813705444336
Seen so far 38464 samples
Training acc over epoch: 0.4215500056743622
Validation acc over epoch: 0.5505499839782715
Start of Epoch 2
Train loss (one batch) at step 0: 1.5339230298995972
Seen so far 64 samples
Train loss (one batch) at step 200: 1.5494320392608643
Seen so far 12864 samples
Train loss (one batch) at step 400: 1.4023218154907227
Seen so far 25664 samples
Train loss (one batch) at step 600: 1.2693426609039307
Seen so far 38464 samples
Training acc over epoch: 0.501479983329773
Validation acc over epoch: 0.6060666441917419
loss
前面介绍了可以通过在call
方法中调用self.add_loss(value)
添加正则化的loss
。在实际使用中,你需要将这些添加loss
的和在训练循环中使用(除非是你清楚这些loss
对模型没有贡献)。我们在这里依然使用上面添加正则化loss
的例子。
class ActivityRegularizationLayer(keras.layers.Layer):
def call(self, inputs):
self.add_loss(tf.reduce_sum(inputs) * 0.1)
return inputs
inputs = keras.Input(shape=(784, ), name='digits')
x = keras.layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = ActivityRegularizationLayer()(x)
x = keras.layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = keras.layers.Dense(10, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
当使用logits = model(x_train)
时,在前向的过程中就会创建并计算这些loss
,然后添加到model.losses
集合中。
logits = model(x_train[:64])
print(model.losses)
[]
实际上,在开始调用__call__
方法时会清除这些loss
的状态,因此只会在一次前向过程中产生损失。多次调用模型,然后查询loss
的值,只会显示最后一次创建的最新值。
logits = model(x_train[ : 64])
logits = model(x_train[64 : 128])
logits = model(x_train[128 : 192])
print(model.losses)
[]
在训练过程中计算所有loss
的和,只需要在训练时将sum(model.losses)
加到loss
和当中。
optimizer = keras.optimizers.SGD(1e-3) # 使用SGD优化
for epoch in range(3):
print('Start of Epoch %s' % epoch)
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
with tf.GradientTape() as tape:
logits = model(x_batch_train) # 获取前向传播的结果
loss_value = loss_fn(y_batch_train, logits) # 计算loss
loss_value += sum(model.losses)
grads = tape.gradient(loss_value, model.trainable_variables) # 根据loss计算可训练参数的梯度
optimizer.apply_gradients(zip(grads, model.trainable_variables))
if 0 == step % 200:
print('Train loss (one batch) at step %s: %s' % (step, float(loss_value)))
print('Seen so far %s samples' % ((step + 1) * batch_size))
Start of Epoch 0
Train loss (one batch) at step 0: 69.08877563476562
Seen so far 64 samples
Train loss (one batch) at step 200: 2.3639211654663086
Seen so far 12864 samples
Train loss (one batch) at step 400: 2.3201820850372314
Seen so far 25664 samples
Train loss (one batch) at step 600: 2.350968360900879
Seen so far 38464 samples
Start of Epoch 1
Train loss (one batch) at step 0: 2.3102896213531494
Seen so far 64 samples
Train loss (one batch) at step 200: 2.302211284637451
Seen so far 12864 samples
Train loss (one batch) at step 400: 2.304335832595825
Seen so far 25664 samples
Train loss (one batch) at step 600: 2.3280816078186035
Seen so far 38464 samples
Start of Epoch 2
Train loss (one batch) at step 0: 2.3040671348571777
Seen so far 64 samples
Train loss (one batch) at step 200: 2.3006362915039062
Seen so far 12864 samples
Train loss (one batch) at step 400: 2.301215648651123
Seen so far 25664 samples
Train loss (one batch) at step 600: 2.3195436000823975
Seen so far 38464 samples