1. Save the model parameters (weights only)
2. Save the entire model
Saving with a callback
Saving manually
Save the model during training (in the form of checkpoints). A checkpoint is a binary file that stores the weights, biases, and the values of the model's other variables; its file extension is .ckpt.
keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)
import tensorflow as tf
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0
# Define a simple sequential model
def create_model():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
model1 = create_model()
checkpoint_path1 = "training_1/cp.ckpt"
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path1, save_weights_only=True, verbose=0)
model1.fit(train_images, train_labels, epochs=10,
           validation_data=(test_images, test_labels),
           callbacks=[cp_callback], verbose=2)
Train on 1000 samples, validate on 1000 samples
Epoch 1/10
1000/1000 - 0s - loss: 1.5260 - accuracy: 0.5590 - val_loss: 1.0336 - val_accuracy: 0.7450
Epoch 2/10
1000/1000 - 0s - loss: 0.6987 - accuracy: 0.8240 - val_loss: 0.6933 - val_accuracy: 0.8130
Epoch 3/10
1000/1000 - 0s - loss: 0.4780 - accuracy: 0.8760 - val_loss: 0.5774 - val_accuracy: 0.8360
Epoch 4/10
1000/1000 - 0s - loss: 0.3682 - accuracy: 0.9010 - val_loss: 0.5263 - val_accuracy: 0.8450
Epoch 5/10
1000/1000 - 0s - loss: 0.3029 - accuracy: 0.9250 - val_loss: 0.4847 - val_accuracy: 0.8420
Epoch 6/10
1000/1000 - 0s - loss: 0.2572 - accuracy: 0.9340 - val_loss: 0.4661 - val_accuracy: 0.8560
Epoch 7/10
1000/1000 - 0s - loss: 0.2252 - accuracy: 0.9490 - val_loss: 0.4509 - val_accuracy: 0.8540
Epoch 8/10
1000/1000 - 0s - loss: 0.1855 - accuracy: 0.9600 - val_loss: 0.4275 - val_accuracy: 0.8570
Epoch 9/10
1000/1000 - 0s - loss: 0.1605 - accuracy: 0.9670 - val_loss: 0.4292 - val_accuracy: 0.8590
Epoch 10/10
1000/1000 - 0s - loss: 0.1421 - accuracy: 0.9710 - val_loss: 0.4227 - val_accuracy: 0.8650
Checkpoint callback options:
The callback provides several options for giving checkpoints unique names and for adjusting the checkpointing frequency.
Train a new model and save a uniquely named checkpoint once every two epochs:
model2 = create_model()
checkpoint_path2 = "training_2/cp-{epoch:04d}.ckpt"
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path2, period=2, monitor="val_accuracy",
                                                 save_best_only=True, mode="max",
                                                 save_weights_only=True, verbose=0)
model2.fit(train_images, train_labels, epochs=10,
           validation_data=(test_images, test_labels),
           callbacks=[cp_callback], verbose=2)
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of samples seen.
Train on 1000 samples, validate on 1000 samples
Epoch 1/10
1000/1000 - 0s - loss: 1.6386 - accuracy: 0.5140 - val_loss: 1.1132 - val_accuracy: 0.7000
Epoch 2/10
1000/1000 - 0s - loss: 0.7385 - accuracy: 0.8020 - val_loss: 0.7291 - val_accuracy: 0.7950
Epoch 3/10
1000/1000 - 0s - loss: 0.4973 - accuracy: 0.8720 - val_loss: 0.5980 - val_accuracy: 0.8280
Epoch 4/10
1000/1000 - 0s - loss: 0.3961 - accuracy: 0.8930 - val_loss: 0.5367 - val_accuracy: 0.8410
Epoch 5/10
1000/1000 - 0s - loss: 0.3182 - accuracy: 0.9170 - val_loss: 0.4925 - val_accuracy: 0.8560
Epoch 6/10
1000/1000 - 0s - loss: 0.2772 - accuracy: 0.9250 - val_loss: 0.5132 - val_accuracy: 0.8410
Epoch 7/10
1000/1000 - 0s - loss: 0.2298 - accuracy: 0.9470 - val_loss: 0.4731 - val_accuracy: 0.8530
Epoch 8/10
1000/1000 - 0s - loss: 0.2083 - accuracy: 0.9480 - val_loss: 0.4472 - val_accuracy: 0.8590
Epoch 9/10
1000/1000 - 0s - loss: 0.1766 - accuracy: 0.9670 - val_loss: 0.4370 - val_accuracy: 0.8610
Epoch 10/10
1000/1000 - 0s - loss: 0.1465 - accuracy: 0.9660 - val_loss: 0.4363 - val_accuracy: 0.8660
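The deprecation warning above suggests `save_freq` instead of `period`. A minimal sketch of that replacement, with the caveat that the unit of an integer `save_freq` depends on the TensorFlow version (samples seen in the release that printed the warning, batches in later releases), so the exact value below is an assumption:
# Hedged sketch: the "save every 2 epochs" intent expressed via save_freq.
# In the TF release that printed the warning above, save_freq counts samples
# seen; in later releases it counts batches, so the value is version-dependent.
cp_callback_freq = tf.keras.callbacks.ModelCheckpoint(
    "training_2/cp-{epoch:04d}.ckpt",
    save_weights_only=True,
    save_freq=2 * len(train_images))  # 2 epochs x 1000 training samples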
model3 = create_model()
# Save the entire model from the callback; a directory path gives the SavedModel format
# (a path ending in .h5, e.g. "training_3.h5", would use the HDF5 format instead)
checkpoint_path3 = "training_3"
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path3, save_weights_only=False, verbose=0)
model3.fit(train_images, train_labels, epochs=10,
           validation_data=(test_images, test_labels),
           callbacks=[cp_callback], verbose=2)
Train on 1000 samples, validate on 1000 samples
Epoch 1/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 1.5397 - accuracy: 0.5510 - val_loss: 1.0164 - val_accuracy: 0.7510
Epoch 2/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 0.6896 - accuracy: 0.8190 - val_loss: 0.6884 - val_accuracy: 0.8020
Epoch 3/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 0.4903 - accuracy: 0.8640 - val_loss: 0.5970 - val_accuracy: 0.8200
Epoch 4/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 0.3805 - accuracy: 0.9050 - val_loss: 0.5356 - val_accuracy: 0.8390
Epoch 5/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 0s - loss: 0.3025 - accuracy: 0.9240 - val_loss: 0.4885 - val_accuracy: 0.8510
Epoch 6/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 0.2783 - accuracy: 0.9250 - val_loss: 0.4767 - val_accuracy: 0.8500
Epoch 7/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 0.2241 - accuracy: 0.9400 - val_loss: 0.4600 - val_accuracy: 0.8490
Epoch 8/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 0s - loss: 0.2078 - accuracy: 0.9500 - val_loss: 0.4576 - val_accuracy: 0.8490
Epoch 9/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 0s - loss: 0.1631 - accuracy: 0.9670 - val_loss: 0.4456 - val_accuracy: 0.8630
Epoch 10/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 0s - loss: 0.1491 - accuracy: 0.9700 - val_loss: 0.4264 - val_accuracy: 0.8570
model4 = create_model()
model4.fit(train_images, train_labels, epochs=10,
           validation_data=(test_images, test_labels), verbose=2)
Train on 1000 samples, validate on 1000 samples
Epoch 1/10
1000/1000 - 0s - loss: 1.5830 - accuracy: 0.5460 - val_loss: 1.0903 - val_accuracy: 0.6990
Epoch 2/10
1000/1000 - 0s - loss: 0.7136 - accuracy: 0.8010 - val_loss: 0.7605 - val_accuracy: 0.7720
Epoch 3/10
1000/1000 - 0s - loss: 0.4949 - accuracy: 0.8690 - val_loss: 0.5975 - val_accuracy: 0.8190
Epoch 4/10
1000/1000 - 0s - loss: 0.3896 - accuracy: 0.8920 - val_loss: 0.5583 - val_accuracy: 0.8350
Epoch 5/10
1000/1000 - 0s - loss: 0.3083 - accuracy: 0.9200 - val_loss: 0.5231 - val_accuracy: 0.8460
Epoch 6/10
1000/1000 - 0s - loss: 0.2686 - accuracy: 0.9330 - val_loss: 0.4792 - val_accuracy: 0.8460
Epoch 7/10
1000/1000 - 0s - loss: 0.2288 - accuracy: 0.9370 - val_loss: 0.4640 - val_accuracy: 0.8560
Epoch 8/10
1000/1000 - 0s - loss: 0.1974 - accuracy: 0.9530 - val_loss: 0.4606 - val_accuracy: 0.8570
Epoch 9/10
1000/1000 - 0s - loss: 0.1637 - accuracy: 0.9680 - val_loss: 0.4609 - val_accuracy: 0.8480
Epoch 10/10
1000/1000 - 0s - loss: 0.1481 - accuracy: 0.9710 - val_loss: 0.4347 - val_accuracy: 0.8660
model4.save_weights('./checkpoints/my_checkpoint')
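The weights saved manually above can later be restored into a new instance of the same architecture; a minimal sketch (the name `restored_model` is introduced here only for illustration):
# Rebuild the architecture and restore the manually saved weights
restored_model = create_model()
restored_model.load_weights('./checkpoints/my_checkpoint')
loss, acc = restored_model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))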
model5 = create_model()
model5.fit(train_images, train_labels, epochs=10,
           validation_data=(test_images, test_labels), verbose=2)
Train on 1000 samples, validate on 1000 samples
Epoch 1/10
1000/1000 - 0s - loss: 1.5685 - accuracy: 0.5620 - val_loss: 1.0435 - val_accuracy: 0.7190
Epoch 2/10
1000/1000 - 0s - loss: 0.6908 - accuracy: 0.8280 - val_loss: 0.7122 - val_accuracy: 0.7970
Epoch 3/10
1000/1000 - 0s - loss: 0.4719 - accuracy: 0.8750 - val_loss: 0.6066 - val_accuracy: 0.8010
Epoch 4/10
1000/1000 - 0s - loss: 0.3819 - accuracy: 0.8930 - val_loss: 0.5493 - val_accuracy: 0.8250
Epoch 5/10
1000/1000 - 0s - loss: 0.3040 - accuracy: 0.9230 - val_loss: 0.5098 - val_accuracy: 0.8350
Epoch 6/10
1000/1000 - 0s - loss: 0.2652 - accuracy: 0.9350 - val_loss: 0.4669 - val_accuracy: 0.8480
Epoch 7/10
1000/1000 - 0s - loss: 0.2081 - accuracy: 0.9490 - val_loss: 0.4626 - val_accuracy: 0.8440
Epoch 8/10
1000/1000 - 0s - loss: 0.1879 - accuracy: 0.9610 - val_loss: 0.4384 - val_accuracy: 0.8530
Epoch 9/10
1000/1000 - 0s - loss: 0.1588 - accuracy: 0.9650 - val_loss: 0.4378 - val_accuracy: 0.8510
Epoch 10/10
1000/1000 - 0s - loss: 0.1538 - accuracy: 0.9660 - val_loss: 0.4495 - val_accuracy: 0.8540
model5.save('model5')
INFO:tensorflow:Assets written to: model5\assets
When deploying a model, the first step is usually to export the entire trained model as a set of files in a standard format, which can then be deployed on different platforms. The model can be run again without the model-building source code, making this format well suited to sharing and deployment. TensorFlow Serving (server-side deployment), TensorFlow Lite (mobile deployment), and TensorFlow.js all use this format.
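As a minimal sketch (assuming a TensorFlow 2.x install with the TFLite converter available, and the `model5` SavedModel directory exported above), the exported directory can be reloaded or converted without the original model-building code:
# Reload the SavedModel directory without the original source code
reloaded = tf.keras.models.load_model('model5')

# Convert the same directory for TensorFlow Lite (mobile) deployment
converter = tf.lite.TFLiteConverter.from_saved_model('model5')
tflite_model = converter.convert()
with open('model5.tflite', 'wb') as f:
    f.write(tflite_model)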
Save the model as an HDF5 file
# Create a new model instance
model6 = create_model()
# Train the model
model6.fit(train_images, train_labels, epochs=5)
# Save the entire model as an HDF5 file
model6.save('my_model.h5')
Train on 1000 samples
Epoch 1/5
1000/1000 [==============================] - 0s 254us/sample - loss: 1.6071 - accuracy: 0.5330
Epoch 2/5
1000/1000 [==============================] - 0s 56us/sample - loss: 0.7051 - accuracy: 0.8150
Epoch 3/5
1000/1000 [==============================] - 0s 59us/sample - loss: 0.4864 - accuracy: 0.8680
Epoch 4/5
1000/1000 [==============================] - 0s 56us/sample - loss: 0.3656 - accuracy: 0.9130
Epoch 5/5
1000/1000 [==============================] - 0s 59us/sample - loss: 0.3108 - accuracy: 0.9150
Create a new, untrained model. When restoring only the model's weights, you must have a model with the same architecture as the original. Because the architectures are identical, the weights can be shared even though it is a different instance of the model. Now rebuild a fresh, untrained model and evaluate it on the test set. An untrained model will perform at chance level (about 10% accuracy):
# Create a basic model instance
model7 = create_model()
# Evaluate the model
loss, acc = model7.evaluate(test_images, test_labels, verbose=2)
print("Untrained model, accuracy: {:5.2f}%".format(100*acc))
1000/1 - 0s - loss: 2.3194 - accuracy: 0.0990
Untrained model, accuracy: 9.90%
Then load the weights from the checkpoint and re-evaluate:
# Load the weights
model7.load_weights(checkpoint_path1)
# Re-evaluate the model
loss, acc = model7.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
1000/1 - 0s - loss: 0.5116 - accuracy: 0.8650
Restored model, accuracy: 86.50%
import os
checkpoint_dir = os.path.dirname(checkpoint_path2)
latest = tf.train.latest_checkpoint(checkpoint_dir)
latest
'training_2\\cp-0010.ckpt'
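The TF checkpoint format writes several files per save; listing the directory shows them (a small sketch, reusing the `checkpoint_dir` defined above):
# Inspect the files written for training_2: a 'checkpoint' bookkeeping file
# plus one .index and one or more .data-* shards per saved checkpoint
print(sorted(os.listdir(checkpoint_dir)))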
model8 = create_model()
# Load the weights from the most recently saved checkpoint
model8.load_weights(latest)
# Re-evaluate the model
loss, acc = model8.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
1000/1 - 0s - loss: 0.4467 - accuracy: 0.8570
Restored model, accuracy: 85.70%
Restoring from a specific checkpoint file
# Restore the weights saved after a specific epoch (here, epoch 2)
model9 = create_model()
model9.load_weights('training_2/cp-0002.ckpt')
# Re-evaluate the model
loss, acc = model9.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
1000/1 - 0s - loss: 0.8354 - accuracy: 0.7950
Restored model, accuracy: 79.50%
Restore the model from the HDF5 file
# Recreate the exact same model, including its weights and optimizer
new_model = tf.keras.models.load_model('my_model.h5')
# Show the model architecture
new_model.summary()
Model: "sequential_14"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_28 (Dense)             (None, 128)               100480
_________________________________________________________________
dropout_14 (Dropout)         (None, 128)               0
_________________________________________________________________
dense_29 (Dense)             (None, 10)                1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
1000/1 - 0s - loss: 0.5647 - accuracy: 0.8430
Restored model, accuracy: 84.30%
Restore the entire model saved by the callback
new_model = tf.keras.models.load_model('training_3')
# Show the model architecture
new_model.summary()
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_18 (Dense)             (None, 128)               100480
_________________________________________________________________
dropout_9 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_19 (Dense)             (None, 10)                1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
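Since the callback saved the full model (weights plus optimizer state), the restored model can be evaluated directly; a minimal sketch (output not reproduced here):
# Evaluate the model restored from the callback-saved SavedModel directory
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))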