Import the required modules and check the TensorFlow version
import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)
2.9.1
Load the dataset, using only the first 1,000 samples of MNIST
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0
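The slicing, flattening, and scaling steps above can be checked without downloading MNIST. This is a minimal sketch, assuming NumPy is available; `fake_images` is a hypothetical random stand-in for the real images, not the dataset itself.

```python
import numpy as np

# Hypothetical stand-in for MNIST images: uint8 pixels in [0, 255], shape (N, 28, 28).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(1500, 28, 28), dtype=np.uint8)

# Same preprocessing as above: keep 1000 samples, flatten each image, scale to [0, 1].
flat = fake_images[:1000].reshape(-1, 28 * 28) / 255.0

print(flat.shape)   # (1000, 784)
print(flat.dtype)   # float64
print(flat.min() >= 0.0, flat.max() <= 1.0)
```

Each 28 x 28 image becomes a 784-element row, which matches the `input_shape=(784,)` the model expects.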
Define a simple model
# Build and compile a simple sequential model
def create_model():
    model = tf.keras.models.Sequential([
        keras.layers.Dense(512, activation='relu', input_shape=(784,)),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10)
    ])
    model.compile(optimizer='adam',
                  loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=[tf.metrics.SparseCategoricalAccuracy()])
    return model
model = create_model()
model.summary()
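Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 512)               401920
 dropout (Dropout)           (None, 512)               0
 dense_1 (Dense)             (None, 10)                5130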
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
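The parameter counts in the summary follow from the layer sizes: a Dense layer has one weight per input-unit pair plus one bias per unit, and Dropout has no parameters. A quick check in plain Python, where `dense_params` is an illustrative helper and not part of Keras:

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 512)   # first Dense layer
output = dense_params(512, 10)    # final Dense layer
total = hidden + output
print(hidden, output, total)  # 401920 5130 407050
```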
# Save the model during training (as checkpoints)
# Create a tf.keras.callbacks.ModelCheckpoint callback that saves only the weights during training:
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
model.fit(train_images,
          train_labels,
          epochs=10,
          validation_data=(test_images, test_labels),
          callbacks=[cp_callback])  # Pass callback to training
os.listdir(checkpoint_dir)
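['checkpoint', 'cp.ckpt.data-00000-of-00001', 'cp.ckpt.index']
Training produced three files in the checkpoint directory.
As long as two models share the same architecture, weights can be shared between them. So, to restore a model from weights alone, create a model with the same architecture as the original and then set its weights.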
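Weight sharing between identically shaped models can be seen directly with `get_weights` and `set_weights`. This is a minimal sketch, not the tutorial's own code: `build_tiny_model` is a hypothetical, much smaller architecture chosen only so the example runs quickly, and the input data is random.

```python
import numpy as np
import tensorflow as tf

def build_tiny_model():
    # Hypothetical small architecture, used only to keep the sketch fast.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(4, activation='relu'),
        tf.keras.layers.Dense(3),
    ])

source = build_tiny_model()
target = build_tiny_model()  # same architecture, independently initialized weights

x = np.random.default_rng(0).normal(size=(5, 8)).astype('float32')
before = np.allclose(source.predict(x, verbose=0), target.predict(x, verbose=0))

# Copy the weights across; this works because every layer shape matches.
target.set_weights(source.get_weights())
after = np.allclose(source.predict(x, verbose=0), target.predict(x, verbose=0))

print(before, after)
```

After the copy, both models produce the same outputs for the same inputs. If the architectures differed, `set_weights` would raise an error because the weight shapes would not line up.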
Now rebuild a fresh, untrained model and evaluate it on the test set. An untrained model performs at chance level (about 10% accuracy):
# Create a new, untrained model
model = create_model()
# Evaluate it on the test set
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Untrained model, accuracy: {:5.2f}%".format(100 * acc))
32/32 - 0s - loss: 2.3593 - sparse_categorical_accuracy: 0.0950 - 140ms/epoch - 4ms/step
Untrained model, accuracy: 9.50%
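The numbers above are close to what uniform guessing over 10 classes would give. A small check in plain Python, assuming the model assigns equal probability to every class; random initial weights only approximate that, so the measured loss is near this value rather than equal to it.

```python
import math

num_classes = 10

# Accuracy of guessing uniformly among the classes.
chance_accuracy = 1 / num_classes

# Cross-entropy when the true class is given probability 1 / num_classes.
uniform_loss = -math.log(1 / num_classes)

print(f"{chance_accuracy:.0%}")  # 10%
print(f"{uniform_loss:.4f}")     # 2.3026
```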
Load the trained weights
# Load the weights from the checkpoint
model.load_weights(checkpoint_path)
# Re-evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
32/32 - 0s - loss: 0.4071 - sparse_categorical_accuracy: 0.8740 - 57ms/epoch - 2ms/step
Restored model, accuracy: 87.40%
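Checkpoint callback options
The callback provides several options for giving checkpoints unique names and adjusting how often they are saved.
Train a new model and save a uniquely named checkpoint every five epochs: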
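# Include the epoch number in the file name
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
batch_size = 32
# An integer save_freq counts batches, not epochs, so convert 5 epochs to batches.
# 1000 samples at batch_size 32 give 32 batches per epoch (rounded up).
n_batches = -(-len(train_images) // batch_size)
# Create a callback that saves the weights every 5 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    save_freq=5 * n_batches)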
# Create a new model instance
model = create_model()
# Save the untrained weights as epoch 0
model.save_weights(checkpoint_path.format(epoch=0))
# Train the model
model.fit(train_images,
          train_labels,
          epochs=50,
          batch_size=batch_size,
          callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)
Epoch 5: saving model to training_2\cp-0005.ckpt
Epoch 10: saving model to training_2\cp-0010.ckpt
Epoch 15: saving model to training_2\cp-0015.ckpt
Epoch 20: saving model to training_2\cp-0020.ckpt
Epoch 25: saving model to training_2\cp-0025.ckpt
Epoch 30: saving model to training_2\cp-0030.ckpt
Epoch 35: saving model to training_2\cp-0035.ckpt
Epoch 40: saving model to training_2\cp-0040.ckpt
Epoch 45: saving model to training_2\cp-0045.ckpt
Epoch 50: saving model to training_2\cp-0050.ckpt
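An integer `save_freq` is measured in batches, so a schedule of "every five epochs" depends on how many batches make up one epoch. A minimal sketch of that arithmetic for the 1,000-sample, batch size 32 setup used here; the variable names are illustrative.

```python
import math

num_samples = 1000
batch_size = 32
epochs_between_saves = 5

# The last, partial batch still counts as a batch, hence the ceiling.
batches_per_epoch = math.ceil(num_samples / batch_size)
save_freq = epochs_between_saves * batches_per_epoch
print(batches_per_epoch, save_freq)  # 32 160
```

With these particular numbers the batch count per epoch happens to equal the batch size (both are 32), so `5 * batch_size` gives the same 160. For other dataset or batch sizes the two values differ, and only the batch count gives saves that line up with epoch boundaries.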
os.listdir(checkpoint_dir)
['checkpoint',
 'cp-0000.ckpt.data-00000-of-00001',
 'cp-0000.ckpt.index',
 'cp-0005.ckpt.data-00000-of-00001',
 'cp-0005.ckpt.index',
 'cp-0010.ckpt.data-00000-of-00001',
 'cp-0010.ckpt.index',
 'cp-0015.ckpt.data-00000-of-00001',
 'cp-0015.ckpt.index',
 'cp-0020.ckpt.data-00000-of-00001',
 'cp-0020.ckpt.index',
 'cp-0025.ckpt.data-00000-of-00001',
 'cp-0025.ckpt.index',
 'cp-0030.ckpt.data-00000-of-00001',
 'cp-0030.ckpt.index',
 'cp-0035.ckpt.data-00000-of-00001',
 'cp-0035.ckpt.index',
 'cp-0040.ckpt.data-00000-of-00001',
 'cp-0040.ckpt.index',
 'cp-0045.ckpt.data-00000-of-00001',
 'cp-0045.ckpt.index',
 'cp-0050.ckpt.data-00000-of-00001',
 'cp-0050.ckpt.index']
These are the checkpoint files saved every 5 epochs, plus the initial epoch-0 weights.
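In the listing above, each checkpoint prefix is paired with a `.data` file and an `.index` file. As an illustration only, the prefix with the highest epoch can be recovered from such names using the standard library. `latest_prefix` is a hypothetical helper, and picking the checkpoint by file name is an assumption of this sketch, not how TensorFlow itself tracks the latest checkpoint.

```python
import re

def latest_prefix(filenames):
    """Return the checkpoint prefix with the highest epoch number, or None."""
    pattern = re.compile(r"^(cp-(\d+)\.ckpt)\.index$")
    best = None
    for name in filenames:
        match = pattern.match(name)
        if match:
            epoch = int(match.group(2))
            if best is None or epoch > best[0]:
                best = (epoch, match.group(1))
    return best[1] if best else None

files = [
    "checkpoint",
    "cp-0000.ckpt.data-00000-of-00001", "cp-0000.ckpt.index",
    "cp-0005.ckpt.data-00000-of-00001", "cp-0005.ckpt.index",
    "cp-0050.ckpt.data-00000-of-00001", "cp-0050.ckpt.index",
]
print(latest_prefix(files))  # cp-0050.ckpt
```

The prefix, without the `.index` or `.data-...` suffix, is the form that `load_weights` expects.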
# Find the most recently saved checkpoint
latest = tf.train.latest_checkpoint(checkpoint_dir)
latest
'training_2\cp-0050.ckpt'
# Create a new model instance
model = create_model()
# Load the previously saved weights
model.load_weights(latest)
# Re-evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
32/32 - 0s - loss: 0.4857 - sparse_categorical_accuracy: 0.8780 - 206ms/epoch - 6ms/step
Restored model, accuracy: 87.80%
# Manually save the weights
model.save_weights('./checkpoints/my_checkpoint')
# Create a new model instance
model = create_model()
# Load the manually saved weights
model.load_weights('./checkpoints/my_checkpoint')
# Evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
32/32 - 0s - loss: 0.4857 - sparse_categorical_accuracy: 0.8780 - 212ms/epoch - 7ms/step
Restored model, accuracy: 87.80%
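Save the entire model
Calling model.save stores the model's architecture, weights, and training configuration in a single file or folder. This lets you export a model so that it can be used without access to the original Python code.
An entire model can be saved in two file formats: SavedModel and HDF5. The TensorFlow SavedModel format is the default file format in TF2.x, but models can also be saved in HDF5 format. Saving in both formats is covered below.
Saving the full model is very useful: you can load it in TensorFlow.js (SavedModel, HDF5) and then train and run it in a web browser, or convert it with TensorFlow Lite to run on mobile devices (SavedModel, HDF5).
Custom objects (for example, subclassed models or layers) need special attention when saving and loading. See the section on saving custom objects below.
SavedModel format
The SavedModel format is another way to serialize models. Models saved in this format can be restored with tf.keras.models.load_model and are compatible with TensorFlow Serving. The SavedModel guide describes in detail how to serve and inspect a SavedModel. The following steps show how to save and restore a model.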
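# Create and train a new model instance
model = create_model()
model.fit(train_images, train_labels, epochs=5)
# Create the output folder, then save the entire model as a SavedModel
os.makedirs('saved_model', exist_ok=True)
model.save('saved_model/my_model')
Reload a fresh Keras model from the saved model. No model-building code is needed; the model is loaded directly: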
new_model = tf.keras.models.load_model('saved_model/my_model')
new_model.summary()
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100 * acc))
print(new_model.predict(test_images).shape)
32/32 - 0s - loss: 0.4483 - sparse_categorical_accuracy: 0.8550 - 127ms/epoch - 4ms/step
Restored model, accuracy: 85.50%
32/32 [==============================] - 0s 2ms/step
(1000, 10)
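HDF5 format
# Create and train a new model instance
model = create_model()
model.fit(train_images, train_labels, epochs=5)
# Save the entire model to a single HDF5 file; the '.h5' extension selects the HDF5 format
model.save('my_model.h5')
# Load the model back from the HDF5 file
new_model = tf.keras.models.load_model('my_model.h5')
# Show the model architecture
new_model.summary()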
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100 * acc))
32/32 - 0s - loss: 0.4187 - sparse_categorical_accuracy: 0.8580 - 114ms/epoch - 4ms/step
Restored model, accuracy: 85.80%
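One way to confirm that a whole-model save round-trips correctly is to compare predictions before and after reloading. This is a minimal sketch under stated assumptions: it uses a hypothetical tiny model and random inputs instead of the MNIST model, writes to a temporary directory, and relies on the `.h5` extension to select the HDF5 format (which requires the h5py package).

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical small model, used only to keep the sketch fast.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(3),
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

x = np.random.default_rng(0).normal(size=(5, 8)).astype('float32')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tiny_model.h5')
    model.save(path)                             # architecture, weights, and compile config
    restored = tf.keras.models.load_model(path)  # no model-building code needed

# The restored model should reproduce the original model's outputs.
same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
print(same)
```

Comparing predictions is a stricter check than comparing accuracy, since two different sets of weights can reach the same accuracy on a small test set.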