Faced with the upgraded machine learning framework TensorFlow, how should you bring your code up to the new version? Continuing from the previous article, this post walks you step by step through the TensorFlow code upgrade process, so that you can benefit from the simplicity and convenience of TensorFlow 2.
Checkpoint compatibility
TensorFlow 2.0 uses object-based checkpoints.
Old-style name-based checkpoints can still be loaded, if you are careful. The code conversion process may result in variable name changes, but there are workarounds.
The simplest approach is to line up the names in the new model with the names in the checkpoint:
You can still set the name argument for all variables.
Keras models also take a name argument, which they set as the prefix for their own variables.
The v1.name_scope function can be used to set variable name prefixes. This is very different from tf.variable_scope: it only affects names, and does not track variables or handle reuse.
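As a minimal sketch of the first two naming mechanisms (the variable and layer names here are illustrative, not from any real checkpoint):

```python
import tensorflow as tf

# Explicit variable name via the `name` argument.
v = tf.Variable(1.0, name='my_var')

# A Keras layer name shows up as a prefix in its weights' names.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(2, name='dense_layer')], name='my_model')
model.build(input_shape=(None, 4))

print(v.name)                 # my_var:0
print(model.weights[0].name)  # includes the 'dense_layer' prefix
```

Printing the variable names this way is a quick check that they line up with the names stored in the checkpoint.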
If that does not work for your use case, you can try the v1.train.init_from_checkpoint function. It takes an assignment_map argument, which specifies the mapping from old names to new names.
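A self-contained sketch of this pattern (the scope names old_scope and new_scope are made up for illustration): save a checkpoint in graph mode under one variable name, then initialize a differently named variable from it via assignment_map.

```python
import os
import tempfile

import tensorflow as tf

v1 = tf.compat.v1
ckpt_dir = tempfile.mkdtemp()

# Save a checkpoint whose variable lives under the old name.
with tf.Graph().as_default():
  with v1.variable_scope('old_scope'):
    old = v1.get_variable('v', initializer=3.0)
  saver = v1.train.Saver()
  with v1.Session() as sess:
    sess.run(v1.global_variables_initializer())
    ckpt_path = saver.save(sess, os.path.join(ckpt_dir, 'model.ckpt'))

# Build a new graph whose variable has a new name, and map the old
# checkpoint name onto it with assignment_map.
with tf.Graph().as_default():
  with v1.variable_scope('new_scope'):
    new = v1.get_variable('v', shape=[], dtype=tf.float32)
  v1.train.init_from_checkpoint(ckpt_path, {'old_scope/v': 'new_scope/v'})
  with v1.Session() as sess:
    sess.run(v1.global_variables_initializer())
    restored = sess.run(new)

print(restored)  # 3.0
```

init_from_checkpoint overrides the variable's initializer, so the subsequent global_variables_initializer run fills it with the checkpointed value.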
Note: object-based checkpoints can defer loading, but name-based checkpoints cannot; they require that all variables be built when the function is called. Some models defer building variables until you call build or run the model on a batch of data.
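For example (a minimal sketch), a Sequential model created without an input shape has no variables until build is called or data is run through it:

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
print(model.built)  # False: no variables exist yet

# Force variable creation, e.g. before restoring a name-based checkpoint.
model.build(input_shape=(None, 8))
print(model.built, len(model.weights))  # True 2 (kernel and bias)
```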
The TensorFlow Estimator repository includes a conversion tool for upgrading the checkpoints of premade Estimators from TensorFlow 1.x to 2.0. It can therefore serve as an example of how to build such a tool for a similar use case.
SavedModel compatibility
There are no significant compatibility concerns for saved models.
TensorFlow 1.x saved_models work in TensorFlow 2.0.
TensorFlow 2.0 saved_models even work in TensorFlow 1.x, provided all of the ops are supported there.
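For instance, a TF 1.x-style SavedModel exported in graph mode can be loaded in 2.0 with tf.saved_model.load and called through its serving signature. A minimal, self-contained sketch (the input/output keys 'x' and 'y' are chosen for illustration):

```python
import os
import tempfile

import tensorflow as tf

v1 = tf.compat.v1
export_dir = os.path.join(tempfile.mkdtemp(), 'saved_model')

# Export a TensorFlow 1.x-style SavedModel in graph mode.
with tf.Graph().as_default():
  x = v1.placeholder(tf.float32, shape=[None], name='x')
  y = x * 2.0
  with v1.Session() as sess:
    v1.saved_model.simple_save(sess, export_dir,
                               inputs={'x': x}, outputs={'y': y})

# Load and call it from TensorFlow 2.0.
loaded = tf.saved_model.load(export_dir)
infer = loaded.signatures['serving_default']
result = infer(x=tf.constant([1.0, 2.0]))['y']
print(result.numpy())  # [2. 4.]
```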
Graph.pb or Graph.pbtxt
There is no straightforward way to upgrade a raw Graph.pb file to TensorFlow 2.0. Your best option is to upgrade the code that generated the file.
But if you have a frozen graph (a kind of tf.Graph in which the variables have been turned into constants), you can convert it to a concrete_function using v1.wrap_function:
def wrap_frozen_graph(graph_def, inputs, outputs):
  def _imports_graph_def():
    tf.compat.v1.import_graph_def(graph_def, name="")
  wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
  import_graph = wrapped_import.graph
  return wrapped_import.prune(
      tf.nest.map_structure(import_graph.as_graph_element, inputs),
      tf.nest.map_structure(import_graph.as_graph_element, outputs))
Here is an example with a frozen graph:
path = tf.keras.utils.get_file(
    'inception_v1_2016_08_28_frozen.pb',
    'http://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz',
    untar=True)
Downloading data from http://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz
24698880/24695710 [==============================] - 2s 0us/step
Load the tf.GraphDef:
graph_def = tf.compat.v1.GraphDef()
loaded = graph_def.ParseFromString(open(path,'rb').read())
Wrap it in a concrete_function:
inception_func = wrap_frozen_graph(
    graph_def, inputs='input:0',
    outputs='InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu:0')
Pass it a tensor as input:
input_img = tf.ones([1,224,224,3], dtype=tf.float32)
inception_func(input_img).shape
TensorShape([1, 28, 28, 96])
Training with Estimators
Estimators are supported in TensorFlow 2.0.
When you use Estimators, you can keep using input_fn(), tf.estimator.TrainSpec, and tf.estimator.EvalSpec from TensorFlow 1.x.
Here is an example using input_fn together with train and eval specs.
Create the input_fn and the train/eval specs:
# Define the estimator's input_fn
def input_fn():
  datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
  mnist_train, mnist_test = datasets['train'], datasets['test']

  BUFFER_SIZE = 10000
  BATCH_SIZE = 64

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label[..., tf.newaxis]

  train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
  return train_data.repeat()

# Define train & eval specs
train_spec = tf.estimator.TrainSpec(input_fn=input_fn,
                                    max_steps=STEPS_PER_EPOCH * NUM_EPOCHS)
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn,
                                  steps=STEPS_PER_EPOCH)
Using a Keras model definition
There are some differences in how you construct your Estimators in TensorFlow 2.0.
We recommend that you define your model with Keras, then use the tf.keras.estimator.model_to_estimator utility to turn the model into an Estimator. The code below shows how to use this utility when creating and training an Estimator.
def make_model():
  return tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
  ])

model = make_model()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

estimator = tf.keras.estimator.model_to_estimator(
  keras_model=model
)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp4q8g11bh
INFO:tensorflow:Using the Keras model provided.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp4q8g11bh', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp4q8g11bh/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: /tmp/tmp4q8g11bh/keras/keras_model.ckpt
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-started 8 variables.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp4q8g11bh/model.ckpt.
INFO:tensorflow:loss = 2.7495928, step = 0
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmp4q8g11bh/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-12-21T03:00:31Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp4q8g11bh/model.ckpt-25
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/5]
INFO:tensorflow:Evaluation [2/5]
INFO:tensorflow:Evaluation [3/5]
INFO:tensorflow:Evaluation [4/5]
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Inference Time : 1.15786s
INFO:tensorflow:Finished evaluation at 2019-12-21-03:00:32
INFO:tensorflow:Saving dict for global step 25: accuracy = 0.615625, global_step = 25, loss = 1.4754493
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmp4q8g11bh/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.3878889.
({'accuracy': 0.615625, 'loss': 1.4754493, 'global_step': 25}, [])
Using a custom model_fn
If you have an existing custom Estimator model_fn that you need to maintain, you can convert your model_fn to use a Keras model.
However, for compatibility reasons, a custom model_fn still runs in 1.x-style graph mode. This means there is no eager execution and no automatic control dependencies.
Custom model_fn with minimal changes
If you prefer to change your existing code as little as possible, tf.compat.v1 symbols such as optimizers and metrics can be used to make your custom model_fn work in TF 2.0.
Using a Keras model in a custom model_fn is similar to using one in a custom training loop:
Set the training phase appropriately, based on the mode argument.
Explicitly pass the model's trainable_variables to the optimizer.
But there are important differences relative to a custom loop:
Instead of using Model.losses, extract the losses using Model.get_losses_for.
Extract the model's updates using Model.get_updates_for.
Note: "updates" are changes that need to be applied to a model after each batch, for example the moving averages of mean and variance in a layers.BatchNormalization layer.
The code below creates an Estimator from a custom model_fn, illustrating all of these concerns.
def my_model_fn(features, labels, mode):
  model = make_model()

  optimizer = tf.compat.v1.train.AdamOptimizer()
  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  predictions = model(features, training=training)

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss = loss_fn(labels, predictions) + tf.math.add_n(reg_losses)

  accuracy = tf.compat.v1.metrics.accuracy(labels=labels,
                                           predictions=tf.math.argmax(predictions, axis=1),
                                           name='acc_op')

  update_ops = model.get_updates_for(None) + model.get_updates_for(features)
  minimize_op = optimizer.minimize(
      total_loss,
      var_list=model.trainable_variables,
      global_step=tf.compat.v1.train.get_or_create_global_step())
  train_op = tf.group(minimize_op, update_ops)

  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=predictions,
      loss=total_loss,
      train_op=train_op, eval_metric_ops={'accuracy': accuracy})

# Create the Estimator & Train
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp20ozvzqk
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp20ozvzqk', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp20ozvzqk/model.ckpt.
INFO:tensorflow:loss = 3.206945, step = 0
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmp20ozvzqk/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-12-21T03:00:35Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp20ozvzqk/model.ckpt-25
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/5]
INFO:tensorflow:Evaluation [2/5]
INFO:tensorflow:Evaluation [3/5]
INFO:tensorflow:Evaluation [4/5]
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Inference Time : 1.22467s
INFO:tensorflow:Finished evaluation at 2019-12-21-03:00:37
INFO:tensorflow:Saving dict for global step 25: accuracy = 0.509375, global_step = 25, loss = 1.4795268
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmp20ozvzqk/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.6385002.
({'accuracy': 0.509375, 'loss': 1.4795268, 'global_step': 25}, [])
Custom model_fn with TF 2.0 symbols
If you want to get rid of all TF 1.x symbols and upgrade your custom model_fn to native TF 2.0, you need to update the optimizers and metrics to tf.keras.optimizers and tf.keras.metrics.
In a custom model_fn, beyond the changes above, more upgrades need to be made:
Use tf.keras.optimizers instead of v1.train.Optimizer.
Explicitly pass the model's trainable_variables to the tf.keras.optimizers.
To compute the train_op/minimize_op:
If the loss is a scalar loss Tensor (not a callable), use Optimizer.get_updates(); the first element of the returned list is the desired train_op/minimize_op.
If the loss is a callable (such as a function), use Optimizer.minimize() to get the train_op/minimize_op.
For evaluation, use tf.keras.metrics instead of tf.compat.v1.metrics.
For the my_model_fn example above, the migrated code with 2.0 symbols looks like this:
def my_model_fn(features, labels, mode):
  model = make_model()

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  loss_obj = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  predictions = model(features, training=training)

  # Get both the unconditional losses (the None part)
  # and the input-conditional losses (the features part).
  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss = loss_obj(labels, predictions) + tf.math.add_n(reg_losses)

  # Upgrade to tf.keras.metrics.
  accuracy_obj = tf.keras.metrics.Accuracy(name='acc_obj')
  accuracy = accuracy_obj.update_state(
      y_true=labels, y_pred=tf.math.argmax(predictions, axis=1))

  train_op = None
  if training:
    # Upgrade to tf.keras.optimizers.
    optimizer = tf.keras.optimizers.Adam()
    # Manually assign tf.compat.v1.global_step variable to optimizer.iterations
    # to make tf.compat.v1.train.global_step increased correctly.
    # This assignment is a must for any `tf.train.SessionRunHook` specified in
    # estimator, as SessionRunHooks rely on global step.
    optimizer.iterations = tf.compat.v1.train.get_or_create_global_step()
    # Get both the unconditional updates (the None part)
    # and the input-conditional updates (the features part).
    update_ops = model.get_updates_for(None) + model.get_updates_for(features)
    # Compute the minimize_op.
    minimize_op = optimizer.get_updates(
        total_loss,
        model.trainable_variables)[0]
    train_op = tf.group(minimize_op, *update_ops)

  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=predictions,
      loss=total_loss,
      train_op=train_op,
      eval_metric_ops={'Accuracy': accuracy_obj})

# Create the Estimator & Train.
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpeltbj_0v
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpeltbj_0v', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpeltbj_0v/model.ckpt.
INFO:tensorflow:loss = 2.9231493, step = 0
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmpeltbj_0v/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-12-21T03:00:40Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpeltbj_0v/model.ckpt-25
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/5]
INFO:tensorflow:Evaluation [2/5]
INFO:tensorflow:Evaluation [3/5]
INFO:tensorflow:Evaluation [4/5]
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Inference Time : 1.13835s
INFO:tensorflow:Finished evaluation at 2019-12-21-03:00:41
INFO:tensorflow:Saving dict for global step 25: Accuracy = 0.728125, global_step = 25, loss = 1.6920828
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmpeltbj_0v/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.43470243.
({'Accuracy': 0.728125, 'loss': 1.6920828, 'global_step': 25}, [])
Premade Estimators
Premade Estimators in the families of tf.estimator.DNN*, tf.estimator.Linear* and tf.estimator.DNNLinearCombined* are still supported in the TensorFlow 2.0 API, but some arguments have changed:
input_layer_partitioner: removed in 2.0.
loss_reduction: updated to tf.keras.losses.Reduction instead of tf.compat.v1.losses.Reduction. Its default value has also changed, from tf.compat.v1.losses.Reduction.SUM to tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE.
optimizer, dnn_optimizer and linear_optimizer: these arguments have been updated to tf.keras.optimizers instead of tf.compat.v1.train.Optimizer.
To migrate the above changes:
No migration is needed for input_layer_partitioner, since distribution strategies handle it automatically in TF 2.0.
For loss_reduction, check tf.keras.losses.Reduction for the supported options.
For the optimizer arguments: if you do not pass in an optimizer, dnn_optimizer or linear_optimizer argument, or if you specify the optimizer argument as a string in your code, you do not need to change anything; tf.keras.optimizers is used by default. Otherwise, update it from tf.compat.v1.train.Optimizer to the corresponding tf.keras.optimizers.
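A hedged sketch of these argument updates on a canned estimator (the feature column, hidden units, and learning rate here are illustrative, not prescribed values):

```python
import tensorflow as tf

feature_columns = [tf.feature_column.numeric_column('x', shape=[4])]

# TF 1.x style (no longer accepted in 2.0):
#   optimizer=tf.compat.v1.train.AdamOptimizer(0.01),
#   loss_reduction=tf.compat.v1.losses.Reduction.SUM

# TF 2.0 style: a tf.keras optimizer (or simply the string 'Adam'),
# and a tf.keras.losses.Reduction value.
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[16],
    n_classes=3,
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss_reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE)
```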
Checkpoint converter
Because tf.keras.optimizers generates a different set of variables to be saved in checkpoints, migrating to keras.optimizers will break checkpoints saved with TF 1.x. To make an old checkpoint reusable after your migration to TF 2.0, try the checkpoint converter tool.
! curl -O https://raw.githubusercontent.com/tensorflow/estimator/master/tensorflow_estimator/python/estimator/tools/checkpoint_converter.py
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 15371 100 15371 0 0 37674 0 --:--:-- --:--:-- --:--:-- 37581
This tool has built-in help:
! python checkpoint_converter.py -h
2019-12-21 03:00:42.978036: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2019-12-21 03:00:42.978321: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2019-12-21 03:00:42.978341: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
usage: checkpoint_converter.py [-h]
                               {dnn,linear,combined} source_checkpoint
                               source_graph target_checkpoint

positional arguments:
  {dnn,linear,combined}
                        The type of estimator to be converted. So far, the
                        checkpoint converter only supports Canned Estimator.
                        So the allowed types include linear, dnn and combined.
  source_checkpoint     Path to source checkpoint file to be read in.
  source_graph          Path to source graph file to be read in.
  target_checkpoint     Path to checkpoint file to be written out.

optional arguments:
  -h, --help            show this help message and exit
TensorShape
This class was simplified to hold ints, instead of tf.compat.v1.Dimension objects, so there is no need to call .value to get an int.
Individual tf.compat.v1.Dimension objects are still accessible from tf.TensorShape.dims.
The following demonstrates the differences between TensorFlow 1.x and TensorFlow 2.0.
# Create a shape and choose an index
i = 0
shape = tf.TensorShape([16, None, 256])
shape
TensorShape([16, None, 256])
If you had this in TF 1.x:
value = shape[i].value
Do this in TF 2.0:
value = shape[i]
value
16
If you had this in TF 1.x:
for dim in shape:
  value = dim.value
  print(value)
Do this in TF 2.0:
for value in shape:
  print(value)
16
None
256
If you had this in TF 1.x (or used any other dimension method):
dim = shape[i]
dim.assert_is_compatible_with(other_dim)
Do this in TF 2.0:
other_dim = 16
Dimension = tf.compat.v1.Dimension

if shape.rank is None:
  dim = Dimension(None)
else:
  dim = shape.dims[i]

dim.is_compatible_with(other_dim)  # or any other dimension method
True
shape = tf.TensorShape(None)

if shape:
  dim = shape.dims[i]
  dim.is_compatible_with(other_dim)  # or any other dimension method
The boolean value of a tf.TensorShape is True if the rank is known, False otherwise.
print(bool(tf.TensorShape([]))) # Scalar
print(bool(tf.TensorShape([0]))) # 0-length vector
print(bool(tf.TensorShape([1]))) # 1-length vector
print(bool(tf.TensorShape([None]))) # Unknown-length vector
print(bool(tf.TensorShape([1, 10, 100]))) # 3D tensor
print(bool(tf.TensorShape([None, None, None]))) # 3D tensor with no known dimensions
print()
print(bool(tf.TensorShape(None))) # A tensor with unknown rank.
True
True
True
True
True
True
False
Remove tf.colocate_with: TensorFlow's device placement algorithms have improved significantly, so this should no longer be necessary. If removing it causes a performance degradation, please file a bug.
Replace usage of v1.ConfigProto with the equivalent functions from tf.config.
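For example, a sketch of a few common mappings (these tf.config calls must run early, before TensorFlow initializes its runtime):

```python
import tensorflow as tf

# TF 1.x:
#   config = tf.compat.v1.ConfigProto(allow_soft_placement=True,
#                                     inter_op_parallelism_threads=4,
#                                     intra_op_parallelism_threads=4)
#   sess = tf.compat.v1.Session(config=config)

# TF 2.0 equivalents via tf.config, applied process-wide:
tf.config.threading.set_inter_op_parallelism_threads(4)
tf.config.threading.set_intra_op_parallelism_threads(4)
tf.config.set_soft_device_placement(True)

print(tf.config.threading.get_inter_op_parallelism_threads())  # 4
print(tf.config.get_soft_device_placement())                   # True
```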
The overall process is:
Run the upgrade script.
Remove contrib symbols.
Switch your models to an object-oriented style (Keras).
Use tf.keras or tf.estimator training and evaluation loops where you can.
Otherwise, use custom loops, but be sure to avoid sessions and collections.
It takes a little work to convert your code to idiomatic TensorFlow 2.0, but every change pays off in:
Shorter, more concise code.
Clearer, simpler logic.
Easier debugging.
For part one of this code migration tutorial, see the previous article.