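This article explains how to load several pretrained models at once for training.
The earlier article on using the pretrained ResNet-50 model in TensorFlow (TensorFlow 使用预训练模型 ResNet-50) showed how to load the pretrained parameters of a single model and finetune it. More complex tasks, however, may require combining several models, either in parallel, where the models receive the same input and their outputs are merged, or in a cascade, where the output of one model feeds the next.
In these cases, the pretrained parameters of several models have to be loaded together before training starts.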
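To make the two arrangements concrete, here is a minimal sketch in plain Python. The helper names `parallel` and `cascade` and the toy "models" are hypothetical and stand in for real networks; only the wiring between the models is the point.

```python
def parallel(model_a, model_b, combine):
    """Feed the same input to both models and merge their outputs."""
    def combined(x):
        return combine(model_a(x), model_b(x))
    return combined


def cascade(model_a, model_b):
    """Feed the output of model_a into model_b."""
    def combined(x):
        return model_b(model_a(x))
    return combined


# Toy stand-ins for two networks (hypothetical, for illustration only).
def double(x):
    return 2 * x


def add_one(x):
    return x + 1


parallel_model = parallel(double, add_one, combine=lambda a, b: a + b)
cascade_model = cascade(double, add_one)

print(parallel_model(3))  # 2*3 + (3+1) = 10
print(cascade_model(3))   # (2*3) + 1 = 7
```

In the parallel case both pretrained models see the original input, which is the arrangement used in the rest of this article.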
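Consider the parallel case (the cascade case is handled the same way), using two models as the example. Reusing the code from the ResNet-50 article, first define the model structure. Only the predict function in model.py needs to change (shown here for ResNet-50 combined with VGG-16):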
    def predict(self, preprocessed_inputs):
        """Predict prediction tensors from inputs tensor.

        Outputs of this function can be passed to loss or postprocess functions.

        Args:
            preprocessed_inputs: A float32 tensor with shape [batch_size,
                height, width, num_channels] representing a batch of images.

        Returns:
            prediction_dict: A dictionary holding prediction tensors to be
                passed to the Loss or Postprocess functions.
        """
        # ResNet-50
        with slim.arg_scope(nets.resnet_v1.resnet_arg_scope()):
            net_resnet, _ = nets.resnet_v1.resnet_v1_50(
                preprocessed_inputs, num_classes=self.num_classes,
                is_training=self._is_training)
        net_resnet = tf.squeeze(net_resnet, axis=[1, 2])

        # VGG-16
        with slim.arg_scope(nets.vgg.vgg_arg_scope()):
            net_vgg, _ = nets.vgg.vgg_16(
                preprocessed_inputs, num_classes=self.num_classes,
                is_training=self._is_training)

        # Combine the two models by summing their logits.
        logits = tf.add(net_resnet, net_vgg)
        prediction_dict = {'logits': logits}
        return prediction_dict
Then add the following file to the project (named model_utils.py):
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 29 11:36:07 2018

@author: shirhe-lyh

Modified from:
    1. https://github.com/tensorflow/models/blob/master/research/maskgan/
       model_utils/model_utils.py
    2. https://github.com/tensorflow/models/blob/master/research/maskgan/
       train_mask_gan.py
"""

import tensorflow as tf

flags = tf.app.flags
FLAGS = flags.FLAGS


def retrieve_init_savers(var_scopes_dict=None,
                         checkpoint_exclude_scopes_dict=None):
    """Retrieve a dictionary of all the initial savers for the models.

    Args:
        var_scopes_dict: A dictionary of variable scopes for the models.
        checkpoint_exclude_scopes_dict: A dictionary of comma-separated list of
            scopes of variables to exclude when restoring from a checkpoint.

    Returns:
        A dictionary of init savers.
    """
    if var_scopes_dict is None:
        return None
    # Treat a missing exclusion dictionary as "exclude nothing".
    if checkpoint_exclude_scopes_dict is None:
        checkpoint_exclude_scopes_dict = {}

    # Dictionary of init savers
    init_savers = {}
    for key, scope in var_scopes_dict.items():
        # Note: tf.trainable_variables() does not return non-trainable
        # variables such as batch-norm moving averages.
        trainable_vars = [
            v for v in tf.trainable_variables() if v.op.name.startswith(scope)]

        # Scope prefixes whose variables must not be restored.
        exclusions = []
        checkpoint_exclude_scopes = checkpoint_exclude_scopes_dict.get(
            key, None)
        if checkpoint_exclude_scopes:
            exclusions = [s.strip() for s in
                          checkpoint_exclude_scopes.split(',')]

        variables_to_restore = []
        for var in trainable_vars:
            excluded = False
            for exclusion in exclusions:
                if var.op.name.startswith(exclusion):
                    excluded = True
                    break
            if not excluded:
                variables_to_restore.append(var)

        # One saver per model, restricted to that model's variables.
        init_saver = tf.train.Saver(var_list=variables_to_restore)
        init_savers[key] = init_saver
    return init_savers


def init_fn(init_savers, sess):
    """The init_fn to be passed to the Supervisor.

    Args:
        init_savers: Dictionary of init_savers in the format:
            'init_saver_name': init_saver.
        sess: A TensorFlow Session object.
    """
    # Load the weights for ResNet
    if FLAGS.resnet_ckpt:
        print('Restoring checkpoint from %s.' % FLAGS.resnet_ckpt)
        tf.logging.info('Restoring checkpoint from %s.' % FLAGS.resnet_ckpt)
        resnet_init_saver = init_savers['ResNet']
        resnet_init_saver.restore(sess, FLAGS.resnet_ckpt)

    # Load the weights for VGG
    if FLAGS.vgg_ckpt:
        print('Restoring checkpoint from %s.' % FLAGS.vgg_ckpt)
        tf.logging.info('Restoring checkpoint from %s.' % FLAGS.vgg_ckpt)
        vgg_init_saver = init_savers['VGG']
        vgg_init_saver.restore(sess, FLAGS.vgg_ckpt)

    if FLAGS.resnet_ckpt is None and FLAGS.vgg_ckpt is None:
        return None
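Finally, replace the get_init_fn function in train.py with the code below. Two import statements have to be added: import model_utils and from functools import partial. The resnet_ckpt and vgg_ckpt flags read by model_utils.init_fn must also be defined, for example in train.py.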
def get_init_fn():
    """Returns a function run by the chief worker to warm-start the training.

    Returns:
        An init function run by the supervisor.
    """
    # Variable scope of each model.
    var_scopes_dict = {'ResNet': 'resnet_v1_50',
                       'VGG': 'vgg_16'}
    # Final classification layers are not restored from the checkpoints.
    checkpoint_exclude_scopes_dict = {'ResNet': 'resnet_v1_50/logits',
                                      'VGG': 'vgg_16/fc8'}
    init_savers = model_utils.retrieve_init_savers(
        var_scopes_dict, checkpoint_exclude_scopes_dict)
    # Bind init_savers so the returned function only needs the session.
    init_fn = partial(model_utils.init_fn, init_savers)
    return init_fn
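
The role of partial in get_init_fn can be seen in isolation with the small sketch below. It is a plain-Python illustration, not part of the project: the stand-in init_fn only shares the argument order of model_utils.init_fn, and fake_savers and fake_session are hypothetical placeholders for real saver and session objects.

```python
from functools import partial


def init_fn(init_savers, sess):
    """Stand-in with the same argument order as model_utils.init_fn."""
    return 'restoring %s into %s' % (sorted(init_savers), sess)


# Hypothetical placeholders for real saver and session objects.
fake_savers = {'ResNet': object(), 'VGG': object()}
fake_session = 'session'

# Bind the first positional argument; `sess` is supplied later by the caller.
bound_init_fn = partial(init_fn, fake_savers)
result = bound_init_fn(fake_session)
print(result)
```

Because the savers are already bound, whatever code later runs the init function only has to pass the session.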
The rest of the code stays unchanged (batch_size now has to be reduced for training to fit on a 1080Ti).
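The idea behind loading several sets of pretrained parameters at once is simple. First, each model's variable scope (resnet_v1_50 for ResNet-50 and vgg_16 for VGG-16) is used together with the function tf.trainable_variables() to collect the trainable variables in that scope, leaving out any pretrained parameters that are not needed. Next, the statement tf.train.Saver(var_list=variables_to_restore) creates one saver instance per model. Finally, the restore function of each instance is called to restore the pretrained parameters one model at a time.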
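
The selection step described above can be tried without TensorFlow by applying the same prefix rules to plain strings. This is only a sketch: select_variable_names is a hypothetical helper, and the variable names in the list are illustrative examples rather than a dump of the real graph.

```python
def select_variable_names(all_names, scope, exclude_scopes=''):
    """Keep names under `scope`, dropping those under any excluded scope.

    `exclude_scopes` is a comma-separated string, as in model_utils.py.
    """
    exclusions = [s.strip() for s in exclude_scopes.split(',') if s.strip()]
    in_scope = [name for name in all_names if name.startswith(scope)]
    return [name for name in in_scope
            if not any(name.startswith(e) for e in exclusions)]


# Illustrative variable names (assumed for this example).
names = [
    'resnet_v1_50/conv1/weights',
    'resnet_v1_50/logits/weights',
    'vgg_16/conv1/conv1_1/weights',
    'vgg_16/fc8/weights',
]

resnet_names = select_variable_names(
    names, 'resnet_v1_50', 'resnet_v1_50/logits')
vgg_names = select_variable_names(names, 'vgg_16', 'vgg_16/fc8')

print(resnet_names)
print(vgg_names)
```

Each resulting list corresponds to the var_list handed to one saver, so every checkpoint is only asked for variables of its own model, minus the excluded output layers.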