This post covers two ways of loading a pretrained model from ckpt files for finetuning. The first is to import the trained graph and finetune on top of it; with that approach you do not have to rewrite the original network at training time, only the parts you modify. It is described in these posts:
https://blog.csdn.net/Alienge/article/details/81012363
https://blog.csdn.net/chanbo8205/article/details/85067610
This post focuses on the second method: restoring the parameters of selected layers and finetuning a network you have modified yourself. It requires rewriting the whole network (and the variable names in the rewritten network must match those in the saved model), but I find it more flexible: you can freely choose which layers' parameters to load and which layers to finetune. Both methods first build the graph and then assign values to its variables via restore.
The files saved under checkpoint_dir are laid out as follows:
|--checkpoint_dir
| |--checkpoint
| |--MyModel.meta
| |--MyModel.data-00000-of-00001
| |--MyModel.index
MyModel.meta stores the graph structure. The meta file is a pb (protocol buffer) format file that contains variables, ops, collections, and so on.
The ckpt files are binary files that store the values of all variables (weights, biases, optimizer state, etc.). Before TensorFlow 0.11 everything was saved in a single .ckpt file; from 0.11 onward the values are split across two files, for example:
MyModel.data-00000-of-00001
MyModel.index
There is also a checkpoint file under checkpoint_dir. It is a plain-text file that records the most recently saved checkpoint together with a list of the other checkpoint files. At inference time you can edit this file to choose which model to use.
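For example, a minimal sketch (assuming checkpoints were saved under ./checkpoint_dir) of how this file is read programmatically:
import tensorflow as tf

# tf.train.get_checkpoint_state parses the plain-text "checkpoint" file
ckpt_state = tf.train.get_checkpoint_state('./checkpoint_dir')
print(ckpt_state.model_checkpoint_path)        # the most recently saved checkpoint
print(ckpt_state.all_model_checkpoint_paths)   # every checkpoint listed in the file
# tf.train.latest_checkpoint is a shortcut that returns only the latest path
print(tf.train.latest_checkpoint('./checkpoint_dir'))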
TensorFlow provides the tf.train.Saver class to save models. Note that in TensorFlow, variables live inside a Session; that is, variable values only exist within a Session, so the session must be passed in when saving a model.
If we pass no arguments to tf.train.Saver, it saves all variables by default. If you want to save only some of the variables instead of all of them, specify them via a list or dictionary when constructing the tf.train.Saver instance:
import tensorflow as tf
w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver([w1,w2])
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, './checkpoint_dir/MyModel',global_step=1000)
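The same variables can later be restored into a freshly built graph; a minimal sketch, assuming the save call above was run with global_step=1000:
import tensorflow as tf

w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver([w1, w2])
with tf.Session() as sess:
    # restore() assigns the saved values, so w1/w2 need no extra initialization
    saver.restore(sess, './checkpoint_dir/MyModel-1000')
    print(sess.run([w1, w2]))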
To load only part of a pretrained model's parameters, we first need to know the names of the variables in the model. In addition, every variable must be initialized inside the session, otherwise the program raises an error. The variables being loaded are initialized via restore (which assigns them the values stored in the model), while newly added variables are initialized with tf.variables_initializer().
The following snippet returns the variables that have not yet been initialized:
def get_uninitialized_variables(sess):
    global_vars = tf.global_variables()
    # print([str(i.name) for i in global_vars])
    is_not_initialized = sess.run([tf.is_variable_initialized(var) for var in global_vars])
    not_initialized_vars = [v for (v, f) in zip(global_vars, is_not_initialized) if not f]
    print([str(i.name) for i in not_initialized_vars])
    return not_initialized_vars
As a concrete example, we first train a simple 3-layer NN, then load the weights of its first two layers, add another fully connected layer plus a softmax layer, and finetune. A rough sketch of the two networks:
Rectangles of the same color in the sketch mark the layers whose pretrained weights are reused (the weights to be kept fixed); the lower network is the one used for finetuning, with the newly added layers.
Below we test this with the MNIST handwritten-digit dataset that ships with TensorFlow.
from tensorflow.examples.tutorials.mnist import input_data
from datetime import datetime
import os
mnist = input_data.read_data_sets('MNIST_data',one_hot=True)
import tensorflow as tf
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('batch_size', 100, '''batch size''')
tf.app.flags.DEFINE_integer('training_epochs', 15, '''number of training epochs''')
tf.app.flags.DEFINE_string('check_point_dir', './', '''checkpoint dir''')
def _bias_variable(name, shape, initializer):
    var = tf.get_variable(name, shape, initializer=initializer, dtype=tf.float32)
    return var

def _weight_variable(name, shape, std):
    return _bias_variable(name, shape,
                          initializer=tf.truncated_normal_initializer(stddev=std, dtype=tf.float32))
def inference(x):
    with tf.variable_scope('layer1') as scope:
        weights = _weight_variable('weights', [784, 256], 0.04)
        bias = _bias_variable('bias', [256], tf.constant_initializer(0.1))
        layer1 = tf.nn.relu(tf.matmul(x, weights) + bias, name=scope.name)
    with tf.variable_scope('layer2') as scope:
        weights = _weight_variable('weights', [256, 128], std=0.02)
        bias = _bias_variable('bias', [128], tf.constant_initializer(0.2))
        layer2 = tf.nn.relu(tf.matmul(layer1, weights) + bias, name=scope.name)
    with tf.variable_scope('softmax_linear') as scope:
        weights = _weight_variable('weights', [128, 10], std=1 / 192.0)
        bias = _bias_variable('bias', [10], tf.constant_initializer(0.0))
        softmax_linear = tf.add(tf.matmul(layer2, weights), bias, name=scope.name)
    return softmax_linear
def loss(logits, labels):
    labels = tf.cast(labels, tf.int64)
    # labels are one-hot, so argmax converts them to the class indices expected by the sparse op
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.argmax(labels, 1), logits=logits, name='cross_entropy')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    return cross_entropy_mean
def train():
    with tf.name_scope("input"):
        x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
        y = tf.placeholder(tf.float32, shape=[None, 10], name='y')
    softmax_linear = inference(x)
    cost = loss(softmax_linear, y)
    opt = tf.train.AdamOptimizer(0.001).minimize(cost)
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(softmax_linear, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))
    # key part: the Saver that will write the checkpoint files
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(FLAGS.training_epochs):
            total_batch = int(mnist.train.num_examples / FLAGS.batch_size)
            for _ in range(total_batch):
                batch_xs, batch_ys = mnist.train.next_batch(FLAGS.batch_size)
                sess.run(opt, feed_dict={x: batch_xs, y: batch_ys})
            cost_ = sess.run(cost, feed_dict={x: batch_xs, y: batch_ys})
            print("%s epoch: %d, cost: %.6f" % (datetime.now(), epoch + 1, cost_))
            if (epoch + 1) % 5 == 0:
                check_point_file = os.path.join(FLAGS.check_point_dir, 'my_test_model')
                # key part: save the graph structure and variable values
                saver.save(sess, check_point_file, global_step=epoch + 1)
        mean_accuracy = sess.run(accuracy, {x: mnist.test.images, y: mnist.test.labels})
        print("accuracy %.3f" % mean_accuracy)
        print()
def main(_):
    train()

if __name__ == '__main__':
    tf.app.run()
The following snippet lists the variable names (and shapes) stored in a saved model, which makes it easy to decide, by name, which parameters should not be loaded:
import tensorflow as tf
from tensorflow.python import pywrap_tensorflow
# directory where the .ckpt files were saved
logdir = '/home/model/'
# checkpoint_path = os.path.join(model_dir, "model.ckpt-9999")
ckpt = tf.train.get_checkpoint_state(logdir)
reader = pywrap_tensorflow.NewCheckpointReader(ckpt.model_checkpoint_path)
var_to_shape_map = reader.get_variable_to_shape_map()
for key, shape in var_to_shape_map.items():
    print("tensor_name: ", key, " shape: ", shape)
Below is the code that loads part of the pretrained model's parameters and finetunes the modified network.
from tensorflow.examples.tutorials.mnist import input_data
from datetime import datetime
import os
mnist = input_data.read_data_sets('MNIST_data',one_hot=True)
import tensorflow as tf
def _bias_variable(name, shape, initializer):
    var = tf.get_variable(name, shape, initializer=initializer, dtype=tf.float32)
    return var

def _weight_variable(name, shape, std):
    return _bias_variable(name, shape,
                          initializer=tf.truncated_normal_initializer(stddev=std, dtype=tf.float32))
# Redefine the network; the earlier layers must keep the same structure and variable names as the saved model
def inference(x):
    with tf.variable_scope('layer1') as scope:
        weights = _weight_variable('weights', [784, 256], 0.04)
        bias = _bias_variable('bias', [256], tf.constant_initializer(0.1))
        layer1 = tf.nn.relu(tf.matmul(x, weights) + bias, name=scope.name)
    with tf.variable_scope('layer2') as scope:
        weights = _weight_variable('weights', [256, 128], std=0.02)
        bias = _bias_variable('bias', [128], tf.constant_initializer(0.2))
        layer2 = tf.nn.relu(tf.matmul(layer1, weights) + bias, name=scope.name)
    # newly added third layer
    with tf.variable_scope('layer3') as scope:
        weights = _weight_variable('weights', [128, 64], std=0.001)
        bias = _bias_variable('bias', [64], tf.constant_initializer(0.0))
        layer3 = tf.nn.relu(tf.matmul(layer2, weights) + bias, name=scope.name)
    with tf.variable_scope('softmax_linear') as scope:
        weights = _weight_variable('weights', [64, 10], std=1 / 192.0)
        bias = _bias_variable('bias', [10], tf.constant_initializer(0.0))
        softmax_linear = tf.add(tf.matmul(layer3, weights), bias, name=scope.name)
    return softmax_linear
def loss(logits, labels):
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.argmax(labels, 1), logits=logits, name='cross_entropy')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    return cross_entropy_mean
def get_uninitialized_variables(sess):
    global_vars = tf.global_variables()
    # print([str(i.name) for i in global_vars])
    is_not_initialized = sess.run([tf.is_variable_initialized(var) for var in global_vars])
    not_initialized_vars = [v for (v, f) in zip(global_vars, is_not_initialized) if not f]
    print([str(i.name) for i in not_initialized_vars])
    return not_initialized_vars
batch_size = 100
training_epoch = 20
def train():
    with tf.name_scope("input"):
        x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
        y = tf.placeholder(tf.float32, shape=[None, 10], name='y')
    softmax_linear = inference(x)
    cost = loss(softmax_linear, y)
    opt = tf.train.AdamOptimizer(0.001).minimize(cost)
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(softmax_linear, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))
    # Set tf saver (key part)
    # restore every variable except those of the newly added layers ('layer3' and 'softmax_linear');
    # change exclude=[...] to choose which variables are loaded
    variables_to_restore = tf.contrib.framework.get_variables_to_restore(exclude=['layer3', 'softmax_linear'])
    saver_old = tf.train.Saver(variables_to_restore)  # Saver for the variables restored from the pretrained graph
    saver = tf.train.Saver()  # Saver for the new graph, used to save the modified network
    with tf.Session() as sess:
        saver_old.restore(sess, tf.train.latest_checkpoint('./'))
        # initialize the newly added variables
        sess.run(tf.variables_initializer(get_uninitialized_variables(sess)))
        ########
        # Alternatively, initialize all variables first and then restore:
        # sess.run(tf.global_variables_initializer())
        # saver_old.restore(sess, tf.train.latest_checkpoint('./'))
        #########
        for epoch in range(training_epoch):
            total_batch = int(mnist.train.num_examples / batch_size)
            for _ in range(total_batch):
                batch_xs, batch_ys = mnist.train.next_batch(batch_size)
                sess.run(opt, feed_dict={x: batch_xs, y: batch_ys})
            cost_ = sess.run(cost, feed_dict={x: batch_xs, y: batch_ys})
            print("%s epoch: %d, cost: %.6f" % (datetime.now(), epoch + 1, cost_))
            if (epoch + 1) % 5 == 0:
                # save under a different name so the pretrained checkpoints are not overwritten
                check_point_file = os.path.join('./', 'my_finetune_model')
                saver.save(sess, check_point_file, global_step=epoch + 1)  # saves the network with the newly added layers
        mean_accuracy = sess.run(accuracy, {x: mnist.test.images, y: mnist.test.labels})
        print("accuracy %.3f" % mean_accuracy)
        print()

def main(_):
    train()

if __name__ == '__main__':
    tf.app.run()
The key lines in the code above are the following:
# Set tf saver (key part)
# restore every variable except those of the newly added layers ('layer3' and 'softmax_linear');
# change exclude=[...] to choose which variables are loaded
variables_to_restore = tf.contrib.framework.get_variables_to_restore(exclude=['layer3', 'softmax_linear'])
saver_old = tf.train.Saver(variables_to_restore)  # Saver for the variables restored from the pretrained graph
saver = tf.train.Saver()  # Saver for the new graph, used to save the modified network
with tf.Session() as sess:
    saver_old.restore(sess, tf.train.latest_checkpoint('./'))
    # initialize the newly added variables
    sess.run(tf.variables_initializer(get_uninitialized_variables(sess)))
    ########
    # Alternatively, initialize all variables first and then restore:
    # sess.run(tf.global_variables_initializer())
    # saver_old.restore(sess, tf.train.latest_checkpoint('./'))
    #########
With the code above, the finetuning of your own modified network is complete.
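If, as in the architecture sketch, you also want the restored layers to stay fixed during finetuning rather than merely serve as initialization, one option (a minimal sketch that would replace the opt = ... line in train() above) is to pass only the new layers' variables to the optimizer through var_list:
# collect only the variables of the newly added layers
new_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='layer3') + \
           tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='softmax_linear')
# the optimizer then updates only these variables; layer1 and layer2 keep their pretrained weights
opt = tf.train.AdamOptimizer(0.001).minimize(cost, var_list=new_vars)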