i3d finetune (fine-tuning)

The graph-restoring step mainly follows
https://github.com/deepmind/kinetics-i3d/blob/master/evaluate_sample.py
(you can usually find this step in whatever predict/test/evaluate script a pretrained-model package provides). The overall fine-tuning workflow is the same as in my earlier post https://blog.csdn.net/weixin_42388228/article/details/101209788
: first rebuild the network graph (including adding your own layers), then assign the weights stored in the pretrained checkpoint to the variables with the same names in the current graph.

The main pitfall of fine-tuning i3d is setting up the environment. You need to install dm-sonnet; dm-sonnet requires tensorflow-gpu 1.8.0 or newer, and tensorflow-probability has to be installed alongside tensorflow-gpu. If the tensorflow-probability version does not match the tensorflow version, you will keep being told to install tensorflow-probability even though it is already installed; for the exact version pairing, see my other post https://blog.csdn.net/weixin_42388228/article/details/102608779

My environment is:
python 3.6.2
tensorflow-gpu 1.14.0
tensorflow-probability 0.7.0
dm-sonnet 1.35
cuda 10.1.168
cudnn 7.6.0
cuda and cudnn: installing tensorflow-gpu with conda install tensorflow-gpu==1.14.0 downloads and installs matching versions automatically.
After installing dm-sonnet, a test import prints a lot of warnings; they can be ignored. A quick import sanity check is sketched below.
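A minimal sanity check after installation (this snippet is my own addition; it only confirms the three packages import together and that the GPU build is visible):

import tensorflow as tf
import tensorflow_probability as tfp
import sonnet as snt

# versions should roughly match the environment listed above
print('tensorflow:', tf.__version__)               # e.g. 1.14.0
print('tensorflow-probability:', tfp.__version__)  # e.g. 0.7.0
print('dm-sonnet:', snt.__version__)               # e.g. 1.35
print('GPU available:', tf.test.is_gpu_available())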

Another pitfall: the optimizer used by i3d-kinetics is SGD, and using ADAM or RMSProp for fine-tuning throws an error (most likely because those optimizers create extra slot variables under the RGB scope, which the Saver below then tries, and fails, to restore from the checkpoint).
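If you do want to experiment with Adam anyway, one workaround I have not verified is to keep the optimizer's slot variables (their names contain '/Adam') out of the restore map, so that only the original i3d weights are restored. Relative to the full script below, it would look roughly like this:

optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)
rgb_variable_map = {}
for variable in tf.global_variables():
    # keep only variables that actually exist in the pretrained checkpoint:
    # under the RGB scope and not an Adam slot variable
    if variable.name.split('/')[0] == 'RGB' and '/Adam' not in variable.name:
        rgb_variable_map[variable.name.replace(':0', '')] = variable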

The full code is below (the .py file needs to sit inside the kinetics-i3d-master folder):

from __future__ import division, print_function, absolute_import
import tensorflow as tf
import numpy as np
import cv2
import time
import os
import i3d
import sonnet as snt
from i3d import Unit3D
from sklearn.model_selection import train_test_split
os.environ["CUDA_VISIBLE_DEVICES"] = '2,3'
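# NOTE: the original post does not show how the data is prepared; x_train, y_train, x_dev
# and y_dev are used below as ready-made arrays. A hypothetical sketch, assuming the clips
# were preprocessed to shape (N, 32, 224, 224, 3) with one-hot labels of shape (N, 5) and
# saved to the placeholder files 'clips.npy' / 'labels.npy':
x_all = np.load('clips.npy')
y_all = np.load('labels.npy')
x_train, x_dev, y_train, y_dev = train_test_split(x_all, y_all, test_size=0.2, random_state=0)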
########################################################################################################################
############################################    graph construction     #################################################
########################################################################################################################

rgb_input = tf.placeholder(tf.float32, shape=(1, 32, 224, 224, 3))
# restore the original (pretrained) network graph
with tf.variable_scope('RGB'):
    rgb_model = i3d.InceptionI3d(5, spatial_squeeze=True, final_endpoint='Mixed_5c')
    rgb_logits_before, _ = rgb_model(rgb_input, is_training=True, dropout_keep_prob=0.5)
# add our own layers on top of Mixed_5c
with tf.variable_scope('Logits_new'):
    net = tf.nn.avg_pool3d(rgb_logits_before, ksize=[1, 2, 7, 7, 1],
                           strides=[1, 1, 1, 1, 1], padding=snt.VALID)
    net = tf.nn.dropout(net, 0.5)
    logits = Unit3D(output_channels=5,
                    kernel_shape=[1, 1, 1],
                    activation_fn=None,
                    use_batch_norm=False,
                    use_bias=True,
                    name='Conv3d_0c_1x1_new')(net, is_training=True)
    logits = tf.squeeze(logits, [2, 3], name='SpatialSqueeze')
    averaged_logits = tf.reduce_mean(logits, axis=1)
predictions = tf.nn.softmax(averaged_logits)
rgb_output=tf.placeholder(tf.float32, shape=(1, 5))
# note: softmax_cross_entropy expects raw logits; feeding the softmax output would apply softmax twice
loss = tf.losses.softmax_cross_entropy(onehot_labels=rgb_output, logits=averaged_logits)+tf.losses.get_regularization_loss()

optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.01).minimize(loss)
train_accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(predictions, 1), tf.argmax(rgb_output, 1)), tf.float32))

# pick out the variables whose values will be restored from the pretrained checkpoint
rgb_variable_map = {}
for variable in tf.global_variables():
    if variable.name.split('/')[0] == 'RGB':
        rgb_variable_map[variable.name.replace(':0', '')] = variable
   
var=tf.global_variables()
rgb_saver = tf.train.Saver(var_list=rgb_variable_map, reshape=True)
sess=tf.Session()
model_name='.../kinetics-i3d-master/data/checkpoints/rgb_imagenet/model.ckpt'
# restore the variables that share names with the pretrained model
rgb_saver.restore(sess, model_name)
# pick out the variables belonging to the newly added layers
var_to_init=[val for val in var if 'RGB' not in val.name]
# initialize the variables of the newly added layers
sess.run(tf.variables_initializer(var_to_init))
########################################################################################################################
############################################    model training     #################################################
########################################################################################################################
num_pairs=len(x_train)
for i in range(20000):
    x1 = time.time()
    batch_loss = 0
    batch_accuracy = 0
    indices = list(range(len(x_train)))
    np.random.shuffle(indices)
    for j in np.arange(len(x_train)):
        _, temp_loss, temp_accuracy = sess.run([optimizer, loss, train_accuracy],
                                               {rgb_input: x_train[indices[j]].reshape(1, 32, 224, 224, 3),
                                                rgb_output: y_train[indices[j]].reshape(1, 5)})
        batch_loss += temp_loss / num_pairs
        batch_accuracy += temp_accuracy / len(x_train)
    x2 = time.time()
    batch_loss_val = 0
    batch_accuracy_val = 0
    for k in np.arange(len(x_dev)):
        temp_loss_val, temp_accuracy_val = sess.run([loss, train_accuracy],
                                                    {rgb_input: x_dev[k].reshape(1, 32, 224, 224, 3),
                                                     rgb_output: y_dev[k].reshape(1, 5)})
        batch_loss_val += temp_loss_val / len(x_dev)
        batch_accuracy_val += temp_accuracy_val / len(x_dev)
    x3 = time.time()
    print('Epoch:' + str(i) + ' Loss:' + str(batch_loss) + ' Accuracy:' + str(
        batch_accuracy) + ' Time:' + str(x2 - x1) + ' Val loss:' + str(batch_loss_val) + ' Val acc:' + str(
        batch_accuracy_val) + ' Val_time:' + str(x3 - x2))
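
The script above never saves the fine-tuned weights. A minimal sketch for doing that with a plain tf.train.Saver (the output path is a placeholder of my own):

# save everything: the restored i3d weights plus the new Logits_new layer
os.makedirs('finetuned_i3d', exist_ok=True)
finetune_saver = tf.train.Saver()
finetune_saver.save(sess, 'finetuned_i3d/model.ckpt')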




