facenet Topic 4: Source Code Analysis of train_softmax.py -- Model Training

Continuing from the previous article, we now walk through the training part of the code:

        # Add center loss
        if args.center_loss_factor > 0.0:
            prelogits_center_loss, _ = facenet.center_loss(prelogits, label_batch, args.center_loss_alfa, nrof_classes)
            tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, prelogits_center_loss * args.center_loss_factor)
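As a quick preview of the idea behind facenet.center_loss — a learnable center per class, pulled toward the features of its class, with the loss being the mean squared distance of each feature to its center — here is a minimal NumPy sketch. This is an illustration of the concept, not the facenet implementation (which keeps the centers in a TensorFlow variable and updates them inside the graph):

```python
import numpy as np

def center_loss_step(features, labels, centers, alfa=0.95):
    """One illustrative center-loss step.

    features: (batch, dim) embeddings
    labels:   (batch,) integer class ids
    centers:  (num_classes, dim) per-class centers, updated in place
    """
    centers_batch = centers[labels]                  # gather each sample's center
    loss = np.mean(np.square(features - centers_batch))
    # Move each selected center a fraction (1 - alfa) toward its sample's feature.
    centers[labels] -= (1 - alfa) * (centers_batch - features)
    return loss

centers = np.zeros((3, 2))                 # 3 classes, 2-d embeddings (made up)
features = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
loss = center_loss_step(features, labels, centers)
print(loss)  # 0.5: mean squared distance of the features to their (zero) centers
```

With alfa close to 1 the centers move slowly, which stabilizes training. Note that duplicated labels within a batch would need an accumulating scatter update rather than the plain fancy-indexed assignment used in this sketch.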

This adds the center loss (weighted by center_loss_factor) to the REGULARIZATION_LOSSES collection; center loss itself will be covered in detail in a later article. Continuing:

learning_rate = tf.train.exponential_decay(learning_rate_placeholder, global_step,
                                                   args.learning_rate_decay_epochs * args.epoch_size, args.learning_rate_decay_factor, staircase=True)
tf.summary.scalar('learning_rate', learning_rate)

tf.train.exponential_decay implements an exponentially decaying learning rate. Its signature is tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase), and the update rule is

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

Here learning_rate is the initial learning rate; global_step is a counter that is incremented by one on every update step; decay_steps is the decay interval, i.e. how many steps one decay period lasts; and decay_rate is the decay factor. When staircase is True, global_step / decay_steps is truncated to an integer, so the learning rate changes only once every decay_steps steps; when it is False, the rate decays smoothly at every step. tf.summary.scalar records a one-dimensional scalar statistic so it can be displayed in TensorBoard.
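A quick numeric check of the decay formula, in plain Python mirroring the documented behavior (the parameter values are made up for illustration):

```python
def exponential_decay(lr, global_step, decay_steps, decay_rate, staircase=True):
    # Mirrors tf.train.exponential_decay's documented formula.
    p = global_step // decay_steps if staircase else global_step / decay_steps
    return lr * (decay_rate ** p)

# Initial lr 0.1, decayed by a factor of 0.9 every 100 steps:
print(exponential_decay(0.1, 50, 100, 0.9))   # still 0.1 inside the first interval
print(exponential_decay(0.1, 200, 100, 0.9))  # 0.1 * 0.9**2 = 0.081
```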

Next, the softmax loss is defined:

cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label_batch, logits=logits, name='cross_entropy_per_example')

This is the standard approach: labels is fed label_batch. From the earlier data-loading analysis, label_batch numbers the different identities 0, 1, 2, …, and from the network-structure article we know that the number of units in the fully connected output layer (logits) equals the number of people in the training set, i.e. each person is its own class.
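tf.nn.sparse_softmax_cross_entropy_with_logits computes, for each sample, the negative log of the softmax probability assigned to the true class. A NumPy equivalent, for illustration:

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """Per-sample -log softmax probability of the true class."""
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 1.0, 0.1]])   # one sample, three classes (made up)
labels = np.array([0])
print(sparse_softmax_cross_entropy(labels, logits))  # small loss: class 0 already dominant
```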

Next, the total training loss is built:

cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
tf.add_to_collection('losses', cross_entropy_mean)
# Calculate the total losses
regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
total_loss = tf.add_n([cross_entropy_mean] + regularization_losses, name='total_loss')
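tf.add_n simply sums its inputs, so the total loss is the mean cross-entropy plus every term registered in the REGULARIZATION_LOSSES collection (weight decay, plus the weighted center loss added earlier). With made-up numbers:

```python
cross_entropy_mean = 1.25                    # hypothetical mean softmax loss
regularization_losses = [0.01, 0.02, 0.005]  # hypothetical weight-decay / center-loss terms
total_loss = cross_entropy_mean + sum(regularization_losses)
print(total_loss)  # 1.285
```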

Next comes the training op:

train_op = facenet.train(total_loss, global_step, args.optimizer, learning_rate, args.moving_average_decay, tf.global_variables(), args.log_histograms)

Stepping into facenet.py:

def train(total_loss, global_step, optimizer, learning_rate, moving_average_decay, update_gradient_vars, log_histograms=True):
    # Generate moving averages of all losses and associated summaries.
    loss_averages_op = _add_loss_summaries(total_loss)

    # Compute gradients.
    with tf.control_dependencies([loss_averages_op]):
        if optimizer=='ADAGRAD':
            opt = tf.train.AdagradOptimizer(learning_rate)
        elif optimizer=='ADADELTA':
            opt = tf.train.AdadeltaOptimizer(learning_rate, rho=0.9, epsilon=1e-6)
        elif optimizer=='ADAM':
            opt = tf.train.AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999, epsilon=0.1)
        elif optimizer=='RMSPROP':
            opt = tf.train.RMSPropOptimizer(learning_rate, decay=0.9, momentum=0.9, epsilon=1.0)
        elif optimizer=='MOM':
            opt = tf.train.MomentumOptimizer(learning_rate, 0.9, use_nesterov=True)
        else:
            raise ValueError('Invalid optimization algorithm')
    
        grads = opt.compute_gradients(total_loss, update_gradient_vars)
        
    # Apply gradients.
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
  
    # Add histograms for trainable variables.
    if log_histograms:
        for var in tf.trainable_variables():
            tf.summary.histogram(var.op.name, var)
   
    # Add histograms for gradients.
    if log_histograms:
        for grad, var in grads:
            if grad is not None:
                tf.summary.histogram(var.op.name + '/gradients', grad)
  
    # Track the moving averages of all trainable variables.
    variable_averages = tf.train.ExponentialMovingAverage(
        moving_average_decay, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
  
    with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
        train_op = tf.no_op(name='train')
  
    return train_op
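The tf.train.ExponentialMovingAverage near the end keeps a shadow copy of every trainable variable, updated as shadow = decay * shadow + (1 - decay) * value (TensorFlow also lowers the effective decay early in training when a step counter is supplied, so the averages warm up faster). A minimal sketch of the update rule:

```python
def ema_update(shadow, value, decay):
    # One shadow-variable update: shadow moves a fraction (1 - decay) toward value.
    return decay * shadow + (1 - decay) * value

shadow = 0.0
for _ in range(3):
    shadow = ema_update(shadow, 1.0, decay=0.5)
print(shadow)  # 0.875: three halving steps toward 1.0
```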

This is easy to follow: tf.control_dependencies([apply_gradient_op, variables_averages_op]) makes train_op depend on both the gradient update apply_gradient_op and the moving-average update variables_averages_op, so both must have run before train_op completes. tf.no_op(name='train') does nothing by itself; it exists only to tie the two operations together under one name. Next, checkpointing is defined:

saver = tf.train.Saver(tf.trainable_variables(), max_to_keep=3)

This creates the saver object; max_to_keep=3 means only the 3 most recent checkpoints are kept.

summary_op = tf.summary.merge_all()

tf.summary.merge_all merges all collected summaries into a single op, so that one run writes them all to disk for TensorBoard to display. Next:

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=args.gpu_memory_fraction)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False))
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
summary_writer = tf.summary.FileWriter(log_dir, sess.graph)
coord = tf.train.Coordinator()
tf.train.start_queue_runners(coord=coord, sess=sess)

This configures the GPU memory fraction, initializes the variables, and starts the queue runners. Then training proper begins:

while epoch < args.max_nrof_epochs:
    step = sess.run(global_step, feed_dict=None)
    epoch = step // args.epoch_size
    # Train for one epoch
    train(args, sess, epoch, image_list, label_list, index_dequeue_op, enqueue_op, image_paths_placeholder, labels_placeholder,
          learning_rate_placeholder, phase_train_placeholder, batch_size_placeholder, global_step,
          total_loss, train_op, summary_op, summary_writer, regularization_losses, args.learning_rate_schedule_file)
    
    # Save variables and the metagraph if it doesn't exist already
    save_variables_and_metagraph(sess, saver, summary_writer, model_dir, subdir, step)
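The epoch bookkeeping in this loop is plain integer division on the global step, which advances once per batch:

```python
epoch_size = 1000   # batches per epoch (args.epoch_size, value made up)
step = 2500         # current value of global_step
epoch = step // epoch_size
print(epoch)  # 2: partway through the third epoch
```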

First the value of global_step is fetched. Because global_step is handed to the optimizer via opt.apply_gradients(grads, global_step=global_step) inside facenet.train(), it is automatically incremented by one on every optimization step, so the current epoch can be computed from it. The code then calls the train function, which is defined in train_softmax.py itself:

def train(args, sess, epoch, image_list, label_list, index_dequeue_op, enqueue_op, image_paths_placeholder, labels_placeholder,
          learning_rate_placeholder, phase_train_placeholder, batch_size_placeholder, global_step,
          loss, train_op, summary_op, summary_writer, regularization_losses, learning_rate_schedule_file):
    batch_number = 0

    if args.learning_rate > 0.0:
        lr = args.learning_rate
    else:
        lr = facenet.get_learning_rate_from_file(learning_rate_schedule_file, epoch)

    index_epoch = sess.run(index_dequeue_op)
    label_epoch = np.array(label_list)[index_epoch]
    image_epoch = np.array(image_list)[index_epoch]

    # Enqueue one epoch of image paths and labels
    labels_array = np.expand_dims(np.array(label_epoch), 1)
    image_paths_array = np.expand_dims(np.array(image_epoch), 1)
    sess.run(enqueue_op, {image_paths_placeholder: image_paths_array, labels_placeholder: labels_array})

    # Training loop
    train_time = 0
    while batch_number < args.epoch_size:
        start_time = time.time()
        feed_dict = {learning_rate_placeholder: lr, phase_train_placeholder: True, batch_size_placeholder: args.batch_size}
        if (batch_number % 100 == 0):
            err, _, step, reg_loss, summary_str = sess.run([loss, train_op, global_step, regularization_losses, summary_op], feed_dict=feed_dict)
            summary_writer.add_summary(summary_str, global_step=step)
        else:
            err, _, step, reg_loss = sess.run([loss, train_op, global_step, regularization_losses], feed_dict=feed_dict)
        duration = time.time() - start_time
        print('Epoch: [%d][%d/%d]\tTime %.3f\tLoss %2.3f\tRegLoss %2.3f' %
              (epoch, batch_number + 1, args.epoch_size, duration, err, np.sum(reg_loss)))
        batch_number += 1
        train_time += duration
    # Add validation loss and accuracy to summary
    summary = tf.Summary()
    # pylint: disable=maybe-no-member
    summary.value.add(tag='time/total', simple_value=train_time)
    summary_writer.add_summary(summary, step)
    return step
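When args.learning_rate is non-positive, the rate comes from a schedule file via facenet.get_learning_rate_from_file. Assuming the file maps epoch thresholds to rates with `epoch: lr` lines and `#` comments, reading it could look like this (a sketch of the idea, not the facenet function verbatim):

```python
def lr_from_schedule(lines, epoch):
    # Return the rate of the latest threshold the given epoch has reached.
    lr = None
    for line in lines:
        line = line.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        e, rate = line.split(':')
        if int(e) <= epoch:
            lr = float(rate)
    return lr

schedule = ["# epoch: learning rate", "0: 0.1", "100: 0.01", "200: 0.001"]
print(lr_from_schedule(schedule, 150))  # 0.01
```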

First it executes:

index_epoch = sess.run(index_dequeue_op)
label_epoch = np.array(label_list)[index_epoch]
image_epoch = np.array(image_list)[index_epoch]
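The selection here is NumPy fancy indexing: the dequeued index array picks out this epoch's (shuffled) samples. For example, with hypothetical lists:

```python
import numpy as np

label_list = [0, 0, 1, 1, 2]
image_list = ['a.png', 'b.png', 'c.png', 'd.png', 'e.png']  # hypothetical paths
index_epoch = np.array([3, 0, 4])  # indices dequeued for this epoch

label_epoch = np.array(label_list)[index_epoch]
image_epoch = np.array(image_list)[index_epoch]
print(label_epoch)   # [1 0 2]
print(image_epoch)   # ['d.png' 'a.png' 'e.png']
```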

This reads the indices produced by the index queue and uses them to select image_epoch (face image files with absolute paths) and label_epoch (each face file's identity number); see the data-loading article for the details of file reading. Next it executes:

labels_array = np.expand_dims(np.array(label_epoch), 1)
image_paths_array = np.expand_dims(np.array(image_epoch), 1)
sess.run(enqueue_op, {image_paths_placeholder: image_paths_array, labels_placeholder: labels_array})

The first two lines expand the arrays from shape (N,) to (N, 1); the last line feeds the training data to image_paths_placeholder and labels_placeholder, i.e. enqueues the face files and their labels. From then on, every run of train_op pulls fresh values of image_batch and label_batch which, as the earlier code shows, are produced by dequeuing paths from the queue and decoding the image files at the corresponding locations:

nrof_preprocess_threads = 4
images_and_labels = []
for _ in range(nrof_preprocess_threads):
    filenames, label = input_queue.dequeue()
    images = []
    for filename in tf.unstack(filenames):
        file_contents = tf.read_file(filename)
        image = tf.image.decode_png(file_contents)
        if args.random_rotate:
            image = tf.py_func(facenet.random_rotate_image, [image], tf.uint8)
        if args.random_crop:
            image = tf.random_crop(image, [args.image_size, args.image_size, 3])
        else:
            image = tf.image.resize_image_with_crop_or_pad(image, args.image_size, args.image_size)
        if args.random_flip:
            image = tf.image.random_flip_left_right(image)

        # pylint: disable=no-member
        image.set_shape((args.image_size, args.image_size, 3))
        images.append(tf.image.per_image_standardization(image))
    images_and_labels.append([images, label])

image_batch, label_batch = tf.train.batch_join(
    images_and_labels, batch_size=batch_size_placeholder,
    shapes=[(args.image_size, args.image_size, 3), ()], enqueue_many=True,
    capacity=4 * nrof_preprocess_threads * args.batch_size,
    allow_smaller_final_batch=True)
image_batch = tf.identity(image_batch, 'image_batch')
image_batch = tf.identity(image_batch, 'input')
label_batch = tf.identity(label_batch, 'label_batch')

This concludes the training code. The next article will show how to validate the accuracy of the trained model on the LFW dataset.
