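When training deep learning models, it is common practice to reduce the learning rate gradually as the epochs go by, and some experiments have shown that this does help training converge. The learning rate can be changed either by a hand-written decay rule that controls it directly or by an algorithm that tunes it automatically. This post covers two decay schedules that ship with TensorFlow: exponential decay and polynomial decay.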
Exponential decay (tf.train.exponential_decay)
Signature:
tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
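Parameters:
learning_rate: the initial learning rate
global_step: the global step count (one step per batch)
decay_steps: the decay period in steps; the learning rate is multiplied by decay_rate once every decay_steps steps (smoothly when staircase=False, in discrete jumps when staircase=True)
decay_rate: the base of the exponential decay (the α in α^t)
staircase: whether the learning rate changes in discrete stairs, i.e. whether global_step/decay_steps is floored to an integer instead of being kept as a float
Formula: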
decayed_learning_rate=learning_rate*decay_rate^(global_step/decay_steps)
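To make the effect of staircase concrete, here is a minimal pure-Python sketch of the formula above. It does not call TensorFlow; the function simply mirrors the documented formula, and the sample values (initial rate 0.1, decay_steps 1000, decay_rate 0.5) are illustrative assumptions.

```python
import math


def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Pure-Python mirror of the exponential decay formula (not the TF op)."""
    exponent = global_step / decay_steps
    if staircase:
        # Floor the exponent so the rate only changes every decay_steps steps.
        exponent = math.floor(exponent)
    return learning_rate * decay_rate ** exponent


if __name__ == "__main__":
    # Illustrative values only: compare the smooth and the staircase schedule.
    for step in (0, 500, 1000, 1500, 2000):
        smooth = exponential_decay(0.1, step, 1000, 0.5)
        stepped = exponential_decay(0.1, step, 1000, 0.5, staircase=True)
        print(step, round(smooth, 6), round(stepped, 6))
```

With staircase=True the rate stays constant between multiples of decay_steps, while the default schedule shrinks a little at every step.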
Polynomial decay (tf.train.polynomial_decay)
Signature:
tf.train.polynomial_decay(learning_rate, global_step, decay_steps, end_learning_rate=0.0001, power=1.0, cycle=False, name=None)
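Parameters:
learning_rate: the initial learning rate
global_step: the global step count (one step per batch)
decay_steps: the number of steps over which the learning rate decays from learning_rate to end_learning_rate
end_learning_rate: the final value the learning rate decays to
power: the exponent of the polynomial (the α in (1-t)^α)
cycle: whether the schedule keeps cycling once global_step exceeds decay_steps
Formula: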
When cycle=False:
global_step=min(global_step, decay_steps)
decayed_learning_rate=(learning_rate-end_learning_rate)*(1-global_step/decay_steps)^(power)+end_learning_rate
When cycle=True:
decay_steps=decay_steps*ceil(global_step/decay_steps)
decayed_learning_rate=(learning_rate-end_learning_rate)*(1-global_step/decay_steps)^(power)+end_learning_rate
Note: ceil rounds up to the nearest integer.
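The two branches of the polynomial formula can be sketched in plain Python as follows. This is a mirror of the formulas above rather than the TensorFlow op itself; the guard for global_step 0 in the cycle branch is an assumption added here so that decay_steps never becomes 0, and the sample values in the usage lines are illustrative.

```python
import math


def polynomial_decay(learning_rate, global_step, decay_steps,
                     end_learning_rate=0.0001, power=1.0, cycle=False):
    """Pure-Python mirror of the polynomial decay formulas (not the TF op)."""
    if cycle:
        # Stretch decay_steps to the next multiple that covers global_step.
        # Assumption: use a multiplier of 1 at step 0 to avoid decay_steps == 0.
        multiplier = max(1, math.ceil(global_step / decay_steps))
        decay_steps = decay_steps * multiplier
    else:
        # Without cycling, the rate stays at end_learning_rate after decay_steps.
        global_step = min(global_step, decay_steps)
    fraction = 1 - global_step / decay_steps
    return ((learning_rate - end_learning_rate) * fraction ** power
            + end_learning_rate)


if __name__ == "__main__":
    # Illustrative values: decay from 0.1 to 0.01 over 1000 steps.
    for step in (0, 500, 1000, 1500):
        print(step,
              round(polynomial_decay(0.1, step, 1000, 0.01), 6),
              round(polynomial_decay(0.1, step, 1000, 0.01, cycle=True), 6))
```

With cycle=True the rate jumps back up once global_step passes a multiple of decay_steps and then decays again over a longer window, which is why step 1500 gives a higher value than step 1000.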
Typical code for configuring the learning rate:
import tensorflow as tf

# TF 1.x API. FLAGS is assumed to be populated elsewhere via tf.app.flags.
FLAGS = tf.app.flags.FLAGS


def _configure_learning_rate(num_samples_per_epoch, global_step):
  """Configures the learning rate.

  Args:
    num_samples_per_epoch: The number of samples in each epoch of training.
    global_step: The global_step tensor.

  Returns:
    A `Tensor` representing the learning rate.

  Raises:
    ValueError: if FLAGS.learning_rate_decay_type is not recognized.
  """
  # Number of steps (batches) in num_epochs_per_decay epochs.
  decay_steps = int(num_samples_per_epoch / FLAGS.batch_size *
                    FLAGS.num_epochs_per_decay)
  if FLAGS.sync_replicas:
    # Integer division keeps decay_steps an int under Python 3.
    decay_steps //= FLAGS.replicas_to_aggregate

  if FLAGS.learning_rate_decay_type == 'exponential':
    return tf.train.exponential_decay(FLAGS.learning_rate,
                                      global_step,
                                      decay_steps,
                                      FLAGS.learning_rate_decay_factor,
                                      staircase=True,
                                      name='exponential_decay_learning_rate')
  elif FLAGS.learning_rate_decay_type == 'fixed':
    return tf.constant(FLAGS.learning_rate, name='fixed_learning_rate')
  elif FLAGS.learning_rate_decay_type == 'polynomial':
    return tf.train.polynomial_decay(FLAGS.learning_rate,
                                     global_step,
                                     decay_steps,
                                     FLAGS.end_learning_rate,
                                     power=1.0,
                                     cycle=False,
                                     name='polynomial_decay_learning_rate')
  else:
    # Format the message with % so the decay type appears in the error text.
    raise ValueError('learning_rate_decay_type [%s] was not recognized' %
                     FLAGS.learning_rate_decay_type)
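The decay_steps arithmetic at the top of the function is easy to check by hand. The sketch below reproduces it without TensorFlow; compute_decay_steps is a hypothetical helper, not part of the code above, the dataset and batch sizes are made-up values, and the use of integer division for the replica case is an assumption chosen so that the result stays an integer.

```python
def compute_decay_steps(num_samples_per_epoch, batch_size,
                        num_epochs_per_decay, replicas_to_aggregate=1):
    """Hypothetical helper: steps between decays, mirroring the code above."""
    # Batches per epoch times the number of epochs between decays.
    steps = int(num_samples_per_epoch / batch_size * num_epochs_per_decay)
    # Assumption: with synchronous replicas, one global step aggregates
    # replicas_to_aggregate batches, so fewer global steps are needed.
    return steps // replicas_to_aggregate


if __name__ == "__main__":
    # Made-up sizes: 50000 samples, batch size 128, decay every 2 epochs.
    print(compute_decay_steps(50000, 128, 2))
    print(compute_decay_steps(50000, 128, 2, replicas_to_aggregate=4))
```

In other words, the flags express the decay period in epochs, and the function converts that period into the step count that the decay functions expect.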