The learning rate is a perennial topic for every deep-learning researcher and practitioner: when setting it we constantly weigh accuracy against training speed, trying to find a balance between the two.
Informally, learning rate decay means letting the learning rate shrink gradually during training: at the start we use a large learning rate to speed training up, and once training has progressed far enough we reduce it to improve the model's accuracy.
The TensorFlow 1.x function that defines a decayed learning rate is:

def exponential_decay(learning_rate, global_step,
                      decay_steps, decay_rate, staircase=False, name=None):

learning_rate: the initial learning rate.
global_step: a variable holding the current iteration step.
decay_steps: how many iterations between decays.
decay_rate: the factor applied to the learning rate every decay_steps iterations.
staircase: defaults to False; if True, the decay is applied in discrete jumps, i.e. the learning rate stays constant within each decay_steps interval instead of decreasing every step.
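Concretely, exponential_decay computes decayed_lr = learning_rate * decay_rate ** (global_step / decay_steps), with the exponent floored when staircase=True. The plain-Python sketch below (my own helper, not part of TensorFlow) simply illustrates that rule:

def decayed_learning_rate(learning_rate, global_step, decay_steps,
                          decay_rate, staircase=False):
    # learning_rate * decay_rate ** (global_step / decay_steps)
    exponent = global_step / decay_steps
    if staircase:
        exponent = global_step // decay_steps  # integer division -> step-wise decay
    return learning_rate * decay_rate ** exponent

print(decayed_learning_rate(0.1, 10, 10, 0.9))                  # 0.1 * 0.9 = 0.09
print(decayed_learning_rate(0.1, 15, 10, 0.9, staircase=True))  # still 0.09 within the same interval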
In TensorFlow 2.0 we build the decayed learning rate with Keras:
import tensorflow as tf
exponential_decay = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1., decay_steps=1, decay_rate=0.96)
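A schedule built this way is usually passed straight to a Keras optimizer as its learning_rate, and it can also be called with a step number to inspect the decayed value. A minimal usage sketch (repeating the construction above so it runs on its own):

import tensorflow as tf

exponential_decay = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1., decay_steps=1, decay_rate=0.96)

# The optimizer evaluates the schedule at its own iteration counter on every update.
optimizer = tf.keras.optimizers.SGD(learning_rate=exponential_decay)

# The schedule is also callable, so its value at any step can be printed directly.
for step in range(5):
    print(step, float(exponential_decay(step)))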
Using TensorFlow we can write a small test program; the TensorFlow 1.x program, adapted to run on 2.0 through tf.compat.v1, is as follows:
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # run the graph-mode 1.x API on TensorFlow 2.x

global_step = tf.compat.v1.Variable(0, trainable=False)  # step counter that drives the decay
initial_learning_rate = 0.1
learning_rate = tf.compat.v1.train.exponential_decay(
    initial_learning_rate, global_step=global_step,
    decay_steps=10, decay_rate=0.9)
opt = tf.compat.v1.train.GradientDescentOptimizer(learning_rate)  # how the rate would be plugged into an optimizer
add_global = global_step.assign_add(1)  # increment global_step by 1 each time it is run

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    print(sess.run(learning_rate))
    for i in range(20):
        g, rate = sess.run([add_global, learning_rate])
        print(g, rate)
If you are using TensorFlow 1.x, simply remove every .compat.v1 that appears in the code above.
Running the code prints, for each step, the iteration count followed by the current learning rate; the output shows the learning rate gradually decreasing.
This training strategy is very common: when training a neural network we usually start with a relatively large learning rate and let it decrease gradually as training proceeds. Note that global_step must be fed in and incremented, otherwise no decay takes place.
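For completeness, the same 20-step test can also be written against native TensorFlow 2.x with the Keras schedule; the step counter is then just a Python loop variable, so there is no global_step to forget (a minimal sketch, not the program above):

import tensorflow as tf

initial_learning_rate = 0.1
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate, decay_steps=10, decay_rate=0.9)

# Print the decayed learning rate for the first 20 steps.
for step in range(21):
    print(step, float(schedule(step)))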
In addition, to further demonstrate the benefit of a decayed learning rate, we run a test on MNIST; this program uses TensorFlow 1.x:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

tf.reset_default_graph()
x = tf.placeholder(tf.float32, [None, 784])   # flattened 28x28 images
y = tf.placeholder(tf.float32, [None, 10])    # one-hot labels
w = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.zeros([10]))
pred = tf.nn.softmax(tf.matmul(x, w) + b)     # softmax regression
# cross-entropy loss
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), reduction_indices=1))

# exponentially decayed learning rate: multiplied by 0.9 every 1000 steps
global_step = tf.Variable(0, trainable=False)
initial_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step=global_step,
                                           decay_steps=1000,
                                           decay_rate=0.9)
add_global = global_step.assign_add(1)        # advance the step counter every batch
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

training_epochs = 50
batch_size = 100
display_step = 1

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        avg_cost = 0
        total_batch = int(mnist.train.num_examples / batch_size)
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            _, c, add, rate = sess.run([optimizer, cost, add_global, learning_rate],
                                       feed_dict={x: batch_xs, y: batch_ys})
            avg_cost += c / total_batch
        if (epoch + 1) % display_step == 0:
            print('epoch= ', epoch + 1, ' cost= ', avg_cost, 'add_global=', add, 'rate=', rate)
    print('finished')
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print('accuracy: ', accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
The test shows that training with exponential_decay improves the final accuracy.
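To reproduce such a comparison, the baseline would presumably train the same model with a fixed learning rate; only the lines sketched below would change (this baseline is an assumption, not code from the original experiment):

# Baseline run: constant learning rate, everything else in the MNIST program unchanged.
learning_rate = 0.1
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)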
A follow-up post will cover training on MNIST with a decayed learning rate using TensorFlow 2.0.
References:
https://blog.csdn.net/Vici__/article/details/98596090?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2allsobaiduend~default-1-98596090.nonecase&utm_term=%E4%BB%80%E4%B9%88%E6%98%AF%E6%8C%87%E6%95%B0%E8%A1%B0%E5%87%8F&spm=1000.2123.3001.4430
https://www.it610.com/article/1281598963010519040.htm