In gradient descent, the update rule is
$w_{i+1} = w_{i} - LR \times g(w_{i})$
where the learning rate $LR$ controls the step size of each update. If the learning rate is never adjusted, training may fail to reach the optimum ($LR$ too large) or converge very slowly ($LR$ too small). It is therefore important to adjust the learning rate during training: a relatively large learning rate is usually used in the early stage, and it is gradually decreased as training proceeds. Deciding when to decrease it, and by how much, is exactly what learning rate scheduling is about.
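To make that sensitivity concrete, here is a minimal sketch, assuming a toy quadratic loss $(w - t)^2$; the learning rates 0.01, 0.5 and 1.1 are chosen purely for illustration:

import torch

target = torch.tensor(1.0)

for lr in (0.01, 0.5, 1.1):           # too small / reasonable / too large
    w = torch.tensor(5.0, requires_grad=True)
    for step in range(20):
        loss = (w - target) ** 2       # simple quadratic loss
        loss.backward()
        with torch.no_grad():
            w -= lr * w.grad           # w_{i+1} = w_i - LR * g(w_i)
        w.grad.zero_()
    print(f"lr={lr}: w after 20 steps = {w.item():.4f}")
# lr=0.01 barely moves toward 1.0, lr=0.5 converges, lr=1.1 diverges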
The six learning rate scheduling strategies in PyTorch: _LRScheduler
All six learning rate schedulers in PyTorch inherit from the base class _LRScheduler. Its main attributes and methods are:
optimizer: the optimizer whose learning rate is being scheduled
last_epoch: records the current epoch count
base_lrs: records the initial learning rate(s)
step(): updates the learning rate for the next epoch
get_lr(): a virtual method that computes the learning rate for the next epoch; subclasses override it (a minimal custom scheduler built on these hooks is sketched after this list)
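To show how these hooks fit together, here is a minimal sketch of a custom scheduler; the class name HalvingLR and its halving rule are inventions for illustration, not part of PyTorch:

import torch
import torch.optim as optim
from torch.optim.lr_scheduler import _LRScheduler   # also exposed as LRScheduler in recent releases

class HalvingLR(_LRScheduler):
    """Hypothetical scheduler: halve every base learning rate once per epoch."""
    def get_lr(self):
        # base_lrs and last_epoch are maintained by the base class
        return [base_lr * (0.5 ** self.last_epoch) for base_lr in self.base_lrs]

w = torch.randn(2, 2, requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)
scheduler = HalvingLR(optimizer)

for epoch in range(3):
    optimizer.step()                # optimizer step first ...
    scheduler.step()                # ... then the scheduler advances last_epoch and applies get_lr()
    print(scheduler.get_last_lr())  # [0.05], [0.025], [0.0125]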
lr_scheduler.StepLR
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
Purpose: adjust the learning rate at equal intervals.
Main parameters:
step_size: the adjustment interval; if set to 30, the learning rate is adjusted to $lr \cdot gamma$ at step 30, 60, 90, ...
gamma: the decay factor (the multiplier applied to the learning rate, 0.1 by default, i.e. a 10x reduction)
last_epoch: the index of the last epoch; this variable indicates whether the learning rate needs adjusting. When last_epoch matches the configured interval the learning rate is updated, and when it is -1 the learning rate is set to its initial value.
Adjustment rule: $lr = lr \cdot gamma$
Example:
import torch
import torch.optim as optim
import matplotlib.pyplot as plt

LR = 0.1
iteration = 10
max_epoch = 200

# minimal setup (assumed): a single learnable parameter fitted to a zero target
weights = torch.randn((1,), requires_grad=True)
target = torch.zeros((1,))
optimizer = optim.SGD([weights], lr=LR)

scheduler_lr = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)  # set the LR decay policy

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    # get_lr() records the scheduled lr; in recent PyTorch, get_last_lr() is the recommended accessor
    lr_list.append(scheduler_lr.get_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()  # update the learning rate once per epoch

plt.plot(epoch_list, lr_list, label="Step LR Scheduler")
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
As the plot shows, at every multiple of 50 epochs the learning rate $LR$ drops to 0.1 times its previous value.
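Equivalently, StepLR's schedule has the closed form $lr_{epoch} = LR \cdot gamma^{\lfloor epoch / step\_size \rfloor}$; a quick sanity check of the curve above (same LR, step_size and gamma):

expected = [0.1 * 0.1 ** (epoch // 50) for epoch in range(200)]   # closed-form StepLR schedule
print(expected[0], expected[49], expected[50], expected[100], expected[150])
# approximately 0.1, 0.1, 0.01, 0.001, 0.0001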
lr_scheduler.MultiStepLR
torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
Purpose: adjust the learning rate at the specified epochs.
Main parameters:
milestones: the list of epochs at which to adjust the learning rate
gamma: the decay factor
last_epoch: the index of the last epoch; this variable indicates whether the learning rate needs adjusting
LR = 0.1
iteration = 10
max_epoch = 200

# reuses the weights, target and optimizer defined in the StepLR example above
milestones = [50, 125, 160]
scheduler_lr = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="Multi Step LR Scheduler\nmilestones:{}".format(milestones))
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
Because the milestones were set to 50, 125 and 160, the plot shows the learning rate dropping to 0.1 times its previous value at exactly those epochs.
lr_scheduler.ExponentialLR
lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
Purpose: decay the learning rate exponentially, i.e. $lr = lr_{init} \cdot gamma^{epoch}$.
Main parameters:
gamma: the base of the exponential; the exponent is the epoch count
LR = 0.1
iteration = 10
max_epoch = 200

gamma = 0.95
scheduler_lr = optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="Exponential LR Scheduler\ngamma:{}".format(gamma))
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
lr_scheduler.CosineAnnealingLR
lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
Purpose: anneal the learning rate along a cosine curve.
Main parameters:
T_max: the number of epochs in one annealing phase; the learning rate descends from its initial value to eta_min over T_max epochs and then climbs back, so the full cosine period is 2·T_max (see the formula below)
eta_min: the minimum learning rate, i.e. the lowest value the learning rate reaches within a period; defaults to 0
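The schedule follows the cosine annealing rule given in the PyTorch documentation:
$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$
where $\eta_{max}$ is the initial learning rate and $T_{cur}$ is the number of epochs since the schedule started.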
LR = 0.1
iteration = 10
max_epoch = 200

t_max = 50
scheduler_lr = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=t_max, eta_min=0.)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="CosineAnnealingLR Scheduler\nT_max:{}".format(t_max))
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
lr_scheduler.ReduceLROnPlateau
lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
Purpose: reduce the learning rate when a monitored metric stops improving.
Main parameters:
mode: 'min' or 'max', depending on whether the monitored metric should decrease (e.g. a loss) or increase (e.g. an accuracy)
factor: the decay factor applied to the learning rate
patience: how many epochs without improvement to tolerate before reducing the learning rate
cooldown: how many epochs to pause monitoring after a reduction
verbose: whether to print a message when the learning rate is reduced
min_lr: the lower bound on the learning rate
eps: the minimum amount of decay; if the difference between the old and new learning rate is smaller than eps, the update is skipped
LR = 0.1
iteration = 10
max_epoch = 200

loss_value = 0.5
accuracy = 0.9      # unused here; with mode='max' you would monitor an accuracy instead
factor = 0.1
mode = "min"
patience = 10
cooldown = 10
min_lr = 1e-4
verbose = True

scheduler_lr = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=factor, mode=mode, patience=patience,
                                                    cooldown=cooldown, min_lr=min_lr, verbose=verbose)

for epoch in range(max_epoch):
    for i in range(iteration):
        # train(...)
        optimizer.step()
        optimizer.zero_grad()

    if epoch == 5:
        loss_value = 0.4           # the monitored loss improves once, then plateaus

    scheduler_lr.step(loss_value)  # note: the monitored metric is passed to step()

# Epoch 17: reducing learning rate of group 0 to 1.0000e-02.
# Epoch 38: reducing learning rate of group 0 to 1.0000e-03.
# Epoch 59: reducing learning rate of group 0 to 1.0000e-04.
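When the monitored quantity should increase rather than decrease, for example a validation accuracy, mode='max' is used; a minimal sketch (the commented-out validate() call and the placeholder metric are assumptions, not code from this article):

scheduler_acc = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", factor=0.1, patience=10)
for epoch in range(max_epoch):
    # train(...); val_acc = validate(...)
    val_acc = 0.9                  # placeholder metric; it never improves, so the lr keeps shrinking
    scheduler_acc.step(val_acc)    # lr is reduced after `patience` epochs without an increase in val_acc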
LambdaLR
class torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
Purpose: set the learning rate of each parameter group to the initial learning rate multiplied by a user-defined function.
Main parameters:
lr_lambda: a function that computes the learning rate multiplier; its input is usually the current step (epoch), and when there are several parameter groups it should be a list of such functions
LR = 0.1
iteration = 10
max_epoch = 200

lr_init = 0.1

weights_1 = torch.randn((6, 3, 5, 5))
weights_2 = torch.ones((5, 5))

optimizer = optim.SGD([
    {'params': [weights_1]},
    {'params': [weights_2]}], lr=lr_init)

lambda1 = lambda epoch: 0.1 ** (epoch // 20)   # step-wise decay for group 0
lambda2 = lambda epoch: 0.95 ** epoch          # exponential decay for group 1

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    for i in range(iteration):
        # train(...)
        optimizer.step()
        optimizer.zero_grad()

    scheduler.step()

    lr_list.append(scheduler.get_lr())
    epoch_list.append(epoch)

    print('epoch:{:5d}, lr:{}'.format(epoch, scheduler.get_lr()))

plt.plot(epoch_list, [i[0] for i in lr_list], label="lambda 1")
plt.plot(epoch_list, [i[1] for i in lr_list], label="lambda 2")
plt.xlabel("Epoch")
plt.ylabel("Learning Rate")
plt.title("LambdaLR")
plt.legend()
plt.show()
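LambdaLR is also a handy way to build schedules not covered by the classes above, such as learning rate warm-up; a minimal sketch (the 10-epoch linear ramp and the fresh optimizer are assumptions for illustration):

optimizer_w = optim.SGD([weights_1, weights_2], lr=lr_init)   # fresh optimizer so the warm-up starts from lr_init
warmup_epochs = 10
warmup_lambda = lambda epoch: (epoch + 1) / warmup_epochs if epoch < warmup_epochs else 1.0
scheduler_warmup = torch.optim.lr_scheduler.LambdaLR(optimizer_w, lr_lambda=warmup_lambda)
# the lr ramps linearly from lr_init/10 up to lr_init over the first 10 epochs, then stays constant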
In summary, PyTorch provides six learning rate scheduling methods, which fall into three categories: ordered schedules (StepLR, MultiStepLR, ExponentialLR, CosineAnnealingLR), adaptive schedules (ReduceLROnPlateau), and custom schedules (LambdaLR).