One of the most important hyperparameters of an optimizer is the learning rate; a well-chosen learning rate lets the optimizer converge quickly. The usual practice is to start training with a relatively large learning rate and decrease it as training progresses. When to decrease it, and by how much, is exactly what learning rate scheduling is about.
PyTorch 1.6.0 provides 10 learning rate scheduling methods; this post is a brief summary of them.
All of the schedulers fall into three broad categories: ordered (rule-based) adjustment, adaptive adjustment, and custom adjustment.
First category: ordered adjustment, which changes the learning rate according to a fixed rule. This is the most commonly used category and includes step decay (StepLR), decay at user-defined milestones (MultiStepLR), exponential decay (ExponentialLR), and cosine annealing (CosineAnnealingLR). With these methods the timing of every adjustment is fully under the user's control, which is why they are the ones used most often in training.
Second category: adaptive adjustment, which changes the learning rate according to the training state: ReduceLROnPlateau. It monitors a chosen metric and, once that metric stops improving, takes that as the signal to reduce the learning rate, hence "adaptive".
Third category: custom adjustment, LambdaLR. The lambda-based approach is extremely flexible: we can assign a different adjustment rule to each parameter group, which is very useful for fine-tuning, where we want not only different learning rates for different layers but also different scheduling strategies for them.
torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
Assigns a separate adjustment rule to each parameter group. The rule is lr = base_lr * lambda(self.last_epoch).
Parameters: lr_lambda is a function of the epoch index (or a list of such functions, one per parameter group) that returns a multiplicative factor; last_epoch is the index of the last epoch (default -1).
>>> # Assuming optimizer has two groups.
>>> ignored_params = list(map(id, net.fc3.parameters()))
>>> base_params = filter(lambda p: id(p) not in ignored_params, net.parameters())
>>> optimizer = optim.SGD([{'params': base_params}, {'params': net.fc3.parameters(), 'lr': 0.001 * 100}], 0.001, momentum=0.9, weight_decay=1e-4)
>>> lambda1 = lambda epoch: epoch // 3
>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     print('epoch: ', epoch, 'lr: ', scheduler.get_last_lr())
>>>     scheduler.step()
Output:
epoch: 0 lr: [0.0, 0.1]
epoch: 1 lr: [0.0, 0.095]
epoch: 2 lr: [0.0, 0.09025]
epoch: 3 lr: [0.001, 0.0857375]
epoch: 4 lr: [0.001, 0.081450625]
epoch: 5 lr: [0.001, 0.07737809374999999]
epoch: 6 lr: [0.002, 0.07350918906249998]
epoch: 7 lr: [0.002, 0.06983372960937498]
epoch: 8 lr: [0.002, 0.06634204312890622]
epoch: 9 lr: [0.003, 0.0630249409724609]
Why is the learning rate of the first parameter group 0? Let's look at how it is computed.
The first group's base learning rate is 0.001 and its lambda is lambda epoch: epoch // 3.
At epoch 0, lr = base_lr * lambda(self.last_epoch) gives lr = 0.001 * (0 // 3) = 0; the rate only becomes non-zero once epoch // 3 >= 1, i.e. from epoch 3 onward.
For the second parameter group the base learning rate is 0.1 (0.001 * 100), so lr = 0.1 * 0.95 ** epoch: at epoch 0 lr = 0.1, at epoch 1 lr = 0.1 * 0.95 = 0.095, and so on.
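As a quick sanity check, the printed values can be reproduced straight from the rule lr = base_lr * lambda(epoch), without building a model at all. A minimal sketch using the two base rates (0.001 and 0.1) and the two lambdas from the example above:
>>> base_lrs = [0.001, 0.1]
>>> lambdas = [lambda epoch: epoch // 3, lambda epoch: 0.95 ** epoch]
>>> for epoch in range(10):
>>>     print('epoch: ', epoch, 'lr: ', [lr * f(epoch) for lr, f in zip(base_lrs, lambdas)])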
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
Adjusts the learning rate at fixed intervals, multiplying it by gamma each time. The interval unit is step; note that step here normally means epoch, not iteration.
Parameters: step_size is the decay period in epochs; gamma is the multiplicative decay factor (default 0.1); last_epoch is the index of the last epoch (default -1).
>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05     if epoch < 30
>>> # lr = 0.005    if 30 <= epoch < 60
>>> # lr = 0.0005   if 60 <= epoch < 90
>>> # ...
>>> scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()
torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
Adjusts the learning rate at user-defined milestones. This method suits later-stage tuning: inspect the loss curve and customize the adjustment points for each experiment.
Parameters: milestones is a list of increasing epoch indices at which the learning rate is decayed; gamma is the multiplicative decay factor (default 0.1); last_epoch is the index of the last epoch (default -1).
>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05     if epoch < 30
>>> # lr = 0.005    if 30 <= epoch < 80
>>> # lr = 0.0005   if epoch >= 80
>>> scheduler = MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()
torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
Decays the learning rate exponentially: lr = base_lr * gamma ** epoch.
Parameters: gamma is the base of the exponential decay, i.e. the factor applied once per epoch; last_epoch is the index of the last epoch (default -1).
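No usage example is given for ExponentialLR above, so here is a minimal sketch in the same style as the other schedulers; gamma=0.9 is only an illustrative value:
>>> # Assuming optimizer uses lr = 0.1 for all groups:
>>> # lr = 0.1 * 0.9 ** epoch, i.e. 0.1, 0.09, 0.081, ...
>>> scheduler = ExponentialLR(optimizer, gamma=0.9)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()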
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
Adjusts the learning rate with cosine annealing: the initial learning rate is the maximum of the cosine curve. Note that this class only implements the annealing part of SGDR; the warm restart at the end of each cycle is provided by CosineAnnealingWarmRestarts.
The learning rate is updated according to:
\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)
As the formula shows, cosine annealing treats the initial learning rate as the maximum eta_max and varies the rate with a period of 2 * T_max: within one period the rate first decreases to eta_min and then rises back up.
Parameters: T_max is the number of epochs over which the rate anneals from its initial value down to eta_min (half of the full cosine period); eta_min is the minimum learning rate (default 0); last_epoch is the index of the last epoch (default -1).
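Likewise, here is a minimal usage sketch for CosineAnnealingLR in the same style; T_max=50 and eta_min=1e-5 are only illustrative values:
>>> # Assuming optimizer uses lr = 0.1 for all groups, the lr anneals from
>>> # 0.1 down to 1e-5 over the first 50 epochs and then rises back up.
>>> scheduler = CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-5)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()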
torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
This is the adaptive scheduler, and a very practical strategy in real training.
The learning rate is reduced when a monitored metric stops improving (the loss no longer decreases, or the accuracy no longer rises).
Parameters: mode is 'min' or 'max', depending on whether the monitored metric should decrease or increase; factor is the multiplier applied to the learning rate when it is reduced (default 0.1); patience is how many epochs without improvement to tolerate before reducing the rate (default 10); threshold, threshold_mode, cooldown, min_lr and eps control how an improvement is measured and how far the rate may drop.
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = ReduceLROnPlateau(optimizer, 'min')
>>> for epoch in range(10):
>>>     train(...)
>>>     val_loss = validate(...)
>>>     # Note that step should be called after validate()
>>>     scheduler.step(val_loss)
torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1)
This scheduler comes from the paper Cyclical Learning Rates for Training Neural Networks.
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.01, max_lr=0.1)
>>> data_loader = torch.utils.data.DataLoader(...)
>>> for epoch in range(10):
>>>     for batch in data_loader:
>>>         train_batch(...)
>>>         scheduler.step()
Note: in PyTorch the learning rate is updated by calling scheduler.step(). Since PyTorch 1.1.0, step() no longer needs an epoch argument: calling it without arguments inside the epoch loop simply increments last_epoch by one, exactly as before, while schedulers such as CyclicLR are instead stepped inside the inner iteration loop. If you do pass an explicit epoch (now deprecated), the scheduler computes the learning rate through _get_closed_form_lr() when the subclass defines it, otherwise through get_lr().
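To read the current learning rate after a step, get_last_lr() is the intended accessor (get_lr() is meant to be called inside step()). A minimal sketch of the two calling patterns, with StepLR used purely as an example:
>>> scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     scheduler.step()                       # no epoch argument: last_epoch += 1, get_lr() is used
>>>     print(epoch, scheduler.get_last_lr())  # the lr that was just written to the param groups
>>> # Deprecated pattern: passing an explicit epoch triggers _get_closed_form_lr() if it is defined
>>> # scheduler.step(epoch)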
In the source file torch/optim/lr_scheduler.py, step() is defined in the _LRScheduler class, which is the base class of all learning rate schedulers. It implements the common machinery such as step() and the frequently used get_lr(); get_lr() itself is only a stub (it raises NotImplementedError) and must be overridden in every derived class.
For details, see: lr_scheduler
Let's take a look at the step() function.
class _LRScheduler(object):

    def __init__(self, optimizer, last_epoch=-1):

        # Attach optimizer
        if not isinstance(optimizer, Optimizer):
            raise TypeError('{} is not an Optimizer'.format(
                type(optimizer).__name__))
        self.optimizer = optimizer

        # Initialize epoch and base learning rates
        if last_epoch == -1:
            for group in optimizer.param_groups:
                group.setdefault('initial_lr', group['lr'])
        else:
            for i, group in enumerate(optimizer.param_groups):
                if 'initial_lr' not in group:
                    raise KeyError("param 'initial_lr' is not specified "
                                   "in param_groups[{}] when resuming an optimizer".format(i))
        self.base_lrs = list(map(lambda group: group['initial_lr'], optimizer.param_groups))
        self.last_epoch = last_epoch

        # Following https://github.com/pytorch/pytorch/issues/20124
        # We would like to ensure that `lr_scheduler.step()` is called after
        # `optimizer.step()`
        def with_counter(method):
            if getattr(method, '_with_counter', False):
                # `optimizer.step()` has already been replaced, return.
                return method

            # Keep a weak reference to the optimizer instance to prevent
            # cyclic references.
            instance_ref = weakref.ref(method.__self__)
            # Get the unbound method for the same purpose.
            func = method.__func__
            cls = instance_ref().__class__
            del method

            @wraps(func)
            def wrapper(*args, **kwargs):
                instance = instance_ref()
                instance._step_count += 1
                wrapped = func.__get__(instance, cls)
                return wrapped(*args, **kwargs)

            # Note that the returned function here is no longer a bound method,
            # so attributes like `__func__` and `__self__` no longer exist.
            wrapper._with_counter = True
            return wrapper

        self.optimizer.step = with_counter(self.optimizer.step)
        self.optimizer._step_count = 0
        self._step_count = 0

        self.step()

    def state_dict(self):
        """Returns the state of the scheduler as a :class:`dict`.

        It contains an entry for every variable in self.__dict__ which
        is not the optimizer.
        """
        return {key: value for key, value in self.__dict__.items() if key != 'optimizer'}

    def load_state_dict(self, state_dict):
        """Loads the schedulers state.

        Arguments:
            state_dict (dict): scheduler state. Should be an object returned
                from a call to :meth:`state_dict`.
        """
        self.__dict__.update(state_dict)

    def get_last_lr(self):
        """ Return last computed learning rate by current scheduler.
        """
        return self._last_lr

    def get_lr(self):
        # Compute learning rate using chainable form of the scheduler
        raise NotImplementedError

    def step(self, epoch=None):
        # Raise a warning if old pattern is detected
        # https://github.com/pytorch/pytorch/issues/20124
        if self._step_count == 1:
            if not hasattr(self.optimizer.step, "_with_counter"):
                warnings.warn("Seems like `optimizer.step()` has been overridden after learning rate scheduler "
                              "initialization. Please, make sure to call `optimizer.step()` before "
                              "`lr_scheduler.step()`. See more details at "
                              "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

            # Just check if there were two first lr_scheduler.step() calls before optimizer.step()
            elif self.optimizer._step_count < 1:
                warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
                              "In PyTorch 1.1.0 and later, you should call them in the opposite order: "
                              "`optimizer.step()` before `lr_scheduler.step()`. Failure to do this "
                              "will result in PyTorch skipping the first value of the learning rate schedule. "
                              "See more details at "
                              "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
        self._step_count += 1

        class _enable_get_lr_call:

            def __init__(self, o):
                self.o = o

            def __enter__(self):
                self.o._get_lr_called_within_step = True
                return self

            def __exit__(self, type, value, traceback):
                self.o._get_lr_called_within_step = False

        with _enable_get_lr_call(self):
            if epoch is None:
                self.last_epoch += 1
                values = self.get_lr()
            else:
                warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
                self.last_epoch = epoch
                if hasattr(self, "_get_closed_form_lr"):
                    values = self._get_closed_form_lr()
                else:
                    values = self.get_lr()

        for param_group, lr in zip(self.optimizer.param_groups, values):
            param_group['lr'] = lr

        self._last_lr = [group['lr'] for group in self.optimizer.param_groups]
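Because get_lr() is the method every scheduler overrides, writing a custom scheduler mostly amounts to subclassing _LRScheduler and computing the rates from self.base_lrs and self.last_epoch. A minimal sketch, assuming a PyTorch version around 1.6 where the base class is still named _LRScheduler; the decay rule (halve the rate every 10 epochs) and the class name are made up for illustration:

import torch
from torch.optim.lr_scheduler import _LRScheduler

class HalveEvery10(_LRScheduler):
    """Toy scheduler: lr = base_lr * 0.5 ** (last_epoch // 10)."""

    def get_lr(self):
        # base_lrs and last_epoch are maintained by the _LRScheduler base class
        return [base_lr * 0.5 ** (self.last_epoch // 10) for base_lr in self.base_lrs]

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = HalveEvery10(optimizer)
for epoch in range(30):
    optimizer.step()                      # train(...) would go here
    scheduler.step()
    print(epoch, scheduler.get_last_lr())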