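Project: https://github.com/Fafa-DL/Awesome-Backbones
Description: an out-of-the-box image classification project covering mainstream models, for learning, comparing and modifying backbone networks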
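In the config files under `models`, the learning-rate update strategy and its parameters are set in `lr_config`. As explained in the previous post on how the network is built, `type` is the keyword that selects which class gets called. The currently supported learning-rate update strategies are in `core/optimizers/lr_update.py`.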
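The following sketch shows how a `type` keyword in a config dict such as `lr_config` can select a class. The `LR_UPDATERS` registry and `build_lr_updater` are hypothetical. A plain dict lookup stands in for whatever lookup the project performs (for the optimizer it uses `eval`). The two stub classes only store their arguments.

```python
# Hypothetical stand-ins for the real classes in core/optimizers/lr_update.py;
# they only store their arguments.
class StepLrUpdater:
    def __init__(self, step, gamma=0.1, min_lr=None, **kwargs):
        self.step = step
        self.gamma = gamma
        self.min_lr = min_lr
        self.extra = kwargs  # e.g. warmup settings handled by the base class


class CosineAnnealingLrUpdater:
    def __init__(self, min_lr=None, min_lr_ratio=None, **kwargs):
        self.min_lr = min_lr
        self.min_lr_ratio = min_lr_ratio
        self.extra = kwargs


# Hypothetical registry: maps the `type` string to a class.
LR_UPDATERS = {cls.__name__: cls for cls in (StepLrUpdater, CosineAnnealingLrUpdater)}


def build_lr_updater(lr_config):
    """Select the class named by `type` and pass the remaining keys as kwargs."""
    cfg = dict(lr_config)  # shallow copy so the caller's config keeps its 'type' key
    cls = LR_UPDATERS[cfg.pop('type')]
    return cls(**cfg)


# Illustrative config; the values are examples, not the project's defaults.
lr_config = dict(type='StepLrUpdater', step=[30, 60, 90], gamma=0.1)
updater = build_lr_updater(lr_config)
print(type(updater).__name__, updater.step)  # StepLrUpdater [30, 60, 90]
```

The config is copied before popping `type`, so building twice from the same dict still works. Popping directly from the original would break a second build.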
The most commonly used are `StepLrUpdater` and `CosineAnnealingLrUpdater`. Both inherit from the `LrUpdater` class, so let's look at that first:
```python
class LrUpdater(object):
    """LR Scheduler in MMCV.

    Args:
        by_epoch (bool): LR changes epoch by epoch
        warmup (string): Type of warmup used. It can be None(use no warmup),
            'constant', 'linear' or 'exp'
        warmup_iters (int): The number of iterations or epochs that warmup
            lasts
        warmup_ratio (float): LR used at the beginning of warmup equals to
            warmup_ratio * initial_lr
        warmup_by_epoch (bool): When warmup_by_epoch == True, warmup_iters
            means the number of epochs that warmup lasts, otherwise means the
            number of iteration that warmup lasts
    """

    def __init__(self,
                 by_epoch=True,
                 warmup=None,
                 warmup_iters=0,
                 warmup_ratio=0.1,
                 warmup_by_epoch=False):
        # validate the "warmup" argument
        if warmup is not None:
            if warmup not in ['constant', 'linear', 'exp']:
                raise ValueError(
                    f'"{warmup}" is not a supported type for warming up, valid'
                    ' types are "constant", "linear" and "exp"')
        if warmup is not None:
            assert warmup_iters > 0, \
                '"warmup_iters" must be a positive integer'
            assert 0 < warmup_ratio <= 1.0, \
                '"warmup_ratio" must be in range (0,1]'

        self.by_epoch = by_epoch
        self.warmup = warmup
        self.warmup_iters = warmup_iters
        self.warmup_ratio = warmup_ratio
        self.warmup_by_epoch = warmup_by_epoch

        if self.warmup_by_epoch:
            self.warmup_epochs = self.warmup_iters
            self.warmup_iters = None
        else:
            self.warmup_epochs = None

        self.base_lr = []  # initial lr for all param groups
        self.regular_lr = []  # expected lr if no warming up is performed

    # Write the new lr into every param group of the optimizer
    def _set_lr(self, runner, lr_groups):
        for param_group, lr in zip(runner.get("optimizer").param_groups,
                                   lr_groups):
            param_group['lr'] = lr

    def get_lr(self, runner, base_lr):
        raise NotImplementedError

    # Compute the regular (non-warmup) lr for each param group
    def get_regular_lr(self, runner):
        return [self.get_lr(runner, _base_lr) for _base_lr in self.base_lr]

    # Compute the lr under the configured warmup strategy
    def get_warmup_lr(self, cur_iters):

        def _get_warmup_lr(cur_iters, regular_lr):
            if self.warmup == 'constant':
                warmup_lr = [_lr * self.warmup_ratio for _lr in regular_lr]
            elif self.warmup == 'linear':
                k = (1 - cur_iters / self.warmup_iters) * (1 -
                                                           self.warmup_ratio)
                warmup_lr = [_lr * (1 - k) for _lr in regular_lr]
            elif self.warmup == 'exp':
                k = self.warmup_ratio**(1 - cur_iters / self.warmup_iters)
                warmup_lr = [_lr * k for _lr in regular_lr]
            return warmup_lr

        if isinstance(self.regular_lr, dict):
            lr_groups = {}
            for key, regular_lr in self.regular_lr.items():
                lr_groups[key] = _get_warmup_lr(cur_iters, regular_lr)
            return lr_groups
        else:
            return _get_warmup_lr(cur_iters, self.regular_lr)

    # Record the initial lr of each param group
    def before_run(self, runner):
        # NOTE: when resuming from a checkpoint, if 'initial_lr' is not saved,
        # it will be set according to the optimizer params
        for group in runner.get("optimizer").param_groups:
            group.setdefault('initial_lr', group['lr'])
        self.base_lr = [
            group['initial_lr'] for group in runner.get("optimizer").param_groups
        ]

    # Before each epoch: compute the new lr and write it into the optimizer
    def before_train_epoch(self, runner):
        if self.warmup_iters is None:  # warmup_by_epoch is True, so warmup_epochs holds the original warmup_iters
            epoch_len = len(runner.get("train_loader"))
            self.warmup_iters = self.warmup_epochs * epoch_len  # warmup iters = warmup epochs * iters per epoch
        # When the lr is updated per iteration there is nothing to do here;
        # the update happens in before_train_iter
        if not self.by_epoch:
            return
        self.regular_lr = self.get_regular_lr(runner)
        self._set_lr(runner, self.regular_lr)

    # First check whether the lr is updated per epoch. When updating per
    # iteration (by_epoch=False), use the regular lr once cur_iter >= warmup_iters,
    # otherwise use the warmup lr
    def before_train_iter(self, runner):
        cur_iter = runner.get("iter")
        if not self.by_epoch:
            self.regular_lr = self.get_regular_lr(runner)
            if self.warmup is None or cur_iter >= self.warmup_iters:
                self._set_lr(runner, self.regular_lr)
            else:
                warmup_lr = self.get_warmup_lr(cur_iter)
                self._set_lr(runner, warmup_lr)
        elif self.by_epoch:
            # Updating per epoch: past the warmup threshold, return directly and
            # keep the lr set in before_train_epoch
            if self.warmup is None or cur_iter > self.warmup_iters:
                return
            elif cur_iter == self.warmup_iters:  # exactly at the threshold: regular lr
                self._set_lr(runner, self.regular_lr)
            else:
                warmup_lr = self.get_warmup_lr(cur_iter)  # below the threshold: warmup lr
                self._set_lr(runner, warmup_lr)
```
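The three branches of `get_warmup_lr` above can be tried outside the class. Below is a minimal standalone re-implementation for a single learning rate. The function name `warmup_lr` is hypothetical, and `iters_per_epoch` stands in for `len(train_loader)`. It also reproduces the epoch-to-iteration conversion that `before_train_epoch` performs when `warmup_by_epoch=True`.

```python
def warmup_lr(regular_lr, cur_iter, warmup, warmup_iters, warmup_ratio):
    """Scale one regular lr during warmup, following LrUpdater.get_warmup_lr."""
    if warmup == 'constant':
        return regular_lr * warmup_ratio
    if warmup == 'linear':
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    if warmup == 'exp':
        k = warmup_ratio ** (1 - cur_iter / warmup_iters)
        return regular_lr * k
    raise ValueError(f'unsupported warmup type: {warmup!r}')


# With warmup_by_epoch=True, warmup_iters is first converted from epochs to iterations.
warmup_epochs, iters_per_epoch = 2, 5  # iters_per_epoch plays the role of len(train_loader)
warmup_iters = warmup_epochs * iters_per_epoch

for mode in ('constant', 'linear', 'exp'):
    lrs = [warmup_lr(0.1, it, mode, warmup_iters, 0.1) for it in range(warmup_iters)]
    print(mode, [round(lr, 4) for lr in lrs])
```

All three modes start at `warmup_ratio * lr`, and `constant` stays there. `linear` and `exp` reach the regular lr at `cur_iter == warmup_iters`, which is exactly where `LrUpdater` switches to the regular lr.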
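`LrUpdater` works through its methods in this order:

- `__init__`: initialization, mostly of the warmup settings. If `warmup_by_epoch=True`, `warmup_iters` is the number of epochs the warmup lasts. If `warmup_by_epoch=False`, it is the number of iterations. One epoch is `len(train_loader)` iterations, roughly total images / batch size.
- `before_run`: records the initial learning rate of each param group. This value comes from `lr` in `optimizer_cfg` in the config file.
- `before_train_epoch`: runs before each epoch.
  - If `warmup_by_epoch=True`, it first converts the warmup length from epochs to iterations.
  - If `by_epoch=False`, it then returns, because the lr is updated per iteration in `before_train_iter`.
  - Otherwise it computes the new lr and sets it in the optimizer.
- `get_regular_lr`: calls `get_lr` of the class named by `type` in the config file (e.g. `StepLrUpdater` or `CosineAnnealingLrUpdater`) to get the learning rate for the current epoch/iter.
- `_set_lr`: writes the latest learning rate into the optimizer.
- `before_train_iter`: runs before each iteration and updates the lr in the optimizer. It first checks whether the lr is updated per epoch:
  - `by_epoch = False`, i.e. updated per iteration: if the current iteration is greater than or equal to `warmup_iters`, use the regular lr; otherwise use the warmup lr.
  - `by_epoch = True`, i.e. updated per epoch: if the current iteration is greater than `warmup_iters`, return immediately and keep the lr set in `before_train_epoch`. If it is equal, use the regular lr; if smaller, use the warmup lr.
- `get_warmup_lr`: computes the learning rate under the configured warmup strategy.

With the core `LrUpdater` class covered, let's move on to `StepLrUpdater` and `CosineAnnealingLrUpdater`. Their main job is to compute and return the latest learning rate for the current iter or epoch. First, `StepLrUpdater`: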
```python
class StepLrUpdater(LrUpdater):
    """Step LR scheduler with min_lr clipping.

    Args:
        step (int | list[int]): Step to decay the LR. If an int value is given,
            regard it as the decay interval. If a list is given, decay LR at
            these steps.
        gamma (float, optional): Decay LR ratio. Default: 0.1.
        min_lr (float, optional): Minimum LR value to keep. If LR after decay
            is lower than `min_lr`, it will be clipped to this value. If None
            is given, we don't perform lr clipping. Default: None.
    """

    def __init__(self, step, gamma=0.1, min_lr=None, **kwargs):
        self.step = step
        self.gamma = gamma
        self.min_lr = min_lr
        super(StepLrUpdater, self).__init__(**kwargs)

    def get_lr(self, runner, base_lr):
        progress = runner.get('epoch') if self.by_epoch else runner.get('iter')

        # calculate exponential term
        if isinstance(self.step, int):
            exp = progress // self.step
        else:
            exp = len(self.step)
            for i, s in enumerate(self.step):
                if progress < s:
                    exp = i
                    break

        lr = base_lr * (self.gamma**exp)
        if self.min_lr is not None:
            # clip to a minimum value
            lr = max(lr, self.min_lr)
        return lr
```
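- `step`: when to decay the LR. It takes one of two forms, an `int` or a `list`. For example:
  - `step=2`: the LR is multiplied by `gamma` every 2 iters/epochs.
  - `step=[20,60,80]`: the LR is decayed at iter/epoch 20, 60 and 80. It therefore stays constant over [0, 20), [20, 60), [60, 80) and from 80 onwards.
- `gamma`: the decay ratio. It appears in `lr = base_lr * (self.gamma**exp)`, where `exp` is the number of decays applied so far.
- `min_lr`: the minimum learning rate. If set, any learning rate below it is replaced by `min_lr`.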
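The rules above can be checked with a standalone copy of the `get_lr` arithmetic. The function `step_lr` is a hypothetical helper, not part of the project. It takes `progress` directly instead of reading it from a runner.

```python
def step_lr(base_lr, progress, step, gamma=0.1, min_lr=None):
    """Standalone copy of the StepLrUpdater.get_lr arithmetic."""
    if isinstance(step, int):
        exp = progress // step  # one decay every `step` epochs/iters
    else:
        exp = len(step)  # default: progress is past every milestone
        for i, s in enumerate(step):
            if progress < s:
                exp = i  # number of milestones already passed
                break
    lr = base_lr * gamma ** exp
    return lr if min_lr is None else max(lr, min_lr)


for epoch in (0, 19, 20, 59, 60, 79, 80, 100):
    print(epoch, step_lr(0.1, epoch, [20, 60, 80]))
```

With `[20, 60, 80]` and `base_lr=0.1`, the learning rate drops to roughly 0.01 at epoch 20, 0.001 at 60 and 1e-4 at 80.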
Next, `CosineAnnealingLrUpdater`:
```python
class CosineAnnealingLrUpdater(LrUpdater):

    def __init__(self, min_lr=None, min_lr_ratio=None, **kwargs):
        assert (min_lr is None) ^ (min_lr_ratio is None)
        self.min_lr = min_lr
        self.min_lr_ratio = min_lr_ratio
        super(CosineAnnealingLrUpdater, self).__init__(**kwargs)

    def get_lr(self, runner, base_lr):
        if self.by_epoch:
            progress = runner.get('epoch')
            max_progress = runner.get('max_epochs')
        else:
            progress = runner.get('iter')
            max_progress = runner.get('max_iters')

        if self.min_lr_ratio is not None:
            target_lr = base_lr * self.min_lr_ratio
        else:
            target_lr = self.min_lr
        return annealing_cos(base_lr, target_lr, progress / max_progress)
```
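- `min_lr`: the final (minimum) learning rate.
- `min_lr_ratio`: the final learning rate expressed as a ratio of the initial one. Exactly one of `min_lr` and `min_lr_ratio` must be set; the `assert` uses XOR. Both parameters only determine the final learning rate: with `min_lr` it is `min_lr`, and with `min_lr_ratio` it is `base_lr * min_lr_ratio`.
- `annealing_cos`: computes the annealed learning rate, as follows: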
```python
from math import cos, pi


def annealing_cos(start, end, factor, weight=1):
    """Calculate annealing cos learning rate.

    Cosine anneal from `weight * start + (1 - weight) * end` to `end` as
    percentage goes from 0.0 to 1.0.

    Args:
        start (float): The starting learning rate of the cosine annealing.
        end (float): The ending learning rate of the cosine annealing.
        factor (float): The coefficient of `pi` when calculating the current
            percentage. Range from 0.0 to 1.0.
        weight (float, optional): The combination factor of `start` and `end`
            when calculating the actual starting learning rate. Default to 1.
    """
    cos_out = cos(pi * factor) + 1
    return end + 0.5 * weight * (start - end) * cos_out
```
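Combining `annealing_cos` with the target-lr rule of `CosineAnnealingLrUpdater.get_lr` gives the schedule below. It is a minimal sketch for a single param group. `cosine_lr` is a hypothetical helper, and `progress`/`max_progress` are passed in directly rather than read from a runner.

```python
from math import cos, pi


def annealing_cos(start, end, factor, weight=1):
    # Same formula as above: factor 0.0 gives start, factor 1.0 gives end.
    cos_out = cos(pi * factor) + 1
    return end + 0.5 * weight * (start - end) * cos_out


def cosine_lr(base_lr, progress, max_progress, min_lr=None, min_lr_ratio=None):
    """Follows CosineAnnealingLrUpdater.get_lr for one param group."""
    # Exactly one of min_lr / min_lr_ratio must be given, as in __init__.
    assert (min_lr is None) ^ (min_lr_ratio is None)
    target_lr = base_lr * min_lr_ratio if min_lr_ratio is not None else min_lr
    return annealing_cos(base_lr, target_lr, progress / max_progress)


max_epochs = 100
for epoch in (0, 25, 50, 75, 99):
    print(epoch, round(cosine_lr(0.1, epoch, max_epochs, min_lr_ratio=0.01), 6))
```

The learning rate equals `base_lr` at progress 0. It is halfway between `base_lr` and the target at `max_progress / 2` and reaches the target only when `progress == max_progress`.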
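That concludes the learning-rate update strategies. Finally, here is how the optimizer is invoked, which corresponds to `optimizer_cfg` in the config file. As explained in the previous post on how the network is built, `type` is the keyword that selects what gets called. Here `type` maps one-to-one onto the optimizers in `torch.optim`, so no optimizer class needs to be rewritten.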
`tools/train.py` contains `import torch.optim as optim`, and the optimizer is built with:

```python
optimizer = eval('optim.' + optimizer_cfg.pop('type'))(params=model.parameters(), **optimizer_cfg)
```
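This again relies on `eval`, so any optimizer in `torch.optim` can be used. To switch optimizers, look up its arguments in the PyTorch documentation and modify `optimizer_cfg` accordingly.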
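The sketch below shows the same config-to-optimizer step with `getattr` in place of `eval`, plus what `before_run` and `_set_lr` do to the optimizer's `param_groups`. The toy `nn.Linear` model and the `optimizer_cfg` values are illustrative assumptions, not the project's settings.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # hypothetical toy model standing in for the backbone
# Illustrative config in the shape of optimizer_cfg; the values are examples only.
optimizer_cfg = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=1e-4)

cfg = dict(optimizer_cfg)  # copy, so optimizer_cfg keeps its 'type' key
optimizer_cls = getattr(optim, cfg.pop('type'))  # same lookup as eval('optim.SGD'), without eval
optimizer = optimizer_cls(params=model.parameters(), **cfg)

# What LrUpdater.before_run does: remember each group's starting lr.
for group in optimizer.param_groups:
    group.setdefault('initial_lr', group['lr'])

# What LrUpdater._set_lr does: overwrite 'lr' in every param group.
new_lrs = [0.01 for _ in optimizer.param_groups]
for group, lr in zip(optimizer.param_groups, new_lrs):
    group['lr'] = lr

print(type(optimizer).__name__, optimizer.param_groups[0]['lr'])
```

With `getattr`, an unknown optimizer name raises `AttributeError` and no arbitrary string is evaluated. Otherwise the lookup behaves like the `eval` line above.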
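That's all for this post on learning-rate update strategies and optimizers. The learning-rate curves will be plotted in the upcoming visualization tutorial.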