Background: during training, the learning rate needs to be decayed. We set an initial learning rate and then decay it over the course of training to get better results.
Contents
1. The standard approach
2. Using the learning rate during training
2.1 How our program is nested
2.2 adjust_learning_rate
2.3 Our approach
1. The standard approach
This is straightforward: use lr_scheduler.StepLR(optimizer_ft, step_size=5, gamma=0.2) to decay the learning rate.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

# define cost function
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.005, momentum=0.9)
# Decay LR by a factor of 0.2 every 5 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=5, gamma=0.2)
# multi-GPU
model_ft = torch.nn.DataParallel(model_ft, device_ids=[0])
# train model
print("start train_model......")
model_ft = train_model(model=model_ft,
                       criterion=criterion,
                       optimizer=optimizer_ft,
                       scheduler=exp_lr_scheduler,
                       num_epochs=15,
                       use_gpu=use_gpu)
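train_model itself is not shown here. The StepLR schedule only takes effect if scheduler.step() is called once per epoch inside it; below is a minimal sketch of the assumed epoch loop (the loop body and the dataloader are assumptions, not the actual train_model):

# Minimal sketch of the assumed epoch loop inside train_model (not the actual code).
def train_model(model, criterion, optimizer, scheduler, num_epochs, use_gpu):
    for epoch in range(num_epochs):
        for inputs, labels in dataloader:  # dataloader assumed to be defined elsewhere
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # StepLR: multiply lr by 0.2 once every 5 epochs
    return model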
2. Using the learning rate during training
2.1 How our program is nested
In general_train.py:
engine = GCNMultiLabelMAPEngine(state)
engine.learning(model, criterion, train_dataset, val_dataset, optimizer)
In engine.py, inside the learning method of the engine class:
for epoch in range(self.state['start_epoch'], self.state['max_epochs']):
    self.state['epoch'] = epoch
    # lr = self.adjust_learning_rate(optimizer)  # fixme
    # print('lr:{:.5f}'.format(lr))  # fixme
    self.adjust_learning_rate(optimizer)  # fixme: does not return lr for printing
    # train for one epoch
    self.train(train_loader, model, criterion, optimizer, epoch)
    # evaluate on validation set
    prec1 = self.validate(val_loader, model, criterion)
The learning rate is defined inside the optimizer: adjust_learning_rate modifies the optimizer's parameter groups directly to decay the lr, and training then simply keeps using that same optimizer.
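For context, this assumes the optimizer was built with two parameter groups: group 0 for the pretrained backbone (rate state['lrp']) and group 1 for the head (rate state['lr']). A minimal sketch of such a construction; the toy modules and values here are placeholders, not the real model:

import torch.nn as nn
import torch.optim as optim

# Toy stand-ins for the real backbone and head, just to make the sketch runnable.
backbone = nn.Linear(8, 8)
head = nn.Linear(8, 2)
state = {'lrp': 0.01, 'lr': 0.1}

# Group 0 -> backbone parameters, trained with the smaller rate state['lrp'].
# Group 1 -> head parameters, trained with state['lr'].
optimizer = optim.SGD(
    [
        {'params': backbone.parameters(), 'lr': state['lrp']},
        {'params': head.parameters(), 'lr': state['lr']},
    ],
    momentum=0.9,
)

# adjust_learning_rate then rewrites optimizer.param_groups[i]['lr'] in place each epoch.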
2.2 adjust_learning_rate
The original adjust_learning_rate function:
def adjust_learning_rate(self, optimizer):
    """Sets the learning rate to the initial LR decayed by 10 every 30 epochs"""
    # lr = args.lr * (0.1 ** (epoch // 30))
    # decay = 0.1 ** (sum(self.state['epoch'] >= np.array(self.state['epoch_step'])))
    # decay = 0.1 ** (self.state['epoch'] // self.state['epoch_step'])
    # lr = self.state['lr'] * decay  # fixme
    for i, param_group in enumerate(optimizer.param_groups):
        if i == 0:
            # print(param_group)
            decay = 0.1 ** (self.state['epoch'] // self.state['epoch_step'])
            # print(self.state['lrp'])
            param_group['lr'] = decay * self.state['lrp']  # fixme
            # param_group['lr'] = lr
            print('backbone learning rate', param_group['lr'])
        if i == 1:
            decay = 0.1 ** (self.state['epoch'] // self.state['epoch_step'])
            # print(self.state['lr'])
            param_group['lr'] = decay * self.state['lr']  # fixme
            # param_group['lr'] = lr
            print('head learning rate', param_group['lr'])
    # return lr
That is, the learning rate is computed from the current epoch; // denotes floor division (divide, then round down).
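A quick worked example of this step schedule, with an illustrative epoch_step of 30 and a base lr of 0.1 (these numbers are not from the code above):

# Worked example of the step schedule, assuming epoch_step = 30 and lr = 0.1.
epoch_step = 30
lr = 0.1
for epoch in [0, 29, 30, 59, 60]:
    decay = 0.1 ** (epoch // epoch_step)
    print(epoch, decay * lr)
# epochs 0-29  -> 0.1
# epochs 30-59 -> 0.01  (0.1 ** 1)
# epochs 60-89 -> 0.001 (0.1 ** 2)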
2.3 Our approach
Because training is resumed from a checkpoint saved at epoch 43, the learning rate becomes 0.001 * 0.9^(epoch - 43):
for i, param_group in enumerate(optimizer.param_groups):
    if i == 0:
        # print(param_group)
        up = self.state['epoch'] - 43
        decay = 0.9 ** up
        # print(self.state['lrp'])
        param_group['lr'] = decay * 0.001  # fixme
        # param_group['lr'] = lr
        print('backbone learning rate', param_group['lr'])
    if i == 1:
        up = self.state['epoch'] - 43
        decay = 0.9 ** up
        # print(self.state['lr'])
        param_group['lr'] = decay * 0.001  # fixme
        # param_group['lr'] = lr
        print('head learning rate', param_group['lr'])
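For reference, the same exponential schedule could also be written with PyTorch's built-in lr_scheduler.ExponentialLR instead of editing param_groups by hand. A minimal sketch under the same assumptions (base lr 0.001, decay 0.9 per epoch, resumed at epoch 43); this is not the code used above:

import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

# Sketch only: a toy model stands in for the real backbone + head.
model = nn.Linear(8, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Multiply the lr by 0.9 after every epoch. Created at resume time, this gives
# lr = 0.001 * 0.9 ** (epoch - 43), the same schedule as the manual version above.
scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(43, 58):  # resume at epoch 43, as in the text
    # ... run one training epoch with `optimizer` here ...
    optimizer.step()   # placeholder for the real per-batch updates
    scheduler.step()   # decay the lr for the next epoch
    print(epoch, optimizer.param_groups[0]['lr'])  # lr that the next epoch will use

Creating the scheduler right after loading the checkpoint makes the decay count start from the resume epoch, which is exactly the 0.001 * 0.9^(epoch - 43) behaviour of the hand-written loop.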