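As DeepLab-v2 does, I want to adjust the learning rate every certain number of iterations. Whether the policy is step or poly, the new rate has to be computed from the current iteration count. After going through various docs and issues, I found three ways to do this in mxnet at the moment.
References: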
https://github.com/dmlc/mxnet/issues/540
https://github.com/dmlc/mxnet/issues/625
- 1 Use the schedulers provided in mxnet/python/mxnet/lr_scheduler:
See fit.py under example/image_classification, where an lr_scheduler argument is passed to mx.model.FeedForward. An example of how to build the lr_scheduler:
    import logging
    import mxnet as mx

    def _get_lr_scheduler(args, kv):
        # no decay requested: keep the base learning rate, no scheduler
        if 'lr_factor' not in args or args.lr_factor >= 1:
            return (args.lr, None)
        # number of updates (batches) per epoch, per worker
        epoch_size = args.num_examples / args.batch_size
        if 'dist' in args.kv_store:
            epoch_size /= kv.num_workers
        begin_epoch = args.load_epoch if args.load_epoch else 0
        step_epochs = [int(l) for l in args.lr_step_epochs.split(',')]
        # when resuming from a checkpoint, apply the decays that already happened
        lr = args.lr
        for s in step_epochs:
            if begin_epoch >= s:
                lr *= args.lr_factor
        if lr != args.lr:
            logging.info('Adjust learning rate to %e for epoch %d' % (lr, begin_epoch))
        # remaining decay points, converted from epochs to update counts
        steps = [epoch_size * (x-begin_epoch) for x in step_epochs if x-begin_epoch > 0]
        return (lr, mx.lr_scheduler.MultiFactorScheduler(step=steps, factor=args.lr_factor))
How to call it:
    lr, lr_scheduler = _get_lr_scheduler(args, kv)
    # create model
    model = mx.model.FeedForward(
        ......
        learning_rate = lr, lr_scheduler = lr_scheduler,
        ...... )
Or, as in the lenet example:
    model = mx.model.FeedForward(ctx=dev, symbol=lenet, num_epoch=20,
                                 learning_rate=0.05, momentum=0.9, wd=0.00001,
                                 lr_scheduler=mx.misc.FactorScheduler(step=5))
However, I am not using FeedForward but bind and fit, and FeedForward is built for a single-data, single-output structure, so this approach does not work for me.
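The step arithmetic in _get_lr_scheduler is easy to misread, so here is a minimal sketch of just that part in plain Python, without mxnet. The function name resume_schedule and the numbers are made up for illustration, and it uses integer division for the epoch size, which is my assumption rather than what fit.py does.

```python
def resume_schedule(num_examples, batch_size, step_epochs, begin_epoch,
                    base_lr, lr_factor, num_workers=1):
    """Return (lr, steps) for a run that starts (or resumes) at begin_epoch.

    steps are counted in updates (batches) from the resume point.
    Hypothetical helper, not part of mxnet.
    """
    # updates per epoch on each worker (integer division is an assumption)
    epoch_size = num_examples // batch_size // num_workers
    # apply every decay whose epoch has already been reached
    passed = sum(1 for s in step_epochs if begin_epoch >= s)
    lr = base_lr * lr_factor ** passed
    # remaining decay points, converted from epochs to update counts
    steps = [epoch_size * (s - begin_epoch) for s in step_epochs if s > begin_epoch]
    return lr, steps


# illustrative numbers only: 50000 examples, batch size 100, resume at epoch 40
lr, steps = resume_schedule(num_examples=50000, batch_size=100,
                            step_epochs=[30, 60, 90], begin_epoch=40,
                            base_lr=0.1, lr_factor=0.1)
print(lr, steps)
```

With these numbers one decay (epoch 30) has already been applied, and the two remaining decay points are expressed as update counts relative to epoch 40, which is the form the scheduler's step argument expects.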
- 2 Write your own epoch_end_callback and pass it to fit:
    # my code
    def lr_callback(epoch, net, net_args, net_auxs):
        # poly policy, evaluated once at the end of each epoch
        opt.lr = args.base_lr * (1 - float(epoch)/num_epoch)**0.9

    mod.fit(train, val,
            batch_end_callback = mx.callback.Speedometer(batch_size, 30),
            epoch_end_callback = [checkpoint, lr_callback],
            optimizer = opt, num_epoch = num_epoch)
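To see what this callback produces, the formula can be evaluated on its own. This is only a sketch: poly_lr is a hypothetical helper, base_lr and num_epoch are made-up values, and it assumes the callback receives a 0-based epoch index.

```python
def poly_lr(base_lr, epoch, num_epoch, power=0.9):
    """Learning rate the callback would set after `epoch` finishes."""
    return base_lr * (1 - float(epoch) / num_epoch) ** power


# hypothetical settings, not taken from my training script
base_lr, num_epoch = 0.001, 10
for epoch in range(num_epoch):
    print('after epoch %2d -> lr %.6f' % (epoch, poly_lr(base_lr, epoch, num_epoch)))
```

Under that assumption the rate after epoch 0 is still base_lr, and it stays constant within each epoch, so this is a coarser poly policy than the per-iteration one in DeepLab-v2.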
- 3 Pass an lr_scheduler to the optimizer. My implementation:
    opt = mx.optimizer.create('sgd', learning_rate=args.base_lr,
                              momentum=0.9, wd=0.0005,
                              rescale_grad=float(1)/len(contexts),
                              lr_scheduler=mx.lr_scheduler.MultiFactorScheduler([10,20,30], 0.9))
    # the first argument of the scheduler is the list of change points (counted in updates) for the learning rate
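The change points are update counts, not epochs, which is easy to get wrong. The sketch below mimics the multi-step decay in plain Python so the mapping is visible. multi_factor_lr is a hypothetical stand-in, not the mxnet class, and the strict "greater than" comparison is my reading of how the scheduler behaves.

```python
def multi_factor_lr(base_lr, steps, factor, num_update):
    """Stateless sketch of multi-step decay.

    The rate is multiplied by `factor` once for every change point
    that num_update has gone past (assumed to be a strict '>').
    """
    passed = sum(1 for s in steps if num_update > s)
    return base_lr * factor ** passed


# same change points and factor as above, with a made-up base rate
for n in (1, 10, 11, 21, 31):
    print(n, multi_factor_lr(0.01, [10, 20, 30], 0.9, n))
```

So with [10,20,30] all three decays are over after 30 batches. To decay at epoch boundaries instead, each change point would need to be multiplied by the number of batches per epoch.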
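If you build step with np.linspace(), remember to convert the result with list(), because the function does not accept an ndarray. If you need more flexibility, you can write your own scheduler in lr_scheduler.py.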
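As a rough idea of what such a custom scheduler could look like, here is a per-update poly policy. It is a sketch under assumptions: PolyScheduler and max_update are names I made up, the class is a plain Python object so it runs without mxnet (in mxnet one would normally subclass mx.lr_scheduler.LRScheduler), and it assumes the optimizer calls the scheduler with the current update count.

```python
class PolyScheduler(object):
    """Hypothetical poly-policy scheduler: lr = base_lr * (1 - t/T) ** power."""

    def __init__(self, max_update, base_lr=0.01, power=0.9):
        if max_update < 1:
            raise ValueError('max_update must be at least 1')
        # assumption: the optimizer may overwrite base_lr with its own learning_rate
        self.base_lr = base_lr
        self.max_update = max_update
        self.power = power

    def __call__(self, num_update):
        # clamp progress at 1 so the rate stays at 0 after max_update
        progress = min(float(num_update) / self.max_update, 1.0)
        return self.base_lr * (1.0 - progress) ** self.power


# made-up numbers: 20000 updates in total, base rate 0.001
sched = PolyScheduler(max_update=20000, base_lr=0.001)
for n in (0, 5000, 10000, 20000):
    print(n, sched(n))
```

Because the rate is recomputed from num_update on every call, the object keeps no running state, which should make resuming from a checkpoint simpler than with the step-based schedulers.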