In transfer learning (finetuning) we often want to freeze the pretrained layers, or give them a smaller learning rate than the layers added on top. Either way, we need to set a different learning rate for different layers, so here is a summary of how to do that in MXNet Gluon.
net.collect_params() returns a ParameterDict containing all of the network's parameters. Its signature is

def collect_params(self, select=None)

and typical calls look like:
model.collect_params('conv1_weight|conv1_bias|fc_weight|fc_bias')
model.collect_params('.*weight|.*bias')
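To make this concrete, here is a minimal runnable illustration on a toy two-layer net (the parameter names in the comments are simply what Gluon's default prefixing generates):

import mxnet as mx
from mxnet import gluon

toy_net = gluon.nn.Sequential()
with toy_net.name_scope():
    toy_net.add(gluon.nn.Dense(4))  # sequential0_dense0_weight / _bias
    toy_net.add(gluon.nn.Dense(2))  # sequential0_dense1_weight / _bias

print(toy_net.collect_params())              # all four parameters
print(toy_net.collect_params('.*dense1.*'))  # only the second layer's parameters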
The select argument, then, can be any regular expression: collect_params keeps only the parameters it matches. The first step is therefore to write regexes that pick out every parameter whose learning rate should be set individually. Take the ResNet50 below as an example; its full parameter list looks like this (some layers in the middle omitted):
print(net.collect_params())
resnet50v1 (
Parameter resnet50v1batchnorm0_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm0_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm0_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm0_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1conv0_weight (shape=(64, 0, 5, 5), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm0_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm0_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm0_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm0_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv0_weight (shape=(64, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm1_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm1_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm1_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm1_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv1_weight (shape=(64, 0, 3, 3), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm2_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm2_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm2_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm2_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv2_weight (shape=(256, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1conv1_weight (shape=(256, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm1_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm1_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm1_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1batchnorm1_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm3_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm3_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm3_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm3_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv3_weight (shape=(64, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm4_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm4_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm4_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm4_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv4_weight (shape=(64, 0, 3, 3), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm5_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm5_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm5_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm5_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_conv5_weight (shape=(256, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer1_batchnorm6_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
.....
.....
.....
.....
.....
Parameter resnet50v1layer4_batchnorm7_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_conv7_weight (shape=(512, 0, 3, 3), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_batchnorm8_gamma (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_batchnorm8_beta (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_batchnorm8_running_mean (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_batchnorm8_running_var (shape=(0,), dtype=<class 'numpy.float32'>)
Parameter resnet50v1layer4_conv8_weight (shape=(2048, 0, 1, 1), dtype=<class 'numpy.float32'>)
Parameter resnet50v1dense0_weight (shape=(10, 2048), dtype=float32)
Parameter resnet50v1dense0_bias (shape=(10,), dtype=float32)
)
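For reference, a listing like the one above can be produced from the Gluon model zoo. A sketch, where classes=10 matches the dense0 shape and prefix='resnet50v1' is my guess at how the names above were generated:

import mxnet as mx
from mxnet.gluon.model_zoo import vision

net = vision.resnet50_v1(classes=10, prefix='resnet50v1')

The many shape=(0, ...) entries are not an error: Gluon defers shape inference for the input-channel dimensions until the network is initialized and sees its first batch.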
Suppose we want to give the final fully connected layer its own learning rate. We can pick out its parameters with the regex net.collect_params('.*dense'), which gives:
print(net.collect_params('.*dense'))
resnet50v1 (
Parameter resnet50v1dense0_weight (shape=(10, 2048), dtype=float32)
Parameter resnet50v1dense0_bias (shape=(10,), dtype=float32)
)
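Conversely, for the finetuning scenario from the introduction we usually want the complement: every pretrained parameter except the new head. Since select is an ordinary Python regex (matched with re.match), a negative lookahead does the job; a sketch, assuming the parameter names above:

net.collect_params('^(?!.*dense).*$')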
Once the parameters are selected, all that remains is to set their lr_mult attribute: the effective learning rate for those parameters becomes lr * lr_mult, and with lr_mult=0 they are not updated at all.
trainer = mx.gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
net.collect_params('.*dense').setattr('lr_mult', 0.1)
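Putting it all together, here is a minimal end-to-end sketch with dummy data; the negative-lookahead regex and the model-zoo constructor are assumptions carried over from above:

import mxnet as mx
from mxnet import autograd, gluon
from mxnet.gluon.model_zoo import vision

net = vision.resnet50_v1(classes=10)
net.initialize(mx.init.Xavier())

# Backbone at 1/10 of the base learning rate, new head at the full rate.
net.collect_params('^(?!.*dense).*$').setattr('lr_mult', 0.1)

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

x = mx.nd.random.uniform(shape=(2, 3, 224, 224))
y = mx.nd.array([0, 1])
with autograd.record():
    loss = loss_fn(net(x), y)
loss.backward()
trainer.step(2)  # each parameter is updated with learning_rate * its own lr_mult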
To recap: we use a regular expression with net.collect_params() to match the parameters that need special treatment, then call setattr() to set their learning-rate multiplier lr_mult, which effectively gives those layers their own learning rate. It therefore pays, when designing a network, to give each block a distinctive prefix so that every layer's parameters can be matched easily by a regex. The same pattern also lets us initialize different layers separately.
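For instance, re-initializing just the head follows the same one-liner pattern (a sketch; force_reinit is needed because the parameters already hold values):

from mxnet import init

net.collect_params('.*dense').initialize(init.Xavier(), force_reinit=True)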