When fine-tuning BERT, we often want the layers closest to the output to use a large learning rate and the earlier layers a small one. A single expression is enough: the last parameter gets a learning rate of 1e-3, and each parameter before it has its rate multiplied by a further 0.8:
import torch.optim as optim

# Parameter i, counting back from the output, gets lr = 1e-3 * 0.8**i.
lr_layerwise = [{'params': p, 'lr': 1e-3 * 0.8 ** i}
                for i, (n, p) in enumerate(reversed(list(model.named_parameters())))]
optimizer = optim.Adam(lr_layerwise)
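One caveat, visible in the output further below: the expression decays per parameter tensor, not per layer, so a layer's weight and bias land in different groups with different rates. If a strict per-layer decay is preferred, here is a minimal sketch that derives the depth from the parameter name instead; it assumes HuggingFace-style names such as bert.encoder.layer.1.output.dense.weight and a model.config.num_hidden_layers attribute (per_layer_groups is a hypothetical helper, not part of any library):

import re
import torch.optim as optim

def per_layer_groups(model, base_lr=1e-3, decay=0.8):
    # Assumption: HuggingFace BERT naming. Depth 0 is the classifier
    # head, encoder layers follow top-down, embeddings sit deepest.
    num_layers = model.config.num_hidden_layers
    groups = []
    for name, param in model.named_parameters():
        m = re.search(r'encoder\.layer\.(\d+)\.', name)
        if m:
            depth = num_layers - int(m.group(1))  # top encoder layer -> 1
        elif 'embeddings' in name:
            depth = num_layers + 1                # below every encoder layer
        else:
            depth = 0                             # classifier head (and pooler)
        groups.append({'params': param, 'lr': base_lr * decay ** depth})
    return groups

optimizer = optim.Adam(per_layer_groups(model))

With this grouping, every tensor inside one layer shares a single rate, at the cost of parsing parameter names.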
To inspect the result, swap 'params': p for 'params': n so the printed groups show parameter names instead of tensors:
lr_layerwise = [{'params': n, 'lr': 1e-3 * 0.8 ** i}
                for i, (n, p) in enumerate(reversed(list(model.named_parameters())))]
print(lr_layerwise)
'''
[{'params': 'classifier.bias', 'lr': 0.001},
 {'params': 'classifier.weight', 'lr': 0.0008},
 {'params': 'bert.encoder.layer.1.output.LayerNorm.bias', 'lr': 0.0006400000000000002},
 {'params': 'bert.encoder.layer.1.output.LayerNorm.weight', 'lr': 0.0005120000000000001},
 {'params': 'bert.encoder.layer.1.output.dense.bias', 'lr': 0.0004096000000000001},
 {'params': 'bert.encoder.layer.1.output.dense.weight', 'lr': 0.0003276800000000001},
 {'params': 'bert.encoder.layer.1.intermediate.dense.bias', 'lr': 0.0002621440000000001},
 {'params': 'bert.encoder.layer.1.intermediate.dense.weight', 'lr': 0.0002097152000000001},
 {'params': 'bert.encoder.layer.1.attention.output.LayerNorm.bias', 'lr': 0.0001677721600000001},
 {'params': 'bert.encoder.layer.1.attention.output.LayerNorm.weight', 'lr': 0.00013421772800000008},
 {'params': 'bert.encoder.layer.1.attention.output.dense.bias', 'lr': 0.00010737418240000006},
 {'params': 'bert.encoder.layer.1.attention.output.dense.weight', 'lr': 8.589934592000005e-05},
 {'params': 'bert.encoder.layer.1.attention.self.value.bias', 'lr': 6.871947673600005e-05},
 {'params': 'bert.encoder.layer.1.attention.self.value.weight', 'lr': 5.497558138880004e-05},
 {'params': 'bert.encoder.layer.1.attention.self.key.bias', 'lr': 4.398046511104004e-05},
 {'params': 'bert.encoder.layer.1.attention.self.key.weight', 'lr': 3.518437208883203e-05},
 {'params': 'bert.encoder.layer.1.attention.self.query.bias', 'lr': 2.8147497671065623e-05},
 {'params': 'bert.encoder.layer.1.attention.self.query.weight', 'lr': 2.2517998136852502e-05},
 {'params': 'bert.encoder.layer.0.output.LayerNorm.bias', 'lr': 1.8014398509482003e-05},
 {'params': 'bert.encoder.layer.0.output.LayerNorm.weight', 'lr': 1.4411518807585603e-05},
 {'params': 'bert.encoder.layer.0.output.dense.bias', 'lr': 1.1529215046068483e-05},
 {'params': 'bert.encoder.layer.0.output.dense.weight', 'lr': 9.223372036854787e-06},
 {'params': 'bert.encoder.layer.0.intermediate.dense.bias', 'lr': 7.37869762948383e-06},
 {'params': 'bert.encoder.layer.0.intermediate.dense.weight', 'lr': 5.902958103587064e-06},
 {'params': 'bert.encoder.layer.0.attention.output.LayerNorm.bias', 'lr': 4.722366482869652e-06},
 {'params': 'bert.encoder.layer.0.attention.output.LayerNorm.weight', 'lr': 3.7778931862957216e-06},
 {'params': 'bert.encoder.layer.0.attention.output.dense.bias', 'lr': 3.0223145490365774e-06},
 {'params': 'bert.encoder.layer.0.attention.output.dense.weight', 'lr': 2.417851639229262e-06},
 {'params': 'bert.encoder.layer.0.attention.self.value.bias', 'lr': 1.93428131138341e-06},
 {'params': 'bert.encoder.layer.0.attention.self.value.weight', 'lr': 1.547425049106728e-06},
 {'params': 'bert.encoder.layer.0.attention.self.key.bias', 'lr': 1.2379400392853823e-06},
 {'params': 'bert.encoder.layer.0.attention.self.key.weight', 'lr': 9.903520314283058e-07},
 {'params': 'bert.encoder.layer.0.attention.self.query.bias', 'lr': 7.922816251426449e-07},
 {'params': 'bert.encoder.layer.0.attention.self.query.weight', 'lr': 6.338253001141158e-07},
 {'params': 'bert.embeddings.LayerNorm.bias', 'lr': 5.070602400912927e-07},
 {'params': 'bert.embeddings.LayerNorm.weight', 'lr': 4.056481920730342e-07},
 {'params': 'bert.embeddings.token_type_embeddings.weight', 'lr': 3.2451855365842734e-07},
 {'params': 'bert.embeddings.position_embeddings.weight', 'lr': 2.5961484292674195e-07},
 {'params': 'bert.embeddings.word_embeddings.weight', 'lr': 2.0769187434139353e-07}]
'''
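As a sanity check, the same schedule can also be read back from the optimizer itself: torch.optim.Adam keeps one entry in optimizer.param_groups per dict it was given, each carrying its own 'lr'.

# Print the effective learning rate of each param group, last parameter first.
for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'])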