A Collection of Deep Learning Tricks

Tuning Tips

Data Augmentation

Preprocessing

1️⃣ Zero-center

Center the data by subtracting the per-feature mean computed on the training set [9].
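A minimal NumPy sketch of zero-centering; the array shapes and names are illustrative, not from [9]. The mean is estimated on the training data and then subtracted from training, validation, and test data alike.

```python
import numpy as np

# X: (num_samples, num_features) training data -- illustrative shapes
X = np.random.rand(1000, 32).astype(np.float32)

# Per-feature mean, estimated on the training set only
mean = X.mean(axis=0)

# Zero-center: subtract the same mean from train/val/test data
X_centered = X - mean
```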

Initialization

1️⃣ The Xavier initialization method [7]

Suited to ordinary activation functions (tanh, sigmoid) [9]: scale = np.sqrt(3/n) (see the sketch below, after 2️⃣).

2️⃣ The He initialization method [8]

Suited to ReLU [9]: scale = np.sqrt(6/n)
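A minimal NumPy sketch covering both uniform initializations above; n is taken to be the fan-in, and the function name and layer sizes are illustrative. Drawing from U(-scale, scale) gives variance scale²/3, so sqrt(3/n) yields Var(w) = 1/n (Xavier) and sqrt(6/n) yields Var(w) = 2/n (He).

```python
import numpy as np

def uniform_init(n_in, n_out, activation="tanh"):
    """Draw an (n_in, n_out) weight matrix from U(-scale, scale).

    tanh/sigmoid (Xavier): scale = sqrt(3/n)  ->  Var(w) = 1/n
    relu (He):             scale = sqrt(6/n)  ->  Var(w) = 2/n
    """
    n = n_in  # fan-in
    if activation in ("tanh", "sigmoid"):
        scale = np.sqrt(3.0 / n)
    else:  # relu
        scale = np.sqrt(6.0 / n)
    return np.random.uniform(-scale, scale, size=(n_in, n_out))

W_tanh = uniform_init(256, 128, activation="tanh")  # Xavier
W_relu = uniform_init(256, 128, activation="relu")  # He
```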

3️⃣ Batch normalization [10]
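A minimal NumPy sketch of the training-time batch-norm forward pass described in [10]; the running statistics used at inference and the backward pass are omitted, and all names and shapes are illustrative.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)                  # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta            # learnable scale and shift

x = np.random.randn(64, 128).astype(np.float32)  # (batch, features)
out = batch_norm_forward(x, gamma=np.ones(128), beta=np.zeros(128))
```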

4️⃣ RNN/LSTM initial hidden state

Hinton [3] suggests making the initial hidden state of an RNN/LSTM a learnable weight.
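One way to implement this, sketched in PyTorch; the module and parameter names are assumptions, not from [3]. The initial hidden and cell states are registered as parameters so the optimizer updates them during training.

```python
import torch
import torch.nn as nn

class LearnableInitLSTM(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        # Initial hidden and cell states are learnable parameters.
        self.h0 = nn.Parameter(torch.zeros(1, 1, hidden_size))
        self.c0 = nn.Parameter(torch.zeros(1, 1, hidden_size))

    def forward(self, x):                  # x: (batch, seq_len, input_size)
        batch = x.size(0)
        h0 = self.h0.expand(-1, batch, -1).contiguous()
        c0 = self.c0.expand(-1, batch, -1).contiguous()
        out, _ = self.lstm(x, (h0, c0))
        return out
```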

Training Techniques

1️⃣ Gradient clipping [5,6]
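A minimal sketch using PyTorch's built-in clip-by-global-norm; the toy model and max_norm=1.0 are illustrative choices, not values from [5,6].

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
# Rescale all gradients so their combined L2 norm is at most max_norm,
# which guards against exploding gradients (common when training RNNs).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```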

2️⃣ Learning rate

Rule of thumb: when the validation loss starts to rise, reduce the learning rate.
Time-based, drop-based (step), and cyclical learning-rate schedules are discussed in [1].
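Minimal sketches of the three schedules mentioned in [1]; all constants (decay rate, drop factor, cycle length) are illustrative defaults, not values from the article.

```python
def time_based(lr0, step, decay=1e-3):
    """Learning rate decays continuously with the step count."""
    return lr0 / (1.0 + decay * step)

def drop_based(lr0, epoch, drop=0.5, epochs_per_drop=10):
    """Learning rate is cut by a fixed factor every few epochs (step decay)."""
    return lr0 * drop ** (epoch // epochs_per_drop)

def cyclical(lr_min, lr_max, step, cycle_len=2000):
    """Triangular cyclical schedule: lr oscillates between lr_min and lr_max."""
    cycle_pos = (step % cycle_len) / cycle_len   # position in [0, 1)
    tri = 1.0 - abs(2.0 * cycle_pos - 1.0)       # 0 -> 1 -> 0 over one cycle
    return lr_min + (lr_max - lr_min) * tri
```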

3️⃣ Batch size

[2] argues in detail that increasing the batch size, rather than decaying the learning rate, can improve model performance: keep the learning rate fixed and grow the batch size until it reaches roughly one tenth of the training-set size, then switch to a learning-rate decay schedule.
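A rough sketch of that policy; the growth factor of 2, the decay factor of 0.5, and the example numbers are illustrative assumptions, not prescribed by [2].

```python
def next_schedule(batch_size, lr, train_set_size):
    """Grow the batch size while it is still small relative to the training
    set; only after that switch to decaying the learning rate."""
    if batch_size * 10 < train_set_size:   # batch size < ~train set / 10
        return batch_size * 2, lr          # grow the batch, keep lr fixed
    return batch_size, lr * 0.5            # then fall back to lr decay

# Example: 50k-sample training set, starting from batch size 256, lr 0.1
bs, lr = 256, 0.1
for epoch in range(10):
    bs, lr = next_schedule(bs, lr, train_set_size=50_000)
    print(epoch, bs, lr)
```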

References

[1] How to make your model happy again — part 1

[2] Don’t Decay the Learning Rate, Increase the Batch Size

[3] CSC2535 2013: Advanced Machine Learning, Lecture 10: Recurrent neural networks

[4] https://zhuanlan.zhihu.com/p/25110150

[5] On the difficulty of training Recurrent Neural Networks

[6] Language Modeling with Gated Convolutional Networks

[7] Understanding the difficulty of training deep feedforward neural networks

[8] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

[9] 知乎: 你有哪些deep learning(rnn、cnn)调参的经验? (Zhihu: What deep learning (RNN/CNN) tuning experience do you have?)

[10] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
