TensorFlow provides several learning-rate decay schedules: exponential decay, piecewise constant decay, polynomial decay, natural exponential decay, inverse time decay, cosine decay, cosine decay with restarts, linear cosine decay, and noisy linear cosine decay. Several of these are described below:
1. Exponential decay and natural exponential decay:
Both take the same parameters:
def exponential_decay(learning_rate,
                      global_step,
                      decay_steps,      # how often to apply decay, i.e. decay once every decay_steps steps
                      decay_rate,
                      staircase=False,  # whether to decay at discrete intervals instead of continuously
                      name=None):
The two differ only in how the decayed learning rate is computed.
Exponential decay:
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
Natural exponential decay:
decayed_learning_rate = learning_rate * exp(-decay_rate * global_step)
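The two formulas above can be sketched in pure Python (this is an illustration of the math, not the TensorFlow implementation):

```python
import math

def exponential_decay(lr, global_step, decay_steps, decay_rate, staircase=False):
    # lr * decay_rate ^ (global_step / decay_steps)
    p = global_step / decay_steps
    if staircase:
        p = math.floor(p)  # hold the rate constant within each decay_steps window
    return lr * decay_rate ** p

def natural_exp_decay(lr, global_step, decay_rate):
    # lr * exp(-decay_rate * global_step)
    return lr * math.exp(-decay_rate * global_step)

print(exponential_decay(0.1, 500, 1000, 0.96))                  # ~0.098, smooth decay
print(exponential_decay(0.1, 500, 1000, 0.96, staircase=True))  # 0.1, not yet decayed
print(natural_exp_decay(0.1, 500, 0.001))                       # ~0.0607
```

With staircase=True the exponent is floored, so the rate only drops in jumps every decay_steps steps, which is useful when you want each "era" of training to use one fixed rate.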
2. Piecewise constant decay
This schedule is a step function of the global step: for example, a learning rate of 0.1 for the first 5000 steps, 0.05 from step 5000 to step 10000, and 0.01 for any steps after that. The signature is:
def piecewise_constant(x, boundaries, values, name=None):
"""Piecewise constant from boundaries and interval values.
Example: use a learning rate that's 1.0 for the first 100000 steps, 0.5
for the next 10000 steps, and 0.1 for any additional steps.
```python
global_step = tf.Variable(0, trainable=False)
boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
# Later, whenever we perform an optimization step, we increment global_step.
```
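The lookup itself can be sketched in a few lines of pure Python (an illustration only, not the TensorFlow implementation — note that values must have exactly one more entry than boundaries):

```python
import bisect

def piecewise_constant(step, boundaries, values):
    # Return the value of the interval that `step` falls into; a step equal
    # to a boundary still uses the earlier interval's value.
    return values[bisect.bisect_left(boundaries, step)]

boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
print(piecewise_constant(50000, boundaries, values))   # 1.0
print(piecewise_constant(105000, boundaries, values))  # 0.5
print(piecewise_constant(200000, boundaries, values))  # 0.1
```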
3. Polynomial decay:
def polynomial_decay(learning_rate,
                     global_step,
                     decay_steps,
                     end_learning_rate=0.0001,
                     power=1.0,
                     cycle=False,
                     name=None):
```python
global_step = min(global_step, decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) *
(1 - global_step / decay_steps) ^ (power) +
end_learning_rate
```
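A pure-Python sketch of the cycle=False case above (not the TensorFlow implementation):

```python
def polynomial_decay(lr, global_step, decay_steps,
                     end_learning_rate=0.0001, power=1.0):
    # Decay polynomially from lr to end_learning_rate over decay_steps steps,
    # then hold at end_learning_rate.
    global_step = min(global_step, decay_steps)
    frac = 1 - global_step / decay_steps
    return (lr - end_learning_rate) * frac ** power + end_learning_rate

# power=1.0 gives a linear ramp; power=2.0 decays faster early on.
print(polynomial_decay(0.1, 5000, 10000))   # halfway: 0.05005
print(polynomial_decay(0.1, 20000, 10000))  # clamped at end_learning_rate
```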
4. Inverse time decay:
def inverse_time_decay(learning_rate,
                       global_step,
                       decay_steps,
                       decay_rate,
                       staircase=False,
                       name=None):
```python
decayed_learning_rate = learning_rate / (1 + decay_rate * global_step /
                                         decay_steps)
```
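A pure-Python sketch of this formula (an illustration, not the TensorFlow implementation):

```python
import math

def inverse_time_decay(lr, global_step, decay_steps, decay_rate,
                       staircase=False):
    # lr / (1 + decay_rate * global_step / decay_steps)
    p = global_step / decay_steps
    if staircase:
        p = math.floor(p)  # decay in discrete jumps every decay_steps steps
    return lr / (1 + decay_rate * p)

print(inverse_time_decay(0.1, 1000, 1000, 0.5))  # 0.1 / 1.5 ≈ 0.0667
```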
5. Cosine decay:
def cosine_decay(learning_rate, global_step, decay_steps, alpha=0.0, name=None):
```python
global_step = min(global_step, decay_steps)
cosine_decay = 0.5 * (1 + cos(pi * global_step / decay_steps))
decayed = (1 - alpha) * cosine_decay + alpha
decayed_learning_rate = learning_rate * decayed
```
'''
alpha: A scalar `float32` or `float64` Tensor or a Python number.
Minimum learning rate value as a fraction of learning_rate.
'''
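The cosine schedule above can be sketched in pure Python (an illustration of the formula, not the TensorFlow implementation); alpha sets the floor of the schedule as a fraction of learning_rate:

```python
import math

def cosine_decay(lr, global_step, decay_steps, alpha=0.0):
    # Smoothly anneal from lr down to alpha * lr over decay_steps steps.
    global_step = min(global_step, decay_steps)
    cosine = 0.5 * (1 + math.cos(math.pi * global_step / decay_steps))
    decayed = (1 - alpha) * cosine + alpha
    return lr * decayed

print(cosine_decay(0.1, 0, 1000))     # 0.1 at the start
print(cosine_decay(0.1, 500, 1000))   # 0.05 at the midpoint
print(cosine_decay(0.1, 1000, 1000))  # 0.0 at the end (alpha=0)
```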