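Commonly used activation functions include ReLU, sigmoid, and tanh.
ReLU is usually chosen to reduce the risk of vanishing gradients and exploding gradients (note: CS231n also points out that ReLU is the commonly used activation function).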
tf.nn.relu()
tf.nn.sigmoid()
tf.nn.tanh()
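As a rough illustration of what these three functions compute, here is a minimal pure-Python sketch for scalar inputs. The helper names `relu`, `sigmoid`, and `tanh_` are hypothetical stand-ins, not TensorFlow APIs; the `tf.nn` versions apply the same formulas element-wise to tensors.

```python
import math


def relu(x):
    """max(0, x): passes positive inputs through and zeroes out negatives."""
    return max(0.0, x)


def sigmoid(x):
    """1 / (1 + e^-x): squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_(x):
    """Hyperbolic tangent: squashes any real input into the interval (-1, 1)."""
    return math.tanh(x)


# Print each activation for a negative, a zero, and a positive input.
for v in (-2.0, 0.0, 2.0):
    print(v, relu(v), sigmoid(v), tanh_(v))
```

The slope of sigmoid is at most 0.25, while the slope of ReLU is exactly 1 for positive inputs, which is one common explanation for why ReLU suffers less from vanishing gradients in deep networks.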
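Example:
Predict the daily yogurt sales y, where x1 and x2 are the two factors that affect daily sales.
That is, fit a neural network to data generated by y_ = X1 + X2. To make the data more realistic, random noise in the range ±0.05 is added.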
import tensorflow as tf
import numpy as np
BATCH_SIZE = 8
seed = 23455
rdm = np.random.RandomState(seed)
X = rdm.rand(32, 2)
Y_ = [[X1 + X2 + (rdm.rand() / 10.0 - 0.05)] for (X1, X2) in X]
x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)
loss_mse = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        # Cycle through the 32 samples in mini-batches of BATCH_SIZE.
        start = (i * BATCH_SIZE) % 32
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 500 == 0:
            print("After %d training step(s) " % i)
            print("w1 is ", sess.run(w1))
    print("Final w1 is :\n", sess.run(w1))
Tip: try changing the learning rate and observe how the parameters change and how training converges.
Final w1 is :
[[0.98019385]
[1.0159807 ]]
Prediction result: y = 0.98*X1 + 1.02*X2, which closely matches the target y_ = X1 + X2.
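The `loss_mse` minimized in the code above is the mean squared error between the labels and the predictions. A minimal pure-Python sketch of the same quantity follows; the helper name `mse` and the sample values are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between labels and predictions."""
    assert len(y_true) == len(y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


labels = [1.0, 2.0, 3.0]  # made-up labels
preds = [1.0, 2.5, 2.0]   # made-up predictions
print(mse(labels, preds))  # (0 + 0.25 + 1) / 3
```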
That is, the loss for each sample is piecewise: y_ > y ? PROFIT*(y_ - y) : COST*(y - y_)
loss = tf.reduce_sum(tf.where(tf.greater(y_, y), PROFIT * (y_ - y), COST * (y - y_)))
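To make the piecewise rule concrete, the following pure-Python sketch evaluates it on two samples. The values `PROFIT = 9.0` and `COST = 1.0` and the function name `custom_loss` are assumptions chosen only for illustration.

```python
PROFIT = 9.0  # assumed penalty per unit when the prediction is too low
COST = 1.0    # assumed penalty per unit when the prediction is too high


def custom_loss(y_true, y_pred):
    """Sum of per-sample losses, penalizing the two error directions differently."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t > p:
            total += PROFIT * (t - p)  # predicted less than the label
        else:
            total += COST * (p - t)    # predicted at least as much as the label
    return total


# One under-prediction by 0.5 and one over-prediction by 0.5.
print(custom_loss([1.0, 1.0], [0.5, 1.5]))  # 9 * 0.5 + 1 * 0.5 = 5.0
```

With PROFIT larger than COST, under-predicting is the more expensive mistake, so a model trained on this loss would be expected to lean toward predicting high.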
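Cross entropy: measures the distance between two probability distributions. The larger the cross entropy, the farther apart and the more different the two distributions are; the smaller the cross entropy, the closer and the more similar they are.
In TensorFlow:
ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))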
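For a single sample, cross entropy can be written as H(y_, y) = -Σ y_ * log(y). The pure-Python sketch below computes it with the natural logarithm and clips the prediction as the TensorFlow line above does, to avoid log(0). The helper name `cross_entropy` and the sample distributions are assumptions for illustration. It sums over classes, whereas `tf.reduce_mean` averages over all elements, so for a fixed shape the two differ only by a constant factor.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """-sum(y_true * log(y_pred)), with y_pred clipped into [eps, 1.0]."""
    return -sum(
        t * math.log(min(max(p, eps), 1.0)) for t, p in zip(y_true, y_pred)
    )


label = [1.0, 0.0]                           # the true class is class 0
close = cross_entropy(label, [0.8, 0.2])     # prediction near the label
far = cross_entropy(label, [0.6, 0.4])       # prediction farther from the label
print(close, far)                            # the closer prediction scores lower
```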
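softmax function: converts the n outputs (y1, y2, …, yn) of an n-class classifier into values that satisfy the requirements of a probability distribution, meaning that each value lies in [0, 1] and all values sum to 1.
In TensorFlow, the model output is usually passed through softmax to obtain a probability distribution over the classes, which is then compared with the ground-truth labels to compute the cross entropy used as the loss. This is implemented with the following functions: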
ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce)
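A minimal pure-Python sketch of what softmax followed by cross entropy computes for one sample is shown below. The names `softmax` and `softmax_cross_entropy` are hypothetical helpers, and `label_index` plays the role of `tf.argmax(y_, 1)`; the logits are made-up values.

```python
import math


def softmax(logits):
    """Exponentiate and normalize so the outputs are positive and sum to 1."""
    m = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits, label_index):
    """Cross entropy against a one-hot label given as a class index."""
    return -math.log(softmax(logits)[label_index])


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
print(softmax_cross_entropy([2.0, 1.0, 0.1], 0))
```

Averaging this per-sample value over a batch corresponds to the `tf.reduce_mean(ce)` step.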
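Learning rate: the size of each parameter update.
During training, parameters are updated in the direction that decreases the loss, that is, along the negative gradient.
Parameter update rule: $W_{n+1} = W_n - \text{learning\_rate} \cdot \nabla \text{loss}(W_n)$
Example: suppose loss = (w + 1)², and find the optimal parameter w.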
import tensorflow as tf
import numpy as np
w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w+1)
train_step = tf.train.GradientDescentOptimizer(.2).minimize(loss)  # learning_rate is 0.2
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 40
    for i in range(STEPS):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %d step(s): w is %f, loss is %f" % (i, w_val, loss_val))
Tip: try changing the learning rate and observe the result and the speed of convergence.
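The effect of the learning rate can also be checked without TensorFlow. The sketch below applies the update rule by hand to loss = (w + 1)², whose gradient is 2(w + 1), starting from w = 5 as in the example above. The function name `descend` and the three learning rates are assumptions for illustration.

```python
def descend(learning_rate, steps=40, w=5.0):
    """Plain gradient descent on loss = (w + 1)^2, whose gradient is 2 * (w + 1)."""
    for _ in range(steps):
        w -= learning_rate * 2.0 * (w + 1.0)
    return w


for lr in (1.0, 0.2, 0.0001):
    print("learning_rate=%g -> w after 40 steps: %f" % (lr, descend(lr)))
```

With a learning rate of 1.0, w jumps back and forth between 5 and -7 and never converges; with 0.2 it settles at the optimum w = -1; with 0.0001 it has barely moved after 40 steps.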
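Exponentially decaying learning rate: the learning rate changes dynamically with the number of training steps.
In TensorFlow:
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE,
    global_step,
    LEARNING_RATE_STEP,
    LEARNING_RATE_DECAY,
    staircase=True)  # or staircase=False
When staircase is True, global_step / LEARNING_RATE_STEP is truncated to an integer and the learning rate decays in a staircase pattern; when staircase is False, the learning rate follows a smooth, continuously decaying curve.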
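The two staircase modes can be compared with a small pure-Python sketch. It follows the formula given in the TensorFlow documentation for `tf.train.exponential_decay`, namely base * decay_rate ^ (global_step / decay_steps); the function name `decayed_learning_rate` and the sample values are assumptions for illustration.

```python
def decayed_learning_rate(base, global_step, decay_steps, decay_rate, staircase=False):
    """base * decay_rate ** (global_step / decay_steps), optionally in discrete steps."""
    if staircase:
        exponent = global_step // decay_steps  # integer division: staircase decay
    else:
        exponent = global_step / decay_steps   # true division: smooth decay
    return base * decay_rate ** exponent


# Compare both modes with a base rate of 0.1, decay 0.99 every 10 steps.
for step in (0, 5, 10, 15, 20):
    print(
        step,
        decayed_learning_rate(0.1, step, 10, 0.99, staircase=True),
        decayed_learning_rate(0.1, step, 10, 0.99, staircase=False),
    )
```

In staircase mode the rate stays constant within each block of 10 steps and drops at the block boundary, while the smooth mode decreases a little on every step.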
Example:
import tensorflow as tf
LEARNING_RATE_BASE = .1  # initial learning rate
LEARNING_RATE_DECAY = .99  # learning-rate decay factor
LEARNING_RATE_STEP = 1  # number of batches fed before the learning rate is updated; usually total sample count / BATCH_SIZE
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP,
    LEARNING_RATE_DECAY, staircase=True)
w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
# Passing global_step lets the optimizer increment it on every training step.
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 40
    for i in range(STEPS):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: global_step is %f, w is %f, learning rate is %f, loss is %f"
              % (i, global_step_val, w_val, learning_rate_val, loss_val))
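Moving average: keeps a running average of each parameter w and b in the model over a recent period of training. Using the averaged values can improve the model's ability to generalize.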
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
Here MOVING_AVERAGE_DECAY is the moving-average decay rate, usually set to a value close to 1, and global_step is the number of training steps run so far.
ema_op = ema.apply(tf.trainable_variables())
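Here ema.apply() computes the moving average of the variables passed to it, and tf.trainable_variables() collects all trainable parameters into a list.
with tf.control_dependencies([train_step, ema_op]):
    train_op = tf.no_op(name='train')
This groups the moving-average update with the training step, so that running train_op executes both.
To read the averaged value of a parameter, use ema.average().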
import tensorflow as tf
w1 = tf.Variable(0, dtype=tf.float32)
global_step = tf.Variable(0, dtype=tf.float32, trainable=False)  # step counter, kept out of tf.trainable_variables()
MOVING_AVERAGE_DECAY = 0.99
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
# ema.apply() takes the list of variables to track; each sess.run(ema_op)
# updates the moving average of every variable in that list.
# In practice, tf.trainable_variables() gathers all trainable parameters into a list automatically.
# ema.apply([w1])
ema_op = ema.apply(tf.trainable_variables())
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # Print the current value of w1 and its moving average.
    print(sess.run([w1, ema.average(w1)]))
    # Simulate a model that has trained for 100 steps and whose w1 is now 10.
    sess.run(tf.assign(global_step, 100))
    sess.run(tf.assign(w1, 10))
    sess.run(ema_op)
    print(sess.run([w1, ema.average(w1)]))
    # Each further update moves the average closer to w1.
    sess.run(ema_op)
    print(sess.run([w1, ema.average(w1)]))
    sess.run(ema_op)
    print(sess.run([w1, ema.average(w1)]))
    sess.run(ema_op)
    print(sess.run([w1, ema.average(w1)]))
    sess.run(ema_op)
    print(sess.run([w1, ema.average(w1)]))
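The update behind each `sess.run(ema_op)` can be reproduced by hand. The sketch below follows the rule described in the TensorFlow documentation for `tf.train.ExponentialMovingAverage` when a step count is supplied: the effective decay is min(decay, (1 + num_updates) / (10 + num_updates)), and the shadow value becomes decay * shadow + (1 - decay) * variable. The function name `ema_update` is a hypothetical helper; the inputs mirror the example above (w1 = 10, global_step = 100, decay 0.99).

```python
def ema_update(shadow, value, base_decay, num_updates):
    """One moving-average update using the step-dependent effective decay."""
    decay = min(base_decay, (1.0 + num_updates) / (10.0 + num_updates))
    return decay * shadow + (1.0 - decay) * value


shadow = 0.0   # the moving average starts at the initial value of w1
history = []
for _ in range(3):
    shadow = ema_update(shadow, 10.0, 0.99, 100)
    history.append(shadow)
    print(shadow)
```

Because the step count is only 100, the effective decay is 101/110 rather than 0.99, so the average approaches w1 faster than the nominal decay rate would suggest.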
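Regularization: adds a weighted penalty on each parameter w to the loss function, introducing a measure of model complexity, which suppresses the fitting of noise and reduces overfitting.
L1 regularization in TensorFlow: loss(w) = tf.contrib.layers.l1_regularizer(REGULARIZER)(w)
L2 regularization in TensorFlow: loss(w) = tf.contrib.layers.l2_regularizer(REGULARIZER)(w)
With regularization, the loss becomes the sum of two terms:
loss = loss(y, y_) + REGULARIZER*loss(w)
Here loss(y, y_) is the data loss between the prediction y and the label y_, loss(w) is the penalty on the weights, and REGULARIZER is the weight given to that penalty. The TensorFlow regularizer functions above already multiply the penalty by REGULARIZER.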
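As a rough sketch of the two penalty terms, the pure-Python code below computes an L1 and an L2 penalty for a small weight list and adds the L2 penalty to a data loss. The helper names, the weights, and the data loss are made up for illustration. The division by 2 in the L2 penalty is an assumption that mirrors `tf.nn.l2_loss`, which computes sum(w²) / 2.

```python
def l1_penalty(weights, scale):
    """scale * sum(|w|): tends to push many weights toward exactly zero."""
    return scale * sum(abs(w) for w in weights)


def l2_penalty(weights, scale):
    """scale * sum(w^2) / 2: shrinks large weights more strongly than small ones."""
    return scale * sum(w * w for w in weights) / 2.0


weights = [0.5, -2.0, 1.0]   # made-up weights
data_loss = 0.3              # made-up value standing in for loss(y, y_)
total_loss = data_loss + l2_penalty(weights, 0.01)

print(l1_penalty(weights, 0.01))  # 0.01 * 3.5
print(l2_penalty(weights, 0.01))  # 0.01 * 5.25 / 2
print(total_loss)
```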
Implementing regularization with TensorFlow functions:
```python
tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
loss = cem + tf.add_n(tf.get_collection('losses'))
```
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
LEARNING_RATE_BASE = .001
LEARNING_RATE_DECAY = .999
BATCH_SIZE = 30
seed = 2
rdm = np.random.RandomState(seed)
X = rdm.randn(300, 2)
Y_ = [int((x0 * x0 + x1 * x1) < 2) for (x0, x1) in X]
Y_c = [["red" if y else "blue"] for y in Y_]
X = np.vstack(X).reshape(-1, 2)
Y_ = np.vstack(Y_).reshape(-1, 1)
plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
plt.savefig('original.png')
plt.show()
def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    # Register this weight's L2 penalty in the 'losses' collection.
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w
def get_bias(shape):
    b = tf.Variable(tf.constant(.01, shape=shape))
    return b
x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w1 = get_weight([2, 11], .01)
b1 = get_bias([11])
y1 = tf.nn.relu(tf.matmul(x, w1) + b1)
w2 = get_weight([11, 1], .01)
b2 = get_bias([1])
y = tf.matmul(y1, w2) + b2
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE,
    global_step,
    300/BATCH_SIZE,
    LEARNING_RATE_DECAY,
    staircase=True
)
loss_mse = tf.reduce_mean(tf.square(y - y_))
loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))
train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss_mse)
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 40000
    for i in range(STEPS):
        # Cycle through the 300 samples in mini-batches of BATCH_SIZE.
        start = (i * BATCH_SIZE) % 300
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start: end], y_: Y_[start: end]})
        if i % 2000 == 0:
            loss_mse_val = sess.run(loss_mse, feed_dict={x: X, y_: Y_})
            print("After %d steps, loss is %f" % (i, loss_mse_val))
    # Evaluate the trained network on a dense grid to draw the decision boundary.
    xx, yy = np.mgrid[-3:3:.01, -3:3:.01]
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = sess.run(y, feed_dict={x: grid})
    probs = probs.reshape(xx.shape)
    print("w1:\n", sess.run(w1))
    print("b1:\n", sess.run(b1))
    print("w2:\n", sess.run(w2))
    print("b2:\n", sess.run(b2))
plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
# The contour at 0.5 separates the two predicted classes.
plt.contour(xx, yy, probs, levels=[.5])
plt.savefig('result.png')
plt.show()