精灵耶

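TensorFlow Notes: Neural Network Optimization

1. Prerequisites

2. Neural network complexity

3. Learning rate

4. Activation functions

4.1 Sigmoid function

4.2 Tanh function

4.3 ReLU function

4.4 Leaky ReLU function

5. Loss functions

5.1 Mean squared error

5.2 Cross-entropy

6. Underfitting and overfitting

7. Optimizers

7.1 SGD

7.2 SGDM

7.3 Adagrad

7.4 RMSProp

7.5 Adam

7.6 Comparison of the five optimizers

7.6.1 SGD

7.6.2 SGDM

7.6.3 Adagrad

7.6.4 RMSProp

7.6.5 Adam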

1. Prerequisites

1. Return A where the condition is true and B where it is false:

 tf.where(condition, A, B)
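As a small illustration of the element-wise selection described above, the sketch below picks the larger element of two tensors. The tensors a and b are made-up values chosen only for this example.

```python
import tensorflow as tf

# Illustrative inputs (not from the original notes).
a = tf.constant([1, 2, 3, 1, 1])
b = tf.constant([0, 1, 3, 4, 5])

# Where a > b is true, take the element from a; otherwise take it from b.
c = tf.where(tf.greater(a, b), a, b)
print(c.numpy())
```

The condition, A, and B must have compatible shapes, since the selection is made element by element.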

2. Return a random number in [0, 1):

 np.random.RandomState().rand(dims)  # rand is a method of a RandomState instance; with no dims it returns a scalar
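A minimal sketch of the call above. The seed value 1 is an assumption made here so that repeated runs produce the same numbers; any integer would do.

```python
import numpy as np

# Seeding the generator makes the random sequence repeatable (seed chosen arbitrarily).
rdm = np.random.RandomState(seed=1)

a = rdm.rand()      # no dimensions given: a single scalar in [0, 1)
b = rdm.rand(2, 3)  # a 2x3 array of values in [0, 1)
print(a)
print(b)
```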

3. Stack two arrays vertically (the arrays are passed together as one tuple):

np.vstack((array1, array2))
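For example, stacking two one-dimensional arrays of length 3 gives a 2x3 array. The values below are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# The arrays are passed as a single tuple; each one becomes a row of the result.
c = np.vstack((a, b))
print(c)
print(c.shape)
```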

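4. np.mgrid[ ] returns evenly spaced grid points and can return several groups at once; each range is half-open, [start, stop).

x.ravel( ) flattens x into a one-dimensional array (it "straightens out" the variable before the dot).

np.c_[ ] pairs up the returned grid points, column by column.

np.mgrid[start:stop:step, start:stop:step, …]
x.ravel()
np.c_[array1, array2, …]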
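The three calls are usually used together to build a list of grid coordinates. A short sketch, with ranges picked only for illustration:

```python
import numpy as np

# x runs over [1, 3) with step 1, y runs over [2, 4) with step 0.5.
x, y = np.mgrid[1:3:1, 2:4:0.5]

# Flatten both grids and pair them up: each row of grid is one (x, y) point.
grid = np.c_[x.ravel(), y.ravel()]

print(x.shape)  # both x and y share the same 2-D shape
print(grid)
```

The same pattern appears later in these notes, where a grid over [-3, 3) is fed to the trained network to draw a decision boundary.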

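2. Neural network complexity

NN (neural network) complexity is usually expressed by the number of layers and the number of parameters.

Space complexity: number of layers = number of hidden layers + 1 output layer (the input layer is not counted)

Total parameters = total number of w + total number of b

Time complexity: number of multiply-add operations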
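The counting rules above can be sketched for a fully connected network. The helper count_complexity and the 3-4-2 layer sizes are hypothetical, introduced only to make the arithmetic concrete.

```python
def count_complexity(layer_sizes):
    """Count layers, parameters, and multiply-adds of a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    """
    pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    num_layers = len(layer_sizes) - 1                    # hidden layers + output layer
    num_w = sum(n_in * n_out for n_in, n_out in pairs)   # one weight per connection
    num_b = sum(n_out for _, n_out in pairs)             # one bias per non-input neuron
    mult_adds = num_w                                    # one multiply-add per weight
    return num_layers, num_w + num_b, mult_adds


# 3 inputs, one hidden layer of 4 neurons, 2 outputs.
layers, params, mult_adds = count_complexity([3, 4, 2])
print(layers, params, mult_adds)  # 2 26 20
```

Here the parameters are (3*4 + 4) + (4*2 + 2) = 26 and the multiply-adds are 3*4 + 4*2 = 20.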

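3. Learning rate

An exponentially decaying learning rate starts with a relatively large learning rate to reach a reasonably good solution quickly, then gradually reduces it as the iterations continue, so that the model is more stable in the later stage of training.

Exponentially decayed learning rate = initial learning rate * decay rate ^ (current epoch / number of epochs between decays)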

import tensorflow as tf

w = tf.Variable(tf.constant(5, dtype=tf.float32))

epochs = 40
LR_BASE = 0.2  # initial learning rate
LR_DECAY = 0.99  # learning-rate decay rate
LR_STEP = 1  # number of epochs between learning-rate updates

for epoch in range(epochs):  # top-level loop over the dataset; here the "dataset" is the single parameter w, initialized to 5, and the loop runs 40 times
    lr = LR_BASE * LR_DECAY ** (epoch / LR_STEP)
    with tf.GradientTape() as tape:  # the with block records the operations needed to compute the gradient
        loss = tf.square(w + 1)
    grads = tape.gradient(loss, w)  # gradient(target, source): differentiate loss with respect to w

    w.assign_sub(lr * grads)  # assign_sub decrements the variable in place: w = w - lr * grads
    print("After %s epoch,w is %f,loss is %f,lr is %f" % (epoch, w.numpy(), loss, lr))

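4. Activation functions

A good activation function has these properties:

1) Nonlinearity: with a nonlinear activation function, a multilayer neural network can approximate any continuous function

2) Differentiability: most optimizers update parameters by gradient descent

3) Monotonicity: when the activation function is monotonic, the loss function of a single-layer network is guaranteed to be convex

4) Approximate identity: f(x) ≈ x; when the parameters are initialized to small random values, the network is more stable

Output range of the activation function:

1) When the output range is bounded, gradient-based optimization is more stable

2) When the output range is unbounded, a smaller learning rate is recommended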

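4.1 Sigmoid function

Characteristics:

1) Prone to vanishing gradients

2) The output is not zero-centered, so convergence is slow

3) Exponentiation is expensive, so training takes longer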
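The first two characteristics can be checked numerically. The sketch below uses NumPy rather than TensorFlow and relies on the standard definition f(x) = 1 / (1 + e^(-x)), whose derivative is f(x) * (1 - f(x)); the sample inputs are arbitrary.

```python
import numpy as np


def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)), output always in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    # f'(x) = f(x) * (1 - f(x)), which peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)


x = np.array([-10.0, 0.0, 10.0])
out = sigmoid(x)
grad = sigmoid_grad(x)
print(out)   # all outputs are positive, so they cannot average to zero
print(grad)  # at most 0.25, and nearly 0 for inputs far from 0
```

Since each layer multiplies the gradient by a factor of at most 0.25, the gradient shrinks quickly as it is propagated back through many sigmoid layers.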

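4.2 Tanh function

Characteristics:

1) The output is zero-centered

2) Prone to vanishing gradients

3) Exponentiation is expensive, so training takes longer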
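A similar NumPy check for tanh, using the standard derivative 1 - tanh(x)^2. The input range is an arbitrary choice, symmetric around 0.

```python
import numpy as np


def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2


x = np.linspace(-5.0, 5.0, 101)  # inputs symmetric around 0
out = np.tanh(x)
grad = tanh_grad(np.array([0.0, 5.0]))

print(out.mean())  # approximately 0: outputs are centered on zero
print(grad)        # 1 at x = 0, but close to 0 at x = 5
```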

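4.3 ReLU function

Advantages:

1) Solves the vanishing gradient problem (in the positive range)

2) Only needs to check whether the input is greater than 0, so it is fast to compute

3) Converges much faster than sigmoid and tanh

Drawbacks:

1) The output is not zero-centered, so convergence is slow

2) Dead ReLU problem: some neurons may never be activated, so their parameters are never updated.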
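A NumPy sketch of ReLU, f(x) = max(0, x), and its gradient. The gradient is 1 for positive inputs and 0 otherwise, which is why a neuron whose input stays negative receives no update. The sample inputs are illustrative.

```python
import numpy as np


def relu(x):
    # f(x) = max(0, x)
    return np.maximum(0.0, x)


def relu_grad(x):
    # Gradient is 1 where x > 0 and 0 elsewhere (taking 0 at x = 0 by convention).
    return (x > 0).astype(float)


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
out = relu(x)
grad = relu_grad(x)
print(out)
print(grad)
```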

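4.4 Leaky ReLU function

Leaky ReLU was introduced to address the problem of ReLU neurons dying on negative inputs. In theory, Leaky ReLU has all the advantages of ReLU and avoids the Dead ReLU problem, but in practice it has not been shown to be consistently better than ReLU.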
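A NumPy sketch of Leaky ReLU, f(x) = max(alpha * x, x). The slope alpha = 0.01 is an assumption chosen for illustration; the notes do not specify a value, and libraries use different defaults.

```python
import numpy as np


def leaky_relu(x, alpha=0.01):
    # Positive inputs pass through; negative inputs are scaled by a small slope alpha.
    # alpha=0.01 is an illustrative choice, not a value from the notes.
    return np.where(x > 0, x, alpha * x)


x = np.array([-2.0, 0.0, 3.0])
out = leaky_relu(x)
print(out)
```

Because the slope on the negative side is alpha rather than 0, the gradient never vanishes completely there, so the neuron can still be updated.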

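Advice for beginners:

1) Prefer the ReLU activation function;

2) Set the learning rate to a small value;

3) Standardize the input features, so that they follow a distribution with mean 0 and standard deviation 1;

4) Center the initial parameters, i.e. draw the randomly generated parameters from a normal distribution with mean 0 and standard deviation sqrt(2 / number of input features of the current layer).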
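Items 3) and 4) can be sketched with NumPy as follows. The feature matrix, its scale, and the layer sizes (4 inputs, 32 neurons) are hypothetical values used only to show the computation.

```python
import numpy as np

rng = np.random.RandomState(seed=0)  # arbitrary seed for repeatability

# Hypothetical raw features: 1000 samples, 4 features, on an arbitrary scale.
x = rng.randn(1000, 4) * 50.0 + 10.0

# Item 3: standardize each feature column to mean 0 and standard deviation 1.
x_std = (x - x.mean(axis=0)) / x.std(axis=0)

# Item 4: draw weights from a normal distribution with mean 0 and
# standard deviation sqrt(2 / number of input features of this layer).
fan_in, fan_out = 4, 32
w = rng.normal(loc=0.0, scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

print(x_std.mean(axis=0), x_std.std(axis=0))
print(w.shape)
```

In practice the mean and standard deviation should be computed on the training set and reused for any new data.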

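5. Loss functions

The loss function measures the gap between the predicted value (y) and the known answer (y_). The mainstream choices are mean squared error, custom losses, and cross-entropy.

5.1 Mean squared error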

loss_mse = tf.reduce_mean(tf.square(y_ -y))

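5.2 Cross-entropy

Cross-entropy characterizes the distance between two probability distributions.

tf.losses.categorical_crossentropy(y_, y)

Combining softmax with cross-entropy: the output first passes through the softmax function, and then the cross-entropy loss between y and y_ is computed. In the call below, y is the raw network output (the logits) before softmax.

tf.nn.softmax_cross_entropy_with_logits(y_, y)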
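To make the two steps concrete, here is a NumPy sketch that computes cross-entropy by hand, H(y_, y) = -sum(y_ * ln(y)), instead of calling the TensorFlow functions. The label, predictions, and logits are made-up numbers.

```python
import numpy as np


def cross_entropy(y_true, y_pred):
    # H(y_, y) = -sum(y_ * ln(y))
    return -np.sum(y_true * np.log(y_pred))


def softmax(logits):
    # Subtracting the max keeps exp() numerically stable without changing the result.
    e = np.exp(logits - np.max(logits))
    return e / np.sum(e)


y_true = np.array([1.0, 0.0])  # the known answer: class 0

# The prediction closer to the answer has the smaller cross-entropy.
loss_1 = cross_entropy(y_true, np.array([0.6, 0.4]))  # -ln(0.6)
loss_2 = cross_entropy(y_true, np.array([0.8, 0.2]))  # -ln(0.8)

# Softmax followed by cross-entropy, starting from raw logits.
logits = np.array([2.0, 1.0])
loss_3 = cross_entropy(y_true, softmax(logits))

print(loss_1, loss_2, loss_3)
```

The library function that fuses softmax and cross-entropy is generally preferable in real training code because it handles numerical edge cases such as log(0).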

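6. Underfitting and overfitting

Underfitting means the model cannot fit the dataset effectively; it has not learned the existing data thoroughly.

Overfitting means the model fits the existing dataset too well and struggles to make judgments on new data; the model lacks generalization ability.

Remedies for underfitting:

1) Add input features

2) Add network parameters

3) Reduce the regularization parameter

Remedies for overfitting:

1) Clean the data

2) Enlarge the dataset

3) Apply regularization

4) Increase the regularization parameter

Regularization mitigates overfitting

Regularization adds a model-complexity term to the loss function. By penalizing the weights w, it reduces the influence of noise in the training data (b is usually not regularized).

Choosing a regularizer

L1 regularization tends to drive many parameters to exactly zero, so it reduces complexity by making the parameters sparse, i.e. by reducing the number of parameters.

L2 regularization makes the parameters close to zero but not exactly zero, so it reduces complexity by shrinking the magnitude of the parameters.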
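The regularized loss has the form loss = data loss + REGULARIZER * penalty(w). The NumPy sketch below computes both penalties for a small weight matrix; the weights and the data loss of 0.4 are made-up numbers, and REGULARIZER = 0.03 matches the value used in the training code in these notes.

```python
import numpy as np

w = np.array([[0.5, -1.0],
              [0.0, 2.0]])   # illustrative weights
loss_data = 0.4              # hypothetical data loss (e.g. an MSE value)
REGULARIZER = 0.03           # weight of the penalty term

l1_penalty = np.sum(np.abs(w))     # L1: sum of absolute values
l2_penalty = np.sum(np.square(w))  # L2: sum of squares

loss_l1 = loss_data + REGULARIZER * l1_penalty
loss_l2 = loss_data + REGULARIZER * l2_penalty
print(l1_penalty, l2_penalty)
print(loss_l1, loss_l2)
```

Note that tf.nn.l2_loss, used in the training code of these notes, returns sum(w ** 2) / 2, i.e. half of the L2 penalty computed here.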

L2 regularization example: predict the label for the point (0.380472, -0.21714)

Without L2 regularization:

# Import the required modules
import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Read the data and labels, then build x_train and y_train
df = pd.read_csv('dot.csv')
x_data = np.array(df[['x1', 'x2']])
y_data = np.array(df['y_c'])

x_train = np.vstack(x_data).reshape(-1, 2)
y_train = np.vstack(y_data).reshape(-1, 1)

Y_c = [['red' if y else 'blue'] for y in y_train]

# Cast to float32; otherwise the matrix multiplications below fail with a dtype mismatch
x_train = tf.cast(x_train, tf.float32)
y_train = tf.cast(y_train, tf.float32)

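# from_tensor_slices slices the tensors along their first dimension and builds a dataset in which each feature row is paired with its label
train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

# Create the network parameters: 2 input neurons, one hidden layer of 11 neurons, 1 output neuron
# tf.Variable() makes the parameters trainable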
w1 = tf.Variable(tf.random.normal([2, 11]), dtype=tf.float32)
b1 = tf.Variable(tf.constant(0.01, shape=[11]))

w2 = tf.Variable(tf.random.normal([11, 1]), dtype=tf.float32)
b2 = tf.Variable(tf.constant(0.01, shape=[1]))

lr = 0.005  # learning rate
epoch = 800  # number of epochs

# Training
for epoch in range(epoch):
    for step, (x_batch, y_batch) in enumerate(train_db):
        with tf.GradientTape() as tape:  # record operations for automatic differentiation

            h1 = tf.matmul(x_batch, w1) + b1  # multiply-add of the hidden layer
            h1 = tf.nn.relu(h1)
            y = tf.matmul(h1, w2) + b2

            # Mean squared error loss: mse = mean((y_batch - y)^2)
            loss = tf.reduce_mean(tf.square(y_batch - y))

        # Compute the gradient of loss with respect to each parameter
        variables = [w1, b1, w2, b2]
        grads = tape.gradient(loss, variables)

        # Apply the gradient update, e.g. w1 = w1 - lr * w1_grad
        # tape.gradient returns the gradients in the same order as [w1, b1, w2, b2], at indices 0, 1, 2, 3
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])

    # Print the loss every 20 epochs
    if epoch % 20 == 0:
        print('epoch:', epoch, 'loss:', float(loss))

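# Prediction
print("*******predict*******")
# Generate grid points: xx from -3 to 3 with step 0.1, yy from -3 to 3 with step 0.1
xx, yy = np.mgrid[-3:3:.1, -3:3:.1]
# Flatten xx and yy and pair them up into a 2-D array of coordinate points
grid = np.c_[xx.ravel(), yy.ravel()]
grid = tf.cast(grid, tf.float32)
# Feed the grid points to the network for prediction; probs collects the outputs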
probs = []
for x_test in grid:
    # Predict with the trained parameters
    h1 = tf.matmul([x_test], w1) + b1
    h1 = tf.nn.relu(h1)
    y = tf.matmul(h1, w2) + b2  # y is the prediction
    probs.append(y)

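# Take column 0 as x1 and column 1 as x2
x1 = x_data[:, 0]
x2 = x_data[:, 1]
# Reshape probs to the shape of xx
probs = np.array(probs).reshape(xx.shape)
plt.scatter(x1, x2, color=np.squeeze(Y_c))  # squeeze removes the size-1 dimension, turning [['red'], ['blue']] into ['red', 'blue']
# Pass the coordinates xx, yy and the values probs to contour, which draws the line where probs equals 0.5; after plt.show() this line is the boundary between the red and blue points
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

# Reads the red and blue points and draws the decision boundary, without regularization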

With L2 regularization:

# Import the required modules
import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Read the data and labels, then build x_train and y_train
df = pd.read_csv('dot.csv')
x_data = np.array(df[['x1', 'x2']])
y_data = np.array(df['y_c'])

x_train = x_data
y_train = y_data.reshape(-1, 1)

Y_c = [['red' if y else 'blue'] for y in y_train]

# Cast to float32; otherwise the matrix multiplications below fail with a dtype mismatch
x_train = tf.cast(x_train, tf.float32)
y_train = tf.cast(y_train, tf.float32)

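# from_tensor_slices slices the tensors along their first dimension and builds a dataset in which each feature row is paired with its label
train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

# Create the network parameters: 2 input neurons, one hidden layer of 11 neurons, 1 output neuron
# tf.Variable() makes the parameters trainable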
w1 = tf.Variable(tf.random.normal([2, 11]), dtype=tf.float32)
b1 = tf.Variable(tf.constant(0.01, shape=[11]))

w2 = tf.Variable(tf.random.normal([11, 1]), dtype=tf.float32)
b2 = tf.Variable(tf.constant(0.01, shape=[1]))

lr = 0.005  # learning rate
epoch = 800  # number of epochs

# Training
for epoch in range(epoch):
    for step, (x_batch, y_batch) in enumerate(train_db):
        with tf.GradientTape() as tape:  # record operations for automatic differentiation

            h1 = tf.matmul(x_batch, w1) + b1  # multiply-add of the hidden layer
            h1 = tf.nn.relu(h1)
            y = tf.matmul(h1, w2) + b2

            # Mean squared error loss: mse = mean((y_batch - y)^2)
            loss_mse = tf.reduce_mean(tf.square(y_batch - y))
            # Add L2 regularization
            loss_regularization = []
            # tf.nn.l2_loss(w) = sum(w ** 2) / 2
            loss_regularization.append(tf.nn.l2_loss(w1))
            loss_regularization.append(tf.nn.l2_loss(w2))
            # Sum the penalties
            # Example: x = tf.constant(([1, 1, 1], [1, 1, 1]))
            #   tf.reduce_sum(x)
            # >>> 6
            loss_regularization = tf.reduce_sum(loss_regularization)
            loss = loss_mse + 0.03 * loss_regularization  # REGULARIZER = 0.03

        # Compute the gradient of loss with respect to each parameter
        variables = [w1, b1, w2, b2]
        grads = tape.gradient(loss, variables)

        # Apply the gradient update, e.g. w1 = w1 - lr * w1_grad
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])

    # Print the loss every 20 epochs
    if epoch % 20 == 0:
        print('epoch:', epoch, 'loss:', float(loss))

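# Prediction
print("*******predict*******")
# Generate grid points: xx from -3 to 3 with step 0.1, yy from -3 to 3 with step 0.1
xx, yy = np.mgrid[-3:3:.1, -3:3:.1]
# Flatten xx and yy and pair them up into a 2-D array of coordinate points
grid = np.c_[xx.ravel(), yy.ravel()]
grid = tf.cast(grid, tf.float32)
# Feed the grid points to the network for prediction; probs collects the outputs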
probs = []
for x_predict in grid:
    # Predict with the trained parameters
    h1 = tf.matmul([x_predict], w1) + b1
    h1 = tf.nn.relu(h1)
    y = tf.matmul(h1, w2) + b2  # y is the prediction
    probs.append(y)

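# Take column 0 as x1 and column 1 as x2
x1 = x_data[:, 0]
x2 = x_data[:, 1]
# Reshape probs to the shape of xx
probs = np.array(probs).reshape(xx.shape)
plt.scatter(x1, x2, color=np.squeeze(Y_c))
# Pass the coordinates xx, yy and the values probs to contour, which draws the line where probs equals 0.5; after plt.show() this line is the boundary between the red and blue points
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

# Reads the red and blue points and draws the decision boundary, with regularization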