TensorFlow Neural Network Optimization: Loss Functions (loss), learning_rate, and softmax

Notes from learning TensorFlow in practice

Loss function (loss)

Table of Contents

Loss function (loss)

I. Activation functions (activation function)

II. NN complexity: usually measured by the number of NN layers and the number of NN parameters

NN optimization objective: minimize loss

Cross entropy (CE)

The softmax function

Learning rate (learning_rate): the size of each parameter update

Exponential learning rate decay


I. Activation functions (activation function)

Introducing an activation function avoids the output being a pure linear combination XW, increases the expressive power of the model, and makes it more discriminative.

1. The ReLU activation function, written as tf.nn.relu()

[Figure: graph of the ReLU activation function]

2. The sigmoid activation function, written as tf.nn.sigmoid()

[Figure: graph of the sigmoid activation function]

3. The tanh activation function, written as tf.nn.tanh()

[Figure: graph of the tanh activation function]
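A minimal usage sketch of the three activations (the tensor names and shapes here are illustrative, not from the original):

import tensorflow as tf

# illustrative pre-activation output of one layer: XW + b
x = tf.placeholder(tf.float32, shape=(None, 3))
w = tf.Variable(tf.random_normal([3, 4], stddev=1, seed=1))
b = tf.Variable(tf.zeros([4]))
z = tf.matmul(x, w) + b

a_relu = tf.nn.relu(z)        # max(0, z)
a_sigmoid = tf.nn.sigmoid(z)  # 1 / (1 + e^(-z))
a_tanh = tf.nn.tanh(z)        # (e^z - e^(-z)) / (e^z + e^(-z))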

II. NN complexity: usually measured by the number of NN layers and the number of NN parameters

When counting the layers of a neural network, only layers that actually perform computation are counted, so the input layer is not included.

[Figure: example network with 3 input nodes, 4 hidden nodes, and 2 output nodes]

Number of layers = number of hidden layers + 1 output layer

Total parameters = total W + total b

For the network above: (3×4 weights + 4 biases) + (4×2 weights + 2 biases) = 26
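As a quick check of the arithmetic above (a standalone sketch, assuming the 3-4-2 network shown in the figure):

# 3 inputs -> 4 hidden nodes -> 2 outputs
layer_shapes = [(3, 4), (4, 2)]

total = 0
for n_in, n_out in layer_shapes:
    total += n_in * n_out  # weight matrix W
    total += n_out         # bias vector b
print(total)  # (3*4 + 4) + (4*2 + 2) = 26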

NN optimization objective: minimize loss

Common ways to compute loss:

  • MSE (Mean Squared Error)
  • CE (Cross Entropy)
  • Custom losses (see the sketch after this list)
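The custom case is not elaborated here; purely as an illustration, a custom loss can weight over- and under-prediction differently (COST and PROFIT are hypothetical constants, not from the original):

import tensorflow as tf

COST = 1.0    # hypothetical penalty per unit of over-prediction
PROFIT = 9.0  # hypothetical penalty per unit of under-prediction

y_ = tf.placeholder(tf.float32, shape=(None, 1))  # label
y = tf.placeholder(tf.float32, shape=(None, 1))   # prediction

# penalize the two error directions asymmetrically
loss_custom = tf.reduce_sum(
    tf.where(tf.greater(y, y_), COST * (y - y_), PROFIT * (y_ - y)))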

——————————————————————————————————————————

Construct a synthetic dataset X, Y_: each sample in X has features x1 and x2, the label is y_ = x1 + x2 plus noise in [-0.05, 0.05], and the goal is to fit a function that predicts y.

import tensorflow as tf
import numpy as np

BATCH_SIZE = 8  # number of samples fed to the network per training step
seed = 23455

rdm = np.random.RandomState(seed)
X = rdm.rand(32, 2)
# label: y_ = x1 + x2 plus uniform noise in [-0.05, 0.05]
Y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in X]

Define the network inputs, output, and the forward-propagation graph.

x = tf.placeholder(tf.float32, shape=(None, 2))   # input features
y_ = tf.placeholder(tf.float32, shape=(None, 1))  # labels

w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)  # forward propagation

loss_mse = tf.reduce_mean(tf.square(y_ - y))  # MSE loss
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

Create a session and train for STEPS rounds.

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        # cycle through the 32 samples in batches of BATCH_SIZE
        start = (i * BATCH_SIZE) % 32
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 431 == 0:
            print(start, end)
            print("after", i)
            print(sess.run(w1))
    print("final", sess.run(w1))

——————————————————————————————————————

Cross entropy (CE)

Cross entropy measures the distance between two probability distributions.

$H(y\_, y) = -\sum y\_ \cdot \log y$, where y_ is the label distribution and y is the predicted distribution.

e.g. given the label y_ = (1, 0) and two predictions y1 = (0.6, 0.4) and y2 = (0.8, 0.2), which one is closer to the label?

# clip y into [1e-12, 1.0]: values below 1e-12 become 1e-12, values above 1.0 become 1.0
ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))
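A quick check of this example with plain Python (not part of the original code):

import math

# label y_ = (1, 0): only the first term of -sum(y_ * log y) is non-zero
h_y1 = -math.log(0.6)  # ~0.511 for y1 = (0.6, 0.4)
h_y2 = -math.log(0.8)  # ~0.223 for y2 = (0.8, 0.2)
# h_y2 < h_y1, so y2 = (0.8, 0.2) is closer to the label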

The softmax function

Passing the n outputs of an n-class classifier through the softmax function makes them satisfy the requirements of a probability distribution: every element lies in (0, 1) and all elements sum to 1.

$\forall x,\ P(X = x) \in [0, 1] \quad \text{and} \quad \sum_x P(X = x) = 1$

The following figure gives a rough intuition for what the softmax function does.

[Figure: softmax turning the raw n-class outputs into a probability distribution]

$\text{softmax}(y_{i}) = \dfrac{e^{y_{i}}}{\sum_{j} e^{y_{j}}}$

After the outputs pass through softmax and form a probability distribution, the cross entropy against the labels is computed; the resulting cem is used as the loss.

# softmax + cross entropy in one op; labels are the class indices obtained via argmax
ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce)
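A small numerical sketch of what softmax does (the logit values are illustrative, not from the original):

import numpy as np

logits = np.array([2.0, 1.0, 0.1])               # raw n-class outputs
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax
print(probs)        # ~[0.659, 0.242, 0.099], each in (0, 1)
print(probs.sum())  # 1.0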

Learning rate (learning_rate): the size of each parameter update

$w_{n+1} = w_{n} - \text{learning\_rate} \cdot \dfrac{\partial\, loss}{\partial w}$

Updated parameter = current parameter - learning rate × gradient (derivative) of the loss function with respect to the parameter

Set loss = square(w + 1) with w initialized to 5, and use backpropagation to find the optimal w, i.e. the w that minimizes the loss; the minimum is at w = -1.
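Working through the first update by hand: the gradient of loss = (w + 1)^2 is 2(w + 1); with w = 5 and learning rate 0.2, the new value is w = 5 - 0.2 × 2 × (5 + 1) = 2.6, which matches the first line of output below.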

# coding: utf-8
import tensorflow as tf

w = tf.Variable(tf.constant(5, dtype=tf.float32))  # initialize w to 5

loss = tf.square(w + 1)  # loss = (w + 1)^2
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(50):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        if i % 5 == 0:
            print("after ", i, " w is ", w_val, " loss is", loss_val)

#learning_rate=0.2
# after  0  w is  2.6  loss is 12.959999
# after  5  w is  -0.720064  loss is 0.07836417
# after  10  w is  -0.9782322  loss is 0.0004738369
# after  15  w is  -0.99830735  loss is 2.8650732e-06
# after  20  w is  -0.9998684  loss is 1.7320417e-08
# after  25  w is  -0.99998975  loss is 1.0510348e-10
# after  30  w is  -0.9999992  loss is 6.004086e-13
# after  35  w is  -0.99999994  loss is 3.5527137e-15
# after  40  w is  -0.99999994  loss is 3.5527137e-15
# after  45  w is  -0.99999994  loss is 3.5527137e-15

#learning_rate=1
# after  0  w is  -7.0  loss is 36.0
# after  5  w is  5.0  loss is 36.0
# after  10  w is  -7.0  loss is 36.0
# after  15  w is  5.0  loss is 36.0
# after  20  w is  -7.0  loss is 36.0
# after  25  w is  5.0  loss is 36.0
# after  30  w is  -7.0  loss is 36.0
# after  35  w is  5.0  loss is 36.0
# after  40  w is  -7.0  loss is 36.0
# after  45  w is  5.0  loss is 36.0

If the learning rate is too large, training oscillates and fails to converge; if it is too small, convergence is slow.

Exponential learning rate decay

learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step/LEARNING_RATE_STEP)

  • LEARNING_RATE_BASE: the base (initial) learning rate
  • LEARNING_RATE_DECAY: the learning rate decay factor, in (0, 1)
  • LEARNING_RATE_STEP: how many batches between learning-rate updates = total number of samples / BATCH_SIZE
  • global_step: how many batches of BATCH_SIZE have been run so far

import tensorflow as tf

LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
LEARNING_RATE_STEP = 1  # update the learning rate after feeding this many batches

Setting trainable=False means this variable is not updated during training.

global_step = tf.Variable(0, trainable=False)
# exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)
w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

Create a session and train for 40 rounds.
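The training loop itself is not included in the original; a sketch of what it could look like, continuing from the variables defined above and following the pattern of the earlier examples (the print format is an assumption):

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("after", i, "global_step is", global_step_val,
              "learning rate is", learning_rate_val,
              "w is", w_val, "loss is", loss_val)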

 
