TensorFlow 2.0 Neural Networks and Fully Connected Layers: Loss Functions

5.5 Loss Functions

  • Common loss functions
  • MSE
  • Cross Entropy
    • Entropy
    • Cross entropy
    • Multi-class classification
      • Functional API
      • Class API
    • Binary classification
  • Why not MSE?
  • Classification Network Pipeline and Numerical Stability

Common Loss Functions

  1. MSE
  2. Cross-entropy loss
  3. Hinge loss (used in SVMs)
    • $\sum_i \max(0, 1 - y_i \cdot h_\theta(x_i))$
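As a quick check of the hinge formula, here is a minimal sketch with the built-in `tf.losses.hinge` (the labels and scores below are made up; hinge labels follow the {-1, 1} convention):

import tensorflow as tf

y_true = tf.constant([[-1., 1., 1.]])     # hypothetical labels in {-1, 1}
scores = tf.constant([[-0.8, 0.3, 1.2]])  # hypothetical raw outputs h_theta(x)

# mean over classes of max(0, 1 - y_i * h_theta(x_i)):
print(tf.losses.hinge(y_true, scores))
# ≈ [0.3], i.e. (0.2 + 0.7 + 0.0) / 3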

MSE

  1. $loss = \frac{1}{N}\sum (y - out)^2$, where $N = B \times NumOfClass$
  2. $L_{2\text{-}norm} = \sqrt{\sum (y - out)^2}$
import tensorflow as tf

y = tf.constant([1, 2, 3, 0, 2])
y = tf.one_hot(y, depth=4)        # labels -> one-hot targets
y = tf.cast(y, dtype=tf.float32)
out = tf.random.normal([5, 4])    # stand-in network outputs

loss1 = tf.reduce_mean(tf.square(y - out))
# tf.Tensor(1.4226108, shape=(), dtype=float32)
loss2 = tf.square(tf.norm(y - out)) / (5 * 4)
# tf.Tensor(1.4226108, shape=(), dtype=float32)
loss3 = tf.reduce_mean(tf.losses.MSE(y, out))  # functional API; MeanSquaredError is the class API
# tf.Tensor(1.4226108, shape=(), dtype=float32)
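As the comment above notes, `MeanSquaredError` is the class-API counterpart; a quick sketch reusing `y` and `out` from the snippet above:

mse = tf.losses.MeanSquaredError()  # instantiate once, then call like a function
print(mse(y, out))
# tf.Tensor(1.4226108, shape=(), dtype=float32), same value as loss1/loss2/loss3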

Cross Entropy

Entropy

Entropy describes a single data distribution:

  1. Uncertainty
  2. A measure of surprise
  3. Low-probability events carry more information (more surprise)
    $Entropy = -\sum_i P(i)\log P(i)$
import tensorflow as tf

a = tf.fill([4], 0.25)                                # uniform distribution
a * tf.math.log(a) / tf.math.log(2.)                  # log base 2 via change of base
# tf.Tensor([-0.5 -0.5 -0.5 -0.5], shape=(4,), dtype=float32)
-tf.reduce_sum(a * tf.math.log(a) / tf.math.log(2.))  # maximum entropy: 2 bits
# tf.Tensor(2.0, shape=(), dtype=float32)

a = tf.constant([0.1, 0.1, 0.1, 0.7])                 # more peaked: lower entropy
-tf.reduce_sum(a * tf.math.log(a) / tf.math.log(2.))
# tf.Tensor(1.3567796, shape=(), dtype=float32)

a = tf.constant([0.01, 0.01, 0.01, 0.97])             # nearly deterministic: entropy near 0
-tf.reduce_sum(a * tf.math.log(a) / tf.math.log(2.))
# tf.Tensor(0.24194068, shape=(), dtype=float32)

Cross entropy

Cross entropy relates two data distributions:

  1. $H(p, q) = -\sum p(x)\log q(x)$
  2. $H(p, q) = H(p) + D_{KL}(p \| q)$ (verified numerically in the sketch below)
    • When $p == q$:
      • Minimum: $H(p, q) = H(p)$
    • When $p$ is one-hot encoded, the cross entropy reduces to the KL divergence:
      • $H(p{:}[0,1,0]) = -1\log 1 = 0$
      • $H([0,1,0], [q_0, q_1, q_2]) = 0 + D_{KL}(p \| q) = -1\log q_1$
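A minimal numerical check of the decomposition in point 2 (p and q are made-up distributions; natural log throughout):

import tensorflow as tf

p = tf.constant([0.1, 0.2, 0.7])
q = tf.constant([0.3, 0.3, 0.4])

h_pq = -tf.reduce_sum(p * tf.math.log(q))     # cross entropy H(p, q)
h_p = -tf.reduce_sum(p * tf.math.log(p))      # entropy H(p)
d_kl = tf.reduce_sum(p * tf.math.log(p / q))  # KL divergence D_KL(p || q)
print(h_pq.numpy(), (h_p + d_kl).numpy())
# both ≈ 1.0026, confirming H(p,q) = H(p) + D_KL(p||q)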

For binary classification there are two setups:

  1. Two output units + softmax: the loss is the same one-hot cross entropy as above, e.g.
    $H([0,1,0], [q_0, q_1, q_2]) = 0 + D_{KL}(p \| q) = -1\log q_1$
  2. One output unit + threshold (sigmoid output):
    $H(P, Q) = -P(cat)\log Q(cat) - (1 - P(cat))\log(1 - Q(cat))$
    $P(dog) = 1 - P(cat)$
    $H(P, Q) = -\sum_{i \in \{cat, dog\}} P(i)\log Q(i) = -P(cat)\log Q(cat) - P(dog)\log Q(dog)$
    which is the familiar binary cross entropy $-(y\log(p) + (1 - y)\log(1 - p))$
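A tiny sketch checking that last expression by hand (y and p are made-up numbers):

import tensorflow as tf

y, p = 1.0, 0.97  # hypothetical label and predicted P(cat)
print(-(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p)))
# ≈ 0.0305, matching tf.losses.binary_crossentropy([1], [0.97]) shown later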

A worked example with $P_1 = [1, 0, 0, 0, 0]$:

  1. $Q_1 = [0.4, 0.3, 0.05, 0.05, 0.2]$
    • $H(P_1, Q_1) = -\sum_i P_1(i)\log Q_1(i) = -(1\log 0.4 + 0 + 0 + 0 + 0) = 0.916$
  2. $Q_1 = [0.98, 0.01, 0, 0, 0.01]$
    • $H(P_1, Q_1) = -\sum_i P_1(i)\log Q_1(i) = -1\log 0.98 = 0.02$
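These numbers can be reproduced directly with the functional API (natural log, as above):

import tensorflow as tf

P1 = [1., 0., 0., 0., 0.]
print(tf.losses.categorical_crossentropy(P1, [0.4, 0.3, 0.05, 0.05, 0.2]))
# ≈ 0.916
print(tf.losses.categorical_crossentropy(P1, [0.98, 0.01, 0., 0., 0.01]))
# ≈ 0.020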

Multi-class classification

  1. Labels are one-hot encoded
  2. Standard multi-class setting

Functional API

tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.25, 0.25, 0.25, 0.25])
# tf.Tensor(1.3862944, shape=(), dtype=float32)
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.1, 0.1, 0.8, 0.1])
# tf.Tensor(2.3978953, shape=(), dtype=float32)
# note: this prediction sums to 1.1; Keras renormalizes it, so the value is -ln(0.1/1.1)
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.1, 0.7, 0.1, 0.1])
# tf.Tensor(0.35667497, shape=(), dtype=float32)
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.01, 0.97, 0.01, 0.01])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

Class API

The class API builds a loss object; calling the instance invokes its `self.__call__()` method.

tf.losses.CategoricalCrossentropy()([0, 1, 0, 0], [0.01, 0.97, 0.01, 0.01])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

Binary classification

categorical_crossentropy

# class API
tf.losses.CategoricalCrossentropy()([0, 1], [0.03, 0.97])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

# functional API
tf.losses.categorical_crossentropy([0, 1], [0.03, 0.97])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

binary_crossentropy

# class API
print(tf.losses.BinaryCrossentropy()([1], [0.97]))
# tf.Tensor(0.030459056, shape=(), dtype=float32)

# functional API
print(tf.losses.binary_crossentropy([1], [0.97]))
# tf.Tensor(0.030459056, shape=(), dtype=float32)

Why not MSE?

  1. sigmoid + MSE suffers from vanishing and exploding gradients (see the sketch after this list)
  2. Convergence is slow
  3. For meta-learning, however, MSE tends to perform well
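A minimal sketch of point 1 (a made-up scalar example): when the sigmoid saturates, the MSE gradient with respect to the logit nearly vanishes, while the cross-entropy gradient stays proportional to the error sigma(z) - y:

import tensorflow as tf

y = tf.constant(1.0)          # target
for z0 in [0.0, -5.0]:        # -5.0 puts the sigmoid deep in saturation
    z = tf.constant(z0)
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(z)
        p = tf.sigmoid(z)
        mse = tf.square(y - p)
        ce = -(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p))
    print(z0, tape.gradient(mse, z).numpy(), tape.gradient(ce, z).numpy())
# at z = -5 the MSE gradient is ≈ -0.013 while the CE gradient is ≈ -0.993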

In short: pick the loss to fit the specific problem.

Classification Network Pipeline and Numerical Stability

For numerical stability, fold the softmax into the cross entropy: pass raw logits with `from_logits=True` instead of computing the softmax probabilities yourself.

import tensorflow as tf

x = tf.random.normal([1, 784])
w = tf.random.normal([784, 2])
b = tf.zeros([2])

logits = x @ w + b              # raw outputs, no activation

prob = tf.math.softmax(logits)  # explicit softmax; can under/overflow in float32

print(tf.losses.categorical_crossentropy([0, 1], logits, from_logits=True))  # numerically stable, recommended
# tf.Tensor([0.], shape=(1,), dtype=float32)
print(tf.losses.categorical_crossentropy([0, 1], prob))   # not recommended
# tf.Tensor([1.192093e-07], shape=(1,), dtype=float32)
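The class API takes the same flag; a standalone sketch with made-up logits:

import tensorflow as tf

logits = tf.constant([[2.0, -1.0]])  # hypothetical logits for one sample
ce = tf.losses.CategoricalCrossentropy(from_logits=True)
print(ce([[0., 1.]], logits))        # softmax is folded into the loss internally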
