Cross Entropy

In machine learning, a large share of binary classification tasks use cross entropy as the loss function. The network output is normally passed through a sigmoid activation first; for the details of the forward and backward passes, see my earlier post: https://blog.csdn.net/jmu201521121021/article/details/86658163
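For reference (standard definitions, summarized here rather than taken from the linked post): the sigmoid maps a logit $z$ to the prediction $\hat y$, and its derivative has a simple closed form that the backward pass relies on:

$$\hat y = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)$$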

Formula

$$J = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[\,y_i\log(\hat y_i) + (1-y_i)\log(1-\hat y_i)\,\Bigr] \qquad (1)$$

  • $y_i$: the ground-truth label of the $i^{th}$ sample
  • $\hat y_i$: the predicted output for the $i^{th}$ sample
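As a quick sanity check (illustrative numbers of my own, not from the derivation below): with $m = 2$, labels $y = (1, 0)$ and predictions $\hat y = (0.9, 0.2)$,

$$J = -\tfrac{1}{2}\bigl[\log 0.9 + \log(1 - 0.2)\bigr] = -\tfrac{1}{2}\bigl[-0.105 - 0.223\bigr] \approx 0.164$$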

Derivation

  • $y_i \in \{0, 1\}$: 0 denotes the negative class and 1 the positive class
  • When $y = 1$, $P(y|x) = \hat y$; when $y = 0$, $P(y|x) = 1 - \hat y$.
  • So the two cases can be combined as:
    $$P(y|x) = \hat y^{\,y}\,(1-\hat y)^{1-y} \qquad (2)$$
  • By maximum likelihood estimation (treating the samples as independent):
    $$P(y_1, y_2, \dots \mid x_1, x_2, \dots) = \prod_{i=1}^{m} P(y_i \mid x_i) = \prod_{i=1}^{m} \hat y_i^{\,y_i}\,(1-\hat y_i)^{1-y_i} \qquad (3)$$
  • A product is awkward to differentiate during backpropagation, so take the log to turn it into a sum (the resulting gradient is sketched after this list); equation (3) becomes:
    $$\log P(y_1, y_2, \dots \mid x_1, x_2, \dots) = \sum_{i=1}^{m}\Bigl[\,y_i\log(\hat y_i) + (1-y_i)\log(1-\hat y_i)\,\Bigr] \qquad (4)$$
  • To keep the value at a reasonable scale, multiply by $\frac{1}{m}$, which averages over the samples; and since training minimizes a loss while MLE maximizes the likelihood, also multiply by $-1$. Equation (4) then becomes the cross-entropy loss:
    $$J = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[\,y_i\log(\hat y_i) + (1-y_i)\log(1-\hat y_i)\,\Bigr] \qquad (5)$$
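As promised above, the gradient (a standard result, stated here for completeness rather than taken from the original post): differentiating equation (5) with respect to each prediction, and then composing with the sigmoid $\hat y_i = \sigma(z_i)$, gives

$$\frac{\partial J}{\partial \hat y_i} = -\frac{1}{m}\left(\frac{y_i}{\hat y_i} - \frac{1-y_i}{1-\hat y_i}\right), \qquad \frac{\partial J}{\partial z_i} = \frac{1}{m}\bigl(\hat y_i - y_i\bigr)$$

The simple form of $\partial J / \partial z_i$ is a big part of why sigmoid and cross entropy are paired in practice.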

Code Implementation

import numpy as np


# GRADED FUNCTION: compute_cost

def compute_cost(AL, Y):
    """
    Implement the cross-entropy cost function defined by equation (1) above.

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    
    m = Y.shape[1]

    # Compute the cross-entropy cost from AL and Y.
    ### START CODE HERE ### (≈ 1 lines of code)
    cost = -1.0 / m * np.sum( Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) 
    ### END CODE HERE ###
    
    cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())
    
    return cost
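A quick usage sketch (the example values are mine, and the clipping step is an optional safeguard rather than part of the graded function): clipping AL away from exactly 0 or 1 avoids log(0) when a prediction saturates.

import numpy as np

AL = np.array([[0.9, 0.2, 0.7]])   # predicted probabilities, shape (1, m)
Y = np.array([[1, 0, 1]])          # true labels, shape (1, m)

print(compute_cost(AL, Y))         # prints roughly 0.2284

# Numerically safer variant: keep AL strictly inside (0, 1) before taking logs
AL_safe = np.clip(AL, 1e-12, 1 - 1e-12)
cost = np.squeeze(-1.0 / AL.shape[1] * np.sum(Y * np.log(AL_safe) + (1 - Y) * np.log(1 - AL_safe)))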
