Deriving the Gradient of the Binary Cross-Entropy Loss

The binary cross-entropy loss (also called the logistic loss) is defined as:
$$L_{\text{logistic}}(\hat{y},y) = -y\log\hat{y} - (1-y)\log(1-\hat{y})$$
where $\log$ is the natural logarithm and
$y\in\{0,1\}$, $\hat{y}=\sigma(\bar{y})$, $\sigma(\bar{y})=\dfrac{1}{1+e^{-\bar{y}}}$, $\dfrac{\partial \sigma}{\partial \bar{y}}=\sigma(1-\sigma)$
$\bar{y}=\mathbf{w}\cdot \mathbf{x} + b =\displaystyle\sum_j w_j x_j+b$, $\dfrac{\partial \bar{y}}{\partial w_j}=x_j$
A sample $\mathbf{x}=(x_1,\cdots,x_j,\cdots,x_n)$ has $n$ features, and the weight vector $\mathbf{w}=(w_1,\cdots,w_j,\cdots,w_n)$ has $n$ weights, one per feature. Applying the chain rule then gives
$$
\begin{aligned}
\dfrac{\partial L}{\partial w_j}
&= -y \dfrac{1}{\sigma}\dfrac{\partial \sigma}{\partial \bar{y}}\dfrac{\partial \bar{y}}{\partial w_j} -(1-y)\dfrac{-1}{1-\sigma}\dfrac{\partial \sigma}{\partial \bar{y}}\dfrac{\partial \bar{y}}{\partial w_j} \\
&=\dfrac{\sigma - y}{\sigma(1-\sigma)}\dfrac{\partial \sigma}{\partial \bar{y}}\dfrac{\partial \bar{y}}{\partial w_j} \\
&=\dfrac{\sigma - y}{\sigma(1-\sigma)}\sigma(1-\sigma)x_j \\
&=(\sigma -y)x_j \\
&=[\sigma(\mathbf{w}\cdot \mathbf{x} + b)-y]x_j
\end{aligned}
$$
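The closed-form result can be sanity-checked numerically. The following is a minimal sketch, not part of the original derivation: it compares $(\sigma - y)x_j$ against a central finite-difference estimate of $\partial L/\partial w_j$. The function names (`sigmoid`, `logistic_loss`, `analytic_grad`, `numeric_grad`) and the sample values for `w`, `b`, `x`, `y` are illustrative assumptions chosen only for this check.

```python
import math


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(w, b, x, y):
    """Binary cross-entropy for one sample, with y_hat = sigma(w . x + b)."""
    y_hat = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)


def analytic_grad(w, b, x, y):
    """Closed-form gradient from the derivation: (sigma - y) * x_j."""
    y_hat = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return [(y_hat - y) * xj for xj in x]


def numeric_grad(w, b, x, y, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        diff = logistic_loss(w_plus, b, x, y) - logistic_loss(w_minus, b, x, y)
        grad.append(diff / (2 * eps))
    return grad


# Arbitrary illustrative values (assumptions, not taken from the text above).
w, b = [0.5, -1.2, 0.3], 0.1
x, y = [1.0, 2.0, -0.5], 1

max_diff = max(
    abs(a - n)
    for a, n in zip(analytic_grad(w, b, x, y), numeric_grad(w, b, x, y))
)
print(max_diff)  # should be tiny if the derivation holds
```

If the two gradients agree to within finite-difference error for both $y=1$ and $y=0$, that is good evidence the algebra above is right. Inputs with very large $|\bar{y}|$ are avoided here because `math.exp` can overflow and `math.log` can receive 0 in this naive formulation.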
