Softmax: Formula and Gradient Computation

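Softmax is the output function commonly used for multi-class classification: it turns a vector of scores into the probabilities that the predicted object belongs to each class.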

Formula

$$y_i = S(\boldsymbol{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \quad i = 1, \dots, C$$

  • $\boldsymbol{z}$ is the output of the previous layer and the input to softmax; it has dimension $C$.
  • $y_i$ is the predicted probability that the object belongs to class $i$.
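The formula can be sketched directly in Python. This is a minimal illustration using only the standard library; the function name `softmax` and the sample scores are chosen here for the example. Subtracting the maximum score before exponentiating is an added numerical-stability step that the formula above does not mention; it does not change the result, because the common factor cancels between numerator and denominator.

```python
import math

def softmax(z):
    """Return the softmax of a list of scores z (length C)."""
    # Shifting by max(z) is an added stability trick, not part of the
    # formula itself; the factor e^{-max} cancels in the ratio.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)  # the (shifted) denominator: sum over j of e^{z_j}
    return [e / total for e in exps]

y = softmax([1.0, 2.0, 3.0])
print(y)       # three probabilities, largest for the largest score
print(sum(y))  # approximately 1.0
```

The outputs are positive and sum to 1, which is what allows them to be read as class probabilities.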

Gradient

[Figure 1: computation graph relating $\boldsymbol{z}$, $\boldsymbol{y}$, and $l$]

The computation graph for these variables is shown above. Given the gradient with respect to $\boldsymbol{y}$, $\frac{\partial l}{\partial y_i},\ i = 1, \dots, C$, we want to compute the gradient with respect to $\boldsymbol{z}$, $\frac{\partial l}{\partial z_j},\ j = 1, \dots, C$.

As the computation graph shows, each component $z_j$ of $\boldsymbol{z}$ contributes to every component of $\boldsymbol{y}$, so by the chain rule:
$$\frac{\partial l}{\partial z_j} = \sum_{i=1}^{C} \frac{\partial l}{\partial y_i} \frac{\partial y_i}{\partial z_j}$$

Since $\frac{\partial l}{\partial y_i}$ is known, it only remains to compute $\frac{\partial y_i}{\partial z_j}$.
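Before working out $\frac{\partial y_i}{\partial z_j}$ in closed form, the chain-rule sum can be evaluated numerically. The sketch below estimates each $\frac{\partial y_i}{\partial z_j}$ by central differences and then sums over $i$. The helper names, the step size `eps`, and the upstream gradient `dl_dy` are all hypothetical choices for illustration.

```python
import math

def softmax(z):
    total = sum(math.exp(v) for v in z)
    return [math.exp(v) / total for v in z]

def numeric_dy_dz(z, i, j, eps=1e-6):
    """Central-difference estimate of d y_i / d z_j."""
    z_plus = list(z)
    z_plus[j] += eps
    z_minus = list(z)
    z_minus[j] -= eps
    return (softmax(z_plus)[i] - softmax(z_minus)[i]) / (2 * eps)

def grad_z_via_chain_rule(z, dl_dy):
    """dl/dz_j = sum over i of dl/dy_i * dy_i/dz_j (derivatives estimated numerically)."""
    C = len(z)
    return [sum(dl_dy[i] * numeric_dy_dz(z, i, j) for i in range(C))
            for j in range(C)]

z = [0.5, -1.0, 2.0]
dl_dy = [1.0, 0.0, 0.0]  # hypothetical upstream gradient, i.e. l = y_0
print(grad_z_via_chain_rule(z, dl_dy))
```

A numerical estimate like this is slow and only approximate, but it is a convenient reference for checking an analytic gradient.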

For brevity, write $\sum_C$ for $\sum_{j=1}^{C} e^{z_j}$.

(1) When $i = j$, the quotient rule gives:
$$\begin{aligned} \frac{\partial y_i}{\partial z_j} &= \frac{e^{z_i}\sum_C - e^{z_i}e^{z_i}}{\left(\sum_C\right)^2} \\ &= \frac{e^{z_i}}{\sum_C} - \left(\frac{e^{z_i}}{\sum_C}\right)^2 \\ &= y_i - y_i^2 \\ &= y_i(1 - y_i) \end{aligned}$$
(2) When $i \neq j$, the numerator $e^{z_i}$ does not depend on $z_j$, so its derivative is 0:
$$\begin{aligned} \frac{\partial y_i}{\partial z_j} &= \frac{0 \cdot \sum_C - e^{z_i}e^{z_j}}{\left(\sum_C\right)^2} \\ &= -\frac{e^{z_i}}{\sum_C} \cdot \frac{e^{z_j}}{\sum_C} \\ &= -y_i y_j \end{aligned}$$
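Substituting both cases into the chain-rule sum gives $\frac{\partial l}{\partial z_j} = y_j\left(\frac{\partial l}{\partial y_j} - \sum_{i=1}^{C}\frac{\partial l}{\partial y_i}\,y_i\right)$. The sketch below builds the full matrix of $\frac{\partial y_i}{\partial z_j}$ from the two cases and compares the explicit sum with this compact form. The function names and the upstream gradient `dl_dy` are hypothetical, chosen only for the example.

```python
import math

def softmax(z):
    total = sum(math.exp(v) for v in z)
    return [math.exp(v) / total for v in z]

def softmax_jacobian(y):
    """J[i][j] = dy_i/dz_j: y_i(1 - y_i) when i == j, -y_i * y_j otherwise."""
    C = len(y)
    return [[y[i] * (1.0 - y[i]) if i == j else -y[i] * y[j]
             for j in range(C)] for i in range(C)]

def softmax_backward(y, dl_dy):
    """Compact form: dl/dz_j = y_j * (dl/dy_j - sum over i of dl/dy_i * y_i)."""
    dot = sum(g * p for g, p in zip(dl_dy, y))
    return [p * (g - dot) for g, p in zip(dl_dy, y)]

y = softmax([0.5, -1.0, 2.0])
dl_dy = [0.3, -0.2, 0.1]  # hypothetical upstream gradient
J = softmax_jacobian(y)
# Explicit chain-rule sum over i for each j.
via_jacobian = [sum(dl_dy[i] * J[i][j] for i in range(3)) for j in range(3)]
print(via_jacobian)
print(softmax_backward(y, dl_dy))  # should agree up to rounding
```

The compact form avoids materializing the $C \times C$ matrix, so it needs only $O(C)$ work per input vector instead of $O(C^2)$.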
