Deriving the Gradient of the Cross-Entropy Loss for Softmax Regression

The softmax function is defined as: $a_{i}=\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}$

The cross-entropy loss function is: $C=-\sum_{i} y_{i} \ln a_{i}$
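The two definitions above can be sketched in plain Python. This is only an illustrative sketch: the names `softmax` and `cross_entropy`, the sample logits `z`, and the one-hot label `y` are assumptions made for the example, and subtracting the maximum logit is a numerical-stability step that does not appear in the formulas (it leaves the result unchanged).

```python
import math

def softmax(z):
    """Return a_i = exp(z_i) / sum_k exp(z_k) for a list of logits z."""
    m = max(z)  # assumption: shift by the max to avoid overflow; the ratio is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, a):
    """Return C = -sum_i y_i * ln(a_i)."""
    return -sum(yi * math.log(ai) for yi, ai in zip(y, a))

z = [1.0, 2.0, 3.0]   # hypothetical logits
y = [0.0, 0.0, 1.0]   # hypothetical one-hot label (class 2 is correct)
a = softmax(z)
loss = cross_entropy(y, a)
print(a, loss)
```

With a one-hot label, only the term for the correct class contributes, so the loss equals $-\ln a_{i}$ for that class.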

By the chain rule, writing $C_{j}=-y_{j} \ln a_{j}$ for the $j$-th term of the loss: $\frac{\partial C}{\partial z_{i}}=\sum_{j}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right)$

The first factor is: $\frac{\partial C_{j}}{\partial a_{j}}=\frac{\partial\left(-y_{j} \ln a_{j}\right)}{\partial a_{j}}=-y_{j} \frac{1}{a_{j}}$

The second factor depends on whether $i$ and $j$ coincide:

  • If $i=j$:
    $$\frac{\partial a_{i}}{\partial z_{i}}=\frac{\partial\left(\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right)}{\partial z_{i}}=\frac{\left(\sum_{k} e^{z_{k}}\right) e^{z_{i}}-\left(e^{z_{i}}\right)^{2}}{\left(\sum_{k} e^{z_{k}}\right)^{2}}=\left(\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right)\left(1-\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right)=a_{i}\left(1-a_{i}\right)$$

  • If $i \neq j$:
    $$\frac{\partial a_{j}}{\partial z_{i}}=\frac{\partial\left(\frac{e^{z_{j}}}{\sum_{k} e^{z_{k}}}\right)}{\partial z_{i}}=-e^{z_{j}}\left(\frac{1}{\sum_{k} e^{z_{k}}}\right)^{2} e^{z_{i}}=-a_{i} a_{j}$$
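The two cases for $\partial a_{j} / \partial z_{i}$ can be checked numerically. The sketch below builds the Jacobian from the closed-form expressions and compares it with central finite differences. The helper names (`softmax_jacobian`, `numeric_jacobian`), the test point `z`, and the step size `eps` are hypothetical choices for illustration only.

```python
import math

def softmax(z):
    m = max(z)  # shift for numerical stability; does not change the output
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    """J[j][i] = da_j/dz_i: a_i(1 - a_i) when i == j, otherwise -a_i * a_j."""
    a = softmax(z)
    n = len(a)
    return [[a[i] * (1.0 - a[i]) if i == j else -a[i] * a[j]
             for i in range(n)]
            for j in range(n)]

def numeric_jacobian(z, eps=1e-6):
    """Approximate da_j/dz_i with central differences."""
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        a_plus, a_minus = softmax(z_plus), softmax(z_minus)
        for j in range(n):
            J[j][i] = (a_plus[j] - a_minus[j]) / (2.0 * eps)
    return J

z = [0.5, -1.0, 2.0]  # hypothetical logits
J_analytic = softmax_jacobian(z)
J_numeric = numeric_jacobian(z)
max_err = max(abs(J_analytic[j][i] - J_numeric[j][i])
              for j in range(len(z)) for i in range(len(z)))
print(max_err)
```

Because the softmax outputs always sum to 1, each column of this Jacobian sums to 0, which gives a second quick sanity check.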

Combining the two cases gives:
$$\begin{aligned}
\frac{\partial C}{\partial z_{i}} &=\sum_{j}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right)=\sum_{j \neq i}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right)+\sum_{j=i}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right) \\
&=\sum_{j \neq i}\left(-y_{j} \frac{1}{a_{j}}\right)\left(-a_{i} a_{j}\right)+\left(-y_{i} \frac{1}{a_{i}}\right)\left(a_{i}\left(1-a_{i}\right)\right) \\
&=\sum_{j \neq i} a_{i} y_{j}+\left(-y_{i}\left(1-a_{i}\right)\right) \\
&=\sum_{j \neq i} a_{i} y_{j}+a_{i} y_{i}-y_{i} \\
&=a_{i} \sum_{j} y_{j}-y_{i}
\end{aligned}$$
For a classification problem the labels satisfy $\sum_{j} y_{j}=1$ (for example, $y$ is a one-hot vector), so the gradient reduces to:
$$\frac{\partial C}{\partial z_{i}}=a_{i}-y_{i}$$
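As a sanity check on the final result, the sketch below compares the closed-form gradient $a_{i} \sum_{j} y_{j}-y_{i}$ with a central finite-difference estimate of $\partial C / \partial z_{i}$. The function names, the sample logits, the one-hot label, and the step size are assumptions introduced for this example, not part of the derivation.

```python
import math

def softmax(z):
    m = max(z)  # shift for numerical stability; does not change the output
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, y):
    """C = -sum_j y_j * ln(a_j), with a = softmax(z)."""
    a = softmax(z)
    return -sum(yj * math.log(aj) for yj, aj in zip(y, a))

def grad_analytic(z, y):
    """dC/dz_i = a_i * sum_j y_j - y_i, which is a_i - y_i when sum_j y_j == 1."""
    a = softmax(z)
    s = sum(y)
    return [ai * s - yi for ai, yi in zip(a, y)]

def grad_numeric(z, y, eps=1e-6):
    """Approximate dC/dz_i with central differences."""
    g = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        g.append((loss(z_plus, y) - loss(z_minus, y)) / (2.0 * eps))
    return g

z = [0.2, -0.7, 1.5]  # hypothetical logits
y = [0.0, 1.0, 0.0]   # hypothetical one-hot label
a = softmax(z)
g_a = grad_analytic(z, y)
g_n = grad_numeric(z, y)
max_err = max(abs(p - q) for p, q in zip(g_a, g_n))
print(g_a, max_err)
```

For the one-hot label used here, the gradient is negative only at the true class and positive elsewhere, so a gradient-descent step raises the logit of the correct class and lowers the others.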
