Deriving the Softmax Loss (Cross-Entropy) Gradient

1. Derivative of the softmax function

Before differentiating, let's review the softmax function. Softmax is typically used as a network's output layer, directly producing probabilities. It is defined as:

S_i = \frac{e^{a_i}}{\sum_{j}e^{a_j}}
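As a quick sanity check of the definition, here is a minimal NumPy sketch (my own code, not from the original post; the max-subtraction is a standard numerical-stability trick that cancels in the ratio and does not change the result):

```python
import numpy as np

def softmax(a):
    # Subtract max(a) before exponentiating for numerical stability;
    # the shift cancels in the ratio, so the result matches the definition.
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([2.0, 1.0, 0.1])
S = softmax(a)
print(S)        # approximately [0.659 0.242 0.099]
print(S.sum())  # 1.0 -- the outputs form a probability distribution
```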

Now let's differentiate the softmax output S_i with respect to a logit a_j. For brevity, the denominator \sum_{j}e^{a_j} is abbreviated as \sum below:

\frac{\partial S_i}{\partial a_j} = \frac{\frac{\partial e^{a_i}}{\partial a_j}\sum - e^{a_i}\,\frac{\partial \sum}{\partial a_j}}{\sum^2}

① When i = j:

\frac{\partial S_i}{\partial a_j} = \frac{e^{a_i}\sum - e^{a_i}e^{a_i}}{\sum^2} = \frac{e^{a_i}}{\sum}\cdot\frac{\sum - e^{a_i}}{\sum} = S_i(1-S_i)

② When i ≠ j:

\frac{\partial S_i}{\partial a_j} = \frac{-e^{a_i}e^{a_j}}{\sum^2} = -S_i S_j
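Both cases can be verified numerically. The sketch below (again my own NumPy code) builds the full Jacobian from the two formulas above and compares it against central finite differences; note that the two cases collapse into the compact form diag(S) − S·Sᵀ:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def softmax_jacobian(a):
    # J[i, j] = dS_i/da_j: S_i(1 - S_i) on the diagonal (case i = j),
    # -S_i * S_j off the diagonal (case i != j) -- i.e. diag(S) - S S^T.
    S = softmax(a)
    return np.diag(S) - np.outer(S, S)

a = np.array([2.0, 1.0, 0.1])

# Independent check: central finite differences, one logit at a time.
eps = 1e-6
num = np.zeros((3, 3))
for j in range(3):
    d = np.zeros(3); d[j] = eps
    num[:, j] = (softmax(a + d) - softmax(a - d)) / (2 * eps)

print(np.allclose(softmax_jacobian(a), num, atol=1e-8))  # True
```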

 

2. Derivative of the softmax loss

With those results in hand, we can now differentiate the loss itself. This function is mainly used to compute the classification loss: during training we compute the loss and then differentiate it for backpropagation, which is the point of this post. First, the definition:

L = -\sum_{i} y_i \log(S_i)

In this formula, log is base e, i.e., the natural logarithm ln. Each y_i takes the value 0 or 1: when the training sample belongs to class i, y_i = 1, and y_i = 0 for every other class (a one-hot label). Differentiating with respect to S_i:

\frac{\partial L}{\partial S_i} = -\frac{y_i}{S_i}
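A minimal sketch of the loss and this partial derivative, reusing the softmax output from earlier (the variable names are my own):

```python
import numpy as np

def cross_entropy(S, y):
    # L = -sum_i y_i * ln(S_i); with a one-hot y this picks out
    # -ln(S_k) for the true class k.
    return -np.sum(y * np.log(S))

S = np.array([0.659, 0.242, 0.099])  # softmax output from the earlier example
y = np.array([1.0, 0.0, 0.0])        # one-hot label: true class is 0

print(cross_entropy(S, y))  # -ln(0.659), approximately 0.417

# dL/dS_i = -y_i / S_i, matching the formula above:
print(-y / S)               # [-1.517, -0., -0.]
```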

 

Now differentiate with respect to each logit a_i, applying the chain rule across all of the outputs S_j:

\frac{\partial L}{\partial a_i} = \sum_{j} \frac{\partial L}{\partial S_j} \cdot \frac{\partial S_j}{\partial a_i} = \frac{\partial L}{\partial S_i} \cdot \frac{\partial S_i}{\partial a_i} + \sum_{j\neq i}\frac{\partial L}{\partial S_j} \cdot \frac{\partial S_j}{\partial a_i}

= -\frac{y_i}{S_i}\, S_i(1-S_i) + \sum_{j\neq i}\left(-\frac{y_j}{S_j}\right)(-S_i S_j)

= y_i(S_i - 1) + \sum_{j\neq i} y_j S_i = S_i \sum_{j} y_j - y_i = S_i - y_i

The last step uses the fact that the label is one-hot, so \sum_j y_j = 1.

Letting k denote the true class (so y_k = 1), we can write the result in the two cases separately:

\frac{\partial L}{\partial a_i} = \begin{cases} S_i - 1 & \text{ if } i = k \\ S_i & \text{ otherwise } \end{cases}
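This clean result, ∂L/∂a_i = S_i − y_i, is exactly why softmax and cross-entropy are paired in practice: the backward pass through both is just a subtraction. A final numerical check (my own NumPy sketch, compared against finite differences):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def loss(a, y):
    # Cross-entropy loss applied directly to the logits a.
    return -np.sum(y * np.log(softmax(a)))

a = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])  # true class k = 0

grad_analytic = softmax(a) - y  # the result derived above: dL/da = S - y

# Independent check via central finite differences.
eps = 1e-6
grad_num = np.zeros_like(a)
for i in range(len(a)):
    d = np.zeros_like(a); d[i] = eps
    grad_num[i] = (loss(a + d, y) - loss(a - d, y)) / (2 * eps)

print(np.allclose(grad_analytic, grad_num, atol=1e-8))  # True
```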

 



 
