Another blogger gives a more detailed derivation: https://blog.csdn.net/chaipp0607/article/details/101946040
To summarize:

$$
\frac{\partial L}{\partial z_i}
=\sum_{j=1}^{10}\frac{\partial L}{\partial p_j}\frac{\partial p_j}{\partial z_i}
=\sum_{j\neq i}\frac{\partial L}{\partial p_j}\frac{\partial p_j}{\partial z_i}+(p_i-1)\times y_i
=\sum_{j\neq i}p_i\times y_j+(p_i-1)\times y_i
$$
Since the labels are one-hot, we also know that $\sum_{j=1}^{10}y_j=1$. Factoring out $p_i$ therefore gives

$$
\frac{\partial L}{\partial z_i}=p_i\sum_{j=1}^{10}y_j-y_i=p_i-y_i
$$
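The result $p_i-y_i$ can be sanity-checked numerically. The sketch below is only an illustration, not part of the original derivation: it assumes the loss is the usual cross-entropy $L=-\sum_j y_j\log p_j$ with $p=\mathrm{softmax}(z)$, and the function names and sample values are made up for the example. It compares the analytic gradient against a central finite difference.

```python
import math

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, y):
    # Assumed loss: L = -sum_j y_j * log(p_j), with p = softmax(z).
    p = softmax(z)
    return -sum(yj * math.log(pj) for yj, pj in zip(y, p))

def analytic_grad(z, y):
    # The closed form derived above: dL/dz_i = p_i - y_i.
    p = softmax(z)
    return [pi - yi for pi, yi in zip(p, y)]

def numeric_grad(z, y, eps=1e-6):
    # Central finite difference, one coordinate at a time.
    grad = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        grad.append((cross_entropy(z_plus, y) - cross_entropy(z_minus, y)) / (2 * eps))
    return grad

# Hypothetical logits for 10 classes and a one-hot label for class 2.
z = [0.5, -1.2, 2.0, 0.1, 0.0, 1.5, -0.7, 0.3, 0.9, -2.0]
y = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

max_diff = max(abs(a - b) for a, b in zip(analytic_grad(z, y), numeric_grad(z, y)))
print("largest difference between analytic and numeric gradient:", max_diff)
```

If the derivation holds, the printed difference should be tiny, limited only by finite-difference and floating-point error.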
$$
z=\begin{bmatrix}
z_{10} & z_{11} & \cdots & z_{19}\\
\vdots & \vdots & & \vdots\\
z_{n0} & z_{n1} & \cdots & z_{n9}
\end{bmatrix}
$$

$$
y=\begin{bmatrix}
y_{10} & y_{11} & \cdots & y_{19}\\
\vdots & \vdots & & \vdots\\
y_{n0} & y_{n1} & \cdots & y_{n9}
\end{bmatrix}
$$
$$
\frac{\partial L}{\partial z}=\mathrm{softmax}(z)-y
$$

(This uses NumPy's broadcasting mechanism.)
Written out element by element, where $z_k$ is the $k$-th row of $z$ and $\mathrm{softmax}(z_k)_j$ is the $j$-th component of its softmax:

$$
\frac{\partial L}{\partial z}=\begin{bmatrix}
\mathrm{softmax}(z_1)_0-y_{10} & \mathrm{softmax}(z_1)_1-y_{11} & \cdots & \mathrm{softmax}(z_1)_9-y_{19}\\
\vdots & \vdots & & \vdots\\
\mathrm{softmax}(z_n)_0-y_{n0} & \mathrm{softmax}(z_n)_1-y_{n1} & \cdots & \mathrm{softmax}(z_n)_9-y_{n9}
\end{bmatrix}
$$
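The batched form can be sketched in NumPy as follows. This is an illustrative sketch under assumptions rather than the author's code: the function names, the batch size of 4, and the random inputs are hypothetical, and the softmax is taken row by row over an `(n, 10)` array. Broadcasting shows up where an `(n, 1)` column (the row maximum or the row sum) is combined with the `(n, 10)` matrix.

```python
import numpy as np

def softmax(z):
    # z has shape (n, 10). The row max has shape (n, 1) and is
    # broadcast across the 10 columns when subtracted.
    shifted = z - z.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    # The row sums also have shape (n, 1) and are broadcast in the division.
    return exps / exps.sum(axis=1, keepdims=True)

def softmax_cross_entropy_grad(z, y):
    # Gradient of the per-sample cross-entropy with respect to the logits:
    # dL/dz = softmax(z) - y, applied to every row at once.
    return softmax(z) - y

# Hypothetical batch: 4 samples, 10 classes.
rng = np.random.default_rng(0)
n, num_classes = 4, 10
z = rng.normal(size=(n, num_classes))
labels = rng.integers(0, num_classes, size=n)
y = np.eye(num_classes)[labels]  # one-hot rows, shape (n, 10)

grad = softmax_cross_entropy_grad(z, y)
print(grad.shape)
print(grad.sum(axis=1))
```

Each row of `grad` should sum to approximately zero, because both the softmax outputs and the one-hot labels sum to 1 in every row. Note that this is the gradient of the summed per-sample losses; if the loss is averaged over the batch, the result would be divided by `n`.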