Derivatives and backpropagation for softmax, cross entropy, and a three-layer fully connected network

  • In this post we walk through the derivative computations and backpropagation for softmax, softmax + cross entropy, and a three-layer fully connected network.

Softmax

  • Definition: $S(a_i) = \frac{e^{a_i}}{\sum_{j=1}^N e^{a_j}}$
  • Derivative computation (let $S_i$ denote $S(a_i)$):
    If $i = k$:
    $$\frac{\partial S_i}{\partial a_k} = \frac{e^{a_i}}{\sum_{j=1}^N e^{a_j}} + \frac{e^{a_i} \cdot (-1) \cdot e^{a_i}}{\left(\sum_{j=1}^N e^{a_j}\right)^2} = \frac{e^{a_i}}{\sum_{j=1}^N e^{a_j}} \left(1 - \frac{e^{a_i}}{\sum_{j=1}^N e^{a_j}}\right) = S_i (1 - S_i)$$
    If $i \neq k$:
    $$\frac{\partial S_i}{\partial a_k} = \frac{e^{a_i} \cdot (-1) \cdot e^{a_k}}{\left(\sum_{j=1}^N e^{a_j}\right)^2} = -S_i S_k$$
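The two cases above combine into a single Jacobian entry $\frac{\partial S_i}{\partial a_k} = S_i(\delta_{ik} - S_k)$. A minimal NumPy sketch (logit values are arbitrary, chosen for illustration) that checks the closed form against central finite differences:

```python
import numpy as np

def softmax(a):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(a - np.max(a))
    return e / e.sum()

def softmax_jacobian(a):
    # J[i, k] = dS_i/da_k = S_i * (delta_ik - S_k)
    s = softmax(a)
    return np.diag(s) - np.outer(s, s)

a = np.array([0.5, -1.2, 2.0, 0.3])  # arbitrary example logits
J = softmax_jacobian(a)

# Central finite-difference check of every column of the Jacobian.
eps = 1e-6
J_num = np.zeros_like(J)
for k in range(len(a)):
    d = np.zeros_like(a)
    d[k] = eps
    J_num[:, k] = (softmax(a + d) - softmax(a - d)) / (2 * eps)
assert np.allclose(J, J_num, atol=1e-6)
```

Note that each column of the Jacobian sums to zero, since perturbing one logit only redistributes probability mass among the outputs.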

Softmax+CrossEntropy

  • Definition: $L_{ce} = \sum_{i=1}^N -y_i \log(S_i)$
  • Derivative computation:
    $$\frac{\partial L_{ce}}{\partial a_k} = \sum_{i=1}^N -y_i \frac{\partial \log(S_i)}{\partial a_k} = -y_k \frac{1}{S_k} S_k (1 - S_k) - \sum_{i \neq k} y_i \frac{1}{S_i} (-S_i S_k) = -y_k (1 - S_k) + \sum_{i \neq k} y_i S_k = -y_k + \sum_{i=1}^N y_i S_k = S_k - y_k$$
    (the last step uses $\sum_{i=1}^N y_i = 1$, i.e. $y$ is a one-hot or probability label)
  • Intuition: when softmax and cross entropy are combined, the derivative with respect to the $k$-th pre-softmax input is simply the difference between the $k$-th softmax output and the ground truth.
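The result $\frac{\partial L_{ce}}{\partial a_k} = S_k - y_k$ can be verified numerically. A small NumPy sketch (logits and label chosen arbitrarily for illustration):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def cross_entropy(a, y):
    # L_ce = -sum_i y_i * log(S_i)
    return -np.sum(y * np.log(softmax(a)))

a = np.array([1.0, -0.5, 0.7])  # arbitrary logits
y = np.array([0.0, 1.0, 0.0])   # one-hot ground truth

# Closed-form gradient from the derivation: dL/da_k = S_k - y_k.
grad = softmax(a) - y

# Central finite-difference check.
eps = 1e-6
grad_num = np.zeros_like(a)
for k in range(len(a)):
    d = np.zeros_like(a)
    d[k] = eps
    grad_num[k] = (cross_entropy(a + d, y) - cross_entropy(a - d, y)) / (2 * eps)
assert np.allclose(grad, grad_num, atol=1e-6)
```

This simple form is why most frameworks fuse softmax and cross entropy into a single op: the combined gradient is cheaper and numerically safer than chaining the two separately.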

Derivatives for a three-layer fully connected network

  • The network structure is shown below.
    [Figure 1: three-layer fully connected network]
  • First we define the following notation (note that $y$ in the loss denotes the target vector, while $y_j$ denotes the $j$-th hidden-layer activation):
    • $L = \frac{1}{2} \|y - z\|^2$
    • $z_k = f(\sum_{j=1}^h w_{kj} y_j)$, where $f$ is a non-linear activation function (e.g. the sigmoid)
    • $net_k = \sum_{j=1}^h w_{kj} y_j$
    • $y_j = f(\sum_{i=1}^d w_{ji} x_i)$
    • $net_j = \sum_{i=1}^d w_{ji} x_i$
    • $x_i$ is the input value on the $i$-th dimension
  • Next we compute the derivatives with respect to $w_{kj}$ and $w_{ji}$.
    • Derivative for $w_{kj}$:
      $$\frac{\partial L}{\partial w_{kj}} = (y_k - z_k) \cdot (-1) \cdot \frac{\partial z_k}{\partial w_{kj}} = -(y_k - z_k) \frac{\partial z_k}{\partial net_k} \frac{\partial net_k}{\partial w_{kj}} = -(y_k - z_k) f'(net_k)\, y_j$$
      $$\therefore\ w_{kj} \leftarrow w_{kj} - \alpha \frac{\partial L}{\partial w_{kj}} = w_{kj} + \alpha (y_k - z_k) f'(net_k)\, y_j$$
    • Derivative for $w_{ji}$:
      $$\frac{\partial L}{\partial w_{ji}} = \frac{\partial L}{\partial y_j} \frac{\partial y_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ji}}$$
      $$\frac{\partial L}{\partial y_j} = \sum_{k} \frac{\partial L}{\partial net_k} \frac{\partial net_k}{\partial y_j} = \sum_{k} -(y_k - z_k) f'(net_k)\, w_{kj}$$
      where the sum runs over all output units $k$.
      $$\therefore\ \frac{\partial L}{\partial w_{ji}} = \left[\sum_{k} -(y_k - z_k) f'(net_k)\, w_{kj}\right] f'(net_j)\, x_i$$
    • At this point we have the gradients of all parameters in both layers, and backpropagation can use them to update the weights.
    • As for $f'(net_k)$: for a sigmoid activation it follows the same pattern as the softmax derivative above, $f'(net_k) = f(net_k)(1 - f(net_k)) = z_k (1 - z_k)$.
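The whole two-layer backward pass above can be checked numerically. A minimal NumPy sketch, assuming sigmoid as the activation $f$; layer sizes ($d=4$, $h=3$, two outputs) and all values are arbitrary, chosen for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.normal(size=4)              # input x_i
W_ji = rng.normal(size=(3, 4))      # input  -> hidden weights w_ji
W_kj = rng.normal(size=(2, 3))      # hidden -> output weights w_kj
target = np.array([1.0, 0.0])       # target vector y in the loss

def forward(W_ji, W_kj, x):
    y_hid = sigmoid(W_ji @ x)       # y_j = f(net_j)
    z = sigmoid(W_kj @ y_hid)       # z_k = f(net_k)
    return y_hid, z

def loss(W_ji, W_kj, x, target):
    _, z = forward(W_ji, W_kj, x)
    return 0.5 * np.sum((target - z) ** 2)

y_hid, z = forward(W_ji, W_kj, x)

# Backward pass, matching the derivation:
# dL/dw_kj = -(y_k - z_k) * f'(net_k) * y_j, with f'(net_k) = z_k * (1 - z_k)
delta_k = -(target - z) * z * (1 - z)
grad_W_kj = np.outer(delta_k, y_hid)

# dL/dw_ji = [sum_k -(y_k - z_k) f'(net_k) w_kj] * f'(net_j) * x_i
delta_j = (W_kj.T @ delta_k) * y_hid * (1 - y_hid)
grad_W_ji = np.outer(delta_j, x)

# Central finite-difference check of one weight in each layer.
eps = 1e-6
Wp = W_kj.copy(); Wp[1, 2] += eps
Wm = W_kj.copy(); Wm[1, 2] -= eps
num = (loss(W_ji, Wp, x, target) - loss(W_ji, Wm, x, target)) / (2 * eps)
assert abs(grad_W_kj[1, 2] - num) < 1e-6

Wp = W_ji.copy(); Wp[2, 3] += eps
Wm = W_ji.copy(); Wm[2, 3] -= eps
num = (loss(Wp, W_kj, x, target) - loss(Wm, W_kj, x, target)) / (2 * eps)
assert abs(grad_W_ji[2, 3] - num) < 1e-6

# Gradient-descent updates with learning rate alpha.
alpha = 0.1
W_kj = W_kj - alpha * grad_W_kj
W_ji = W_ji - alpha * grad_W_ji
```

The intermediate vectors `delta_k` and `delta_j` are the per-layer error signals: `delta_j` reuses `delta_k` through `W_kj.T`, which is exactly the backward propagation step of the derivation.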
