References
[1] 《面向机器智能的TensorFlow实践》 (*TensorFlow for Machine Intelligence*), §4.7
Suppose we have a three-layer network whose layer outputs are defined as:
$$L_1 = \mathrm{sigmoid}(w_1 \cdot x)$$
$$L_2 = \mathrm{sigmoid}(w_2 \cdot L_1)$$
$$L_3 = \mathrm{sigmoid}(w_3 \cdot L_2)$$
The overall loss of the network is defined as
$$loss = \mathrm{Loss}(L_3, y_{expect})$$
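To make the setup concrete, here is a minimal forward-pass sketch in Python. The scalar weights, the input, and the squared-error choice for $\mathrm{Loss}$ are all illustrative assumptions, not taken from the source:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scalar weights and input, chosen only for illustration.
w1, w2, w3 = 0.5, -0.3, 0.8
x, y_expect = 1.0, 0.7

# Forward pass, one line per layer definition above.
L1 = sigmoid(w1 * x)
L2 = sigmoid(w2 * L1)
L3 = sigmoid(w3 * L2)

# Squared error as one concrete choice of Loss(L3, y_expect).
loss = 0.5 * (L3 - y_expect) ** 2
```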
Taking the partial derivative of the loss with respect to $w_3$ gives
$$\dfrac{\partial loss}{\partial w_3} = \mathrm{Loss}'(L_3, y_{expect}) \cdot \mathrm{sigmoid}'(w_3 \cdot L_2) \cdot L_2$$
Similarly, applying the chain rule through each layer (each layer contributes both a $\mathrm{sigmoid}'$ factor and the weight it multiplies), we get the partial derivatives with respect to $w_2$ and $w_1$:
$$\dfrac{\partial loss}{\partial w_2} = \mathrm{Loss}'(L_3, y_{expect}) \cdot \mathrm{sigmoid}'(w_3 \cdot L_2) \cdot w_3 \cdot \mathrm{sigmoid}'(w_2 \cdot L_1) \cdot L_1$$
$$\dfrac{\partial loss}{\partial w_1} = \mathrm{Loss}'(L_3, y_{expect}) \cdot \mathrm{sigmoid}'(w_3 \cdot L_2) \cdot w_3 \cdot \mathrm{sigmoid}'(w_2 \cdot L_1) \cdot w_2 \cdot \mathrm{sigmoid}'(w_1 \cdot x) \cdot x$$
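These products can be written out directly and checked numerically. The sketch below continues the same assumed scalar example (squared error, so $\mathrm{Loss}'(L_3, y_{expect}) = L_3 - y_{expect}$); the finite-difference assertion at the end verifies the $w_1$ formula:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# Same hypothetical values as in the forward-pass sketch.
w1, w2, w3 = 0.5, -0.3, 0.8
x, y_expect = 1.0, 0.7

L1 = sigmoid(w1 * x)
L2 = sigmoid(w2 * L1)
L3 = sigmoid(w3 * L2)

# Loss'(L3, y_expect) for the squared-error choice.
dloss = L3 - y_expect

# The three partial derivatives, written exactly as the chain-rule products.
d_w3 = dloss * dsigmoid(w3 * L2) * L2
d_w2 = dloss * dsigmoid(w3 * L2) * w3 * dsigmoid(w2 * L1) * L1
d_w1 = dloss * dsigmoid(w3 * L2) * w3 * dsigmoid(w2 * L1) * w2 * dsigmoid(w1 * x) * x

# Quick numerical check of d_w1 by central finite differences.
eps = 1e-6
def loss_at(w1_):
    return 0.5 * (sigmoid(w3 * sigmoid(w2 * sigmoid(w1_ * x))) - y_expect) ** 2
assert abs(d_w1 - (loss_at(w1 + eps) - loss_at(w1 - eps)) / (2 * eps)) < 1e-8
```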
In summary, abbreviating $\mathrm{sigmoid}'$ evaluated at each layer's input as $L_3'$, $L_2'$, $L_1'$, the derivatives can be written compactly as
$$\dfrac{\partial loss}{\partial w_3} = \mathrm{Loss}' \, L_3' \, L_2$$
$$\dfrac{\partial loss}{\partial w_2} = \mathrm{Loss}' \, L_3' \, w_3 \, L_2' \, L_1$$
$$\dfrac{\partial loss}{\partial w_1} = \mathrm{Loss}' \, L_3' \, w_3 \, L_2' \, w_2 \, L_1' \, x$$
The pattern is now visible: working backward through the network, each derivative reuses the product already computed for the layer above it ($\mathrm{Loss}' \, L_3'$, then $\mathrm{Loss}' \, L_3' \, w_3 \, L_2'$, and so on), extending it by only one layer's worth of factors. This reuse of intermediate results is exactly what the backpropagation algorithm does.
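The reuse is easiest to see as a loop that threads a single running product (often called the delta) backward through the layers, multiplying in one layer's factors per step. A minimal sketch under the same assumed scalar network and squared-error loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same hypothetical scalar network as before.
weights = [0.5, -0.3, 0.8]           # w1, w2, w3
x, y_expect = 1.0, 0.7

# Forward pass: remember each layer's input and output.
inputs, out = [], x
for w in weights:
    inputs.append(out)               # input to this layer (x, L1, L2)
    out = sigmoid(w * out)           # layer output (L1, L2, L3)

# Backward pass: `delta` carries the reused product. It starts as Loss'
# and, moving down one layer, picks up that layer's sigmoid' and weight;
# nothing computed for an upper layer is ever recomputed.
delta = out - y_expect               # Loss'(L3, y_expect) for squared error
grads = [0.0] * len(weights)
for k in reversed(range(len(weights))):
    a = inputs[k]                    # the input that fed layer k
    s = sigmoid(weights[k] * a)
    delta *= s * (1.0 - s)           # multiply in sigmoid' at layer k
    grads[k] = delta * a             # ∂loss/∂w_k = delta * layer input
    delta *= weights[k]              # push the product through w_k

print(grads)                         # [∂loss/∂w1, ∂loss/∂w2, ∂loss/∂w3]
```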