Reference: 《统计学习方法》 (Statistical Learning Methods)
In the primal form of the perceptron learning algorithm, I didn't quite understand how the gradients of L(w, b) with respect to w and b are computed, so I worked through the derivation myself. This post records the process.
1. The gradient of a real scalar function $f(x)$ with respect to the $1 \times n$ row vector $x^T$ (where $x$ is a column vector) is defined as:
$$\frac{\partial f(x)}{\partial x^T} = \left[\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \frac{\partial f(x)}{\partial x_3}, \dots, \frac{\partial f(x)}{\partial x_n}\right] = \nabla_{x^T} f(x)$$
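As a quick sanity check on this definition, the sketch below (a toy example I made up, not from the book) approximates each component $\partial f(x)/\partial x_j$ by central differences for a simple $f$:

```python
import numpy as np

# Toy scalar function of a length-3 vector: f(x) = x_1^2 + 2*x_2 + 3*x_3.
def f(x):
    return x[0] ** 2 + 2 * x[1] + 3 * x[2]

x = np.array([1.0, -2.0, 0.5])
eps = 1e-6

# Build the 1xn row vector [df/dx_1, ..., df/dx_n] by central differences.
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(x.size)])
print(grad)  # ~[2*x_1, 2, 3] = [2. 2. 3.]
```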
2. A weighted sum of several vectors equals the vector whose components are the weighted sums of the corresponding components. That is:
Let $a_i$ be scalars, $i \in \{0, \dots, M\}$, and $X_i = (x_i^0, x_i^1)$. Then

$$\sum_{i=0}^{M} a_i X_i = a_0(x_0^0, x_0^1) + a_1(x_1^0, x_1^1) + \dots + a_M(x_M^0, x_M^1) = \left[\sum_{i=0}^{M} a_i x_i^0, \sum_{i=0}^{M} a_i x_i^1\right]$$

In numpy, $a_i X_i$ can be computed as `np.dot(a_i, X_i)`, which effectively broadcasts the scalar $a_i$; the result is the same as `a_i * X_i`.
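A minimal numpy sketch of these two remarks (toy numbers assumed): `np.dot` with a scalar argument matches plain `*`, and the weighted sum of vectors equals the vector of component-wise weighted sums:

```python
import numpy as np

a = np.array([2.0, -1.0, 3.0])      # scalar weights a_0, a_1, a_2
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # rows are the vectors X_i = (x_i^0, x_i^1)

# np.dot with a scalar broadcasts it; same result as plain multiplication.
print(np.dot(a[0], X[0]))           # [2. 4.]
print(a[0] * X[0])                  # [2. 4.]

# Weighted sum of vectors == vector of weighted sums of components.
print(sum(a[i] * X[i] for i in range(len(a))))  # [14. 18.]
print(a @ X)                                    # [14. 18.], same via matrix product
```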
The loss function is:
$$L(w,b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$

where $M$ is the set of misclassified points.
Without loss of generality (taking 2-dimensional inputs and indexing the misclassified points as $i = 0, \dots, M$), let
$$L(w,b) = -\sum_{i=0}^{M} y_i (w \cdot x_i + b), \quad \text{where } w = (w_0, w_1),\ x_i = (x_i^0, x_i^1)$$
Then:
$$\begin{aligned} L(w,b) ={}& -\big(y_0(w_0 x_0^0 + w_1 x_0^1 + b) \\ & + y_1(w_0 x_1^0 + w_1 x_1^1 + b) \\ & + y_2(w_0 x_2^0 + w_1 x_2^1 + b) \\ & + \dots \\ & + y_M(w_0 x_M^0 + w_1 x_M^1 + b)\big) \\ ={}& -\Big(\Big(\sum_{i=0}^{M} y_i x_i^0\Big) w_0 + \Big(\sum_{i=0}^{M} y_i x_i^1\Big) w_1 + b \sum_{i=0}^{M} y_i\Big) \end{aligned}$$
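This regrouping is easy to verify numerically; here is a sketch on assumed random toy data, treating every point as misclassified:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))        # rows x_i = (x_i^0, x_i^1)
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
w = np.array([0.3, -0.7])
b = 0.1

direct = -np.sum(y * (X @ w + b))                 # -sum_i y_i*(w.x_i + b)
regrouped = -(np.sum(y * X[:, 0]) * w[0]
              + np.sum(y * X[:, 1]) * w[1]
              + b * np.sum(y))
print(np.isclose(direct, regrouped))              # True
```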
Now, applying definition 1 componentwise:

$$\frac{\partial L(w,b)}{\partial w} = \left[\frac{\partial L(w,b)}{\partial w_0}, \frac{\partial L(w,b)}{\partial w_1}\right] = -\left[\sum_{i=0}^{M} y_i x_i^0, \sum_{i=0}^{M} y_i x_i^1\right] = -\sum_{i=0}^{M} y_i x_i$$
Written in the book's notation:
$$\nabla_w L(w,b) = -\sum_{x_i \in M} y_i x_i$$
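Since $L$ is linear in $w$, finite differences recover this gradient essentially exactly; a sketch on assumed toy data:

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
b = 0.0

def L(w):
    # Loss with all three points treated as misclassified.
    return -np.sum(y * (X @ w + b))

w = np.array([0.5, -0.5])
analytic = -np.sum(y[:, None] * X, axis=0)   # -sum_i y_i * x_i
eps = 1e-6
numeric = np.array([(L(w + eps * e) - L(w - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
print(analytic)  # [-6. -5.]
print(numeric)   # [-6. -5.]
```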
Differentiating with respect to $b$ is simpler, since $b$ is just a scalar. Thus:
$$\frac{\partial L(w,b)}{\partial b} = -\sum_{i=0}^{M} y_i$$
Written in the book's notation:
$$\nabla_b L(w,b) = -\sum_{x_i \in M} y_i$$
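With both gradients in hand, the primal-form algorithm is just stochastic gradient descent on misclassified points: whenever $y_i(w \cdot x_i + b) \le 0$, update $w \leftarrow w + \eta y_i x_i$ and $b \leftarrow b + \eta y_i$. A minimal sketch, using the three training points from the book's worked example (Example 2.1, if I recall it correctly):

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])   # x_1, x_2 positive, x_3 negative
y = np.array([1.0, 1.0, -1.0])
w, b, eta = np.zeros(2), 0.0, 1.0

converged = False
while not converged:
    converged = True
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:   # x_i is misclassified
            w += eta * yi * xi              # w <- w - eta * grad_w L
            b += eta * yi                   # b <- b - eta * grad_b L
            converged = False

print(w, b)  # [1. 1.] -3.0 on this data (the path depends on scan order)
```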