Lei_ZM
2019-09-10
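Solving for the bias $b$ and the weight $w$: derivation outline
Derive the loss function $E(w, b)$ from the least squares method
Prove that the loss function $E(w, b)$ is convex in $w$ and $b$
Take the first-order partial derivatives of the loss function $E(w, b)$ with respect to $b$ and $w$
Set each first-order partial derivative to 0 and solve for $b$ and $w$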
$$
\begin{aligned}
E_{(w, b)} &=\sum_{i=1}^{m}\left(y_{i}-f\left(x_{i}\right)\right)^{2} \\
&=\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2} \\
&=\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}
\end{aligned} \tag{Watermelon Book, Eq. 3.4}
$$
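As a small illustration of Eq. 3.4, the loss can be evaluated directly from its definition. This is only a sketch: the function name `squared_error_loss` and the toy data are assumptions made for the example, not part of the original derivation.

```python
def squared_error_loss(w, b, xs, ys):
    """Sum of squared residuals E(w, b) = sum_i (y_i - w * x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

loss_at_truth = squared_error_loss(2.0, 1.0, xs, ys)  # every residual is 0
loss_at_zero = squared_error_loss(0.0, 0.0, xs, ys)   # residuals equal y_i

print(loss_at_truth, loss_at_zero)
```

With $w=2$, $b=1$ every residual vanishes, while with $w=b=0$ the loss reduces to $\sum_i y_i^2$.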
Let $f(x, y)$ have continuous second-order partial derivatives on a region $D$, and write $A=f_{xx}^{\prime\prime}(x, y)$, $B=f_{xy}^{\prime\prime}(x, y)$, $C=f_{yy}^{\prime\prime}(x, y)$. Then: if $A>0$ and $AC-B^{2}\geq 0$ hold everywhere on $D$, $f(x, y)$ is convex on $D$; if $A<0$ and $AC-B^{2}\geq 0$ hold everywhere on $D$, $f(x, y)$ is concave on $D$.
Let $f(x, y)$ be a convex (or concave) function with continuous partial derivatives on an open region $D$, with $(x_{0}, y_{0})\in D$, $f_{x}^{\prime}(x_{0}, y_{0})=0$ and $f_{y}^{\prime}(x_{0}, y_{0})=0$. Then $f(x_{0}, y_{0})$ is the minimum (respectively, maximum) of $f(x, y)$ on $D$.
Proving that the loss function $E(w, b)$ is convex in $w$ and $b$ (computing $A=f_{xx}^{\prime\prime}(x, y)$):

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial w} &=\frac{\partial}{\partial w}\left[\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2}\right] \\
&=\sum_{i=1}^{m} \frac{\partial}{\partial w}\left(y_{i}-w x_{i}-b\right)^{2} \\
&=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot\left(-x_{i}\right) \\
&=2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)
\end{aligned} \tag{Watermelon Book, Eq. 3.5}
$$

Hence:

$$
\begin{aligned}
\frac{\partial^{2} E_{(w, b)}}{\partial w^{2}} &=\frac{\partial}{\partial w}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\
&=\frac{\partial}{\partial w}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\
&=\frac{\partial}{\partial w}\left[2 w \sum_{i=1}^{m} x_{i}^{2}\right] \\
&=2 \sum_{i=1}^{m} x_{i}^{2}
\end{aligned}
$$

This expression is $A=f_{xx}^{\prime\prime}(x, y)$.
Proving that the loss function $E(w, b)$ is convex in $w$ and $b$ (computing $B=f_{xy}^{\prime\prime}(x, y)$):

$$
\begin{aligned}
\frac{\partial^{2} E_{(w, b)}}{\partial w \partial b} &=\frac{\partial}{\partial b}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\
&=\frac{\partial}{\partial b}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\
&=\frac{\partial}{\partial b}\left[-2 \sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right] \\
&=\frac{\partial}{\partial b}\left(-2 \sum_{i=1}^{m} y_{i} x_{i}+2 \sum_{i=1}^{m} b x_{i}\right) \\
&=\frac{\partial}{\partial b}\left(2 \sum_{i=1}^{m} b x_{i}\right) \\
&=2 \sum_{i=1}^{m} x_{i}
\end{aligned}
$$

This expression is $B=f_{xy}^{\prime\prime}(x, y)$.
Proving that the loss function $E(w, b)$ is convex in $w$ and $b$ (computing $C=f_{yy}^{\prime\prime}(x, y)$):

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial b} &=\frac{\partial}{\partial b}\left[\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2}\right] \\
&=\sum_{i=1}^{m} \frac{\partial}{\partial b}\left(y_{i}-w x_{i}-b\right)^{2} \\
&=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot(-1) \\
&=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)
\end{aligned} \tag{Watermelon Book, Eq. 3.6}
$$

Hence:

$$
\begin{aligned}
\frac{\partial^{2} E_{(w, b)}}{\partial b^{2}} &=\frac{\partial}{\partial b}\left(\frac{\partial E_{(w, b)}}{\partial b}\right) \\
&=\frac{\partial}{\partial b}\left[2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)\right] \\
&=\frac{\partial}{\partial b}(2 m b) \\
&=2 m
\end{aligned}
$$

This expression is $C=f_{yy}^{\prime\prime}(x, y)$.
In summary:

$$
\left\{
\begin{aligned}
&A=f_{xx}^{\prime\prime}(x, y)=2 \sum_{i=1}^{m} x_{i}^{2} \\
&B=f_{xy}^{\prime\prime}(x, y)=2 \sum_{i=1}^{m} x_{i} \\
&C=f_{yy}^{\prime\prime}(x, y)=2 m
\end{aligned}
\right.
$$
Therefore:

$$
\begin{aligned}
A C-B^{2} &=2 m \cdot 2 \sum_{i=1}^{m} x_{i}^{2}-\left(2 \sum_{i=1}^{m} x_{i}\right)^{2} \\
&=4 m \sum_{i=1}^{m} x_{i}^{2}-4\left(\sum_{i=1}^{m} x_{i}\right)^{2} \\
&=4 m \sum_{i=1}^{m} x_{i}^{2}-4 \cdot m \cdot \frac{1}{m} \cdot\left(\sum_{i=1}^{m} x_{i}\right)^{2} \\
&=4 m \sum_{i=1}^{m} x_{i}^{2}-4 m \cdot \bar{x} \cdot \sum_{i=1}^{m} x_{i} \\
&=4 m\left(\sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m} x_{i} \bar{x}\right) \\
&=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}\right) \\
&=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+x_{i} \bar{x}\right) \\
&\qquad \sum_{i=1}^{m} x_{i} \bar{x}=\bar{x} \sum_{i=1}^{m} x_{i}=\bar{x} \cdot m \cdot \frac{1}{m} \cdot \sum_{i=1}^{m} x_{i}=m \bar{x}^{2}=\sum_{i=1}^{m} \bar{x}^{2} \\
&=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+\bar{x}^{2}\right) \\
&=4 m \sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)^{2}
\end{aligned}
$$
Hence:

$$
AC-B^{2} = 4 m \sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)^{2} \geq 0
$$

Since in addition $A=2\sum_{i=1}^{m} x_{i}^{2}>0$ whenever at least one $x_{i}$ is nonzero, the loss function $E(w, b)$ is convex in $w$ and $b$, which completes the proof.
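The convexity argument can be sanity-checked numerically. The sketch below assumes NumPy and a hypothetical set of sample inputs; it builds the Hessian from $A$, $B$, $C$, compares $AC-B^{2}$ with the closed form $4m\sum_{i}(x_{i}-\bar{x})^{2}$, and inspects the eigenvalues.

```python
import numpy as np

# Hypothetical sample inputs; the Hessian of E(w, b) does not depend on y.
xs = np.array([1.0, 2.0, 3.0, 4.0])
m = xs.size

A = 2.0 * np.sum(xs ** 2)  # second derivative with respect to w
B = 2.0 * np.sum(xs)       # mixed second derivative
C = 2.0 * m                # second derivative with respect to b

determinant = A * C - B ** 2
closed_form = 4.0 * m * np.sum((xs - xs.mean()) ** 2)

# A symmetric matrix with non-negative eigenvalues is positive semidefinite.
hessian = np.array([[A, B], [B, C]])
eigenvalues = np.linalg.eigvalsh(hessian)

print(determinant, closed_form, eigenvalues)
```

For this data both expressions for $AC-B^{2}$ agree, and both eigenvalues are positive because the $x_{i}$ are not all equal.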
First-order partial derivative of the loss function $E(w, b)$ with respect to $b$:

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial b} &=\frac{\partial}{\partial b}\left[\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2}\right] \\
&=\sum_{i=1}^{m} \frac{\partial}{\partial b}\left(y_{i}-w x_{i}-b\right)^{2} \\
&=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot(-1) \\
&=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)
\end{aligned} \tag{Watermelon Book, Eq. 3.6}
$$
First-order partial derivative of the loss function $E(w, b)$ with respect to $w$:

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial w} &=\frac{\partial}{\partial w}\left[\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2}\right] \\
&=\sum_{i=1}^{m} \frac{\partial}{\partial w}\left(y_{i}-w x_{i}-b\right)^{2} \\
&=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot\left(-x_{i}\right) \\
&=2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)
\end{aligned} \tag{Watermelon Book, Eq. 3.5}
$$
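One way to gain confidence in Eq. 3.5 and Eq. 3.6 is to compare them with central finite differences of the loss. The following sketch uses made-up data and hypothetical helper names (`loss`, `grad_w`, `grad_b`); the step size `h` is likewise an arbitrary choice for the example.

```python
def loss(w, b, xs, ys):
    """E(w, b) = sum_i (y_i - w * x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def grad_w(w, b, xs, ys):
    """Analytic dE/dw, Eq. 3.5."""
    return 2.0 * (w * sum(x * x for x in xs)
                  - sum((y - b) * x for x, y in zip(xs, ys)))


def grad_b(w, b, xs, ys):
    """Analytic dE/db, Eq. 3.6."""
    m = len(xs)
    return 2.0 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))


# Hypothetical data and evaluation point.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.5, 5.5, 8.0]
w0, b0, h = 0.5, -1.0, 1e-6

# Central differences approximate the partial derivatives.
numeric_w = (loss(w0 + h, b0, xs, ys) - loss(w0 - h, b0, xs, ys)) / (2 * h)
numeric_b = (loss(w0, b0 + h, xs, ys) - loss(w0, b0 - h, xs, ys)) / (2 * h)

print(numeric_w, grad_w(w0, b0, xs, ys))
print(numeric_b, grad_b(w0, b0, xs, ys))
```

Because $E(w, b)$ is quadratic, the central difference is exact up to floating-point rounding, so the two columns should agree closely.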
Setting the first-order partial derivative of the loss function $E(w, b)$ with respect to $b$ to 0 and solving for $b$:

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial b} &=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right) =0 \\
&\Rightarrow m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)=0 \\
&\begin{aligned}
\Rightarrow b&=\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right) \\
&=\frac{1}{m}\sum_{i=1}^{m} y_{i} - w \frac{1}{m}\sum_{i=1}^{m} x_{i} \\
&=\bar{y}-w\bar{x}
\end{aligned}
\end{aligned} \tag{Watermelon Book, Eq. 3.8}
$$
Setting the first-order partial derivative of the loss function $E(w, b)$ with respect to $w$ to 0 and solving for $w$:

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial w} &=2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right) =0 \\
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}=0 \\
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2} = \sum_{i=1}^{m}y_{i} x_{i} - \sum_{i=1}^{m} b x_{i} \\
&\qquad b=\bar{y}-w\bar{x} \\
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2}=\sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m}(\bar{y}-w \bar{x}) x_{i} \\
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2} =\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i}+w \bar{x} \sum_{i=1}^{m} x_{i} \\
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2}-w \bar{x} \sum_{i=1}^{m} x_{i}=\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i} \\
&\Rightarrow w\left(\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}\right)=\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i} \\
&\begin{aligned}
\Rightarrow w &= \frac{\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}} \\
&\qquad \bar{y} \sum_{i=1}^{m} x_{i} = \frac{1}{m}\sum_{i=1}^{m} y_{i} \sum_{i=1}^{m} x_{i} = \bar{x} \sum_{i=1}^{m} y_{i} \\
&\qquad \bar{x}\sum_{i=1}^{m} x_{i} = \frac{1}{m}\sum_{i=1}^{m} x_{i} \sum_{i=1}^{m} x_{i} = \frac{1}{m} \left(\sum_{i=1}^{m} x_{i}\right)^{2} \\
&=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\bar{x} \sum_{i=1}^{m} y_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}} \\
&=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}
\end{aligned}
\end{aligned} \tag{Watermelon Book, Eq. 3.7}
$$
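The closed-form solution in Eq. 3.7 and Eq. 3.8 translates almost line for line into code. This is a minimal sketch: `fit_line` is a hypothetical helper name, the data are invented, and it assumes the $x_{i}$ are not all equal (otherwise the denominator of Eq. 3.7 is zero).

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = w * x + b (Eq. 3.7 and Eq. 3.8)."""
    m = len(xs)
    x_bar = sum(xs) / m

    # Eq. 3.7: w = sum_i y_i (x_i - x_bar) / (sum_i x_i^2 - (sum_i x_i)^2 / m)
    numerator = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) - sum(xs) ** 2 / m
    w = numerator / denominator

    # Eq. 3.8: b = (1 / m) * sum_i (y_i - w * x_i)
    b = sum(y - w * x for x, y in zip(xs, ys)) / m
    return w, b


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(w, b)
```

On noise-free data generated from $y = 2x + 1$ the fit recovers the slope and intercept.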
Vectorizing $w$ gives:

$$
\begin{aligned}
w &=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}} \\
&\qquad \frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2} = \left(\frac{1}{m} \sum_{i=1}^{m} x_{i}\right) \sum_{i=1}^{m} x_{i} = \bar{x} \sum_{i=1}^{m} x_{i} = \sum_{i=1}^{m} x_{i} \bar{x} \\
&=\frac{\sum_{i=1}^{m} \left(y_{i} x_{i}-y_{i} \bar{x}\right)}{\sum_{i=1}^{m} \left(x_{i}^{2}-x_{i} \bar{x}\right)} \\
&=\frac{\sum_{i=1}^{m} \left(y_{i} x_{i}-y_{i} \bar{x}-y_{i} \bar{x}+y_{i} \bar{x}\right)}{\sum_{i=1}^{m} \left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+x_{i} \bar{x}\right)} \\
&\qquad \sum_{i=1}^{m} y_{i} \bar{x}=\bar{x} \sum_{i=1}^{m} y_{i}=\frac{1}{m} \sum_{i=1}^{m} x_{i} \sum_{i=1}^{m} y_{i}=\sum_{i=1}^{m} x_{i} \cdot \frac{1}{m} \cdot \sum_{i=1}^{m} y_{i}=\sum_{i=1}^{m} x_{i} \bar{y} \\
&\qquad \sum_{i=1}^{m} y_{i} \bar{x}=\bar{x} \sum_{i=1}^{m} y_{i}=\bar{x} \cdot m \cdot \frac{1}{m} \cdot \sum_{i=1}^{m} y_{i}=m \bar{x} \bar{y}=\sum_{i=1}^{m} \bar{x} \bar{y} \\
&\qquad \sum_{i=1}^{m} x_{i} \bar{x}=\bar{x} \sum_{i=1}^{m} x_{i}=\bar{x} \cdot m \cdot \frac{1}{m} \cdot \sum_{i=1}^{m} x_{i}=m \bar{x}^{2}=\sum_{i=1}^{m} \bar{x}^{2} \\
&=\frac{\sum_{i=1}^{m} \left(y_{i} x_{i}-y_{i} \bar{x}-x_{i} \bar{y}+\bar{x}\bar{y}\right)}{\sum_{i=1}^{m} \left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+\bar{x}^{2}\right)} \\
&=\frac{\sum_{i=1}^{m} \left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sum_{i=1}^{m} \left(x_{i}-\bar{x}\right)^{2}} \\
&\qquad x=\left(x_{1},x_{2},\cdots, x_{m}\right)^{T} \\
&\qquad y=\left(y_{1},y_{2},\cdots,y_{m}\right)^{T} \\
&\qquad x_{d}=\left(x_{1}-\bar{x},x_{2}-\bar{x},\cdots,x_{m}-\bar{x}\right)^{T} \\
&\qquad y_{d}=\left(y_{1}-\bar{y},y_{2}-\bar{y},\cdots,y_{m}-\bar{y}\right)^{T} \\
&=\frac{x_{d}^{T} y_{d}}{x_{d}^{T} x_{d}}
\end{aligned}
$$
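The vectorized form $w = x_{d}^{T} y_{d} / x_{d}^{T} x_{d}$ maps directly onto array operations. The sketch below assumes NumPy; `fit_line_vectorized` and the data are hypothetical, and the cross-check against `np.polyfit` is added only for illustration.

```python
import numpy as np


def fit_line_vectorized(x, y):
    """Least squares fit of y = w * x + b using centered vectors x_d and y_d."""
    x_d = x - x.mean()  # (x_1 - x_bar, ..., x_m - x_bar)
    y_d = y - y.mean()  # (y_1 - y_bar, ..., y_m - y_bar)
    w = (x_d @ y_d) / (x_d @ x_d)
    b = y.mean() - w * x.mean()  # Eq. 3.8
    return w, b


# Hypothetical noisy data.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.5, 5.5, 8.0])

w, b = fit_line_vectorized(x, y)

# np.polyfit with degree 1 returns (slope, intercept) of the least squares line.
slope, intercept = np.polyfit(x, y, 1)

print(w, b, slope, intercept)
```

Replacing the explicit sums with dot products avoids Python-level loops, which is the practical motivation for the vectorized form.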
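Derivation outline for solving the weight vector $\hat{w}$:
Derive the loss function $E_{\hat{w}}$ from the least squares method
Prove that the loss function $E_{\hat{w}}$ is convex in $\hat{w}$
Take the first-order partial derivative of the loss function $E_{\hat{w}}$ with respect to $\hat{w}$
Set this first-order partial derivative to 0 and solve for $\hat{w}^{*}$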
$$
\begin{aligned}
f\left(\boldsymbol{x}_{i}\right) &=\boldsymbol{w}^{T} \boldsymbol{x}_{i}+b \\
&=\left(\begin{array}{cccc} {w_{1}} & {w_{2}} & {\dots} & {w_{d}}\end{array}\right) \left(\begin{array}{c}{x_{i 1}} \\ {x_{i 2}} \\ {\vdots} \\ {x_{i d}}\end{array}\right)+b \\
&=w_{1} x_{i 1}+w_{2} x_{i 2}+\ldots+w_{d} x_{i d}+b \\
&\qquad w_{d+1}=b \\
&=w_{1} x_{i 1}+w_{2} x_{i 2}+\ldots+w_{d} x_{i d}+w_{d+1} \cdot 1 \\
&=\left(\begin{array}{ccccc} {w_{1}} & {w_{2}} & {\dots} & {w_{d}} & {w_{d+1}}\end{array}\right) \left(\begin{array}{c}{x_{i 1}} \\ {x_{i 2}} \\ {\vdots} \\ {x_{i d}} \\ 1\end{array}\right) \\
&=\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{i}
\end{aligned}
$$
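The bias-absorbing trick above can be checked on a single sample. This sketch assumes NumPy, and the numeric values of `w`, `b`, and `x_i` are arbitrary choices for the example.

```python
import numpy as np

# Hypothetical weights, bias, and one sample with d = 3 features.
w = np.array([0.5, -1.0, 2.0])
b = 0.25
x_i = np.array([2.0, 3.0, 1.0])

# Append b to w and a constant 1 to x_i, so that w_{d+1} = b.
w_hat = np.append(w, b)
x_hat_i = np.append(x_i, 1.0)

original = w @ x_i + b         # w^T x_i + b
augmented = w_hat @ x_hat_i    # w_hat^T x_hat_i

print(original, augmented)
```

Both expressions give the same prediction, which is why the bias can be treated as one more weight.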
$$
\begin{aligned}
E_{\hat{\boldsymbol{w}}} &=\sum_{i=1}^{m}\left(y_{i}-f\left(\hat{\boldsymbol{x}}_{i}\right)\right)^{2} \\
&=\sum_{i=1}^{m}\left(y_{i}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{i}\right)^{2} \\
&\qquad \begin{aligned}
&\mathbf{X} =\left(\begin{array}{ccccc} {x_{11}} & {x_{12}} & {\dots} & {x_{1 d}} & {1} \\ {x_{21}} & {x_{22}} & {\dots} & {x_{2 d}} & {1} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} & {\vdots} \\ {x_{m 1}} & {x_{m 2}} & {\dots} & {x_{m d}} & {1} \end{array}\right) =\left(\begin{array}{cc} {\boldsymbol{x}_{1}^{T}} & {1} \\ {\boldsymbol{x}_{2}^{T}} & {1} \\ {\vdots} & {\vdots} \\ {\boldsymbol{x}_{m}^{T}} & {1} \end{array}\right) =\left(\begin{array}{c} {\hat{\boldsymbol{x}}_{1}^{T}} \\ {\hat{\boldsymbol{x}}_{2}^{T}} \\ {\vdots} \\ {\hat{\boldsymbol{x}}_{m}^{T}} \end{array}\right) \\
&\boldsymbol{y}=\left(y_{1},y_{2},\cdots,y_{m}\right)^{T}
\end{aligned} \\
&=\left(y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1}\right)^{2} + \left(y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2}\right)^{2} + \cdots + \left(y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}\right)^{2} \\
&=\left(\begin{array}{cccc} {y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1}} & {y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2}} & {\cdots} & {y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}} \end{array}\right) \left(\begin{array}{c} {y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1}} \\ {y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2}} \\ {\vdots} \\ {y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}} \end{array}\right) \\
&\qquad \left(\begin{array}{c} {y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1}} \\ {y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2}} \\ {\vdots} \\ {y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}} \end{array}\right) =\left(\begin{array}{c} {y_{1}} \\ {y_{2}} \\ {\vdots} \\ {y_{m}} \end{array}\right) -\left(\begin{array}{c} {\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1}} \\ {\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2}} \\ {\vdots} \\ {\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}} \end{array}\right) =\left(\begin{array}{c} {y_{1}} \\ {y_{2}} \\ {\vdots} \\ {y_{m}} \end{array}\right) -\left(\begin{array}{c} {\hat{\boldsymbol{x}}_{1}^{T} \hat{\boldsymbol{w}}} \\ {\hat{\boldsymbol{x}}_{2}^{T} \hat{\boldsymbol{w}}} \\ {\vdots} \\ {\hat{\boldsymbol{x}}_{m}^{T} \hat{\boldsymbol{w}}} \end{array}\right) =\left(\begin{array}{c} {y_{1}} \\ {y_{2}} \\ {\vdots} \\ {y_{m}} \end{array}\right) -\left(\begin{array}{c} {\hat{\boldsymbol{x}}_{1}^{T}} \\ {\hat{\boldsymbol{x}}_{2}^{T}} \\ {\vdots} \\ {\hat{\boldsymbol{x}}_{m}^{T}} \end{array}\right) \hat{\boldsymbol{w}} =\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}} \\
&\qquad \left(\begin{array}{cccc} {y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1}} & {y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2}} & {\cdots} & {y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}} \end{array}\right) =\left(\begin{array}{c} {y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1}} \\ {y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2}} \\ {\vdots} \\ {y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}} \end{array}\right)^{T} =\left(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}}\right)^{T} \\
&=\left(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}}\right)^{T}\left(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}}\right)
\end{aligned}
$$
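The equality between the sum-of-squares form and the matrix form $(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol{w}})$ can be verified on a small example. The sketch assumes NumPy; the feature matrix, targets, and `w_hat` are invented values, not taken from the original text.

```python
import numpy as np

# Hypothetical data: m = 3 samples, d = 2 features.
features = np.array([[1.0, 2.0],
                     [2.0, 0.0],
                     [3.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

# Build X by appending a column of ones, so each row is x_hat_i^T.
X = np.hstack([features, np.ones((features.shape[0], 1))])

# Arbitrary augmented weight vector (w_1, w_2, b).
w_hat = np.array([0.5, -0.5, 1.0])

# Matrix form: (y - X w_hat)^T (y - X w_hat).
residual = y - X @ w_hat
loss_matrix = residual @ residual

# Element-wise form: sum_i (y_i - w_hat^T x_hat_i)^2.
loss_sum = sum((y_i - w_hat @ x_hat_i) ** 2 for x_hat_i, y_i in zip(X, y))

print(loss_matrix, loss_sum)
```

The two values coincide because the inner product of the residual vector with itself is exactly the sum of its squared entries.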