The least squares method solves for the model by minimizing the mean squared error. Because the number of samples $m$ is fixed, this is equivalent to minimizing $E_{(w,b)}$, the sum of the squared errors over all samples, where each sample's squared error is the square of the difference between the true value $y_{i}$ and the model's prediction $f(x_{i})$:
$$E_{(w,b)}=\sum_{i=1}^{m}(y_{i}-f(x_{i}))^{2}$$
where $f(x_{i})=wx_{i}+b$. Substituting this into the expression above gives
$$E_{(w,b)}=\sum_{i=1}^{m}(y_{i}-(wx_{i}+b))^{2}$$
Removing the inner parentheses gives
$$E_{(w,b)}=\sum_{i=1}^{m}(y_{i}-wx_{i}-b)^{2}\tag{1}$$
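As a quick illustration of Eq. (1), here is a minimal sketch that evaluates the loss for a candidate line. The function name `squared_error` and the toy data are assumptions made up for this example; they do not come from the book.

```python
def squared_error(w, b, xs, ys):
    """Sum of squared errors E(w, b) from Eq. (1)."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

loss_exact = squared_error(2.0, 1.0, xs, ys)  # the true line: every residual is 0
loss_off = squared_error(1.0, 0.0, xs, ys)    # a worse line: residuals are 1, 2, 3, 4

print(loss_exact, loss_off)
```

The better the line fits the samples, the smaller $E(w,b)$ becomes, which is exactly what the minimization exploits.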
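Equation (1) is the expression that follows arg min in Eq. (3.4) of the Watermelon Book (西瓜书).
Before going further, we need the following two theorems.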
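Determining the convexity of a function of two variables
Let $f(x,y)$ have continuous second-order partial derivatives on a region $D$, and write $A = f''_{xx}(x,y)$, $B = f''_{xy}(x,y)$, $C = f''_{yy}(x,y)$. Then:
(1) if $A > 0$ and $AC - B^2 \geqslant 0$ everywhere on $D$, then $f(x,y)$ is convex on $D$;
(2) if $A < 0$ and $AC - B^2 \geqslant 0$ everywhere on $D$, then $f(x,y)$ is concave on $D$.
Finding the extremum of a convex or concave function of two variables
Let $f(x,y)$ be a convex (or concave) function with continuous partial derivatives on an open region $D$. If $(x_{0},y_{0})\in D$ with $f'_{x}(x_{0},y_{0}) = 0$ and $f'_{y}(x_{0},y_{0}) = 0$, then $f(x_{0},y_{0})$ is the minimum (or maximum) of $f(x,y)$ on $D$.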
Following the theorems above, we first need $A$, $B$, and $C$, which in turn requires the first-order partial derivatives of $E(w,b)$.
$$\begin{aligned}
\frac{\partial E(w,b)}{\partial w} &= \frac{\partial }{\partial w}\left[\sum_{i=1}^{m}(y_{i} - wx_{i} - b)^2\right] \\
&= \sum_{i = 1}^{m}\frac{\partial }{\partial w}(y_{i} - wx_{i} - b)^2 \\
&= \sum_{i = 1}^{m}2(y_{i}-wx_{i}-b)(-x_{i}) \\
&= 2\left(w\sum_{i=1}^{m}x_{i}^2-\sum_{i=1}^{m}(y_{i} - b)x_{i}\right)
\end{aligned}\tag{2}$$
Equation (2) is Eq. (3.5) of the Watermelon Book.
$$\begin{aligned}
\frac{\partial E(w,b)}{\partial b} &= \frac{\partial }{\partial b}\left[\sum_{i=1}^{m}(y_{i} - wx_{i} - b)^2\right] \\
&= \sum_{i = 1}^{m}\frac{\partial }{\partial b}(y_{i} - wx_{i} - b)^2 \\
&= \sum_{i = 1}^{m}2(y_{i}-wx_{i}-b)(-1) \\
&= 2\left(\sum_{i=1}^{m}b-\sum_{i=1}^{m}(y_{i}-wx_{i})\right) \\
&= 2\left(mb-\sum_{i=1}^{m}(y_{i}-wx_{i})\right)
\end{aligned}\tag{3}$$
Equation (3) is Eq. (3.6) of the Watermelon Book.
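One way to sanity-check the two partial derivatives is to compare them with central finite differences of the loss. The sketch below does that; the helper names (`loss`, `grad_w`, `grad_b`), the sample data, and the step size `h` are all assumptions chosen for illustration.

```python
def loss(w, b, xs, ys):
    # E(w, b): sum of squared errors, Eq. (1)
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def grad_w(w, b, xs, ys):
    # Eq. (2): 2 * (w * sum(x_i^2) - sum((y_i - b) * x_i))
    return 2 * (w * sum(x * x for x in xs)
                - sum((y - b) * x for x, y in zip(xs, ys)))


def grad_b(w, b, xs, ys):
    # Eq. (3): 2 * (m * b - sum(y_i - w * x_i))
    m = len(xs)
    return 2 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))


# Hypothetical data and an arbitrary point (w, b) at which to compare.
xs = [0.0, 1.0, 2.0, 4.0]
ys = [1.0, 2.5, 5.5, 8.0]
w, b, h = 0.7, -0.3, 1e-4

# Central differences approximate the partial derivatives numerically.
num_w = (loss(w + h, b, xs, ys) - loss(w - h, b, xs, ys)) / (2 * h)
num_b = (loss(w, b + h, xs, ys) - loss(w, b - h, xs, ys)) / (2 * h)

print(num_w, grad_w(w, b, xs, ys))
print(num_b, grad_b(w, b, xs, ys))
```

Because $E$ is quadratic in $w$ and $b$, the central difference agrees with the analytic value up to floating-point rounding.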
$$\begin{aligned}
A &= \frac{\partial^2 E(w,b)}{\partial w^2} \\
&= \frac{\partial }{\partial w}\left(\frac{\partial E(w,b)}{\partial w}\right) \\
&= \frac{\partial }{\partial w} \left[2\left(w\sum_{i=1}^{m}x_{i}^2-\sum_{i=1}^{m}(y_{i} - b)x_{i}\right)\right] \\
&= 2\sum_{i=1}^{m}x_{i}^2
\end{aligned}$$
$$\begin{aligned}
B &= \frac{\partial^2 E(w,b)}{\partial w \partial b} \\
&= \frac{\partial }{\partial b}\left(\frac{\partial E(w,b)}{\partial w}\right) \\
&= \frac{\partial }{\partial b} \left[2\left(w\sum_{i=1}^{m}x_{i}^2-\sum_{i=1}^{m}(y_{i} - b)x_{i}\right)\right] \\
&= 2\sum_{i=1}^{m}x_{i}
\end{aligned}$$
$$\begin{aligned}
C &= \frac{\partial^2 E(w,b)}{\partial b^2} \\
&= \frac{\partial }{\partial b}\left(\frac{\partial E(w,b)}{\partial b}\right) \\
&= \frac{\partial }{\partial b} \left[2\left(mb-\sum_{i=1}^{m}(y_{i}-wx_{i})\right)\right] \\
&= 2m
\end{aligned}$$
Here $A$ is greater than 0: $A = 0$ would require every $x_{i}=0$, which is a meaningless data set.
Next, consider $AC-B^2$:
$$\begin{aligned}
AC-B^2 &= 2m \cdot 2\sum_{i=1}^{m}x_{i}^2 - \left(2\sum_{i=1}^{m}x_{i} \right)^2 \\
&= 4m\sum_{i=1}^{m}x_{i}^2 - 4\cdot m\cdot \frac{1}{m}\cdot \left(\sum_{i=1}^{m}x_{i} \right)^2 \\
&= 4m\sum_{i=1}^{m}x_{i}^2 - 4m\cdot \bar{x}\cdot \sum_{i=1}^{m}x_{i} \\
&= 4m\left(\sum_{i=1}^{m}x_{i}^2 - \sum_{i=1}^{m}x_{i}\bar{x} \right) \\
&= 4m\sum_{i=1}^{m}(x_{i}^2 - x_{i}\bar{x})
\end{aligned}$$
Moreover,
$$\sum_{i=1}^{m}x_{i}\bar{x} = \bar{x}\sum_{i=1}^{m}x_{i} = \bar{x}\cdot m\cdot \frac{1}{m}\sum_{i=1}^{m}x_{i} = m \bar{x}^2 = \sum_{i=1}^{m}\bar{x}^2$$
so
$$\begin{aligned}
AC-B^2 &= 4m\sum_{i=1}^{m}(x_{i}^2 - x_{i}\bar{x}) \\
&= 4m\sum_{i=1}^{m}(x_{i}^2 - x_{i}\bar{x} - x_{i}\bar{x} + x_{i}\bar{x}) \\
&= 4m\sum_{i=1}^{m}(x_{i}^2 - 2x_{i}\bar{x} + \bar{x}^2) \\
&= 4m\sum_{i=1}^{m}(x_{i} - \bar{x})^2
\end{aligned}$$
Since $m$ is the number of samples, $m > 0$, and $\sum_{i=1}^{m}(x_{i} - \bar{x})^2\geqslant 0$, so $AC-B^2 \geqslant 0$. Together with $A > 0$, this shows that the loss function $E(w,b)$ is convex in $w$ and $b$.
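The identity $AC-B^2 = 4m\sum_{i=1}^{m}(x_{i}-\bar{x})^2$ is easy to check numerically. The following sketch uses a small made-up data set (the values in `xs` are an assumption for illustration); note that $A$, $B$, and $C$ depend only on the $x_{i}$, not on $w$, $b$, or the $y_{i}$.

```python
# Hypothetical sample inputs; only the x values matter for A, B, C.
xs = [1.0, 2.0, 4.0, 7.0]
m = len(xs)
x_bar = sum(xs) / m

A = 2 * sum(x * x for x in xs)  # second partial derivative in w
B = 2 * sum(xs)                 # mixed partial derivative
C = 2 * m                       # second partial derivative in b

lhs = A * C - B ** 2
rhs = 4 * m * sum((x - x_bar) ** 2 for x in xs)

print(A, lhs, rhs)  # both sides equal 336 for this data
```

With $A > 0$ and $AC-B^2 \geqslant 0$, the convexity condition of the theorem holds for this data.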
Because $E(w,b)$ is convex, its minimum is attained where both partial derivatives vanish. Recall the partial derivative with respect to $b$:
$$\begin{aligned}
\frac{\partial E(w,b)}{\partial b} &= \frac{\partial }{\partial b}\left[\sum_{i=1}^{m}(y_{i} - wx_{i} - b)^2\right] \\
&= \sum_{i = 1}^{m}\frac{\partial }{\partial b}(y_{i} - wx_{i} - b)^2 \\
&= \sum_{i = 1}^{m}2(y_{i}-wx_{i}-b)(-1) \\
&= 2\left(\sum_{i=1}^{m}b-\sum_{i=1}^{m}(y_{i}-wx_{i})\right) \\
&= 2\left(mb-\sum_{i=1}^{m}(y_{i}-wx_{i})\right)
\end{aligned}$$
Setting it to zero,
$$\frac{\partial E(w,b)}{\partial b} = 2\left(mb-\sum_{i=1}^{m}(y_{i}-wx_{i})\right) = 0$$
gives
$$b = \frac{1}{m}\sum_{i=1}^{m}(y_{i}-wx_{i})\tag{4}$$
Equation (4) is Eq. (3.8) of the Watermelon Book.
Rewriting $b$ in an equivalent form:
$$\begin{aligned}
b &= \frac{1}{m}\sum_{i=1}^{m}(y_{i}-wx_{i}) \\
&= \frac{1}{m}\sum_{i=1}^{m}y_{i} - w\cdot \frac{1}{m}\cdot \sum_{i=1}^{m}x_{i} \\
&= \bar{y} - w\bar{x}
\end{aligned}$$
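To see the two forms of $b$ agree, the sketch below picks an arbitrary slope and checks that $b = \bar{y} - w\bar{x}$ matches Eq. (4) and makes $\partial E/\partial b$ vanish. The data and the slope `w` are assumptions chosen only for illustration.

```python
# Hypothetical data and an arbitrary slope; the identity holds for any w.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.5, 3.0, 9.0]
m = len(xs)
x_bar = sum(xs) / m
y_bar = sum(ys) / m
w = 0.4

b_mean_form = y_bar - w * x_bar                          # b = y_bar - w * x_bar
b_eq4 = sum(y - w * x for x, y in zip(xs, ys)) / m       # Eq. (4)

# Partial derivative of E with respect to b, evaluated at this b.
grad_b = 2 * (m * b_mean_form - sum(y - w * x for x, y in zip(xs, ys)))

print(b_mean_form, b_eq4, grad_b)
```

In other words, for any fixed slope the best intercept makes the fitted line pass through the point $(\bar{x}, \bar{y})$.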
Deriving the loss function $E(w,b)$ from the least squares method, and proving that $E(w,b)$ is convex in $w$ and $b$, were both done above.
We have also already obtained
$$\begin{aligned}
\frac{\partial E(w,b)}{\partial w} &= \frac{\partial }{\partial w}\left[\sum_{i=1}^{m}(y_{i} - wx_{i} - b)^2\right] \\
&= \sum_{i = 1}^{m}\frac{\partial }{\partial w}(y_{i} - wx_{i} - b)^2 \\
&= \sum_{i = 1}^{m}2(y_{i}-wx_{i}-b)(-x_{i}) \\
&= 2\left(w\sum_{i=1}^{m}x_{i}^2-\sum_{i=1}^{m}(y_{i} - b)x_{i}\right)
\end{aligned}$$
Setting it to zero,
$$\frac{\partial E(w,b)}{\partial w} = 2\left(w\sum_{i=1}^{m}x_{i}^2-\sum_{i=1}^{m}(y_{i} - b)x_{i}\right) = 0$$
so
$$w\sum_{i=1}^{m}x_{i}^2=\sum_{i=1}^{m}(y_{i} - b)x_{i}$$
Substituting $b = \bar{y} - w\bar{x}$ gives
$$w\sum_{i=1}^{m}x_{i}^2=\sum_{i=1}^{m}(y_{i}x_{i} - \bar{y}x_{i}+w\bar{x}x_{i})$$
Moving the terms containing $w$ to the left-hand side:
$$w\sum_{i=1}^{m}x_{i}^2 - w\sum_{i=1}^{m}\bar{x}x_{i}=\sum_{i=1}^{m}(y_{i}x_{i} - \bar{y}x_{i})$$
Solving for $w$:
$$w = \frac{\sum_{i=1}^{m}(y_{i}x_{i} - \bar{y}x_{i})}{\sum_{i=1}^{m}x_{i}^2 - \sum_{i=1}^{m}\bar{x}x_{i}}\tag{5}$$
Moreover,
$$\begin{aligned}
\sum_{i=1}^{m} \bar{x}x_{i} &= \bar{x}\sum_{i=1}^{m}x_{i} \\
&=\frac{1}{m}\cdot m\cdot \bar{x}\cdot \sum_{i=1}^{m}x_{i} \\
&= \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)\left(\sum_{i=1}^{m}x_{i}\right) \\
&= \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^2
\end{aligned}$$
$$\begin{aligned}
\sum_{i=1}^{m} \bar{y}x_{i} &= \bar{y}\sum_{i=1}^{m}x_{i} \\
&= m\cdot \bar{y}\cdot \frac{1}{m}\cdot \sum_{i=1}^{m}x_{i} \\
&= m\bar{y}\bar{x} \\
&= \sum_{i=1}^{m}y_{i}\bar{x}
\end{aligned}$$
Substituting these into (5) gives
$$\begin{aligned}
w &= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - \bar{y}x_{i})}{\sum_{i=1}^{m}x_{i}^2 - \sum_{i=1}^{m}\bar{x}x_{i}} \\
&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x})}{\sum_{i=1}^{m}x_{i}^2 - \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^2} \\
&= \frac{\sum_{i=1}^{m}y_{i}(x_{i} - \bar{x})}{\sum_{i=1}^{m}x_{i}^2 - \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^2}
\end{aligned}\tag{6}$$
Equation (6) is Eq. (3.7) of the Watermelon Book.
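Put together, the closed-form solution can be sketched in a few lines of plain Python: $w$ from Eq. (6), then $b$ from Eq. (4). The function name `fit_line` and the toy data are assumptions for this example, and the sketch assumes the $x_{i}$ are not all equal (otherwise the denominator is zero).

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = w * x + b for one feature."""
    m = len(xs)
    x_bar = sum(xs) / m
    # Eq. (6): numerator and denominator written as explicit sums.
    numerator = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) - sum(xs) ** 2 / m
    w = numerator / denominator
    # Eq. (4): b is the mean of the residuals y_i - w * x_i.
    b = sum(y - w * x for x, y in zip(xs, ys)) / m
    return w, b


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(w, b)  # recovers slope 2 and intercept 1
```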
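What is vectorization? Look at $w = \frac{\sum_{i=1}^{m}y_{i}(x_{i} - \bar{x})}{\sum_{i=1}^{m}x_{i}^2 - \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^2}$: both the numerator and the denominator contain summations, and these summations look very much like the result of a dot product of vectors. Vectorization is the process of rewriting such summations as dot products of vectors.
Why vectorize? Translated directly into Python, the summations in the numerator and denominator become for loops whose iteration count grows in proportion to the number of samples $m$, and interpreted loops of this kind are slow. After vectorization the computation can be handed to NumPy, which provides a large number of matrix operations whose internal implementations are optimized and therefore much faster.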
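As a small illustration of the idea, the sketch below computes the same sum of products twice: once with an explicit Python loop and once with a single `np.dot` call. The arrays are made-up example data.

```python
import numpy as np

# Hypothetical example vectors.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 0.0, -1.0, 5.0])

# Explicit loop: one Python-level iteration per sample.
loop_total = 0.0
for i in range(len(x)):
    loop_total += x[i] * y[i]

# Vectorized: the same sum of products as a single dot product.
dot_total = np.dot(x, y)

print(loop_total, dot_total)  # both are 19.0
```

Both versions do work proportional to $m$; the gain from the vectorized form comes from running the loop inside NumPy's compiled routines rather than in the Python interpreter.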
First, rewrite $w$ in an equivalent form:
$$\begin{aligned}
w &= \frac{\sum_{i=1}^{m}y_{i}(x_{i} - \bar{x})}{\sum_{i=1}^{m}x_{i}^2 - \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^2} \\
&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x})}{\sum_{i=1}^{m}x_{i}^2 - \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)\left(\sum_{i=1}^{m}x_{i}\right)} \\
&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x})}{\sum_{i=1}^{m}x_{i}^2 -\sum_{i=1}^{m}x_{i}\bar{x}} \\
&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x})}{\sum_{i=1}^{m}(x_{i}^2 -x_{i}\bar{x})}
\end{aligned}\tag{7}$$
We also have
$$\sum_{i=1}^{m}y_{i}\bar{x} = m \cdot \bar{x}\cdot \frac{1}{m}\sum_{i=1}^{m}y_{i} =\sum_{i=1}^{m}x_{i} \bar{y}\tag{8}$$
$$\sum_{i=1}^{m}y_{i}\bar{x}= m \cdot \bar{x}\cdot \frac{1}{m}\sum_{i=1}^{m}y_{i} = m \cdot \bar{x}\cdot\bar{y}=\sum_{i=1}^{m}\bar{x} \bar{y}\tag{9}$$
$$\sum_{i=1}^{m}x_{i}\bar{x}= m \cdot \bar{x}\cdot \frac{1}{m}\sum_{i=1}^{m}x_{i} = m \cdot \bar{x}\cdot\bar{x}=\sum_{i=1}^{m}\bar{x}^2\tag{10}$$
Substituting (8), (9), and (10) into (7) rewrites $w$ once more:
$$\begin{aligned}
w &= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x})}{\sum_{i=1}^{m}(x_{i}^2 -x_{i}\bar{x})} \\
&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x} - y_{i}\bar{x} + y_{i}\bar{x})}{\sum_{i=1}^{m}(x_{i}^2 -x_{i}\bar{x}-x_{i}\bar{x}+x_{i}\bar{x})} \\
&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x} - x_{i}\bar{y} + \bar{x}\bar{y})}{\sum_{i=1}^{m}(x_{i}^2 -x_{i}\bar{x}-x_{i}\bar{x}+\bar{x}^2)} \\
&= \frac{\sum_{i=1}^{m}(y_{i}-\bar{y})(x_{i}-\bar{x})}{\sum_{i=1}^{m}(x_{i}-\bar{x})^2}
\end{aligned}$$
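The two forms of $w$, Eq. (6) and the centered form just derived, should give the same number on any data set. A minimal check, using made-up data that does not lie on a straight line:

```python
# Hypothetical, deliberately non-collinear data.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.5, 3.0, 9.0]
m = len(xs)
x_bar = sum(xs) / m
y_bar = sum(ys) / m

# Eq. (6): sum(y_i * (x_i - x_bar)) / (sum(x_i^2) - (sum(x_i))^2 / m)
w_eq6 = sum(y * (x - x_bar) for x, y in zip(xs, ys)) / (
    sum(x * x for x in xs) - sum(xs) ** 2 / m
)

# Centered form: sum((y_i - y_bar) * (x_i - x_bar)) / sum((x_i - x_bar)^2)
w_centered = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)

print(w_eq6, w_centered)
```

The centered form is the one that maps directly onto dot products of centered vectors.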
Define the vectors needed for vectorization:
$$\vec{x} = \begin{pmatrix} x_{1} & x_{2} & \cdots & x_{m} \end{pmatrix}^{\mathrm{T}}$$
$$\vec{x_{d}} = \begin{pmatrix} x_{1}-\bar{x} & x_{2}-\bar{x} & \cdots & x_{m}-\bar{x} \end{pmatrix}^{\mathrm{T}}$$
$$\vec{y} = \begin{pmatrix} y_{1} & y_{2} & \cdots & y_{m} \end{pmatrix}^{\mathrm{T}}$$
$$\vec{y_{d}} = \begin{pmatrix} y_{1}-\bar{y} & y_{2}-\bar{y} & \cdots & y_{m}-\bar{y} \end{pmatrix}^{\mathrm{T}}$$
Vectorizing $w$ then gives
$$w = \frac{ \vec{x_{d}}^{\mathrm{T}} \vec{y_{d}}}{\vec{x_{d}}^{\mathrm{T}} \vec{x_{d}}}$$
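The vectorized formula translates almost symbol for symbol into NumPy. The sketch below is one possible implementation under stated assumptions: the function name `fit_line_vectorized` and the toy data are invented for this example, and the $x_{i}$ are assumed not to be all equal.

```python
import numpy as np


def fit_line_vectorized(x, y):
    """Least squares fit of y = w * x + b using dot products."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_d = x - x.mean()  # centered inputs, the vector x_d
    y_d = y - y.mean()  # centered targets, the vector y_d
    w = x_d.dot(y_d) / x_d.dot(x_d)
    b = y.mean() - w * x.mean()  # b = y_bar - w * x_bar
    return w, b


# Hypothetical data lying exactly on y = 2x + 1.
w, b = fit_line_vectorized([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b)  # recovers slope 2 and intercept 1
```

No explicit Python loop remains: centering relies on broadcasting, and both sums are single `dot` calls.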