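02. Derivation of the Univariate Linear Regression Formulas

Deriving the formula for the bias b

Outline of the derivation

Start → derive the loss function E(w,b) from least squares → prove that E(w,b) is a convex function of w and b → take the first-order partial derivative of E(w,b) with respect to b → set it to 0 and solve for b → End

Deriving the loss function E(w,b) from least squares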

Least squares fits the model by minimizing the mean squared error. Here E(w,b) is the sum of the squared errors over all m samples. It differs from the mean squared error only by the constant factor 1/m, so both have the same minimizer. The squared error of a sample is the square of the difference between the true value $y_{i}$ and the model's prediction $f(x_{i})$:
$$E_{(w,b)}=\sum_{i=1}^{m}(y_{i}-f(x_{i}))^{2}$$
Here $f(x_{i})=wx_{i}+b$; substituting it in gives
$$E_{(w,b)}=\sum_{i=1}^{m}(y_{i}-(wx_{i}+b))^{2}$$
Removing the inner parentheses gives
$$E_{(w,b)}=\sum_{i=1}^{m}(y_{i}-wx_{i}-b)^{2}\tag{1}$$
Equation (1) is the part after argmin in Eq. (3.4) of the Watermelon Book (Zhou Zhihua's Machine Learning).
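A minimal Python sketch of Eq. (1), assuming the samples are given as two equal-length lists. The function name `squared_error_loss` and the toy data are hypothetical, chosen only for illustration:

```python
def squared_error_loss(w, b, xs, ys):
    """Sum of squared errors E(w, b) from Eq. (1) for the model f(x) = w*x + b."""
    total = 0.0
    for x_i, y_i in zip(xs, ys):
        residual = y_i - w * x_i - b  # y_i - w*x_i - b
        total += residual ** 2
    return total


# Hypothetical toy data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(squared_error_loss(2.0, 1.0, xs, ys))  # 0.0: the true line has zero loss
print(squared_error_loss(2.0, 0.0, xs, ys))  # 4.0: every residual equals 1
```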

Proving that the loss E(w,b) is a convex function of w and b

Before reading this part, you need the following theorems:

  1. Convexity test for a function of two variables

    Suppose f(x,y) has continuous second-order partial derivatives on a region D, and write $A = f_{xx}''(x,y)$, $B = f_{xy}''(x,y)$, $C = f_{yy}''(x,y)$. Then

    • if $A>0$ and $AC - B^2\geqslant 0$ everywhere on D, f(x,y) is convex on D;
    • if $A<0$ and $AC - B^2\geqslant 0$ everywhere on D, f(x,y) is concave on D.
  2. Extrema of convex and concave functions of two variables

    Suppose f(x,y) is a convex (or concave) function with continuous partial derivatives on an open region D, and $(x_{0},y_{0})\in D$ satisfies $f_{x}'(x_{0},y_{0}) = 0$ and $f_{y}'(x_{0},y_{0}) = 0$. Then $f(x_{0},y_{0})$ is the minimum (or maximum) of f(x,y) on D.

By the theorems above, we first need to compute A, B and C, which in turn requires the first-order partial derivatives.
$$\begin{aligned} \frac{\partial E(w,b)}{\partial w} &= \frac{\partial }{\partial w}\left[\sum_{i=1}^{m}(y_{i} - wx_{i} - b)^2\right] \\&= \sum_{i = 1}^{m}\frac{\partial }{\partial w}(y_{i} - wx_{i} - b)^2 \\&= \sum_{i = 1}^{m}2(y_{i}-wx_{i}-b)(-x_{i}) \\&= 2\left(w\sum_{i=1}^{m}x_{i}^2-\sum_{i=1}^{m}(y_{i} - b)x_{i}\right)\tag{2} \end{aligned}$$
Equation (2) is Eq. (3.5) of the Watermelon Book.

$$\begin{aligned} \frac{\partial E(w,b)}{\partial b} &= \frac{\partial }{\partial b}\left[\sum_{i=1}^{m}(y_{i} - wx_{i} - b)^2\right] \\&= \sum_{i = 1}^{m}\frac{\partial }{\partial b}(y_{i} - wx_{i} - b)^2 \\&= \sum_{i = 1}^{m}2(y_{i}-wx_{i}-b)(-1) \\&= 2\left(\sum_{i=1}^{m}b-\sum_{i=1}^{m}(y_{i}-wx_{i})\right) \\&= 2\left(mb-\sum_{i=1}^{m}(y_{i}-wx_{i})\right)\tag{3} \end{aligned}$$
Equation (3) is Eq. (3.6) of the Watermelon Book.
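To sanity-check Eqs. (2) and (3), the sketch below compares them with central finite differences of E(w,b). The helper names and the sample data are hypothetical. Because E is quadratic in w and b, the central difference should match the analytic gradient up to rounding error:

```python
def loss(w, b, xs, ys):
    # E(w, b) = sum (y_i - w*x_i - b)^2, Eq. (1)
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def grad_w(w, b, xs, ys):
    # Eq. (2): 2 * (w * sum(x_i^2) - sum((y_i - b) * x_i))
    return 2 * (w * sum(x * x for x in xs) - sum((y - b) * x for x, y in zip(xs, ys)))


def grad_b(w, b, xs, ys):
    # Eq. (3): 2 * (m*b - sum(y_i - w*x_i))
    m = len(xs)
    return 2 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))


def numeric_grad(w, b, xs, ys, h=1e-6):
    # Central differences approximate the two partial derivatives
    dw = (loss(w + h, b, xs, ys) - loss(w - h, b, xs, ys)) / (2 * h)
    db = (loss(w, b + h, xs, ys) - loss(w, b - h, xs, ys)) / (2 * h)
    return dw, db


xs = [0.5, 1.5, 2.0, 4.0]
ys = [1.0, 2.5, 2.9, 6.1]
w, b = 0.7, -0.3
print(grad_w(w, b, xs, ys), grad_b(w, b, xs, ys))
print(numeric_grad(w, b, xs, ys))
```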
$$\begin{aligned} A &= \frac{\partial^2 E(w,b)}{\partial w^2} \\&= \frac{\partial }{\partial w}\left(\frac{\partial E(w,b)}{\partial w}\right) \\&= \frac{\partial }{\partial w} \left[2\left(w\sum_{i=1}^{m}x_{i}^2-\sum_{i=1}^{m}(y_{i} - b)x_{i}\right)\right] \\&= 2\sum_{i=1}^{m}x_{i}^2 \end{aligned}$$

$$\begin{aligned} B &= \frac{\partial^2 E(w,b)}{\partial w \partial b} \\&= \frac{\partial }{\partial b}\left(\frac{\partial E(w,b)}{\partial w}\right) \\&= \frac{\partial }{\partial b} \left[2\left(w\sum_{i=1}^{m}x_{i}^2-\sum_{i=1}^{m}(y_{i} - b)x_{i}\right)\right] \\&= 2\sum_{i=1}^{m}x_{i} \end{aligned}$$

$$\begin{aligned} C &= \frac{\partial^2 E(w,b)}{\partial b^2} \\&= \frac{\partial }{\partial b}\left(\frac{\partial E(w,b)}{\partial b}\right) \\&= \frac{\partial }{\partial b} \left[2\left(mb-\sum_{i=1}^{m}(y_{i}-wx_{i})\right)\right] \\&= 2m \end{aligned}$$
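The entries A, B and C depend only on the $x_{i}$ and on m, not on w, b or the labels. A small sketch (hypothetical helper names and data) that computes them from the formulas above and compares them with second-order finite differences of E(w,b):

```python
def loss(w, b, xs, ys):
    # E(w, b) from Eq. (1)
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def hessian_entries(xs):
    # A = 2*sum(x_i^2), B = 2*sum(x_i), C = 2m
    return 2 * sum(x * x for x in xs), 2 * sum(xs), 2 * len(xs)


def numeric_hessian(w, b, xs, ys, h=1e-3):
    # Second-order central differences of E with respect to w and b
    def f(w_, b_):
        return loss(w_, b_, xs, ys)

    A = (f(w + h, b) - 2 * f(w, b) + f(w - h, b)) / h ** 2
    C = (f(w, b + h) - 2 * f(w, b) + f(w, b - h)) / h ** 2
    B = (f(w + h, b + h) - f(w + h, b - h)
         - f(w - h, b + h) + f(w - h, b - h)) / (4 * h ** 2)
    return A, B, C


xs = [1.0, 2.0, 3.0]
ys = [2.0, 2.5, 4.5]
print(hessian_entries(xs))  # (28.0, 12.0, 6)
print(numeric_hessian(0.4, -1.2, xs, ys))
```

Trying other values of w and b in `numeric_hessian` should give (up to rounding) the same three numbers, since E is quadratic.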

Here A > 0: A = 0 would require every $x_{i}=0$, which is a meaningless dataset.

Next, consider $AC-B^2$, where $\bar{x}=\frac{1}{m}\sum_{i=1}^{m}x_{i}$ is the mean of the $x_{i}$:
$$\begin{aligned} AC-B^2 &= 2m \cdot 2\sum_{i=1}^{m}x_{i}^2 - \left(2\sum_{i=1}^{m}x_{i} \right)^2 \\&= 4m\sum_{i=1}^{m}x_{i}^2 - 4\cdot m\cdot \frac{1}{m}\cdot \left(\sum_{i=1}^{m}x_{i} \right)^2 \\&= 4m\sum_{i=1}^{m}x_{i}^2 - 4m\cdot \bar{x}\cdot \sum_{i=1}^{m}x_{i} \\&= 4m\left(\sum_{i=1}^{m}x_{i}^2 - \sum_{i=1}^{m}x_{i}\bar{x} \right) \\&= 4m\sum_{i=1}^{m}(x_{i}^2 - x_{i}\bar{x}) \end{aligned}$$
Moreover,
$$\sum_{i=1}^{m}x_{i}\bar{x} = \bar{x}\sum_{i=1}^{m}x_{i} = \bar{x}\cdot m\cdot \frac{1}{m}\sum_{i=1}^{m}x_{i} = m\bar{x}^2 = \sum_{i=1}^{m}\bar{x}^2$$
so, using this identity to replace the last $+x_{i}\bar{x}$ term by $+\bar{x}^2$,
$$\begin{aligned} AC-B^2 &= 4m\sum_{i=1}^{m}(x_{i}^2 - x_{i}\bar{x}) \\&= 4m\sum_{i=1}^{m}(x_{i}^2 - x_{i}\bar{x} - x_{i}\bar{x} + x_{i}\bar{x}) \\&= 4m\sum_{i=1}^{m}(x_{i}^2 - 2x_{i}\bar{x} + \bar{x}^2) \\&= 4m\sum_{i=1}^{m}(x_{i} - \bar{x})^2 \end{aligned}$$
Since m (the number of samples) is positive and $\sum_{i=1}^{m}(x_{i} - \bar{x})^2\geqslant 0$, we have $AC-B^2 \geqslant 0$. Together with A > 0, the theorem above shows that the loss E(w,b) is a convex function of w and b.
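A quick numerical check of the identity $AC-B^2 = 4m\sum_{i=1}^{m}(x_{i}-\bar{x})^2$, assuming the $x_{i}$ are given as a Python list (helper names are hypothetical). The second call shows the boundary case where all $x_{i}$ are equal, so $AC-B^2=0$:

```python
def ac_minus_b2(xs):
    # A, B, C as computed above; they depend only on the x_i
    m = len(xs)
    A = 2 * sum(x * x for x in xs)
    B = 2 * sum(xs)
    C = 2 * m
    return A * C - B * B


def ac_minus_b2_closed_form(xs):
    # 4m * sum((x_i - x_bar)^2), the last line of the derivation
    m = len(xs)
    x_bar = sum(xs) / m
    return 4 * m * sum((x - x_bar) ** 2 for x in xs)


print(ac_minus_b2([1.0, 2.0, 3.0]), ac_minus_b2_closed_form([1.0, 2.0, 3.0]))  # 24.0 24.0
print(ac_minus_b2([5.0, 5.0, 5.0]), ac_minus_b2_closed_form([5.0, 5.0, 5.0]))  # 0.0 0.0
```

When every $x_{i}$ is the same, $AC-B^2=0$: E is still convex but not strictly convex, and the quantity $\sum_{i=1}^{m}(x_{i}-\bar{x})^2$ that later appears in the denominator of w is zero, so such data cannot determine w.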

Taking the first-order partial derivative of E(w,b) with respect to b

This is the same computation as Eq. (3):

$$\begin{aligned} \frac{\partial E(w,b)}{\partial b} &= \frac{\partial }{\partial b}\left[\sum_{i=1}^{m}(y_{i} - wx_{i} - b)^2\right] \\&= \sum_{i = 1}^{m}\frac{\partial }{\partial b}(y_{i} - wx_{i} - b)^2 \\&= \sum_{i = 1}^{m}2(y_{i}-wx_{i}-b)(-1) \\&= 2\left(\sum_{i=1}^{m}b-\sum_{i=1}^{m}(y_{i}-wx_{i})\right) \\&= 2\left(mb-\sum_{i=1}^{m}(y_{i}-wx_{i})\right) \end{aligned}$$

Setting the first-order partial derivative to 0 and solving for b

$$\frac{\partial E(w,b)}{\partial b} = 2\left(mb-\sum_{i=1}^{m}(y_{i}-wx_{i})\right) = 0$$

which gives

$$b = \frac{1}{m}\sum_{i=1}^{m}(y_{i}-wx_{i})\tag{4}$$
Equation (4) is Eq. (3.8) of the Watermelon Book.

Rewriting b equivalently, with $\bar{x}=\frac{1}{m}\sum_{i=1}^{m}x_{i}$ and $\bar{y}=\frac{1}{m}\sum_{i=1}^{m}y_{i}$ the sample means, gives
$$\begin{aligned} b &= \frac{1}{m}\sum_{i=1}^{m}(y_{i}-wx_{i}) \\&= \frac{1}{m}\sum_{i=1}^{m}y_{i} - w\cdot \frac{1}{m}\cdot \sum_{i=1}^{m}x_{i} \\&= \bar{y} - w\bar{x} \end{aligned}$$
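A sketch checking that Eq. (4) and the form $b=\bar{y}-w\bar{x}$ agree, and that this b makes $\partial E/\partial b$ vanish for any fixed w. The function names and data are illustrative assumptions:

```python
def b_eq4(w, xs, ys):
    # Eq. (4): b = (1/m) * sum(y_i - w*x_i)
    return sum(y - w * x for x, y in zip(xs, ys)) / len(xs)


def b_from_means(w, xs, ys):
    # Equivalent form: b = y_bar - w * x_bar
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return y_bar - w * x_bar


def grad_b(w, b, xs, ys):
    # Eq. (3): 2 * (m*b - sum(y_i - w*x_i))
    return 2 * (len(xs) * b - sum(y - w * x for x, y in zip(xs, ys)))


xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.1, 4.9, 9.2, 15.0]
for w in (-1.0, 0.0, 2.0, 3.5):
    b = b_eq4(w, xs, ys)
    print(w, b, b_from_means(w, xs, ys), grad_b(w, b, xs, ys))
```

The optimal b depends on w; the two formulas are only rearrangements of each other.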

Deriving the formula for the weight w

Outline of the derivation

Start → derive the loss function E(w,b) from least squares → prove that E(w,b) is a convex function of w and b → take the first-order partial derivative of E(w,b) with respect to w → set it to 0 and solve for w → End

The first two steps, deriving the loss E(w,b) from least squares and proving that E(w,b) is convex in w and b, were already done above.

We have also already computed
$$\begin{aligned} \frac{\partial E(w,b)}{\partial w} &= \frac{\partial }{\partial w}\left[\sum_{i=1}^{m}(y_{i} - wx_{i} - b)^2\right] \\&= \sum_{i = 1}^{m}\frac{\partial }{\partial w}(y_{i} - wx_{i} - b)^2 \\&= \sum_{i = 1}^{m}2(y_{i}-wx_{i}-b)(-x_{i}) \\&= 2\left(w\sum_{i=1}^{m}x_{i}^2-\sum_{i=1}^{m}(y_{i} - b)x_{i}\right) \end{aligned}$$

Setting the first-order partial derivative to 0 and solving for w

$$\frac{\partial E(w,b)}{\partial w} = 2\left(w\sum_{i=1}^{m}x_{i}^2-\sum_{i=1}^{m}(y_{i} - b)x_{i}\right) = 0$$

which gives

$$w\sum_{i=1}^{m}x_{i}^2=\sum_{i=1}^{m}(y_{i} - b)x_{i}$$

Substituting $b = \bar{y} - w\bar{x}$ gives
$$w\sum_{i=1}^{m}x_{i}^2=\sum_{i=1}^{m}(y_{i}x_{i} - \bar{y}x_{i}+w\bar{x}x_{i})$$

Moving the terms containing w to the left-hand side:

$$w\sum_{i=1}^{m}x_{i}^2 - w\sum_{i=1}^{m}\bar{x}x_{i}=\sum_{i=1}^{m}(y_{i}x_{i} - \bar{y}x_{i})$$

Solving for w:
$$w = \frac{\sum_{i=1}^{m}(y_{i}x_{i} - \bar{y}x_{i})}{\sum_{i=1}^{m}x_{i}^2 - \sum_{i=1}^{m}\bar{x}x_{i}}\tag{5}$$
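Equation (5) translated directly into Python, with b recovered from $b=\bar{y}-w\bar{x}$. This is a sketch with a hypothetical function name; it assumes the $x_{i}$ are not all equal, otherwise the denominator is zero:

```python
def fit_eq5(xs, ys):
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    # Numerator of Eq. (5): sum(y_i*x_i - y_bar*x_i)
    num = sum(y * x - y_bar * x for x, y in zip(xs, ys))
    # Denominator of Eq. (5): sum(x_i^2) - sum(x_bar*x_i)
    den = sum(x * x for x in xs) - sum(x_bar * x for x in xs)
    w = num / den
    b = y_bar - w * x_bar
    return w, b


print(fit_eq5([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (2.0, 1.0)
```

On noisy data the returned (w, b) should make both partial derivatives, Eqs. (2) and (3), vanish up to rounding.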
Moreover,
$$\begin{aligned} \sum_{i=1}^{m} \bar{x}x_{i} &= \bar{x}\sum_{i=1}^{m}x_{i} \\&=\frac{1}{m}\cdot m\cdot \bar{x}\cdot \sum_{i=1}^{m}x_{i} \\&= \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)\left(\sum_{i=1}^{m}x_{i}\right) \\&= \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^2 \end{aligned}$$

$$\begin{aligned} \sum_{i=1}^{m} \bar{y}x_{i} &= \bar{y}\sum_{i=1}^{m}x_{i} \\&= m\cdot \bar{y}\cdot \frac{1}{m}\cdot \sum_{i=1}^{m}x_{i} \\&= m\bar{y}\bar{x} \\&= \sum_{i=1}^{m}y_{i}\bar{x} \end{aligned}$$

Substituting both into (5) gives
$$\begin{aligned} w &= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - \bar{y}x_{i})}{\sum_{i=1}^{m}x_{i}^2 - \sum_{i=1}^{m}\bar{x}x_{i}} \\&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x})}{\sum_{i=1}^{m}x_{i}^2 - \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^2} \\&= \frac{\sum_{i=1}^{m}y_{i}(x_{i} - \bar{x})}{\sum_{i=1}^{m}x_{i}^2 - \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^2}\tag{6} \end{aligned}$$

Equation (6) is Eq. (3.7) of the Watermelon Book.
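A sketch (hypothetical function names and data) confirming numerically that Eq. (6) yields the same w as Eq. (5):

```python
def w_eq5(xs, ys):
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    # Eq. (5)
    num = sum(y * x - y_bar * x for x, y in zip(xs, ys))
    den = sum(x * x for x in xs) - sum(x_bar * x for x in xs)
    return num / den


def w_eq6(xs, ys):
    m = len(xs)
    x_bar = sum(xs) / m
    # Eq. (6): sum(y_i*(x_i - x_bar)) / (sum(x_i^2) - (sum x_i)^2 / m)
    num = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    den = sum(x * x for x in xs) - sum(xs) ** 2 / m
    return num / den


xs = [0.5, 1.0, 2.5, 4.0, 6.0]
ys = [1.2, 1.9, 5.3, 7.8, 12.4]
print(w_eq5(xs, ys), w_eq6(xs, ys))
```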

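Vectorizing w

What is vectorization? Look at $w = \frac{\sum_{i=1}^{m}y_{i}(x_{i} - \bar{x})}{\sum_{i=1}^{m}x_{i}^2 - \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^2}$: both the numerator and the denominator contain summations, and these summations look very much like the results of vector dot products. Rewriting the summations as dot products of vectors is what we call vectorization.

Why vectorize? Translated directly into Python, each summation in the numerator and denominator becomes a for loop whose number of iterations grows linearly with the sample count m, and pure-Python loops are slow. After vectorization the computation can be handed to NumPy, which provides many matrix and vector operations implemented in optimized compiled code. The cost is still proportional to m, but the constant factor is much smaller, and that is the reason to vectorize.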
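To illustrate the point, here is a sketch comparing a pure-Python loop for Eq. (6) with a NumPy version built on `np.dot`. The random data (from `np.random.default_rng`, with an assumed slope of 3.0, intercept of 0.5 and small Gaussian noise) and the function names are assumptions for illustration; the printed timings vary by machine:

```python
import time

import numpy as np


def w_loop(xs, ys):
    """Eq. (6) with an explicit Python loop over the m samples."""
    m = len(xs)
    x_bar = sum(xs) / m
    num = 0.0
    sum_x2 = 0.0
    for x, y in zip(xs, ys):
        num += y * (x - x_bar)  # accumulates sum(y_i * (x_i - x_bar))
        sum_x2 += x * x         # accumulates sum(x_i^2)
    return num / (sum_x2 - sum(xs) ** 2 / m)


def w_numpy(x, y):
    """The same Eq. (6), with the sums expressed as NumPy dot products."""
    m = x.size
    num = np.dot(y, x - x.mean())
    den = np.dot(x, x) - x.sum() ** 2 / m
    return num / den


# Hypothetical data: y = 3x + 0.5 plus small Gaussian noise
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=x.size)

t0 = time.perf_counter()
w1 = w_loop(x.tolist(), y.tolist())
t1 = time.perf_counter()
w2 = w_numpy(x, y)
t2 = time.perf_counter()
print(f"loop : w = {w1:.6f}, {t1 - t0:.4f} s")
print(f"numpy: w = {w2:.6f}, {t2 - t1:.4f} s")
```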

First, rewrite w equivalently:
$$\begin{aligned} w &= \frac{\sum_{i=1}^{m}y_{i}(x_{i} - \bar{x})}{\sum_{i=1}^{m}x_{i}^2 - \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)^2} \\&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x})}{\sum_{i=1}^{m}x_{i}^2 - \frac{1}{m}\left(\sum_{i=1}^{m}x_{i}\right)\left(\sum_{i=1}^{m}x_{i}\right)} \\&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x})}{\sum_{i=1}^{m}x_{i}^2 -\sum_{i=1}^{m}x_{i}\bar{x}} \\&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x})}{\sum_{i=1}^{m}(x_{i}^2 -x_{i}\bar{x})}\tag{7} \end{aligned}$$

We also have
$$\sum_{i=1}^{m}y_{i}\bar{x} = m \cdot \bar{x}\cdot \frac{1}{m}\sum_{i=1}^{m}y_{i} = m\bar{x}\bar{y} = \sum_{i=1}^{m}x_{i}\bar{y}\tag{8}$$

$$\sum_{i=1}^{m}y_{i}\bar{x}= m \cdot \bar{x}\cdot \frac{1}{m}\sum_{i=1}^{m}y_{i} = m \cdot \bar{x}\cdot\bar{y}=\sum_{i=1}^{m}\bar{x}\bar{y}\tag{9}$$

$$\sum_{i=1}^{m}x_{i}\bar{x}= m \cdot \bar{x}\cdot \frac{1}{m}\sum_{i=1}^{m}x_{i} = m \cdot \bar{x}\cdot\bar{x}=\sum_{i=1}^{m}\bar{x}^2\tag{10}$$
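A small sketch that evaluates both sides of identities (8), (9) and (10) on sample data. The function name `identity_pairs` and the data are hypothetical:

```python
def identity_pairs(xs, ys):
    """Return (left side, right side) for each of identities (8), (9), (10)."""
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    lhs = sum(y * x_bar for y in ys)  # sum(y_i * x_bar)
    return {
        "(8)": (lhs, sum(x * y_bar for x in xs)),                # sum(x_i * y_bar)
        "(9)": (lhs, m * x_bar * y_bar),                         # sum(x_bar * y_bar)
        "(10)": (sum(x * x_bar for x in xs), m * x_bar ** 2),    # sum(x_bar^2)
    }


pairs = identity_pairs([0.5, 1.0, 2.5, 4.0], [1.2, 1.9, 5.3, 7.8])
for name, (left, right) in pairs.items():
    print(name, left, right)
```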

Substituting (8), (9) and (10) into (7) and rewriting w once more: in the numerator, the added $-y_{i}\bar{x}$ becomes $-x_{i}\bar{y}$ by (8) and the added $+y_{i}\bar{x}$ becomes $+\bar{x}\bar{y}$ by (9); in the denominator, the last $+x_{i}\bar{x}$ becomes $+\bar{x}^2$ by (10).
$$\begin{aligned} w &= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x})}{\sum_{i=1}^{m}(x_{i}^2 -x_{i}\bar{x})} \\&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x} - y_{i}\bar{x} + y_{i}\bar{x})}{\sum_{i=1}^{m}(x_{i}^2 -x_{i}\bar{x}-x_{i}\bar{x}+x_{i}\bar{x})} \\&= \frac{\sum_{i=1}^{m}(y_{i}x_{i} - y_{i}\bar{x} - x_{i}\bar{y} + \bar{x}\bar{y})}{\sum_{i=1}^{m}(x_{i}^2 -x_{i}\bar{x}-x_{i}\bar{x}+\bar{x}^2)} \\&= \frac{\sum_{i=1}^{m}(y_{i}-\bar{y})(x_{i}-\bar{x})}{\sum_{i=1}^{m}(x_{i}-\bar{x})^2} \end{aligned}$$
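The centered form above can be checked against Eq. (7) with a short sketch; the function names and data are hypothetical:

```python
def w_centered(xs, ys):
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    # sum((y_i - y_bar)(x_i - x_bar)) / sum((x_i - x_bar)^2)
    num = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def w_eq7(xs, ys):
    m = len(xs)
    x_bar = sum(xs) / m
    # Eq. (7): sum(y_i*x_i - y_i*x_bar) / sum(x_i^2 - x_i*x_bar)
    num = sum(y * x - y * x_bar for x, y in zip(xs, ys))
    den = sum(x * x - x * x_bar for x in xs)
    return num / den


xs = [0.5, 1.0, 2.5, 4.0, 6.0]
ys = [1.2, 1.9, 5.3, 7.8, 12.4]
print(w_centered(xs, ys), w_eq7(xs, ys))
```

Algebraically the two are identical. In floating point, the centered form subtracts the means before multiplying, which generally makes it less prone to cancellation when the $x_{i}$ are large but closely spaced.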

Define the vectors used for vectorization:
$$\vec{x} = \begin{pmatrix} x_{1} & x_{2} & \dots & x_{m} \end{pmatrix}^{\mathrm{T}}$$

$$\vec{x_{d}} = \begin{pmatrix} x_{1}-\bar{x} & x_{2}-\bar{x} & \dots & x_{m}-\bar{x} \end{pmatrix}^{\mathrm{T}}$$

$$\vec{y} = \begin{pmatrix} y_{1} & y_{2} & \dots & y_{m} \end{pmatrix}^{\mathrm{T}}$$

$$\vec{y_{d}} = \begin{pmatrix} y_{1}-\bar{y} & y_{2}-\bar{y} & \dots & y_{m}-\bar{y} \end{pmatrix}^{\mathrm{T}}$$

Vectorizing w then gives
$$w = \frac{\vec{x_{d}}^{\mathrm{T}}\vec{y_{d}}}{\vec{x_{d}}^{\mathrm{T}}\vec{x_{d}}}$$
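With the vectors defined above, w reduces to two dot products. A NumPy sketch (function name and data are hypothetical) that computes w and $b=\bar{y}-w\bar{x}$ this way and cross-checks them against `np.polyfit` with degree 1, which also performs a least-squares line fit:

```python
import numpy as np


def fit_vectorized(x, y):
    # x_d and y_d are the centered vectors defined above
    x_d = x - x.mean()
    y_d = y - y.mean()
    w = (x_d @ y_d) / (x_d @ x_d)  # w = x_d^T y_d / x_d^T x_d
    b = y.mean() - w * x.mean()    # b = y_bar - w * x_bar
    return w, b


x = np.array([0.5, 1.0, 2.5, 4.0, 6.0])
y = np.array([1.2, 1.9, 5.3, 7.8, 12.4])
w, b = fit_vectorized(x, y)
slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest power first
print(w, b)
print(slope, intercept)
```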
