Watermelon Book (西瓜书): Notes on Linear Models

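  • 1. Univariate linear regression
    • 1.1. Deriving the loss function $E(w, b)$ by least squares
    • 1.2. Proving that the loss function $E(w, b)$ is convex in $w$ and $b$
      • 1.2.1. Testing the convexity of a function of two variables
      • 1.2.2. Extrema of convex and concave functions of two variables
      • 1.2.3. Proof
    • 1.3. First-order partial derivatives of $E(w, b)$ with respect to $b$ and $w$
    • 1.4. Setting each partial derivative to 0 and solving for $b$ and $w$
  • 2. Multivariate linear regression
    • 2.1. Combining $\boldsymbol{w}$ and $b$ into $\hat{\boldsymbol{w}}$
    • 2.2. Deriving the loss function $E_{\hat{\boldsymbol{w}}}$ by least squares
    • 2.3. Proving that the loss function $E_{\hat{\boldsymbol{w}}}$ is convex in $\hat{\boldsymbol{w}}$
    • 2.4. First-order partial derivative of $E_{\hat{\boldsymbol{w}}}$ with respect to $\hat{\boldsymbol{w}}$
    • 2.5. Setting the partial derivative to 0 and solving for $\hat{\boldsymbol{w}}^{*}$
  • 3. Generalized linear models
    • 3.1. The exponential family of distributions
    • 3.2. The three assumptions of generalized linear models
  • 4. Logistic regression
    • 4.1. Deriving logistic regression as a generalized linear model
    • 4.2. Maximum likelihood estimation
    • 4.3. Parameter estimation for logistic regression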


Linear Models

Lei_ZM
2019-09-10



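1. Univariate linear regression

Outline of the derivation for the bias $b$ and the weight $w$:

  1. Derive the loss function $E(w, b)$ by least squares

  2. Prove that the loss function $E(w, b)$ is convex in $w$ and $b$

  3. Take the first-order partial derivatives of $E(w, b)$ with respect to $b$ and $w$

  4. Set each partial derivative to 0 and solve for $b$ and $w$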


1.1. Deriving the loss function $E(w, b)$ by least squares

$$
\begin{aligned}
E_{(w, b)} &=\sum_{i=1}^{m}\left(y_{i}-f\left(x_{i}\right)\right)^{2} \\
&=\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2} \\
&=\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}
\end{aligned}
\tag{Watermelon Book Eq. 3.4}
$$
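As a minimal sketch of Eq. 3.4, the loss can be evaluated directly from its definition. The function name `squared_error` and the toy data below are hypothetical, chosen only for illustration; they do not come from the book.

```python
def squared_error(w, b, xs, ys):
    """Sum of squared residuals E(w, b) = sum_i (y_i - w*x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

loss_exact = squared_error(2.0, 1.0, xs, ys)  # the true line, every residual is 0
loss_off = squared_error(1.0, 0.0, xs, ys)    # the line y = x, residuals 1, 2, 3, 4
```

Here `loss_exact` is 0 and `loss_off` is 1 + 4 + 9 + 16 = 30, so the loss ranks the true line above the mismatched one.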



1.2. Proving that the loss function $E(w, b)$ is convex in $w$ and $b$

1.2.1. Testing the convexity of a function of two variables

Suppose $f(x, y)$ has continuous second-order partial derivatives on a region $D$, and write $A=f_{xx}^{\prime\prime}(x, y)$, $B=f_{xy}^{\prime\prime}(x, y)$, $C=f_{yy}^{\prime\prime}(x, y)$. Then:

  1. If $A>0$ and $AC-B^{2}\geq 0$ hold everywhere on $D$, then $f(x, y)$ is convex on $D$.
  2. If $A<0$ and $AC-B^{2}\geq 0$ hold everywhere on $D$, then $f(x, y)$ is concave on $D$.

1.2.2. Extrema of convex and concave functions of two variables

Suppose $f(x, y)$ is a convex (or concave) function with continuous partial derivatives in an open region $D$, that $(x_{0}, y_{0})\in D$, and that $f_{x}^{\prime}(x_{0}, y_{0})=0$ and $f_{y}^{\prime}(x_{0}, y_{0})=0$. Then $f(x_{0}, y_{0})$ is the minimum (or maximum) of $f(x, y)$ in $D$.
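The criterion in 1.2.1 can be sketched as a small helper for the special case where the second partial derivatives are constant, as they are for quadratic functions, so checking them once covers the whole region. The name `second_derivative_test` and the three example functions are assumptions made for illustration.

```python
def second_derivative_test(A, B, C):
    """Apply the criterion of 1.2.1 to constant second partials A, B, C."""
    if A > 0 and A * C - B * B >= 0:
        return "convex"
    if A < 0 and A * C - B * B >= 0:
        return "concave"
    return "inconclusive"


# f(x, y) = x^2 + x*y + y^2 has A = 2, B = 1, C = 2 everywhere.
result_bowl = second_derivative_test(2, 1, 2)
# f(x, y) = -(x^2 + y^2) has A = -2, B = 0, C = -2 everywhere.
result_dome = second_derivative_test(-2, 0, -2)
# f(x, y) = x^2 - y^2 (a saddle) has A = 2, B = 0, C = -2, so AC - B^2 < 0.
result_saddle = second_derivative_test(2, 0, -2)
```

The saddle case shows why the condition on $AC-B^{2}$ matters: $A>0$ alone does not make the function convex.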


1.2.3. Proof

To prove that the loss function $E(w, b)$ is convex in $w$ and $b$, first compute $A=f_{xx}^{\prime\prime}(x, y)$, where $w$ plays the role of $x$ and $b$ the role of $y$:

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial w} &=\frac{\partial}{\partial w}\left[\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2}\right] \\
&=\sum_{i=1}^{m} \frac{\partial}{\partial w}\left(y_{i}-w x_{i}-b\right)^{2} \\
&=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot\left(-x_{i}\right) \\
&=2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)
\end{aligned}
\tag{Watermelon Book Eq. 3.5}
$$

Hence:

$$
\begin{aligned}
\frac{\partial^{2} E_{(w, b)}}{\partial w^{2}} &=\frac{\partial}{\partial w}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\
&=\frac{\partial}{\partial w}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\
&=\frac{\partial}{\partial w}\left[2 w \sum_{i=1}^{m} x_{i}^{2}\right] \\
&=2 \sum_{i=1}^{m} x_{i}^{2}
\end{aligned}
$$

This is $A=f_{xx}^{\prime\prime}(x, y)$.

Next, compute $B=f_{xy}^{\prime\prime}(x, y)$:

$$
\begin{aligned}
\frac{\partial^{2} E_{(w, b)}}{\partial w \partial b} &=\frac{\partial}{\partial b}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\
&=\frac{\partial}{\partial b}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\
&=\frac{\partial}{\partial b}\left[-2 \sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right] \\
&=\frac{\partial}{\partial b}\left(-2 \sum_{i=1}^{m} y_{i} x_{i}+2 \sum_{i=1}^{m} b x_{i}\right) \\
&=\frac{\partial}{\partial b}\left(2 \sum_{i=1}^{m} b x_{i}\right) \\
&=2 \sum_{i=1}^{m} x_{i}
\end{aligned}
$$

This is $B=f_{xy}^{\prime\prime}(x, y)$.

Finally, compute $C=f_{yy}^{\prime\prime}(x, y)$:

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial b} &=\frac{\partial}{\partial b}\left[\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2}\right] \\
&=\sum_{i=1}^{m} \frac{\partial}{\partial b}\left(y_{i}-w x_{i}-b\right)^{2} \\
&=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot(-1) \\
&=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)
\end{aligned}
\tag{Watermelon Book Eq. 3.6}
$$

Hence:

$$
\begin{aligned}
\frac{\partial^{2} E_{(w, b)}}{\partial b^{2}} &=\frac{\partial}{\partial b}\left(\frac{\partial E_{(w, b)}}{\partial b}\right) \\
&=\frac{\partial}{\partial b}\left[2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)\right] \\
&=\frac{\partial}{\partial b}(2 m b) \\
&=2 m
\end{aligned}
$$

This is $C=f_{yy}^{\prime\prime}(x, y)$.

To summarize:

$$
\left\{
\begin{aligned}
&A=f_{xx}^{\prime\prime}(x, y)=2 \sum_{i=1}^{m} x_{i}^{2} \\
&B=f_{xy}^{\prime\prime}(x, y)=2 \sum_{i=1}^{m} x_{i} \\
&C=f_{yy}^{\prime\prime}(x, y)=2 m
\end{aligned}
\right.
$$

Therefore, writing $\bar{x}=\frac{1}{m}\sum_{i=1}^{m} x_{i}$:

$$
\begin{aligned}
A C-B^{2} &=2 m \cdot 2 \sum_{i=1}^{m} x_{i}^{2}-\left(2 \sum_{i=1}^{m} x_{i}\right)^{2} \\
&=4 m \sum_{i=1}^{m} x_{i}^{2}-4\left(\sum_{i=1}^{m} x_{i}\right)^{2} \\
&=4 m \sum_{i=1}^{m} x_{i}^{2}-4 \cdot m \cdot \frac{1}{m} \cdot\left(\sum_{i=1}^{m} x_{i}\right)^{2} \\
&=4 m \sum_{i=1}^{m} x_{i}^{2}-4 m \cdot \bar{x} \cdot \sum_{i=1}^{m} x_{i} \\
&=4 m\left(\sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m} x_{i} \bar{x}\right) \\
&=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}\right) \\
&=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+x_{i} \bar{x}\right) \\
&\qquad \sum_{i=1}^{m} x_{i} \bar{x}=\bar{x} \sum_{i=1}^{m} x_{i}=\bar{x} \cdot m \cdot \frac{1}{m} \cdot \sum_{i=1}^{m} x_{i}=m \bar{x}^{2}=\sum_{i=1}^{m} \bar{x}^{2} \\
&=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+\bar{x}^{2}\right) \\
&=4 m \sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)^{2}
\end{aligned}
$$

Hence:

$$
AC-B^{2} = 4 m \sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)^{2} \geq 0
$$

In addition, $A=2 \sum_{i=1}^{m} x_{i}^{2}>0$ as long as the $x_{i}$ are not all zero. Both conditions of 1.2.1 therefore hold, so the loss function $E(w, b)$ is convex in $w$ and $b$. This completes the proof.
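The identity $AC-B^{2}=4m\sum_{i}(x_{i}-\bar{x})^{2}$ can be checked numerically. The sketch below is only a sanity check on hypothetical data, not a proof; `hessian_entries` is an illustrative name.

```python
def hessian_entries(xs):
    """Second partials of E(w, b): A = 2*sum(x_i^2), B = 2*sum(x_i), C = 2*m."""
    m = len(xs)
    A = 2 * sum(x * x for x in xs)
    B = 2 * sum(xs)
    C = 2 * m
    return A, B, C


# Hypothetical inputs; any values that are not all equal would do.
xs = [1.0, 2.0, 4.0, 7.0]
m = len(xs)
x_bar = sum(xs) / m

A, B, C = hessian_entries(xs)
lhs = A * C - B * B                               # AC - B^2 computed directly
rhs = 4 * m * sum((x - x_bar) ** 2 for x in xs)   # closed form from the proof
```

For this data both sides come to 336, and `A` is 140, so the two conditions for convexity hold.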



1.3. First-order partial derivatives of $E(w, b)$ with respect to $b$ and $w$

The first-order partial derivative of the loss function $E(w, b)$ with respect to $b$:

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial b} &=\frac{\partial}{\partial b}\left[\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2}\right] \\
&=\sum_{i=1}^{m} \frac{\partial}{\partial b}\left(y_{i}-w x_{i}-b\right)^{2} \\
&=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot(-1) \\
&=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)
\end{aligned}
\tag{Watermelon Book Eq. 3.6}
$$

The first-order partial derivative of the loss function $E(w, b)$ with respect to $w$:

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial w} &=\frac{\partial}{\partial w}\left[\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2}\right] \\
&=\sum_{i=1}^{m} \frac{\partial}{\partial w}\left(y_{i}-w x_{i}-b\right)^{2} \\
&=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot\left(-x_{i}\right) \\
&=2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)
\end{aligned}
\tag{Watermelon Book Eq. 3.5}
$$
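One way to gain confidence in Eq. 3.5 and Eq. 3.6 is to compare them against central finite differences of the loss. This is a rough sketch: the data, the evaluation point `(w, b)`, and the step size `h` are all arbitrary assumptions.

```python
def loss(w, b, xs, ys):
    """E(w, b) = sum_i (y_i - w*x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def grad_w(w, b, xs, ys):
    """Eq. 3.5: 2 * (w * sum(x_i^2) - sum((y_i - b) * x_i))."""
    return 2 * (w * sum(x * x for x in xs) - sum((y - b) * x for x, y in zip(xs, ys)))


def grad_b(w, b, xs, ys):
    """Eq. 3.6: 2 * (m * b - sum(y_i - w * x_i))."""
    return 2 * (len(xs) * b - sum(y - w * x for x, y in zip(xs, ys)))


# Hypothetical data and evaluation point.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.2, 2.9, 5.1, 6.8]
w, b, h = 0.5, -0.3, 1e-6

# Central differences approximate the partial derivatives numerically.
num_w = (loss(w + h, b, xs, ys) - loss(w - h, b, xs, ys)) / (2 * h)
num_b = (loss(w, b + h, xs, ys) - loss(w, b - h, xs, ys)) / (2 * h)
```

Because the loss is quadratic, the central difference matches the analytic derivative up to floating-point rounding.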



1.4. Setting each partial derivative to 0 and solving for $b$ and $w$

Set the first-order partial derivative of $E(w, b)$ with respect to $b$ to 0 and solve for $b$:

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial b} &=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)=0 \\
&\Rightarrow m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)=0 \\
&\begin{aligned}
\Rightarrow b &=\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right) \\
&=\frac{1}{m}\sum_{i=1}^{m} y_{i}-w \frac{1}{m}\sum_{i=1}^{m} x_{i} \\
&=\bar{y}-w\bar{x}
\end{aligned}
\end{aligned}
\tag{Watermelon Book Eq. 3.8}
$$

Set the first-order partial derivative of $E(w, b)$ with respect to $w$ to 0 and solve for $w$:

$$
\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial w} &=2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)=0 \\
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}=0 \\
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2}=\sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m} b x_{i} \\
&\qquad b=\bar{y}-w\bar{x} \\
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2}=\sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m}(\bar{y}-w \bar{x}) x_{i} \\
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2}=\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i}+w \bar{x} \sum_{i=1}^{m} x_{i} \\
&\Rightarrow w \sum_{i=1}^{m} x_{i}^{2}-w \bar{x} \sum_{i=1}^{m} x_{i}=\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i} \\
&\Rightarrow w\left(\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}\right)=\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i} \\
&\begin{aligned}
\Rightarrow w &=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}} \\
&\qquad \bar{y} \sum_{i=1}^{m} x_{i}=\frac{1}{m}\sum_{i=1}^{m} y_{i} \sum_{i=1}^{m} x_{i}=\bar{x} \sum_{i=1}^{m} y_{i} \\
&\qquad \bar{x} \sum_{i=1}^{m} x_{i}=\frac{1}{m}\sum_{i=1}^{m} x_{i} \sum_{i=1}^{m} x_{i}=\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2} \\
&=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\bar{x} \sum_{i=1}^{m} y_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}} \\
&=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}
\end{aligned}
\end{aligned}
\tag{Watermelon Book Eq. 3.7}
$$
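The two closed-form results translate almost line for line into code. The following is a minimal sketch, assuming the inputs are not all equal (otherwise the denominator of Eq. 3.7 is zero); `fit_line` and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = w*x + b via Eq. 3.7 and Eq. 3.8."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    # Eq. 3.7: w = sum(y_i * (x_i - x_bar)) / (sum(x_i^2) - (sum(x_i))^2 / m)
    numerator = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) - sum(xs) ** 2 / m  # zero if all x_i are equal
    w = numerator / denominator
    # Eq. 3.8: b = y_bar - w * x_bar
    b = y_bar - w * x_bar
    return w, b


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
```

On this noise-free data the fit recovers the slope 2 and the intercept 1.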

Rewriting $w$ in vectorized form gives:

$$
\begin{aligned}
w &=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}} \\
&\qquad \frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}=\left(\frac{1}{m} \sum_{i=1}^{m} x_{i}\right) \sum_{i=1}^{m} x_{i}=\bar{x} \sum_{i=1}^{m} x_{i}=\sum_{i=1}^{m} x_{i} \bar{x} \\
&=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \bar{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}\right)} \\
&=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \bar{x}-y_{i} \bar{x}+y_{i} \bar{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+x_{i} \bar{x}\right)} \\
&\qquad \sum_{i=1}^{m} y_{i} \bar{x}=\bar{x} \sum_{i=1}^{m} y_{i}=\frac{1}{m} \sum_{i=1}^{m} x_{i} \sum_{i=1}^{m} y_{i}=\sum_{i=1}^{m} x_{i} \cdot \frac{1}{m} \cdot \sum_{i=1}^{m} y_{i}=\sum_{i=1}^{m} x_{i} \bar{y} \\
&\qquad \sum_{i=1}^{m} y_{i} \bar{x}=\bar{x} \sum_{i=1}^{m} y_{i}=\bar{x} \cdot m \cdot \frac{1}{m} \cdot \sum_{i=1}^{m} y_{i}=m \bar{x} \bar{y}=\sum_{i=1}^{m} \bar{x} \bar{y} \\
&\qquad \sum_{i=1}^{m} x_{i} \bar{x}=\bar{x} \sum_{i=1}^{m} x_{i}=\bar{x} \cdot m \cdot \frac{1}{m} \cdot \sum_{i=1}^{m} x_{i}=m \bar{x}^{2}=\sum_{i=1}^{m} \bar{x}^{2} \\
&=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \bar{x}-x_{i} \bar{y}+\bar{x} \bar{y}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+\bar{x}^{2}\right)} \\
&=\frac{\sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)^{2}} \\
&\qquad \boldsymbol{x}=\left(x_{1}, x_{2}, \cdots, x_{m}\right)^{T} \\
&\qquad \boldsymbol{y}=\left(y_{1}, y_{2}, \cdots, y_{m}\right)^{T} \\
&\qquad \boldsymbol{x}_{d}=\left(x_{1}-\bar{x}, x_{2}-\bar{x}, \cdots, x_{m}-\bar{x}\right)^{T} \\
&\qquad \boldsymbol{y}_{d}=\left(y_{1}-\bar{y}, y_{2}-\bar{y}, \cdots, y_{m}-\bar{y}\right)^{T} \\
&=\frac{\boldsymbol{x}_{d}^{T} \boldsymbol{y}_{d}}{\boldsymbol{x}_{d}^{T} \boldsymbol{x}_{d}}
\end{aligned}
$$
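The vectorized form says that the slope is a ratio of two inner products of mean-centered data. The sketch below compares it with the summation form of Eq. 3.7 on hypothetical data; plain lists stand in for vectors and `dot` is an illustrative helper (a numerical library would normally provide it).

```python
def dot(u, v):
    """Inner product u^T v of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def slope_centered(xs, ys):
    """w = (x_d^T y_d) / (x_d^T x_d), with x_d and y_d the mean-centered data."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    x_d = [x - x_bar for x in xs]
    y_d = [y - y_bar for y in ys]
    return dot(x_d, y_d) / dot(x_d, x_d)


def slope_sums(xs, ys):
    """The same slope written as in Eq. 3.7, for comparison."""
    m = len(xs)
    x_bar = sum(xs) / m
    numerator = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) - sum(xs) ** 2 / m
    return numerator / denominator


# Hypothetical noisy data.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.1, 3.9, 8.2, 13.8]
w_centered = slope_centered(xs, ys)
w_sums = slope_sums(xs, ys)
```

The two values agree up to rounding. The centered form is often preferred in practice because it avoids subtracting two large, nearly equal sums.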




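2. Multivariate linear regression

Outline of the derivation for the weight vector $\hat{\boldsymbol{w}}$:

  1. Derive the loss function $E_{\hat{\boldsymbol{w}}}$ by least squares

  2. Prove that the loss function $E_{\hat{\boldsymbol{w}}}$ is convex in $\hat{\boldsymbol{w}}$

  3. Take the first-order partial derivative of $E_{\hat{\boldsymbol{w}}}$ with respect to $\hat{\boldsymbol{w}}$

  4. Set the partial derivative to 0 and solve for $\hat{\boldsymbol{w}}^{*}$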


2.1. Combining $\boldsymbol{w}$ and $b$ into $\hat{\boldsymbol{w}}$

$$
\begin{aligned}
f\left(\boldsymbol{x}_{i}\right) &=\boldsymbol{w}^{T} \boldsymbol{x}_{i}+b \\
&=\begin{pmatrix} w_{1} & w_{2} & \dots & w_{d} \end{pmatrix}
\begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{id} \end{pmatrix}+b \\
&=w_{1} x_{i1}+w_{2} x_{i2}+\ldots+w_{d} x_{id}+b \\
&\qquad w_{d+1}=b \\
&=w_{1} x_{i1}+w_{2} x_{i2}+\ldots+w_{d} x_{id}+w_{d+1} \cdot 1 \\
&=\begin{pmatrix} w_{1} & w_{2} & \dots & w_{d} & w_{d+1} \end{pmatrix}
\begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{id} \\ 1 \end{pmatrix} \\
&=\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{i}
\end{aligned}
$$
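The bias-folding step amounts to appending $b$ to the weight vector and a constant 1 to each sample. A minimal sketch with hypothetical helper names and values:

```python
def predict(w, b, x):
    """f(x) = w^T x + b with the bias kept separate."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def augment_weights(w, b):
    """w_hat = (w_1, ..., w_d, b): the bias becomes the (d+1)-th weight."""
    return list(w) + [b]


def augment_sample(x):
    """x_hat = (x_1, ..., x_d, 1): a constant 1 multiplies the folded-in bias."""
    return list(x) + [1.0]


def predict_augmented(w_hat, x_hat):
    """f(x) = w_hat^T x_hat, with no separate bias term."""
    return sum(wj * xj for wj, xj in zip(w_hat, x_hat))


# Hypothetical weights, bias, and sample with d = 3.
w, b = [0.5, -2.0, 3.0], 0.25
x = [4.0, 1.0, 2.0]

plain = predict(w, b, x)
folded = predict_augmented(augment_weights(w, b), augment_sample(x))
```

Both calls evaluate 0.5·4 − 2·1 + 3·2 + 0.25 = 6.25, which is why the rest of the derivation can work with $\hat{\boldsymbol{w}}$ alone.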



2.2. Deriving the loss function $E_{\hat{\boldsymbol{w}}}$ by least squares

$$
\begin{aligned}
E_{\hat{\boldsymbol{w}}} &=\sum_{i=1}^{m}\left(y_{i}-f\left(\hat{\boldsymbol{x}}_{i}\right)\right)^{2} \\
&=\sum_{i=1}^{m}\left(y_{i}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{i}\right)^{2} \\
&\qquad \mathbf{X}=\begin{pmatrix}
x_{11} & x_{12} & \dots & x_{1d} & 1 \\
x_{21} & x_{22} & \dots & x_{2d} & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_{m1} & x_{m2} & \dots & x_{md} & 1
\end{pmatrix}
=\begin{pmatrix}
\boldsymbol{x}_{1}^{T} & 1 \\
\boldsymbol{x}_{2}^{T} & 1 \\
\vdots & \vdots \\
\boldsymbol{x}_{m}^{T} & 1
\end{pmatrix}
=\begin{pmatrix}
\hat{\boldsymbol{x}}_{1}^{T} \\ \hat{\boldsymbol{x}}_{2}^{T} \\ \vdots \\ \hat{\boldsymbol{x}}_{m}^{T}
\end{pmatrix} \\
&\qquad \boldsymbol{y}=\left(y_{1}, y_{2}, \cdots, y_{m}\right)^{T} \\
&=\left(y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1}\right)^{2}+\left(y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2}\right)^{2}+\cdots+\left(y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}\right)^{2} \\
&=\begin{pmatrix}
y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1} & y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2} & \cdots & y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}
\end{pmatrix}
\begin{pmatrix}
y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1} \\ y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2} \\ \vdots \\ y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}
\end{pmatrix} \\
&\qquad \begin{pmatrix}
y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1} \\ y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2} \\ \vdots \\ y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}
\end{pmatrix}
=\begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{pmatrix}
-\begin{pmatrix}
\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1} \\ \hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2} \\ \vdots \\ \hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}
\end{pmatrix}
=\begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{pmatrix}
-\begin{pmatrix}
\hat{\boldsymbol{x}}_{1}^{T} \hat{\boldsymbol{w}} \\ \hat{\boldsymbol{x}}_{2}^{T} \hat{\boldsymbol{w}} \\ \vdots \\ \hat{\boldsymbol{x}}_{m}^{T} \hat{\boldsymbol{w}}
\end{pmatrix}
=\begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{pmatrix}
-\begin{pmatrix}
\hat{\boldsymbol{x}}_{1}^{T} \\ \hat{\boldsymbol{x}}_{2}^{T} \\ \vdots \\ \hat{\boldsymbol{x}}_{m}^{T}
\end{pmatrix} \hat{\boldsymbol{w}}
=\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}} \\
&\qquad \begin{pmatrix}
y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1} & y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2} & \cdots & y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}
\end{pmatrix}
=\begin{pmatrix}
y_{1}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{1} \\ y_{2}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{2} \\ \vdots \\ y_{m}-\hat{\boldsymbol{w}}^{T} \hat{\boldsymbol{x}}_{m}
\end{pmatrix}^{T}
=\left(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}}\right)^{T} \\
&=\left(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}}\right)^{T}\left(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}}\right)
\end{aligned}
$$
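The equality between the per-sample sum and the matrix form $(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol{w}})$ can be illustrated as follows. This is a sketch using nested lists in place of matrices; the helper names and the data are assumptions.

```python
def matvec(X, w_hat):
    """Matrix-vector product X w_hat, one inner product per row of X."""
    return [sum(xij * wj for xij, wj in zip(row, w_hat)) for row in X]


def loss_matrix_form(X, y, w_hat):
    """E = (y - X w_hat)^T (y - X w_hat)."""
    residual = [yi - pi for yi, pi in zip(y, matvec(X, w_hat))]
    return sum(r * r for r in residual)


def loss_sum_form(samples, y, w, b):
    """E = sum_i (y_i - (w^T x_i + b))^2, with the bias kept separate."""
    total = 0.0
    for x, yi in zip(samples, y):
        prediction = sum(wj * xj for wj, xj in zip(w, x)) + b
        total += (yi - prediction) ** 2
    return total


# Hypothetical data: m = 3 samples with d = 2 features each.
samples = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
y = [1.0, 2.0, 0.0]
w, b = [1.0, -1.0], 0.5

X = [row + [1.0] for row in samples]  # append the constant-1 column
w_hat = w + [b]                       # fold the bias into the weight vector

e_matrix = loss_matrix_form(X, y, w_hat)
e_sum = loss_sum_form(samples, y, w, b)
```

The residuals here are 1.5, 2.5, and −4.5, so both forms give 2.25 + 6.25 + 20.25 = 28.75.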
