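1. One-Dimensional Linear Regression
The classic way to solve one-dimensional linear regression is the method of least squares.
Problem statement: given a dataset $D=\left \{ \left ( x_{1},y_{1} \right ),\left ( x_{2},y_{2} \right ),\cdots ,\left ( x_{m},y_{m} \right ) \right \}$, one-dimensional linear regression seeks a function $f\left ( x_{i} \right )=wx_{i}+b$ whose values are as close as possible to the corresponding $y_{i}$.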
Loss function: $$L\left ( w,b \right )=\sum_{i=1}^{m}\left [ f\left ( x_{i} \right )- y_{i} \right ]^{2}$$
Objective: $$\left ( w^{*},b^{*} \right )=\underset{w,b}{\arg\min}\sum_{i=1}^{m}\left [ f\left ( x_{i} \right )- y_{i} \right ]^{2}=\underset{w,b}{\arg\min}\sum_{i=1}^{m}\left (y_{i}- wx_{i}-b \right )^{2} $$
Minimizing the loss is straightforward. $L$ is a convex quadratic in $w$ and $b$, so its minimum is where both partial derivatives vanish:
$$\begin{aligned}\frac{\partial L\left ( w,b \right ) }{\partial w}&=2\sum_{i=1}^{m}\left (y_{i}- wx_{i}-b \right )\left ( - x_{i}\right )\\&=2\sum_{i=1}^{m}\left [ wx_{i}^{2} -\left ( y_{i}-b \right )x_{i}\right ]=2\left ( w\sum_{i=1}^{m}x_{i}^{2}- \sum_{i=1}^{m}\left ( y_{i}-b \right )x_{i}\right )=0\end{aligned}$$
$$\frac{\partial L\left ( w,b \right ) }{\partial b}= 2 \sum_{i=1}^{m}\left (wx_{i}+b -y_{i}\right )=2\left ( mb- \sum_{i=1}^{m}\left ( y_{i}-wx_{i} \right )\right )=0$$
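The two partial derivatives can be sanity-checked numerically. The sketch below compares them with central finite differences in plain Python. The function names (`loss`, `grad`, `numeric_grad`) and the sample data are illustrative assumptions and do not come from the original derivation.

```python
# Plain-Python check of the partial derivatives of L(w, b) = sum_i (w*x_i + b - y_i)^2.

def loss(w, b, xs, ys):
    """Sum of squared errors, matching the loss defined above."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))


def grad(w, b, xs, ys):
    """Analytic dL/dw and dL/db, written exactly as in the two formulas above."""
    m = len(xs)
    dw = 2 * (w * sum(x * x for x in xs) - sum((y - b) * x for x, y in zip(xs, ys)))
    db = 2 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))
    return dw, db


def numeric_grad(w, b, xs, ys, h=1e-6):
    """Central finite differences, used only to cross-check grad()."""
    dw = (loss(w + h, b, xs, ys) - loss(w - h, b, xs, ys)) / (2 * h)
    db = (loss(w, b + h, xs, ys) - loss(w, b - h, xs, ys)) / (2 * h)
    return dw, db


xs = [1.0, 2.0, 3.0, 4.0]  # illustrative data, not from the original text
ys = [2.9, 5.1, 7.2, 8.8]
print(grad(0.5, 0.3, xs, ys))
print(numeric_grad(0.5, 0.3, xs, ys))
```

The loss is quadratic, so central differences match the analytic derivatives up to floating-point rounding at any point $(w, b)$.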
Setting both derivatives to zero, the condition $\partial L/\partial b=0$ gives:
$$ b= \frac{1}{m}\sum_{i=1}^{m}\left ( y_{i}-wx_{i} \right ) $$
Substituting this $b$ into the condition $\partial L/\partial w=0$, that is, into
$$w\sum_{i=1}^{m}x_{i}^{2}-\sum_{i=1}^{m}\left ( y_{i}-b \right )x_{i}=0,$$
gives
$$w\sum_{i=1}^{m}x_{i}^{2}-\sum_{i=1}^{m}y_{i}x_{i}+ \frac{1}{m}\sum_{i=1}^{m}\left ( y_{i}-wx_{i} \right )\sum_{i=1}^{m}x_{i}=0$$
$$w\sum_{i=1}^{m}x_{i}^{2}-\sum_{i=1}^{m}y_{i}x_{i}+\sum_{i=1}^{m}y_{i}\bar{x}-\frac{w}{m}\left ( \sum_{i=1}^{m}x_{i} \right )^{2}=0$$
$$w\left [ \sum_{i=1}^{m}x_{i}^{2} -\frac{1}{m}\left ( \sum_{i=1}^{m}x_{i} \right )^{2}\right ]=\sum_{i=1}^{m}y_{i}\left ( x_{i}-\bar{x} \right )$$
$$w=\frac{\sum_{i=1}^{m}y_{i}\left ( x_{i}-\bar{x} \right )}{ \sum_{i=1}^{m}x_{i}^{2} -\frac{1}{m}\left ( \sum_{i=1}^{m}x_{i} \right )^{2}}$$
Here $\bar{x}=\frac{1}{m}\sum_{i=1}^{m}x_{i}$ is the mean of the $x_{i}$. The denominator equals $\sum_{i=1}^{m}\left ( x_{i}-\bar{x} \right )^{2}$, so it is nonzero unless all $x_{i}$ are equal. Once $w$ is known, the expression for $b$ above simplifies to $b=\bar{y}-w\bar{x}$, where $\bar{y}=\frac{1}{m}\sum_{i=1}^{m}y_{i}$.
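Below is a minimal plain-Python sketch of the closed-form solution. It assumes the data arrive as two equal-length lists, and the name `fit_1d` is hypothetical. It computes $w$ from the formula above and then $b=\frac{1}{m}\sum_{i}(y_i-wx_i)$.

```python
def fit_1d(xs, ys):
    """Closed-form least squares for f(x) = w*x + b (sketch, minimal validation)."""
    m = len(xs)
    x_bar = sum(xs) / m
    # Numerator: sum_i y_i * (x_i - x_bar)
    num = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    # Denominator: sum_i x_i^2 - (sum_i x_i)^2 / m, which equals sum_i (x_i - x_bar)^2
    den = sum(x * x for x in xs) - sum(xs) ** 2 / m
    if den == 0:
        raise ValueError("all x_i are equal, so w is not determined")
    w = num / den
    # b = (1/m) * sum_i (y_i - w * x_i)
    b = sum(y - w * x for x, y in zip(xs, ys)) / m
    return w, b


print(fit_1d([0, 1, 2, 3], [1, 3, 5, 7]))  # points on y = 2x + 1 -> (2.0, 1.0)
```

On noisy data the returned $(w, b)$ still satisfies both stationarity conditions: the residuals $wx_i+b-y_i$ sum to zero and are orthogonal to the $x_i$.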
2. Multivariate Linear Regression
Suppose each sample $\mathbf{x}_{i}$ has $d$ attributes, i.e.
$$\mathbf{x}_{i} = \begin{bmatrix}
x_{i}^{\left(1\right )}\\
x_{i}^{\left(2\right )}\\
\vdots \\
x_{i}^{\left(d\right )}
\end{bmatrix}$$
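We want to learn a regression function $f\left(\mathbf{x}_{i}\right)=\mathbf{w}^{T}\mathbf{x}_{i}+b$ whose values are as close as possible to the $y_{i}$.
The loss again takes the squared-error form, so $\mathbf{w}$ and $b$ can likewise be estimated by least squares. For convenience, we absorb $b$ into the weight vector and append a constant 1 to every sample. Reusing the symbol $\mathbf{w}$ for the augmented $(d+1)$-dimensional vector, we define: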
$$\mathbf{w }= \begin{bmatrix}
w_{1}\\
w_{2}\\
\vdots\\
w_{d}\\
b
\end{bmatrix}$$
$$X=\begin{bmatrix}
x_{1}^{\left ( 1 \right )} & x_{1}^{\left ( 2 \right )}& \cdots & x_{1}^{\left ( d \right )} &1 \\
x_{2}^{\left ( 1 \right )} & x_{2}^{\left ( 2 \right )}& \cdots & x_{2}^{\left ( d \right )} &1 \\
\vdots & \vdots & \ddots & \vdots & \vdots\\
x_{m}^{\left ( 1 \right )} & x_{m}^{\left ( 2 \right )}& \cdots & x_{m}^{\left ( d \right )} &1
\end{bmatrix}$$
$$X\mathbf{w}=\begin{bmatrix}
f\left ( \mathbf{x}_{1} \right )\\
f\left ( \mathbf{x}_{2} \right )\\
\vdots\\
f\left ( \mathbf{x}_{m} \right )
\end{bmatrix}$$
$$\mathbf{Y}=\begin{bmatrix}
y_{1}\\
y_{2}\\
\vdots\\
y_{m}
\end{bmatrix}$$
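The sketch below makes the augmented notation concrete. It builds the rows of $X$ by appending a 1 to each sample, then computes $X\mathbf{w}$ with plain lists. The names `augment` and `predict` are illustrative, and the numbers are made up.

```python
def augment(samples):
    """Append a constant 1 to each d-dimensional sample, giving the rows of X."""
    return [list(x) + [1.0] for x in samples]


def predict(X, w):
    """Compute Xw: one prediction f(x_i) = w^T x_i + b per row of X."""
    return [sum(xij * wj for xij, wj in zip(row, w)) for row in X]


samples = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]  # m = 3 samples, d = 2 attributes
w = [0.5, -2.0, 4.0]                             # augmented vector [w_1, w_2, b]
X = augment(samples)
print(predict(X, w))  # -> [0.5, 2.0, 7.5]
```

Because the last entry of $\mathbf{w}$ multiplies the constant column, each prediction equals $w_1x^{(1)}+w_2x^{(2)}+b$.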
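3. Deriving the Multivariate Linear Regression Solution
Before the derivation, we list some trace and matrix-derivative identities that it uses.
$\mathbf{z}^{T}\mathbf{z}=\sum_{i}z_{i}^{2}$ for a column vector $\mathbf{z}$.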
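For matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ whose products below are defined and square, with $tr$ denoting the trace:
$$\begin{gathered}
tr\left ( \mathbf{AB }\right )=tr\left ( \mathbf{BA} \right )\\
tr\left ( \mathbf{ABC }\right )=tr\left ( \mathbf{CAB} \right )=tr\left ( \mathbf{BCA }\right )
\end{gathered}$$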
If $f\left ( \mathbf{A} \right )=tr\left ( \mathbf{AB} \right )$, then $\bigtriangledown _{A} tr\left ( \mathbf{AB }\right )=\mathbf{B}^{T}$.
$$\begin{gathered}
tr\left (\mathbf{A} \right )=tr\left ( \mathbf{A}^{T} \right )\\
\text{if}\quad a\in \mathbb{R},\quad tr\left ( a \right )=a
\end{gathered}$$
$$\bigtriangledown _{A}tr\left ( \mathbf{ABA^{T}C} \right )=\mathbf{CAB}+\mathbf{C^{T}AB^{T}}$$
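The identities above can be spot-checked numerically. The plain-Python sketch below uses random matrices from a seeded `random.Random` and estimates gradients with central finite differences. All helper names are hypothetical. Agreement on random inputs is only a sanity check, not a proof.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def numeric_grad(f, A, h=1e-6):
    """Central-difference gradient of a scalar function f with respect to each entry of A."""
    G = []
    for i in range(len(A)):
        row = []
        for j in range(len(A[0])):
            Ap = [r[:] for r in A]
            Am = [r[:] for r in A]
            Ap[i][j] += h
            Am[i][j] -= h
            row.append((f(Ap) - f(Am)) / (2 * h))
        G.append(row)
    return G


def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


rng = random.Random(0)

# tr(AB) = tr(BA), with A: 3x4 and B: 4x3
A, B = rand_matrix(3, 4, rng), rand_matrix(4, 3, rng)
err_cyclic2 = abs(trace(matmul(A, B)) - trace(matmul(B, A)))

# tr(ABC) = tr(CAB) = tr(BCA), with B2: 4x2 and C2: 2x3
B2, C2 = rand_matrix(4, 2, rng), rand_matrix(2, 3, rng)
t_abc = trace(matmul(matmul(A, B2), C2))
t_cab = trace(matmul(matmul(C2, A), B2))
t_bca = trace(matmul(matmul(B2, C2), A))
err_cyclic3 = max(abs(t_abc - t_cab), abs(t_abc - t_bca))

# grad_A tr(AB) = B^T
g_lin = numeric_grad(lambda M: trace(matmul(M, B)), A)
err_grad_lin = max_abs_diff(g_lin, transpose(B))

# grad_A tr(A B A^T C) = CAB + C^T A B^T, with A3: 3x2, B3: 2x2, C3: 3x3
A3, B3, C3 = rand_matrix(3, 2, rng), rand_matrix(2, 2, rng), rand_matrix(3, 3, rng)
g_quad = numeric_grad(lambda M: trace(matmul(matmul(matmul(M, B3), transpose(M)), C3)), A3)
expected = add(matmul(matmul(C3, A3), B3), matmul(matmul(transpose(C3), A3), transpose(B3)))
err_grad_quad = max_abs_diff(g_quad, expected)

print(err_cyclic2, err_cyclic3, err_grad_lin, err_grad_quad)
```

Both traced functions are at most quadratic in $\mathbf{A}$, so the printed errors should be at the level of floating-point rounding.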
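From the problem setup, the loss is $L\left (\mathbf{ w} \right )=\frac{1}{2}\left ( X\mathbf{w}-\mathbf{Y} \right )^{T}\left ( X\mathbf{w}-\mathbf{Y} \right )$. The derivation below takes four steps. First, each term is a scalar, so it may be replaced by its trace. Second, $\mathbf{Y}^{T}\mathbf{Y}$ does not depend on $\mathbf{w}$ and drops out. Third, $tr\left ( \mathbf{w}^{T}X^{T}\mathbf{Y} \right )=tr\left ( \mathbf{Y}^{T}X\mathbf{w} \right )$ by the transpose rule. Fourth, the first term is rewritten cyclically as $tr\left ( \mathbf{w}I\mathbf{w}^{T}X^{T}X \right )$, so the last identity applies with $\mathbf{A}=\mathbf{w}$, $\mathbf{B}=I$, $\mathbf{C}=X^{T}X$. Similarly, $\bigtriangledown _{w}tr\left ( \mathbf{Y}^{T}X\mathbf{w} \right )=\bigtriangledown _{w}tr\left ( \mathbf{w}\mathbf{Y}^{T}X \right )=X^{T}\mathbf{Y}$: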
$$\begin{aligned}
\bigtriangledown _{w}L\left (\mathbf{ w} \right )&=\frac{1}{2}\bigtriangledown _{w}\left ( X\mathbf{w}-\mathbf{Y} \right )^{T}\left ( X\mathbf{w}-\mathbf{Y} \right )\\
&=\frac{1}{2}\bigtriangledown _{w}\left ( \mathbf{w^{T}}X^{T}-\mathbf{Y^{T}} \right )\left ( X\mathbf{w}-\mathbf{Y} \right )\\
&=\frac{1}{2}\bigtriangledown _{w}\left ( \mathbf{w^{T}}X^{T}X\mathbf{w}- \mathbf{w^{T}}X^{T}\mathbf{Y}-\mathbf{Y}^{T}X\mathbf{w}+\mathbf{Y}^{T}\mathbf{Y}\right ) \\
&=\frac{1}{2}\bigtriangledown _{w}tr\left ( \mathbf{w^{T}}X^{T}X\mathbf{w}- \mathbf{w^{T}}X^{T}\mathbf{Y}-\mathbf{Y}^{T}X\mathbf{w}+\mathbf{Y}^{T}\mathbf{Y}\right )\\
&=\frac{1}{2}\bigtriangledown _{w}tr\left ( \mathbf{w^{T}}X^{T}X\mathbf{w}- \mathbf{w^{T}}X^{T}\mathbf{Y}-\mathbf{Y}^{T}X\mathbf{w}\right )\\
&=\frac{1}{2}\left [ \bigtriangledown _{w}tr \left ( \mathbf{w^{T}}X^{T}X\mathbf{w} \right )-\bigtriangledown _{w}tr \left ( \mathbf{w^{T}}X^{T}\mathbf{Y} \right )-\bigtriangledown _{w}tr \left ( \mathbf{Y}^{T}X\mathbf{w} \right )\right ] \\
&=\frac{1}{2}\left [ \bigtriangledown _{w}tr \left ( \mathbf{w}I\mathbf{w^{T}}X^{T}X \right )-\bigtriangledown _{w}tr \left ( \mathbf{Y}^{T}X\mathbf{w } \right )-\bigtriangledown _{w}tr \left (\mathbf{Y}^{T}X\mathbf{w} \right )\right ]\\
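&= \frac{1}{2}\left [ X^{T}X\mathbf{w}+ X^{T}X\mathbf{w} - X^{T}\mathbf{Y} - X^{T}\mathbf{Y}\right ]\\
&=X^{T}\left ( X\mathbf{w}-\mathbf{Y} \right )
\end{aligned}$$
Setting this gradient to zero gives the normal equations $X^{T}X\mathbf{w}=X^{T}\mathbf{Y}$.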
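The sketch below solves the normal equations $X^{T}X\mathbf{w}=X^{T}\mathbf{Y}$ directly in plain Python. It uses Gaussian elimination with partial pivoting instead of forming $(X^{T}X)^{-1}$ explicitly, the usual numerically preferable choice. The function names and the singularity tolerance `1e-12` are illustrative assumptions.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def solve(M, v, tol=1e-12):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("X^T X is singular: X lacks full column rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def fit_normal_equations(samples, ys):
    """Return the augmented vector [w_1, ..., w_d, b] solving X^T X w = X^T Y."""
    X = [list(x) + [1.0] for x in samples]  # append the constant-1 column
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(a * y for a, y in zip(row, ys)) for row in Xt]
    return solve(XtX, XtY)


samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
ys = [1.5 * a - 2.0 * c + 0.5 for a, c in samples]  # made-up data on an exact plane
print(fit_normal_equations(samples, ys))
```

On data that lie exactly on a plane the sketch recovers its coefficients. On noisy data the fitted $\mathbf{w}$ makes the gradient $X^{T}(X\mathbf{w}-\mathbf{Y})$ vanish up to rounding.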
Notes:
- The factor 1/2 in front of the loss is added only to simplify differentiation. It does not change the optimal $\mathbf{w}$ and $b$.
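- Solving the equation above requires inverting $X^{T}X$, so $X^{T}X$ must be full rank. Because $X^{T}X$ is always positive semidefinite, this is the same as requiring it to be positive definite, and it holds exactly when $X$ has full column rank. In that case $\mathbf{w}=\left ( X^{T}X \right )^{-1}X^{T}\mathbf{Y}$.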
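- In real tasks $X^{T}X$ is often not full rank, for example when there are more attributes than samples. In that case we can solve the problem with gradient descent instead.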
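Gradient descent needs only the gradient $X^{T}(X\mathbf{w}-\mathbf{Y})$ derived above, never the inverse of $X^{T}X$. Below is a minimal batch gradient descent sketch in plain Python. The learning rate `0.01` and the `5000` steps are illustrative assumptions that suit this small example, not general recommendations. For this quadratic loss the iteration converges when the learning rate is below $2/\lambda_{\max}(X^{T}X)$.

```python
def gradient_descent(samples, ys, lr=0.01, steps=5000):
    """Minimize L(w) = 1/2 (Xw - Y)^T (Xw - Y) by batch gradient descent (sketch)."""
    X = [list(x) + [1.0] for x in samples]  # augmented rows [x_i, 1]
    w = [0.0] * len(X[0])                   # start from w = 0
    for _ in range(steps):
        # residual r = Xw - Y
        r = [sum(a * wj for a, wj in zip(row, w)) - y for row, y in zip(X, ys)]
        # gradient X^T r, as derived above
        g = [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w


samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
ys = [1.5 * a - 2.0 * c + 0.5 for a, c in samples]  # made-up data on an exact plane
print(gradient_descent(samples, ys))
```

This sketch starts from $\mathbf{w}=\mathbf{0}$, so the iterates stay in the row space of $X$. In the rank-deficient case they therefore converge to the minimum-norm least-squares solution.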