Least Squares and Weighted Least Squares for Line Fitting

This article consolidates material from several references on the least squares method; if you spot an error, corrections are welcome!
Original source 1
Original source 2

Linear Regression

Linear regression assumes a linear relationship between the feature and the target in the dataset:

$$y = mx + c$$

where y is the target, x is the feature, and m and c are coefficients.

We need to find m and c such that the prediction mx + c is as close as possible to the true y. The squared difference is used to measure the error between the estimate and the true value (a plain difference could be negative); the function measuring this error is called the square loss function, denoted L:

$$L_n = (y_n-(mx_n+c))^2$$

The average loss over the whole dataset is:

$$L=\frac{1}{N}\sum_{n=1}^{N}L_n(y_n,f(x_n;c,m))$$

We want the m and c that minimize L, which can be written as:

$$\mathop{\arg\min}\limits_{m,c}\ \frac{1}{N}\sum_{n=1}^{N}L_n(y_n;c,m)$$

The method of least squares finds the optimum of such an objective by minimizing the sum of squared errors, hence the name. Below it is used to derive the optimal linear regression solution.

The Least Squares Method

The dataset consists of N points {x, y}, indexed n = 1…N, where x is the feature and y the observed result. The linear regression model is defined as:

$$f(x;m,c)=mx+c$$

The average loss is:

$$\begin{aligned} L &=\frac{1}{N}\sum_{n=1}^{N}L_n(y_n,f(x_n;c,m))\\ &=\frac{1}{N}\sum_{n=1}^{N}(y_n-f(x_n;c,m))^2\\ &=\frac{1}{N}\sum_{n=1}^{N}(y_n-(c+mx_n))^2\\ &=\frac{1}{N}\sum_{n=1}^{N}(y_n-c-mx_n)(y_n-c-mx_n)\\ &=\frac{1}{N}\sum_{n=1}^{N}(y_n^2-2y_nc-2y_nmx_n+c^2+2cmx_n+m^2x_n^2)\\ &=\frac{1}{N}\sum_{n=1}^{N}(y_n^2-2y_nc+2mx_n(c-y_n)+c^2+m^2x_n^2) \end{aligned}$$

At the minimum of L, its partial derivatives with respect to c and m are zero; so we take the partial derivatives, set them to zero, and solve for c and m. The resulting c and m are the best-fitting parameters.

Partial derivative with respect to c:

Since we differentiate with respect to c, drop the terms of L that do not contain c:

$$\frac{1}{N}\sum_{n=1}^{N}(c^2-2y_nc+2cmx_n)$$

Moving the factors that do not depend on the index n outside the sums gives:

$$c^2+2cm\frac{1}{N}\sum_{n=1}^{N}x_n-2c\frac{1}{N}\sum_{n=1}^{N}y_n$$

Differentiating with respect to c:

$$\frac{\partial L}{\partial c}=2c+2m\frac{1}{N}\sum_{n=1}^{N}x_n-\frac{2}{N}\sum_{n=1}^{N}y_n$$

Partial derivative with respect to m:

Since we differentiate with respect to m, drop the terms of L that do not contain m:

$$\frac{1}{N}\sum_{n=1}^{N}(m^2x_n^2-2y_nmx_n+2cmx_n)$$

Moving the factors that do not depend on the index n outside the sums gives:

$$m^2\frac{1}{N}\sum_{n=1}^{N}x_n^2+2m\frac{1}{N}\sum_{n=1}^{N}x_n(c-y_n)$$

Differentiating with respect to m:

$$\frac{\partial L}{\partial m}=2m\frac{1}{N}\sum_{n=1}^{N}x_n^2+\frac{2}{N}\sum_{n=1}^{N}x_n(c-y_n)$$

Solving for m and c

Set the partial derivative with respect to c to zero and solve:

$$2c+2m\frac{1}{N}\sum_{n=1}^{N}x_n-\frac{2}{N}\sum_{n=1}^{N}y_n=0$$

$$2c=\frac{2}{N}\sum_{n=1}^{N}y_n-2m\frac{1}{N}\sum_{n=1}^{N}x_n$$

$$c=\frac{1}{N}\sum_{n=1}^{N}y_n-m\frac{1}{N}\sum_{n=1}^{N}x_n$$

Two averages appear in this solution:

$$\overline{x}=\frac{1}{N}\sum_{n=1}^{N}x_n,\qquad \overline{y}=\frac{1}{N}\sum_{n=1}^{N}y_n$$

so:

$$c=\overline{y}-m\overline{x}$$

Set the partial derivative with respect to m to zero and solve:

$$2m\frac{1}{N}\sum_{n=1}^{N}x_n^2+\frac{2}{N}\sum_{n=1}^{N}x_n(c-y_n)=0$$

Substituting c in terms of the averages gives:

$$m\frac{1}{N}\sum_{n=1}^{N}x_n^2+\frac{1}{N}\sum_{n=1}^{N}x_n(\overline{y}-m\overline{x}-y_n)=0$$

$$m\left(\frac{1}{N}\sum_{n=1}^{N}x_n^2-\frac{1}{N}\overline{x}\sum_{n=1}^{N}x_n\right)=\frac{1}{N}\sum_{n=1}^{N}(x_ny_n-x_n\overline{y})$$

Letting:

$$\overline{x^2}=\frac{1}{N}\sum_{n=1}^{N}x_n^2,\qquad \overline{xy}=\frac{1}{N}\sum_{n=1}^{N}x_ny_n$$

we obtain:

$$m=\frac{\overline{xy}-\overline{x}\,\overline{y}}{\overline{x^2}-\overline{x}^2}$$

Both m and c are now determined.
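The closed-form solution above is easy to check numerically. The following NumPy sketch (the function name `fit_line` is my own choice, not from the original) computes m and c from exactly the averages derived above:

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = m*x + c via the averages
    derived above:
        m = (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)
        c = mean(y) - m*mean(x)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    m = (np.mean(x * y) - x_bar * y_bar) / (np.mean(x ** 2) - x_bar ** 2)
    c = y_bar - m * x_bar
    return m, c

# Points lying exactly on y = 2x + 1 should be recovered exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
m, c = fit_line(x, y)
print(m, c)  # → 2.0 1.0
```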

Weighted Least Squares

Ordinary least squares, as derived above, treats every data point in a time series as equally important, but in fact different points should not influence the future equally: recent data generally carry more information about the future than older data. A more reasonable approach is therefore to weight the data, assigning larger weights to recent points and smaller weights to older ones. Weighted least squares uses exponential weights $W_n$ (with $0 < W < 1$), so the per-point loss becomes:

$$L_n = W_n(y_n-(mx_n+c))^2$$

$$L=\frac{1}{N}\sum_{n=1}^{N}L_n(y_n,f(x_n;c,m))$$

$$\mathop{\arg\min}\limits_{m,c}\ \frac{1}{N}\sum_{n=1}^{N}L_n(y_n;c,m)=\mathop{\arg\min}\limits_{m,c}\ \frac{1}{N}\sum_{n=1}^{N}W_n(y_n-(mx_n+c))^2$$

As before, the average loss expands to:

$$\begin{aligned} L &=\frac{1}{N}\sum_{n=1}^{N}L_n(y_n,f(x_n;c,m))\\ &=\frac{1}{N}\sum_{n=1}^{N}W_n(y_n-f(x_n;c,m))^2\\ &=\frac{1}{N}\sum_{n=1}^{N}W_n(y_n-(c+mx_n))^2\\ &=\frac{1}{N}\sum_{n=1}^{N}W_n(y_n-c-mx_n)(y_n-c-mx_n)\\ &=\frac{1}{N}\sum_{n=1}^{N}W_n(y_n^2-2y_nc-2y_nmx_n+c^2+2cmx_n+m^2x_n^2)\\ &=\frac{1}{N}\sum_{n=1}^{N}W_n(y_n^2-2y_nc+2mx_n(c-y_n)+c^2+m^2x_n^2) \end{aligned}$$

Again, at the minimum of L its partial derivatives with respect to c and m are zero; so we take the partial derivatives, set them to zero, and solve for c and m.

Partial derivative with respect to c:

Dropping the terms of L that do not contain c:

$$\frac{1}{N}\sum_{n=1}^{N}W_n(c^2-2y_nc+2cmx_n)$$

Moving the factors that do not depend on the index n outside the sums gives:

$$c^2\frac{1}{N}\sum_{n=1}^{N}W_n+2cm\frac{1}{N}\sum_{n=1}^{N}W_nx_n-2c\frac{1}{N}\sum_{n=1}^{N}W_ny_n$$

Differentiating with respect to c:

$$\frac{\partial L}{\partial c}=2c\frac{1}{N}\sum_{n=1}^{N}W_n+2m\frac{1}{N}\sum_{n=1}^{N}W_nx_n-\frac{2}{N}\sum_{n=1}^{N}W_ny_n$$

Partial derivative with respect to m:

Dropping the terms of L that do not contain m:

$$\frac{1}{N}\sum_{n=1}^{N}W_n(m^2x_n^2-2y_nmx_n+2cmx_n)$$

Moving the factors that do not depend on the index n outside the sums gives:

$$m^2\frac{1}{N}\sum_{n=1}^{N}W_nx_n^2+2m\frac{1}{N}\sum_{n=1}^{N}W_nx_n(c-y_n)$$

Differentiating with respect to m:

$$\frac{\partial L}{\partial m}=2m\frac{1}{N}\sum_{n=1}^{N}W_nx_n^2+\frac{2}{N}\sum_{n=1}^{N}W_nx_n(c-y_n)$$

Solving for m and c

Set the partial derivative with respect to c to zero and solve:

$$2c\frac{1}{N}\sum_{n=1}^{N}W_n+2m\frac{1}{N}\sum_{n=1}^{N}W_nx_n-\frac{2}{N}\sum_{n=1}^{N}W_ny_n=0$$

$$2c=\frac{\frac{2}{N}\sum_{n=1}^{N}W_ny_n-2m\frac{1}{N}\sum_{n=1}^{N}W_nx_n}{\frac{1}{N}\sum_{n=1}^{N}W_n}$$

$$c=\frac{\frac{1}{N}\sum_{n=1}^{N}W_ny_n-m\frac{1}{N}\sum_{n=1}^{N}W_nx_n}{\frac{1}{N}\sum_{n=1}^{N}W_n}$$

Set the partial derivative with respect to m to zero and solve:

$$2m\frac{1}{N}\sum_{n=1}^{N}W_nx_n^2+\frac{2}{N}\sum_{n=1}^{N}W_nx_n(c-y_n)=0$$

Substituting the expression for c gives:

$$2m\frac{1}{N}\sum_{n=1}^{N}W_nx_n^2+\frac{2}{N}\sum_{n=1}^{N}W_nx_n\left(\frac{\frac{1}{N}\sum_{n=1}^{N}W_ny_n-m\frac{1}{N}\sum_{n=1}^{N}W_nx_n}{\frac{1}{N}\sum_{n=1}^{N}W_n}-y_n\right)=0$$

$$m=\frac{\left(\frac{1}{N}\sum_{n=1}^{N}W_nx_ny_n\right)\left(\frac{1}{N}\sum_{n=1}^{N}W_n\right)-\left(\frac{1}{N}\sum_{n=1}^{N}W_nx_n\right)\left(\frac{1}{N}\sum_{n=1}^{N}W_ny_n\right)}{\left(\frac{1}{N}\sum_{n=1}^{N}W_nx_n^2\right)\left(\frac{1}{N}\sum_{n=1}^{N}W_n\right)-\left(\frac{1}{N}\sum_{n=1}^{N}W_nx_n\right)^2}$$

Both m and c are now determined.
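A minimal sketch of the weighted closed form (the helper name `fit_line_weighted` and the decay factor 0.8 are assumptions for illustration, not from the original):

```python
import numpy as np

def fit_line_weighted(x, y, w):
    """Weighted least squares fit of y = m*x + c using the closed form
    derived above, with S_* denoting (1/N)-scaled weighted sums:
        m = (S_w*S_xy - S_x*S_y) / (S_w*S_xx - S_x^2)
        c = (S_y - m*S_x) / S_w
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    s_w, s_x, s_y = np.mean(w), np.mean(w * x), np.mean(w * y)
    s_xx, s_xy = np.mean(w * x ** 2), np.mean(w * x * y)
    m = (s_w * s_xy - s_x * s_y) / (s_w * s_xx - s_x ** 2)
    c = (s_y - m * s_x) / s_w
    return m, c

# Exponential weights 0 < W < 1: later (more recent) points get larger weights.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x - 2.0
w = 0.8 ** np.arange(len(x))[::-1]   # assumed decay factor 0.8
m, c = fit_line_weighted(x, y, w)
print(m, c)  # close to 3.0 and -2.0, since the data are exactly linear
```

Because the example data lie exactly on a line, any positive weights recover the same m and c; the weights only matter once the data are noisy.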

Matrix Derivation

The trace of an n×n matrix A is the sum of the entries on its main diagonal, written tr(A):

$$tr(A)=\sum_{i=1}^{n}a_{ii}$$

  • Theorem 1: tr(AB) = tr(BA)

Proof:

$$tr(AB)=\sum_{i=1}^{n}(AB)_{ii}=\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ij}b_{ji}=\sum_{j=1}^{m}\sum_{i=1}^{n}b_{ji}a_{ij}=\sum_{j=1}^{m}(BA)_{jj}=tr(BA)$$

  • Theorem 2:

$$tr(ABC)=tr(CAB)=tr(BCA)$$

  • Theorem 3:

$$\frac{\partial tr(AB)}{\partial A}=\frac{\partial tr(BA)}{\partial A}=B^T$$

where A is an m×n matrix and B is an n×m matrix:

$$tr(AB)=tr\left(\begin{matrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{matrix}\right)\left(\begin{matrix}b_{11}&b_{12}&\cdots&b_{1m}\\b_{21}&b_{22}&\cdots&b_{2m}\\\vdots&\vdots&\ddots&\vdots\\b_{n1}&b_{n2}&\cdots&b_{nm}\end{matrix}\right)$$

Considering only the diagonal entries of the product:

$$tr(AB)=\sum_{i=1}^{n}a_{1i}b_{i1}+\sum_{i=1}^{n}a_{2i}b_{i2}+\ldots+\sum_{i=1}^{n}a_{mi}b_{im}=\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}b_{ji}$$

$$\frac{\partial tr(AB)}{\partial a_{ij}}=b_{ji}\Rightarrow \frac{\partial tr(AB)}{\partial A}=B^T$$

  • Theorem 4:

$$\frac{\partial tr(A^TB)}{\partial A}=\frac{\partial tr(BA^T)}{\partial A}=B$$

Proof:

$$\frac{\partial tr(A^TB)}{\partial A}=\frac{\partial tr((A^TB)^T)}{\partial A}=\frac{\partial tr(B^TA)}{\partial A}=\frac{\partial tr(AB^T)}{\partial A}=(B^T)^T=B$$

  • Theorem 5:

$$tr(A)=tr(A^T)$$

  • Theorem 6: if a is a real number (a 1×1 matrix), then tr(a) = a.
  • Theorem 7:

$$\frac{\partial tr(ABA^TC)}{\partial A}=CAB+C^TAB^T$$

Proof: differentiate the two occurrences of A separately (using the cyclic property tr(ABAᵀC) = tr(AᵀCAB), then Theorems 3 and 4 with the other occurrence held fixed):

$$\frac{\partial tr(ABA^TC)}{\partial A}=\frac{\partial tr(A\cdot BA^TC)}{\partial A}\bigg|_{A^T\text{ fixed}}+\frac{\partial tr(A^T\cdot CAB)}{\partial A}\bigg|_{A\text{ fixed}}=(BA^TC)^T+CAB=C^TAB^T+CAB$$
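The trace identities above can be spot-checked numerically. The following sketch verifies Theorem 1 directly and Theorem 7 by central finite differences (matrix sizes and the random seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))

# Theorem 1: tr(AB) = tr(BA)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Theorem 7: d tr(A B A^T C)/dA = C A B + C^T A B^T,
# checked against a central finite-difference gradient.
A2 = rng.standard_normal((3, 3))
B2 = rng.standard_normal((3, 3))
C2 = rng.standard_normal((3, 3))

def f(M):
    return np.trace(M @ B2 @ M.T @ C2)

grad = C2 @ A2 @ B2 + C2.T @ A2 @ B2.T
eps = 1e-6
num = np.zeros_like(A2)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(A2)
        E[i, j] = eps
        num[i, j] = (f(A2 + E) - f(A2 - E)) / (2 * eps)
print(np.allclose(grad, num, atol=1e-5))  # → True
```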

Least squares, matrix derivation:

Let:

$$x=\left(\begin{matrix}x_0^{(1)}&x_0^{(2)}&\cdots&x_0^{(m)}\\x_1^{(1)}&x_1^{(2)}&\cdots&x_1^{(m)}\\\vdots&\vdots&\ddots&\vdots\\x_n^{(1)}&x_n^{(2)}&\cdots&x_n^{(m)}\end{matrix}\right)\qquad \theta=\left(\begin{matrix}\theta_0\\\theta_1\\\vdots\\\theta_n\end{matrix}\right)\qquad X=x^T\qquad Y=\left(\begin{matrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{matrix}\right)$$

Each column of x holds the feature values of one sample (features indexed 0 through n), and there are m samples across the columns. θ holds the coefficient of each feature, X is the design matrix, and Y is the vector of observed results.

Then:

$$X\theta-Y=\left(\begin{matrix}\sum_{i=0}^{n}x_i^{(1)}\theta_i-y^{(1)}\\\sum_{i=0}^{n}x_i^{(2)}\theta_i-y^{(2)}\\\vdots\\\sum_{i=0}^{n}x_i^{(m)}\theta_i-y^{(m)}\end{matrix}\right)=\left(\begin{matrix}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)}\\\vdots\\h_\theta(x^{(m)})-y^{(m)}\end{matrix}\right)$$

The objective function is:

$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2=\frac{1}{2}tr[(X\theta-Y)^T(X\theta-Y)]$$

The θ that minimizes this objective is the best fit. Differentiating with the trace identities above:

$$\begin{aligned} \frac{\partial J(\theta)}{\partial \theta} &= \frac{1}{2}\frac{\partial tr(\theta^TX^TX\theta-\theta^T X^TY-Y^TX\theta+Y^TY)}{\partial \theta}\\&= \frac{1}{2}\left[\frac{\partial tr(\theta^TX^TX\theta)}{\partial \theta}-\frac{\partial tr(\theta^T X^TY)}{\partial \theta}-\frac{\partial tr(Y^TX\theta)}{\partial \theta}\right]\\& =\frac{1}{2}[X^TX\theta+X^TX\theta-X^TY-X^TY]\\&=X^TX\theta-X^TY \end{aligned}$$

Setting the derivative to zero and solving:

$$X^TX\theta-X^TY=0\quad\Rightarrow\quad \theta=(X^TX)^{-1}X^TY$$
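A minimal sketch of the normal-equation solution for the same line-fitting case (the example line y = 2x + 1 is an illustrative assumption; `np.linalg.solve` is used rather than forming the inverse explicitly, a standard numerical precaution):

```python
import numpy as np

# Normal equations: θ = (XᵀX)⁻¹ XᵀY for the line y = θ0 + θ1·x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                           # illustrative data on y = 2x + 1
X = np.column_stack([np.ones_like(x), x])   # the column of ones plays the role of x_0
theta = np.linalg.solve(X.T @ X, X.T @ y)   # solve XᵀXθ = XᵀY instead of inverting
print(theta)  # close to [1, 2], i.e. c = 1, m = 2
```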

Weighted least squares, matrix derivation:

The weight matrix:

$$W=\left(\begin{matrix}w_1&0&0&\cdots&0\\0&w_2&0&\cdots&0\\0&0&w_3&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&w_m\end{matrix}\right)$$

W is m×m, and the objective function becomes:

$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}w_i(h_\theta(x^{(i)})-y^{(i)})^2=\frac{1}{2}tr[(X\theta-Y)^TW(X\theta-Y)]$$

As before, the θ that minimizes this objective is the best fit; differentiating:

$$\begin{aligned} \frac{\partial J(\theta)}{\partial \theta} &= \frac{1}{2}\frac{\partial tr(\theta^TX^TWX\theta-\theta^T X^TWY-Y^TWX\theta+Y^TWY)}{\partial \theta}\\&= \frac{1}{2}\left[\frac{\partial tr(\theta^TX^TWX\theta)}{\partial \theta}-\frac{\partial tr(\theta^T X^TWY)}{\partial \theta}-\frac{\partial tr(Y^TWX\theta)}{\partial \theta}\right]\\& =\frac{1}{2}[X^TWX\theta+X^TW^TX\theta-X^TWY-X^TW^TY] \end{aligned}$$

Since W is diagonal, it is symmetric (W = Wᵀ), so:

$$\begin{aligned} \frac{\partial J(\theta)}{\partial \theta} &=\frac{1}{2}[X^TWX\theta+X^TW^TX\theta-X^TWY-X^TW^TY]\\&=\frac{1}{2}[X^TWX\theta+X^TWX\theta-X^TWY-X^TWY]\\& =X^TWX\theta-X^TWY \end{aligned}$$

Setting the derivative to zero and solving:

$$X^TWX\theta-X^TWY=0\quad\Rightarrow\quad \theta=(X^TWX)^{-1}X^TWY$$
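Similarly, a minimal sketch of the weighted normal equations, assuming the same illustrative data and exponential weights as in the earlier scalar example (with a diagonal W, the products XᵀWX and XᵀWY reduce to row scaling, so W never needs to be materialized):

```python
import numpy as np

# Weighted normal equations: θ = (XᵀWX)⁻¹ XᵀWY.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x - 2.0
w = 0.8 ** np.arange(len(x))[::-1]          # assumed exponential weights, 0 < w < 1
X = np.column_stack([np.ones_like(x), x])
# With diagonal W, XᵀWX and XᵀWY are just row-scaled products:
theta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
print(theta)  # close to [-2, 3], i.e. c = -2, m = 3
```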
