The principle behind the least-squares method is quite simple: the goal is to determine the two regression parameters $a_1$ and $b_1$ of the simple linear regression model

$$y = a_1 x + b_1$$
Given $m$ observed sample pairs $(x_i, y_i)\ (i = 1, 2, \dots, m)$, the classical approach determines the parameters by minimizing the sum of squared deviations; that is, we seek the $a_1'$ and $b_1'$ that minimize

$$Q(a_1', b_1') = \sum_{i=1}^{m} (y_i - a_1' x_i - b_1')^2$$
The rest is straightforward. From its definition, $Q(a_1', b_1')$ is a nonnegative function whose partial derivatives with respect to $a_1'$ and $b_1'$ exist. Differentiating with respect to each and setting the derivatives to zero gives:
$$\frac{\partial Q(a_1',b_1')}{\partial a_1'}=\sum_{i=1}^m -2x_i(y_i-a_1'x_i-b_1')=0 \;\longrightarrow\; \sum_{i=1}^m x_i y_i = b_1'\sum_{i=1}^m x_i + a_1'\sum_{i=1}^m x_i^2 \quad ①$$

$$\frac{\partial Q(a_1',b_1')}{\partial b_1'}=\sum_{i=1}^m -2(y_i-a_1'x_i-b_1')=0 \;\longrightarrow\; \sum_{i=1}^m y_i = m b_1' + a_1'\sum_{i=1}^m x_i \quad ②$$
Combining ① and ② and dividing each by $m$ (i.e., taking averages), we readily solve for $a_1'$ and $b_1'$:
$$a_1' = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - (\bar{x})^2}$$

$$b_1' = \bar{y} - a_1'\,\bar{x}$$
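This closed-form solution translates directly into code. A minimal sketch in Python with NumPy (`fit_ols` is an illustrative name, not from the text):

```python
import numpy as np

def fit_ols(x, y):
    """Simple linear regression y = a*x + b via the averaged
    normal equations (1) and (2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    a = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x ** 2) - x.mean() ** 2)
    b = y.mean() - a * x.mean()
    return a, b

# On noise-free data y = 2x + 1, the exact line is recovered.
a, b = fit_ols([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
```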
As is well known, least squares does not guarantee that every sample point lies on the regression line; rather, the points are distributed fairly "evenly" on both sides of it. We can therefore consider a ratio-based measure to redefine the size of the fitting error:

Given $m$ observed sample pairs $(x_i, y_i)$ with $y_i \neq 0\ (i = 1, 2, \dots, m)$, we again posit the linear regression model

$$y = a_2 x + b_2$$
Define the relative deviation $P_i$ as follows:

$$P_i = \frac{|y_i - a_2 x_i - b_2|}{y_i}$$
The absolute value is clearly inconvenient to work with, so we refine the definition of $P_i$ to

$$P_i(a_2, b_2) = \left(\frac{y_i - a_2 x_i - b_2}{y_i}\right)^2$$
What we seek are the values $a_2$ and $b_2$ that minimize the function

$$R(a_2, b_2) = \sum_{i=1}^{m} P_i$$
From its definition, $R(a_2, b_2)$ is a nonnegative function with partial derivatives in $a_2$ and $b_2$; setting these partial derivatives to zero gives:
$$\frac{\partial R(a_2,b_2)}{\partial a_2}=\sum_{i=1}^m -\frac{2x_i}{y_i}\left(1-\frac{x_i}{y_i}a_2-\frac{b_2}{y_i}\right)=0 \;\longrightarrow\; \sum_{i=1}^m \frac{x_i}{y_i} = a_2\sum_{i=1}^m \frac{x_i^2}{y_i^2} + b_2\sum_{i=1}^m \frac{x_i}{y_i^2} \quad ③$$

$$\frac{\partial R(a_2,b_2)}{\partial b_2}=\sum_{i=1}^m -\frac{2}{y_i}\left(1-\frac{x_i}{y_i}a_2-\frac{b_2}{y_i}\right)=0 \;\longrightarrow\; \sum_{i=1}^m \frac{1}{y_i} = a_2\sum_{i=1}^m \frac{x_i}{y_i^2} + b_2\sum_{i=1}^m \frac{1}{y_i^2} \quad ④$$
Combining ③ and ④ and dividing each by $m$, we readily solve for $a_2$ and $b_2$:
$$b_2 = \frac{\overline{\left(\tfrac{x}{y}\right)}\cdot\overline{\left(\tfrac{x}{y^2}\right)} - \overline{\left(\tfrac{1}{y}\right)}\cdot\overline{\left(\tfrac{x^2}{y^2}\right)}}{\overline{\left(\tfrac{x}{y^2}\right)}\cdot\overline{\left(\tfrac{x}{y^2}\right)} - \overline{\left(\tfrac{1}{y^2}\right)}\cdot\overline{\left(\tfrac{x^2}{y^2}\right)}}$$

$$a_2 = \frac{\overline{\left(\tfrac{1}{y}\right)} - b_2\,\overline{\left(\tfrac{1}{y^2}\right)}}{\overline{\left(\tfrac{x}{y^2}\right)}}$$
These two parameters can then be computed programmatically. Note that this method places a special requirement on the initial data $(x_i, y_i)$: every $y_i$ must be nonzero.
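Computing ③ and ④ programmatically amounts to a 2×2 linear solve. A minimal sketch under the stated assumption that every $y_i$ is nonzero (`fit_relative` is an illustrative name, not from the text):

```python
import numpy as np

def fit_relative(x, y):
    """Fit y = a*x + b by minimizing the summed squared relative error
    R = sum(((y_i - a*x_i - b) / y_i)**2); requires every y_i != 0.
    Solves the averaged normal equations (3) and (4) as a 2x2 system."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if np.any(y == 0):
        raise ValueError("every y_i must be nonzero")
    M = np.array([[np.mean(x ** 2 / y ** 2), np.mean(x / y ** 2)],
                  [np.mean(x / y ** 2),      np.mean(1 / y ** 2)]])
    rhs = np.array([np.mean(x / y), np.mean(1 / y)])
    a, b = np.linalg.solve(M, rhs)
    return a, b

# On noise-free data every relative error is zero, so the exact line is recovered.
a, b = fit_relative([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```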
We again assume the desired regression line is

$$y = a_3 x + b_3$$
Classical least squares only measures the vertical distance between each point and the line. We might instead ask: can we drop a perpendicular from every sample point $(x_i, y_i)\ (i = 1, 2, \dots, m)$ to the regression line and use the sum of squared perpendicular distances as the metric, in place of the classical vertical distance?
Following this idea, we define the function $S$ as

$$S(a_3, b_3) = \sum_{i=1}^{m} \frac{(a_3 x_i - y_i + b_3)^2}{a_3^2 + 1}$$
What we seek are the values $a_3$ and $b_3$ that minimize $S$. From its definition, $S(a_3, b_3)$ is a nonnegative function with partial derivatives in $a_3$ and $b_3$; setting these partial derivatives to zero gives:
$$\frac{\partial S(a_3,b_3)}{\partial a_3}=\sum_{i=1}^m \frac{2x_i(a_3x_i-y_i+b_3)(a_3^2+1)-2a_3(a_3x_i-y_i+b_3)^2}{(a_3^2+1)^2}=0$$

$$\frac{\partial S(a_3,b_3)}{\partial b_3}=\sum_{i=1}^m \frac{2(a_3x_i-y_i+b_3)}{a_3^2+1}=0$$
That is:

$$\sum_{i=1}^m a_3(x_i^2-y_i^2)+\sum_{i=1}^m (a_3^2-1)x_i y_i+\sum_{i=1}^m b_3(1-a_3^2)x_i+\sum_{i=1}^m 2a_3 b_3 y_i = m\,a_3 b_3^2 \quad ⑤$$

$$\sum_{i=1}^m (a_3 x_i - y_i + b_3) = 0 \quad ⑥$$
From ⑥ we obtain

$$a_3\bar{x} - \bar{y} + b_3 = 0$$

$$b_3 = \bar{y} - a_3\bar{x} \quad ⑦$$
Combining ⑤ and ⑦, dividing ⑤ by $m$, and substituting ⑦ into it yields a quadratic in $a_3$, from which $a_3$ and $b_3$ follow readily:
$$a_3^2\,(\overline{xy}-\bar{x}\bar{y}) + a_3\,(\overline{x^2-y^2}-\bar{x}\bar{x}+\bar{y}\bar{y}) + (\bar{x}\bar{y}-\overline{xy}) = 0$$
$$a_3=\frac{-(\overline{x^2-y^2}-\bar{x}\bar{x}+\bar{y}\bar{y})\pm\sqrt{(\overline{x^2-y^2}-\bar{x}\bar{x}+\bar{y}\bar{y})^2+4(\overline{xy}-\bar{x}\bar{y})^2}}{2(\overline{xy}-\bar{x}\bar{y})} \qquad (\overline{xy}-\bar{x}\bar{y}\neq 0)$$

(Since the constant term of the quadratic is the negative of its leading coefficient, the discriminant is a sum of squares and both roots are always real.)

$$a_3=\frac{-(\bar{x}\bar{y}-\overline{xy})}{\overline{x^2-y^2}-\bar{x}\bar{x}+\bar{y}\bar{y}} \qquad (\overline{xy}-\bar{x}\bar{y} = 0)$$

In this degenerate case the numerator also vanishes, so $a_3 = 0$.
Then $b_3$ follows from ⑦.

The formula above gives two candidate values of $a_3$; the product of the two roots is $-1$, so they describe a pair of perpendicular lines, one of which minimizes $S$ while the other maximizes it. We do not take both: the choice must be made against the actual data, e.g. by evaluating $S$ at each root and keeping the smaller.
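The root selection in the last step can be automated by evaluating $S$ at both candidates. A minimal sketch using the discriminant with the plus sign as derived above (`fit_orthogonal` is an illustrative name, not from the text):

```python
import numpy as np

def fit_orthogonal(x, y):
    """Fit y = a*x + b by minimizing the sum of squared perpendicular
    distances S = sum((a*x_i - y_i + b)**2) / (a**2 + 1).
    Solves the quadratic in a_3 derived above and keeps the root
    that actually minimizes S."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    A = np.mean(x * y) - x.mean() * y.mean()    # leading coefficient
    B = np.mean(x ** 2) - np.mean(y ** 2) - x.mean() ** 2 + y.mean() ** 2
    if abs(A) < 1e-15:
        candidates = [0.0]                      # degenerate case: quadratic collapses
    else:
        disc = np.sqrt(B ** 2 + 4 * A ** 2)     # constant term is -A, so B^2 + 4A^2
        candidates = [(-B + disc) / (2 * A), (-B - disc) / (2 * A)]

    def S(a):
        b = y.mean() - a * x.mean()             # from equation (7)
        return np.sum((a * x - y + b) ** 2) / (a ** 2 + 1)

    a = min(candidates, key=S)                  # keep the minimizing root
    b = y.mean() - a * x.mean()
    return a, b

# On noise-free data y = 2x + 1 all perpendicular distances are zero.
a, b = fit_orthogonal([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```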
The above are some improvements to the least-squares method that the author has considered, covering the relative-error and perpendicular-distance variants. Computational mistakes and insufficient attention to numerical stability may well remain; the author sincerely welcomes criticism and corrections, and thanks the reader in advance!