Ridge regression adds an $l_2$ penalty term on top of ordinary least squares.

Loss function:

$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]$$

Closed-form solution:

$$\theta=(X^TX+\alpha I)^{-1}X^TY$$
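The loss function above can be written out directly in NumPy. This is a minimal sketch; the helper name `ridge_cost` and the toy data are illustrative assumptions, not part of the original post. The intercept $\theta_0$ is excluded from the penalty, matching the sum starting at $j=1$:

```python
import numpy as np

def ridge_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * [sum of squared errors + lam * sum(theta_j^2, j >= 1)]."""
    m = len(y)
    residuals = X @ theta - y               # h_theta(x^(i)) - y^(i) for all samples
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 (intercept) is not penalized
    return (np.sum(residuals ** 2) + penalty) / (2 * m)

# Toy data (assumed for illustration): bias column plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])  # fits the data exactly, so only the penalty contributes
print(ridge_cost(theta, X, y, lam=6.0))  # (0 + 6*1) / (2*3) = 1.0
```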
General form:

Repeat until convergence:
$$\theta_0:=\theta_0-a\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$

$$\theta_j:=\theta_j-a\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right],\qquad j=1,2,\ldots,n$$

Equivalently:

$$\theta_j:=\theta_j\left(1-a\frac{\lambda}{m}\right)-a\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$
$\alpha$ is a factor controlling model complexity and can be viewed as the amount of shrinkage: the larger $\alpha$ is, the more the coefficients shrink and the more robust they are to collinearity. (In the gradient-descent updates above this regularization strength is written as $\lambda$.)
Matrix form:

$$\theta:=\theta\left(1-a\frac{\lambda}{m}\right)-\frac{a}{m}X^T(X\theta-Y)$$
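The matrix-form update can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions: the learning rate, iteration count, and toy data are made up for illustration, and, like the matrix formula itself, it shrinks $\theta_0$ along with the other coefficients (the elementwise updates above leave $\theta_0$ unpenalized):

```python
import numpy as np

def ridge_gd(X, y, lam=0.1, lr=0.01, n_iters=50000):
    """Gradient descent with the matrix-form update:
    theta := theta * (1 - lr*lam/m) - (lr/m) * X.T @ (X @ theta - y)
    Its fixed point solves (X.T X + lam*I) theta = X.T y.
    """
    m, n = X.shape
    theta = np.zeros((n, 1))
    for _ in range(n_iters):
        theta = theta * (1 - lr * lam / m) - (lr / m) * X.T @ (X @ theta - y)
    return theta

# Toy data (assumed for illustration): bias column plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[1.0], [2.0], [3.0]])
theta = ridge_gd(X, y, lam=0.1)
print(theta.ravel())  # converges to (X'X + 0.1*I)^(-1) X'y, roughly [0.078 0.960]
```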
```python
import numpy as np

# Prepare the data
X = np.array([[1, 2], [3, 2], [1, 3], [2, 3], [3, 3], [3, 4]])
y = np.array([3.1, 5.1, 4.2, 5.2, 5.9, 6.8])
n_samples, n_features = X.shape
# Append a column of ones to X and reshape y to (n_samples, 1) for convenience
X = np.concatenate((np.ones(n_samples).reshape((n_samples, 1)), X), axis=1)
y = y.reshape((n_samples, 1))
```
- **Solving for theta with the normal equation**
```python
from numpy.linalg import pinv

alpha = 0.1
# alpha must multiply the identity matrix; adding the scalar directly would
# add alpha to every entry of X.T @ X. Note this form also penalizes the intercept.
theta = pinv(X.T @ X + alpha * np.eye(X.shape[1])) @ X.T @ y  # A @ B equals np.dot(A, B)
intercept = theta[0, 0]  # intercept term
coef = theta[1:, 0]      # coefficients
print("Intercept: %.6f" % intercept)
print("Coefficients: %s" % coef.round(6))
```

Output:

```
Intercept: 0.452056
Coefficients: [0.905127 0.927989]
```
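The earlier claim that a larger $\alpha$ means more shrinkage can be checked with the closed-form solution. This is a minimal sketch reusing the toy data from above; the specific $\alpha$ grid is an assumption for illustration. The norm of $\theta$ should decrease monotonically as $\alpha$ grows:

```python
import numpy as np
from numpy.linalg import pinv

X = np.array([[1, 2], [3, 2], [1, 3], [2, 3], [3, 3], [3, 4]], dtype=float)
y = np.array([3.1, 5.1, 4.2, 5.2, 5.9, 6.8]).reshape(-1, 1)
X = np.column_stack([np.ones(len(X)), X])  # bias column

for alpha in (0.0, 0.1, 1.0, 10.0):
    theta = pinv(X.T @ X + alpha * np.eye(X.shape[1])) @ X.T @ y
    # larger alpha => smaller ||theta||, i.e. stronger shrinkage
    print("alpha=%-4s  ||theta|| = %.4f" % (alpha, np.linalg.norm(theta)))
```

With $\alpha=0$ this reduces to ordinary least squares, so the first line is the unshrunk baseline.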
The gradient-descent version is much the same and can be implemented by analogy with the ordinary least squares implementation, so it is not repeated here.