5.3 Ridge Regression Analysis

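Ridge regression (also known as Tikhonov regularization) is a biased-estimation regression method designed for data with collinear features. It is essentially a modified least-squares estimator: it gives up the unbiasedness of ordinary least squares and accepts some loss of information and precision in exchange for regression coefficients that are more realistic and more reliable. On ill-conditioned data it generally fits better than ordinary least squares.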
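For reference, one common way to write the method down is as a least-squares objective with an added squared-norm penalty, which has a closed-form minimizer. The symbols $X$, $y$, $w$ and $\alpha$ are notation assumed here, and the data are taken to be centered so that the intercept can be left out:

```latex
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2,
\qquad
\hat{w} = \left( X^\top X + \alpha I \right)^{-1} X^\top y .
```

For $\alpha > 0$ the matrix $X^\top X + \alpha I$ is invertible even when columns of $X$ are collinear, which is why the estimate stays stable on ill-conditioned data.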
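The parameter α controls how strongly the model is regularized. If α = 0, ridge regression reduces to ordinary linear regression. If α is very large, all weights end up very close to zero and the result is a flat line through the mean of the data.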
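The two limiting cases can be checked numerically. The following is a minimal sketch, not the code used in this section: it assumes synthetic data with two nearly collinear columns, and `ridge_fit` is a hypothetical helper that solves the closed-form ridge equations with NumPy.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit with an unpenalized intercept (illustrative helper)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean              # centering takes care of the intercept
    n_features = X.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic data (an assumption of this sketch): column 2 is almost a copy of column 0
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=100)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Coefficients shrink toward zero as alpha grows
for alpha in (0.0, 1.0, 1e6):
    coef, intercept = ridge_fit(X, y, alpha)
    print(f"alpha={alpha:g}: coef={np.round(coef, 4)}, intercept={intercept:.3f}")

# alpha = 0 reproduces ordinary least squares (intercept column added by hand)
A = np.column_stack([np.ones(len(X)), X])
ols = np.linalg.lstsq(A, y, rcond=None)[0]
coef0, intercept0 = ridge_fit(X, y, 0.0)

# With a very large alpha the prediction is essentially the mean of y
coef_big, intercept_big = ridge_fit(X, y, 1e6)
```

With `alpha=0.0` the result matches the least-squares solution, while with `alpha=1e6` the coefficients are close to zero and the intercept is close to `y.mean()`, which is the flat line described above.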

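## Define the regression function
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score, mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def ridge_regression(data, test, predictors, pre_y, alpha):
    # Fit the model. Ridge(normalize=True) was removed in scikit-learn 1.2, so the
    # features are standardized in a pipeline instead. Scaling alpha by the number of
    # training rows keeps the penalty equivalent to the old normalize=True behaviour.
    ridgereg = make_pipeline(StandardScaler(),
                             Ridge(alpha=alpha * len(data), max_iter=100000))
    ridgereg.fit(data[predictors], data[pre_y])
    y_pred = ridgereg.predict(test[predictors])

    # Collect the results: alpha, test R^2, test MAE, then one coefficient per
    # predictor (the coefficients refer to the standardized features)
    ret = [alpha]
    ret.append(r2_score(test[pre_y], y_pred))
    ret.append(mean_absolute_error(test[pre_y], y_pred))
    ret.extend(ridgereg[-1].coef_)
    return ret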
import numpy as np
import pandas as pd

# Predictor columns and the response column
predictors = ['AGE', 'SEX', 'BMI', 'BP', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6']
prey = "Y"
# Range of alpha values to try
alpha_ridge = np.linspace(0.00005, 2, 20)

# Table for storing the scores and coefficients
col = ['alpha', 'r2_score', 'mae'] + predictors
ind = ['alpha_%.2g' % a for a in alpha_ridge]
coef_matrix_ridge = pd.DataFrame(index=ind, columns=col)

## Use about 80% of the data for training and the rest for testing
## (diabete is the diabetes DataFrame, assumed to have been loaded earlier)
np.random.seed(24)
index = np.random.permutation(diabete.shape[0])
trainindex = index[:350]   # index[1:350] would skip the first shuffled row
testindex = index[350:]    # index[350:-1] would drop the last shuffled row
diabete_train = diabete.iloc[trainindex, :]
diabete_test = diabete.iloc[testindex, :]

# Fit one model per alpha value
for i in range(len(alpha_ridge)):
    coef_matrix_ridge.iloc[i] = ridge_regression(diabete_train, diabete_test, predictors, prey, alpha_ridge[i])

# Convert the object columns to numeric values for sorting and plotting
coef_matrix_ridge = coef_matrix_ridge.astype(float)
coef_matrix_ridge.sample(5)
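
When a data set is split by slicing shuffled row positions, it is worth checking that the two slices together cover every row. The following is a minimal sketch of that check; the row count of 442 and the cut at 350 are assumptions made for illustration, and no real data is loaded.

```python
import numpy as np

n_rows, n_train = 442, 350            # assumed sizes, for illustration only
rng = np.random.default_rng(24)
index = rng.permutation(n_rows)       # shuffled row positions 0..441

train_index = index[:n_train]         # first 350 shuffled positions
test_index = index[n_train:]          # everything that remains

# Every row lands in exactly one of the two subsets
assert len(train_index) + len(test_index) == n_rows
assert np.intersect1d(train_index, test_index).size == 0

# Slices such as index[1:350] and index[350:-1] leave rows unused
dropped = n_rows - len(index[1:n_train]) - len(index[n_train:-1])
print(dropped)  # prints 2
```

Starting a slice at 1 or ending it at -1 silently discards one row each, so open-ended slices (`[:n]` and `[n:]`) are the safer form.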


# Alpha grid and results table with the same layout, named for Lasso
# (neither is used in the remainder of this section)
alpha_lasso = np.linspace(0.00005, 2, 20)
col = ['alpha', 'r2_score', 'mae', 'AGE', 'SEX', 'BMI', 'BP', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6']
ind = ['alpha_%.2g' % a for a in alpha_lasso]
coef_matrix_lasso = pd.DataFrame(index=ind, columns=col)

# Show the five alpha values with the lowest mean absolute error on the test set
coef_matrix_ridge.sort_values("mae").head(5)


import matplotlib.pyplot as plt

ploty = ['AGE', 'SEX', 'BMI', 'BP', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6']
shape = ["s", "p", "*", "h", "+", "x", "D", "o", "v", ">"]
plt.figure(figsize=(15, 6))

# Left panel: coefficient paths as alpha grows
plt.subplot(1, 2, 1)
for ii, name in enumerate(ploty):
    plt.plot(coef_matrix_ridge["alpha"].astype(float), coef_matrix_ridge[name].astype(float),
             color=plt.cm.Set1(ii / len(ploty)), label=name,
             marker=shape[ii])
plt.grid(True)
plt.legend()
plt.xlabel("Alpha")
plt.ylabel("Standardized coefficient")

# Right panel: test-set mean absolute error against alpha
plt.subplot(1, 2, 2)
plt.plot(coef_matrix_ridge["alpha"].astype(float), coef_matrix_ridge["mae"].astype(float), linewidth=2)
plt.xlabel("Alpha")
plt.ylabel("Mean absolute error")
plt.suptitle("Ridge regression analysis")
plt.show()
