Generalized Linear Models (official documentation)
Generalized Linear Models (Chinese documentation)
First, import the required libraries:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model
data = pd.read_csv('xxx.csv')
data.head()  # show the first five rows; use data.tail() for the last five
data.shape  # check the dimensions of the data
X = data[['f1', 'f2', 'f3', 'f4']]  # sample features
X.head()
y = data[['t']]  # sample output (target)
y.head()
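Selecting with a list of column names, as in data[['t']], returns a two-dimensional DataFrame, while data['t'] returns a one-dimensional Series. The sketch below illustrates the difference on a tiny hand-made frame; the name demo and its values are made up for illustration and are not taken from the CSV.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the real data.
demo = pd.DataFrame({'f1': [1.0, 2.0, 3.0], 't': [10.0, 20.0, 30.0]})

as_frame = demo[['t']]  # list of names -> DataFrame with one column
as_series = demo['t']   # single name   -> Series

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
print(type(as_series).__name__, as_series.shape)  # Series (3,)
```

Keeping y two-dimensional is why the fitted coef_ later comes back with a leading target axis.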
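Split the X and y samples into two parts, a training set and a test set:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
Check the dimensions of the training and test sets: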
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
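When neither test_size nor train_size is passed, train_test_split holds out 25% of the samples for testing. A minimal sketch on synthetic data, assuming a 100-row frame with the same column names as above (the random values only stand in for the CSV, which is not available here):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV: 100 samples, 4 features, 1 target.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 5)),
                    columns=['f1', 'f2', 'f3', 'f4', 't'])
X_demo = demo[['f1', 'f2', 'f3', 'f4']]
y_demo = demo[['t']]

# Default split is 75% train / 25% test; random_state makes it reproducible.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, random_state=1)

print(Xd_train.shape, yd_train.shape, Xd_test.shape, yd_test.shape)
# (75, 4) (75, 1) (25, 4) (25, 1)
```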
from sklearn.linear_model import LinearRegression
linreg = LinearRegression()
linreg.fit(X_train, y_train)
Intercept and coefficients of the fitted model:
print(linreg.intercept_)
print(linreg.coef_)
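The intercept and coefficients define the prediction as intercept + coef_1*f1 + ... + coef_4*f4. The sketch below checks this against predict on synthetic, noise-free data; the true_coef values and the intercept 4.0 are arbitrary choices for illustration, not results from the original dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from a known linear rule.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
true_coef = np.array([1.5, -2.0, 0.5, 3.0])
y_demo = (X_demo @ true_coef + 4.0).reshape(-1, 1)  # 2-D, like data[['t']]

model = LinearRegression().fit(X_demo, y_demo)

# With a 2-D target, intercept_ has shape (1,) and coef_ has shape (1, 4).
print(model.intercept_.shape, model.coef_.shape)

# Reproduce predict() by hand: X @ coef_.T + intercept_
manual = X_demo @ model.coef_.T + model.intercept_
print(np.allclose(manual, model.predict(X_demo)))
```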
# predict on the test set with the fitted model
y_pred = linreg.predict(X_test)
from sklearn import metrics
# compute MSE with scikit-learn
print("MSE:", metrics.mean_squared_error(y_test, y_pred))
# compute RMSE with scikit-learn
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
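MSE is the mean of the squared residuals, and RMSE is its square root, which puts the error back in the units of the target. A small worked example with hand-picked numbers (y_true and y_hat are illustrative values, not model output):

```python
import numpy as np
from sklearn import metrics

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Residuals: 0.5, -0.5, 0.0, -1.0 -> squares: 0.25, 0.25, 0.0, 1.0
mse_manual = np.mean((y_true - y_hat) ** 2)   # 1.5 / 4 = 0.375
rmse_manual = np.sqrt(mse_manual)

# The scikit-learn helper should agree with the manual computation.
mse_sklearn = metrics.mean_squared_error(y_true, y_hat)

print("MSE:", mse_manual, mse_sklearn)
print("RMSE:", rmse_manual)
```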
Use 10-fold cross-validation by setting the cv parameter of cross_val_predict to 10:
X = data[['f1', 'f2', 'f3', 'f4']]  # same feature columns as above
y = data[['t']]  # same target column as above
from sklearn.model_selection import cross_val_predict
predicted = cross_val_predict(linreg, X, y, cv=10)
# compute MSE with scikit-learn
print("MSE:", metrics.mean_squared_error(y, predicted))
# compute RMSE with scikit-learn
print("RMSE:", np.sqrt(metrics.mean_squared_error(y, predicted)))
fig, ax = plt.subplots()
ax.scatter(y, predicted)
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()
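For a regressor, an integer cv in cross_val_predict corresponds to an unshuffled KFold with that many splits, so every sample is predicted by a model that did not see it during fitting. A rough sketch of that procedure on synthetic data, compared with the library call (X_demo, y_demo and the manual loop are illustrative and not part of the original example):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic regression data: 100 samples, 4 features, small Gaussian noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = X_demo @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

est = LinearRegression()

# Manual 10-fold loop: fit on 9 folds, predict the held-out fold.
manual = np.empty_like(y_demo)
for train_idx, test_idx in KFold(n_splits=10).split(X_demo):
    fold_model = clone(est).fit(X_demo[train_idx], y_demo[train_idx])
    manual[test_idx] = fold_model.predict(X_demo[test_idx])

# Library call with cv=10 for comparison.
auto = cross_val_predict(est, X_demo, y_demo, cv=10)
print(np.allclose(manual, auto))
```

Because each prediction comes from a held-out fold, the MSE computed from these predictions estimates out-of-sample error rather than training error.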