Project link and source code: https://github.com/w1449550206/Boston-house-price-forecast.git
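The features provided here are attributes that experts have identified as influencing house prices. At this stage we do not need to investigate whether each feature is useful; we simply use them as given. Later on, in quantitative work, we will have to find many of the features ourselves.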
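The features in this regression problem are on very different scales, which can distort the result considerably, so the data should be standardized first.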
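To make the standardization step concrete, here is a minimal sketch using only NumPy. The toy arrays are made up for illustration and are not taken from the Boston dataset. It computes the per-column mean and standard deviation on the training data and applies those same statistics to the test data, which is the same calculation StandardScaler performs when you call fit_transform on the training set and transform on the test set.

```python
import numpy as np

# Hypothetical toy data: two features on very different scales.
x_train = np.array([[0.1, 300.0],
                    [0.2, 500.0],
                    [0.3, 700.0]])
x_test = np.array([[0.2, 400.0]])

# Learn the per-column mean and standard deviation from the training set only.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# Apply the same training statistics to both sets.
x_train_scaled = (x_train - mean) / std
x_test_scaled = (x_test - mean) / std

print(x_train_scaled.mean(axis=0))  # close to 0 for each column
print(x_train_scaled.std(axis=0))   # close to 1 for each column
print(x_test_scaled)
```

After scaling, both columns have comparable magnitude, so no single feature dominates the fit just because of its units.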
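Evaluation metric: mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \bar{y}_i\right)^2$$
Note: $y_i$ is the predicted value and $\bar{y}_i$ is the true value for sample $i$; $m$ is the number of samples.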
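The mean squared error is the average of the squared differences between the predictions and the true values. The sketch below works it out by hand with NumPy on a few made-up numbers (the arrays are hypothetical, chosen only so the arithmetic is easy to follow). scikit-learn's mean_squared_error computes the same quantity.

```python
import numpy as np

# Hypothetical values, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_pred - y_true        # [-0.5, 0.0, 2.0, 1.0]
mse = np.mean(errors ** 2)      # (0.25 + 0.0 + 4.0 + 1.0) / 4

print(mse)  # 1.3125
```

Because the errors are squared, one large miss (here the error of 2.0) contributes far more than several small ones.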
# Packages used
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import SGDRegressor
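data = load_boston()  # note: load_boston was removed in scikit-learn 1.2, so this needs an older version
x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=10)  # random_state=10 keeps the split reproducible
# Standardize the features with StandardScaler
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)  # reuse the training mean/std; do not refit on the test set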
estimator = LinearRegression()
estimator.fit(x_train,y_train)
y_predict = estimator.predict(x_test)
print(mean_squared_error(y_true=y_test, y_pred=y_predict))  # mean squared error on the test set; lower is better
data = load_boston()
x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=10)  # same random_state, so the split matches the one above
# Standardize the features with StandardScaler
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)  # reuse the training mean/std; do not refit on the test set
estimator = SGDRegressor(max_iter=1000, tol=0.001)  # tol is the stopping criterion: training stops once the loss fails to improve by at least tol for n_iter_no_change consecutive epochs
estimator.fit(x_train,y_train)
y_predict = estimator.predict(x_test)
print(mean_squared_error(y_true=y_test, y_pred=y_predict))  # mean squared error on the test set; lower is better
We can also try changing the learning rate:
estimator = SGDRegressor(max_iter=1000,learning_rate="constant",eta0=0.1)
By tuning this parameter we can look for a learning rate that gives a better result.
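One simple way to do this tuning is to loop over a few candidate values of eta0 and compare the resulting test MSE. The sketch below is only an illustration under stated assumptions: it uses synthetic data from make_regression as a stand-in for the Boston dataset, and the candidate values 0.001, 0.01 and 0.1 are arbitrary choices rather than recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the Boston dataset).
X, y = make_regression(n_samples=500, n_features=3, noise=10.0, random_state=10)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

# Fit the scaler on the training set and reuse it on the test set.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Train one model per candidate learning rate and record its test MSE.
results = {}
for eta0 in [0.001, 0.01, 0.1]:
    estimator = SGDRegressor(
        max_iter=1000, tol=0.001,
        learning_rate="constant", eta0=eta0,
        random_state=10,
    )
    estimator.fit(x_train, y_train)
    results[eta0] = mean_squared_error(y_test, estimator.predict(x_test))

for eta0, mse in results.items():
    print(f"eta0={eta0}: test MSE = {mse:.2f}")

best_eta0 = min(results, key=results.get)
print("best eta0:", best_eta0)
```

Picking the learning rate by test MSE keeps the example short; for a fairer estimate, a separate validation split or cross-validation would be the more careful choice.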