Predicting Boston Housing Prices with Linear Regression

This article does not cover the basics of linear regression; it only presents three methods for predicting Boston housing prices:

1. Predicting Boston housing prices with the normal-equation optimization method
2. Predicting Boston housing prices with the gradient-descent optimization method
3. Predicting Boston housing prices with ridge regression

A separate code segment is provided for each method.
Shared code segment (imports used by all three):

from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression, SGDRegressor, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
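
Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the code in this article assumes scikit-learn < 1.2. For newer versions, here is a minimal sketch of loading the same data without load_boston, following the alternative suggested in scikit-learn's deprecation notice (the URL and two-rows-per-sample layout come from that notice and are assumptions here):

import numpy as np
import pandas as pd

# The Boston housing data hosted at CMU stores each sample across two rows
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # 13 features
target = raw_df.values[1::2, 2]                                     # median house prices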

Predicting Boston housing prices with the normal-equation optimization method
Code:

def linear_regression_and_normal_equation():
    """
    Predict Boston housing prices using the normal-equation optimization method
    :return:
    """

    # 1) Load the data
    boston = load_boston()

    # 2) Split the data into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=1)

    # 3) Standardize the features
    # First instantiate a transformer with transfer = StandardScaler()
    transfer = StandardScaler()
    # The training set and the test set must be scaled with the same parameters
    # Process the training set first
    # fit_transform combines fit (computing the mean, standard deviation, etc.) with transform (applying the scaling)
    # At this point the scaling parameters have been learned and stored in the transfer instance (StandardScaler)
    # Because the test set must be standardized to the same standard,
    # the statistics learned from the training set (mean, standard deviation, etc.) are reused for it
    x_train = transfer.fit_transform(x_train)
    # Apply the scaling directly; do not call fit() again to compute new parameters
    x_test = transfer.transform(x_test)

    # 4) Estimator
    # Use the linear regression model (solved via the normal equation)
    # fit_intercept: whether to fit the intercept
    # LinearRegression.coef_: regression coefficients
    # LinearRegression.intercept_: intercept
    estimator = LinearRegression(fit_intercept=True)
    estimator.fit(x_train, y_train)

    # 5) Inspect the model
    # coef_ holds the weights, intercept_ holds the bias
    print("Normal equation coefficients:\n", estimator.coef_)
    print("Normal equation intercept:\n", estimator.intercept_)

    # 6) Evaluate the model
    y_predict = estimator.predict(x_test)
    score = estimator.score(x_test, y_test)  # R^2 on the test set (not printed here)
    print("Predicted house prices:\n", y_predict)
    error = mean_squared_error(y_test, y_predict)
    print("Normal equation mean squared error:\n", error)

    return None
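
For reference, the normal-equation solution can be reproduced directly with NumPy. A minimal sketch, assuming x_train and y_train are the standardized training arrays from step 3 above (the bias is handled by appending a column of ones):

import numpy as np

# Append a column of ones so the intercept is learned as an extra weight
X = np.hstack([x_train, np.ones((x_train.shape[0], 1))])
# Closed-form least-squares solution w = (X^T X)^(-1) X^T y, solved via lstsq for numerical stability
w, *_ = np.linalg.lstsq(X, y_train, rcond=None)
coef, intercept = w[:-1], w[-1]  # should match estimator.coef_ and estimator.intercept_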

Predicting Boston housing prices with the gradient-descent optimization method
Code:

def linear_regression_and_gradient_descent():
    """
    Predict Boston housing prices using the gradient-descent optimization method
    :return:
    """
    # 1) Load the data
    boston = load_boston()

    # 2) Split the data into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=1)

    # 3) Standardize the features
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # 4) Estimator
    # SGDRegressor fits a linear regression model with stochastic gradient descent;
    # it supports different loss functions and regularization penalties
    # loss="squared_loss": ordinary least squares (renamed "squared_error" in newer scikit-learn versions)
    # fit_intercept: whether to fit the intercept
    # learning_rate: string, optional - the learning-rate schedule
    # 'constant': eta = eta0
    # 'optimal': eta = 1.0 / (alpha * (t + t0))
    # 'invscaling': eta = eta0 / pow(t, power_t) [default for SGDRegressor]
    # power_t=0.25: defined in the parent class
    # For a constant learning rate, use learning_rate='constant' and set the rate with eta0
    estimator = SGDRegressor(learning_rate="constant", eta0=0.0001, max_iter=1000, penalty="l1")
    estimator.fit(x_train, y_train)

    # 5) Inspect the model
    print("Gradient descent coefficients:\n", estimator.coef_)
    print("Gradient descent intercept:\n", estimator.intercept_)

    # 6) Evaluate the model
    y_predict = estimator.predict(x_test)
    score = estimator.score(x_test, y_test)  # R^2 on the test set (not printed here)
    print("Predicted house prices:\n", y_predict)
    error = mean_squared_error(y_test, y_predict)
    print("Gradient descent mean squared error:\n", error)

    return None
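
The learning-rate schedule, penalty, and regularization strength used above are somewhat arbitrary; a grid search can help choose them. A minimal sketch using GridSearchCV (the parameter grid is an illustrative assumption; x_train and y_train are the standardized training arrays from step 3 above):

from sklearn.model_selection import GridSearchCV

param_grid = {
    "eta0": [0.0001, 0.001, 0.01],
    "penalty": ["l1", "l2"],
    "alpha": [0.0001, 0.001, 0.01],
}
search = GridSearchCV(
    SGDRegressor(learning_rate="constant", max_iter=1000),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(x_train, y_train)
print("Best parameters:", search.best_params_)
print("Best CV mean squared error:", -search.best_score_)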

Predicting Boston housing prices with ridge regression
Code:

def ridge_return():
    """
    Predict Boston housing prices using ridge regression
    :return:
    """
    # 1) Load the data
    boston = load_boston()

    # 2) Split the data into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=1)

    # 3) Standardize the features
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # 4) Estimator
    # alpha: regularization strength (larger values shrink the coefficients more)
    estimator = Ridge(alpha=0.5, max_iter=1000)
    estimator.fit(x_train, y_train)

    # 5) Inspect the model
    print("Ridge regression coefficients:\n", estimator.coef_)
    print("Ridge regression intercept:\n", estimator.intercept_)

    # 6) Evaluate the model
    y_predict = estimator.predict(x_test)
    print("Predicted house prices:\n", y_predict)
    error = mean_squared_error(y_test, y_predict)
    print("Ridge regression mean squared error:\n", error)

    return None
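
The regularization strength alpha=0.5 above is fixed by hand; RidgeCV can select it by cross-validation instead. A minimal sketch (the alpha grid is an illustrative assumption; x_train and y_train are the standardized training arrays from step 3 above):

from sklearn.linear_model import RidgeCV

estimator = RidgeCV(alphas=[0.01, 0.1, 0.5, 1.0, 10.0], cv=5)
estimator.fit(x_train, y_train)
print("Selected alpha:", estimator.alpha_)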

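The outputs below can be reproduced by calling the three functions in turn, for example with a simple entry point (a minimal sketch):

if __name__ == "__main__":
    linear_regression_and_normal_equation()
    linear_regression_and_gradient_descent()
    ridge_return()
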
The outputs of the three methods are:
Output 1 (normal equation):

Normal equation coefficients:
 [-1.07145146  1.34036243  0.26298069  0.66554537 -2.49842551  1.97524314
  0.19516605 -3.14274974  2.66736136 -1.80685572 -2.13034748  0.56172933
 -4.03223518]
Normal equation intercept:
 22.344591029023768
Predicted house prices:
 [32.37816533 27.95684437 18.07213891 21.63166556 18.93029508 19.96277202
 32.2834674  18.06715668 24.72989076 26.85359369 27.23326816 28.57021239
 21.18778302 26.94393815 23.37892579 20.89176865 17.11746934 37.73997945
 30.51980066  8.44489436 20.86557977 16.21989418 25.13605925 24.77658813
 31.40497629 11.02741407 13.82097563 16.80208261 35.94637198 14.7155729
 21.23939821 14.15079469 42.72492585 17.83887162 21.84610225 20.40178099
 17.50287927 27.00093206  9.80760408 20.00288662 24.27066782 21.06719021
 29.47089776 16.48482565 19.38852695 14.54778282 39.39838319 18.09810655
 26.22164983 20.60676525 25.09994066 24.48366723 25.02297948 26.84986898
  5.01517985 24.12809513 10.72843392 26.83178157 16.8023533  35.48142073
 19.50937911 27.43260347 16.58016763 19.151488   10.9990262  32.05016535
 36.32672849 21.8596379  24.8158357  25.32934192 23.36795453  6.98356201
 16.83774771 20.27043864 20.74890857 21.85918305 34.17775836 27.94673486
 24.86029952 34.43415796 18.61651831 24.02302532 34.45439496 13.32264718
 20.7154011  30.1583435  17.06611728 24.20119805 19.18051951 16.98160423
 26.8073424  41.02666829 14.44767989 23.26993252 14.93803206 21.93017824
 22.81878103 29.16467031 36.7033389  20.41387117 17.86800518 17.49942601
 25.07246443 21.9827349   8.28652561 21.52177032 16.50788716 33.00114509
 24.49693379 25.08491201 38.29621948 28.93273167 14.85478187 34.7429184
 35.50029467 32.89599805 20.98069467 16.67849644 34.24728954 39.01179205
 21.57169864 15.71337489 27.33121768 18.73350137 27.27438226 21.16402252
 26.00459084]
Normal equation mean squared error:
 21.89776539604949

Output 2 (gradient descent):

Gradient descent coefficients:
 [-0.93676107  0.98152698 -0.14014208  0.71501736 -2.02611755  2.18394075
  0.03139983 -2.73540994  1.46600721 -0.60423607 -2.05478787  0.58386292
 -3.90185393]
Gradient descent intercept:
 [22.34426927]
Predicted house prices:
 [31.18639657 28.19527592 17.96811934 22.20791867 18.66980745 20.36987809
 31.09332616 18.29799828 24.16032186 27.06868223 26.78527762 29.1182838
 21.66923101 26.42267569 23.1676702  20.22431387 17.1207814  37.76882252
 30.36632039  9.13685398 20.83915777 16.82072792 25.21211636 24.96362759
 31.01324824 10.87052281 14.1014198  18.19859554 35.82813641 14.35581674
 22.68150919 14.26848804 41.49407426 17.94832567 23.42763585 20.9339498
 17.57139821 27.51299259  9.01040479 19.70885878 25.84136556 21.35853888
 29.01453462 16.00399501 18.99725416 14.83921608 39.66773549 17.88980634
 25.99075092 20.9186056  24.99182006 24.37586226 25.28756173 26.66064526
  6.75325996 23.81054223 10.73519652 26.73602542 17.15385591 35.57872445
 19.46864367 27.48576748 16.25817669 18.62300659 11.07716975 31.73115521
 36.44869999 24.10721527 24.5051846  25.1182554  23.72115641  6.71086633
 16.26099667 20.7468384  21.00376525 21.46441168 33.77687    28.25277658
 25.64765147 33.35661856 19.01404206 24.43065033 34.78766657 13.61657778
 22.05694728 30.19359326 17.10169057 24.6512378  19.61667835 17.44138911
 26.9248539  41.18111603 16.267368   23.37020207 15.10057358 22.07795728
 23.077651   28.08546169 36.60878188 20.97447931 17.33109599 17.66200724
 25.23656026 21.90833021  8.14354442 21.94773738 15.8092484  32.828879
 24.09595903 25.64513248 38.15352645 28.73956101 14.74159077 33.68079238
 35.22712815 33.61814523 20.87523933 16.96357911 33.64365487 39.02735123
 23.00033891 15.87137002 27.68229335 18.87636921 26.91619555 21.58538843
 25.83561994]
Gradient descent mean squared error:
 21.855710521829288

Output 3 (ridge regression):

Ridge regression coefficients:
 [-1.06529961  1.32518886  0.24624079  0.66760443 -2.47705108  1.98317048
  0.188086   -3.12079188  2.61499643 -1.75783793 -2.12515378  0.56218556
 -4.02074809]
Ridge regression intercept:
 22.344591029023768
Predicted house prices:
 [32.32171816 27.96473106 18.07762496 21.6584351  18.91317728 19.97907593
 32.23001454 18.08173832 24.70796802 26.85817932 27.20441876 28.59066609
 21.20627903 26.92359717 23.36846228 20.85622895 17.11841625 37.7282184
 30.51049407  8.48543953 20.8656972  16.24631376 25.13674703 24.78269506
 31.38492794 11.02678908 13.84344969 16.86896027 35.92889522 14.70147436
 21.30765463 14.17015584 42.66238417 17.84150857 21.91224552 20.42360708
 17.50470501 27.02162833  9.77774288 19.98841513 24.33324787 21.0836586
 29.44463627 16.46738564 19.36988861 14.56343621 39.39246305 18.08635958
 26.20485675 20.61947142 25.08769334 24.48276159 25.03403759 26.83485936
  5.10006976 24.117713   10.73669    26.82113213 16.82369318 35.47440863
 19.50368158 27.43151293 16.56613491 19.12745536 11.01854625 32.02835221
 36.3197916  21.95399064 24.80308807 25.3146707  23.38284677  6.98045299
 16.80990059 20.29391707 20.75941077 21.83783801 34.14286814 27.9604287
 24.88527725 34.38382632 18.63464739 24.03514951 34.45856851 13.33963363
 20.77835292 30.15771549 17.06775827 24.21923573 19.19985773 17.0044756
 26.80676104 41.01666633 14.52497646 23.27583675 14.94335469 21.9401705
 22.82884209 29.11551685 36.68655442 20.43790273 17.84319503 17.50550001
 25.07892786 21.98470199  8.28972361 21.54150416 16.47783812 32.9812746
 24.48042779 25.10382922 38.27534467 28.91747933 14.85250764 34.69013874
 35.47638061 32.9188873  20.97955231 16.69501499 34.21593069 38.99901769
 21.63096553 15.72169492 27.34400955 18.73620332 27.26181716 21.18042244
 25.99630827]
Ridge regression mean squared error:
 21.895561133849878

This article may well have shortcomings; feedback and corrections are welcome.
The complete runnable code is available at: https://download.csdn.net/download/weixin_44525542/86234542
