Predicting Boston Housing Prices with Regression Models

Regression

House price prediction is a classic introductory example in many machine learning courses. Prices are affected by many factors, such as floor area and the number of bedrooms. Is there an equation that captures the quantitative relationship between these factors and the price? Finding such a best-fit equation is exactly the problem machine learning tries to solve. This experiment uses the Boston housing dataset; see sklearn's datasets documentation for details. A house price is a continuous value, and predicting a continuous value is a regression problem (not logistic regression, which is a classification method). With Python's powerful machine learning libraries, we can easily build a model. So how do different models perform on this problem? Below is a simple test.

Data Preparation

from sklearn.datasets import load_boston
from sklearn.preprocessing import scale
from sklearn.model_selection import train_test_split
X, y = load_boston(return_X_y=True)
X = scale(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
print(X_train[0], y_train[0])
[ 0.67497414 -0.48772236  1.01599907 -0.27259857  1.60072524 -0.93690454
  0.900575   -0.94020538  1.66124525  1.53092646  0.80657583  0.44105193
  1.43354842] 12.8
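One caveat about the preparation above: `scale` standardizes using the statistics of the full dataset before splitting, which leaks test-set information into training. A leak-free sketch fits the scaler on the training split only. It is shown here on synthetic stand-in data of the same shape, since `load_boston` has been removed from recent scikit-learn releases:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with the same shape as the Boston set (506 x 13)
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```

After this transform, the training split has zero mean and unit variance per feature, while the test split is scaled with statistics it never contributed to.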

Linear Models

No hyperparameter tuning was done for the models in this section; all parameters use their default values.

Lasso Model

from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
lasso = Lasso()
y_pred_lasso = lasso.fit(X_train, y_train).predict(X_test)
r2_score_lasso = r2_score(y_test, y_pred_lasso)
print(r2_score(y_train, lasso.predict(X_train)))
print(r2_score_lasso)
0.6558542290928164
0.7063382166531593
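The default `Lasso()` above fixes the regularization strength at `alpha=1.0`. A sketch of choosing `alpha` by cross-validation with `LassoCV` instead, demonstrated on synthetic stand-in data (since `load_boston` has been removed from recent scikit-learn releases):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data of the same shape as the Boston set
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# LassoCV picks alpha over a grid by k-fold cross-validation
lasso_cv = LassoCV(cv=5).fit(X_train, y_train)
print('alpha:', lasso_cv.alpha_)
print('test r2:', r2_score(y_test, lasso_cv.predict(X_test)))
```

The cross-validated `alpha` typically differs considerably from the default, which is one reason the untuned linear models above should not be read as their best achievable scores.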

ElasticNet Model

from sklearn.linear_model import ElasticNet
enet = ElasticNet()
y_pred_enet = enet.fit(X_train, y_train).predict(X_test)
r2_score_enet = r2_score(y_test, y_pred_enet)
print(r2_score(y_train, enet.predict(X_train)))
print(r2_score_enet)
0.6319495878879611
0.6989301886731756

SVR Model

from sklearn import svm
svr = svm.SVR()
svr.fit(X_train, y_train)
y_pred_svr = svr.predict(X_test)
print(r2_score(y_train, svr.predict(X_train)))
print(r2_score(y_test, y_pred_svr))
0.6430295360107772
0.6908834297202064
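SVR is particularly sensitive to its `C` and `gamma` settings, so its default-parameter score above is likely pessimistic. A sketch of a small grid search with `GridSearchCV`, again on synthetic stand-in data; the grid values are illustrative, not tuned for Boston, and the target is standardized because an RBF SVR with the default `epsilon` copes badly with large-magnitude targets:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data with a standardized target
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
y = (y - y.mean()) / y.std()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A small illustrative grid over C and gamma, scored by 5-fold cross-validation
param_grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 0.01, 0.1]}
grid = GridSearchCV(SVR(), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
print('test r2:', grid.score(X_test, y_test))
```

`grid.score` uses the regressor's default scorer, which for SVR is R², so the printed value is directly comparable with the scores in this post.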

Deep Model

This network is actually "very shallow and very narrow". The power of deep models lies in depth: each layer can discover features at a different level of abstraction, and the deeper the model, the higher-level the features it can learn. A shallow, narrow model was chosen here because the dataset is small; a deeper model would be hard to train on this little data and would overfit easily.

from keras.models import Sequential
from keras.layers import Dense

seq = Sequential()
seq.add(Dense(13, activation='relu', input_dim=X.shape[1]))
seq.add(Dense(13, activation='relu'))
seq.add(Dense(1, activation='relu'))
seq.compile(loss='mse', optimizer='adam')
seq.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=200, batch_size=10)
print(r2_score(y_test, seq.predict(X_test)))
Epoch 195/200
354/354 [==============================] - 0s 167us/step - loss: 8.2531 - val_loss: 7.7015
Epoch 196/200
354/354 [==============================] - 0s 156us/step - loss: 8.0658 - val_loss: 7.7211
Epoch 197/200
354/354 [==============================] - 0s 167us/step - loss: 8.0975 - val_loss: 7.6756
Epoch 198/200
354/354 [==============================] - 0s 164us/step - loss: 8.0590 - val_loss: 7.6672
Epoch 199/200
354/354 [==============================] - 0s 164us/step - loss: 8.0256 - val_loss: 7.7370
Epoch 200/200
354/354 [==============================] - 0s 173us/step - loss: 8.0272 - val_loss: 7.6881
0.905115228567158
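If Keras is not available, the same architecture can be sketched with scikit-learn's `MLPRegressor`. This is an assumption-laden stand-in, not the original experiment: it uses synthetic data and a standardized target, since `load_boston` has been removed from recent scikit-learn releases and adam converges slowly when the regression target has a very large scale:

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data with a standardized target
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
y = (y - y.mean()) / y.std()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Two hidden ReLU layers of 13 units, mirroring the Keras network above,
# trained with adam on a mean-squared-error objective
mlp = MLPRegressor(hidden_layer_sizes=(13, 13), activation='relu',
                   solver='adam', max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print('test r2:', r2_score(y_test, mlp.predict(X_test)))
```

One difference worth noting: `MLPRegressor` always uses a linear output unit, whereas the Keras model above puts ReLU on its output layer, which only works here because house prices are non-negative.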

Comparing Results

As the scores show, the deep model outperforms the other models. In fairness, though, the deep model's result came after several rounds of trial-and-error hyperparameter tuning, while the other models were not tuned at all, so treat this comparison as a rough reference only.

import pandas as pd
df = pd.DataFrame(columns=['Actual', 'DNN', 'SVR', 'ENET', 'LASSO'])
df['Actual'] = y_test[:20]
df['DNN'] = seq.predict(X_test[:20]).reshape(20,)
df['SVR'] = svr.predict(X_test[:20]).reshape(20,)
df['ENET'] = enet.predict(X_test[:20]).reshape(20,)
df['LASSO'] = lasso.predict(X_test[:20]).reshape(20,)
print(df)
    Actual        DNN        SVR       ENET      LASSO
0   22.0  18.252920  21.089813  23.385601  23.312081
1   28.7  25.008955  23.710673  26.422968  26.891553
2   13.1  16.204874  15.304707  17.510302  15.982067
3   22.5  18.993956  17.257705  19.499183  17.915353
4   20.0  21.143742  21.106587  22.952084  22.742153
5   42.8  45.877789  28.505149  32.595058  34.055575
6   17.5  18.026285  17.776068  19.260607  18.383730
7   14.5  16.182840  15.307121  17.582478  15.724255
8    8.4  10.724962  12.276034   9.835185   7.893643
9   50.0  48.813076  35.115328  34.616077  35.574188
10  27.5  34.224918  19.906433  13.545103  14.658437
11  14.9  14.568207  15.160364  18.622336  18.828576
12  14.5  17.585505  17.397113  20.635838  20.146342
13  17.8  16.192526  16.716685  18.274871  17.565880
14  18.9  20.941542  20.661864  22.726536  22.514724
15  33.8  34.242413  31.564452  30.224818  30.590694
16  23.0  28.125729  24.612038  26.795441  25.767934
17  10.5   7.609363  12.236361  15.076118  14.950944
18  50.0  44.337326  30.579195  32.720842  35.185425
19  23.1  21.985710  23.457522  24.123978  24.204844
