Melbourne Housing Price Prediction

Decision Tree

# Evaluation metric: mean_absolute_error (mean absolute error, MAE)
# Evaluation helper: train_test_split (random train/validation split)
# DecisionTreeRegressor: decision tree model
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


# Path
iowa_file_path = '../input/home-data-for-ml-course/train.csv'
home_data = pd.read_csv(iowa_file_path)
# y is the sale price (the prediction target)
y = home_data.SalePrice
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
# X holds the features used to predict y
X = home_data[features]

# Randomly split the data into training and validation sets
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

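# Build the decision tree model
iowa_model = DecisionTreeRegressor(random_state=1)
# Fit the model on the training set
iowa_model.fit(train_X, train_y)

# Predict on the validation set and compute the MAE
val_predictions = iowa_model.predict(val_X)
# scikit-learn's signature is mean_absolute_error(y_true, y_pred)
val_mae = mean_absolute_error(val_y, val_predictions)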
print("Validation MAE when not specifying max_leaf_nodes: {:,.0f}".format(val_mae))

############################################################
(Output) Validation MAE when not specifying max_leaf_nodes: 29,653
############################################################

# Best number of leaf nodes found with the validation split: max_leaf_nodes=100
# The final model is as follows
iowa_model = DecisionTreeRegressor(max_leaf_nodes=100, random_state=1)
iowa_model.fit(train_X, train_y)
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE for best value of max_leaf_nodes: {:,.0f}".format(val_mae))

############################################################
(Output) Validation MAE for best value of max_leaf_nodes: 27,283
############################################################

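Comparing the two outputs, the second MAE is smaller (27,283 versus 29,653). A lower validation MAE means the model's predictions on unseen data are closer to the actual prices. The gain comes from capping max_leaf_nodes, which keeps the tree from overfitting the training data; the random train/validation split is what makes that overfitting measurable in the first place. This corresponds to Method 1 in "Machine Learning: Notes on Data Processing".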
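The value max_leaf_nodes=100 is the kind of setting one usually picks by comparing validation MAE across several candidate tree sizes. Below is a minimal sketch of such a search. It is not the author's code: the helper name get_mae, the candidate list, and the synthetic data from make_regression (used so the snippet runs without train.csv) are all assumptions for illustration, so the winning size here need not be 100.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic stand-in data with 7 features, not the real housing file
X, y = make_regression(n_samples=500, n_features=7, noise=20.0, random_state=1)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)


def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Fit a tree with the given leaf cap and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=1)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Hypothetical candidate sizes; adjust to the size of the dataset
candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
scores = {n: get_mae(n, train_X, val_X, train_y, val_y)
          for n in candidate_max_leaf_nodes}

# The best size is the one with the lowest validation MAE
best_tree_size = min(scores, key=scores.get)

for n, mae in scores.items():
    print("max_leaf_nodes: {:>3}  Validation MAE: {:,.2f}".format(n, mae))
print("Best max_leaf_nodes:", best_tree_size)
```

With the real data, the same loop would be run on the train_X, val_X, train_y, val_y produced by the split earlier in the post.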

Random Forest

# RandomForestRegressor: random forest model
from sklearn.ensemble import RandomForestRegressor

# Build the model
rf_model = RandomForestRegressor(random_state=1)

# Fit on the training set
rf_model.fit(train_X, train_y)

# Predict on the validation set
rf_predictions = rf_model.predict(val_X)
# Compute the MAE (y_true first, then y_pred)
rf_val_mae = mean_absolute_error(val_y, rf_predictions)

print("Validation MAE for Random Forest Model: {}".format(rf_val_mae))
############################################################
(Output) Validation MAE for Random Forest Model: 21,857.15912981083
############################################################

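The random forest, even with its default parameters, reaches a lower validation MAE (about 21,857) than the decision tree tuned with max_leaf_nodes=100 (27,283).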
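To make a comparison like this repeatable, both models can be fitted and scored on the same split in one loop. The sketch below again assumes synthetic data from make_regression in place of train.csv, and the dictionary names models and results are hypothetical; the numbers it prints will differ from the outputs shown above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic stand-in data, not the real housing file
X, y = make_regression(n_samples=500, n_features=7, noise=20.0, random_state=1)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# The two models compared in this post, with the same settings
models = {
    "decision tree (max_leaf_nodes=100)": DecisionTreeRegressor(
        max_leaf_nodes=100, random_state=1),
    "random forest (defaults)": RandomForestRegressor(random_state=1),
}

# Fit each model on the training set and score it on the same validation set
results = {}
for name, model in models.items():
    model.fit(train_X, train_y)
    results[name] = mean_absolute_error(val_y, model.predict(val_X))
    print("{}: Validation MAE {:,.2f}".format(name, results[name]))
```

Scoring both models on an identical validation set keeps the comparison fair, since any difference in MAE then comes from the model rather than from the split.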
