Python Data Analysis in Practice (4): Titanic Survival Prediction with a Decision Tree

Decision Tree: Titanic Survival Prediction

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.metrics import classification_report
import graphviz
import pydotplus
data = pd.read_csv(r'./data/titanic_data.csv')
data.drop('PassengerId', axis=1, inplace=True)  # drop the PassengerId column
data['Sex'] = data['Sex'].map({'male': 1, 'female': 0})  # encode 'male' as 1 and 'female' as 0
data['Age'] = data['Age'].fillna(data['Age'].mean())  # fill missing ages with the mean age
dtc = DecisionTreeClassifier(max_depth=5, random_state=8)  # decision tree classifier
dtc.fit(data.iloc[:, 1:], data['Survived'])  # train the model
pre = dtc.predict(data.iloc[:, 1:])  # predict on the same rows used for training
data['Survived'] == pre  # element-wise check of whether each prediction is correct
print(classification_report(data['Survived'], pre))  # print the classification report


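# class_names expects one label per class in ascending order (0, 1);
# 'Died' and 'Survived' are display labels chosen here for readability.
dot_data = export_graphviz(
    dtc,
    feature_names=['Pclass', 'Sex', 'Age'],
    class_names=['Died', 'Survived'],
)  # export the fitted tree as DOT source
# graph = graphviz.Source(dot_data)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png(r"./data/titanic.png")  # write the tree to a PNG file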
              precision    recall  f1-score   support

           0       0.75      0.85      0.79       549
           1       0.69      0.54      0.60       342

    accuracy                           0.73       891
   macro avg       0.72      0.69      0.70       891
weighted avg       0.72      0.73      0.72       891
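
The report above is computed on the same rows the tree was trained on, so it describes training fit rather than performance on unseen passengers. A minimal sketch of a held-out evaluation could look like the following. The synthetic DataFrame only stands in for titanic_data.csv (its values and label rule are made up), and the names demo and dtc_holdout, the test_size=0.25 split, and the stratify option are assumptions that are not part of the original script.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for titanic_data.csv: same column names, made-up values.
rng = np.random.default_rng(8)
n = 200
demo = pd.DataFrame({
    'Pclass': rng.integers(1, 4, size=n),
    'Sex': rng.integers(0, 2, size=n),   # 1 = male, 0 = female
    'Age': rng.uniform(1, 80, size=n),
})
# Made-up label rule with 20% noise, only so the tree has something to learn.
demo['Survived'] = ((demo['Sex'] == 0) ^ (rng.random(n) < 0.2)).astype(int)

X = demo[['Pclass', 'Sex', 'Age']]
y = demo['Survived']

# Hold out a quarter of the rows; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=8, stratify=y)

dtc_holdout = DecisionTreeClassifier(max_depth=5, random_state=8)
dtc_holdout.fit(X_train, y_train)

# Score only on rows the tree has never seen.
holdout_report = classification_report(y_test, dtc_holdout.predict(X_test))
print(holdout_report)
```

With the real data, X and y would come from data.iloc[:, 1:] and data['Survived'] instead of the synthetic frame.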

[Image 1: Python Data Analysis in Practice (4): Titanic Survival Prediction with a Decision Tree]
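
If Graphviz or pydotplus is not available, the split rules of a fitted tree can also be read as plain text. The sketch below uses sklearn.tree.export_text on a few made-up rows. The toy frame and the toy_tree name are illustrative assumptions, and with the script above one would pass dtc instead.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up rows with the same feature columns as the script above.
toy = pd.DataFrame({
    'Pclass':   [1, 3, 2, 3, 1, 2, 3, 1],
    'Sex':      [0, 1, 0, 1, 1, 0, 1, 0],   # 1 = male, 0 = female
    'Age':      [29, 40, 18, 22, 54, 35, 8, 61],
    'Survived': [1, 0, 1, 0, 0, 1, 0, 1],
})
features = ['Pclass', 'Sex', 'Age']

toy_tree = DecisionTreeClassifier(max_depth=5, random_state=8)
toy_tree.fit(toy[features], toy['Survived'])

# export_text returns the learned split rules as an indented string.
rules = export_text(toy_tree, feature_names=features)
print(rules)
```

The text form is handy for quick checks in a terminal, while the PNG is easier to read for deeper trees.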
