Disease Modeling and Prediction

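This post builds predictive models on a disease dataset, using logistic regression and a decision tree, with accuracy and AUC as the evaluation metrics.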
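As a quick reference for the two metrics, here is a minimal sketch that computes them by hand on a toy example. The helper names `accuracy` and `auc_from_scores` and the four toy samples are illustrative assumptions, not part of the original analysis. AUC is computed here as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def auc_from_scores(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]          # every positive score minus every negative score
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))

# Toy data (hypothetical): true labels and model scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]        # hard labels at a 0.5 threshold

print(accuracy(y_true, y_pred))                 # 0.75
print(auc_from_scores(y_true, scores))          # 0.75
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank the samples.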

1 Import packages

import pandas as pd
import numpy as np

2 Load the data

hy_data = pd.read_csv(r'/work/hy_data.csv')
hy_data.head()
corr = hy_data.corr()

3 Correlation analysis

import seaborn as sns
import matplotlib.pyplot as plt
plt.subplots(figsize=(12,12))
sns.heatmap(corr,annot=True,vmax=1,square=True,cmap='Reds')
plt.savefig(r'/work/hy_data_corr.png')
plt.show()


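The correlation analysis shows that the variables cp, trestbps, restecg, thalach, exang, oldpeak, slope, ca, and thal are strongly correlated with the target variable, so they are used as the modeling features below.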
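The selection above was read off the heatmap. One way to make the same step explicit is to rank features by their absolute correlation with the target. The sketch below uses a small synthetic DataFrame as a stand-in for `hy_data`; the column names `strong`, `weak`, and `noise` and the 0.2 cutoff are illustrative assumptions, not values from the original post.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for hy_data (assumption: the real CSV is not available here)
rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, size=n)
demo = pd.DataFrame({
    "strong": target + rng.normal(0, 0.3, size=n),  # closely tracks the target
    "weak": target + rng.normal(0, 3.0, size=n),    # mostly noise
    "noise": rng.normal(0, 1.0, size=n),            # unrelated to the target
    "target": target,
})

# Absolute Pearson correlation of each feature with the target, largest first
target_corr = (
    demo.corr()["target"]
    .drop("target")
    .abs()
    .sort_values(ascending=False)
)
print(target_corr)

# Keep features whose |correlation| reaches the cutoff (0.2 is an arbitrary example value)
threshold = 0.2
selected = target_corr[target_corr >= threshold].index.tolist()
print(selected)
```

With the real data, the same two steps would be applied to `hy_data.corr()["target"]`, and the cutoff would be a judgment call.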

4 Modeling

Logistic regression

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.metrics import roc_auc_score

# Keep the selected features plus the target column
hy_train_data = hy_data.loc[:, ['cp','trestbps','restecg','thalach','exang','oldpeak','slope','ca','thal','target']]
X = hy_train_data.iloc[:, :-1]
y = hy_train_data.iloc[:, -1]
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the logistic regression model
model = LogisticRegression(C=7957.0, max_iter=10, solver="liblinear")
# Train the model
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Compute accuracy
accuracy = metrics.accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
# Area under the ROC curve, computed here from the hard 0/1 predictions
roc_auc = roc_auc_score(y_test, y_pred)
print('Area under the ROC curve:', roc_auc)

Model results (accuracy and AUC):
Accuracy: 0.8558558558558559
Area under the ROC curve: 0.8487105084866639
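The AUC above is computed from predicted class labels. AUC is more commonly computed from predicted probabilities, which keeps the ranking information that hard labels discard. The sketch below compares the two on synthetic data from `make_classification`; the data and the names `demo_model`, `auc_labels`, and `auc_scores` are assumptions for illustration, so the numbers it prints are unrelated to the results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for the disease dataset)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=42
)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

demo_model = LogisticRegression(solver="liblinear")
demo_model.fit(Xtr, ytr)

# AUC from hard 0/1 predictions: the ROC curve has a single operating point
auc_labels = roc_auc_score(yte, demo_model.predict(Xte))

# AUC from the predicted probability of the positive class (column 1)
proba = demo_model.predict_proba(Xte)[:, 1]
auc_scores = roc_auc_score(yte, proba)

print("AUC from labels:       ", auc_labels)
print("AUC from probabilities:", auc_scores)
```

In the original code, the equivalent change would be to pass `model.predict_proba(X_test)[:, 1]` to `roc_auc_score` instead of `y_pred`.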

Decision tree

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Fit a decision tree with default settings on the same split
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Area under the ROC curve, again from the hard 0/1 predictions
print('AUC:', roc_auc_score(y_test, y_pred))

Decision tree results:
Accuracy: 1.0
AUC: 1.0
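The decision tree performs best on this test set, so it can be used to predict the data, although a perfect test score is unusual and worth confirming before relying on the model.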
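A perfect score on a single split can come from rows repeated across the training and test sets, or from a lucky split. One possible check is to count duplicated rows and run k-fold cross-validation. The sketch below does both on synthetic data; the data and the names `X_cv`, `n_dup`, and `cv_scores` are assumptions for illustration, not results from the disease dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature matrix and target
X_arr, y_arr = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=0
)
X_cv = pd.DataFrame(X_arr)

# Exact duplicate rows can leak between train and test and inflate scores
n_dup = int(X_cv.duplicated().sum())
print("Duplicated rows:", n_dup)

# 5-fold cross-validated AUC: each fold is held out once as the test set
cv_scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X_cv, y_arr, cv=5, scoring="roc_auc"
)
print("Per-fold AUC:", cv_scores)
print("Mean AUC:", cv_scores.mean(), "Std:", cv_scores.std())
```

With the real data, `X` and `y` from the modeling step would replace `X_cv` and `y_arr`. A large gap between the cross-validated score and the single-split score would suggest the single split was optimistic.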
