Machine Learning: Analyzing Heart Disease with Logistic Regression

  • Algorithm principles

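       Logistic regression looks very similar to linear regression, but the two address different kinds of problems. Linear regression handles numeric prediction: the output is a number, such as a house price. Logistic regression is a classification algorithm: its output is a discrete class, for example whether an email is spam or whether a user will click an ad. Logistic regression is therefore a classic binary classification algorithm.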

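       In terms of implementation, logistic regression simply applies a Sigmoid function to the result of the linear regression calculation, turning the numeric result into a probability between 0 and 1. (The Sigmoid curve is not especially intuitive at first; it is enough to know that the larger the input, the closer the output gets to 1, and the smaller the input, the closer it gets to 0.) We can then predict from this probability: for example, if the probability is greater than 0.5, the email is classified as spam. The same approach applies to deciding whether a tumor is malignant or whether a patient has heart disease.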
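To make the Sigmoid step concrete, here is a minimal sketch in plain Python. The weights, bias, and input values are made up for illustration, and the helper names `sigmoid` and `predict_label` are hypothetical, not part of any library.

```python
import math

def sigmoid(z):
    """Map any real number z to a probability between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_label(features, weights, bias, threshold=0.5):
    """Linear score -> Sigmoid -> threshold, as described above."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    p = sigmoid(z)
    return p, int(p > threshold)

# Made-up parameters and input, for illustration only.
weights = [0.8, -0.4]
bias = -0.2
p, label = predict_label([2.0, 1.0], weights, bias)
print(round(p, 3), label)  # linear score is 1.0, so this prints 0.731 1
```

A score of exactly 0 gives a probability of 0.5, which is why 0.5 is the natural default threshold.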

Data attributes:

1. age: age in years

2. sex: sex (1 = male; 0 = female)

3. cp: chest pain type

4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)

5. chol: serum cholesterol in mg/dl

6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)

7. restecg: resting electrocardiographic results

8. thalach: maximum heart rate achieved

9. exang: exercise-induced angina (1 = yes; 0 = no)

10. oldpeak: ST depression induced by exercise relative to rest

11. slope: the slope of the peak exercise ST segment

12. ca: number of major vessels (0-3) colored by fluoroscopy

13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

14. num: diagnosis of heart disease (angiographic disease status); 1 = disease present, 0 = no disease

Running the code:

# 1. Load the data
import pandas as pd
columnnames = ["1. age: age in years",
"2. sex: sex (1 = male; 0 = female)",
"3. cp: chest pain type",
"4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)",
"5. chol: serum cholesterol in mg/dl",
"6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)",
"7. restecg: resting electrocardiographic results",
"8. thalach: maximum heart rate achieved",
"9. exang: exercise-induced angina (1 = yes; 0 = no)",
"10. oldpeak: ST depression induced by exercise relative to rest",
"11. slope: the slope of the peak exercise ST segment",
"12. ca: number of major vessels (0-3) colored by fluoroscopy",
"13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect",
"num: diagnosis of heart disease (angiographic disease status)"]
# Raw string so the backslashes in the Windows path are not treated as escapes.
# na_values="?" assumes missing entries are marked with "?" in this file.
df = pd.read_csv(r"D:\mlData\processed.hungarian 1.data", names=columnnames, na_values="?")
df
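# 2. Data processing
# Fill missing values: ffill propagates the previous valid value forward,
# bfill then fills any gaps left at the top of a column
df1 = df.ffill().bfill()
# Check whether any missing values remain
print(df1.isnull().sum())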
# 3. Split the dataset
# Columns 1-13 are the features; column 14 is the target
y = df1.iloc[:, -1]
x = df1.iloc[:, :-1]
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=33)
ytrain
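# 4. Feature engineering: standardization
from sklearn.preprocessing import StandardScaler
# Create the scaler
transfer = StandardScaler()
# Fit the scaler on the training set only, then apply the same statistics to the test set
xtrain = transfer.fit_transform(xtrain)
xtest = transfer.transform(xtest)
xtest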
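# 5. Train the logistic regression model
from sklearn.linear_model import LogisticRegression
# Create the logistic regression estimator
lr = LogisticRegression()
# No reshaping is needed here: xtrain is 2-D and ytrain is 1-D, which is what fit() expects
# Fit the model
lr.fit(xtrain, ytrain)
# Weights
lr.coef_
# Intercept
lr.intercept_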
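# 6. Model evaluation
ypredict = lr.predict(xtest)
# Compare the predictions with the true labels element by element
ypredict == ytest
score = lr.score(xtest, ytest)
score
# Per-class classification report
from sklearn.metrics import classification_report
report = classification_report(ytest, ypredict, labels=[0, 1], target_names=["no disease", "disease"])
print(report)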

The output is shown in the figure below:

[Image 1: run output]
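The standardization, training, and evaluation steps in the code above can also be wired into a single scikit-learn pipeline, which keeps the scaler from ever being fitted on test data. The following is only a sketch under stated assumptions: it uses synthetic data from `make_classification` as a stand-in for the heart-disease table (300 rows, 13 features, binary target), so its scores say nothing about the real dataset, and the names `model`, `proba`, and `pred` are chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the heart-disease file).
X, y = make_classification(n_samples=300, n_features=13, random_state=33)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=33
)

# fit() fits the scaler on the training data only;
# predict() and score() reuse those training statistics on new data.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # estimated probability of class 1
pred = model.predict(X_test)               # hard 0/1 labels
accuracy = model.score(X_test, y_test)     # fraction of correct predictions
print(accuracy)
```

`predict_proba` exposes the Sigmoid output directly, which is useful when a threshold other than 0.5 is wanted.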

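How to obtain the data is described on my profile page. If you run into problems running the code, leave a comment; replies may not be prompt, but I will answer once I see it.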

 

 
