逻辑回归预测癌症案例

逻辑回归API

  • sklearn.Linear_model.logisticRegression
  • sklearn.Linear_model.logisticRegression(penalty=‘12’,c=1.0)
  • logistic回归分类器
  • coef_:回归系数

应用场景

  • 广告点击率
  • 是否为垃圾邮件
  • 是否患病
  • 金融诈骗
  • 虚假账号

癌症分类流程

  1. 网上获取数据(工具pandas)
  2. 数据缺失值处理,标准化
  3. logisticRegression估计器流程

代码:

from sklearn.datasets import load_boston # 波士顿房价数据集使用API
from sklearn.linear_model import LogisticRegression ##回归预测时使用的API  Ridge岭回归  LogisticRegression逻辑回归
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler ## 标准化API
from sklearn.metrics import mean_squared_error,classification_report
from sklearn.externals import joblib
import pandas as pd
import numpy as np

def charge_data():
    # 构造标签名字
    colums=["colun1","colum2","colum3","colum4","colum5","colum6","colum7","colum8","colum9","colum10","TARGET"]
    # 读取数据
    data=pd.read_csv("./breast-cancer-wisconsin.data",names=colums)

    # 缺失值处理
    data=data.replace(to_replace="?",value=np.nan)
    data=data.dropna()
    # 数据集分割
    x_train,x_text,y_train,y_text=train_test_split(data[colums[1:10]],data[colums[10]],test_size=0.25)
    # print("特征值,训练集的\n",x_train)
    # print("特征值,测试集的\n",x_text)
    # print("目标值,训练集的\n",y_train)
    # print("目标值,测试集的\n",y_text)

    # 特征值进行标准化处理
    std=StandardScaler()
    std.fit_transform(x_train)
    std.transform(x_text)

    # 逻辑回归预测
    lg=LogisticRegression(C=1.0)
    lg.fit(x_train,y_train)
    print("回归参数:",lg.coef_)
    pre=lg.predict(x_train)
    print("预测值",pre)
    print("准确率:",lg.score(x_text,y_text))
    print("召回率:\n",classification_report(y_train,pre,labels=[2,4],target_names=["良性","恶性"]))
    return None

# https://archive.ics.uci.edu/ml/machine-learning-databases.data
if __name__ == '__main__':
    charge_data()

运行结果:
逻辑回归预测癌症案例_第1张图片

你可能感兴趣的:(逻辑回归预测癌症案例)