roc_auc_score in sklearn (binary or multiclass)

Official API reference:

sklearn.metrics.roc_auc_score (scikit-learn 1.2.2 documentation): https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html?highlight=roc_auc_score#sklearn.metrics.roc_auc_score

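For binary classification

The score is computed directly from the predicted values and the true labels. roc_auc_score takes the true labels first and the predictions second; passing the predicted probability of the positive class instead of hard class predictions usually gives a more informative AUC.

Code: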

# ---encoding:utf-8---
# @Time    : 2023/6/6 17:41
# @Author  : CBAiotAigc
# @Email   :[email protected]
# @Site    : 
# @File    : 癌症分类.py
# @Project : 机器学习
# @Software: PyCharm
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score


def logistic_regression_二分类():
    data = pd.read_csv("./breast-cancer-wisconsin.csv")
    data.info()

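    # Missing values are stored as "?"; np.nan is used because the np.NaN alias was removed in NumPy 2.0
    data = data.replace(to_replace="?", value=np.nan)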
    data = data.dropna()

    x = data.iloc[:, 1:-1].values
    y = data.iloc[:, -1].values

    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, shuffle=True, stratify=y, random_state=22)
    transformer = StandardScaler()
    x_train = transformer.fit_transform(x_train)
    x_test = transformer.transform(x_test)

    estimator = LogisticRegression()
    estimator.fit(x_train, y_train)

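    y_pred = estimator.predict(x_test)
    # Both metrics expect (y_true, y_pred) in that order
    print(accuracy_score(y_test, y_pred))
    # Compare the true labels with the predictions; passing y_test twice would always print 1.0
    print(roc_auc_score(y_test, y_pred))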


if __name__ == '__main__':
    logistic_regression_二分类()
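
The binary example above scores hard predictions. As a minimal sketch of the alternative, the snippet below compares the AUC obtained from predicted labels with the AUC obtained from the predicted probability of the positive class. It assumes synthetic data from make_classification in place of the CSV file, and the variable names are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary data stands in for breast-cancer-wisconsin.csv
X, y = make_classification(n_samples=400, n_features=8, random_state=22)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=22
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard labels give a single operating point on the ROC curve
auc_from_labels = roc_auc_score(y_test, clf.predict(X_test))

# Column 1 of predict_proba is the probability of clf.classes_[1], the positive class
y_score = clf.predict_proba(X_test)[:, 1]
auc_from_proba = roc_auc_score(y_test, y_score)

print(f"AUC from predicted labels:        {auc_from_labels:.3f}")
print(f"AUC from predicted probabilities: {auc_from_proba:.3f}")
```

The two numbers generally differ, because the probability scores let the ROC curve sweep over every threshold rather than a single one.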

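For multiclass classification

Unlike the binary case, which uses the predicted labels y_pred, the multiclass case needs the probability scores y_pred_prob: an array of shape (n_samples, n_classes), obtained here from estimator.predict_proba(x_test).

Code: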

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score


def logistic_regression_多分类():
    data = pd.read_csv("./iris.csv", header=None)
    data.info()
    x = data.iloc[:, :-1].values
    data.columns = ["1", "2", "3", "4", "Class"]
    # Encode class names as integers 0..n-1, in order of first appearance
    y, _ = pd.factorize(data["Class"])
    # print(y)

    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, shuffle=True, stratify=y, random_state=22)

    transformer = StandardScaler()
    x_train = transformer.fit_transform(x_train)
    # Reuse the statistics fitted on the training set; do not refit on the test set
    x_test = transformer.transform(x_test)

    estimator = LogisticRegression()
    estimator.fit(x_train, y_train)

    y_pred = estimator.predict(x_test)

    # accuracy_score expects (y_true, y_pred)
    print(accuracy_score(y_test, y_pred))
    print(roc_auc_score(y_test, estimator.predict_proba(x_test), multi_class="ovr"))


if __name__ == '__main__':
    logistic_regression_多分类()
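
As a small sketch of what the multiclass call expects, the snippet below checks the shape of the probability matrix, confirms that each row sums to 1, and shows that leaving multi_class at its default raises a ValueError. It assumes the copy of the iris data bundled with scikit-learn (load_iris) instead of iris.csv.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled iris dataset replaces ./iris.csv
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=22
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred_prob = clf.predict_proba(X_test)

# One column per class, and every row is a probability distribution
print(y_pred_prob.shape)
rows_sum_to_one = np.allclose(y_pred_prob.sum(axis=1), 1.0)

# With the default multi_class='raise', multiclass input is rejected
try:
    roc_auc_score(y_test, y_pred_prob)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

auc_ovr = roc_auc_score(y_test, y_pred_prob, multi_class="ovr")
print(f"OvR AUC: {auc_ovr:.3f}")
```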

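Explanation of the multi_class parameter of roc_auc_score:

multi_class only matters for multiclass targets. In binary classification each sample belongs to one of two classes. In multiclass classification each sample belongs to one of three or more classes, so the multiclass AUC has to be assembled from several binary AUCs.

multi_class accepts three values:

  1. 'raise' (the default): if the labels contain more than two classes and multi_class has not been explicitly set to 'ovr' or 'ovo', roc_auc_score raises a ValueError.

  2. 'ovr': One-vs-rest. The multiclass problem is split into n binary sub-problems, where n is the number of classes. For each class, an AUC is computed from that class's score column against all other classes combined, and the n AUCs are averaged. No additional models are trained: roc_auc_score only uses the scores passed to it.

  3. 'ovo': One-vs-one. An AUC is computed for each pair of classes, using only the samples that belong to those two classes, and the final AUC is the average over all $ n*(n-1)/2 $ pairs, where n is the number of classes.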
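
To make the two strategies concrete, here is a hedged sketch that rebuilds the 'ovr' and 'ovo' scores by hand from binary roc_auc_score calls and compares them with the built-in results. It assumes the default average='macro', the bundled iris data, and integer labels 0..n-1 so that a label can be used directly as a column index; the helper variable names are illustrative.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=22
)
proba = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)
classes = np.unique(y_test)  # labels 0, 1, 2 match the columns of proba

# One-vs-rest: one binary AUC per class (class k against everything else)
ovr_aucs = [
    roc_auc_score((y_test == k).astype(int), proba[:, k]) for k in classes
]
manual_ovr = np.mean(ovr_aucs)

# One-vs-one: one score per pair, restricted to samples of those two classes
pair_aucs = []
for a, b in combinations(classes, 2):
    mask = (y_test == a) | (y_test == b)
    auc_a = roc_auc_score((y_test[mask] == a).astype(int), proba[mask, a])
    auc_b = roc_auc_score((y_test[mask] == b).astype(int), proba[mask, b])
    pair_aucs.append((auc_a + auc_b) / 2)
manual_ovo = np.mean(pair_aucs)

builtin_ovr = roc_auc_score(y_test, proba, multi_class="ovr")
builtin_ovo = roc_auc_score(y_test, proba, multi_class="ovo")

print(f"{len(ovr_aucs)} one-vs-rest AUCs, {len(pair_aucs)} pairwise AUCs")
print(f"ovr: manual={manual_ovr:.4f} builtin={builtin_ovr:.4f}")
print(f"ovo: manual={manual_ovo:.4f} builtin={builtin_ovo:.4f}")
```

Only one classifier is fitted here; the n and n*(n-1)/2 counts refer to binary AUC computations, not to extra models.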

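With a large number of classes, the ovo strategy becomes more expensive, because the number of pairwise AUCs grows quadratically with the number of classes. The ovr strategy is usually cheaper, but it is more easily affected by class imbalance (and, potentially, by noisy data): the scikit-learn documentation notes that ovr is sensitive to class imbalance even with average='macro', whereas ovo with average='macro' is insensitive to it.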
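
The effect of class imbalance can be explored with the average parameter. The sketch below is an illustration under stated assumptions: it uses an imbalanced synthetic dataset from make_classification (the class weights are arbitrary) and compares macro and weighted averaging for 'ovr', then reproduces the weighted score by hand from per-class AUCs and class counts.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: an arbitrary 70/20/10 class split to create imbalance
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=0
)
proba = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)

ovr_macro = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
ovr_weighted = roc_auc_score(y_test, proba, multi_class="ovr", average="weighted")
ovo_macro = roc_auc_score(y_test, proba, multi_class="ovo", average="macro")

# Weighted OvR by hand: per-class AUCs weighted by the number of true samples per class
per_class = [
    roc_auc_score((y_test == k).astype(int), proba[:, k]) for k in range(3)
]
support = np.bincount(y_test)
manual_weighted = np.average(per_class, weights=support)

print(f"ovr macro:    {ovr_macro:.4f}")
print(f"ovr weighted: {ovr_weighted:.4f} (manual: {manual_weighted:.4f})")
print(f"ovo macro:    {ovo_macro:.4f}")
```

How far the three numbers diverge depends on the data; the point is only that the choice of strategy and averaging is worth stating explicitly when classes are imbalanced.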
