Using sklearn linear regression for a classification task

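I tried using linear regression for a binary classification task, and the accuracy turned out to be quite high. My approach:

  1. The target to predict is 0 or 1, whereas linear regression outputs a continuous value.
  2. Treat linear regression predictions above 0.5 as 1 and predictions below 0.5 as 0.
  3. Then compare the thresholded predictions with the actual labels.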
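
The three steps above can be sketched in plain Python, independent of sklearn. The function name `threshold_accuracy` and the sample numbers are made up for illustration; the sketch also assumes that a prediction of exactly 0.5 counts as class 1, which the steps above leave open. Note that linear regression can produce values below 0 or above 1, and the threshold handles those the same way.

```python
def threshold_accuracy(true_values, predictions, threshold=0.5):
    """Round continuous predictions to 0/1 and return the share that match."""
    # Step 2: predictions at or above the threshold become 1, the rest 0
    labels = [1 if p >= threshold else 0 for p in predictions]
    # Step 3: count how many thresholded predictions equal the true labels
    matches = sum(1 for t, l in zip(true_values, labels) if t == l)
    return matches / len(labels)


# Hypothetical labels and regression outputs, for illustration only
y_true = [0, 1, 1, 0, 1]
y_pred = [0.12, 0.87, 0.43, -0.05, 1.10]
print(threshold_accuracy(y_true, y_pred))  # 0.8 (the third sample is misclassified)
```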

Core code (cross-validation)

Fill these in with your own data:

  • x_train_std: the standardized training features X
  • y_train: the training labels Y (0 or 1)
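
For readers without data at hand, here is one way these two variables might be prepared. This is only a sketch: the synthetic dataset from `make_classification`, the 80/20 split, and the names `x_test_std` and `scaler` are assumptions for illustration, not part of the original setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data, used only as a stand-in for real data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out a test set; only the training part goes into cross-validation
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training data only, then apply it to both parts
scaler = StandardScaler()
x_train_std = scaler.fit_transform(x_train)
x_test_std = scaler.transform(x_test)

print(x_train_std.shape, y_train.shape)
```

After this, `x_train_std` has zero mean and unit variance per column, and `y_train` contains only 0 and 1, which is what the code below expects.
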
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_validate

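def linear_score(true_value, predict):
    # Threshold the continuous predictions at 0.5 without modifying the input array;
    # a prediction of exactly 0.5 is assigned to class 1 here
    labels = (np.asarray(predict) >= 0.5).astype(int)
    # Accuracy: fraction of thresholded predictions that equal the true labels
    return np.mean(labels == np.asarray(true_value))

linear_model = LinearRegression()
scoring = {
    'linear_score': make_scorer(linear_score, greater_is_better=True)
}
# random_state only takes effect with shuffle=True; recent scikit-learn versions
# raise an error if random_state is set while shuffle is False
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
cv_cross = cross_validate(linear_model, x_train_std, y_train, cv=kfold, scoring=scoring)

print(cv_cross['test_linear_score'].mean())  # mean accuracy across the folds
print(cv_cross['test_linear_score'].std())   # standard deviation across the folds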
