A summary of common classification algorithms in sklearn, with an SVM classification model

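Repost notice: this is not original content; the post is kept for study purposes only.
Common classification algorithms in sklearn (module name, class name, algorithm name):
(1) linear_model LogisticRegression: logistic regression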


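>>> from sklearn.linear_model import LogisticRegression

>>> C = 1.0  # inverse regularization strength; 1.0 is only a placeholder value

>>> clf_l1_LR = LogisticRegression(C=C, penalty='l1', tol=0.01, solver='liblinear')  # the default lbfgs solver does not support the L1 penalty

>>> clf_l2_LR = LogisticRegression(C=C, penalty='l2', tol=0.01)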

(2) svm SVC: support vector machine


>>> from sklearn import svm

>>> clf = svm.SVC()

(3) neighbors KNeighborsClassifier: k-nearest neighbors classification


>>> from sklearn.neighbors import KNeighborsClassifier

>>> knn = KNeighborsClassifier(n_neighbors=2, algorithm='ball_tree').fit(X, y)  # X: feature matrix, y: class labels

(4) naive_bayes GaussianNB: Gaussian naive Bayes


>>> from sklearn.naive_bayes import GaussianNB

>>> gnb = GaussianNB()

(5) tree DecisionTreeClassifier: decision tree classification


>>> from sklearn import tree

>>> clf = tree.DecisionTreeClassifier()

(6) ensemble RandomForestClassifier: random forest classification


>>> from sklearn.ensemble import RandomForestClassifier

>>> clf = RandomForestClassifier(n_estimators=10)

(7) cluster KMeans: k-means (a clustering algorithm, not a classifier)


>>> from sklearn.cluster import KMeans

>>> kmeans = KMeans(init='k-means++', n_clusters=n_digits, n_init=10)

(8) cluster AgglomerativeClustering: hierarchical clustering (also a clustering algorithm; supports several distance measures)


>>> from sklearn.cluster import AgglomerativeClustering

>>> model = AgglomerativeClustering(linkage=linkage,
...                                 connectivity=connectivity, n_clusters=n_clusters)
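The six classifiers listed above share the same fit/predict interface, so they can be compared in one loop. The following is a minimal sketch under stated assumptions, not a benchmark: the breast cancer dataset, the 5-fold cross-validation, and every hyperparameter value are illustrative choices that do not come from the original post. KMeans and AgglomerativeClustering are left out because they are unsupervised and do not take class labels.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters are illustrative assumptions, not tuned values.
classifiers = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'SVC': SVC(),
    'KNeighborsClassifier': KNeighborsClassifier(n_neighbors=5),
    'GaussianNB': GaussianNB(),
    'DecisionTreeClassifier': DecisionTreeClassifier(random_state=0),
    'RandomForestClassifier': RandomForestClassifier(n_estimators=10, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    # The scaler is refit inside every fold, so no held-out statistics leak in.
    model = make_pipeline(StandardScaler(), clf)
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f'{name}: mean 5-fold accuracy = {scores[name]:.3f}')
```

Wrapping each classifier in a pipeline keeps the comparison fair for scale-sensitive models such as SVC and k-nearest neighbors; tree-based models are largely unaffected by the scaling step.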

Building an SVM model with a sklearn estimator


## Import the modules and functions needed

import numpy as np

from sklearn.datasets import load_breast_cancer

from sklearn.svm import SVC

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

 

## Load the breast cancer dataset: features, labels and feature names

cancer = load_breast_cancer()

cancer_data = cancer['data']

cancer_target = cancer['target']

cancer_names = cancer['feature_names']

 

## Split the data into a training set and a test set

cancer_data_train, cancer_data_test, cancer_target_train, cancer_target_test = \
    train_test_split(cancer_data, cancer_target, test_size=0.2, random_state=22)

## Standardize the data (the scaler is fitted on the training set only)

stdScaler = StandardScaler().fit(cancer_data_train)

cancer_trainStd = stdScaler.transform(cancer_data_train)

cancer_testStd = stdScaler.transform(cancer_data_test)

## Build the SVM model

svm = SVC().fit(cancer_trainStd, cancer_target_train)

print('The fitted SVM model is:\n', svm)

 

## Predict on the test set

cancer_target_pred = svm.predict(cancer_testStd)

print('The first 20 predictions are:\n', cancer_target_pred[:20])
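The standardize-then-fit steps above can also be bundled into a single estimator. The sketch below is an alternative arrangement, not the original author's code: it checks that StandardScaler.transform applies (x - mean) / std using training-set statistics, and that a make_pipeline version gives the same predictions as the manual route. The variable names (X_train, pipe and so on) are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=22)

# Manual route: fit the scaler on the training data, then reuse it for the test data.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# transform() computes (x - mean_) / scale_ for each feature column.
matches_formula = np.allclose(
    X_train_std, (X_train - scaler.mean_) / scaler.scale_)
print('transform matches (x - mean) / std:', matches_formula)

manual_pred = SVC().fit(X_train_std, y_train).predict(X_test_std)

# Pipeline route: scaling and the SVM are fitted together in one call.
pipe = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

same_predictions = bool((manual_pred == pipe_pred).all())
print('pipeline and manual predictions agree:', same_predictions)
```

The pipeline form makes it harder to standardize the test set with its own statistics by accident, and it can be passed directly to cross-validation helpers.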

Compare the predictions with the true labels to compute the accuracy:


## Count the predictions that match the true labels

true = np.sum(cancer_target_pred == cancer_target_test)

print('Number of correct predictions:', true)

print('Number of wrong predictions:', cancer_target_test.shape[0] - true)

print('Prediction accuracy:', true / cancer_target_test.shape[0])

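Accuracy alone does not reflect a model's performance well. To judge a predictive model properly, compare the predictions with the true labels using precision, recall, the F1 score, Cohen's Kappa coefficient and similar metrics, as listed below:
Metric | Best value | sklearn function
Precision | 1.0 | metrics.precision_score
Recall | 1.0 | metrics.recall_score
F1 score | 1.0 | metrics.f1_score
Cohen's Kappa coefficient | 1.0 | metrics.cohen_kappa_score
ROC curve | closest to the y-axis (top-left corner) | metrics.roc_curve
The code is as follows: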


from sklearn.metrics import accuracy_score, precision_score, \
    recall_score, f1_score, cohen_kappa_score

print('Accuracy of the SVM on the breast_cancer data:',
      accuracy_score(cancer_target_test, cancer_target_pred))

print('Precision of the SVM on the breast_cancer data:',
      precision_score(cancer_target_test, cancer_target_pred))

print('Recall of the SVM on the breast_cancer data:',
      recall_score(cancer_target_test, cancer_target_pred))

print('F1 score of the SVM on the breast_cancer data:',
      f1_score(cancer_target_test, cancer_target_pred))

print("Cohen's Kappa coefficient of the SVM on the breast_cancer data:",
      cohen_kappa_score(cancer_target_test, cancer_target_pred))
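To make these metrics less of a black box, they can be derived by hand from the four cells of the binary confusion matrix. The sketch below refits the same kind of model on the same split and compares the hand-computed values with the sklearn functions; the variable names are introduced here for illustration. Note that label 1 (benign, in this dataset) is treated as the positive class, which is the default for precision_score, recall_score and f1_score.

```python
import math

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (cohen_kappa_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=22)
scaler = StandardScaler().fit(X_train)
y_pred = SVC().fit(scaler.transform(X_train), y_train).predict(
    scaler.transform(X_test))

# For binary labels {0, 1}, ravel() returns the cells in this order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
n = tn + fp + fn + tp

precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

# Cohen's Kappa: observed agreement corrected for chance agreement.
p_observed = (tp + tn) / n
p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
kappa = (p_observed - p_chance) / (1 - p_chance)

print('precision:', precision, precision_score(y_test, y_pred))
print('recall:   ', recall, recall_score(y_test, y_pred))
print('F1:       ', f1, f1_score(y_test, y_pred))
print('kappa:    ', kappa, cohen_kappa_score(y_test, y_pred))
```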

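In addition to single-metric functions such as precision_score, the sklearn metrics module provides classification_report, a function that prints an evaluation report for a classification model: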


from sklearn.metrics import classification_report

print('Classification report of the SVM on the breast_cancer data:', '\n',
      classification_report(cancer_target_test, cancer_target_pred))
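The report is easier to read with class names instead of 0 and 1, and easier to process as a dictionary. The sketch below uses the target_names and output_dict parameters of classification_report; the refit model and the variable names are assumptions added for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=22)
scaler = StandardScaler().fit(X_train)
y_pred = SVC().fit(scaler.transform(X_train), y_train).predict(
    scaler.transform(X_test))

# In this dataset label 0 is 'malignant' and label 1 is 'benign'.
class_names = list(data.target_names)

# Text report with readable row labels.
print(classification_report(y_test, y_pred, target_names=class_names))

# Same numbers as a nested dict, convenient for logging or further analysis.
report = classification_report(
    y_test, y_pred, target_names=class_names, output_dict=True)
print('recall for benign:   ', report['benign']['recall'])
print('recall for malignant:', report['malignant']['recall'])
```

The per-class rows make it visible that recall_score with default arguments reports the benign class only; the malignant row is often the one of more practical interest.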

A classification model can also be evaluated with a ROC curve:


from sklearn.metrics import roc_curve

import matplotlib.pyplot as plt

 

## Compute the x-axis (false positive rate) and y-axis (true positive rate) values of the ROC curve

fpr, tpr, thresholds = roc_curve(cancer_target_test, cancer_target_pred)

plt.figure(figsize=(10, 6))

plt.xlim(0, 1)      ## set the x-axis range

plt.ylim(0.0, 1.1)  ## set the y-axis range

plt.xlabel('False Positive Rate')

plt.ylabel('True Positive Rate')

plt.plot(fpr,tpr,linewidth=2, linestyle="-",color='red')

plt.show()
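Because roc_curve above receives hard 0/1 predictions, the resulting curve has at most three points. A common alternative, sketched here as an assumption rather than as the original author's method, is to pass the continuous scores from SVC.decision_function, which gives one point per distinct threshold and allows the area under the curve to be computed with roc_auc_score. Plotting is omitted; fpr_scores and tpr_scores can be passed to plt.plot as in the code above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=22)
scaler = StandardScaler().fit(X_train)
X_test_std = scaler.transform(X_test)
clf = SVC().fit(scaler.transform(X_train), y_train)

# Hard labels: only one effective threshold, so the curve is very coarse.
labels = clf.predict(X_test_std)
fpr_labels, tpr_labels, _ = roc_curve(y_test, labels)

# Signed distance to the decision boundary; larger means more likely class 1.
scores = clf.decision_function(X_test_std)
fpr_scores, tpr_scores, _ = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

print('ROC points from hard labels:', len(fpr_labels))
print('ROC points from scores:     ', len(fpr_scores))
print('AUC from scores:', auc)
```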

Reposted from: https://blog.csdn.net/qq_20412595/article/details/82192927
https://blog.csdn.net/zm_1900/article/details/89106643
