Advanced Programming Techniques (Python) Assignment 17

Exercises for sklearn:
Solution:

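# NOTE: sklearn.cross_validation was deprecated in 0.18 and removed in 0.20;
# on newer versions the equivalent classes live in sklearn.model_selection.
from sklearn import datasets, cross_validation, metrics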
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier


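def performance_evaluation(name, y_test, pred):
    """Print accuracy, F1-score and ROC AUC for one classifier's predictions."""
    print(name + ":")
    acc = metrics.accuracy_score(y_test, pred)
    print('Accuracy:', acc)
    f1 = metrics.f1_score(y_test, pred)
    print('F1-score:', f1)
    # AUC is computed here from hard 0/1 predictions, not from scores or probabilities.
    auc = metrics.roc_auc_score(y_test, pred)
    print('AUC ROC:', auc, end='\n\n')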


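# Synthetic binary classification problem: 1000 samples, 10 features.
dataset = datasets.make_classification(n_samples=1000, n_features=10, n_informative=2,
                                       n_redundant=2, n_classes=2)

# 10-fold split using the pre-0.18 KFold interface (sample count as first argument).
kf = cross_validation.KFold(len(dataset[0]), n_folds=10, shuffle=True)
# Each iteration overwrites the previous split, so after the loop only the
# last fold's train/test partition remains; the classifiers below use that one.
for train_index, test_index in kf:
    X_train, y_train = dataset[0][train_index], dataset[1][train_index]
    X_test, y_test = dataset[0][test_index], dataset[1][test_index]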

clf = GaussianNB()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
performance_evaluation("GaussianNB", y_test, pred)

clf = SVC(C=1e-01, kernel='rbf', gamma=0.1)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
performance_evaluation("SVM", y_test, pred)

clf = RandomForestClassifier(n_estimators=6)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
performance_evaluation("Random Forest", y_test, pred)

Output:

C:\Python\Python36-32\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
GaussianNB:
Accuracy: 0.9
F1-score: 0.9038461538461537
AUC ROC: 0.9011604641856743

SVM:
Accuracy: 0.9
F1-score: 0.9038461538461537
AUC ROC: 0.9011604641856743

Random Forest:
Accuracy: 0.94
F1-score: 0.9387755102040817
AUC ROC: 0.9399759903961585

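Note: this run printed a DeprecationWarning. It warns that the sklearn.cross_validation module used here has been deprecated since version 0.18 in favor of sklearn.model_selection and will be removed in 0.20. The warning does not affect the results of this experiment, but the import will fail once the module is removed, and, as the warning says, the new CV iterators have a different interface.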
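For readers on scikit-learn 0.20 or later, the same 10-fold split can be sketched with sklearn.model_selection. This is a minimal sketch, not the code that produced the output above; random_state=0 and the names X, y and fold_sizes are assumptions added here for reproducibility and illustration. It shows the interface change the warning refers to: KFold takes n_splits instead of the sample count and n_folds, and the folds are obtained from kf.split(X).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold

# Same dataset shape as in the solution; random_state is an assumption
# added here so that the sketch is reproducible.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_classes=2, random_state=0)

# New-style interface: n_splits replaces n_folds, and the data is passed
# to split() instead of passing the sample count to the constructor.
kf = KFold(n_splits=10, shuffle=True, random_state=0)

fold_sizes = []
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
    fold_sizes.append((len(train_index), len(test_index)))

print(fold_sizes[0])  # (900, 100)
```

With 1000 samples and 10 splits, every fold holds out 100 samples and trains on the remaining 900.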

The short report summarizing the methodology and the results:

  • methodology:
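    The three learning methods and their performance analysis follow nearly the same procedure:
    First, build the dataset and split it into a training set and a test set;
    Next, train the chosen classifier on the training set;
    Then, use the trained model to produce predictions for the test set;
    Finally, compare the predictions with the test labels to evaluate performance.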
  • results:
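    Across repeated runs, Random Forest generally outperformed both Gaussian Naive Bayes and SVM, while Gaussian Naive Bayes and SVM performed similarly, with no clear difference between them.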
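The procedure described in the report can also be sketched so that each classifier is scored on all ten folds and the metrics are averaged, which makes the comparison less dependent on a single split. This is a hedged sketch using cross_validate from sklearn.model_selection (scikit-learn 0.19 or later), not the code that produced the output above; the random_state values and the names classifiers and mean_scores are assumptions introduced for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Same dataset shape and hyperparameters as in the solution;
# random_state is an assumption added for reproducibility.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_classes=2, random_state=0)

classifiers = {
    "GaussianNB": GaussianNB(),
    "SVM": SVC(C=1e-01, kernel='rbf', gamma=0.1),
    "Random Forest": RandomForestClassifier(n_estimators=6, random_state=0),
}

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scoring = ["accuracy", "f1", "roc_auc"]

mean_scores = {}
for name, clf in classifiers.items():
    # cross_validate fits a fresh clone of clf on each training fold
    # and scores it on the corresponding held-out fold.
    results = cross_validate(clf, X, y, cv=cv, scoring=scoring)
    mean_scores[name] = {m: results["test_" + m].mean() for m in scoring}
    print(name, mean_scores[name])
```

The "roc_auc" scorer ranks samples by decision_function or predict_proba output rather than by hard labels, so its values are not directly comparable with the AUC figures printed by the solution, which were computed from 0/1 predictions.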
