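Conclusion first: of the three algorithms, SVM and RFC gave the best results in this run (accuracy up to 0.91 and 0.92, respectively), while NB was slightly behind (0.88).
Code: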
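from sklearn import datasets
from sklearn import cross_validation  # deprecated since scikit-learn 0.18 (see the warning in the results)
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Synthetic binary classification data: 1000 samples, 10 features
X, y = datasets.make_classification(n_samples=1000, n_features=10, n_classes=2)

# 10-fold split with shuffling
kf = cross_validation.KFold(len(X), n_folds=10, shuffle=True)
for train_index, test_index in kf:
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
# Only the split from the last fold is used below, so each model is scored once.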
# Naive Bayes (NB)
print("NB:")
clf = GaussianNB()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
# Performance evaluation
acc = metrics.accuracy_score(y_test, pred)
print("Evaluated by accuracy score: ", end="")
print(acc)
f1 = metrics.f1_score(y_test, pred)
print("Evaluated by f1 score: ", end="")
print(f1)
auc = metrics.roc_auc_score(y_test, pred)
print("Evaluated by roc auc score: ", end="")
print(auc)
print("*******************************************************\n")
# Support vector classifier (SVC), one fit per value of C
print("SVC:")
for C in [1e-02, 1e-01, 1e00, 1e01, 1e02]:
    clf = SVC(C=C, kernel='rbf', gamma=0.1)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    # Performance evaluation
    acc = metrics.accuracy_score(y_test, pred)
    print("Evaluated by accuracy score: ", end="")
    print(acc)
    f1 = metrics.f1_score(y_test, pred)
    print("Evaluated by f1 score: ", end="")
    print(f1)
    auc = metrics.roc_auc_score(y_test, pred)
    print("Evaluated by roc auc score: ", end="")
    print(auc)
    print("-----------------------------------------")
print("*******************************************************\n")
# Random forest classifier (RFC), one fit per value of n_estimators
print("RFC:")
for n_estimators in [10, 100, 1000]:
    clf = RandomForestClassifier(n_estimators=n_estimators)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    # Performance evaluation
    acc = metrics.accuracy_score(y_test, pred)
    print("Evaluated by accuracy score:", end="")
    print(acc)
    f1 = metrics.f1_score(y_test, pred)
    print("Evaluated by f1 score:", end="")
    print(f1)
    auc = metrics.roc_auc_score(y_test, pred)
    print("Evaluated by roc auc score:", end="")
    print(auc)
    print("-----------------------------------------")
print("*******************************************************\n")
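The `cross_validation` module imported above was deprecated in scikit-learn 0.18 in favor of `model_selection`. As a rough sketch of how the same comparison might look with the newer interface, the block below uses `KFold(n_splits=...)` with `cross_validate` and averages each score over all ten folds instead of a single split. The fixed `random_state` values, the single `C=1.0` and 100-tree settings, and the `mean_scores` helper are illustrative assumptions, not part of the original code. Note that the `roc_auc` scorer works on continuous scores, so its values are not directly comparable to an AUC computed from hard predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Same data shape as the original; random_state is an added assumption
# so that repeated runs give the same numbers.
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2,
                           random_state=0)
kf = KFold(n_splits=10, shuffle=True, random_state=0)

# One representative setting per algorithm (hypothetical choices).
models = {
    "NB": GaussianNB(),
    "SVC": SVC(C=1.0, kernel="rbf", gamma=0.1),
    "RFC": RandomForestClassifier(n_estimators=100, random_state=0),
}

METRICS = ("accuracy", "f1", "roc_auc")


def mean_scores(model):
    """Return the mean of each metric over all folds (illustrative helper)."""
    result = cross_validate(model, X, y, cv=kf, scoring=list(METRICS))
    return {name: result["test_" + name].mean() for name in METRICS}


summary = {label: mean_scores(model) for label, model in models.items()}
for label, scores in summary.items():
    print(label, scores)
```

Averaging over the folds makes the comparison less sensitive to which 100 samples happen to land in the test split.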
Results:
D:\Anaconda\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
NB:
Evaluated by accuracy score: 0.88
Evaluated by f1 score: 0.888888888889
Evaluated by roc auc score: 0.880808080808
*******************************************************
SVC:
Evaluated by accuracy score: 0.85
Evaluated by f1 score: 0.851485148515
Evaluated by roc auc score: 0.857575757576
-----------------------------------------
Evaluated by accuracy score: 0.89
Evaluated by f1 score: 0.895238095238
Evaluated by roc auc score: 0.893939393939
-----------------------------------------
Evaluated by accuracy score: 0.91
Evaluated by f1 score: 0.915887850467
Evaluated by roc auc score: 0.912121212121
-----------------------------------------
Evaluated by accuracy score: 0.89
Evaluated by f1 score: 0.897196261682
Evaluated by roc auc score: 0.891919191919
-----------------------------------------
Evaluated by accuracy score: 0.87
Evaluated by f1 score: 0.873786407767
Evaluated by roc auc score: 0.875757575758
-----------------------------------------
*******************************************************
RFC:
Evaluated by accuracy score:0.92
Evaluated by f1 score:0.924528301887
Evaluated by roc auc score:0.923232323232
-----------------------------------------
Evaluated by accuracy score:0.89
Evaluated by f1 score:0.899082568807
Evaluated by roc auc score:0.889898989899
-----------------------------------------
Evaluated by accuracy score:0.92
Evaluated by f1 score:0.925925925926
Evaluated by roc auc score:0.921212121212
-----------------------------------------
*******************************************************
[Finished in 8.9s]
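To see how the three printed scores relate to each other, the sketch below recomputes them from a confusion matrix. The counts used (48 true positives, 7 false negatives, 5 false positives, 40 true negatives) are an assumption: they were inferred as a matrix consistent with the NB scores above and were not printed by the run. It also illustrates that an ROC AUC computed from hard 0/1 predictions, as in the code above, reduces to the mean of the true positive rate and the true negative rate.

```python
# Hypothetical confusion-matrix counts, inferred to match the NB scores above
# (100 test samples); the original run did not print them.
tp, fn, fp, tn = 48, 7, 5, 40

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)        # true positive rate
specificity = tn / (tn + fp)   # true negative rate
f1 = 2 * precision * recall / (precision + recall)

# With hard 0/1 predictions the ROC curve has a single interior point,
# so the area under it is the mean of recall and specificity.
auc_from_labels = (recall + specificity) / 2

print(round(accuracy, 4), round(f1, 4), round(auc_from_labels, 4))
# prints: 0.88 0.8889 0.8808
```

For a ranking-based AUC, `metrics.roc_auc_score` can instead be given continuous scores such as `clf.predict_proba(X_test)[:, 1]` (GaussianNB, RandomForestClassifier) or `clf.decision_function(X_test)` (SVC).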