svm_SVC: Using GridSearchCV on the Breast Cancer Data

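# The breast cancer dataset that ships with scikit-learn's datasets module
# Using the default Gaussian (RBF) kernel, let GridSearchCV choose the gamma parameter automatically and report the score of the best model found by the hyperparameter search
from sklearn import datasets
datas=datasets.load_breast_cancer()
print(datas)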

Output:

{'DESCR': 'Breast Cancer Wisconsin (Diagnostic) Database\n=============================================\n\nNotes\n-----\nData Set Characteristics:\n    :Number of Instances: 569\n\n    :Number of Attributes: 30 numeric, predictive attributes and the class\n\n    :Attribute Information:\n        - radius (mean of distances from center to points on the perimeter)\n        - texture (standard deviation of gray-scale values)\n        - perimeter\n        - area\n        - smoothness (local variation in radius lengths)\n        - compactness (perimeter^2 / area - 1.0)\n        - concavity (severity of concave portions of the contour)\n        - concave points (number of concave portions of the contour)\n        - symmetry \n        - fractal dimension ("coastline approximation" - 1)\n\n        The mean, standard error, and "worst" or largest (mean of the three\n        largest values) of these features were computed for each image,\n        resulting in 30 features.  For instance, field 3 is Mean Radius, field\n        13 is Radius SE, field 23 is Worst Radius.\n\n        - class:\n                - WDBC-Malignant\n                - WDBC-Benign\n\n    :Summary Statistics:\n\n    ===================================== ====== ======\n                                           Min    Max\n    ===================================== ====== ======\n    radius (mean):                        6.981  28.11\n    texture (mean):                       9.71   39.28\n    perimeter (mean):                     43.79  188.5\n    area (mean):                          143.5  2501.0\n    smoothness (mean):                    0.053  0.163\n    compactness (mean):                   0.019  0.345\n    concavity (mean):                     0.0    0.427\n    concave points (mean):                0.0    0.201\n    symmetry (mean):                      0.106  0.304\n    fractal dimension (mean):             0.05   0.097\n    radius (standard error):              0.112  2.873\n    texture (standard 
error):             0.36   4.885\n    perimeter (standard error):           0.757  21.98\n    area (standard error):                6.802  542.2\n    smoothness (standard error):          0.002  0.031\n    compactness (standard error):         0.002  0.135\n    concavity (standard error):           0.0    0.396\n    concave points (standard error):      0.0    0.053\n    symmetry (standard error):            0.008  0.079\n    fractal dimension (standard error):   0.001  0.03\n    radius (worst):                       7.93   36.04\n    texture (worst):                      12.02  49.54\n    perimeter (worst):                    50.41  251.2\n    area (worst):                         185.2  4254.0\n    smoothness (worst):                   0.071  0.223\n    compactness (worst):                  0.027  1.058\n    concavity (worst):                    0.0    1.252\n    concave points (worst):               0.0    0.291\n    symmetry (worst):                     0.156  0.664\n    fractal dimension (worst):            0.055  0.208\n    ===================================== ====== ======\n\n    :Missing Attribute Values: None\n\n    :Class Distribution: 212 - Malignant, 357 - Benign\n\n    :Creator:  Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian\n\n    :Donor: Nick Street\n\n    :Date: November, 1995\n\nThis is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets.\nhttps://goo.gl/U2Uwz2\n\nFeatures are computed from a digitized image of a fine needle\naspirate (FNA) of a breast mass.  They describe\ncharacteristics of the cell nuclei present in the image.\n\nSeparating plane described above was obtained using\nMultisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree\nConstruction Via Linear Programming." Proceedings of the 4th\nMidwest Artificial Intelligence and Cognitive Science Society,\npp. 97-101, 1992], a classification method which uses linear\nprogramming to construct a decision tree.  
Relevant features\nwere selected using an exhaustive search in the space of 1-4\nfeatures and 1-3 separating planes.\n\nThe actual linear program used to obtain the separating plane\nin the 3-dimensional space is that described in:\n[K. P. Bennett and O. L. Mangasarian: "Robust Linear\nProgramming Discrimination of Two Linearly Inseparable Sets",\nOptimization Methods and Software 1, 1992, 23-34].\n\nThis database is also available through the UW CS ftp server:\n\nftp ftp.cs.wisc.edu\ncd math-prog/cpo-dataset/machine-learn/WDBC/\n\nReferences\n----------\n   - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction \n     for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on \n     Electronic Imaging: Science and Technology, volume 1905, pages 861-870,\n     San Jose, CA, 1993.\n   - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and \n     prognosis via linear programming. Operations Research, 43(4), pages 570-577, \n     July-August 1995.\n   - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques\n     to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) \n     163-171.\n', 'target_names': array(['malignant', 'benign'],
      dtype=' 
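Printing the whole object produces a very long dump. A shorter way to get the same overview is to print only a few summary values. The sketch below assumes nothing beyond the `datas` object loaded above; the `counts` dictionary is a helper name introduced here for illustration.

```python
from sklearn import datasets

datas = datasets.load_breast_cancer()

# Summarise the dataset instead of printing it in full.
print(datas.data.shape)          # feature matrix: (samples, features)
print(list(datas.target_names))  # class names, indexed by target value

# Count the samples in each class; target 0 is 'malignant', target 1 is 'benign'.
counts = {str(name): int((datas.target == i).sum())
          for i, name in enumerate(datas.target_names)}
print(counts)
```

The counts should agree with the class distribution stated in the dataset description (212 malignant, 357 benign).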
  
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn import metrics
from sklearn.svm import SVC
import numpy as np
x_train,x_test,y_train,y_test=train_test_split(datas.data,datas.target)
thresholds=np.linspace(0,0.001,100)# candidate values for the gamma parameter
thresholds

Output:

array([  0.00000000e+00,   1.01010101e-05,   2.02020202e-05,
         3.03030303e-05,   4.04040404e-05,   5.05050505e-05,
         6.06060606e-05,   7.07070707e-05,   8.08080808e-05,
         9.09090909e-05,   1.01010101e-04,   1.11111111e-04,
         1.21212121e-04,   1.31313131e-04,   1.41414141e-04,
         1.51515152e-04,   1.61616162e-04,   1.71717172e-04,
         1.81818182e-04,   1.91919192e-04,   2.02020202e-04,
         2.12121212e-04,   2.22222222e-04,   2.32323232e-04,
         2.42424242e-04,   2.52525253e-04,   2.62626263e-04,
         2.72727273e-04,   2.82828283e-04,   2.92929293e-04,
         3.03030303e-04,   3.13131313e-04,   3.23232323e-04,
         3.33333333e-04,   3.43434343e-04,   3.53535354e-04,
         3.63636364e-04,   3.73737374e-04,   3.83838384e-04,
         3.93939394e-04,   4.04040404e-04,   4.14141414e-04,
         4.24242424e-04,   4.34343434e-04,   4.44444444e-04,
         4.54545455e-04,   4.64646465e-04,   4.74747475e-04,
         4.84848485e-04,   4.94949495e-04,   5.05050505e-04,
         5.15151515e-04,   5.25252525e-04,   5.35353535e-04,
         5.45454545e-04,   5.55555556e-04,   5.65656566e-04,
         5.75757576e-04,   5.85858586e-04,   5.95959596e-04,
         6.06060606e-04,   6.16161616e-04,   6.26262626e-04,
         6.36363636e-04,   6.46464646e-04,   6.56565657e-04,
         6.66666667e-04,   6.76767677e-04,   6.86868687e-04,
         6.96969697e-04,   7.07070707e-04,   7.17171717e-04,
         7.27272727e-04,   7.37373737e-04,   7.47474747e-04,
         7.57575758e-04,   7.67676768e-04,   7.77777778e-04,
         7.87878788e-04,   7.97979798e-04,   8.08080808e-04,
         8.18181818e-04,   8.28282828e-04,   8.38383838e-04,
         8.48484848e-04,   8.58585859e-04,   8.68686869e-04,
         8.78787879e-04,   8.88888889e-04,   8.98989899e-04,
         9.09090909e-04,   9.19191919e-04,   9.29292929e-04,
         9.39393939e-04,   9.49494949e-04,   9.59595960e-04,
         9.69696970e-04,   9.79797980e-04,   9.89898990e-04,
         1.00000000e-03])
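The grid above is linearly spaced and starts at exactly 0. With gamma = 0 the RBF kernel exp(-gamma * d^2) equals 1 for every pair of points, so that candidate cannot separate the classes. Gamma is also commonly searched on a logarithmic scale. The sketch below shows one possible alternative grid; the name `log_gammas` and the range 1e-6 to 1e-3 are assumptions made for illustration, not values from the original post.

```python
import numpy as np

# Hypothetical alternative grid: 20 strictly positive values from 1e-6 to 1e-3,
# evenly spaced on a log10 scale.
log_gammas = np.logspace(-6, -3, num=20)
print(log_gammas[0], log_gammas[-1])

# With gamma = 0 the RBF kernel value is 1 regardless of the squared distance.
squared_distances = np.array([0.0, 1.0, 100.0])
kernel_at_zero_gamma = np.exp(-0.0 * squared_distances)
print(kernel_at_zero_gamma)
```

Such a grid could be passed as `{'gamma': log_gammas}` in place of `thresholds`, at the cost of results that no longer match the numbers shown in this post.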

# This step directly determines the results below
param_grid={'gamma':thresholds}
clf=GridSearchCV(SVC(kernel='rbf'),param_grid,cv=5)
clf.fit(x_train,y_train)

Output:

GridSearchCV(cv=5, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'gamma': array([  0.00000e+00,   1.01010e-05, ...,   9.89899e-04,   1.00000e-03])},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)

print("Best CV score: %0.3f"% clf.best_score_)
print("Best parameters:")
best_parameters=clf.best_estimator_.get_params()
for param_name in sorted(param_grid.keys()):
    print('\t%s:%r' %(param_name,best_parameters[param_name]))
Output:
Best CV score: 0.925
Best parameters:
	gamma:8.0808080808080811e-05
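Besides the single best score, a fitted GridSearchCV keeps the mean cross-validation score of every candidate in `cv_results_`, which helps judge how sensitive the model is to gamma. The following is a self-contained sketch: the 10-point grid and `random_state=0` are arbitrary choices made here to keep the run short and repeatable, so its numbers will not match the output above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

datas = datasets.load_breast_cancer()
# random_state=0 is an arbitrary seed chosen for this sketch.
x_train, x_test, y_train, y_test = train_test_split(
    datas.data, datas.target, random_state=0)

# A deliberately small, strictly positive grid (assumed for illustration).
param_grid = {'gamma': np.linspace(1e-5, 1e-3, 10)}
clf = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
clf.fit(x_train, y_train)

# cv_results_ holds one entry per candidate; best_index_ points at the winner.
mean_scores = clf.cv_results_['mean_test_score']
for params, score in zip(clf.cv_results_['params'], mean_scores):
    print('gamma=%.2e  mean CV accuracy=%.3f' % (params['gamma'], score))

print('best:', clf.best_params_, clf.best_score_)
```

`best_score_` is the entry of `mean_test_score` at `best_index_`, so it is a cross-validation average on the training data and not a score on the held-out test set.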

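Note: the result of this step is not unique, and the numbers below will differ from run to run as well. The reason is explained near the beginning of https://blog.csdn.net/wjwfighting/article/details/80970396.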
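One likely contributor to this run-to-run variation is that `train_test_split` shuffles the data randomly when no `random_state` is given, so every run trains and tests on a different partition. The sketch below, which is an addition and not part of the original code, shows that fixing the seed makes the split repeatable; the value 42 is arbitrary.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

datas = datasets.load_breast_cancer()

# Same seed -> same shuffle -> identical partitions. The value 42 is arbitrary.
split_a = train_test_split(datas.data, datas.target, random_state=42)
split_b = train_test_split(datas.data, datas.target, random_state=42)
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print('identical splits with a fixed seed:', same)

# With the default test_size of 0.25, the 569 samples are divided into
# 426 training samples and 143 test samples.
x_train, x_test, y_train, y_test = split_a
print(len(x_train), len(x_test))
```

The 143 test samples match the support total in the classification report further down.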

print("Training set score:",clf.score(x_train,y_train))
print("Test set score:",clf.score(x_test,y_test))

Output:
Training set score: 0.934272300469
Test set score: 0.979020979021

predicted=clf.predict(x_test)
print('Predicted:',predicted)
print('Actual:',y_test)
Output:
Predicted: [0 1 0 1 1 1 1 0 0 1 0 0 0 0 1 1 0 0 1 0 1 1 1 1 0 1 0 1 0 0 1 1 1 1 0 0 1
 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 0 0 1
 1 0 1 0 0 1 1 0 0 1 0 0 0 0 0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1
 1 1 0 1 1 1 0 1 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 0 0]
Actual: [0 1 0 1 1 1 1 0 0 1 0 0 0 0 1 1 0 0 1 0 1 1 1 1 0 1 0 1 0 0 1 1 1 1 0 0 1
 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 0 0 1
 1 0 1 0 0 1 1 0 0 1 0 0 0 0 0 1 1 0 0 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1
 1 1 0 1 1 1 0 1 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 0 0]
# scikit-learn metrics take the true labels first, then the predictions
print('Precision:',metrics.precision_score(y_test,predicted))
print('Recall:',metrics.recall_score(y_test,predicted))
print('F1:',metrics.f1_score(y_test,predicted))
print("Accuracy:",np.mean(predicted==y_test))
Output:
Precision: 0.978021978022
Recall: 0.988888888889
F1: 0.983425414365
Accuracy: 0.979020979021
# Classification report
report=metrics.classification_report(y_test,predicted,target_names=datas.target_names)
print(report)

Output:

                 precision    recall  f1-score   support

  malignant       0.98      0.96      0.97        53
     benign       0.98      0.99      0.98        90

avg / total       0.98      0.98      0.98       143

confusion_matrix=metrics.confusion_matrix(y_test,predicted)
print('--Confusion matrix--')
print(confusion_matrix)

Output:

--Confusion matrix--
[[51  2]
 [ 1 89]]
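The summary metrics can be cross-checked by hand from this confusion matrix. In scikit-learn's layout, rows are the true classes and columns are the predicted classes, in label order 0 (malignant), 1 (benign). The sketch below treats label 1 as the positive class, which is the default for the binary metric functions; the variable names are chosen here for illustration.

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class.
cm = np.array([[51, 2],
               [1, 89]])

# Flattening row by row gives TN, FP, FN, TP when label 1 is the positive class.
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)   # 89 / 91
recall = tp / (tp + fn)      # 89 / 90
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / cm.sum()   # 140 / 143
print(precision, recall, f1, accuracy)
```

Precision works out to 89/91 and recall to 89/90, which is consistent with the 0.98 and 0.99 listed for the benign class in the classification report.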

















