Tuning Decision Tree Parameters with GridSearchCV in sklearn

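The hyperparameters of a decision tree include:

  • max_depth (maximum depth of the tree)
  • max_leaf_nodes (maximum number of leaf nodes)
  • max_features (maximum number of features considered at each split)
  • min_samples_leaf (minimum number of samples required at a leaf node)
  • min_samples_split (minimum number of samples required to split an internal node)
  • min_weight_fraction_leaf (minimum fraction of the total sample weight required at a leaf node)
  • min_impurity_split (impurity threshold for splitting) can also be tuned; note that it was removed in scikit-learn 1.0 in favor of min_impurity_decrease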
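Which of these names are accepted depends on the installed scikit-learn version. A minimal sketch for checking this, using only the estimator's own `get_params()`; the `listed` and `available` variables are illustrative names, not part of the original post:

```python
from sklearn.tree import DecisionTreeClassifier

# Constructor parameters (with defaults) of the installed scikit-learn version.
params = DecisionTreeClassifier().get_params()
for name in sorted(params):
    print(name, "=", params[name])

# Names from the list above that this version still accepts.
listed = ["max_depth", "max_leaf_nodes", "max_features", "min_samples_leaf",
          "min_samples_split", "min_weight_fraction_leaf", "min_impurity_split"]
available = [name for name in listed if name in params]
print(available)
```

Any name missing from `available` cannot be placed in a GridSearchCV parameter grid for that version.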

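1. Data preparation is the same as in "Using Decision Trees in sklearn" (《sklearn中决策树的使用》), so it is not repeated here. The code below assumes that X_train, X_test, y_train and y_test already exist.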
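For readers without the earlier article at hand, the following stand-in produces variables with the same names. It is purely an assumption for illustration: the synthetic data from `make_classification` and the 70/30 split are not the original dataset or split.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data; a stand-in, NOT the original dataset.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)

# Hold out 30% for testing, keeping the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
print(X_train.shape, X_test.shape)
```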

2. Usage steps

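import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

model_DD = DecisionTreeClassifier()

# Candidate values for the two hyperparameters being tuned
max_depth = range(1, 10, 1)          # 1, 2, ..., 9
min_samples_leaf = range(1, 10, 2)   # 1, 3, 5, 7, 9
tuned_parameters = dict(max_depth=max_depth, min_samples_leaf=min_samples_leaf)

# Exhaustive search over the grid with 10-fold cross-validation
DD = GridSearchCV(model_DD, tuned_parameters, cv=10)
DD.fit(X_train, y_train)

print("Best: %f using %s" % (DD.best_score_, DD.best_params_))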
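To get a feel for how much work this search does, the grid can be enumerated with `ParameterGrid` before fitting anything. A small sketch using the same two ranges; `n_candidates`, `n_fits` and `first_three` are illustrative names:

```python
from sklearn.model_selection import ParameterGrid

max_depth = range(1, 10, 1)          # 9 values
min_samples_leaf = range(1, 10, 2)   # 5 values
grid = ParameterGrid(dict(max_depth=max_depth, min_samples_leaf=min_samples_leaf))

n_candidates = len(grid)             # every combination of the two ranges
n_fits = n_candidates * 10           # cv=10 folds each, not counting the final refit
print(n_candidates, n_fits)

# The first candidates show the enumeration order: min_samples_leaf varies fastest.
first_three = list(grid)[:3]
print(first_three)
```

Adding a third hyperparameter multiplies the number of fits again, so it can pay to keep each range short.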

y_prob = DD.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
y_pred = np.where(y_prob > 0.5, 1, 0)    # threshold at 0.5 to get class labels (assumes 0/1 labels)
print('Test accuracy:', DD.score(X_test, y_test))  # score against the true labels

# AUC is computed from the probabilities rather than the thresholded labels
print('The AUC of GridSearchCV Decision Tree is', roc_auc_score(y_test, y_prob))
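`roc_auc_score` measures how well the scores rank positives above negatives, so passing thresholded 0/1 labels generally gives a different value than passing probabilities. A toy illustration with made-up numbers (not results from the model above):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and scores, chosen so every positive outranks every negative.
y_true = np.array([0, 0, 1, 1, 1])
y_prob = np.array([0.2, 0.3, 0.4, 0.45, 0.9])
y_pred = np.where(y_prob > 0.5, 1, 0)   # only the last sample crosses 0.5

auc_from_prob = roc_auc_score(y_true, y_prob)     # uses the full ranking
auc_from_labels = roc_auc_score(y_true, y_pred)   # ranking collapsed to 0/1 ties
print(auc_from_prob, auc_from_labels)
```

Here the ranking is perfect, yet the thresholded labels lose that information and report a lower AUC.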

# DD.grid_scores_ was removed in scikit-learn 0.20; cv_results_ replaces it

test_means = DD.cv_results_['mean_test_score']
# test_stds = DD.cv_results_['std_test_score']
# pd.DataFrame(DD.cv_results_).to_csv('DD_min_samples_leaf_maxdepth.csv')

# Plot results: candidates are ordered with max_depth varying slowest and
# min_samples_leaf varying fastest, so each row corresponds to one max_depth
test_scores = np.array(test_means).reshape(len(max_depth), len(min_samples_leaf))

for i, value in enumerate(max_depth):
    plt.plot(min_samples_leaf, test_scores[i], label='test_max_depth:' + str(value))

plt.legend()
plt.xlabel('min_samples_leaf')
plt.ylabel('accuracy')
plt.show()
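The reshape used for plotting relies on the order in which the candidates appear in `cv_results_`. A way to double-check that assumption is to build the same matrix by looking up each candidate's parameters explicitly. The sketch below does this on a smaller grid and synthetic data, both of which are assumptions made to keep it self-contained:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and a reduced grid, for illustration only.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
max_depth = [1, 2, 3]
min_samples_leaf = [1, 5]

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      dict(max_depth=max_depth, min_samples_leaf=min_samples_leaf),
                      cv=5)
search.fit(X, y)

# Place each mean score by its recorded parameters instead of by position.
scores = np.full((len(max_depth), len(min_samples_leaf)), np.nan)
for params, mean in zip(search.cv_results_["params"],
                        search.cv_results_["mean_test_score"]):
    i = max_depth.index(params["max_depth"])
    j = min_samples_leaf.index(params["min_samples_leaf"])
    scores[i, j] = mean

# Compare with the positional reshape used for the plot.
reshaped = np.array(search.cv_results_["mean_test_score"]).reshape(
    len(max_depth), len(min_samples_leaf))
same = np.allclose(scores, reshaped)
print(same)
```

Filling by parameter lookup keeps working if the grid keys are renamed or a third parameter is added, whereas a plain reshape would silently mix up rows and columns.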

 
