Python and Machine Learning (5): Decision Trees

Decision Tree
Use scikit-learn's decision tree model to perform multi-class classification on the iris dataset and evaluate the results.
Imports:

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.model_selection import KFold

Load the data:

# Load the iris dataset
iris = load_iris()
# Feature matrix x and label vector y
x, y = iris.data, iris.target

print('x:', x.shape)
print(x[:10,:])
print('y:', y.shape)
print(y)

Output:

x: (150, 4)
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]]
y: (150,)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
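
The Bunch object returned by load_iris also carries the feature and class names, which makes the arrays above easier to read (a quick optional check):

# Column names of x and the class names encoded by 0/1/2 in y
print(iris.feature_names)
print(iris.target_names)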

Use sklearn's train_test_split to shuffle and split the iris dataset into a training set and a test set (70:30), fit a decision tree classifier on the training set, predict the test set, and evaluate the precision and recall of the predictions:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=0)

# Class names for the labels [1, 0] reported below (iris classes 1 and 0)
target_names = ['versicolor', 'setosa']
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# print(confusion_matrix(y_test, y_pred, labels=[1, 0]))        # confusion matrix
# Text report of the main classification metrics for classes 1 and 0
print(classification_report(y_test, y_pred, target_names=target_names, labels=[1, 0]))

Output:

             precision    recall  f1-score   support

 versicolor       1.00      0.94      0.97        18
     setosa       1.00      1.00      1.00        16

avg / total       1.00      0.97      0.98        34
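
The confusion_matrix call that is commented out above can be printed as well; as a minimal sketch, the version below drops the labels argument so the full 3x3 matrix over all three iris classes is shown (rows are true classes, columns are predicted classes):

# Full confusion matrix over the three iris classes
print(confusion_matrix(y_test, y_pred))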

Use sklearn's decision tree model with 10-fold cross-validation on the iris dataset, report the accuracy of each fold, and compute the mean accuracy:

from sklearn.model_selection import cross_val_score
import numpy as np
d_scores = cross_val_score(DecisionTreeClassifier(), iris.data, iris.target, cv=10)

for i in range(len(d_scores)):
    print("Fold", i + 1, "accuracy:", format(d_scores[i], '.4f'))

print("Mean accuracy:", np.average(d_scores))

Output:

Fold 1 accuracy: 1.0000
Fold 2 accuracy: 0.9333
Fold 3 accuracy: 1.0000
Fold 4 accuracy: 0.9333
Fold 5 accuracy: 0.9333
Fold 6 accuracy: 0.8667
Fold 7 accuracy: 0.9333
Fold 8 accuracy: 1.0000
Fold 9 accuracy: 1.0000
Fold 10 accuracy: 1.0000
Mean accuracy: 0.96
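
Beyond per-fold accuracies, it can be useful to inspect the learned tree itself. The sketch below is one way to do that, assuming a scikit-learn version that provides tree.export_text (0.21 or later); it fits a single tree on the full iris data and prints its decision rules:

# Fit one tree on the full dataset and print its rules as text
clf_full = DecisionTreeClassifier(random_state=0)
clf_full.fit(iris.data, iris.target)
print(tree.export_text(clf_full, feature_names=list(iris.feature_names)))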

Change the parameters of the decision tree model (such as criterion, max_depth, and splitter) and evaluate the mean accuracy under 10-fold cross-validation; test at least four different parameter combinations:

x = iris['data']
# Fixed seed so every parameter setting is evaluated on the same 10 splits
kf = KFold(n_splits=10, shuffle=True, random_state=1)

def kfold_accuracy(clf):
    """Return the mean 10-fold accuracy of clf on the iris data."""
    accuracy = 0
    for train_index, test_index in kf.split(x):
        x_train, x_test = x[train_index], x[test_index]
        y_train, y_test = y[train_index], y[test_index]
        clf.fit(x_train, y_train)
        pred = clf.predict(x_test)
        accuracy += accuracy_score(y_test, pred)
    return accuracy / kf.get_n_splits()

print("Run 1 accuracy", kfold_accuracy(DecisionTreeClassifier(criterion='entropy', max_depth=30, splitter='best')))
print("Run 2 accuracy", kfold_accuracy(DecisionTreeClassifier(criterion='gini', max_depth=20, splitter='random')))
print("Run 3 accuracy", kfold_accuracy(DecisionTreeClassifier(criterion='entropy', max_depth=20, splitter='random')))
print("Run 4 accuracy", kfold_accuracy(DecisionTreeClassifier(criterion='entropy', max_depth=30, splitter='random')))

Output:

Run 1 accuracy 0.9600000000000002
Run 2 accuracy 0.9200000000000002
Run 3 accuracy 0.9466666666666669
Run 4 accuracy 0.9466666666666669
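
The same comparison can be run more systematically with a grid search. The sketch below is only an illustration with an example parameter grid; GridSearchCV scores every combination with 10-fold cross-validation and reports the best one:

from sklearn.model_selection import GridSearchCV

# Example grid; every combination is evaluated with 10-fold cross-validation
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [20, 30],
    'splitter': ['best', 'random'],
}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=10)
grid.fit(iris.data, iris.target)
print("Best parameters:", grid.best_params_)
print("Best mean accuracy:", grid.best_score_)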
