Python Data Analysis Notes: Drawing a Decision Tree (02)

Decision Tree

1. Basics of Decision Trees (DT)

A decision tree is a basic method for classification and regression. These notes mainly discuss decision trees for classification.

In a classification problem, a decision tree represents the process of classifying instances based on their features. It can be viewed as a collection of if-then rules, or as a conditional probability distribution defined over the feature space and the class space.

Learning a decision tree usually involves three steps: feature selection, tree generation, and pruning.
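
To make the feature selection step concrete, here is a minimal sketch that scores features by information gain. Information gain is only one common criterion and is an assumption here, since the notes do not name one. The toy data and the helper names `entropy` and `information_gain` are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting on one discrete feature."""
    # Group the labels by the value the feature takes in each row.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: each row is (outlook, windy); the label is "play" or "stay".
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
labels = ["play", "play", "stay", "stay"]

gains = [information_gain(rows, labels, i) for i in range(2)]
best_feature = max(range(2), key=lambda i: gains[i])
print(gains, best_feature)
```

In this toy data the first feature separates the two classes completely, so it gets the larger gain and would be chosen for the root split.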

Classifying with a decision tree: starting from the root node, test one feature of the instance and assign the instance to a child node according to the result. Each child node corresponds to one value of that feature. Repeat this test-and-assign process until a leaf node is reached, then assign the instance to the class of that leaf.
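
The test-and-assign loop described above can be sketched in a few lines of plain Python. The tree below is written by hand, and its features ("outlook", "windy") and classes are hypothetical; it is not how scikit-learn stores a tree internally.

```python
# A hand-written toy tree: internal nodes test one feature, leaves hold a class.
tree = {
    "feature": "outlook",
    "children": {
        "sunny": {"label": "play"},
        "rainy": {
            "feature": "windy",
            "children": {"yes": {"label": "stay"}, "no": {"label": "play"}},
        },
    },
}


def classify(node, instance):
    """Walk from the root to a leaf and return the leaf's class."""
    while "label" not in node:
        value = instance[node["feature"]]  # test one feature of the instance
        node = node["children"][value]     # move to the child for that value
    return node["label"]


prediction = classify(tree, {"outlook": "rainy", "windy": "yes"})
print(prediction)
```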

  • The goal of decision tree learning: build a decision tree model from the given training dataset so that it classifies instances correctly.
  • The nature of decision tree learning: induce a set of classification rules from the given training dataset or, in other words, estimate a conditional probability model from it.
  • The loss function of decision tree learning: a regularized maximum likelihood function.
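
One common way to write such a regularized loss is shown below. It is given as an assumption, since the notes do not spell out the formula. Here |T| is the number of leaf nodes of tree T, N_t is the number of training samples in leaf t, N_{tk} is the number of those samples in class k, and α ≥ 0 controls the penalty on tree size.

```latex
C_\alpha(T) = \sum_{t=1}^{|T|} N_t H_t(T) + \alpha |T|,
\qquad
H_t(T) = -\sum_{k} \frac{N_{tk}}{N_t} \log \frac{N_{tk}}{N_t}
```

The first term measures how well the tree fits the training data, and the second term penalizes complexity; pruning trades one against the other.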

2. How to draw a decision tree

  Before drawing, create a decision tree model and train it:
"""
X_train: training features
Y_train: training labels
X_test: test features
Y_test: test labels
"""

# 01. Create the model
from sklearn.tree import DecisionTreeClassifier
DTree_model = DecisionTreeClassifier()

# 02. Train the model
DTree_model = DTree_model.fit(X_train, Y_train)

# 03. Predict on the test set
Y_pred = DTree_model.predict(X_test)

# 04. Evaluate the model (accuracy_score: accuracy; recall_score: recall; f1_score: F1 score)
# Note: recall_score and f1_score default to average='binary', which expects binary labels.
from sklearn.metrics import accuracy_score, recall_score, f1_score
print('ACC:', accuracy_score(Y_test, Y_pred))
print('REC:', recall_score(Y_test, Y_pred))
print('F-Score:', f1_score(Y_test, Y_pred))
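
The code above assumes that X_train, Y_train, X_test and Y_test already exist. A minimal self-contained sketch could look like the following. It assumes a synthetic binary dataset from `make_classification` and an 80/20 split, neither of which comes from the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption, for illustration only).
X, Y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# Fit the tree and predict on the held-out set.
DTree_model = DecisionTreeClassifier(random_state=0).fit(X_train, Y_train)
Y_pred = DTree_model.predict(X_test)

# With binary labels, the default average='binary' of recall/F1 applies.
acc = accuracy_score(Y_test, Y_pred)
rec = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)
print('ACC:', acc, 'REC:', rec, 'F-Score:', f1)
```

With more than two classes, recall_score and f1_score need an explicit `average` argument such as `average='macro'`.
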
  1. Prepare the environment

    1. Download Graphviz (the tool used to render the decision tree)

      First, download the installer for your operating system from the Graphviz download page (Download | Graphviz).

      Second, double-click the installer to run it.

      During installation, select option 2 or 3, which automatically adds Graphviz to the system environment variables.

      If you accidentally chose the wrong option, you can add the environment variable manually.

      Copy the Graphviz installation path (with the default install location it should be C:\Program Files\Graphviz\bin).

      Open Settings - Advanced Settings - Environment Variables and add the copied path to the Path variable.

    2. Install the graphviz library for Python

      Enter the following command in the terminal:

      pip install graphviz
      
    3. Enter the following command in the terminal to verify that the installation succeeded:

      dot -version
      

      If the installation succeeded, the command prints the Graphviz version information.
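
The same check can be done from Python. The sketch below looks for the `dot` executable and, if it is missing, adds a Graphviz bin directory to PATH for the current process only. The helper name `ensure_dot_on_path` is hypothetical, and the path passed in is the default install location mentioned above.

```python
import os
import shutil


def ensure_dot_on_path(graphviz_bin):
    """Return True if the `dot` executable can be found, adding graphviz_bin
    to this process's PATH first if needed."""
    if shutil.which("dot") is None and os.path.isdir(graphviz_bin):
        # Affects only the current Python process, not the system settings.
        os.environ["PATH"] = graphviz_bin + os.pathsep + os.environ.get("PATH", "")
    return shutil.which("dot") is not None


# Default install location; adjust it if you installed Graphviz elsewhere.
found = ensure_dot_on_path(r"C:\Program Files\Graphviz\bin")
print("dot found:", found)
```

This is a workaround for a single script or notebook session. Setting the environment variable in the system settings remains the permanent fix.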

  2. Draw the decision tree

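    from sklearn.tree import export_graphviz
    import graphviz

    # out_file=None makes export_graphviz return the DOT source as a string.
    # X_train is assumed to be a pandas DataFrame, so its columns give the feature names.
    dot_data = export_graphviz(DTree_model,
                               out_file=None,
                               feature_names=list(X_train.columns),
                               filled=True,
                               rounded=True,
                               special_characters=True)
    graph = graphviz.Source(dot_data)
    graph.render('tree')  # writes the DOT source to 'tree' and the rendering to 'tree.pdf'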
    
  3. Solve the Chinese display problem

    By default, the DOT source in dot_data sets fontname = helvetica. Helvetica is a Western font without Chinese glyphs, so if the font setting is left unchanged, any Chinese text in the picture is rendered as garbled characters.

    So we change the following line of code from

    graph = graphviz.Source(dot_data)
    

    to

    graph = graphviz.Source(dot_data.replace('helvetica', 'Microsoft YaHei'))
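
To see what the replacement does without rendering anything, the sketch below inspects the DOT string directly. The tiny dataset and the Chinese feature names (年龄 = age, 收入 = income) are hypothetical and serve only as an illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Tiny hypothetical dataset with Chinese feature names.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
Y = [0, 1, 0, 1]
model = DecisionTreeClassifier(random_state=0).fit(X, Y)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_data = export_graphviz(model,
                           out_file=None,
                           feature_names=['年龄', '收入'],
                           filled=True,
                           rounded=True,
                           special_characters=True)

# Swap the default font for one that has Chinese glyphs.
patched = dot_data.replace('helvetica', 'Microsoft YaHei')
print([line for line in patched.splitlines() if 'fontname' in line])
```

Microsoft YaHei ships with Windows. On other systems, substitute a Chinese-capable font that is actually installed. If your scikit-learn version writes the font name without quotes, a name containing a space may need to be quoted in the replacement string.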
    
