Using a Decision Tree to Predict Contact Lens Type

Task Description

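Task for this exercise: work through an example that shows how a decision tree predicts the type of contact lenses a patient should wear. Even with a small dataset, a decision tree can teach us a lot, for instance how an eye doctor decides which type of lenses a patient needs. Once we understand how decision trees work, we can even help people determine which type of lenses they should wear.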

Background

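To complete this task, you need to know: 1. how to process the contact lens dataset; 2. how to use a decision tree to make predictions.

Processing the contact lens dataset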

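The contact lens dataset contains observations of patients' eye conditions together with the type of contact lenses the doctor recommended. There are three lens types: hard, soft, and no lenses (the patient should not wear contact lenses). The data comes from the UCI Machine Learning Repository; to make it easier to display, we made a few simple changes to it.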
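To get a feel for the three classes before building a tree, here is a minimal sketch that parses the same tab-separated records with pandas `read_csv` (instead of a manual loop) and counts how often each recommended lens type occurs. The feature names reuse `lensesLabels` from the tutorial code; the label column name `lensType` is a hypothetical choice. The rows are copied from the appended lenses.txt and read from an in-memory string so the sketch runs without the file.

```python
import io

import pandas as pd

# Feature names from the tutorial, plus an assumed (hypothetical) name for the label column.
COLUMNS = ["age", "prescript", "astigmatic", "tearRate", "lensType"]

# The 24 records of lenses.txt, written as tuples to avoid literal tab characters.
ROWS = [
    ("young", "myope", "no", "reduced", "no lenses"),
    ("young", "myope", "no", "normal", "soft"),
    ("young", "myope", "yes", "reduced", "no lenses"),
    ("young", "myope", "yes", "normal", "hard"),
    ("young", "hyper", "no", "reduced", "no lenses"),
    ("young", "hyper", "no", "normal", "soft"),
    ("young", "hyper", "yes", "reduced", "no lenses"),
    ("young", "hyper", "yes", "normal", "hard"),
    ("pre", "myope", "no", "reduced", "no lenses"),
    ("pre", "myope", "no", "normal", "soft"),
    ("pre", "myope", "yes", "reduced", "no lenses"),
    ("pre", "myope", "yes", "normal", "hard"),
    ("pre", "hyper", "no", "reduced", "no lenses"),
    ("pre", "hyper", "no", "normal", "soft"),
    ("pre", "hyper", "yes", "reduced", "no lenses"),
    ("pre", "hyper", "yes", "normal", "no lenses"),
    ("presbyopic", "myope", "no", "reduced", "no lenses"),
    ("presbyopic", "myope", "no", "normal", "no lenses"),
    ("presbyopic", "myope", "yes", "reduced", "no lenses"),
    ("presbyopic", "myope", "yes", "normal", "hard"),
    ("presbyopic", "hyper", "no", "reduced", "no lenses"),
    ("presbyopic", "hyper", "no", "normal", "soft"),
    ("presbyopic", "hyper", "yes", "reduced", "no lenses"),
    ("presbyopic", "hyper", "yes", "normal", "no lenses"),
]

# Rebuild the tab-separated text and parse it the same way lenses.txt would be parsed.
text = "\n".join("\t".join(row) for row in ROWS)
lenses_df = pd.read_csv(io.StringIO(text), sep="\t", header=None, names=COLUMNS)

# Number of patients per recommended lens type.
class_counts = lenses_df["lensType"].value_counts()
print(class_counts)
```

For the 24 rows listed, this gives 15 "no lenses", 5 "soft" and 4 "hard", so the classes are clearly imbalanced. With the real file, the in-memory string would be replaced by the path, e.g. `pd.read_csv('./src/step3/lenses.txt', sep='\t', header=None, names=COLUMNS)`.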

The code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd

from six import StringIO                # standalone six package (formerly sklearn.externals.six); not used below
from sklearn import tree
from sklearn.preprocessing import LabelEncoder

if __name__ == '__main__':
    with open('./src/step3/lenses.txt', 'r') as fr:                   # load the file
        # split each line on tabs, skipping blank lines such as a trailing newline
        lenses = [inst.strip().split('\t') for inst in fr if inst.strip()]
    lenses_target = [each[-1] for each in lenses]                     # class label (last field) of each record
    print(lenses_target)

    lensesLabels = ['age', 'prescript', 'astigmatic', 'tearRate']     # feature labels
    lenses_dict = {}                                                  # column name -> list of values, used to build the DataFrame
    for i, each_label in enumerate(lensesLabels):                     # collect the i-th field of every record
        lenses_dict[each_label] = [each[i] for each in lenses]
    ###########
    lenses_pd = pd.DataFrame(lenses_dict, columns=lensesLabels)      # build the pandas.DataFrame with a fixed column order
    print(lenses_pd)                                                  # print the pandas.DataFrame
    le = LabelEncoder()                                               # LabelEncoder maps each column's strings to integers
    for col in lenses_pd.columns:                                     # encode column by column (le keeps only the last column's mapping)
        lenses_pd[col] = le.fit_transform(lenses_pd[col])
    print(lenses_pd)
    clf = tree.DecisionTreeClassifier(max_depth=4)                    # create the decision tree classifier
    clf = clf.fit(lenses_pd.values.tolist(), lenses_target)           # fit the tree on the encoded features
    #############
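The script above fits the tree but never uses it to predict. Because it reuses a single `LabelEncoder`, only the last column's string-to-integer mapping survives the loop, so a new patient cannot be encoded consistently. The sketch below, which is one possible approach rather than the tutorial's own method, keeps one encoder per column, predicts the lens type for one patient, and prints the learned rules with `sklearn.tree.export_text`. The names `encoders`, `patient` and `lensType`, and `random_state=0` (added only to make tie-breaking reproducible), are assumptions for illustration.

```python
import pandas as pd
from sklearn import tree
from sklearn.preprocessing import LabelEncoder

FEATURES = ["age", "prescript", "astigmatic", "tearRate"]

# The 24 records of lenses.txt: four features followed by the recommended lens type.
ROWS = [
    ("young", "myope", "no", "reduced", "no lenses"),
    ("young", "myope", "no", "normal", "soft"),
    ("young", "myope", "yes", "reduced", "no lenses"),
    ("young", "myope", "yes", "normal", "hard"),
    ("young", "hyper", "no", "reduced", "no lenses"),
    ("young", "hyper", "no", "normal", "soft"),
    ("young", "hyper", "yes", "reduced", "no lenses"),
    ("young", "hyper", "yes", "normal", "hard"),
    ("pre", "myope", "no", "reduced", "no lenses"),
    ("pre", "myope", "no", "normal", "soft"),
    ("pre", "myope", "yes", "reduced", "no lenses"),
    ("pre", "myope", "yes", "normal", "hard"),
    ("pre", "hyper", "no", "reduced", "no lenses"),
    ("pre", "hyper", "no", "normal", "soft"),
    ("pre", "hyper", "yes", "reduced", "no lenses"),
    ("pre", "hyper", "yes", "normal", "no lenses"),
    ("presbyopic", "myope", "no", "reduced", "no lenses"),
    ("presbyopic", "myope", "no", "normal", "no lenses"),
    ("presbyopic", "myope", "yes", "reduced", "no lenses"),
    ("presbyopic", "myope", "yes", "normal", "hard"),
    ("presbyopic", "hyper", "no", "reduced", "no lenses"),
    ("presbyopic", "hyper", "no", "normal", "soft"),
    ("presbyopic", "hyper", "yes", "reduced", "no lenses"),
    ("presbyopic", "hyper", "yes", "normal", "no lenses"),
]
data = pd.DataFrame(ROWS, columns=FEATURES + ["lensType"])

# One encoder per column, so every column's string -> integer mapping is kept.
encoders = {col: LabelEncoder() for col in FEATURES}
X = pd.DataFrame({col: encoders[col].fit_transform(data[col]) for col in FEATURES})
y = data["lensType"]

clf = tree.DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X, y)

# A patient described by the same four attributes (hypothetical example input),
# encoded with the encoders fitted above.
patient = {"age": "young", "prescript": "myope", "astigmatic": "no", "tearRate": "reduced"}
x_new = pd.DataFrame({col: encoders[col].transform([patient[col]]) for col in FEATURES})
prediction = clf.predict(x_new)[0]
print(prediction)  # → no lenses

# Print the learned rules as indented text; thresholds refer to the encoded integers.
print(tree.export_text(clf, feature_names=FEATURES))
```

On this data the first split is on `tearRate`, and every patient with a reduced tear rate is labeled "no lenses", which is why the example patient gets that prediction. Calling `encoders[col].transform` on a value never seen during fitting raises a `ValueError`, so new inputs must use the same spellings as lenses.txt.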

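Many students get an error on the `from six import StringIO` line. six used to be bundled inside scikit-learn as sklearn.externals.six, but it has since been removed and must now be imported as a standalone package: replace sklearn.externals.six with six (install it with pip install six if it is missing).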
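`StringIO` is typically used to collect the Graphviz DOT text of a tree. The sketch below assumes that use: it tries the standalone six package first and falls back to Python 3's `io.StringIO` (which is what `six.StringIO` refers to on Python 3), then writes a small tree with `sklearn.tree.export_graphviz`. The tiny tree is fitted only on the four young/myope rows of lenses.txt, with the two remaining features encoded by hand in the same alphabetical order `LabelEncoder` would use (astigmatic no=0/yes=1, tearRate normal=0/reduced=1); it is illustrative, not the tutorial's model.

```python
# Prefer the standalone six package; fall back to the standard library on Python 3.
try:
    from six import StringIO
except ImportError:
    from io import StringIO

from sklearn import tree

# Four rows of lenses.txt (age=young, prescript=myope), encoded by hand:
# columns are [astigmatic, tearRate] with no=0/yes=1 and normal=0/reduced=1.
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
y = ["no lenses", "soft", "no lenses", "hard"]
clf = tree.DecisionTreeClassifier(random_state=0).fit(X, y)

# export_graphviz writes the DOT description into the file-like buffer.
dot_data = StringIO()
tree.export_graphviz(
    clf,
    out_file=dot_data,
    feature_names=["astigmatic", "tearRate"],
    class_names=list(clf.classes_),
    filled=True,
    rounded=True,
)
dot_text = dot_data.getvalue()
print(dot_text)
```

The DOT text can then be rendered with Graphviz, for example by saving it as lenses.dot and running `dot -Tpdf lenses.dot -o lenses.pdf`. If only the text is needed, passing `out_file=None` makes `export_graphviz` return the string directly, and no StringIO import is required at all.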

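Appendix: the dataset lenses.txt (tab-separated; the columns are age, prescript, astigmatic, tearRate, and the recommended lens type)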

young	myope	no	reduced	no lenses
young	myope	no	normal	soft
young	myope	yes	reduced	no lenses
young	myope	yes	normal	hard
young	hyper	no	reduced	no lenses
young	hyper	no	normal	soft
young	hyper	yes	reduced	no lenses
young	hyper	yes	normal	hard
pre	myope	no	reduced	no lenses
pre	myope	no	normal	soft
pre	myope	yes	reduced	no lenses
pre	myope	yes	normal	hard
pre	hyper	no	reduced	no lenses
pre	hyper	no	normal	soft
pre	hyper	yes	reduced	no lenses
pre	hyper	yes	normal	no lenses
presbyopic	myope	no	reduced	no lenses
presbyopic	myope	no	normal	no lenses
presbyopic	myope	yes	reduced	no lenses
presbyopic	myope	yes	normal	hard
presbyopic	hyper	no	reduced	no lenses
presbyopic	hyper	no	normal	soft
presbyopic	hyper	yes	reduced	no lenses
presbyopic	hyper	yes	normal	no lenses
