Watermelon Book Decision Tree Implementation (ID3-Based): Using a Dictionary Data Structure

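1. Preface
Now that the pandemic has eased, I am back working at the office. Work keeps me busy and my main focus has been on studying mathematics, so I have not posted in a long time. I recently implemented the decision tree from the Watermelon Book (《西瓜书》) and am sharing it here. Watermelon dataset 2.0 is as follows: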
['青绿', '蜷缩', '浊响', '清晰', '凹陷', '硬滑', '好瓜'],
['乌黑', '蜷缩', '沉闷', '清晰', '凹陷', '硬滑', '好瓜'],
['乌黑', '蜷缩', '浊响', '清晰', '凹陷', '硬滑', '好瓜'],
['青绿', '蜷缩', '沉闷', '清晰', '凹陷', '硬滑', '好瓜'],
['浅白', '蜷缩', '浊响', '清晰', '凹陷', '硬滑', '好瓜'],
['青绿', '稍蜷', '浊响', '清晰', '稍凹', '软粘', '好瓜'],
['乌黑', '稍蜷', '浊响', '稍糊', '稍凹', '软粘', '好瓜'],
['乌黑', '稍蜷', '浊响', '清晰', '稍凹', '硬滑', '好瓜'],
['乌黑', '稍蜷', '沉闷', '稍糊', '稍凹', '硬滑', '坏瓜'],
['青绿', '硬挺', '清脆', '清晰', '平坦', '软粘', '坏瓜'],
['浅白', '硬挺', '清脆', '模糊', '平坦', '硬滑', '坏瓜'],
['浅白', '蜷缩', '浊响', '模糊', '平坦', '软粘', '坏瓜'],
['青绿', '稍蜷', '浊响', '稍糊', '凹陷', '硬滑', '坏瓜'],
['浅白', '稍蜷', '沉闷', '稍糊', '凹陷', '硬滑', '坏瓜'],
['乌黑', '稍蜷', '浊响', '清晰', '稍凹', '软粘', '坏瓜'],
['浅白', '蜷缩', '浊响', '模糊', '平坦', '硬滑', '坏瓜'],
['青绿', '蜷缩', '沉闷', '稍糊', '稍凹', '硬滑', '坏瓜']
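2. Reading and Storing the Sample Data
To make the data easy to work with, each sample is stored as a dictionary. The keys are the sample's feature names, such as 纹理 (texture) and 敲声 (knock sound), and the values are the corresponding feature values, such as 清晰 (clear) and 沉闷 (dull).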
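
To make this layout concrete, here is a minimal sketch of how samples stored this way might be used. The helper name `split_by_feature` is a hypothetical name chosen for illustration and is not part of the original code; the three samples are rows 1, 2 and 11 of the dataset above, with only some of their features typed in. The point is only that looking a feature up by name makes it easy to partition samples by that feature's value, which is the basic operation a decision tree relies on.

```python
from collections import defaultdict

# Rows 1, 2 and 11 of the dataset above, with only a subset of features (illustrative).
samples = [
    {"编号": 1, "色泽": "青绿", "纹理": "清晰", "标签": "好瓜"},
    {"编号": 2, "色泽": "乌黑", "纹理": "清晰", "标签": "好瓜"},
    {"编号": 11, "色泽": "浅白", "纹理": "模糊", "标签": "坏瓜"},
]


def split_by_feature(dataset, feature):
    """Group samples by the value they take on `feature` (hypothetical helper)."""
    groups = defaultdict(list)
    for sample in dataset:
        groups[sample[feature]].append(sample)
    return dict(groups)


groups = split_by_feature(samples, "纹理")
for value, members in groups.items():
    # Print each feature value with the ids of the samples that have it.
    print(value, [m["编号"] for m in members])
```

Compared with a plain list per sample, the dictionary avoids having to remember which column index belongs to which feature.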

def read_data(filename):
    """
    Function: read the watermelon dataset

    Input:  filename: name of the dataset file

    Output: dataset: list of dicts, one per watermelon, each holding that
            watermelon's attributes
    """
    text_list = []
    # encoding="utf-8" is an assumption: the file contains Chinese text
    with open(filename, "r", encoding="utf-8") as f:
        # Iterating over the file stops automatically after the last line
        for line in f:
            # Remove the trailing newline and surrounding spaces, then the
            # enclosing "[" and "]" and the trailing ","
            line = line.strip().strip("[],").strip()
            if line != "":
                text_list.append(line)

    # Feature names in column order; the last column is the class label
    keys = ["色泽", "根蒂", "敲声", "纹理", "脐眼", "触感", "标签"]

    # Build the dataset: one dict per watermelon, collected in a list
    dataset = []
    for i, text_line in enumerate(text_list):
        # Split each line into a list of fields
        split_data_text = text_line.split(",")

        # Create one dict per watermelon and store its data
        dic_example = {"编号": i + 1}
        for j, key in enumerate(keys):
            # Remove the quotes and spaces around each feature value
            dic_example[key] = split_data_text[j].replace("'", "").strip()

        # Append the watermelon's dict to the list
        dataset.append(dic_example)

    return dataset
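
As a possible alternative to stripping brackets and quotes by hand, each line of the file already looks like a Python list literal, so it could be parsed with the standard library's `ast.literal_eval`. This is only a sketch under the assumption that the file uses plain ASCII quotes (`'`) and one list per line; `parse_line` is a hypothetical helper name, not part of the original code.

```python
import ast


def parse_line(line):
    """Parse one text line of the dataset file into a list of strings.

    Hypothetical alternative to manual stripping; returns None for blank lines.
    """
    # Drop the newline and the trailing comma that separates rows in the file.
    line = line.strip().rstrip(",")
    if not line:
        return None
    # literal_eval only evaluates Python literals, so it is safe on data files.
    return ast.literal_eval(line)


row = parse_line("['青绿', '蜷缩', '浊响', '清晰', '凹陷', '硬滑', '好瓜'],\n")
print(row)
```

The trade-off is that `ast.literal_eval` raises an error on malformed lines instead of silently producing odd strings, which may or may not be what you want.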

The result of reading the data is as follows:

filename = "西瓜数据集2.0.txt"
dataset = read_data(filename)
[{'编号': 1,
  '色泽': '青绿',
  '根蒂': '蜷缩',
  '敲声': '浊响',
  '纹理': '清晰',
  '脐眼': '凹陷',
  '触感': '硬滑',
  '标签': '好瓜'},
 {'编号': 2,
  '色泽': '乌黑',
  '根蒂': '蜷缩',
  '敲声': '沉闷',
  '纹理
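
With the samples stored as a list of dictionaries, the quantities ID3 needs can be computed directly from the `标签` field. The following is a minimal sketch, not the author's implementation: `label_entropy` is a hypothetical helper name, and the stand-in list below fills in only the id and label of each sample, with the label counts taken from the dataset above (samples 1 to 8 are 好瓜, samples 9 to 17 are 坏瓜).

```python
import math
from collections import Counter


def label_entropy(dataset, label_key="标签"):
    """Shannon entropy (base 2) of the class labels in a list of sample dicts."""
    counts = Counter(sample[label_key] for sample in dataset)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Stand-in for the list returned by read_data: only id and label are filled in.
dataset = [{"编号": i, "标签": "好瓜" if i <= 8 else "坏瓜"} for i in range(1, 18)]

ent = label_entropy(dataset)
print(round(ent, 3))  # prints 0.998
```

With 8 good and 9 bad melons the classes are nearly balanced, so the entropy is close to its maximum of 1 bit.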
