The Watermelon Datasets from Zhou Zhihua's "Machine Learning"

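Many worked examples and exercises in Zhou Zhihua's "Machine Learning" use "watermelon dataset 3.0" and "watermelon dataset 3.0a". The two differ in their attributes: watermelon dataset 3.0 has six discrete attributes alongside two continuous ones (density and sugar content), whereas watermelon dataset 3.0a keeps only the two continuous attributes. The code that generates both datasets is given below; running the two scripts produces the NumPy data files watermelon_3.0.npz and watermelon_3.0a.npz.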

write_dataset_watermelon3.py

# -*- coding: utf-8 -*-
"""
Created on Mon Aug 27 21:24:11 2018

Write the watermelon_3.0 dataset ('Machine Learning', Zhihua Zhou, p. 84) to
'watermelon_3.0.npz'

@author: weiyx15
"""

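'''
[x]
color (色泽): dark (乌黑)-0, green (青绿)-1, light (浅白)-2
root (根蒂): curly (蜷缩)-0, slightly curly (稍蜷)-1, straight (硬挺)-2
sound (敲声): muffled (浊响)-0, dull (沉闷)-1, crisp (清脆)-2
texture (纹理): clear (清晰)-0, slightly blurry (稍糊)-1, blurry (模糊)-2
umbilicus (脐部): hollow (凹陷)-0, slightly hollow (稍凹)-1, flat (平坦)-2
surface (触感): hard (硬滑)-0, soft (软粘)-1
density (密度): <numeric value>
sugar content (含糖率): <numeric value>
[y]
good melon (好瓜): yes-1, no-0
'''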

import numpy as np

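xn_discrete = 6                      # number of discrete attributes
xn_continuous = 2                    # number of continuous attributes
yn = 2                               # number of classes
x_discrete = [3, 3, 3, 3, 3, 2]      # number of possible values of each discrete attribute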
x = np.array([[1, 0, 0, 0, 0, 0, .697, .46], 
              [0, 0, 1, 0, 0, 0, .774, .376], 
              [0, 0, 0, 0, 0, 0, .634, .264], 
              [1, 0, 1, 0, 0, 0, .608, .318], 
              [2, 0, 0, 0, 0, 0, .556, .215], 
              [1, 1, 0, 0, 1, 1, .403, .237], 
              [0, 1, 0, 1, 1, 1, .481, .149], 
              [0, 1, 0, 0, 1, 0, .437, .211], 
              [0, 1, 1, 1, 1, 0, .666, .091], 
              [1, 2, 2, 0, 2, 1, .243, .267], 
              [2, 2, 2, 2, 2, 0, .245, .057], 
              [2, 0, 0, 2, 2, 1, .343, .099], 
              [1, 1, 0, 1, 0, 0, .639, .161], 
              [2, 1, 1, 1, 0, 0, .657, .198], 
              [0, 1, 0, 0, 1, 1, .36, .37], 
              [2, 0, 0, 2, 2, 0, .593, .042], 
              [1, 0, 1, 1, 1, 0, .719, .103]])
y = np.array([1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0])
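# The arrays are passed positionally, so np.savez stores them under the keys
# arr_0 ... arr_5, in the order given here.
np.savez('watermelon_3.0.npz', xn_discrete, xn_continuous, yn, x_discrete, x, y)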
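As a rough illustration of the integer encoding described in the comment block of the script, the sketch below maps one encoded row of x back to readable labels. LEGEND and decode_sample are hypothetical names introduced only for this example, not part of the original script, and the English value names are informal translations of the attribute values.

```python
import numpy as np

# Hypothetical lookup table mirroring the encoding legend in the script.
# The position of each value in its list is its integer code.
LEGEND = [
    ("color",     ["dark", "green", "light"]),
    ("root",      ["curly", "slightly curly", "straight"]),
    ("sound",     ["muffled", "dull", "crisp"]),
    ("texture",   ["clear", "slightly blurry", "blurry"]),
    ("umbilicus", ["hollow", "slightly hollow", "flat"]),
    ("surface",   ["hard", "soft"]),
]


def decode_sample(row):
    """Turn one 8-column row of x into a readable dict (illustrative helper)."""
    sample = {}
    # The first six columns hold the integer codes of the discrete attributes.
    for (name, values), code in zip(LEGEND, row[:len(LEGEND)]):
        sample[name] = values[int(code)]
    # The last two columns are the continuous attributes.
    sample["density"] = float(row[6])
    sample["sugar"] = float(row[7])
    return sample


first = np.array([1, 0, 0, 0, 0, 0, .697, .46])  # first row of x in the script
decoded = decode_sample(first)
print(decoded)
```

The lengths of the value lists in LEGEND match x_discrete = [3, 3, 3, 3, 3, 2] from the script, which is a quick way to check that such a table and the data agree.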

write_dataset_watermelon3a.py

# -*- coding: utf-8 -*-
"""
Created on Mon Aug 20 20:19:18 2018

Write the watermelon_3.0a dataset ('Machine Learning', Zhihua Zhou, p. 89) to
'watermelon_3.0a.npz'

@author: weiyx15
"""

import numpy as np

x = np.array([[.697, .46], [.774, .376], [.634, .264], [.608, .318], 
              [.556, .215], [.403, .237], [.481, .149], [.437, .211], 
              [.666, .091], [.243, .267], [.245, .057], [.343, .099], 
              [.639, .161], [.657, .198], [.36, .37], [.593, .042], 
              [.719, .103]])
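# Labels: 1 = good melon (samples 1-8), 0 = not a good melon (samples 9-17)
y = np.array([1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0])
# Positional arrays are stored under the keys arr_0 (x) and arr_1 (y)
np.savez('watermelon_3.0a.npz', x, y)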
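Because both scripts pass their arrays to np.savez positionally, the saved files do not contain keys called x or y. A minimal sketch of reading such a file back, assuming the same positional layout: it writes a two-sample stand-in file into a temporary directory so that it runs on its own, and the variable names keys, x_loaded and y_loaded are chosen only for this example.

```python
import os
import tempfile

import numpy as np

# Stand-in arrays (first two samples only) so the sketch is self-contained.
x = np.array([[.697, .46], [.774, .376]])
y = np.array([1, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'watermelon_3.0a.npz')
    np.savez(path, x, y)            # positional arrays, as in the scripts
    with np.load(path) as data:
        keys = list(data.files)     # key names assigned by np.savez
        x_loaded = data['arr_0']    # first positional array: x
        y_loaded = data['arr_1']    # second positional array: y

print(keys)
print(x_loaded.shape, y_loaded.shape)
```

For watermelon_3.0.npz the same pattern would apply with six keys, arr_0 through arr_5, following the argument order of the np.savez call in write_dataset_watermelon3.py.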

 
