Exploring the Red Wine Dataset (Python Built-in Objects)

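Write a Python program that completes the following five tasks:
1. Read the data and print the first 5 rows of the dataset.
2. Print all the distinct quality levels taken by the “quality” variable in the dataset.
3. Split the data by wine grade “quality” into six subsets and store them in a dictionary whose keys are the “quality” values and whose values are the lists of samples belonging to that “quality”. Then print every sample in the subset where “quality” is 3.
4. Count and print the number of samples for each quality level.
5. Compute and print the mean of the fixed acidity variable for each quality level.
Program and implementation:
1. Read the data and print the first 5 rows of the dataset: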

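import csv
# The directory part of the path is elided; a raw string keeps the backslashes literal.
path = r'\***\wine quality red.csv'
# newline="" is what the csv module documentation recommends for files passed to csv.reader.
with open(path, "r", newline="") as f:
    content = list(csv.reader(f))
# content[0] is the header row, so this prints the header plus the first 4 data rows.
print(content[:5])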
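To make the shape of `content` concrete, here is a minimal sketch that runs without the real file. The two-column CSV text in `sample_csv` is a hypothetical stand-in, not the actual dataset; it only illustrates that `csv.reader` yields lists of strings and that the first list is the header.

```python
import csv
import io

# Hypothetical stand-in for the real file: only two columns, illustrative values.
sample_csv = "fixed acidity,quality\n11.6,3\n10.4,3\n7.4,5\n"

# io.StringIO lets csv.reader treat the string like an open text file.
with io.StringIO(sample_csv, newline="") as f:
    content = list(csv.reader(f))

header, rows = content[0], content[1:]
print(header)   # column names
print(rows[0])  # first data row; every field is still a string
```

Because every field arrives as a string, the later steps convert with `int(...)` or `float(...)` before comparing or averaging.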

2. Print all the distinct quality levels taken by the “quality” variable in the dataset

qualities = []
for sample in content[1:]:  # skip the header row
    qualities.append(int(sample[-1]))  # quality is the last column
unity_quality = set(qualities)  # a set keeps only the distinct levels
print(unity_quality)

{3, 4, 5, 6, 7, 8}
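A set does not promise any particular ordering, so if the levels should always print in ascending order, one option is to sort them. The sketch below assumes a few hypothetical rows (`rows` is made up for illustration) and builds the sorted list with a set comprehension.

```python
# Hypothetical rows in the same layout as the dataset: quality is the last field.
rows = [["7.4", "5"], ["11.6", "3"], ["7.8", "5"], ["11.2", "6"]]

# The set comprehension removes duplicates; sorted() returns an ordered list.
levels = sorted({int(row[-1]) for row in rows})
print(levels)  # [3, 5, 6]
```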

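3. Split the data by wine grade “quality” into six subsets and store them in a dictionary whose keys are the “quality” values and whose values are the lists of samples belonging to that “quality”. Then print every sample in the subset where “quality” is 3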

quality_subsets = {}

for sample in content[1:]:  # skip the header row
    quality = int(sample[-1])
    # Start a new list the first time a quality level is seen
    if quality not in quality_subsets:
        quality_subsets[quality] = [sample]
    else:
        quality_subsets[quality].append(sample)
print(quality_subsets.keys())
print(quality_subsets[3])

dict_keys([5, 6, 7, 4, 8, 3])
[['11.6', '0.58', '0.66', '2.2', '0.074', '10', '47', '1.0008', '3.25', '0.57', '9', '3'], ['10.4', '0.61', '0.49', '2.1', '0.2', '5', '16', '0.9994', '3.16', '0.63', '8.4', '3'], ['7.4', '1.185', '0', '4.25', '0.097', '5', '14', '0.9966', '3.63', '0.54', '10.7', '3'], ['10.4', '0.44', '0.42', '1.5', '0.145', '34', '48', '0.99832', '3.38', '0.86', '9.9', '3'], ['8.3', '1.02', '0.02', '3.4', '0.084', '6', '11', '0.99892', '3.48', '0.49', '11', '3'], ['7.6', '1.58', '0', '2.1', '0.137', '5', '9', '0.99476', '3.5', '0.4', '10.9', '3'], ['6.8', '0.815', '0', '1.2', '0.267', '16', '29', '0.99471', '3.32', '0.51', '9.8', '3'], ['7.3', '0.98', '0.05', '2.1', '0.061', '20', '49', '0.99705', '3.31', '0.55', '9.7', '3'], ['7.1', '0.875', '0.05', '5.7', '0.082', '3', '14', '0.99808', '3.4', '0.52', '10.2', '3'], ['6.7', '0.76', '0.02', '1.8', '0.078', '6', '12', '0.996', '3.55', '0.63', '9.95', '3']]
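The if/else that creates a list on first sight of a key can also be written with `collections.defaultdict`, which creates the empty list automatically. This is an alternative sketch, not the approach used above; `rows` is a hypothetical stand-in for `content[1:]`.

```python
from collections import defaultdict

# Hypothetical rows: [fixed acidity, quality], both still strings.
rows = [["7.4", "5"], ["11.6", "3"], ["7.8", "5"], ["11.2", "6"]]

# Missing keys are initialised to an empty list on first access.
subsets = defaultdict(list)
for row in rows:
    subsets[int(row[-1])].append(row)

print(list(subsets))  # keys in first-seen order: [5, 3, 6]
print(subsets[5])     # [['7.4', '5'], ['7.8', '5']]
```

As with the plain dictionary, the keys appear in the order each quality level is first encountered, which is why the printed keys are not sorted.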

4. Count and print the number of samples for each quality level

subset_numbers = []

for key, value in quality_subsets.items():
    subset_numbers.append((key, len(value)))
print(subset_numbers)

[(5, 681), (6, 638), (7, 199), (4, 53), (8, 18), (3, 10)]
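If the grouped dictionary is not needed, the same tally can be sketched with `collections.Counter`. The `rows` list is a hypothetical stand-in; `reported` copies the counts printed above so that their total can be checked.

```python
from collections import Counter

# Hypothetical rows: quality is the last field of each row.
rows = [["7.4", "5"], ["11.6", "3"], ["7.8", "5"], ["11.2", "6"]]

# Counter tallies how many times each quality level occurs.
counts = Counter(int(row[-1]) for row in rows)
print(sorted(counts.items()))  # [(3, 1), (5, 2), (6, 1)]

# Sanity check on the counts printed above: they should add up to the
# number of data rows in the file.
reported = [(5, 681), (6, 638), (7, 199), (4, 53), (8, 18), (3, 10)]
total = sum(n for _, n in reported)
print(total)  # 1599
```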

5. Compute and print the mean of the fixed acidity variable for each quality level

mean_acidities = []

for quality, samples in quality_subsets.items():
    sum_ = 0
    for sample in samples:
        sum_ += float(sample[0])  # fixed acidity is the first column
    mean_acidities.append((quality, sum_/len(samples)))
print(mean_acidities)

[(5, 8.167254038179149), (6, 8.347178683385575), (7, 8.872361809045225), (4, 7.779245283018868), (8, 8.566666666666665), (3, 8.36)]
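As a worked check, the mean for quality 3 can be recomputed from the ten samples printed in step 3. The sketch below uses `statistics.fmean` from the standard library (Python 3.8+) rather than the manual loop; the variable names are illustrative, and the values are the first field of each quality-3 row shown earlier.

```python
import math
import statistics

# Fixed acidity (first column) of the ten quality-3 samples printed in step 3.
fixed_acidity_q3 = ["11.6", "10.4", "7.4", "10.4", "8.3",
                    "7.6", "6.8", "7.3", "7.1", "6.7"]

# Convert the strings to floats and average them.
mean_q3 = statistics.fmean(float(v) for v in fixed_acidity_q3)
print(mean_q3)

# Compare with the reported value using a tolerance, since floating-point
# sums can differ in the last digits depending on how they are accumulated.
print(math.isclose(mean_q3, 8.36))  # True
```

The values sum to 83.6 over 10 samples, which agrees with the `(3, 8.36)` entry in the output above.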
