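Write a Python program that completes the following five tasks:
1. Read the data and print the first 5 rows of the dataset.
2. Print all the distinct quality levels taken by the "quality" variable in the dataset.
3. Split the data by wine grade "quality" into six subsets and store them in a dictionary whose keys are the "quality" values and whose values are the lists of samples belonging to that "quality". Then print every sample in the subset where "quality" is 3.
4. Count and print the number of samples at each quality level.
5. Compute and print the mean of the variable fixed acidity at each quality level.
Program and implementation: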
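1. Read the data and print the first 5 rows of the dataset:
import csv

# Raw string so the backslashes in the Windows path are not treated as escapes
path = r'\***\wine quality red.csv'
# The with block closes the file automatically, even if reading fails
with open(path, "r", newline="") as f:
    content = list(csv.reader(f))
# First 5 lines of the file: the header row plus 4 samples
print(content[:5])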
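The reading step can also be tried without the real file. The following is a minimal sketch, assuming a comma-separated file whose first line is the header; `SAMPLE_CSV` and `read_rows` are hypothetical names, and the two-column toy values are invented for illustration (the real file has 12 columns).

```python
import csv
import io

# Toy stand-in for the real file: two columns only, values invented for illustration.
SAMPLE_CSV = """fixed acidity,quality
7.4,5
7.8,5
11.2,6
"""

def read_rows(file_obj):
    """Return (header, rows) from an open CSV file object."""
    reader = csv.reader(file_obj)
    header = next(reader)   # first line holds the column names
    rows = list(reader)     # remaining lines are the samples, as lists of strings
    return header, rows

header, rows = read_rows(io.StringIO(SAMPLE_CSV))
print(header)
print(rows[:5])
```

Separating the header from the samples up front avoids having to write `content[1:]` in every later step.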
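2. Print all the distinct quality levels taken by the "quality" variable in the dataset:
qualities = []
# Skip the header row; the quality label is the last column of each sample
for sample in content[1:]:
    qualities.append(int(sample[-1]))
# A set keeps each level only once
unity_quality = set(qualities)
print(unity_quality)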
{3, 4, 5, 6, 7, 8}
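The same deduplication can be written more compactly. This is a sketch on invented toy rows, not the real dataset; it uses a set comprehension and `sorted()`, since the order in which a set prints is an implementation detail and should not be relied on.

```python
# Toy rows (fixed acidity, quality), invented for illustration.
rows = [["7.4", "5"], ["11.2", "6"], ["7.8", "5"], ["8.5", "7"]]

# Set comprehension removes duplicates; sorted() gives a deterministic order.
levels = sorted({int(row[-1]) for row in rows})
print(levels)
```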
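3. Split the data by wine grade "quality" into six subsets and store them in a dictionary whose keys are the "quality" values and whose values are the lists of samples belonging to that "quality". Then print every sample in the subset where "quality" is 3:
quality_subsets = {}
for sample in content[1:]:
    quality = int(sample[-1])
    if quality not in quality_subsets:
        # First sample seen at this level: start a new list
        quality_subsets[quality] = [sample]
    else:
        quality_subsets[quality].append(sample)
print(quality_subsets.keys())
print(quality_subsets[3])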
dict_keys([5, 6, 7, 4, 8, 3])
[['11.6', '0.58', '0.66', '2.2', '0.074', '10', '47', '1.0008', '3.25', '0.57', '9', '3'], ['10.4', '0.61', '0.49', '2.1', '0.2', '5', '16', '0.9994', '3.16', '0.63', '8.4', '3'], ['7.4', '1.185', '0', '4.25', '0.097', '5', '14', '0.9966', '3.63', '0.54', '10.7', '3'], ['10.4', '0.44', '0.42', '1.5', '0.145', '34', '48', '0.99832', '3.38', '0.86', '9.9', '3'], ['8.3', '1.02', '0.02', '3.4', '0.084', '6', '11', '0.99892', '3.48', '0.49', '11', '3'], ['7.6', '1.58', '0', '2.1', '0.137', '5', '9', '0.99476', '3.5', '0.4', '10.9', '3'], ['6.8', '0.815', '0', '1.2', '0.267', '16', '29', '0.99471', '3.32', '0.51', '9.8', '3'], ['7.3', '0.98', '0.05', '2.1', '0.061', '20', '49', '0.99705', '3.31', '0.55', '9.7', '3'], ['7.1', '0.875', '0.05', '5.7', '0.082', '3', '14', '0.99808', '3.4', '0.52', '10.2', '3'], ['6.7', '0.76', '0.02', '1.8', '0.078', '6', '12', '0.996', '3.55', '0.63', '9.95', '3']]
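The grouping step can also be sketched with `collections.defaultdict`, which removes the explicit "is this key already present" check. The helper name `group_by_quality` and the toy rows are assumptions made for illustration. The keys come out in the order each level is first encountered, which is why the printed `dict_keys` are not sorted.

```python
from collections import defaultdict

def group_by_quality(rows):
    """Group sample rows by the integer quality label in the last column."""
    groups = defaultdict(list)   # a missing key starts as an empty list
    for row in rows:
        groups[int(row[-1])].append(row)
    return dict(groups)          # plain dict, so a missing key raises KeyError again

# Toy rows (fixed acidity, quality), invented for illustration.
rows = [["7.4", "5"], ["11.2", "6"], ["7.8", "5"], ["8.5", "3"]]
subsets = group_by_quality(rows)
print(list(subsets.keys()))
print(subsets[5])
```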
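4. Count and print the number of samples at each quality level:
subset_numbers = []
for key, value in quality_subsets.items():
    # Each value is the list of samples at that level, so its length is the count
    subset_numbers.append((key, len(value)))
print(subset_numbers)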
[(5, 681), (6, 638), (7, 199), (4, 53), (8, 18), (3, 10)]
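If only the counts are needed, they can be taken straight from the rows without building the subsets first. A minimal sketch using `collections.Counter` on invented toy rows:

```python
from collections import Counter

# Toy rows (fixed acidity, quality), invented for illustration.
rows = [["7.4", "5"], ["11.2", "6"], ["7.8", "5"], ["8.5", "3"]]

# Count how many times each quality label occurs in the last column.
counts = Counter(int(row[-1]) for row in rows)

# List the counts in ascending order of quality level.
subset_numbers = sorted(counts.items())
print(subset_numbers)
```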
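5. Compute and print the mean of the variable fixed acidity at each quality level:
mean_acidities = []
for quality, samples in quality_subsets.items():
    sum_ = 0
    for sample in samples:
        # fixed acidity is the first column; convert the string to a number
        sum_ += float(sample[0])
    mean_acidities.append((quality, sum_ / len(samples)))
print(mean_acidities)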
[(5, 8.167254038179149), (6, 8.347178683385575), (7, 8.872361809045225), (4, 7.779245283018868), (8, 8.566666666666665), (3, 8.36)]
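The averaging step can be sketched with `statistics.mean`, looking the column up by name instead of hard-coding index 0. The toy header and rows below are invented for illustration, and `values_by_quality` is a hypothetical name.

```python
from statistics import mean

# Toy data, invented for illustration; the real file has 12 columns.
header = ["fixed acidity", "quality"]
rows = [["7.5", "5"], ["8.5", "5"], ["11.0", "6"], ["9.0", "6"], ["8.0", "7"]]

# Find the column by name so the code still works if the column order changes.
col = header.index("fixed acidity")

# Collect the fixed acidity values per quality level.
values_by_quality = {}
for row in rows:
    values_by_quality.setdefault(int(row[-1]), []).append(float(row[col]))

# Mean per level, listed in ascending order of quality.
mean_acidities = [(q, mean(v)) for q, v in sorted(values_by_quality.items())]
print(mean_acidities)
```

Sorting by quality level makes it easier to see how the mean changes from one level to the next.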