Contents
I: Preparing the dataset
II: Loading the file
III: Grouping and computing statistics
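I: Preparing the dataset
Create a .txt file named student.txt and put it in your PyCharm project directory.
Below is the dataset I used for testing; copy it if you need it. Each line holds a student ID, a subject, an exam round, and a score, separated by commas.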
1001,Chinese,1,80
1001,Chinese,2,81
1001,Chinese,3,79
1001,Chinese,4,86
1001,Math,1,69
1001,Math,2,70
1001,Math,3,79
1001,Math,4,90
1001,English,1,90
1001,English,2,89
1001,English,3,92
1001,English,4,96
1002,Chinese,1,85
1002,Chinese,2,86
1002,Chinese,3,88
1002,Chinese,4,71
1002,Math,1,63
1002,Math,2,96
1002,Math,3,68
1002,Math,4,72
1002,English,1,63
1002,English,2,93
1002,English,3,86
1002,English,4,75
1003,Chinese,1,87
1003,Chinese,2,81
1003,Chinese,3,82
1003,Chinese,4,77
1003,Math,1,69
1003,Math,2,91
1003,Math,3,61
1003,Math,4,79
1003,English,1,68
1003,English,2,82
1003,English,3,87
1003,English,4,96
1004,Chinese,1,81
1004,Chinese,2,77
1004,Chinese,3,92
1004,Chinese,4,68
1004,Math,1,96
1004,Math,2,85
1004,Math,3,85
1004,Math,4,74
1004,English,1,67
1004,English,2,63
1004,English,3,96
1004,English,4,77
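If you would rather not paste the rows by hand, the sketch below regenerates the same 48 lines and writes them to student.txt in the current working directory (run it from the PyCharm project root). The compact SCORES dictionary and the build_lines helper are my own hypothetical constructions for illustration, not part of the original workflow; the values are copied from the listing above.

```python
from pathlib import Path

import pandas as pd

# Hypothetical compact form of the dataset above:
# (student ID, subject) -> scores for rounds 1 to 4, in the same row order.
SCORES = {
    (1001, "Chinese"): [80, 81, 79, 86],
    (1001, "Math"): [69, 70, 79, 90],
    (1001, "English"): [90, 89, 92, 96],
    (1002, "Chinese"): [85, 86, 88, 71],
    (1002, "Math"): [63, 96, 68, 72],
    (1002, "English"): [63, 93, 86, 75],
    (1003, "Chinese"): [87, 81, 82, 77],
    (1003, "Math"): [69, 91, 61, 79],
    (1003, "English"): [68, 82, 87, 96],
    (1004, "Chinese"): [81, 77, 92, 68],
    (1004, "Math"): [96, 85, 85, 74],
    (1004, "English"): [67, 63, 96, 77],
}


def build_lines(scores):
    """Hypothetical helper: expand the dict into 'sno,subject,unit,score' lines."""
    lines = []
    for (sno, subject), values in scores.items():
        for unit, score in enumerate(values, start=1):
            lines.append(f"{sno},{subject},{unit},{score}")
    return lines


lines = build_lines(SCORES)
# No header row is written; column names are supplied later via read_csv(names=...)
Path("student.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")

# Read the file back the same way the post does, as a sanity check
df = pd.read_csv("student.txt", sep=",", names=["sno", "subject", "unit", "score"])
print(df.shape)
```

Python dictionaries keep insertion order, so the generated file lists the rows in the same order as the listing above.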
II: Loading the file with read_csv
import numpy as np
import pandas as pd
# Column names (student.txt has no header row, so they are passed via names=)
columns = ['sno', 'subject', 'unit', 'score']
# 1 Load the file
df = pd.read_csv("student.txt", sep=',', names=columns)
print(df.head())
print(df.shape)
    sno  subject  unit  score
0  1001  Chinese     1     80
1  1001  Chinese     2     81
2  1001  Chinese     3     79
3  1001  Chinese     4     86
4  1001     Math     1     69
(48, 4)
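As shown above, head() returns the first five rows of the loaded dataset, and shape is (48, 4): 48 rows and 4 columns, namely student ID (sno), subject, exam round (unit), and score.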
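The scripts in part III add index_col=0 to read_csv, which changes how the data is laid out. A minimal sketch of the difference, assuming the first four rows of student.txt are inlined through io.StringIO so it runs without the file:

```python
import io

import pandas as pd

columns = ['sno', 'subject', 'unit', 'score']
# First four rows of student.txt, inlined so the sketch needs no file on disk
csv_text = """1001,Chinese,1,80
1001,Chinese,2,81
1001,Chinese,3,79
1001,Chinese,4,86
"""

# Default: pandas adds a 0..n-1 RangeIndex and keeps sno as an ordinary column
df_default = pd.read_csv(io.StringIO(csv_text), sep=',', names=columns)

# index_col=0: the first column (sno) becomes the row index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), sep=',', names=columns, index_col=0)

print(df_default.columns.tolist())  # ['sno', 'subject', 'unit', 'score']
print(df_indexed.columns.tolist())  # ['subject', 'unit', 'score']
print(df_indexed.index.name)        # sno
```

This is why the group printouts in part III show sno as the row label rather than 0, 1, 2, and so on.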
III: Grouping and computing statistics
1 Group by subject
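import numpy as np
import pandas as pd
# Column names
columns = ['sno', 'subject', 'unit', 'score']
# 1 Load the file; index_col=0 makes the first column (sno) the row index
df = pd.read_csv("student.txt", sep=',', names=columns, index_col=0)
# Grouping and statistics
# Group by subject; each iteration yields a (subject, sub-DataFrame) tuple
df_subject = df.groupby('subject')
for i in df_subject:
    print(i)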
('Chinese',       subject  unit  score
sno
1001  Chinese     1     80
1001  Chinese     2     81
1001  Chinese     3     79
1001  Chinese     4     86
1002  Chinese     1     85
1002  Chinese     2     86
1002  Chinese     3     88
1002  Chinese     4     71
1003  Chinese     1     87
1003  Chinese     2     81
1003  Chinese     3     82
1003  Chinese     4     77
1004  Chinese     1     81
1004  Chinese     2     77
1004  Chinese     3     92
1004  Chinese     4     68)
('English',       subject  unit  score
sno
1001  English     1     90
1001  English     2     89
1001  English     3     92
1001  English     4     96
1002  English     1     63
1002  English     2     93
1002  English     3     86
1002  English     4     75
1003  English     1     68
1003  English     2     82
1003  English     3     87
1003  English     4     96
1004  English     1     67
1004  English     2     63
1004  English     3     96
1004  English     4     77)
('Math',       subject  unit  score
sno
1001     Math     1     69
1001     Math     2     70
1001     Math     3     79
1001     Math     4     90
1002     Math     1     63
1002     Math     2     96
1002     Math     3     68
1002     Math     4     72
1003     Math     1     69
1003     Math     2     91
1003     Math     3     61
1003     Math     4     79
1004     Math     1     96
1004     Math     2     85
1004     Math     3     85
1004     Math     4     74)
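Printing each loop item shows the whole (key, DataFrame) tuple. A sketch of three related GroupBy tools follows: unpacking the tuple in the loop, get_group() to fetch one subject directly, and size() to count rows per group. As an assumption for self-containment, it rebuilds the same 48 rows from an inlined dictionary (my own construction) instead of reading student.txt; set_index('sno') mirrors index_col=0.

```python
import pandas as pd

# Same data as student.txt (assumed inline copy): (sno, subject) -> rounds 1 to 4
SCORES = {
    (1001, "Chinese"): [80, 81, 79, 86], (1001, "Math"): [69, 70, 79, 90],
    (1001, "English"): [90, 89, 92, 96], (1002, "Chinese"): [85, 86, 88, 71],
    (1002, "Math"): [63, 96, 68, 72], (1002, "English"): [63, 93, 86, 75],
    (1003, "Chinese"): [87, 81, 82, 77], (1003, "Math"): [69, 91, 61, 79],
    (1003, "English"): [68, 82, 87, 96], (1004, "Chinese"): [81, 77, 92, 68],
    (1004, "Math"): [96, 85, 85, 74], (1004, "English"): [67, 63, 96, 77],
}
rows = [
    (sno, subject, unit, score)
    for (sno, subject), values in SCORES.items()
    for unit, score in enumerate(values, start=1)
]
# Equivalent layout to read_csv(..., names=columns, index_col=0)
df = pd.DataFrame(rows, columns=['sno', 'subject', 'unit', 'score']).set_index('sno')

grouped = df.groupby('subject')

# Unpack each (key, sub-DataFrame) pair instead of printing the raw tuple
for subject, group in grouped:
    print(subject, group.shape)

# Fetch a single group by its key, without looping
chinese = grouped.get_group('Chinese')
print(chinese.head())

# Number of rows in each group, as a Series indexed by subject
print(grouped.size())
```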
2 Keep only the score column for each subject
import numpy as np
import pandas as pd
# Column names
columns = ['sno', 'subject', 'unit', 'score']
# 1 Load the file, using sno as the row index
df = pd.read_csv("student.txt", sep=',', names=columns, index_col=0)
# Grouping and statistics
# Group by subject and keep only the score column
df_subject = df.groupby('subject')['score']
for i in df_subject:
    print(i)
('Chinese', sno
1001    80
1001    81
1001    79
1001    86
1002    85
1002    86
1002    88
1002    71
1003    87
1003    81
1003    82
1003    77
1004    81
1004    77
1004    92
1004    68
Name: score, dtype: int64)
('English', sno
1001    90
1001    89
1001    92
1001    96
1002    63
1002    93
1002    86
1002    75
1003    68
1003    82
1003    87
1003    96
1004    67
1004    63
1004    96
1004    77
Name: score, dtype: int64)
('Math', sno
1001    69
1001    70
1001    79
1001    90
1002    63
1002    96
1002    68
1002    72
1003    69
1003    91
1003    61
1003    79
1004    96
1004    85
1004    85
1004    74
Name: score, dtype: int64)
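Selecting ['score'] with single brackets after groupby gives a SeriesGroupBy, so each group is a Series of scores indexed by sno, as printed above; double brackets [['score']] keep a DataFrameGroupBy instead. A minimal sketch of the difference, again assuming an inline copy of the data (my own construction) in place of student.txt:

```python
import pandas as pd

# Same data as student.txt (assumed inline copy): (sno, subject) -> rounds 1 to 4
SCORES = {
    (1001, "Chinese"): [80, 81, 79, 86], (1001, "Math"): [69, 70, 79, 90],
    (1001, "English"): [90, 89, 92, 96], (1002, "Chinese"): [85, 86, 88, 71],
    (1002, "Math"): [63, 96, 68, 72], (1002, "English"): [63, 93, 86, 75],
    (1003, "Chinese"): [87, 81, 82, 77], (1003, "Math"): [69, 91, 61, 79],
    (1003, "English"): [68, 82, 87, 96], (1004, "Chinese"): [81, 77, 92, 68],
    (1004, "Math"): [96, 85, 85, 74], (1004, "English"): [67, 63, 96, 77],
}
rows = [
    (sno, subject, unit, score)
    for (sno, subject), values in SCORES.items()
    for unit, score in enumerate(values, start=1)
]
df = pd.DataFrame(rows, columns=['sno', 'subject', 'unit', 'score']).set_index('sno')

# Single brackets: SeriesGroupBy, each group is a Series of scores
scores_by_subject = df.groupby('subject')['score']
# Double brackets: DataFrameGroupBy, each group stays a one-column DataFrame
frame_by_subject = df.groupby('subject')[['score']]

# Each group here is a Series, so Series methods apply directly
for subject, scores in scores_by_subject:
    print(subject, scores.max(), scores.min())

best = scores_by_subject.max()        # Series indexed by subject
best_frame = frame_by_subject.max()   # DataFrame with a 'score' column
print(best)
print(best_frame)
```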
3 Average score for each subject
import numpy as np
import pandas as pd
# Column names
columns = ['sno', 'subject', 'unit', 'score']
# 1 Load the file, using sno as the row index
df = pd.read_csv("student.txt", sep=',', names=columns, index_col=0)
# Grouping and statistics
# Average score of each subject
df_subject_mean = df.groupby('subject')['score'].mean()
print(df_subject_mean)
subject
Chinese    81.3125
English    82.5000
Math       77.9375
Name: score, dtype: float64
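mean() is one aggregation; agg() accepts a list of function names and returns one column per statistic, which makes it easy to compare subjects side by side. A sketch under the same assumption as before (inline copy of the data instead of reading student.txt):

```python
import pandas as pd

# Same data as student.txt (assumed inline copy): (sno, subject) -> rounds 1 to 4
SCORES = {
    (1001, "Chinese"): [80, 81, 79, 86], (1001, "Math"): [69, 70, 79, 90],
    (1001, "English"): [90, 89, 92, 96], (1002, "Chinese"): [85, 86, 88, 71],
    (1002, "Math"): [63, 96, 68, 72], (1002, "English"): [63, 93, 86, 75],
    (1003, "Chinese"): [87, 81, 82, 77], (1003, "Math"): [69, 91, 61, 79],
    (1003, "English"): [68, 82, 87, 96], (1004, "Chinese"): [81, 77, 92, 68],
    (1004, "Math"): [96, 85, 85, 74], (1004, "English"): [67, 63, 96, 77],
}
rows = [
    (sno, subject, unit, score)
    for (sno, subject), values in SCORES.items()
    for unit, score in enumerate(values, start=1)
]
df = pd.DataFrame(rows, columns=['sno', 'subject', 'unit', 'score']).set_index('sno')

# Several statistics per subject in one call: one column per function name
stats = df.groupby('subject')['score'].agg(['mean', 'max', 'min', 'count'])
print(stats)

# Subjects ranked by average score, highest first
ranking = stats['mean'].sort_values(ascending=False)
print(ranking)
```

The mean column matches the output above (81.3125, 82.5, 77.9375), so English ranks first and Math last.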
4 Average score of each student in each subject
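import numpy as np
import pandas as pd
# Column names
columns = ['sno', 'subject', 'unit', 'score']
# 1 Load the file, using sno as the row index
df = pd.read_csv("student.txt", sep=',', names=columns, index_col=0)
# Grouping and statistics
# Average score of each student in each subject
# Grouping keys: student ID (the sno index level) and subject
df_mean = df.groupby(['sno', 'subject'])['score'].mean()
print(df_mean)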
sno   subject
1001  Chinese    81.50
      English    91.75
      Math       77.00
1002  Chinese    82.50
      English    79.25
      Math       74.75
1003  Chinese    81.75
      English    83.25
      Math       75.00
1004  Chinese    79.50
      English    75.75
      Math       85.00
Name: score, dtype: float64
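Grouping by two keys produces a Series with a two-level MultiIndex (sno, subject). unstack() moves the inner level into columns, giving one row per student, and reset_index() flattens it back into ordinary columns. A sketch, assuming the same inline copy of the data as in the earlier sketches rather than reading student.txt:

```python
import pandas as pd

# Same data as student.txt (assumed inline copy): (sno, subject) -> rounds 1 to 4
SCORES = {
    (1001, "Chinese"): [80, 81, 79, 86], (1001, "Math"): [69, 70, 79, 90],
    (1001, "English"): [90, 89, 92, 96], (1002, "Chinese"): [85, 86, 88, 71],
    (1002, "Math"): [63, 96, 68, 72], (1002, "English"): [63, 93, 86, 75],
    (1003, "Chinese"): [87, 81, 82, 77], (1003, "Math"): [69, 91, 61, 79],
    (1003, "English"): [68, 82, 87, 96], (1004, "Chinese"): [81, 77, 92, 68],
    (1004, "Math"): [96, 85, 85, 74], (1004, "English"): [67, 63, 96, 77],
}
rows = [
    (sno, subject, unit, score)
    for (sno, subject), values in SCORES.items()
    for unit, score in enumerate(values, start=1)
]
df = pd.DataFrame(rows, columns=['sno', 'subject', 'unit', 'score']).set_index('sno')

# Same call as in the post: mixes the sno index level and the subject column
df_mean = df.groupby(['sno', 'subject'])['score'].mean()

# Move the inner level (subject) into columns: one row per student
table = df_mean.unstack()
print(table)

# Each student's strongest subject by average score
print(table.idxmax(axis=1))

# Flat form with sno and subject as ordinary columns
flat = df_mean.reset_index()
print(flat.head())
```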