pandas: group, count, and take the top n records per group

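In short, two lines of code do the job: one to count genres within each release year, and one to take the first n rows of each group.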

test_data.groupby('release_year')['genre'].value_counts()

# output: the result is a Series
release_year  genre          
1960          Drama               13
              Action               8
              Comedy               8
              Horror               7
              Romance              6
              Thriller             6
              Western              6
              Adventure            5
              History              5
              Family               3
              Science Fiction      3
              Crime                2
              Fantasy              2
              War                  2
              Foreign              1
              Music                1
1961          Drama               16
              Comedy              10
              Action               7
              Romance              7
              Adventure            6
              Family               5
              Science Fiction      4
              History              3
              Horror               3
              Western              3
              Crime                2
...
Name: genre, Length: 1049, dtype: int64
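The Series above is indexed by two levels, release_year and genre. The following minimal sketch shows how such a result can be inspected. The `toy` DataFrame is hypothetical stand-in data, since the original `test_data` is not shown in this post.

```python
import pandas as pd

# Hypothetical toy data; the real test_data is not shown in this post.
toy = pd.DataFrame({
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
    "genre": ["Drama", "Drama", "Action", "Comedy", "Comedy", "Comedy"],
})

# Count how often each genre occurs within each release year.
counts = toy.groupby("release_year")["genre"].value_counts()

# The result is a Series whose index has two levels.
index_levels = list(counts.index.names)

# Selecting a single year returns that year's genre counts.
counts_1960 = counts.loc[1960]

# Selecting a (year, genre) pair returns a single count.
drama_1960 = counts.loc[(1960, "Drama")]

print(index_levels)
print(counts_1960)
print(drama_1960)
```

Because the counts live in a two-level index, flattening them into ordinary columns (the next step) usually makes later filtering and grouping easier.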
# Convert the result above into a DataFrame
my_data = test_data.groupby('release_year')['genre'].value_counts().rename('count').reset_index()

# output
    release_year    genre   count
0   1960    Drama   13
1   1960    Action  8
2   1960    Comedy  8
3   1960    Horror  7
4   1960    Romance 6
5   1960    Thriller    6
6   1960    Western 6
7   1960    Adventure   5
8   1960    History 5
9   1960    Family  3
10  1960    Science Fiction 3
11  1960    Crime   2
12  1960    Fantasy 2
13  1960    War 2
14  1960    Foreign 1
15  1960    Music   1
16  1961    Drama   16
17  1961    Comedy  10
18  1961    Action  7
19  1961    Romance 7
20  1961    Adventure   6
21  1961    Family  5
22  1961    Science Fiction 4
23  1961    History 3
24  1961    Horror  3
25  1961    Western 3
26  1961    Crime   2
27  1961    Fantasy 2
28  1961    Music   2
29  1961    War 2
... ... ... ...
1019    2014    Crime   65
1020    2014    Science Fiction 62
1021    2014    Family  43
1022    2014    Animation   36
1023    2014    Fantasy 36
1024    2014    Mystery 36
1025    2014    Music   28
1026    2014    War 23
1027    2014    History 15
1028    2014    TV Movie    14
1029    2014    Western 6
1030    2015    Drama   260
1031    2015    Thriller    171
1032    2015    Comedy  162
1033    2015    Horror  125
1034    2015    Action  107
1035    2015    Science Fiction 86
1036    2015    Adventure   69
1037    2015    Documentary 57
1038    2015    Romance 57
1039    2015    Crime   51
1040    2015    Family  44
1041    2015    Mystery 42
1042    2015    Animation   39
1043    2015    Fantasy 33
1044    2015    Music   33
1045    2015    TV Movie    20
1046    2015    History 15
1047    2015    War 9
1048    2015    Western 6

1049 rows × 3 columns
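# Finally, group by release_year again and take the first n rows of each group (here n = 3); the result is a DataFrame.
# value_counts() sorts the counts in descending order within each group, so these rows are the n most frequent genres per year.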
my_data.groupby('release_year').head(3)

# output
    release_year    genre   count
0   1960    Drama   13
1   1960    Action  8
2   1960    Comedy  8
16  1961    Drama   16
17  1961    Comedy  10
18  1961    Action  7
33  1962    Drama   21
34  1962    Action  8
35  1962    Adventure   7
50  1963    Comedy  13
51  1963    Drama   13
52  1963    Thriller    10
67  1964    Drama   20
68  1964    Comedy  16
69  1964    Crime   10
85  1965    Drama   20
86  1965    Thriller    11
87  1965    Action  9
103 1966    Comedy  16
104 1966    Drama   16
105 1966    Action  14
121 1967    Comedy  17
122 1967    Drama   16
123 1967    Romance 11
138 1968    Drama   20
139 1968    Comedy  9
140 1968    Action  6
155 1969    Drama   13
156 1969    Comedy  12
157 1969    Action  10
... ... ... ...
853 2006    Drama   197
854 2006    Comedy  155
855 2006    Thriller    114
873 2007    Drama   197
874 2007    Comedy  151
875 2007    Thriller    125
893 2008    Drama   233
894 2008    Comedy  169
895 2008    Thriller    127
913 2009    Drama   224
914 2009    Comedy  198
915 2009    Thriller    157
932 2010    Drama   211
933 2010    Comedy  169
934 2010    Thriller    135
952 2011    Drama   214
953 2011    Comedy  172
954 2011    Thriller    146
972 2012    Drama   232
973 2012    Comedy  176
974 2012    Thriller    160
992 2013    Drama   253
993 2013    Comedy  175
994 2013    Thriller    175
1011    2014    Drama   284
1012    2014    Comedy  185
1013    2014    Thriller    179
1030    2015    Drama   260
1031    2015    Thriller    171
1032    2015    Comedy  162

168 rows × 3 columns
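The whole procedure can be sketched end to end on a small example. This is an illustrative version, not the original notebook: the `test_data` contents and the value of `n` are made up. It also adds an explicit `sort_values` call, which the original steps do not have, so that the result does not depend on the row order left by `value_counts()`.

```python
import pandas as pd

# Hypothetical stand-in for test_data (the original data is not shown).
test_data = pd.DataFrame({
    "release_year": [1960] * 6 + [1961] * 6,
    "genre": [
        "Drama", "Drama", "Drama", "Action", "Action", "Comedy",
        "Comedy", "Comedy", "Comedy", "Drama", "Drama", "Horror",
    ],
})

n = 2  # number of rows to keep per group (assumed value for this sketch)

# Step 1: count genres per year and flatten the result into a DataFrame.
my_data = (
    test_data.groupby("release_year")["genre"]
    .value_counts()
    .rename("count")
    .reset_index()
)

# Step 2: sort explicitly (an addition for safety), then keep the
# first n rows of each release_year group.
top_n = (
    my_data.sort_values(["release_year", "count"], ascending=[True, False])
    .groupby("release_year")
    .head(n)
)

print(top_n)
```

Note that `head(n)` cuts ties arbitrarily: if several genres share the n-th largest count, only some of them are kept.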
