Sorting a DataFrame
Prerequisite: import numpy and pandas (including Series and DataFrame).
Create a DataFrame with an explicit index, as shown:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
df1 = pd.DataFrame(np.arange(20).reshape(4, 5), index=['First', 'Second', 'Third', 'Fourth'], columns=['d', 'b', 'a', 'c', 'e'])
df1
out[1]:
d b a c e
First 0 1 2 3 4
Second 5 6 7 8 9
Third 10 11 12 13 14
Fourth 15 16 17 18 19
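Ways to sort a DataFrame.
To sort df1 by its row index or by its column labels, use df1.sort_index() or df1.sort_index(axis=1) respectively, as shown. Because the index labels are strings, they are sorted alphabetically: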
df1.sort_index()
out[2]:
d b a c e
First 0 1 2 3 4
Fourth 15 16 17 18 19
Second 5 6 7 8 9
Third 10 11 12 13 14
df1.sort_index(axis=1)
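The call above has no output shown, so here is a minimal sketch of what sorting by column labels does. It rebuilds df1 exactly as defined earlier; the variable name by_columns is introduced only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(20).reshape(4, 5),
    index=['First', 'Second', 'Third', 'Fourth'],
    columns=['d', 'b', 'a', 'c', 'e'],
)

# axis=1 sorts the column labels alphabetically; the row order is unchanged
by_columns = df1.sort_index(axis=1)
print(by_columns)
```

Each value stays under its own label, so row First becomes a=2, b=1, c=3, d=0, e=4.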
To sort df1 in descending order, just add the parameter ascending=False, as shown:
df1.sort_index(ascending=False)
out[3]:
d b a c e
Third 10 11 12 13 14
Second 5 6 7 8 9
Fourth 15 16 17 18 19
First 0 1 2 3 4
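The ascending parameter can be combined with axis=1 to reverse the column order as well. A short sketch, again rebuilding df1 as defined above (desc_columns is an illustrative name):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(20).reshape(4, 5),
    index=['First', 'Second', 'Third', 'Fourth'],
    columns=['d', 'b', 'a', 'c', 'e'],
)

# Sort the column labels in descending order
desc_columns = df1.sort_index(axis=1, ascending=False)
print(desc_columns)

# sort_index returns a new DataFrame by default; df1 keeps its original order
print(list(df1.columns))
```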
To make it easier to demonstrate sorting a DataFrame by one or more columns, create another DataFrame named df2, as follows:
df2 = pd.DataFrame({'c': [6, 3, 8, -2, 0], 'a': [2, 2, 3, 1, 4], 'b': ['Jan', 'May', 'Sep', 'Feb', 'Aug']})
df2
out[4]:
c a b
0 6 2 Jan
1 3 2 May
2 8 3 Sep
3 -2 1 Feb
4 0 4 Aug
Now use each of the following in turn:
df2.sort_values(by='b') sorts df2 by column b.
df2.sort_values(by=['b', 'a']) sorts df2 by column b, then breaks ties by column a.
df2.sort_values(by=['a', 'b']) sorts df2 by column a, then breaks ties by column b.
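The three calls above can be sketched as follows, using df2 as defined earlier. The variable names are illustrative only. Column b holds strings, so it is ordered alphabetically rather than by calendar month.

```python
import pandas as pd

df2 = pd.DataFrame({
    'c': [6, 3, 8, -2, 0],
    'a': [2, 2, 3, 1, 4],
    'b': ['Jan', 'May', 'Sep', 'Feb', 'Aug'],
})

# Single key: alphabetical order of the strings in column b
by_b = df2.sort_values(by='b')

# Two keys: b first, a only breaks ties (b has no duplicates here,
# so the result matches sorting by b alone)
by_b_then_a = df2.sort_values(by=['b', 'a'])

# Two keys: a first; rows 0 and 1 tie on a == 2, so b decides between them
by_a_then_b = df2.sort_values(by=['a', 'b'])

print(by_b)
print(by_b_then_a)
print(by_a_then_b)
```

The original row labels travel with the rows, which makes it easy to see how each sort rearranged them.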
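Ranking a DataFrame:
df2.rank() ranks the values within each column (down the rows), and df2.rank(axis=1) ranks the values within each row (across the columns), as follows: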
df2.rank()
out[5]:
c a b
0 4.0 2.5 3.0
1 3.0 2.5 4.0
2 5.0 4.0 5.0
3 1.0 1.0 2.0
4 2.0 5.0 1.0
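The 2.5 values in column a appear because rows 0 and 1 tie on the value 2. By default, rank gives tied values the average of the ranks they would occupy (here ranks 2 and 3). The sketch below contrasts that default with two other settings of the method parameter of rank; picking 'min' and 'first' is only for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({
    'c': [6, 3, 8, -2, 0],
    'a': [2, 2, 3, 1, 4],
    'b': ['Jan', 'May', 'Sep', 'Feb', 'Aug'],
})

# Default (method='average'): tied values share the mean of their ranks
average = df2['a'].rank()

# method='min': tied values all receive the lowest rank in the tie
lowest = df2['a'].rank(method='min')

# method='first': ties are broken by order of appearance
first = df2['a'].rank(method='first')

print(average.tolist())
print(lowest.tolist())
print(first.tolist())
```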
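# numeric_only=True ranks only the numeric columns c and a; the strings in column b cannot be compared with numbers
df2.rank(axis=1, ascending=True, numeric_only=True)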
out[6]:
c a
0 2.0 1.0
1 2.0 1.0
2 2.0 1.0
3 1.0 2.0
4 1.0 2.0
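# Same row-wise ranking in descending order: the larger value in each row gets rank 1
df2.rank(axis=1, ascending=False, numeric_only=True)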
out[7]:
c a
0 1.0 2.0
1 1.0 2.0
2 1.0 2.0
3 2.0 1.0
4 2.0 1.0
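Row-wise ranking compares values across columns, so the columns involved must be mutually comparable; the outputs above contain only the numeric columns c and a. A minimal sketch that selects those two columns explicitly before ranking follows. The explicit selection is an assumption about one way to reproduce the output, not necessarily what the original session did.

```python
import pandas as pd

df2 = pd.DataFrame({
    'c': [6, 3, 8, -2, 0],
    'a': [2, 2, 3, 1, 4],
    'b': ['Jan', 'May', 'Sep', 'Feb', 'Aug'],
})

# Keep only the numeric columns so every row holds comparable values
numeric = df2[['c', 'a']]

# Within each row, the smaller value gets rank 1
row_rank_asc = numeric.rank(axis=1, ascending=True)

# Within each row, the larger value gets rank 1
row_rank_desc = numeric.rank(axis=1, ascending=False)

print(row_rank_asc)
print(row_rank_desc)
```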