References:
Pandas in Detail, Part 8: ReIndex
Usage of set_index and reset_index in pandas
Pandas set_index & reset_index
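"Python for Data Analysis" puts it this way:
The reindex() method creates a new object whose data conforms to a new index.
① For a Series, calling reindex() rearranges the data to follow the new index. If a label in the new index did not exist before, a missing value (NaN) is introduced for it.
For example: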
import numpy as np
import pandas as pd

se1 = pd.Series([1, 7, 3, 9], index=['d', 'c', 'a', 'f'])
se1
Output:
d    1
c    7
a    3
f    9
dtype: int64
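se2 = se1.reindex(['a', 'b', 'c', 'd', 'e', 'f'])
se2
Output:
a    3.0
b    NaN
c    7.0
d    1.0
e    NaN
f    9.0
dtype: float64
The Series case is straightforward. The only thing to note is that the dtype changes from int64 to float64, because NaN cannot be stored in an integer array.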
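As a small sketch of how the missing labels can be handled, the example below reuses se1 from above and passes fill_value, a standard reindex() parameter that the text above does not cover. The variable names with_nan and with_zero are only illustrative.

```python
import pandas as pd

se1 = pd.Series([1, 7, 3, 9], index=['d', 'c', 'a', 'f'])
new_index = ['a', 'b', 'c', 'd', 'e', 'f']

# Default behaviour: labels missing from se1 ('b' and 'e') become NaN.
with_nan = se1.reindex(new_index)

# fill_value replaces the missing labels with a constant instead of NaN.
with_zero = se1.reindex(new_index, fill_value=0)

# reindex() returns a new object, so se1 keeps its original index.
print(with_nan)
print(with_zero)
print(list(se1.index))
```

Whether 0 is a sensible placeholder depends on the data. When "missing" and "zero" mean different things, keeping the NaN values is usually the safer choice.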
② For a DataFrame, reindex() can change the row index, the column index, or both. Passing a single sequence reindexes the rows:
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
frame
frame2 = frame.reindex(['a', 'b', 'c', 'd'])
frame2
Output:
   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0
The columns can be reindexed with the columns keyword:
states = ['Texas', 'Utah', 'California']
frame.reindex(columns=states)
Output:
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
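The two steps above can also be combined. The following is a minimal sketch that passes both index and columns in a single reindex() call, using the same frame as above. The name both is only illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# Reindex rows and columns in one call. Row 'b' and column 'Utah' do not
# exist in frame, so they are filled with NaN. 'Ohio' is not listed in
# columns, so it is dropped from the result.
both = frame.reindex(index=['a', 'b', 'c', 'd'],
                     columns=['Texas', 'Utah', 'California'])

print(both)

# frame itself is unchanged: reindex() always builds a new object.
print(frame.shape)
```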
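set_index(), as the name suggests, sets the index. It can build either a single-level index or a composite (multi-level) index.
Calling it returns a new DataFrame that uses one or more of the existing columns as its index.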
In [307]: data
Out[307]:
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0
In []: indexed1 = data.set_index('c')
indexed1
Out[]:
     a    b    d
c
z  bar  one  1.0
y  bar  two  2.0
x  foo  one  3.0
w  foo  two  4.0
In []: indexed2 = data.set_index(['a', 'b'])
indexed2
Out[]:
         c    d
a   b
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0
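The session above can be reproduced with the sketch below. It rebuilds data by hand from the values shown in the output, and additionally shows the drop=False option of set_index(), which the text above does not mention. The name kept is only illustrative.

```python
import pandas as pd

# data is reconstructed from the printed output above.
data = pd.DataFrame({'a': ['bar', 'bar', 'foo', 'foo'],
                     'b': ['one', 'two', 'one', 'two'],
                     'c': ['z', 'y', 'x', 'w'],
                     'd': [1.0, 2.0, 3.0, 4.0]})

# Composite index: columns 'a' and 'b' become the two index levels.
indexed2 = data.set_index(['a', 'b'])

# A row of a multi-level index is addressed with a tuple of labels.
print(indexed2.loc[('foo', 'one'), 'd'])

# set_index() returns a new DataFrame, so data still has all four columns.
print(list(data.columns))

# drop=False keeps 'c' as a regular column in addition to using it as the index.
kept = data.set_index('c', drop=False)
print(list(kept.columns))
```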
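reset_index() is the inverse of set_index(): it moves the index levels back into ordinary columns and restores the default integer index. Assume data has been rebound to the frame indexed by ['a', 'b'] from the previous step:
In []: data = indexed2
data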
Out[]:
         c    d
a   b
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0
In []: data.reset_index()
Out[]:
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0
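A minimal sketch of the round trip between set_index() and reset_index() follows. The data frame is rebuilt by hand from the values shown above. The level and drop parameters are standard reset_index() options that the text above does not cover, and the names restored, only_b and dropped are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'a': ['bar', 'bar', 'foo', 'foo'],
                     'b': ['one', 'two', 'one', 'two'],
                     'c': ['z', 'y', 'x', 'w'],
                     'd': [1.0, 2.0, 3.0, 4.0]})

indexed2 = data.set_index(['a', 'b'])

# reset_index() turns both index levels back into columns and
# restores the default 0..n-1 integer index.
restored = indexed2.reset_index()
print(restored)

# level= moves only the named level back; 'a' stays as the index.
only_b = indexed2.reset_index(level='b')
print(only_b)

# drop=True discards the index levels instead of turning them into columns.
dropped = indexed2.reset_index(drop=True)
print(dropped)
```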