pandas set_index, reset_index, reindex

1. set_index

  • DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
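  • keys: the column or columns to use as the index (a single label gives a single-level index, a list of labels gives a multi-level index)
  • Used to build a single-level or multi-level index from existing columns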
In [307]: data
Out[307]: 
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0
 
In [308]: indexed1 = data.set_index('c')
 
In [309]: indexed1
Out[309]: 
     a    b    d
c               
z  bar  one  1.0
y  bar  two  2.0
x  foo  one  3.0
w  foo  two  4.0
 
In [310]: indexed2 = data.set_index(['a', 'b'])
 
In [311]: indexed2
Out[311]: 
         c    d
a   b          
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0
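
The remaining parameters in the signature (drop, append, verify_integrity) are not shown in the session above. The following is a minimal sketch of what they do, assuming a `data` frame rebuilt by hand to match the one printed in the transcript; the variable names `kept`, `appended`, and `duplicate_error` are illustrative only.

```python
import pandas as pd

# Rebuild the frame shown in the transcript (assumed to match it).
data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1.0, 2.0, 3.0, 4.0],
})

# drop=False keeps 'c' as a regular column in addition to using it as the index.
kept = data.set_index("c", drop=False)

# append=True adds 'c' as an extra level below the existing integer index.
appended = data.set_index("c", append=True)

# verify_integrity=True raises ValueError when the new index contains duplicates;
# column 'a' holds 'bar' and 'foo' twice each, so this call is expected to fail.
try:
    data.set_index("a", verify_integrity=True)
    duplicate_error = False
except ValueError:
    duplicate_error = True
```

Since inplace defaults to False, each call returns a new DataFrame and leaves `data` unchanged.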

 

2. reset_index

  • DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
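  • Turns the index back into ordinary columns and replaces it with the default integer index
  • Works for a single-level index as well as a multi-level index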
In [318]: data
Out[318]: 
         c    d
a   b          
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0
 
In [319]: data.reset_index()
Out[319]: 
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0
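
The level and drop parameters change which part of the index is restored. The following is a minimal sketch, assuming the multi-indexed `data` frame from the transcript rebuilt by hand; `partial` and `dropped` are illustrative names.

```python
import pandas as pd

# Rebuild the multi-indexed frame shown in the transcript (assumed to match it).
data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1.0, 2.0, 3.0, 4.0],
}).set_index(["a", "b"])

# level='b' moves only that level back into the columns; 'a' stays as the index.
partial = data.reset_index(level="b")

# drop=True discards the index values instead of turning them into columns.
dropped = data.reset_index(drop=True)
```

Here `partial` has columns b, c, d with `a` still as the index, while `dropped` keeps only c and d on a fresh integer index.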

 

3. reindex

  • DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
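  • columns: the column labels the result should have, in the desired order
  • Conforms the DataFrame to the given labels: existing columns are selected and reordered together with their data, and labels missing from the original are filled with NaN. It does not rename columns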
>>> p2
   col1  col2  col3
0     1     6     2
1     2     1     8
2     3     0     1
>>> p2.reindex(columns=['col2', 'col3', 'col1'])
   col2  col3  col1
0     6     2     1
1     1     8     2
2     0     1     3

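# Note the difference from assigning to df.columns: reindex rearranges the data along with the labels,
# while assigning to df.columns only changes the column names and does not move the data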
>>> p2
   col1  col2  col3
0     1     6     2
1     2     1     8
2     3     0     1
>>> p2.columns = ['col2', 'col3', 'col1']
>>> p2
   col2  col3  col1
0     1     6     2
1     2     1     8
2     3     0     1
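
Two cases not covered above are labels that do not exist in the original frame and reindexing rows rather than columns. The following is a minimal sketch, assuming `p2` holds the values printed in the transcript; `col4`, the row label 5, and the new name `first` are made up for illustration.

```python
import pandas as pd

# Rebuild p2 as printed in the transcript (assumed to match it).
p2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [6, 1, 0], "col3": [2, 8, 1]})

# A label that does not exist ('col4') produces a new column filled with NaN.
with_missing = p2.reindex(columns=["col1", "col4"])

# fill_value replaces the NaN used for labels that were not present.
filled = p2.reindex(columns=["col1", "col4"], fill_value=0)

# index= conforms the rows instead; row label 5 does not exist, so it is all NaN.
rows = p2.reindex(index=[2, 0, 5])

# To rename a column without touching the data, rename() is the usual tool.
renamed = p2.rename(columns={"col1": "first"})
```

None of these calls modifies `p2`; each returns a new DataFrame.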

 
