Python and Data Science in Practice, Chapter 3 (Pandas): Reindexing Series and DataFrame

import numpy as np
import pandas as pd
from pandas import Series,DataFrame

Series reindex

s1 = Series([1,2,3,4],index=["A","B","C","D"])
s1

A 1
B 2
C 3
D 4
dtype: int64

s1.reindex(index=["A","B","C","D","E"])

A 1.0
B 2.0
C 3.0
D 4.0
E NaN
dtype: float64

s1.reindex(index=["A","B","C","D","E"],fill_value=10)

A 1
B 2
C 3
D 4
E 10
dtype: int64
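
The dtype switches from int64 to float64 in the first call because NaN is a floating-point value, so the integers have to be upcast to hold it. With fill_value no NaN is introduced and the integer dtype survives. The sketch below illustrates this under the assumption of a reasonably recent pandas version; the nullable "Int64" dtype at the end is an alternative that the original material does not use.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["A", "B", "C", "D"])
new_index = ["A", "B", "C", "D", "E"]

# The missing label "E" becomes NaN, which forces an upcast to float
widened = s1.reindex(new_index)
print(widened.dtype)

# fill_value supplies an integer instead of NaN, so the dtype stays integer
filled = s1.reindex(new_index, fill_value=10)
print(filled.dtype)

# Assumption: the nullable "Int64" extension dtype marks missing entries with <NA>
nullable = s1.astype("Int64").reindex(new_index)
print(nullable.dtype)

# reindex returns a new object; s1 still has its four original labels
print(list(s1.index))
```

In every case s1 itself is left untouched, so the result has to be assigned to a name if it is needed later.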

s2 = Series(["a","b","c"],index=[1,5,10])
s2

1 a
5 b
10 c
dtype: object

s2.reindex(index=range(15))

0 NaN
1 a
2 NaN
3 NaN
4 NaN
5 b
6 NaN
7 NaN
8 NaN
9 NaN
10 c
11 NaN
12 NaN
13 NaN
14 NaN
dtype: object

s2.reindex(index=range(15),method="ffill")   # ffill (forward fill): each missing label takes the value of the closest preceding label that has one

0 NaN
1 a
2 a
3 a
4 a
5 b
6 b
7 b
8 b
9 b
10 c
11 c
12 c
13 c
14 c
dtype: object
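
Forward fill is not the only option. As a rough sketch, the same call also accepts method="bfill" (backward fill) and a limit argument that caps how many consecutive missing labels get filled. These methods need a monotonic index, which s2 has. The variable names backward and limited are only illustrative.

```python
import pandas as pd

s2 = pd.Series(["a", "b", "c"], index=[1, 5, 10])

# bfill (backward fill): each missing label takes the value of the next label that has one
backward = s2.reindex(range(15), method="bfill")

# limit=2: at most two consecutive missing labels are filled after each existing label
limited = s2.reindex(range(15), method="ffill", limit=2)

print(backward)
print(limited)
```

With bfill the labels after 10 stay NaN because there is no later value to copy from, and with limit=2 the labels 4, 8, 9, 13 and 14 stay NaN.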

DataFrame reindex

df1 = DataFrame(np.random.rand(25).reshape(5,5),index=["A","B","C","D","E"],columns=["c1","c2","c3","c4","c5"])
df1

         c1        c2        c3        c4        c5
A  0.757715  0.918084  0.948077  0.742951  0.487620
B  0.588601  0.287319  0.812112  0.301332  0.672896
C  0.595671  0.151094  0.879628  0.494836  0.531921
D  0.743913  0.412434  0.675041  0.056901  0.242197
E  0.226964  0.213529  0.850152  0.606388  0.690059

df1.reindex(index=["A","B","C","D","E","F"])

         c1        c2        c3        c4        c5
A  0.757715  0.918084  0.948077  0.742951  0.487620
B  0.588601  0.287319  0.812112  0.301332  0.672896
C  0.595671  0.151094  0.879628  0.494836  0.531921
D  0.743913  0.412434  0.675041  0.056901  0.242197
E  0.226964  0.213529  0.850152  0.606388  0.690059
F       NaN       NaN       NaN       NaN       NaN

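df1.reindex(columns=["c1","c2","C3","C4","C5","C6"])
# Note that reindex does not rename. Labels are case-sensitive, so C3, C4, C5 and C6 match no existing column and come back filled with NaN, while the original c3, c4 and c5 are left out of the result (df1 itself is unchanged)

         c1        c2        C3        C4        C5        C6
A  0.757715  0.918084       NaN       NaN       NaN       NaN
B  0.588601  0.287319       NaN       NaN       NaN       NaN
C  0.595671  0.151094       NaN       NaN       NaN       NaN
D  0.743913  0.412434       NaN       NaN       NaN       NaN
E  0.226964  0.213529       NaN       NaN       NaN       NaN
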
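When the goal really is to change a column label and keep its data, rename is the usual tool. The following is a minimal sketch on a small made-up frame (df, reindexed and renamed are illustrative names, not part of the original notebook) that contrasts the two calls.

```python
import numpy as np
import pandas as pd

# Small illustrative frame, not the random df1 used in this chapter
df = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["A", "B"],
    columns=["c1", "c2", "c3"],
)

# reindex looks up labels: "C3" does not exist, so the new column is all NaN
reindexed = df.reindex(columns=["c1", "c2", "C3"])

# rename maps old labels to new ones and keeps the data
renamed = df.rename(columns={"c3": "C3"})

print(reindexed)
print(renamed)
```

Both calls return new objects, so df still has its original c3 column afterwards.
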
df1.reindex(columns=["c1","c2","c3","c4","c5","c6"])

         c1        c2        c3        c4        c5        c6
A  0.757715  0.918084  0.948077  0.742951  0.487620       NaN
B  0.588601  0.287319  0.812112  0.301332  0.672896       NaN
C  0.595671  0.151094  0.879628  0.494836  0.531921       NaN
D  0.743913  0.412434  0.675041  0.056901  0.242197       NaN
E  0.226964  0.213529  0.850152  0.606388  0.690059       NaN

df1.reindex(index=["A","B","C","D","E","F"],columns=["c1","c2","c3","c4","c5","c6"])

         c1        c2        c3        c4        c5        c6
A  0.757715  0.918084  0.948077  0.742951  0.487620       NaN
B  0.588601  0.287319  0.812112  0.301332  0.672896       NaN
C  0.595671  0.151094  0.879628  0.494836  0.531921       NaN
D  0.743913  0.412434  0.675041  0.056901  0.242197       NaN
E  0.226964  0.213529  0.850152  0.606388  0.690059       NaN
F       NaN       NaN       NaN       NaN       NaN       NaN
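
As with a Series, fill_value can replace the NaN placeholders when a DataFrame is reindexed on both axes. A minimal sketch on a small made-up frame (df and expanded are illustrative names):

```python
import numpy as np
import pandas as pd

# Small illustrative frame with float data
df = pd.DataFrame(
    np.arange(4.0).reshape(2, 2),
    index=["A", "B"],
    columns=["c1", "c2"],
)

# New row "C" and new column "c3" are filled with 0 instead of NaN
expanded = df.reindex(
    index=["A", "B", "C"],
    columns=["c1", "c2", "c3"],
    fill_value=0,
)

print(expanded)
```

Only cells created by the reindex receive the fill value; cells that already existed keep their data.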

All of the examples above add index labels to a Series or DataFrame. Next, let's see what happens when labels are removed.

s1

A 1
B 2
C 3
D 4
dtype: int64

s1.reindex(index=["A","B"])  # equivalent to selecting just those labels, much like slicing

A 1
B 2
dtype: int64

df1

         c1        c2        c3        c4        c5
A  0.757715  0.918084  0.948077  0.742951  0.487620
B  0.588601  0.287319  0.812112  0.301332  0.672896
C  0.595671  0.151094  0.879628  0.494836  0.531921
D  0.743913  0.412434  0.675041  0.056901  0.242197
E  0.226964  0.213529  0.850152  0.606388  0.690059

df1.reindex(index=["A","B"])

         c1        c2        c3        c4        c5
A  0.757715  0.918084  0.948077  0.742951  0.487620
B  0.588601  0.287319  0.812112  0.301332  0.672896
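
Reindexing with a subset of labels gives the same rows as label-based selection with .loc, but the two differ when a requested label does not exist. The sketch below assumes pandas 1.0 or later, where .loc with a list containing a missing label raises KeyError; the label "Z" is made up for the illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["A", "B", "C", "D"])

# With labels that all exist, reindex and .loc select the same rows
by_reindex = s1.reindex(["A", "B"])
by_loc = s1.loc[["A", "B"]]
print(by_reindex.equals(by_loc))

# With a missing label, reindex inserts NaN ...
with_missing = s1.reindex(["A", "Z"])
print(with_missing)

# ... while .loc refuses the lookup
try:
    s1.loc[["A", "Z"]]
    loc_raised = False
except KeyError:
    loc_raised = True
print(loc_raised)
```

So reindex is the more forgiving choice when some labels may be absent, while .loc is stricter and surfaces typos in label names.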

A simpler way to remove specific index values

s1

A 1
B 2
C 3
D 4
dtype: int64

s1.drop("A")

B 2
C 3
D 4
dtype: int64

df1

         c1        c2        c3        c4        c5
A  0.757715  0.918084  0.948077  0.742951  0.487620
B  0.588601  0.287319  0.812112  0.301332  0.672896
C  0.595671  0.151094  0.879628  0.494836  0.531921
D  0.743913  0.412434  0.675041  0.056901  0.242197
E  0.226964  0.213529  0.850152  0.606388  0.690059

df1.drop("A",axis=0)    # axis=0 drops rows and axis=1 drops columns; specifying the axis avoids ambiguity when a column name and an index label are identical

         c1        c2        c3        c4        c5
B  0.588601  0.287319  0.812112  0.301332  0.672896
C  0.595671  0.151094  0.879628  0.494836  0.531921
D  0.743913  0.412434  0.675041  0.056901  0.242197
E  0.226964  0.213529  0.850152  0.606388  0.690059

df1.drop("c1",axis=1)

         c2        c3        c4        c5
A  0.918084  0.948077  0.742951  0.487620
B  0.287319  0.812112  0.301332  0.672896
C  0.151094  0.879628  0.494836  0.531921
D  0.412434  0.675041  0.056901  0.242197
E  0.213529  0.850152  0.606388  0.690059
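
Instead of the axis argument, drop also accepts index= and columns= keywords, which can make the intent easier to read and allow rows and columns to be removed in one call. A minimal sketch on a small made-up frame (df, trimmed and unchanged are illustrative names; the label "Z" is deliberately absent):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["A", "B", "C"],
    columns=["c1", "c2", "c3"],
)

# Drop rows "A" and "B" and column "c1" in a single call
trimmed = df.drop(index=["A", "B"], columns="c1")
print(trimmed)

# A missing label normally raises KeyError; errors="ignore" skips it instead
unchanged = df.drop(index="Z", errors="ignore")
print(unchanged.equals(df))

# drop returns a new object, so df keeps all of its rows and columns
print(df.shape)
```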
