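First, create a DataFrame with a multi-level index. Stacking data and pairing the result with a copy shifted by 5 gives df, whose contents are shown under out:
import numpy as np
import pandas as pd
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()  # move the 'number' column level into the row index
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))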
out:
side             left  right
state    number
Ohio     one        0      5
         two        1      6
         three      2      7
Colorado one        3      8
         two        4      9
         three      5     10
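For a DataFrame with a multi-level index like this one, let us call the nested index levels a "stack" for now.
.unstack() and .stack() can then be understood as follows: the former takes a level off the row index and moves it into the columns, and the latter takes a level off the column index and moves it into the rows. First, let's see what unstacking actually looks like.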
mul = pd.MultiIndex.from_product([['A', 'B'], ['one', 'two', 'three']])
# Named ser rather than df so that the DataFrame df built earlier is not overwritten
ser = pd.Series(np.arange(6),
                index=mul)
print(ser)
print('----')
print(ser.unstack())
out:
A  one      0
   two      1
   three    2
B  one      3
   two      4
   three    5
dtype: int32
----
   one  three  two
A    0      2    1
B    3      5    4
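The level that unstack() moves can also be chosen explicitly. The following is a minimal sketch on the same kind of Series; the names idx, ser, inner and outer are illustrative and not from the original notes.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([['A', 'B'], ['one', 'two', 'three']])
ser = pd.Series(np.arange(6), index=idx)

# No argument: the innermost level ('one', 'two', 'three') moves to the columns
inner = ser.unstack()
# level=0: the outermost level ('A', 'B') moves to the columns instead
outer = ser.unstack(level=0)

print(inner.shape)
print(outer.shape)
```

Both results hold the same six values; only the level that ends up on the column axis differs.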
Next, let's use the DataFrame built from the book's example to see how stack() and unstack() operate.
First, look at df:
print(df)
out:
side             left  right
state    number
Ohio     one        0      5
         two        1      6
         three      2      7
Colorado one        3      8
         two        4      9
         three      5     10
df2 = df.unstack('state')  # move the 'state' row level of df into the columns
print(df2)
print('——————')
df3 = df2.stack('state')  # move the 'state' column level of df2 back into the rows
print(df3)
out:
side   left          right
state  Ohio Colorado  Ohio Colorado
number
one       0        3     5        8
two       1        4     6        9
three     2        5     7       10
——————
side             left  right
number state
one    Ohio         0      5
       Colorado     3      8
two    Ohio         1      6
       Colorado     4      9
three  Ohio         2      7
       Colorado     5     10
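If no argument is passed, the innermost level is the one that gets moved. The moved row (or column) level becomes the lowest level of the column (or row) index.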
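This behaviour can be checked by inspecting the level names after each call. The sketch below rebuilds the book-style DataFrame so that it runs on its own; the names by_default and by_name are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

# No argument: the innermost row level, 'number', is the one that moves
by_default = df.unstack()
# A level can also be selected by name
by_name = df.unstack('state')

# In both cases the moved level lands below the existing 'side' column level
print(list(by_default.columns.names))
print(list(by_name.columns.names))
# Stacking 'state' back makes it the lowest level of the row index
print(list(by_name.stack('state').index.names))
```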
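These operations interact with missing data: unstack() may introduce missing values when some combination of labels is absent from the data, while stack() filters missing values out by default.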
In [127]: s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
In [128]: s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
In [129]: data2 = pd.concat([s1, s2], keys=['one', 'two'])
In [130]: data2
Out[130]:
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64
In [131]: data2.unstack()
Out[131]:
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0
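If NaN is not wanted in the widened result, unstack() accepts a fill_value argument. The following is a minimal sketch on the same data2; the placeholder value -1 and the names wide and filled are illustrative choices, not part of the original notes.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

# Label combinations that do not exist in data2 become NaN
wide = data2.unstack()
# fill_value substitutes a placeholder for those combinations instead
filled = data2.unstack(fill_value=-1)

print(int(wide.isna().sum().sum()))
print(filled)
```

Whether a placeholder is appropriate depends on the data: a value like -1 is only safe when it cannot be confused with a real observation.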
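By default, stack() filters these missing values back out, which makes the operation easy to invert. Passing dropna=False keeps them instead: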
In [132]: data2.unstack()
Out[132]:
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0
In [133]: data2.unstack().stack()
Out[133]:
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  c    4.0
     d    5.0
     e    6.0
dtype: float64
In [134]: data2.unstack().stack(dropna=False)
Out[134]:
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    NaN
two  a    NaN
     b    NaN
     c    4.0
     d    5.0
     e    6.0
dtype: float64
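The round trip can also be checked in code. The sketch below calls .dropna() explicitly after stack() rather than relying on the default. This is an illustrative choice rather than the approach in the notes above, made on the assumption that the default handling of missing values in stack() may differ between pandas versions. The names wide and restored are likewise illustrative.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

wide = data2.unstack()
# Explicit dropna() removes any NaN rows that stack() may have kept;
# astype undoes the float conversion that the NaNs forced in `wide`
restored = wide.stack().dropna().astype('int64').sort_index()

print(restored.tolist())
print(list(restored.index) == list(data2.sort_index().index))
```

After sorting, restored holds the same labels and values as data2, so no information is lost by widening and narrowing the data.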