pandas unstack() and stack() functions

.unstack() and .stack()

First, create a DataFrame with a multi-level (hierarchical) index:

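import numpy as np
import pandas as pd
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()   # Series indexed by (state, number)
df = pd.DataFrame({'left': result, 'right': result + 5},   # this df is the output shown below
                  columns=pd.Index(['left', 'right'], name='side'))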

out:
side             left  right
state    number             
Ohio     one        0      5
         two        1      6
         three      2      7
Colorado one        3      8
         two        4      9
         three      5     10

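For a DataFrame with a multi-level index like this one, let's loosely call the levels of the index a "stack".
.unstack() can then be thought of as taking a level off the row index's stack (moving it into the columns), and .stack() as taking a level off the column index's stack (moving it into the rows). First, let's see what unstacking actually looks like: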

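import numpy as np
import pandas as pd

Mul = pd.MultiIndex.from_product([['A', 'B'], ['one', 'two', 'three']])
s = pd.Series(np.arange(6), index=Mul)   # a Series with a two-level row index
print(s)
print('————')
print(s.unstack())   # the inner level (one/two/three) becomes the columns, in sorted order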

out:
A  one      0
   two      1
   three    2
B  one      3
   two      4
   three    5
dtype: int32
————
   one  three  two
A    0      2    1
B    3      5    4
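unstack() also accepts the level to move. A minimal sketch, with the Series rebuilt here under the name `s` (the names `inner` and `outer` are just for illustration): passing `level=0` moves the outer labels 'A'/'B' into the columns instead, which for a two-level Series is simply the transpose of the default result.

```python
import numpy as np
import pandas as pd

# Same Series as above: outer level 'A'/'B', inner level 'one'/'two'/'three'
idx = pd.MultiIndex.from_product([['A', 'B'], ['one', 'two', 'three']])
s = pd.Series(np.arange(6), index=idx)

inner = s.unstack()         # default level=-1: inner labels become the columns
outer = s.unstack(level=0)  # level=0: outer labels 'A'/'B' become the columns

print(outer)
# For a two-level Series, unstacking the outer level is the transpose of unstacking the inner one
print(outer.equals(inner.T))
```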

Next, using the DataFrame from the book (the df created above), let's see how stack() and unstack() operate.
First, take a look at df:

print(df)
out:
side             left  right
state    number             
Ohio     one        0      5
         two        1      6
         three      2      7
Colorado one        3      8
         two        4      9
         three      5     10

df2 = df.unstack('state')   # move df's 'state' row level into the columns
print(df2)
print('——————')
df3 = df2.stack('state')    # move df2's 'state' column level back into the rows
print(df3)

out:
side   left          right         
state  Ohio Colorado  Ohio Colorado
number                             
one       0        3     5        8
two       1        4     6        9
three     2        5     7       10
——————
side             left  right
number state                
one    Ohio         0      5
       Colorado     3      8
two    Ohio         1      6
       Colorado     4      9
three  Ohio         2      7
       Colorado     5     10
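df3 contains the same data as df; only the order of the row levels differs, because the restacked 'state' level comes back as the innermost one. A short sketch to check this, assuming df is rebuilt the same way as at the top of this post: swap the row levels of df3 back, sort both frames, and compare.

```python
import numpy as np
import pandas as pd

# Rebuild df exactly as at the top of the post
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

df2 = df.unstack('state')   # 'state' moves from the rows into the columns
df3 = df2.stack('state')    # ...and back into the rows, now as the innermost level

# Put the row levels back in df's order (state, number), then compare
restored = df3.swaplevel('number', 'state').sort_index()
print(restored.equals(df.sort_index()))
```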

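If no level is passed, the innermost level is used by default. The level that gets moved becomes the lowest (innermost) level on the other axis: an unstacked row level becomes the innermost column level, and a stacked column level becomes the innermost row level.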
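The default behavior can be checked by looking at the level names. A minimal sketch, again rebuilding df as at the top of the post: with no argument, unstack() picks the innermost row level 'number', and stack() picks the innermost (here the only) column level 'side'.

```python
import numpy as np
import pandas as pd

# Rebuild df as at the top of the post
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

# unstack() with no argument: 'number' leaves the rows and becomes the innermost column level
print(list(df.unstack().columns.names))  # ['side', 'number']

# stack() with no argument: 'side' leaves the columns and becomes the innermost row level
print(list(df.stack().index.names))      # ['state', 'number', 'side']
```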

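unstack() can introduce missing data when some combinations of labels do not exist in every group, while stack() filters out missing data by default: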

In [127]: s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])

In [128]: s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])

In [129]: data2 = pd.concat([s1, s2], keys=['one', 'two'])

In [130]: data2
Out[130]: 
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64

In [131]: data2.unstack()
Out[131]: 
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0
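In the output above, the pairs ('one', 'e'), ('two', 'a') and ('two', 'b') do not exist in data2, so unstack() fills them with NaN and the integers are upcast to float64 to hold it. A small sketch of this, plus the `fill_value` argument of unstack(), which supplies a value for the missing combinations instead (0 here is an arbitrary choice; with it, the columns stay integer):

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

wide = data2.unstack()
# Three (key, label) pairs are missing, so three NaN cells appear and the dtype becomes float64
print(wide.isna().sum().sum())  # 3

# fill_value puts 0 into the missing cells instead of NaN
filled = data2.unstack(fill_value=0)
print(filled)
```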

stack() drops missing values by default; pass dropna=False to keep them:

In [132]: data2.unstack()
Out[132]: 
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0

In [133]: data2.unstack().stack()
Out[133]: 
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  c    4.0
     d    5.0
     e    6.0
dtype: float64

In [134]: data2.unstack().stack(dropna=False)
Out[134]: 
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    NaN
two  a    NaN
     b    NaN
     c    4.0
     d    5.0
     e    6.0
dtype: float64
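The default NaN handling of stack() depends on the pandas version: as far as I know, pandas 2.1 introduced a reimplementation of stack() (enabled with `future_stack=True`) that keeps NaN and does not accept `dropna`, and it is slated to become the default. So here is a version-independent sketch of the unstack/stack round trip that drops the missing values explicitly with `.dropna()` instead of relying on the default. On recent versions the bare `stack()` call may print a FutureWarning; the result is the same.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

wide = data2.unstack()  # introduces NaN for the missing (key, label) pairs
# Assumption: stack()'s default NaN handling differs across pandas versions,
# so drop NaN explicitly and cast back to the original integer dtype
long = wide.stack().dropna().astype(data2.dtype)

# The round trip recovers the original labels and values
print(long.equals(data2))
```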
