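# Signature
DataFrame.stack(level=-1, dropna=True)
# Parameters
level: the column level(s) to move into the row index; default -1, i.e. the last column level
dropna: bool, default True; whether to drop rows of the result in which every value is missing
Note: since pandas 2.1, stack also accepts a future_stack argument that switches to a new implementation; the outputs below reflect the legacy implementation.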
# Build an example with multi-level columns to illustrate stack()
import pandas as pd
import numpy as np
# The MultiIndex is built in two ways here: from tuples (columns) and from lists of arrays (rows)
col = pd.MultiIndex.from_tuples([('weight','kg'),('weight','ton'),('height','cm'),('height','m')])
dex = pd.MultiIndex.from_arrays([['China','China','USA','USA'],['Tom','Jack','Arfra','Switch']])
# Create the DataFrame used throughout this section
data = pd.DataFrame([[None, 0.08, 180, 1.8], [70, 0.07, 160, 1.6],
                     [50, 0.05, 170, 1.7], [55, 0.055, 175, 1.75]],
                    index=dex,
                    columns=col)
# First, take a look at the DataFrame
print('\ndata:')
print(data)
==========output==========
data:
               weight          height
                   kg     ton      cm       m
China Tom         NaN   0.080     180    1.80
      Jack         70   0.070     160    1.60
USA   Arfra        50   0.050     170    1.70
      Switch       55   0.055     175    1.75
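The comment above mentions two ways of building a MultiIndex. As a quick check, the sketch below builds the same row index with `pd.MultiIndex.from_arrays` (one list per level) and with `pd.MultiIndex.from_tuples` (one tuple per row) and compares them. The variable names `countries`, `names`, `by_arrays` and `by_tuples` are illustrative only.

```python
import pandas as pd

# Row labels from the example, written once as parallel lists (illustrative names)
countries = ['China', 'China', 'USA', 'USA']
names = ['Tom', 'Jack', 'Arfra', 'Switch']

# "List method": one list per index level
by_arrays = pd.MultiIndex.from_arrays([countries, names])

# "Tuple method": one (country, name) tuple per row
by_tuples = pd.MultiIndex.from_tuples(list(zip(countries, names)))

# Both constructors describe the same two-level index
print(by_arrays.equals(by_tuples))  # True
print(by_arrays.nlevels)            # 2
```

from_tuples is convenient when labels already come as pairs; from_arrays fits when each level is stored as its own list.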
result = data.stack()
print('\nresult:')
print(result)
==========output==========
result:
                  height  weight
China Tom    cm   180.00     NaN
             m      1.80     NaN
             ton     NaN   0.080
      Jack   cm   160.00     NaN
             kg      NaN  70.000
             m      1.60     NaN
             ton     NaN   0.070
USA   Arfra  cm   170.00     NaN
             kg      NaN  50.000
             m      1.70     NaN
             ton     NaN   0.050
      Switch cm   175.00     NaN
             kg      NaN  55.000
             m      1.75     NaN
             ton     NaN   0.055
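Because level=-1, the second (last) column level (kg, ton, cm, m) is moved into the rows. Because dropna=True, any resulting row whose values are all missing is dropped. Tom's kg row would be NaN under both height (height has no kg column) and weight (Tom's weight in kg is missing), so it does not appear in the result.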
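The effect of dropna=True can also be written out explicitly. The sketch below is a minimal illustration, not the author's code: it assumes a hypothetical helper `make_data()` that rebuilds the example DataFrame, stacks the last column level, and then calls `dropna(how='all')`, which removes exactly the rows whose values are all missing. Spelling the drop out keeps the result the same whether your pandas version uses the legacy stack or the newer future_stack implementation, which handles missing values differently.

```python
import pandas as pd


def make_data():
    """Hypothetical helper: rebuilds the example DataFrame from this section."""
    col = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'ton'),
                                     ('height', 'cm'), ('height', 'm')])
    dex = pd.MultiIndex.from_arrays([['China', 'China', 'USA', 'USA'],
                                     ['Tom', 'Jack', 'Arfra', 'Switch']])
    return pd.DataFrame([[None, 0.08, 180, 1.8], [70, 0.07, 160, 1.6],
                         [50, 0.05, 170, 1.7], [55, 0.055, 175, 1.75]],
                        index=dex, columns=col)


data = make_data()

# Move the last column level (kg/ton/cm/m) into the rows, then drop the rows
# in which every value is missing, which is what dropna=True does
stacked = data.stack(level=-1).dropna(how='all')

print(len(stacked))                             # 15: 4 people x 4 units, minus Tom/kg
print(('China', 'Tom', 'kg') in stacked.index)  # False
```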
result = data.stack(level=0,dropna=False)
print('\nresult:')
print(result)
==========output==========
result:
                         cm    kg     m    ton
China Tom    height   180.0   NaN  1.80    NaN
             weight     NaN   NaN   NaN  0.080
      Jack   height   160.0   NaN  1.60    NaN
             weight     NaN  70.0   NaN  0.070
USA   Arfra  height   170.0   NaN  1.70    NaN
             weight     NaN  50.0   NaN  0.050
      Switch height   175.0   NaN  1.75    NaN
             weight     NaN  55.0   NaN  0.055
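Because level=0, the first column level (weight, height) is moved into the rows, and because dropna=False, no rows are dropped. In fact, dropna=True would not drop anything here either: dropna only removes rows in which every value is missing, and every resulting row still has at least one value. For example, Tom's weight row is missing kg (and has no cm or m, which belong to height), but it still has ton = 0.080.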
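To check the claim that dropna=True would not change this result, the sketch below (again assuming the hypothetical `make_data()` helper) tests that every stacked row still contains at least one non-missing value, so dropping all-missing rows leaves the frame unchanged.

```python
import pandas as pd


def make_data():
    """Hypothetical helper: rebuilds the example DataFrame from this section."""
    col = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'ton'),
                                     ('height', 'cm'), ('height', 'm')])
    dex = pd.MultiIndex.from_arrays([['China', 'China', 'USA', 'USA'],
                                     ['Tom', 'Jack', 'Arfra', 'Switch']])
    return pd.DataFrame([[None, 0.08, 180, 1.8], [70, 0.07, 160, 1.6],
                         [50, 0.05, 170, 1.7], [55, 0.055, 175, 1.75]],
                        index=dex, columns=col)


data = make_data()

# Move the first column level (weight/height) into the rows
stacked0 = data.stack(level=0)

# Every (country, name, measurement) row keeps at least one real value,
# e.g. Tom's weight row still has ton = 0.08, so there is nothing to drop
print(len(stacked0))                       # 8 = 4 people x 2 measurements
print(stacked0.notna().any(axis=1).all())  # True
print(len(stacked0.dropna(how='all')))     # 8
```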
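# Signature
DataFrame.unstack(level=-1, fill_value=None)
# Parameters
level: the row-index level(s) to move into the columns; default -1, i.e. the last row-index level
fill_value: value used to replace the NaN entries that the unstack operation itself creates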
result = data.unstack()
print('\nresult:')
print(result)
==========output==========
result:
       weight                                                  height
       kg                          ton                         cm                          m
       Arfra   Jack Switch    Tom  Arfra   Jack Switch    Tom  Arfra   Jack Switch    Tom  Arfra   Jack Switch    Tom
China    NaN   70.0    NaN    NaN    NaN   0.07    NaN   0.08    NaN  160.0    NaN  180.0    NaN    1.6    NaN    1.8
USA     50.0    NaN   55.0    NaN   0.05    NaN  0.055    NaN  170.0    NaN  175.0    NaN    1.7    NaN   1.75    NaN
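Here the last row-index level (the names) is moved into the columns. Each name belongs to only one country, so combinations that do not exist in data (for example China/Arfra or USA/Tom) are filled with NaN. Tom's weight in kg under China is NaN as well, but that value was already missing in the original data.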
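A short sketch, again with the hypothetical `make_data()` helper, that inspects a few cells of the unstacked frame and tells a hole created by unstack (USA/Jack) apart from a real value (China/Jack):

```python
import pandas as pd


def make_data():
    """Hypothetical helper: rebuilds the example DataFrame from this section."""
    col = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'ton'),
                                     ('height', 'cm'), ('height', 'm')])
    dex = pd.MultiIndex.from_arrays([['China', 'China', 'USA', 'USA'],
                                     ['Tom', 'Jack', 'Arfra', 'Switch']])
    return pd.DataFrame([[None, 0.08, 180, 1.8], [70, 0.07, 160, 1.6],
                         [50, 0.05, 170, 1.7], [55, 0.055, 175, 1.75]],
                        index=dex, columns=col)


data = make_data()

# Move the last row level (the names) into the columns
wide = data.unstack()

print(wide.shape)                                          # (2, 16)
# Jack only appears under China, so the USA/Jack cell is a hole created by unstack
print(pd.isna(wide.loc['USA', ('weight', 'kg', 'Jack')]))  # True
print(wide.loc['China', ('height', 'cm', 'Jack')])         # 160.0
```

The 16 columns are the 4 original (level 0, level 1) pairs times the 4 names; the integer cm column becomes float because unstack has to insert NaN into it.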
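result = data.unstack(level=0, fill_value='missing')
print('\nresult:')
print(result)
==========output==========
result:
         weight                          height
         kg              ton             cm              m
         China     USA   China     USA   China     USA   China     USA
Arfra  missing      50 missing    0.05 missing     170 missing     1.7
Jack        70 missing    0.07 missing     160 missing     1.6 missing
Switch missing      55 missing   0.055 missing     175 missing    1.75
Tom        NaN missing    0.08 missing     180 missing     1.8 missing
As the output shows, the NaN values created by unstack are replaced with 'missing'. Tom's kg value under China, however, is still NaN: it was already missing in the original data, and fill_value only fills the gaps produced by the reshape.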
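The difference between NaN created by unstack and NaN already present in the data can be checked directly. The sketch below (hypothetical `make_data()` helper as before) looks at Tom's two kg cells and also tries a numeric fill_value, which, unlike a string, keeps every column numeric.

```python
import pandas as pd


def make_data():
    """Hypothetical helper: rebuilds the example DataFrame from this section."""
    col = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'ton'),
                                     ('height', 'cm'), ('height', 'm')])
    dex = pd.MultiIndex.from_arrays([['China', 'China', 'USA', 'USA'],
                                     ['Tom', 'Jack', 'Arfra', 'Switch']])
    return pd.DataFrame([[None, 0.08, 180, 1.8], [70, 0.07, 160, 1.6],
                         [50, 0.05, 170, 1.7], [55, 0.055, 175, 1.75]],
                        index=dex, columns=col)


data = make_data()

# String fill value: only the holes created by unstack become 'missing'
filled = data.unstack(level=0, fill_value='missing')
print(filled.loc['Tom', ('weight', 'kg', 'USA')])             # missing (hole from unstack)
print(pd.isna(filled.loc['Tom', ('weight', 'kg', 'China')]))  # True (NaN already in data)

# Numeric fill value: holes become 0, so every column stays numeric
filled0 = data.unstack(level=0, fill_value=0)
print(filled0.loc['Arfra', ('height', 'cm', 'China')] == 0)           # True
print(all(pd.api.types.is_numeric_dtype(t) for t in filled0.dtypes))  # True
```

A string fill value turns the affected columns into object dtype, so a numeric placeholder is usually easier to compute with afterwards.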