① Column-to-row methods
Signature: pandas.DataFrame.stack(self, level=-1, dropna=True)
Docstring:
Pivot a level of the (possibly hierarchical) column labels, returning a
DataFrame (or Series in the case of an object with a single level of
column labels) having a hierarchical index with a new inner-most level
of row labels.
The level involved will automatically get sorted.
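a. For a plain DataFrame (single-level columns), stack() moves the column labels into the innermost level of the row index and produces a Series.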
In [16]: import pandas as pd
    ...: import numpy as np
    ...: df = pd.DataFrame(np.arange(6).reshape(2,3),index=['AA','BB'],
    ...:                   columns=['three','two','one'])
    ...: df
Out[16]:
    three  two  one
AA      0    1    2
BB      3    4    5
In [17]: df.stack()
Out[17]:
AA  three    0
    two      1
    one      2
BB  three    3
    two      4
    one      5
dtype: int32
In [18]: df.stack(level=0)
Out[18]:
AA  three    0
    two      1
    one      2
BB  three    3
    two      4
    one      5
dtype: int32
In [19]: df.stack(level=-1)
Out[19]:
AA  three    0
    two      1
    one      2
BB  three    3
    two      4
    one      5
dtype: int32
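The three calls above can be checked programmatically. The following is a minimal sketch that rebuilds the same frame and confirms that, with a single column level, level=0 and level=-1 name the same level. The variable names stacked and same are illustrative only.

```python
import numpy as np
import pandas as pd

# Same 2x3 frame as in In [16].
df = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["AA", "BB"],
    columns=["three", "two", "one"],
)

stacked = df.stack()

# The result is a Series whose index has two levels: (row label, column label).
print(type(stacked).__name__)
print(stacked.index.nlevels)

# With only one column level, level=0 and level=-1 refer to the same level.
same = stacked.equals(df.stack(level=0)) and stacked.equals(df.stack(level=-1))
print(same)

# A single value is addressed by a (row, column) pair.
print(stacked.loc[("BB", "two")])
```

Recent pandas releases may print a FutureWarning for stack() here; the result for this frame should be the same either way.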
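b. For a DataFrame with hierarchical columns, stack() moves the specified column level(s) into the row index. By default it moves the innermost column level to the innermost level of the row index.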
In [31]: import pandas as pd
    ...: import numpy as np
    ...: df = pd.DataFrame(np.arange(8).reshape(2,4),index=['AA','BB'],
    ...:                   columns=[['two','two','one','one'],['A','B','C','D']])
    ...: df
Out[31]:
   two    one
     A  B   C  D
AA   0  1   2  3
BB   4  5   6  7
In [32]: df.stack()
Out[32]:
      one  two
AA A  NaN  0.0
   B  NaN  1.0
   C  2.0  NaN
   D  3.0  NaN
BB A  NaN  4.0
   B  NaN  5.0
   C  6.0  NaN
   D  7.0  NaN
In [33]: df.stack(level=0)
Out[33]:
          A    B    C    D
AA one  NaN  NaN  2.0  3.0
   two  0.0  1.0  NaN  NaN
BB one  NaN  NaN  6.0  7.0
   two  4.0  5.0  NaN  NaN
In [34]: df.stack(level=1)
Out[34]:
      one  two
AA A  NaN  0.0
   B  NaN  1.0
   C  2.0  NaN
   D  3.0  NaN
BB A  NaN  4.0
   B  NaN  5.0
   C  6.0  NaN
   D  7.0  NaN
In [35]: df.stack(level=-1)
Out[35]:
      one  two
AA A  NaN  0.0
   B  NaN  1.0
   C  2.0  NaN
   D  3.0  NaN
BB A  NaN  4.0
   B  NaN  5.0
   C  6.0  NaN
   D  7.0  NaN
In [36]: df.stack(level=[0,1])
Out[36]:
AA  one  C    2.0
         D    3.0
    two  A    0.0
         B    1.0
BB  one  C    6.0
         D    7.0
    two  A    4.0
         B    5.0
dtype: float64
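The effect of level on hierarchical columns can be summarized by looking at which labels end up in the row index and which stay as columns. The sketch below rebuilds the frame from In [31]. The names inner, outer and both are illustrative. Because newer pandas releases changed how stack() sorts labels and handles missing values, the sketch avoids relying on the order of the resulting columns.

```python
import numpy as np
import pandas as pd

# Same 2x4 frame with two column levels as in In [31].
df = pd.DataFrame(
    np.arange(8).reshape(2, 4),
    index=["AA", "BB"],
    columns=[["two", "two", "one", "one"], ["A", "B", "C", "D"]],
)

# Default (level=-1): the inner column level (A..D) moves into the row index.
inner = df.stack()

# level=0: the outer column level (two/one) moves into the row index instead.
outer = df.stack(level=0)

# Both levels at once: every column label moves into the row index.
both = df.stack(level=[0, 1])

# Remaining column labels after each call (sorted to ignore ordering).
print(inner.index.nlevels, sorted(inner.columns))
print(outer.index.nlevels, sorted(outer.columns))
print(type(both).__name__, both.index.nlevels)

# Cells with no matching column in the original frame are NaN.
print(inner.loc[("AA", "A"), "two"], inner.loc[("AA", "A"), "one"])
```

The NaN cells appear because, for example, there is no ('one', 'A') column in the original frame, so the ('AA', 'A') row has nothing to put under 'one'.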
Signature: pandas.DataFrame.unstack(self, level=-1, fill_value=None)
Docstring:
Pivot a level of the (necessarily hierarchical) index labels, returning
a DataFrame having a new level of column labels whose inner-most level
consists of the pivoted index labels. If the index is not a MultiIndex,
the output will be a Series (the analogue of stack when the columns are
not a MultiIndex).
The level involved will automatically get sorted.
a. For a plain DataFrame, unstack() moves the column labels to the outermost level of the row index and produces a Series.
In [20]: df
Out[20]:
    three  two  one
AA      0    1    2
BB      3    4    5
In [21]: df.unstack()
Out[21]:
three  AA    0
       BB    3
two    AA    1
       BB    4
one    AA    2
       BB    5
dtype: int32
In [22]: df.unstack(0)
Out[22]:
three  AA    0
       BB    3
two    AA    1
       BB    4
one    AA    2
       BB    5
dtype: int32
In [23]: df.unstack(-1)
Out[23]:
three  AA    0
       BB    3
two    AA    1
       BB    4
one    AA    2
       BB    5
dtype: int32
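One way to read the output above: for a frame whose row index has a single level, unstack() gives the same labels and values as transposing and then stacking. The sketch below checks this on the same 2x3 frame. The names unstacked and via_transpose are illustrative.

```python
import numpy as np
import pandas as pd

# Same 2x3 frame as in In [20].
df = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["AA", "BB"],
    columns=["three", "two", "one"],
)

unstacked = df.unstack()
via_transpose = df.T.stack()

# Same (column, row) labels in the same order, and the same values.
print(list(unstacked.index) == list(via_transpose.index))
print(unstacked.tolist() == via_transpose.tolist())

# Contrast with stack(): unstack() puts the column label first,
# stack() puts the row label first.
print(unstacked.index[0], df.stack().index[0])
```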
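b. For a DataFrame with hierarchical columns, unstack() still acts on the row index, not on the columns. The row index here has only one level, so both column levels are moved in front of the row labels as a unit and the result is a Series, whichever scalar level is passed. Passing a list of two levels raises an error, because the row index does not have two levels.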
In [37]: df
Out[37]:
   two    one
     A  B   C  D
AA   0  1   2  3
BB   4  5   6  7
In [38]: df.unstack()
Out[38]:
two  A  AA    0
        BB    4
     B  AA    1
        BB    5
one  C  AA    2
        BB    6
     D  AA    3
        BB    7
dtype: int32
In [39]: df.unstack(0)
Out[39]:
two  A  AA    0
        BB    4
     B  AA    1
        BB    5
one  C  AA    2
        BB    6
     D  AA    3
        BB    7
dtype: int32
In [40]: df.unstack(1)
Out[40]:
two  A  AA    0
        BB    4
     B  AA    1
        BB    5
one  C  AA    2
        BB    6
     D  AA    3
        BB    7
dtype: int32
In [41]: df.unstack(-1)
Out[41]:
two  A  AA    0
        BB    4
     B  AA    1
        BB    5
one  C  AA    2
        BB    6
     D  AA    3
        BB    7
dtype: int32
In [42]: df.unstack(level=[0,1])
IndexError: Too many levels: Index has only 1 level, not 2
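Trying level=5 also runs without error, so how should level be understood here? This was left as an open question. A plausible reading is that level selects a level of the row index, and that when the row index has only one level an integer level is simply not used.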
In [44]: df
Out[44]:
   two    one
     A  B   C  D
AA   0  1   2  3
BB   4  5   6  7
In [45]: df.unstack(level=5)
Out[45]:
two  A  AA    0
        BB    4
     B  AA    1
        BB    5
one  C  AA    2
        BB    6
     D  AA    3
        BB    7
dtype: int32
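To see level actually make a difference, the frame needs a multi-level row index, since unstack() pivots row-index levels. The sketch below is not from the original notes: the frame df_mi, the level names 'outer' and 'inner', and the labels x, y, v, w are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level ROW index (not from the notes above).
rows = pd.MultiIndex.from_product(
    [["AA", "BB"], ["x", "y"]], names=["outer", "inner"]
)
df_mi = pd.DataFrame(np.arange(8).reshape(4, 2), index=rows, columns=["v", "w"])

# Default (level=-1): the inner row level (x/y) becomes the inner column level.
by_inner = df_mi.unstack()

# level=0: the outer row level (AA/BB) becomes the inner column level instead.
by_outer = df_mi.unstack(level=0)

# The rows that remain differ depending on which level was pivoted.
print(list(by_inner.index))
print(list(by_outer.index))

# Values keep their (row, column) meaning after pivoting.
print(by_inner.loc["AA", ("w", "y")])
print(by_outer.loc["x", ("v", "BB")])
```

With this frame the two calls give different shapes of result, which is the behavior that could not be observed on a frame with a single-level row index.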
Signature: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
Docstring:
"Unpivots" a DataFrame from wide format to long format, optionally leaving
identifier variables set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.
First, try melt on a plain DataFrame to see how it transforms the data.
In [46]: df = pd.DataFrame(np.arange(8).reshape(2,4),index=['AA','BB'],
    ...:                   columns=['A','B','C','D'])
    ...: df
Out[46]:
    A  B  C  D
AA  0  1  2  3
BB  4  5  6  7
In [47]: pd.melt(df,id_vars=['A','C'],value_vars=['B','D'],var_name='B|D',
    ...:         value_name='(B|D)_value')
Out[47]:
   A  C B|D  (B|D)_value
0  0  2   B            1
1  4  6   B            5
2  0  2   D            3
3  4  6   D            7
In [48]: pd.melt(df,id_vars=['A'],value_vars=['B','D'],var_name='B|D',
    ...:         value_name='(B|D)_value')
Out[48]:
   A B|D  (B|D)_value
0  0   B            1
1  4   B            5
2  0   D            3
3  4   D            7
In [49]: pd.melt(df,id_vars=['A'],value_vars=['B'],var_name='B',
    ...:         value_name='B_value')
Out[49]:
   A  B  B_value
0  0  B        1
1  4  B        5
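Conclusion: from the results above, id_vars are the original columns to keep as identifiers in the result, and value_vars are the columns to turn into rows. var_name names the column that holds the unpivoted column names (default 'variable'), and value_name names the column that holds the corresponding values (default 'value').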
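Two details from the outputs above are worth checking directly: the default names when var_name and value_name are omitted, and the fact that the original row labels (AA, BB) do not appear in the melted result. The sketch below uses the same frame. The names long_default and long_keep are illustrative, and calling reset_index() first is just one possible way to keep the row labels.

```python
import numpy as np
import pandas as pd

# Same 2x4 frame as in In [46].
df = pd.DataFrame(
    np.arange(8).reshape(2, 4),
    index=["AA", "BB"],
    columns=list("ABCD"),
)

# Without value_vars, every column that is not in id_vars is unpivoted.
long_default = pd.melt(df, id_vars=["A"])
print(list(long_default.columns))
print(long_default["variable"].tolist())

# The row labels are dropped by melt; turning them into a column first
# (here via reset_index, which names the new column 'index') keeps them.
long_keep = pd.melt(df.reset_index(), id_vars=["index", "A"], value_vars=["B"])
print(long_keep["index"].tolist())
```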
In [50]: df1 = pd.DataFrame(np.arange(8).reshape(2,4),
    ...:                    columns=[list('ABCD'),list('EFGH')])
    ...: df1
Out[50]:
   A  B  C  D
   E  F  G  H
0  0  1  2  3
1  4  5  6  7
In [51]: pd.melt(df1,col_level=0,id_vars=['A'],value_vars=['D'])
Out[51]:
   A variable  value
0  0        D      3
1  4        D      7
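As the call above suggests, col_level picks which column level the names in id_vars and value_vars are matched against. The sketch below repeats the col_level=0 call and adds the col_level=1 counterpart for comparison. The names upper and lower are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame with two column levels as in In [50].
df1 = pd.DataFrame(
    np.arange(8).reshape(2, 4),
    columns=[list("ABCD"), list("EFGH")],
)

# col_level=0: names are looked up in the top level (A, B, C, D).
upper = pd.melt(df1, col_level=0, id_vars=["A"], value_vars=["D"])

# col_level=1: names are looked up in the second level (E, F, G, H).
lower = pd.melt(df1, col_level=1, id_vars=["E"], value_vars=["H"])

print(list(upper.columns), upper["value"].tolist())
print(list(lower.columns), lower["value"].tolist())
```

Both calls select the same underlying data (first and last column); only the labels used to refer to them differ.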
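Going the other way: here df is again the 2x3 frame from In [16], and unstack() on the stacked Series moves a row-index level back to the columns.
In [26]: df2=df.stack()
    ...: df2
Out[26]:
AA  three    0
    two      1
    one      2
BB  three    3
    two      4
    one      5
dtype: int32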
In [27]: df2.unstack()
Out[27]:
    three  two  one
AA      0    1    2
BB      3    4    5
In [28]: df2.unstack(0)
Out[28]:
       AA  BB
three   0   3
two     1   4
one     2   5
In [29]: df2.unstack(1)
Out[29]:
    three  two  one
AA      0    1    2
BB      3    4    5
In [30]: df2.unstack(-1)
Out[30]:
    three  two  one
AA      0    1    2
BB      3    4    5
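The round trip shown above can be verified in a few lines: unstacking the inner level restores the original frame, while unstacking level 0 yields its transpose. This is a minimal sketch on the same 2x3 frame. The names restored and flipped are illustrative, and the comparisons select by label so that they do not depend on column order.

```python
import numpy as np
import pandas as pd

# Same 2x3 frame as in In [16].
df = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["AA", "BB"],
    columns=["three", "two", "one"],
)

s = df.stack()

# Default (level=-1): the inner row level (old column labels) goes back to columns.
restored = s.unstack()

# level=0: the outer row level (AA/BB) becomes the columns, giving the transpose.
flipped = s.unstack(0)

# Compare by label so the check does not depend on label ordering.
print(restored.loc[df.index, df.columns].values.tolist() == df.values.tolist())
print(flipped.loc[df.columns, df.index].values.tolist() == df.T.values.tolist())
```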