When I have made something of my studies, may I wed Rui? @夏瑾墨 by Jooey
3. Concatenating Along an Axis
NumPy has a concatenate function for joining raw NumPy arrays:
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
arr=np.arange(12).reshape((3,4))
print (arr)
print (np.concatenate([arr,arr],axis=1))
Output:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[ 0 1 2 3 0 1 2 3]
[ 4 5 6 7 4 5 6 7]
[ 8 9 10 11 8 9 10 11]]
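Comparing result shapes is a quick way to see what each axis does. The sketch below reuses the same 3×4 array: axis=0 stacks the copies as extra rows, while axis=1 (as above) places them side by side as extra columns.

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

# axis=0 stacks the arrays vertically: more rows, same number of columns
stacked_rows = np.concatenate([arr, arr], axis=0)
# axis=1 places them side by side: same rows, more columns
stacked_cols = np.concatenate([arr, arr], axis=1)

print(stacked_rows.shape)  # (6, 4)
print(stacked_cols.shape)  # (3, 8)
```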
Suppose we have three Series with no overlapping index labels:
s1=Series([0,1],index=['a','b'])
s2=Series([2,3,4],index=['c','d','e'])
s3=Series([5,6],index=['f','g'])
print (pd.concat([s1,s2,s3]))
Output:
a 0
b 1
c 2
d 3
e 4
f 5
g 6
dtype: int64
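The three Series above happen not to overlap, but concat does not check for that by default. As a small sketch, concatenating s1 with itself shows that repeated labels are simply kept, and that the verify_integrity option of pd.concat can be used to reject them instead.

```python
import pandas as pd
from pandas import Series

s1 = Series([0, 1], index=['a', 'b'])

# concat does not deduplicate labels: overlapping index values simply repeat
doubled = pd.concat([s1, s1])
print(list(doubled.index))  # ['a', 'b', 'a', 'b']

# verify_integrity=True asks concat to raise ValueError on duplicate labels instead
raised = False
try:
    pd.concat([s1, s1], verify_integrity=True)
except ValueError:
    raised = True
print(raised)  # True
```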
By default, concat works along axis=0 and produces a new Series. If you pass axis=1 (the column axis), the result becomes a DataFrame instead:
print (pd.concat([s1,s2,s3],axis=1))
Output:
0 1 2
a 0.0 NaN NaN
b 1.0 NaN NaN
c NaN 2.0 NaN
d NaN 3.0 NaN
e NaN 4.0 NaN
f NaN NaN 5.0
g NaN NaN 6.0
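Note that the integers came back as 0.0, 1.0, and so on. Each column is NaN in the rows where its Series had no label, and NaN is a floating-point value, so pandas upcasts the columns to float64. A quick check of the shape and dtypes:

```python
import pandas as pd
from pandas import Series

s1 = Series([0, 1], index=['a', 'b'])
s2 = Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = Series([5, 6], index=['f', 'g'])

wide = pd.concat([s1, s2, s3], axis=1)

# 7 rows (union of the three indexes) x 3 columns (one per input Series)
print(wide.shape)  # (7, 3)
# every column now contains NaN, so int64 has been upcast to float64
print(wide.dtypes.tolist())
```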
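In this case there is no overlap on the other axis, and the resulting row index is the sorted union (outer join) of the input indexes. Passing join='inner' gives their intersection instead: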
s4=pd.concat([s1*5,s3])
print (pd.concat([s1,s4],axis=1))
print (pd.concat([s1,s4],axis=1,join='inner'))
Output:
0 1
a 0.0 0
b 1.0 5
f NaN 5
g NaN 6
0 1
a 0 0
b 1 5
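To make the outer/inner distinction concrete, this sketch rebuilds s4 and compares the row labels of both results. Because the inner join keeps only labels present in every input, no NaN is introduced and the int64 dtype survives, which is why the second output above shows 0 and 1 rather than 0.0 and 1.0.

```python
import pandas as pd
from pandas import Series

s1 = Series([0, 1], index=['a', 'b'])
s3 = Series([5, 6], index=['f', 'g'])
s4 = pd.concat([s1 * 5, s3])

outer = pd.concat([s1, s4], axis=1)                # union of the row labels
inner = pd.concat([s1, s4], axis=1, join='inner')  # intersection of the row labels

print(list(outer.index))  # ['a', 'b', 'f', 'g']
print(list(inner.index))  # ['a', 'b']
# only shared labels survive, so the inner result has no missing values
print(inner.isna().any().any())  # False
```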
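Older pandas versions let you specify the index to use on the other axis with join_axes. That argument was deprecated in pandas 0.25 and removed in 1.0; in current pandas, reindex the concatenated result instead:
print (pd.concat([s1,s4],axis=1).reindex(['a','c','b','e']))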
Output:
0 1
a 0.0 0.0
c NaN NaN
b 1.0 5.0
e NaN NaN
NaN stands for "Not a Number"; pandas uses it to mark missing values.
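One practical consequence: NaN does not compare equal to anything, not even to itself, so `==` cannot be used to find missing values. A short sketch that counts the NaN cells in the reindexed result above with isna():

```python
import numpy as np
import pandas as pd
from pandas import Series

s1 = Series([0, 1], index=['a', 'b'])
s3 = Series([5, 6], index=['f', 'g'])
s4 = pd.concat([s1 * 5, s3])

result = pd.concat([s1, s4], axis=1).reindex(['a', 'c', 'b', 'e'])

# NaN never compares equal, not even to itself
print(np.nan == np.nan)  # False
# isna() is the reliable test; summing the booleans counts missing cells per column
print(result.isna().sum().tolist())  # [2, 2]
```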
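One problem is that the pieces being concatenated can no longer be told apart in the result. Suppose you want to build a hierarchical index on the concatenation axis instead; the keys argument does exactly that: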
result=pd.concat([s1,s2,s3],keys=['one','two','three'])
print (result)
print (result.unstack())
Output:
one a 0
b 1
two c 2
d 3
e 4
three f 5
g 6
dtype: int64
a b c d e f g
one 0.0 1.0 NaN NaN NaN NaN NaN
two NaN NaN 2.0 3.0 4.0 NaN NaN
three NaN NaN NaN NaN NaN 5.0 6.0
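The point of the keys is that each piece stays addressable. In this sketch the keys form the outer level of a MultiIndex, so selecting an outer key with .loc gives back the original Series.

```python
import pandas as pd
from pandas import Series

s1 = Series([0, 1], index=['a', 'b'])
s2 = Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = Series([5, 6], index=['f', 'g'])

result = pd.concat([s1, s2, s3], keys=['one', 'two', 'three'])

# the keys become the outer level of a two-level MultiIndex
print(result.index.nlevels)  # 2
# selecting an outer key recovers the original piece
print(result.loc['two'].tolist())  # [2, 3, 4]
```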
If you combine the Series along axis=1 instead, the keys become the DataFrame's column headers:
print (pd.concat([s1,s2,s3],axis=1,keys=['one','two','three']))
Output:
one two three
a 0.0 NaN NaN
b 1.0 NaN NaN
c NaN 2.0 NaN
d NaN 3.0 NaN
e NaN 4.0 NaN
f NaN NaN 5.0
g NaN NaN 6.0
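With axis=1 the keys simply name the columns, so each input Series can be pulled back out by its key. This sketch selects the 'two' column and drops the NaN rows introduced by the alignment; the values come back as floats for the upcasting reason noted earlier.

```python
import pandas as pd
from pandas import Series

s1 = Series([0, 1], index=['a', 'b'])
s2 = Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = Series([5, 6], index=['f', 'g'])

wide = pd.concat([s1, s2, s3], axis=1, keys=['one', 'two', 'three'])

# along axis=1 the keys label the columns
print(list(wide.columns))  # ['one', 'two', 'three']
# each column is one input Series, realigned to the union of the row labels
print(wide['two'].dropna().tolist())  # [2.0, 3.0, 4.0]
```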
The same logic applies to DataFrame objects:
df5=DataFrame(np.arange(6).reshape(3,2),index=['a','b','c'],columns=['one','two'])
df6=DataFrame(5+np.arange(4).reshape(2,2),index=['a','c'],columns=['three','four'])
print (pd.concat([df5,df6],axis=1,keys=['level1','level2']))
Output:
level1 level2
one two three four
a 0 1 5.0 6.0
b 2 3 NaN NaN
c 4 5 7.0 8.0
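Here the column index becomes a two-level MultiIndex of (key, original column) pairs. In the sketch below, selecting a key returns that piece as its own DataFrame; df6 had no row 'b', so its part is NaN there.

```python
import numpy as np
import pandas as pd
from pandas import DataFrame

df5 = DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'], columns=['one', 'two'])
df6 = DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'], columns=['three', 'four'])

combined = pd.concat([df5, df6], axis=1, keys=['level1', 'level2'])

# the columns are now (key, original column) pairs
print(combined.columns.tolist())
# selecting a key returns that piece; row 'b' is NaN because df6 lacked it
print(combined['level2'])
```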
If you pass a dict instead of a list, the dict's keys are used as the keys option:
print (pd.concat({'level1':df5,'level2':df6},axis=1))
Output:
level1 level2
one two three four
a 0 1 5.0 6.0
b 2 3 NaN NaN
c 4 5 7.0 8.0
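To confirm that the two spellings are interchangeable, this sketch builds the result both ways and compares them with DataFrame.equals. It relies on the dict keys here appearing in the same order as the keys list.

```python
import numpy as np
import pandas as pd
from pandas import DataFrame

df5 = DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'], columns=['one', 'two'])
df6 = DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'], columns=['three', 'four'])

from_list = pd.concat([df5, df6], axis=1, keys=['level1', 'level2'])
from_dict = pd.concat({'level1': df5, 'level2': df6}, axis=1)

# the dict's keys play the role of keys=..., so both results match
print(from_list.equals(from_dict))  # True
```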
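Two more arguments, levels and names, govern how the hierarchical index is created (see the table below). For example, names labels the levels of the created axis: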
print (pd.concat([df5,df6],axis=1,keys=['level1','level2'],names=['upper','lower']))
Output:
upper level1 level2
lower one two three four
a 0 1 5.0 6.0
b 2 3 NaN NaN
c 4 5 7.0 8.0
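Once the levels have names, you can refer to a level by its name rather than its position. A small sketch using MultiIndex.get_level_values:

```python
import numpy as np
import pandas as pd
from pandas import DataFrame

df5 = DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'], columns=['one', 'two'])
df6 = DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'], columns=['three', 'four'])

named = pd.concat([df5, df6], axis=1, keys=['level1', 'level2'], names=['upper', 'lower'])

# the column MultiIndex now carries the level names passed via names=
print(list(named.columns.names))  # ['upper', 'lower']
# a level can then be looked up by name instead of by position
print(named.columns.get_level_values('lower').tolist())  # ['one', 'two', 'three', 'four']
```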
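(In Python 3, multiple arguments to a function are simply listed one after another, separated by commas.)
The last thing to consider is a DataFrame whose row index carries no information relevant to the analysis. In that case, pass ignore_index=True so that the result gets a fresh default index: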
df7=DataFrame(np.random.randn(3,4),columns=['a','b','c','d'])
df8=DataFrame(np.random.randn(2,3),columns=['b','d','a'])
print (df7)
print (df8)
print (pd.concat([df7,df8],ignore_index=True))
Output:
a b c d
0 -0.844224 0.593684 0.144469 0.729945
1 0.484216 -0.736679 -2.385474 0.004167
2 -0.007380 -0.129935 -0.014069 0.907947
b d a
0 -1.377938 -0.616348 0.936278
1 0.400851 2.066192 0.127229
a b c d
0 -0.844224 0.593684 0.144469 0.729945
1 0.484216 -0.736679 -2.385474 0.004167
2 -0.007380 -0.129935 -0.014069 0.907947
3 0.936278 -1.377938 NaN -0.616348
4 0.127229 0.400851 NaN 2.066192
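Because df7 and df8 above are random, this sketch uses small deterministic frames with the same column layouts instead (hypothetical stand-in data). It compares the default behaviour, which keeps the original and now duplicated row labels, with ignore_index=True. It also shows that columns are aligned by name rather than position, so df8's missing 'c' becomes NaN.

```python
import numpy as np
import pandas as pd
from pandas import DataFrame

# deterministic stand-ins for the random frames above (hypothetical data)
df7 = DataFrame(np.arange(12.0).reshape(3, 4), columns=['a', 'b', 'c', 'd'])
df8 = DataFrame(np.arange(6.0).reshape(2, 3), columns=['b', 'd', 'a'])

kept = pd.concat([df7, df8])                      # original row labels survive
fresh = pd.concat([df7, df8], ignore_index=True)  # labels replaced by 0..n-1

print(list(kept.index))   # [0, 1, 2, 0, 1]
print(list(fresh.index))  # [0, 1, 2, 3, 4]
# row 3 is df8's first row, realigned by column name; 'c' is NaN
print(fresh.loc[3])
```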