http://liao.cpython.org/pandas26/
http://liao.cpython.org/pandas25/
https://blog.csdn.net/weixin_37226516/article/details/64134643
concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)
Note: this is the signature from older pandas releases; join_axes was removed in pandas 1.0.
s1 = pd.Series(np.arange(10,13))
s2 = pd.Series(np.arange(100,103))
pd.concat([s1,s2])
Out[13]:
0 10
1 11
2 12
0 100
1 101
2 102
dtype: int32
pd.concat([s1,s2], keys = [1,2])
Out[14]:
1  0     10
   1     11
   2     12
2  0    100
   1    101
   2    102
dtype: int32
pd.concat([s1,s2], keys = [1,2],names = ['from','ID'])
Out[16]:
from  ID
1     0      10
      1      11
      2      12
2     0     100
      1     101
      2     102
dtype: int32
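The keys and names arguments above produce a two-level index. A minimal sketch of how that index can be used afterwards, reusing s1 and s2 from above (the variable names combined and from_first are only illustrative):

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(10, 13))
s2 = pd.Series(np.arange(100, 103))

# keys adds an outer index level; names labels both levels.
combined = pd.concat([s1, s2], keys=[1, 2], names=["from", "ID"])
print(list(combined.index.names))

# Select everything that came from the first input through the outer key.
from_first = combined.loc[1]
print(from_first.tolist())
```

Selecting by the outer key is the practical reason to pass keys: the source of every row stays recoverable after concatenation.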
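Horizontal concatenation: axis = 1
To add an extra level of keys during concatenation, so that you can tell which table each piece of data came from, pass the keys parameter.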
s1 = pd.Series(np.arange(10,15))
s2 = pd.Series(np.arange(100,103))
pd.concat([s1,s2], axis = 1,keys = ['s1','s2'],names = ['from','ID'])
Out[21]:
   s1     s2
0  10  100.0
1  11  101.0
2  12  102.0
3  13    NaN
4  14    NaN
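The output above shows s2 turning into floats. A small sketch of why, assuming the same s1 and s2: the shorter Series has no rows 3 and 4, so pandas fills them with NaN, and NaN forces a float column.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(10, 15))
s2 = pd.Series(np.arange(100, 103))

wide = pd.concat([s1, s2], axis=1, keys=["s1", "s2"])

# Two index labels (3 and 4) exist only in s1, so s2 gets two NaN cells.
missing = int(wide["s2"].isna().sum())
print(missing)

# The NaN values upcast the s2 column from integer to float64.
print(wide["s2"].dtype)
```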
DataFrame objects with identical columns. Practice: create the DataFrames
idx = 'this is a fake data'.split()
df1 = pd.DataFrame({
'Country':['China','Japan','Germany','USA','UK'],'Team':['A','B','A','C','D']},index = idx)
col = 'Country Team'.split()
idx_2 = ['fake','world']
values = [['KLR',100],['abc',200]]
df2 = pd.DataFrame(values,index = idx_2, columns = col)
df1
Out[43]:
Country Team
this China A
is Japan B
a Germany A
fake USA C
data UK D
df2
Out[44]:
Country Team
fake KLR 100
world abc 200
By default, concatenation is vertical (rows are stacked):
pd.concat([df1,df2])
Out[45]:
Country Team
this China A
is Japan B
a Germany A
fake USA C
data UK D
fake KLR 100
world abc 200
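Note that the label 'fake' now appears twice in the result. A hedged sketch of how the verify_integrity parameter from the signature can catch this, rebuilding df1 and df2 as above (the flag overlap_detected is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Country": ["China", "Japan", "Germany", "USA", "UK"],
     "Team": ["A", "B", "A", "C", "D"]},
    index="this is a fake data".split(),
)
df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])

stacked = pd.concat([df1, df2])
# The duplicated label returns two rows on lookup.
print(stacked.loc["fake"])

try:
    # verify_integrity=True refuses to build an index with duplicate labels.
    pd.concat([df1, df2], verify_integrity=True)
    overlap_detected = False
except ValueError as exc:
    overlap_detected = True
    print(exc)
```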
Concatenation with axis = 1: when concatenating horizontally, rows that share an index label are aligned on that label by default.
pd.concat([df1,df2],axis = 1)
Out[46]:
       Country Team Country   Team
a      Germany    A     NaN    NaN
data        UK    D     NaN    NaN
fake       USA    C     KLR  100.0
is       Japan    B     NaN    NaN
this     China    A     NaN    NaN
world      NaN  NaN     abc  200.0
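The alignment on index labels can be checked directly. A minimal sketch, assuming df1 and df2 as defined earlier: only 'fake' exists in both frames, so it is the only row with no missing values. The row order of the result may differ between pandas versions, so the sketch does not rely on it.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Country": ["China", "Japan", "Germany", "USA", "UK"],
     "Team": ["A", "B", "A", "C", "D"]},
    index="this is a fake data".split(),
)
df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])

side = pd.concat([df1, df2], axis=1)

# 5 labels + 2 labels with one shared label gives 6 rows; 2 + 2 columns gives 4.
print(side.shape)

# Rows without any NaN are exactly the labels present in both inputs.
complete = side.dropna().index.tolist()
print(complete)
```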
Create df3, which has different columns:
col = ['Team','SBF']
idx_3= ['true','world']
values3 = [['red','pm'],['orange','pl']]
df3 = pd.DataFrame(values3,index = idx_3, columns = col)
df3
Out[51]:
Team SBF
true red pm
world orange pl
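Concatenation matches columns by name. The default still stacks rows (axis = 0), and columns with the same name are combined into one: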
pd.concat([df1,df3])
       Country  SBF    Team
this     China  NaN       A
is       Japan  NaN       B
a      Germany  NaN       A
fake       USA  NaN       C
data        UK  NaN       D
true       NaN   pm     red
world      NaN   pl  orange
Columns are again matched by name and combined, but rows that share an index label are not merged; both rows are kept:
pd.concat([df2,df3])
Out[59]:
      Country  SBF    Team
fake      KLR  NaN     100
world     abc  NaN     200
true      NaN   pm     red
world     NaN   pl  orange
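The alphabetical column order in the output above depends on the pandas version. A sketch, assuming df2 and df3 as defined earlier, that passes sort explicitly so the column order does not depend on the version's default:

```python
import pandas as pd

df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])
df3 = pd.DataFrame([["red", "pm"], ["orange", "pl"]],
                   index=["true", "world"], columns=["Team", "SBF"])

# sort=False keeps columns in order of first appearance.
unsorted_cols = pd.concat([df2, df3], sort=False)
print(list(unsorted_cols.columns))

# sort=True sorts the non-concatenation axis (here, the columns).
sorted_cols = pd.concat([df2, df3], sort=True)
print(list(sorted_cols.columns))

# Either way, rows are never merged: 'world' appears once per input.
print(list(sorted_cols.index))
```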
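With axis = 1, rows with the same index label are aligned, but columns with the same name are not merged; the frames are simply placed side by side, left then right.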
pd.concat([df2,df3],axis = 1)
Out[62]:
      Country   Team    Team  SBF
fake      KLR  100.0     NaN  NaN
true      NaN    NaN     red   pm
world     abc  200.0  orange   pl
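The result above has two columns named Team, which makes selection by name ambiguous. One possible way around this, sketched with the keys parameter (the key labels 'df2' and 'df3' are arbitrary choices for illustration):

```python
import pandas as pd

df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])
df3 = pd.DataFrame([["red", "pm"], ["orange", "pl"]],
                   index=["true", "world"], columns=["Team", "SBF"])

side = pd.concat([df2, df3], axis=1)
# 'Team' is duplicated because columns are not merged along axis=1.
print(side.columns.tolist())

# keys adds an outer column level that records the source frame.
labeled = pd.concat([df2, df3], axis=1, keys=["df2", "df3"])
print(labeled.columns.tolist())

# Each Team column can now be addressed unambiguously.
team_from_df3 = labeled.loc["world", ("df3", "Team")]
print(team_from_df3)
```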
Extract one column from each frame and concatenate:
pd.concat([df1.Team,df2.Team,df3.Team])
Out[64]:
this A
is B
a A
fake C
data D
fake 100
world 200
true red
world orange
Name: Team, dtype: object
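Writing it like this raises an error, because the objects must be passed as a single list rather than as separate arguments: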
pd.concat(df1['Team'],df2['Team'],df3['Team'])
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "Series"
pd.concat(df1[['Team']],df2[['Team']],df3[['Team']])
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
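A small sketch of the failure and the fix, using two hypothetical Series a and b in place of the Team columns. The exact error message varies across pandas versions, so only the exception type is checked.

```python
import pandas as pd

a = pd.Series(["A", "B"], index=["this", "is"], name="Team")
b = pd.Series([100, 200], index=["fake", "world"], name="Team")

try:
    pd.concat(a, b)  # wrong: the Series are passed as separate arguments
    raised = False
except TypeError:
    raised = True
print(raised)

# Correct: wrap the objects in one list (any iterable of pandas objects works).
result = pd.concat([a, b])
print(result)
```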
With join = 'inner', only the columns common to all inputs are kept:
pd.concat([df2,df3],join = 'inner')
Out[73]:
Team
fake 100
world 200
true red
world orange
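The inner join above can be verified with a short sketch, assuming df2 and df3 as defined earlier: Team is the only column name the two frames share, so it is the only column that survives.

```python
import pandas as pd

df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])
df3 = pd.DataFrame([["red", "pm"], ["orange", "pl"]],
                   index=["true", "world"], columns=["Team", "SBF"])

inner = pd.concat([df2, df3], join="inner")

# Only the intersection of the column names is kept; all rows are kept.
print(list(inner.columns))
print(inner["Team"].tolist())
```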
The default is join = 'outer':
pd.concat([df2,df3],join = 'outer')
pd.concat([df2,df3])
Out[74]:
      Country  SBF    Team
fake      KLR  NaN     100
world     abc  NaN     200
true      NaN   pm     red
world     NaN   pl  orange
With ignore_index = True, the original index labels are discarded and replaced by 0 to n-1:
pd.concat([df2,df3], ignore_index = True)
Out[77]:
  Country  SBF    Team
0     KLR  NaN     100
1     abc  NaN     200
2     NaN   pm     red
3     NaN   pl  orange
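A sketch contrasting the two behaviours, assuming df2 and df3 as defined earlier (the names kept and reset are only illustrative): keeping the labels leaves a duplicated 'world', while ignore_index yields a fresh unique index.

```python
import pandas as pd

df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])
df3 = pd.DataFrame([["red", "pm"], ["orange", "pl"]],
                   index=["true", "world"], columns=["Team", "SBF"])

kept = pd.concat([df2, df3])
reset = pd.concat([df2, df3], ignore_index=True)

# Original labels survive, including the duplicated 'world'.
print(kept.index.is_unique)

# ignore_index=True replaces the labels with consecutive integers.
print(list(reset.index))
```

This is useful when the original labels carry no meaning, or when duplicate labels would make later lookups ambiguous.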