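```python
import pandas as pd

# Create four DataFrames from a 12-row grid of labels ("A0", "B0", ..., "E11")
columns = list("ABCDE")
data = pd.DataFrame([[char + str(i) for char in columns] for i in range(12)], columns=columns)
df1, df2, df3 = data.iloc[:4, :4], data.iloc[4:8, :4], data.iloc[8:, :4]  # rows 0-3, 4-7, 8-11; columns A-D
df4 = data.loc[[2, 3, 6, 7], list("BDE")]  # rows 2, 3, 6, 7; columns B, D, E

# axis=0 (the default) stacks the tables vertically, matching columns by name
result = pd.concat([df1, df2, df3], axis=0)
result
```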
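A minimal check of what `axis=0` produces, assuming the same 12-row `data` grid as above. The pieces are stacked in order and keep their original index labels. They are not renumbered.

```python
import pandas as pd

# Rebuild the same grid of labels used above ("A0", "B0", ...)
columns = list("ABCDE")
data = pd.DataFrame([[c + str(i) for c in columns] for i in range(12)], columns=columns)
df1, df2, df3 = data.iloc[:4, :4], data.iloc[4:8, :4], data.iloc[8:, :4]

result = pd.concat([df1, df2, df3], axis=0)

print(result.shape)        # (12, 4): 4 + 4 + 4 rows, columns A-D
print(list(result.index))  # [0, 1, ..., 11]: the original labels are kept
print(result.loc[5, "B"])  # B5: label 5 came from df2
```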
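When `axis=1`, `concat` aligns the tables on their row index and places them side by side. Tables with different column names are therefore combined into one wide table. With the default `join='outer'`, cells in rows that exist in only one table are filled with NaN.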
```python
result = pd.concat([df1, df4], axis=1)
result
```
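The following sketch shows the shape of the `axis=1` result. It assumes the `df1`/`df4` slices defined above. `df1` covers rows 0-3 and `df4` covers rows 2, 3, 6, 7, so only rows 2 and 3 have data from both tables. Column names are not deduplicated, so `B` and `D` appear twice.

```python
import pandas as pd

columns = list("ABCDE")
data = pd.DataFrame([[c + str(i) for c in columns] for i in range(12)], columns=columns)
df1 = data.iloc[:4, :4]                    # rows 0-3, columns A-D
df4 = data.loc[[2, 3, 6, 7], list("BDE")]  # rows 2, 3, 6, 7, columns B, D, E

result = pd.concat([df1, df4], axis=1)

print(list(result.index))    # [0, 1, 2, 3, 6, 7]: union of both indexes
print(list(result.columns))  # ['A', 'B', 'C', 'D', 'B', 'D', 'E']: names may repeat
print(int(result.isna().sum().sum()))  # 14: rows 0, 1 lack df4's 3 columns, rows 6, 7 lack df1's 4
```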
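The `join` parameter controls how the other axis is combined. `join='inner'` keeps only the index labels shared by both tables (the intersection). `join='outer'`, the default, keeps all of them (the union).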
```python
result = pd.concat([df1, df4], axis=1, join='inner')
result
```
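Here is a short comparison of the two `join` settings on the same slices as above. The names `inner` and `outer` are just illustrative variable names. With `'inner'`, only rows 2 and 3 survive, and since every kept row has a partner, no NaN appears.

```python
import pandas as pd

columns = list("ABCDE")
data = pd.DataFrame([[c + str(i) for c in columns] for i in range(12)], columns=columns)
df1 = data.iloc[:4, :4]
df4 = data.loc[[2, 3, 6, 7], list("BDE")]

inner = pd.concat([df1, df4], axis=1, join="inner")
outer = pd.concat([df1, df4], axis=1)  # join="outer" is the default

print(list(inner.index))         # [2, 3]: labels found in both indexes
print(inner.shape)               # (2, 7)
print(outer.shape)               # (6, 7)
print(inner.isna().any().any())  # False
```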
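The `join_axes` parameter let you specify which index the result should be aligned on. It was deprecated in pandas 0.25 and removed in 1.0. The current equivalent is to reindex the concatenated result:
```python
# Formerly: pd.concat([df1, df4], axis=1, join_axes=[df1.index])
result = pd.concat([df1, df4], axis=1).reindex(df1.index)
result
```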
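This sketch shows the effect of reindexing to `df1.index` and assumes a pandas version without `join_axes` (1.0 or later). The result keeps exactly `df1`'s rows, 0-3. The `df4` columns are NaN where `df4` has no matching row.

```python
import pandas as pd

columns = list("ABCDE")
data = pd.DataFrame([[c + str(i) for c in columns] for i in range(12)], columns=columns)
df1 = data.iloc[:4, :4]
df4 = data.loc[[2, 3, 6, 7], list("BDE")]

result = pd.concat([df1, df4], axis=1).reindex(df1.index)

print(list(result.index))              # [0, 1, 2, 3]: rows 6 and 7 are dropped
print(int(result.isna().sum().sum()))  # 6: rows 0 and 1 have no match in df4
```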
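If the indexes of the two tables carry no meaning, set `ignore_index=True`. The tables are aligned on their column names and concatenated, and the result is given a new default index (0, 1, 2, ...).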
```python
result = pd.concat([df1, df4], axis=0, ignore_index=True)
result
```
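The sketch below contrasts the two behaviors on the slices above. Without `ignore_index`, labels 2 and 3 appear twice. With it, the rows are simply numbered 0-7. The columns are the union of `A`-`D` and `B`, `D`, `E`, and each table's missing columns become NaN.

```python
import pandas as pd

columns = list("ABCDE")
data = pd.DataFrame([[c + str(i) for c in columns] for i in range(12)], columns=columns)
df1 = data.iloc[:4, :4]
df4 = data.loc[[2, 3, 6, 7], list("BDE")]

kept = pd.concat([df1, df4], axis=0)
result = pd.concat([df1, df4], axis=0, ignore_index=True)

print(list(kept.index))      # [0, 1, 2, 3, 2, 3, 6, 7]: duplicate labels
print(list(result.index))    # [0, 1, 2, 3, 4, 5, 6, 7]: a fresh index
print(list(result.columns))  # ['A', 'B', 'C', 'D', 'E']
print(int(result.isna().sum().sum()))  # 12: df1 rows lack E, df4 rows lack A and C
```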
The `keys` parameter adds an outer index level to the result that records which source table each row came from. Passing `keys` directly:
```python
result = pd.concat([df1, df2, df3], keys=['x', 'y', 'z'], axis=0)
result
```
Alternatively, pass a dict. Its keys become the `keys` labels:
```python
pieces = {'x': df1, 'y': df2, 'z': df3}
result = pd.concat(pieces, axis=0)
result
```
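This sketch checks that the two spellings agree and shows how the extra level can be used to pull one source table back out. The names `by_list` and `by_dict` are illustrative. The dict keys here are already in sorted order, so key ordering does not affect the comparison.

```python
import pandas as pd

columns = list("ABCDE")
data = pd.DataFrame([[c + str(i) for c in columns] for i in range(12)], columns=columns)
df1, df2, df3 = data.iloc[:4, :4], data.iloc[4:8, :4], data.iloc[8:, :4]

by_list = pd.concat([df1, df2, df3], keys=["x", "y", "z"], axis=0)
by_dict = pd.concat({"x": df1, "y": df2, "z": df3}, axis=0)

print(by_list.equals(by_dict))       # True: both build the same table
print(by_list.index.nlevels)         # 2: (source key, original label)
print(list(by_list.loc["y"].index))  # [4, 5, 6, 7]: the rows that came from df2
print(by_list.loc[("z", 8), "A"])    # A8
```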
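`DataFrame.append` was a shortcut for `pd.concat` along `axis=0`. It was deprecated in pandas 1.4 and removed in 2.0, so use `pd.concat` instead:
```python
# Formerly: df1.append(df2)
pd.concat([df1, df2])
```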
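This is a sketch of replacing the removed `append` with `pd.concat`. A common use of `append` was adding one row. Here that row is wrapped in a one-row DataFrame. The `new_row` values are made up for illustration.

```python
import pandas as pd

columns = list("ABCDE")
data = pd.DataFrame([[c + str(i) for c in columns] for i in range(12)], columns=columns)
df1, df2 = data.iloc[:4, :4], data.iloc[4:8, :4]

stacked = pd.concat([df1, df2])  # what df1.append(df2) used to return
print(stacked.shape)             # (8, 4)

# Hypothetical extra row, built as a one-row DataFrame with matching columns
new_row = pd.DataFrame([["A12", "B12", "C12", "D12"]], columns=list("ABCD"))
stacked = pd.concat([stacked, new_row], ignore_index=True)
print(stacked.shape)             # (9, 4)
print(stacked.iloc[-1].tolist()) # ['A12', 'B12', 'C12', 'D12']
```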
`pd.merge` joins two tables on key columns, like a SQL join. The `how` argument selects the join type:

| Merge method (`how`) | SQL join name | Description |
|---|---|---|
| `left` | LEFT OUTER JOIN | Use keys from the left frame only |
| `right` | RIGHT OUTER JOIN | Use keys from the right frame only |
| `outer` | FULL OUTER JOIN | Use the union of keys from both frames |
| `inner` | INNER JOIN | Use the intersection of keys from both frames |
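To see the table's four rows in action, here is a small sketch with hypothetical single-key frames (`small_left`, `small_right`). `K0` exists only on the left and `K3` only on the right. The keys have no duplicates, so each `how` value returns exactly the key set the table describes.

```python
import pandas as pd

# Hypothetical frames: K0 is left-only, K3 is right-only, K1 and K2 are shared
small_left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
small_right = pd.DataFrame({"key": ["K1", "K2", "K3"], "B": ["B1", "B2", "B3"]})

kept = {}
for how in ["left", "right", "outer", "inner"]:
    merged = pd.merge(small_left, small_right, on="key", how=how)
    kept[how] = merged["key"].tolist()
    print(how, kept[how])
# left  -> ['K0', 'K1', 'K2']        keys from the left frame only
# right -> ['K1', 'K2', 'K3']        keys from the right frame only
# outer -> ['K0', 'K1', 'K2', 'K3']  union of keys
# inner -> ['K1', 'K2']              intersection of keys
```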
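```python
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Inner join (the default): only (key1, key2) pairs present in both tables
result = pd.merge(left, right, on=['key1', 'key2'])
# Left join: every key pair from left; unmatched right columns become NaN
result = pd.merge(left, right, how='left', on=['key1', 'key2'])
# Right join: every key pair from right; unmatched left columns become NaN
result = pd.merge(left, right, how='right', on=['key1', 'key2'])
# Outer join: the union of key pairs from both tables
result = pd.merge(left, right, how='outer', on=['key1', 'key2'])
```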
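This sketch counts the rows each join type produces for the `left`/`right` tables above. The pair (K1, K0) matches two rows in `right`, so it appears twice whenever it is kept. The final lines use the `indicator=True` option of `pd.merge`, which adds a `_merge` column recording whether each row's key came from both tables or from only one.

```python
import pandas as pd

left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                     "key2": ["K0", "K1", "K0", "K1"],
                     "A": ["A0", "A1", "A2", "A3"],
                     "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                      "key2": ["K0", "K0", "K0", "K0"],
                      "C": ["C0", "C1", "C2", "C3"],
                      "D": ["D0", "D1", "D2", "D3"]})

# Number of result rows for each join type
row_counts = {how: len(pd.merge(left, right, how=how, on=["key1", "key2"]))
              for how in ["inner", "left", "right", "outer"]}
print(row_counts)  # {'inner': 3, 'left': 5, 'right': 4, 'outer': 6}

# Where each row of the outer join came from: both / left_only / right_only
outer = pd.merge(left, right, how="outer", on=["key1", "key2"], indicator=True)
origin = outer["_merge"].value_counts().to_dict()
print(origin)
```

In the outer join, 3 rows come from matched keys, 2 from `left` only ((K0, K1) and (K2, K1)), and 1 from `right` only ((K2, K0)).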