python中merge与concat区别_python 合并(merge , concat , join , combine_first)

Merge

1 首先建立两个新的DataFrame

import pandas as pd

import numpy as np

df1=pd.DataFrame({'key':['a','b','c','d','e'],'data2':np.arange(5)})

df2=pd.DataFrame({'key':['a','b','c'],'data1':np.arange(3)})

In [238]: df1

Out[238]:

data1 key

0 0 a

1 1 b

2 2 c

3 3 d

4 4 e

In [239]: df2

Out[239]:

data1 key

0 0 a

1 1 b

2 2 c

2 使用merge函数合并

data=pd.merge(df1,df2,on='key',how='left')

示例如下:

In [30]: data=pd.merge(df1,df2,on='key',how='left')

In [31]: data

Out[31]:

data2 key data1

0 0 a 0.0

1 1 b 1.0

2 2 c 2.0

3 3 d NaN

4 4 e NaN

通过indicator表明merge的方式(这个功能日常工作中我比较少用)

In [32]: data=pd.merge(df1,df2,on='key',how='left',indicator=True)

In [33]: data

Out[33]:

data2 key data1 _merge

0 0 a 0.0 both

1 1 b 1.0 both

2 2 c 2.0 both

3 3 d NaN left_only

4 4 e NaN left_only

当两个数据集合并的列名不相同时用 left_on,right_on

df1=df1.rename(columns={'key':'key1'})

data=pd.merge(df1,df2,left_on='key1',right_on='key',how='left')

In [60]: data

Out[60]:

data2 key1 data1 key

0 0 a 0.0 a

1 1 b 1.0 b

2 2 c 2.0 c

3 3 d NaN NaN

4 4 e NaN NaN

data数据集将两个列名不相同的数据合并在一起了!

注:merge函数默认连接方式是inner,另外有left,right,outer等

3 多数据集合并

是针对合并后的数据再合并,不是一次性合并几个数据集

df1=pd.DataFrame({'key':['a','b','c','d','e'],'data1':np.arange(5)})

df2=pd.DataFrame({'key':['a','b','c'],'data2':np.arange(3)})

df3=pd.DataFrame({'key':['a','b','c','d'],'data3':np.arange(4)})

data=pd.merge(pd.merge(df1,df2,on='key',how='left'),df3,on='key',how='left')

In [55]: data

Out[55]:

data1 key data2 data3

0 0 a 0.0 0.0

1 1 b 1.0 1.0

2 2 c 2.0 2.0

3 3 d NaN 3.0

4 4 e NaN NaN

4 多条件合并

df1=pd.DataFrame({'key':['a','b','c','d','e'],'key1':

['one','one','two','one','two'],'data1':np.arange(5)})

df2=pd.DataFrame({'key':['a','b','c'],'key1':

['one','one','two'],'data2':np.arange(3)})

data=pd.merge(df1,df2,on=['key','key1'],how='left')

In [57]: data

Out[57]:

data1 key key1 data2

0 0 a one 0.0

1 1 b one 1.0

2 2 c two 2.0

3 3 d one NaN

4 4 e two NaN

备注:需要注意的是使用merge合并时,两个数据集的合并条件类型须一致。

Concat(类似numpy的concatenate)

合并两个数据集,可在行或者列上合并(axis)

#默认情况下axis=0

data=pd.concat([df1,df2])

data1=pd.concat([df1,df2],axis=1)

In [13]: data=pd.concat([df1,df2])

In [14]: data

Out[14]:

data1 data2 key

0 NaN 0.0 a

1 NaN 1.0 b

2 NaN 2.0 c

3 NaN 3.0 d

4 NaN 4.0 e

0 0.0 NaN a

1 1.0 NaN b

2 2.0 NaN c

ignore_index 不保留原来连接轴上的索引,生成一组新索引

In [74]: data=pd.concat([df1,df2],ignore_index=True)

In [75]: data

Out[75]:

data1 data2 key

0 NaN 0.0 a

1 NaN 1.0 b

2 NaN 2.0 c

3 NaN 3.0 d

4 NaN 4.0 e

5 0.0 NaN a

6 1.0 NaN b

7 2.0 NaN c

#纵向合并(axis=1是列)

In [15]: data1=pd.concat([df1,df2],axis=1)

In [16]: data1

Out[16]:

data2 key data1 key

0 0 a 0.0 a

1 1 b 1.0 b

2 2 c 2.0 c

3 3 d NaN NaN

4 4 e NaN NaN

多个数据集合并时 data1=pd.concat([df1,df2,….])

keys 可以判断数据来自哪个数据集,生成一个多重索引。

In [36]: data=pd.concat([df1,df2],keys=[0,1])

In [37]: data

Out[37]:

data1 data2 key

0 0 NaN 0.0 a

1 NaN 1.0 b

2 NaN 2.0 c

3 NaN 3.0 d

4 NaN 4.0 e

1 0 0.0 NaN a

1 1.0 NaN b

2 2.0 NaN c

Join

索引上的合并,是增加列而不是增加行

df3=pd.DataFrame([[1,2],[3,4],[5,6]],index=['a','b','c'],columns=['ao','bo'])

df4=pd.DataFrame([[7,8],[9,10],[10,12]],index=['e','b','c'],columns=['aoe','boe'])

df3.join(df4,how='outer')

In [38]: df3.join(df4,how='outer')

Out[38]:

ao bo aoe boe

a 1.0 2.0 NaN NaN

b 3.0 4.0 9.0 10.0

c 5.0 6.0 10.0 12.0

e NaN NaN 7.0 8.0

当合并的数据表列名字相同,通过lsuffix=”, rsuffix=” 区分相同列名的列

df5=pd.DataFrame([[7,8],[9,10],[10,12]],index=['e','b','c'],columns=['aoe','boe'])

df6=pd.DataFrame([[7,8],[9,10],[10,12]],index=['e','b','c'],columns=['aoe','boe'])

In [43]: df5.join(df6,how='outer',lsuffix='_l', rsuffix='_r')

Out[43]:

aoe_l boe_l aoe_r boe_r

e 7 8 7 8

b 9 10 9 10

c 10 12 10 12

Combine_first

若df7的数据缺失,则用df8的数据值填充df1的数据值

df7 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, np.nan, np.nan],[np.nan, 7., np.nan]])

df8 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]], index=[1, 2])

In [61]: df7.combine_first(df8)

Out[61]:

0 1 2

0 NaN 3.0 5.0

1 -4.6 NaN -8.2

2 -5.0 7.0 4.0

你可能感兴趣的:(python中merge与concat区别_python 合并(merge , concat , join , combine_first))