merge() combines two DataFrames by matching rows on a join key, which is specified with on.
In [5]: df1=pd.DataFrame({'key':['b','b','a','a','b','a','c'],'data1':range(7)})
In [6]: df2=pd.DataFrame({'key':['a','b','d'],'data2':range(3)})
In [7]: df1
Out[7]:
data1 key
0 0 b
1 1 b
2 2 a
3 3 a
4 4 b
5 5 a
6 6 c
In [8]: df2
Out[8]:
data2 key
0 0 a
1 1 b
2 2 d
In [9]: pd.merge(df1,df2,on='key')
Out[9]:
data1 key data2
0 0 b 1
1 1 b 1
2 4 b 1
3 2 a 0
4 3 a 0
5 5 a 0
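As a minimal sketch, the session above can be rerun as a standalone script to check which keys survive the default inner join. The names merged and kept_keys are illustrative only and are not part of the original session.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'a', 'b', 'a', 'c'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

# how='inner' is the default, so only keys present on both sides are kept.
merged = pd.merge(df1, df2, on='key')

# 'c' exists only in df1 and 'd' only in df2, so neither appears in the result.
kept_keys = set(merged['key'])
print(kept_keys)
print(len(merged))
```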
To merge on multiple keys, pass a list of column names
In [4]: df7=pd.DataFrame({'key1':['b','b','a','a','b','a','c'],'key2':['i','j','k','k','i','j','k'],'data1':range(7)})
In [5]: df8=pd.DataFrame({'key1':['a','b','d'],'key2':['k','j','i'],'data2':range(3)})
In [6]: df7
Out[6]:
key1 key2 data1
0 b i 0
1 b j 1
2 a k 2
3 a k 3
4 b i 4
5 a j 5
6 c k 6
In [7]: df8
Out[7]:
key1 key2 data2
0 a k 0
1 b j 1
2 d i 2
In [8]: pd.merge(df7,df8,on=['key1','key2'])
Out[8]:
key1 key2 data1 data2
0 b j 1 1
1 a k 2 0
2 a k 3 0
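One way to read a multi-key merge is that each (key1, key2) pair acts as a single composite key. The sketch below, using the illustrative names left, right, both_keys and one_key, contrasts joining on both columns with joining on key1 alone, in which case the overlapping key2 column is kept from each side with a suffix.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['b', 'b', 'a', 'a', 'b', 'a', 'c'],
                     'key2': ['i', 'j', 'k', 'k', 'i', 'j', 'k'],
                     'data1': range(7)})
right = pd.DataFrame({'key1': ['a', 'b', 'd'],
                      'key2': ['k', 'j', 'i'],
                      'data2': range(3)})

# Rows match only when both key1 and key2 agree.
both_keys = pd.merge(left, right, on=['key1', 'key2'])
pairs = set(zip(both_keys['key1'], both_keys['key2']))

# Joining on key1 alone matches more rows and keeps key2 from both sides,
# renamed with the default suffixes.
one_key = pd.merge(left, right, on='key1')
print(pairs)
print(list(one_key.columns))
```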
Specify the join keys for the left and right sides separately
In [11]: df3=pd.DataFrame({'l_key':['b','b','a','a','b','a','c'],'data1':range(7)})
In [12]: df4=pd.DataFrame({'r_key':['a','b','d'],'data2':range(3)})
In [13]: pd.merge(df3,df4,left_on='l_key',right_on='r_key')
Out[13]:
data1 l_key data2 r_key
0 0 b 1 b
1 1 b 1 b
2 4 b 1 b
3 2 a 0 a
4 3 a 0 a
5 5 a 0 a
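With left_on and right_on, both key columns are kept in the result even though they hold the same values on every matched row. A minimal sketch of removing the redundant column afterwards follows; the variable names merged and tidy are illustrative.

```python
import pandas as pd

df3 = pd.DataFrame({'l_key': ['b', 'b', 'a', 'a', 'b', 'a', 'c'],
                    'data1': range(7)})
df4 = pd.DataFrame({'r_key': ['a', 'b', 'd'], 'data2': range(3)})

merged = pd.merge(df3, df4, left_on='l_key', right_on='r_key')

# In an inner join the two key columns are identical row by row,
# so one of them can be dropped without losing information.
same_keys = bool((merged['l_key'] == merged['r_key']).all())
tidy = merged.drop(columns='r_key')
print(same_keys)
print(list(tidy.columns))
```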
An outer join takes the union of the keys from both sides and fills missing values with NaN
In [15]: df2=pd.DataFrame({'key':['a','b','d'],'data2':range(3)})
In [16]: pd.merge(df1,df2,on='key',how='outer')
Out[16]:
data1 key data2
0 0.0 b 1.0
1 1.0 b 1.0
2 4.0 b 1.0
3 2.0 a 0.0
4 3.0 a 0.0
5 5.0 a 0.0
6 6.0 c NaN
7 NaN d 2.0
Use only the keys from the left (or right) DataFrame
In [17]: pd.merge(df1,df2,on='key',how='left')
Out[17]:
data1 key data2
0 0 b 1.0
1 1 b 1.0
2 2 a 0.0
3 3 a 0.0
4 4 b 1.0
5 5 a 0.0
6 6 c NaN
In [18]: pd.merge(df1,df2,on='key',how='right')
Out[18]:
data1 key data2
0 0.0 b 1
1 1.0 b 1
2 4.0 b 1
3 2.0 a 0
4 3.0 a 0
5 5.0 a 0
6 NaN d 2
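To see which side each row of a join came from, merge() accepts indicator=True, which adds a _merge column with the values 'both', 'left_only' or 'right_only'. This parameter is not used in the session above; the sketch below applies it to the same df1 and df2 with an outer join.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'a', 'b', 'a', 'c'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

# indicator=True appends a categorical '_merge' column to the result.
merged = pd.merge(df1, df2, on='key', how='outer', indicator=True)

n_both = int((merged['_merge'] == 'both').sum())         # keys 'a' and 'b'
n_left = int((merged['_merge'] == 'left_only').sum())    # key 'c'
n_right = int((merged['_merge'] == 'right_only').sum())  # key 'd'
print(n_both, n_left, n_right)
```

Rows tagged 'both' plus 'left_only' are the rows a left join would return, and 'both' plus 'right_only' are the rows of a right join.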
When the join key is in the index of one DataFrame, use left_index or right_index
In [24]: df7=pd.DataFrame({'key':['a','b','a','a','b','c'],'value':range(6)})
In [25]: df8=pd.DataFrame({'group_val':[3.5,7]},index=['a','b'])
In [26]: df7
Out[26]:
key value
0 a 0
1 b 1
2 a 2
3 a 3
4 b 4
5 c 5
In [27]: df8
Out[27]:
group_val
a 3.5
b 7.0
In [28]: pd.merge(df7,df8,left_on='key',right_index=True)
Out[28]:
key value group_val
0 a 0 3.5
2 a 2 3.5
3 a 3 3.5
1 b 1 7.0
4 b 4 7.0
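The same column-to-index join can also be written with DataFrame.join, which matches a column of the caller against the index of the other frame. The sketch below is an alternative spelling, not something the original session uses; note that join defaults to a left join, so how='inner' is passed to match merge().

```python
import pandas as pd

df7 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
                    'value': range(6)})
df8 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

# Column 'key' on the left is matched against the index of df8.
by_merge = pd.merge(df7, df8, left_on='key', right_index=True)

# Equivalent spelling with join; how='inner' drops the unmatched 'c' row.
by_join = df7.join(df8, on='key', how='inner')

print(len(by_merge), len(by_join))
```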
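A many-to-many merge produces the Cartesian product of the matching rows. In the example below, the left DataFrame has 3 "b" rows and the right one has 2, so the result contains 6 "b" rows.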
In [19]: df5=pd.DataFrame({'key':['b','b','a','c','a','b'],'data1':range(6)})
In [20]: df6=pd.DataFrame({'key':['a','b','a','b','d'],'data2':range(5)})
In [21]: df5
Out[21]:
data1 key
0 0 b
1 1 b
2 2 a
3 3 c
4 4 a
5 5 b
In [22]: df6
Out[22]:
data2 key
0 0 a
1 1 b
2 2 a
3 3 b
4 4 d
In [23]: pd.merge(df5,df6,how='outer')
Out[23]:
data1 key data2
0 0.0 b 1.0
1 0.0 b 3.0
2 1.0 b 1.0
3 1.0 b 3.0
4 5.0 b 1.0
5 5.0 b 3.0
6 2.0 a 0.0
7 2.0 a 2.0
8 4.0 a 0.0
9 4.0 a 2.0
10 3.0 c NaN
11 NaN d 4.0
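The row counts of a many-to-many merge can be predicted by multiplying the per-key counts on each side. The sketch below checks this for the inner join of df5 and df6, and also shows the validate parameter of merge(), which raises pd.errors.MergeError when the keys are not unique in the way requested. Neither step appears in the original session; the variable names are illustrative.

```python
import pandas as pd

df5 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'], 'data1': range(6)})
df6 = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'd'], 'data2': range(5)})

# Matched rows per key = (rows with that key on the left) * (rows on the right).
# Keys found on only one side become NaN after alignment and are dropped.
left_counts = df5['key'].value_counts()
right_counts = df6['key'].value_counts()
expected = (left_counts * right_counts).dropna()

inner = pd.merge(df5, df6, on='key')
print(int(expected['b']), int(expected['a']), len(inner))

# validate='one_to_one' rejects this merge because keys repeat on both sides.
try:
    pd.merge(df5, df6, on='key', validate='one_to_one')
    rejected = False
except pd.errors.MergeError:
    rejected = True
print(rejected)
```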
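Parameter | Description |
---|---|
left | The left DataFrame in the merge |
right | The right DataFrame in the merge |
how | One of "inner", "outer", "left", "right"; defaults to "inner" |
on | Column name(s) to join on; must be present in both DataFrames |
left_on | Column(s) in the left DataFrame to use as the join key |
right_on | Column(s) in the right DataFrame to use as the join key |
left_index | Use the row index of the left DataFrame as its join key |
right_index | Use the row index of the right DataFrame as its join key |
sort | Sort the merged data by the join keys; defaults to False in current pandas (some older references list True) |
suffixes | Tuple of strings appended to overlapping column names; defaults to ('_x', '_y'). If both DataFrames have a "data" column, the result contains "data_x" and "data_y" |
copy | If set to False, can avoid copying data into the result in some cases; defaults to True in older pandas versions |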
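As a small illustration of the suffixes row, the sketch below merges two hypothetical frames that both carry a "data" column, first with the default suffixes and then with custom ones. The frames left and right are made up for this example.

```python
import pandas as pd

# Hypothetical frames that share a non-key column named 'data'.
left = pd.DataFrame({'key': ['a', 'b'], 'data': [1, 2]})
right = pd.DataFrame({'key': ['a', 'b'], 'data': [10, 20]})

# Default suffixes: the overlapping column becomes data_x and data_y.
default_cols = list(pd.merge(left, right, on='key').columns)

# Custom suffixes make the origin of each column easier to read.
custom_cols = list(
    pd.merge(left, right, on='key', suffixes=('_left', '_right')).columns
)
print(default_cols)
print(custom_cols)
```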