pandas.DataFrame.merge() parameters explained

 

Official documentation: pandas.DataFrame.merge()

Merge, join, and concatenate

pd.merge merges DataFrame or named Series objects with a database-style join.

 

Signature:

DataFrame.merge(self, right, how='inner', on=None, left_on=None, right_on=None,
                left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'),
                copy=True, indicator=False, validate=None)

Main parameters:

right : DataFrame or named Series. The object to merge with.

how : {'left', 'right', 'outer', 'inner'}, default 'inner'

Type of merge to be performed. The default, 'inner', keeps only the keys that appear in both frames.

  • left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
  • right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
  • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
  • inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
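The four join types can be compared side by side by counting the rows each one returns. The sketch below is illustrative only: the left and right frames are reconstructed from the example tables later in this post, and the name row_counts is introduced here for the comparison.

```python
import pandas as pd

# Frames reconstructed from the example tables in this post (an assumption).
left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# With on=None the join keys are the shared columns, here Age and Class.
row_counts = {}
for how in ("left", "right", "inner", "outer"):
    merged = pd.merge(left, right, how=how)
    row_counts[how] = len(merged)

print(row_counts)
```

Only the Age=39 row carries a different Class in the two frames, so left and right each keep 6 rows, inner drops to 5, and outer grows to 7.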

on : label or list 

Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.

left_on : label or list, or array-like

Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.

right_on : label or list, or array-like

Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.

left_index : bool, default False

Use the index from the left DataFrame as the join key(s).

right_index : bool, default False

Use the index from the right DataFrame as the join key(s).

sort : bool, default False

Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword).
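A small sketch of the effect of sort, assuming the same left and right frames as the examples in this post (reconstructed here from the tables) and joining on Age alone. The variable names are illustrative.

```python
import pandas as pd

# Frames reconstructed from the example tables in this post (an assumption).
left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# sort=False (the default): key order is decided by the join type.
unsorted_result = pd.merge(left, right, how="inner", on="Age", sort=False)

# sort=True: rows are ordered by the join key.
sorted_result = pd.merge(left, right, how="inner", on="Age", sort=True)

print(unsorted_result["Age"].tolist())
print(sorted_result["Age"].tolist())
```

Sorting costs extra work on large frames, which is presumably why it is off by default; sorting the result afterwards with sort_values is an alternative when only the final order matters.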

copy : bool, default True

If False, avoid copy if possible. With the default, True, data is always copied into the resulting structure.


Examples:

 


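1.1. how='left': use only the keys from the left frame. With on=None the join keys are the columns shared by both frames, Age and Class. In the Age=39 row the Class values differ (b on the left, a on the right), so the left key (39, b) has no match in the right frame. The result keeps the left row with its Class value and fills Income, which comes from the right frame, with NaN.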

result = pd.merge(left, right, how='left', on=None, left_on=None, right_on=None,
                  left_index=False, right_index=False)
  left:         right:         result:      
  Age Marital Class     Age Income Class       Age Marital Class Income
0 37 Divorced a   0 37 5993 a   0 37 Divorced a 5993
1 54 Divorced a   1 54 10502 a   1 54 Divorced a 10502
2 34 Single a   2 34 6074 a   2 34 Single a 6074
3 39 Married b   3 39 12742 a   3 39 Married b NaN
4 28 Divorced b   4 28 2596 b   4 28 Divorced b 2596
5 24 Married b   5 24 4162 b   5 24 Married b 4162
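The left join above can be reproduced with the sketch below. The frames are rebuilt from the tables in this example, which is an assumption about the original data; row_39 is a name introduced here to inspect the unmatched row.

```python
import pandas as pd

# Frames reconstructed from the tables above (an assumption).
left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# Keys default to the shared columns Age and Class.
result = pd.merge(left, right, how="left")

# The left key (39, "b") has no partner on the right, so Income is missing.
row_39 = result[result["Age"] == 39]
print(result)
print(row_39)
```

Because of the missing value, pandas stores the Income column of the result as floating point rather than integer.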

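1.2. how='right': use only the keys from the right frame. In the Age=39 row the Class values differ, so the right key (39, a) has no match in the left frame. The result keeps the right row with its Class value and fills Marital, which comes from the left frame, with NaN.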

result = pd.merge(left, right, how='right', on=None, left_on=None, right_on=None,
                  left_index=False, right_index=False)
  left:         right:         result:      
  Age Marital Class     Age Income Class       Age Marital Class Income
0 37 Divorced a   0 37 5993 a   0 37 Divorced a 5993
1 54 Divorced a   1 54 10502 a   1 54 Divorced a 10502
2 34 Single a   2 34 6074 a   2 34 Single a 6074
3 39 Married b   3 39 12742 a   3 28 Divorced b 2596
4 28 Divorced b   4 28 2596 b   4 24 Married b 4162
5 24 Married b   5 24 4162 b   5 39 NaN a 12742

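1.3. how='inner' (the default): use the intersection of the keys from both frames. The Age=39 row is dropped because its (Age, Class) key differs between the two frames.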

result = pd.merge(left, right, how='inner', on=None, left_on=None, 
                  right_on=None, left_index=False, right_index=False)
  left:         right:         result:      
  Age Marital Class     Age Income Class       Age Marital Class Income
0 37 Divorced a   0 37 5993 a   0 37 Divorced a 5993
1 54 Divorced a   1 54 10502 a   1 54 Divorced a 10502
2 34 Single a   2 34 6074 a   2 34 Single a 6074
3 39 Married b   3 39 12742 a   3 28 Divorced b 2596
4 28 Divorced b   4 28 2596 b   4 24 Married b 4162
5 24 Married b   5 24 4162 b            

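1.4. how='outer': use the union of the keys from both frames. Both Age=39 rows appear, each with NaN in the column that comes from the other frame.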

result = pd.merge(left, right, how='outer', on=None, left_on=None, right_on=None,
                  left_index=False, right_index=False)
  left:         right:         result:      
  Age Marital Class     Age Income Class       Age Marital Class Income
0 37 Divorced a   0 37 5993 a   0 37 Divorced a 5993
1 54 Divorced a   1 54 10502 a   1 54 Divorced a 10502
2 34 Single a   2 34 6074 a   2 34 Single a 6074
3 39 Married b   3 39 12742 a   3 39 Married b NaN
4 28 Divorced b   4 28 2596 b   4 28 Divorced b 2596
5 24 Married b   5 24 4162 b   5 24 Married b 4162
                    6 39 NaN a 12742
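The indicator parameter from the signature can show where each row of an outer join came from. This is a minimal sketch under the assumption that the frames match the tables above; counts is a name introduced here.

```python
import pandas as pd

# Frames reconstructed from the tables above (an assumption).
left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only", or "both".
result = pd.merge(left, right, how="outer", indicator=True)

counts = result["_merge"].value_counts().to_dict()
print(result)
print(counts)
```

Five keys are found in both frames, and the two Age=39 rows show up once as left_only and once as right_only.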

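2.1. on='Age'. The column named in on must exist in both the left and the right frame. Every Age value appears in both frames here, so the result for this data does not depend on how. Class is no longer a join key, so the overlapping column is split into Class_x and Class_y.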

result = pd.merge(left, right, how='inner', on='Age', left_on=None, right_on=None,
                  left_index=False, right_index=False)
  left:         right:         result:        
  Age Marital Class     Age Income Class       Age Marital Class_x Income Class_y
0 37 Divorced a   0 37 5993 a   0 37 Divorced a 5993 a
1 54 Divorced a   1 54 10502 a   1 54 Divorced a 10502 a
2 34 Single a   2 34 6074 a   2 34 Single a 6074 a
3 39 Married b   3 39 12742 a   3 39 Married b 12742 a
4 28 Divorced b   4 28 2596 b   4 28 Divorced b 2596 b
5 24 Married b   5 24 4162 b   5 24 Married b 4162 b

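2.2. on='Class'. The column named in on must exist in both the left and the right frame, and both Class values appear in both frames, so the result for this data does not depend on how. Class values repeat within each frame, which makes this a many-to-many join: every left row is paired with every right row of the same class (3 × 4 rows for a plus 3 × 2 rows for b, 18 in total). The overlapping non-key column Age is split into Age_x and Age_y.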

result = pd.merge(left, right, how='inner', on='Class', left_on=None, right_on=None,
                  left_index=False, right_index=False)
  left:         right:         result:          
  Age Marital Class     Age Income Class       Age_x Marital Class   Age_y Income
0 37 Divorced a   0 37 5993 a   0 37 Divorced a   37 5993
1 54 Divorced a   1 54 10502 a   1 37 Divorced a   54 10502
2 34 Single a   2 34 6074 a   2 37 Divorced a   34 6074
3 39 Married b   3 39 12742 a   3 37 Divorced a   39 12742
4 28 Divorced b   4 28 2596 b   4 54 Divorced a   37 5993
5 24 Married b   5 24 4162 b   5 54 Divorced a   54 10502
                    6 54 Divorced a   34 6074
                    7 54 Divorced a   39 12742
                    8 34 Single a   37 5993
                    9 34 Single a   54 10502
                    10 34 Single a   34 6074
                    11 34 Single a   39 12742
                    12 39 Married b   28 2596
                    13 39 Married b   24 4162
                    14 28 Divorced b   28 2596
                    15 28 Divorced b   24 4162
                    16 24 Married b   28 2596
                    17 24 Married b   24 4162
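The row multiplication above is easy to miss on real data. The validate parameter from the signature can guard against it; the sketch below assumes the frames match the tables in this example, and validation_error is a name introduced here.

```python
import pandas as pd

# Frames reconstructed from the tables above (an assumption).
left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# Many-to-many join: each left row pairs with every right row of its class.
result = pd.merge(left, right, how="inner", on="Class")
print(len(result), list(result.columns))

# validate="one_to_one" makes pandas check key uniqueness before merging.
try:
    pd.merge(left, right, on="Class", validate="one_to_one")
    validation_error = None
except pd.errors.MergeError as exc:
    validation_error = str(exc)

print(validation_error)
```

Passing validate is a cheap way to fail early when a key that was expected to be unique turns out to repeat.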

2.3. on=['Age', 'Class'], how='left'. on lists every column name shared by the left and right frames, so the result is the same as with on=None.

result = pd.merge(left, right, how='left', on=['Age', 'Class'], 
                  left_on=None, right_on=None,
                  left_index=False, right_index=False)
  left:         right:         result:      
  Age Marital Class     Age Income Class       Age Marital Class Income
0 37 Divorced a   0 37 5993 a   0 37 Divorced a 5993
1 54 Divorced a   1 54 10502 a   1 54 Divorced a 10502
2 34 Single a   2 34 6074 a   2 34 Single a 6074
3 39 Married b   3 39 12742 a   3 39 Married b NaN
4 28 Divorced b   4 28 2596 b   4 28 Divorced b 2596
5 24 Married b   5 24 4162 b   5 24 Married b 4162
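The equivalence between on=None and an explicit list of the shared columns can be checked directly. A minimal sketch, assuming the frames match the tables above; implicit, explicit, and same are illustrative names.

```python
import pandas as pd

# Frames reconstructed from the tables above (an assumption).
left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# on=None: pandas uses the intersection of the column names as keys.
implicit = pd.merge(left, right, how="left")

# The same keys, spelled out.
explicit = pd.merge(left, right, how="left", on=["Age", "Class"])

same = implicit.equals(explicit)
print(same)
```

Spelling the keys out is usually the safer habit, since a column added to both frames later would silently become a join key under on=None.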

3.1. on='Age', left_on='Age', right_on='Age'.

result = pd.merge(left, right, how='inner', on='Age', 
                  left_on='Age', right_on='Age',
                  left_index=False, right_index=False)

Error: "on" cannot be combined with "left_on" and "right_on".

    'Can only pass argument "on" OR "left_on" '
pandas.errors.MergeError: Can only pass argument "on" OR "left_on" and "right_on", not a combination of both.
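The error can be caught as pandas.errors.MergeError. The sketch below uses two small stand-in frames rather than the full example data; the frames and the name message are assumptions made for illustration.

```python
import pandas as pd

# Small stand-in frames; only the Age column matters for this check.
left = pd.DataFrame({"Age": [37, 54], "Marital": ["Divorced", "Single"]})
right = pd.DataFrame({"Age": [37, 54], "Income": [5993, 10502]})

try:
    # Passing on together with left_on/right_on is rejected.
    pd.merge(left, right, how="inner", on="Age", left_on="Age", right_on="Age")
    message = None
except pd.errors.MergeError as exc:
    message = str(exc)

print(message)

# Either form on its own works.
by_on = pd.merge(left, right, on="Age")
by_left_right = pd.merge(left, right, left_on="Age", right_on="Age")
```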

3.2. left_on='Age', right_on='Age': the key columns have the same name and the same dtype in both frames.

result = pd.merge(left, right, how='inner', on=None, 
                  left_on='Age', right_on='Age',
                  left_index=False, right_index=False)
  left:         right:         result:        
  Age Marital Class     Age Income Class       Age Marital Class_x Income Class_y
0 37 Divorced a   0 37 5993 a   0 37 Divorced a 5993 a
1 54 Divorced a   1 54 10502 a   1 54 Divorced a 10502 a
2 34 Single a   2 34 6074 a   2 34 Single a 6074 a
3 39 Married b   3 39 12742 a   3 39 Married b 12742 a
4 28 Divorced b   4 28 2596 b   4 28 Divorced b 2596 b
5 24 Married b   5 24 4162 b   5 24 Married b 4162 b

3.3. left_on='Age', right_on='Income': the key columns have different names but the same dtype. Rows are matched by value, and in this data no Age value on the left equals any Income value on the right, so the inner join returns no rows. Both key columns are kept, and the overlapping names Age and Class get the default suffixes.

result = pd.merge(left, right, how='inner', on=None, 
                  left_on='Age', right_on='Income',
                  left_index=False, right_index=False)
  left:         right:         result:          
  Age Marital Class     Age Income Class       Age_x Marital Class_x Age_y Income Class_y
0 37 Divorced a   0 37 5993 a   (no rows)
1 54 Divorced a   1 54 10502 a
2 34 Single a   2 34 6074 a
3 39 Married b   3 39 12742 a
4 28 Divorced b   4 28 2596 b
5 24 Married b   5 24 4162 b

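3.4. left_on='Age_1', right_on='Age_2': the key columns have different names but the same dtype and the same values.

       suffixes=('_l', '_r') sets the suffixes appended to overlapping column names from the left and right frames. To raise an exception on overlapping columns, use (False, False).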

result = pd.merge(left, right, how='inner', on=None, 
                  left_on='Age_1', right_on='Age_2',
                  left_index=False, right_index=False,
                  suffixes=('_l', '_r'))
  left:         right:         result:          
  Age_1 Marital Class     Age_2 Income Class       Age_1 Marital Class_l Age_2 Income Class_r
0 37 Divorced a   0 37 5993 a   0 37 Divorced a 37 5993 a
1 54 Divorced a   1 54 10502 a   1 54 Divorced a 54 10502 a
2 34 Single a   2 34 6074 a   2 34 Single a 34 6074 a
3 39 Married b   3 39 12742 a   3 39 Married b 39 12742 a
4 28 Divorced b   4 28 2596 b   4 28 Divorced b 28 2596 b
5 24 Married b   5 24 4162 b   5 24 Married b 24 4162 b
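A runnable sketch of this case follows. The frames are rebuilt from the tables above with the key columns named Age_1 and Age_2, which is an assumption about the original data.

```python
import pandas as pd

# Frames reconstructed from the tables above (an assumption).
left = pd.DataFrame({
    "Age_1": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age_2": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# Differently named key columns are both kept in the result.
# Only Class overlaps, so only Class receives the custom suffixes.
result = pd.merge(left, right, how="inner",
                  left_on="Age_1", right_on="Age_2",
                  suffixes=("_l", "_r"))

print(list(result.columns))
print(result)
```

Age_1 and Age_2 hold the same values in every result row, so one of them can be dropped afterwards if it is not needed.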

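4.1. left_index=False, right_index=False (the defaults): the join keys are the shared columns Age and Class, so the result is the same as in 1.3. No non-key column overlaps, so suffixes has no effect here.

result = pd.merge(left, right, how='inner', on=None, 
                  left_on=None, right_on=None,
                  left_index=False, right_index=False,
                  suffixes=('_l', '_r'))
  left:         right:         result:      
  Age Marital Class     Age Income Class       Age Marital Class Income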
0 37 Divorced a   0 37 5993 a   0 37 Divorced a 5993
1 54 Divorced a   1 54 10502 a   1 54 Divorced a 10502
2 34 Single a   2 34 6074 a   2 34 Single a 6074
3 39 Married b   3 39 12742 a   3 28 Divorced b 2596
4 28 Divorced b   4 28 2596 b   4 24 Married b 4162
5 24 Married b   5 24 4162 b            

4.2. left_index=True, right_index=True: use the indexes of the left and right DataFrames as the join keys. No column is a key, so both overlapping names, Age and Class, get the suffixes.

result = pd.merge(left, right, how='inner', on=None, 
                  left_on=None, right_on=None,
                  left_index=True, right_index=True,
                  suffixes=('_l', '_r'))
  left:         right:         result:          
  Age Marital Class     Age Income Class       Age_l Marital Class_l Age_r Income Class_r
0 37 Divorced a   0 37 5993 a   0 37 Divorced a 37 5993 a
1 54 Divorced a   1 54 10502 a   1 54 Divorced a 54 10502 a
2 34 Single a   2 34 6074 a   2 34 Single a 34 6074 a
3 39 Married b   3 39 12742 a   3 39 Married b 39 12742 a
4 28 Divorced b   4 28 2596 b   4 28 Divorced b 28 2596 b
5 24 Married b   5 24 4162 b   5 24 Married b 24 4162 b
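The index-on-index join can be reproduced as follows. This sketch assumes both frames carry the default 0..5 index and an Age column, as the result columns Age_l and Age_r suggest.

```python
import pandas as pd

# Frames reconstructed from the tables above (an assumption); both use
# the default RangeIndex 0..5.
left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# Rows are paired by index label, not by any column value.
result = pd.merge(left, right, how="inner",
                  left_index=True, right_index=True,
                  suffixes=("_l", "_r"))

print(list(result.columns))
print(result)
```

Since the rows are matched purely by position in the index, the Age=39 row is paired even though its Class values differ between the frames.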

4.3. left_index=True, right_index=False: only the left DataFrame's index is given as a join key.

result = pd.merge(left, right, how='inner', on=None, 
                  left_on=None, right_on=None,
                  left_index=True, right_index=False,
                  suffixes=('_l', '_r'))

Error:

pandas.errors.MergeError: Must pass right_on or right_index=True

Explanation: when left_index=True, a key for the right frame must be supplied as well, either through right_on or by setting right_index=True.
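One way to satisfy this requirement is to pair left_index=True with right_on. In the sketch below the left frame is re-indexed by Age so that its index holds values comparable with the right frame's Age column; left_by_age and error_message are names introduced here, and the frames are reconstructed from the earlier tables.

```python
import pandas as pd

# Frames reconstructed from the example tables in this post (an assumption).
left = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Marital": ["Divorced", "Divorced", "Single", "Married", "Divorced", "Married"],
    "Class": ["a", "a", "a", "b", "b", "b"],
})
right = pd.DataFrame({
    "Age": [37, 54, 34, 39, 28, 24],
    "Income": [5993, 10502, 6074, 12742, 2596, 4162],
    "Class": ["a", "a", "a", "a", "b", "b"],
})

# left_index=True alone leaves the right-hand key unspecified.
try:
    pd.merge(left, right, left_index=True)
    error_message = None
except pd.errors.MergeError as exc:
    error_message = str(exc)
print(error_message)

# Move Age into the left index, then match it against the right Age column.
left_by_age = left.set_index("Age")
result = pd.merge(left_by_age, right, left_index=True, right_on="Age")
print(result)
```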

 
