Getting Started with pandas: Merging Data with the merge Function

Merging data with the merge function

  • Create the datasets
# Import pandas and numpy
import pandas as pd
import numpy as np

# Create two DataFrames whose columns (e, f) and index labels (k3, k4, k5) partly overlap
df_left = pd.DataFrame(data=np.ones((5, 6)), columns=["a", "b", "c", "d", "e", "f"], index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(data=np.ones((5, 6)) * 2, columns=["e", "f", "g", "h", "j", "k"], index=["k3", "k4", "k5", "k6", "k7"])

df_left["key1"] = ["k1","k0","k0","k1","k1"]
df_left["key2"] = ["k0","k0","k1","k1","k0"]

df_right["key1"] = ["k1","k0","k0","k0","k1"]
df_right["key2"] = ["k0","k1","k1","k1","k0"]

print(df_right)
print(df_left)

      e    f    g    h    j    k  key1  key2
k3  2.0  2.0  2.0  2.0  2.0  2.0    k1    k0
k4  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k5  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k6  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k7  2.0  2.0  2.0  2.0  2.0  2.0    k1    k0

      a    b    c    d    e    f  key1  key2
k1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0
k2  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0
k3  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1
k4  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1
k5  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0
  • merge defaults to how="inner": only (key1, key2) pairs present in both DataFrames are kept
print(pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="inner"))


     a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k
0  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
2  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
3  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
4  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
5  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
6  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
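
The seven rows above follow from many-to-many matching: each left row is paired with every right row that carries the same (key1, key2). The following is a minimal sketch that rebuilds only the key columns and checks that count; the names `left_keys`, `right_keys`, and `expected_rows` are illustrative and not part of the original example.

```python
import pandas as pd

# Only the key columns from the example above; variable names are illustrative.
left_keys = pd.DataFrame({
    "key1": ["k1", "k0", "k0", "k1", "k1"],
    "key2": ["k0", "k0", "k1", "k1", "k0"],
})
right_keys = pd.DataFrame({
    "key1": ["k1", "k0", "k0", "k0", "k1"],
    "key2": ["k0", "k1", "k1", "k1", "k0"],
})

# Count how often each (key1, key2) pair occurs on each side.
left_counts = left_keys.groupby(["key1", "key2"]).size()
right_counts = right_keys.groupby(["key1", "key2"]).size()

# For every pair present on both sides, an inner merge yields
# left_count * right_count rows; pairs missing on one side become NaN here.
expected_rows = int(left_counts.mul(right_counts).dropna().sum())

merged = pd.merge(left_keys, right_keys, on=["key1", "key2"], how="inner")
print(expected_rows, len(merged))
```

Here (k1, k0) occurs twice on each side and (k0, k1) occurs once on the left and three times on the right, which gives 2 * 2 + 1 * 3 rows.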
  • Merge with how="outer" and set indicator=True to show which side each row came from
pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="outer",indicator=True)

     a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k     _merge
0  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
2  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
3  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
4  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
5  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
6  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
7  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
8  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1  NaN  NaN  NaN  NaN  NaN  NaN  left_only
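
One common use of the `_merge` column is to pick out rows that exist on only one side. The sketch below assumes a reduced version of the two frames (the key columns plus one value column each; `left`, `right`, and `left_only` are illustrative names) and keeps the left rows that found no partner on the right.

```python
import pandas as pd

# Reduced stand-ins for df_left and df_right: key columns plus one value column.
left = pd.DataFrame({
    "key1": ["k1", "k0", "k0", "k1", "k1"],
    "key2": ["k0", "k0", "k1", "k1", "k0"],
    "a": 1.0,
})
right = pd.DataFrame({
    "key1": ["k1", "k0", "k0", "k0", "k1"],
    "key2": ["k0", "k1", "k1", "k1", "k0"],
    "g": 2.0,
})

merged = pd.merge(left, right, on=["key1", "key2"], how="outer", indicator=True)

# _merge is a categorical column with the categories left_only, right_only, both.
print(merged["_merge"].value_counts())

# Keep only the left rows without a match on the right (an anti-join).
left_only = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(left_only)
```

The two rows that survive the filter are the (k0, k0) and (k1, k1) key pairs, the same rows marked left_only in the output above.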
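  • Merge with how="left" and align rows on the index
# Note: recent pandas versions raise MergeError when `on` is combined with
# left_index/right_index. Drop `on` there; key1/key2 then receive suffixes as well.
# The output below appears to come from an older version that accepted this call.
pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="left",left_index=True,right_index=True,indicator=True)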

      a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k     _merge
k1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
k2  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
k3  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
k4  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
k5  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
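
For an index-aligned merge that runs on current pandas, the usual form passes only left_index/right_index and no `on`. A minimal sketch, assuming the same two frames without the key1/key2 columns to keep it short; `by_index` and `joined` are illustrative names, and `DataFrame.join` is shown as a possible alternative, not as the original author's method.

```python
import numpy as np
import pandas as pd

df_left = pd.DataFrame(np.ones((5, 6)), columns=list("abcdef"),
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=list("efghjk"),
                        index=["k3", "k4", "k5", "k6", "k7"])

# Align rows purely on the index labels; no `on` argument is passed.
by_index = pd.merge(df_left, df_right, how="left",
                    left_index=True, right_index=True, indicator=True)
print(by_index)

# DataFrame.join aligns on the index by default; because e and f exist in
# both frames, explicit suffixes are required.
joined = df_left.join(df_right, how="left", lsuffix="_x", rsuffix="_y")
print(joined)
```

Rows k1 and k2 have no counterpart in df_right and are marked left_only, while k3 to k5 are marked both.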
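  • Merge with how="right", align rows on the index, and give the overlapping columns custom suffixes
# Note: recent pandas versions raise MergeError when `on` is combined with
# left_index/right_index. Drop `on` there; key1/key2 then receive suffixes as well.
# The output below appears to come from an older version that accepted this call.
pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="right",left_index=True,right_index=True,indicator=True,suffixes=("_left","_right"))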

      a    b    c    d  e_left  f_left  key1  key2  e_right  f_right    g    h    j    k      _merge
k3  1.0  1.0  1.0  1.0     1.0     1.0    k0    k1      2.0      2.0  2.0  2.0  2.0  2.0        both
k4  1.0  1.0  1.0  1.0     1.0     1.0    k1    k1      2.0      2.0  2.0  2.0  2.0  2.0        both
k5  1.0  1.0  1.0  1.0     1.0     1.0    k1    k0      2.0      2.0  2.0  2.0  2.0  2.0        both
k6  NaN  NaN  NaN  NaN     NaN     NaN   NaN   NaN      2.0      2.0  2.0  2.0  2.0  2.0  right_only
k7  NaN  NaN  NaN  NaN     NaN     NaN   NaN   NaN      2.0      2.0  2.0  2.0  2.0  2.0  right_only
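
The suffixes argument only affects non-key columns whose names appear in both frames; the key column and columns unique to one side keep their names. A small sketch on two hypothetical frames (`left`, `right`, and their values are made up for illustration):

```python
import pandas as pd

# Two tiny frames that share the key column and one value column, "e".
left = pd.DataFrame({"key": ["k0", "k1"], "e": [1.0, 1.0], "a": [1.0, 1.0]})
right = pd.DataFrame({"key": ["k1", "k2"], "e": [2.0, 2.0], "g": [2.0, 2.0]})

# how="right" keeps every row of `right`; only the shared column "e" is renamed.
merged = pd.merge(left, right, on="key", how="right",
                  suffixes=("_left", "_right"))
print(merged.columns.tolist())
print(merged)
```

The k2 row has no match in `left`, so its e_left and a values are NaN.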
