Merging data with the merge function
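import pandas as pd
import numpy as np

# Two 5x6 frames: df_left is filled with 1.0, df_right with 2.0.
# They share the columns "e" and "f" and the index labels "k3", "k4", "k5".
df_left = pd.DataFrame(data=np.ones((5, 6)), columns=["a", "b", "c", "d", "e", "f"], index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(data=np.ones((5, 6)) * 2, columns=["e", "f", "g", "h", "j", "k"], index=["k3", "k4", "k5", "k6", "k7"])

# Add two key columns to each frame; these are the columns merged on below.
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

print(df_right)
print(df_left)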
      e    f    g    h    j    k key1 key2
k3  2.0  2.0  2.0  2.0  2.0  2.0   k1   k0
k4  2.0  2.0  2.0  2.0  2.0  2.0   k0   k1
k5  2.0  2.0  2.0  2.0  2.0  2.0   k0   k1
k6  2.0  2.0  2.0  2.0  2.0  2.0   k0   k1
k7  2.0  2.0  2.0  2.0  2.0  2.0   k1   k0

      a    b    c    d    e    f key1 key2
k1  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0
k2  1.0  1.0  1.0  1.0  1.0  1.0   k0   k0
k3  1.0  1.0  1.0  1.0  1.0  1.0   k0   k1
k4  1.0  1.0  1.0  1.0  1.0  1.0   k1   k1
k5  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0
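- Merge on the key1 and key2 columns with how="inner"; the overlapping non-key columns e and f get the default _x and _y suffixes
print(pd.merge(left=df_left, right=df_right, on=["key1", "key2"], how="inner"))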
     a    b    c    d  e_x  f_x key1 key2  e_y  f_y    g    h    j    k
0  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0  2.0  2.0  2.0  2.0  2.0  2.0
1  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0  2.0  2.0  2.0  2.0  2.0  2.0
2  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0  2.0  2.0  2.0  2.0  2.0  2.0
3  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0  2.0  2.0  2.0  2.0  2.0  2.0
4  1.0  1.0  1.0  1.0  1.0  1.0   k0   k1  2.0  2.0  2.0  2.0  2.0  2.0
5  1.0  1.0  1.0  1.0  1.0  1.0   k0   k1  2.0  2.0  2.0  2.0  2.0  2.0
6  1.0  1.0  1.0  1.0  1.0  1.0   k0   k1  2.0  2.0  2.0  2.0  2.0  2.0
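
The seven rows above come from many-to-many matching: every left row is paired with every right row that carries the same (key1, key2) pair. A minimal sketch of that count in plain Python, using the key values from the two frames above (the variable names are chosen for this example and are not part of pandas):

```python
from collections import Counter

# (key1, key2) pairs in row order, copied from df_left and df_right above.
left_keys = [("k1", "k0"), ("k0", "k0"), ("k0", "k1"), ("k1", "k1"), ("k1", "k0")]
right_keys = [("k1", "k0"), ("k0", "k1"), ("k0", "k1"), ("k0", "k1"), ("k1", "k0")]

left_counts = Counter(left_keys)
right_counts = Counter(right_keys)

# An inner merge emits one row per (left row, right row) pair sharing a key,
# so each shared key contributes left_count * right_count rows.
rows_per_key = {
    key: left_counts[key] * right_counts[key]
    for key in left_counts
    if key in right_counts
}
inner_rows = sum(rows_per_key.values())

print(rows_per_key)  # {('k1', 'k0'): 4, ('k0', 'k1'): 3}
print(inner_rows)    # 7
```

The pairs (k0, k0) and (k1, k1) occur only in df_left, so they contribute nothing to the inner result.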
- Merge with how="outer" and show where each row came from (indicator=True adds the _merge column)
pd.merge(left=df_left, right=df_right, on=["key1", "key2"], how="outer", indicator=True)
     a    b    c    d  e_x  f_x key1 key2  e_y  f_y    g    h    j    k     _merge
0  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0  2.0  2.0  2.0  2.0  2.0  2.0       both
1  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0  2.0  2.0  2.0  2.0  2.0  2.0       both
2  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0  2.0  2.0  2.0  2.0  2.0  2.0       both
3  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0  2.0  2.0  2.0  2.0  2.0  2.0       both
4  1.0  1.0  1.0  1.0  1.0  1.0   k0   k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
5  1.0  1.0  1.0  1.0  1.0  1.0   k0   k1  2.0  2.0  2.0  2.0  2.0  2.0       both
6  1.0  1.0  1.0  1.0  1.0  1.0   k0   k1  2.0  2.0  2.0  2.0  2.0  2.0       both
7  1.0  1.0  1.0  1.0  1.0  1.0   k0   k1  2.0  2.0  2.0  2.0  2.0  2.0       both
8  1.0  1.0  1.0  1.0  1.0  1.0   k1   k1  NaN  NaN  NaN  NaN  NaN  NaN  left_only
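
To check the _merge column without reading every row, the labels can be tallied. A small sketch, assuming pandas is installed and using only the key columns of the frames above (`left`, `right`, `merged`, and `counts` are names chosen for this example):

```python
import pandas as pd

# Only the key columns from df_left and df_right are needed for the tally.
left = pd.DataFrame({
    "key1": ["k1", "k0", "k0", "k1", "k1"],
    "key2": ["k0", "k0", "k1", "k1", "k0"],
})
right = pd.DataFrame({
    "key1": ["k1", "k0", "k0", "k0", "k1"],
    "key2": ["k0", "k1", "k1", "k1", "k0"],
})

merged = pd.merge(left, right, on=["key1", "key2"], how="outer", indicator=True)

# Count how many result rows came from both frames or from one side only.
counts = {
    label: int((merged["_merge"] == label).sum())
    for label in ("both", "left_only", "right_only")
}
print(counts)
```

With this data every key pair in the right frame also occurs in the left frame, so no row is labeled right_only.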
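- Merge with how="left", aligning rows on the index, and show where each row came from
# Note: recent pandas versions raise a MergeError when "on" is combined with
# left_index/right_index; the output below comes from a version that accepted it.
pd.merge(left=df_left, right=df_right, on=["key1", "key2"], how="left", left_index=True, right_index=True, indicator=True)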
      a    b    c    d  e_x  f_x key1 key2  e_y  f_y    g    h    j    k     _merge
k1  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
k2  1.0  1.0  1.0  1.0  1.0  1.0   k0   k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
k3  1.0  1.0  1.0  1.0  1.0  1.0   k0   k1  2.0  2.0  2.0  2.0  2.0  2.0       both
k4  1.0  1.0  1.0  1.0  1.0  1.0   k1   k1  2.0  2.0  2.0  2.0  2.0  2.0       both
k5  1.0  1.0  1.0  1.0  1.0  1.0   k1   k0  2.0  2.0  2.0  2.0  2.0  2.0       both
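- Merge with how="right", aligning rows on the index, and add custom suffixes to the overlapping columns
# "on" combined with left_index/right_index is rejected by recent pandas versions (MergeError).
pd.merge(left=df_left, right=df_right, on=["key1", "key2"], how="right", left_index=True, right_index=True, indicator=True, suffixes=("_left", "_right"))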
      a    b    c    d e_left f_left key1 key2 e_right f_right    g    h    j    k      _merge
k3  1.0  1.0  1.0  1.0    1.0    1.0   k0   k1     2.0     2.0  2.0  2.0  2.0  2.0        both
k4  1.0  1.0  1.0  1.0    1.0    1.0   k1   k1     2.0     2.0  2.0  2.0  2.0  2.0        both
k5  1.0  1.0  1.0  1.0    1.0    1.0   k1   k0     2.0     2.0  2.0  2.0  2.0  2.0        both
k6  NaN  NaN  NaN  NaN    NaN    NaN  NaN  NaN     2.0     2.0  2.0  2.0  2.0  2.0  right_only
k7  NaN  NaN  NaN  NaN    NaN    NaN  NaN  NaN     2.0     2.0  2.0  2.0  2.0  2.0  right_only
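
In pandas versions that reject `on` together with `left_index`/`right_index`, an index-aligned merge can be written with the index flags alone. A hedged sketch, assuming pandas is installed and using small stand-in frames (the frames and their values are invented for illustration). Key columns that exist on both sides are then treated like any other overlapping column and receive the suffixes:

```python
import pandas as pd

# Stand-in frames that overlap on the index label "k3" and the column "key1".
left = pd.DataFrame(
    {"a": [1.0, 1.0, 1.0], "key1": ["k1", "k0", "k0"]},
    index=["k1", "k2", "k3"],
)
right = pd.DataFrame(
    {"g": [2.0, 2.0, 2.0], "key1": ["k1", "k0", "k0"]},
    index=["k3", "k4", "k5"],
)

# Align rows on the index labels only; `on` is deliberately not passed.
merged = pd.merge(
    left,
    right,
    how="right",
    left_index=True,
    right_index=True,
    indicator=True,
    suffixes=("_left", "_right"),
)
print(merged)
```

Only the label k3 exists in both stand-in frames, so that row is marked both and the rows for k4 and k5 are marked right_only.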