Python data merging with merge(): joining data in Pandas

27. Joining data in Pandas: the merge function

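The concat function supports "inner" and "outer" joins, but only in the sense of aligning row or column labels. The pandas merge function performs true database-style inner and outer joins on key columns, and its outer joins also come in left and right variants.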
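To make the contrast concrete, the following is a minimal sketch of how concat treats "inner" and "outer" compared with merge. The frames `a` and `b` and their contents are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Two small frames that share only the column "class_id" (hypothetical data).
a = pd.DataFrame({"class_id": [100, 101], "class_name": ["IT", "CS"]})
b = pd.DataFrame({"class_id": [100, 200], "stu_id": [20181115, 20181120]})

# concat stacks the rows; join= only decides which column labels survive.
stacked_inner = pd.concat([a, b], join="inner", ignore_index=True)
stacked_outer = pd.concat([a, b], join="outer", ignore_index=True)

# merge matches rows by the values in the key column, like a SQL join.
joined = a.merge(b, on="class_id")

print(stacked_inner)  # 4 rows, only the shared column class_id
print(stacked_outer)  # 4 rows, all three columns, gaps filled with NaN
print(joined)         # 1 row: class_id 100 is the only key present in both
```

In other words, concat never compares key values between rows, which is why merge is the tool to reach for when a database-style join is needed.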

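By default, merge performs an inner join, and when no key is given it joins on the columns that both DataFrames have in common (here class_id). The example below uses student course selection: two DataFrames are built and then joined with merge.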

import pandas as pd

col1 = "class_name class_id class_lecturer".split()
col2 = "class_id stu_id".split()
val1 = [["IT", 100, "Wangli"], ["CS", 101, "WangMa"], ["CAD", 102, "Liping"]]
val2 = [[100, 20181115], [100, 20181116], [101, 20181117]]

# course: one row per class
course = pd.DataFrame(val1, columns=col1)
print("***course", "*" * 38)
print(course)

# choose: one row per student selection
choose = pd.DataFrame(val2, columns=col2)
print("***choose", "*" * 38)
print(choose)

# Default merge: inner join on the shared column class_id
print("***course merge choose", "*" * 25)
print(course.merge(choose))
print("***choose merge course", "*" * 25)
print(choose.merge(course))

Output:

***course **************************************
  class_name class_id class_lecturer
0         IT      100         Wangli
1         CS      101         WangMa
2        CAD      102         Liping
***choose **************************************
  class_id    stu_id
0      100  20181115
1      100  20181116
2      101  20181117
***course merge choose *************************
  class_name class_id class_lecturer    stu_id
0         IT      100         Wangli  20181115
1         IT      100         Wangli  20181116
2         CS      101         WangMa  20181117
***choose merge course *************************
  class_id    stu_id class_name class_lecturer
0      100  20181115         IT         Wangli
1      100  20181116         IT         Wangli
2      101  20181117         CS         WangMa
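The calls above rely on the defaults. As a small sketch, the same join can be written with the key and join type spelled out through the `on` and `how` parameters, or through the function form `pd.merge`. The variable names `implicit`, `explicit` and `functional` are only illustrative.

```python
import pandas as pd

course = pd.DataFrame(
    [["IT", 100, "Wangli"], ["CS", 101, "WangMa"], ["CAD", 102, "Liping"]],
    columns=["class_name", "class_id", "class_lecturer"],
)
choose = pd.DataFrame(
    [[100, 20181115], [100, 20181116], [101, 20181117]],
    columns=["class_id", "stu_id"],
)

# With no arguments, merge joins on every column name the frames share
# (here only class_id) and uses how="inner".
implicit = course.merge(choose)

# Spelling out the key and the join type gives the same result.
explicit = course.merge(choose, on="class_id", how="inner")

# pd.merge is the function form of the same operation.
functional = pd.merge(course, choose, on="class_id", how="inner")

print(implicit.equals(explicit))
print(explicit.equals(functional))
```

Naming the key explicitly is usually safer, because an implicit join silently changes meaning if the two frames later gain another column with the same name.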

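Outer join with merge. With how="outer", every row of both DataFrames appears in the result, and cells with no matching row are filled with NaN. Because NaN is a floating-point value, an integer column that receives NaN is upcast to float64, so recent pandas versions may print stu_id values as 20181115.0 rather than 20181115.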

import pandas as pd

col1 = "class_name class_id class_lecturer".split()
col2 = "class_id stu_id".split()
val1 = [["IT", 100, "Wangli"], ["CS", 101, "WangMa"], ["CAD", 102, "Liping"],
        ["ME", 103, "Wufang"], ["IT", 104, "Xiaomin"]]
val2 = [[100, 20181115], [100, 20181116], [101, 20181117]]

course = pd.DataFrame(val1, columns=col1)
print("***course", "*" * 38)
print(course)

choose = pd.DataFrame(val2, columns=col2)
print("***choose", "*" * 38)
print(choose)

# Inner join keeps only matched class_id values; outer join keeps all rows
print("***course merge choose in inner", "*" * 25)
print(course.merge(choose, how="inner"))
print("***course merge choose in outer", "*" * 25)
print(course.merge(choose, how="outer"))
print("***choose merge course in inner", "*" * 25)
print(choose.merge(course, how="inner"))
print("***choose merge course in outer", "*" * 25)
print(choose.merge(course, how="outer"))

Output:

***course **************************************
  class_name class_id class_lecturer
0         IT      100         Wangli
1         CS      101         WangMa
2        CAD      102         Liping
3         ME      103         Wufang
4         IT      104        Xiaomin
***choose **************************************
  class_id    stu_id
0      100  20181115
1      100  20181116
2      101  20181117
***course merge choose in inner *************************
  class_name class_id class_lecturer    stu_id
0         IT      100         Wangli  20181115
1         IT      100         Wangli  20181116
2         CS      101         WangMa  20181117
***course merge choose in outer *************************
  class_name class_id class_lecturer    stu_id
0         IT      100         Wangli  20181115
1         IT      100         Wangli  20181116
2         CS      101         WangMa  20181117
3        CAD      102         Liping       NaN
4         ME      103         Wufang       NaN
5         IT      104        Xiaomin       NaN
***choose merge course in inner *************************
  class_id    stu_id class_name class_lecturer
0      100  20181115         IT         Wangli
1      100  20181116         IT         Wangli
2      101  20181117         CS         WangMa
***choose merge course in outer *************************
  class_id    stu_id class_name class_lecturer
0      100  20181115         IT         Wangli
1      100  20181116         IT         Wangli
2      101  20181117         CS         WangMa
3      102       NaN        CAD         Liping
4      103       NaN         ME         Wufang
5      104       NaN         IT        Xiaomin
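When reading an outer join it can help to see which side each row came from. The sketch below uses the `indicator` parameter of merge, which adds a `_merge` column; the shortened data set is an assumption made to keep the example small.

```python
import pandas as pd

course = pd.DataFrame(
    [["IT", 100, "Wangli"], ["CS", 101, "WangMa"], ["CAD", 102, "Liping"]],
    columns=["class_name", "class_id", "class_lecturer"],
)
choose = pd.DataFrame(
    [[100, 20181115], [101, 20181117]],
    columns=["class_id", "stu_id"],
)

# indicator=True adds a "_merge" column whose value is
# "both", "left_only" or "right_only" for each row.
outer = course.merge(choose, on="class_id", how="outer", indicator=True)
print(outer)

# The unmatched course (CAD) has no stu_id, so that column holds NaN
# and its dtype is upcast from int64 to float64.
print(outer["stu_id"].dtype)
```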

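Left and right joins with merge. The DataFrame on which merge is called is the "left table", and the DataFrame passed as the argument is the "right table". A left join keeps every row of the left table and adds the matching rows of the right table, filling NaN where there is no match. Likewise, a right join keeps every row of the right table and adds the matching rows of the left table, filling NaN where there is no match.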

import pandas as pd

col1 = "class_name class_id class_lecturer".split()
col2 = "class_id stu_id".split()
val1 = [["IT", 100, "Wangli"], ["CS", 101, "WangMa"], ["CAD", 102, "Liping"],
        ["ME", 103, "Wufang"], ["IT", 104, "Xiaomin"]]
val2 = [[100, 20181115], [100, 20181116], [101, 20181117],
        [100, 20181118], [101, 20181119], [200, 20181120]]

course = pd.DataFrame(val1, columns=col1)
print("***course", "*" * 38)
print(course)

choose = pd.DataFrame(val2, columns=col2)
print("***choose", "*" * 38)
print(choose)

# The caller is the left table; the argument is the right table
print("***course merge choose in left", "*" * 25)
print(course.merge(choose, how="left"))
print("***course merge choose in right", "*" * 25)
print(course.merge(choose, how="right"))
print("***choose merge course in left", "*" * 25)
print(choose.merge(course, how="left"))
print("***choose merge course in right", "*" * 25)
print(choose.merge(course, how="right"))

Output:

***course **************************************
  class_name class_id class_lecturer
0         IT      100         Wangli
1         CS      101         WangMa
2        CAD      102         Liping
3         ME      103         Wufang
4         IT      104        Xiaomin
***choose **************************************
  class_id    stu_id
0      100  20181115
1      100  20181116
2      101  20181117
3      100  20181118
4      101  20181119
5      200  20181120
***course merge choose in left *************************
  class_name class_id class_lecturer    stu_id
0         IT      100         Wangli  20181115
1         IT      100         Wangli  20181116
2         IT      100         Wangli  20181118
3         CS      101         WangMa  20181117
4         CS      101         WangMa  20181119
5        CAD      102         Liping       NaN
6         ME      103         Wufang       NaN
7         IT      104        Xiaomin       NaN
***course merge choose in right *************************
  class_name class_id class_lecturer    stu_id
0         IT      100         Wangli  20181115
1         IT      100         Wangli  20181116
2         IT      100         Wangli  20181118
3         CS      101         WangMa  20181117
4         CS      101         WangMa  20181119
5        NaN      200            NaN  20181120
***choose merge course in left *************************
  class_id    stu_id class_name class_lecturer
0      100  20181115         IT         Wangli
1      100  20181116         IT         Wangli
2      101  20181117         CS         WangMa
3      100  20181118         IT         Wangli
4      101  20181119         CS         WangMa
5      200  20181120        NaN            NaN
***choose merge course in right *************************
  class_id    stu_id class_name class_lecturer
0      100  20181115         IT         Wangli
1      100  20181116         IT         Wangli
2      100  20181118         IT         Wangli
3      101  20181117         CS         WangMa
4      101  20181119         CS         WangMa
5      102       NaN        CAD         Liping
6      103       NaN         ME         Wufang
7      104       NaN         IT        Xiaomin

Note the selection record [200, 20181120]: class_id 200 does not exist in course. Conversely, the three courses ["CAD", 102, "Liping"], ["ME", 103, "Wufang"] and ["IT", 104, "Xiaomin"] have no students enrolled.
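The unmatched rows pointed out here can also be found programmatically. The following is one possible sketch, assuming an outer merge with `indicator=True`; the names `no_students` and `unknown_course` are illustrative.

```python
import pandas as pd

course = pd.DataFrame(
    [["IT", 100, "Wangli"], ["CS", 101, "WangMa"], ["CAD", 102, "Liping"],
     ["ME", 103, "Wufang"], ["IT", 104, "Xiaomin"]],
    columns=["class_name", "class_id", "class_lecturer"],
)
choose = pd.DataFrame(
    [[100, 20181115], [100, 20181116], [101, 20181117],
     [100, 20181118], [101, 20181119], [200, 20181120]],
    columns=["class_id", "stu_id"],
)

merged = course.merge(choose, on="class_id", how="outer", indicator=True)

# Courses that no student chose: rows found only in the left frame.
no_students = merged.loc[merged["_merge"] == "left_only", "class_id"].tolist()

# Selections whose class_id is missing from course: rows found only in the right frame.
unknown_course = merged.loc[merged["_merge"] == "right_only", "class_id"].tolist()

print(sorted(no_students))
print(sorted(unknown_course))
```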

As these results show, the left join and right join of merge correspond directly to LEFT JOIN and RIGHT JOIN on database tables.
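One way to check this correspondence is to run the same join in a real database. The sketch below loads shortened versions of the two tables into an in-memory SQLite database and compares a SQL LEFT JOIN with merge(how="left"). The SQL query, the helper `pairs` and the reduced data are assumptions added for illustration.

```python
import sqlite3

import pandas as pd

course = pd.DataFrame(
    [["IT", 100, "Wangli"], ["CS", 101, "WangMa"], ["CAD", 102, "Liping"]],
    columns=["class_name", "class_id", "class_lecturer"],
)
choose = pd.DataFrame(
    [[100, 20181115], [101, 20181117], [200, 20181120]],
    columns=["class_id", "stu_id"],
)

# Load both frames into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
course.to_sql("course", conn, index=False)
choose.to_sql("choose", conn, index=False)

sql = """
SELECT c.class_name, c.class_id, c.class_lecturer, s.stu_id
FROM course AS c
LEFT JOIN choose AS s ON c.class_id = s.class_id
"""
from_sql = pd.read_sql_query(sql, conn)
conn.close()

from_pandas = course.merge(choose, on="class_id", how="left")


def pairs(df):
    # Compare as a set of (class_id, stu_id) pairs so row order does not matter;
    # missing stu_id values become -1 because NaN never equals NaN.
    return set(zip(df["class_id"], df["stu_id"].fillna(-1).astype(int)))


print(pairs(from_sql) == pairs(from_pandas))
```

The one visible difference is the representation of missing values: the database returns NULL, which pandas shows as NaN or None.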
