Pandas Data Combining (concat and merge) Explained

Combining Data

# Prepare the data
import pandas as pd
def make_df(cols, ind):
    '''Build a simple DataFrame whose cells combine the column name and the index label.'''
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, ind)
# Try the helper
df = make_df("ABC", range(5))
print(df)


concat

Combining Series with pd.concat
import pandas as pd
# A simple concatenation
s1 = pd.Series(list("ABC"), index=[1, 2, 3])
s2 = pd.Series(list("DEF"), index=[4, 5, 6])
s = pd.concat([s1, s2])
print("After concatenation: \n", s)
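
pd.concat keeps the original index labels, which matters once the inputs share labels. The following is a minimal sketch, assuming two Series whose indexes overlap at label 3 (the data is made up for illustration); it contrasts the default behaviour with the ignore_index and verify_integrity options of pd.concat.

```python
import pandas as pd

s1 = pd.Series(list("ABC"), index=[1, 2, 3])
s2 = pd.Series(list("DEF"), index=[3, 4, 5])  # label 3 also appears in s1

# Default: labels are kept as they are, so 3 appears twice in the result
kept = pd.concat([s1, s2])

# ignore_index=True discards the original labels and renumbers from 0
renumbered = pd.concat([s1, s2], ignore_index=True)

# verify_integrity=True raises ValueError when labels are duplicated
try:
    pd.concat([s1, s2], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True

print(kept.index.tolist())
print(renumbered.index.tolist())
print(overlap_detected)
```

Which option fits depends on whether the labels carry meaning: renumber when they are just row counters, and verify when a duplicate would indicate a data problem.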


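Combining DataFrames with pd.concat
import pandas as pd
def make_df(cols, index):
    """Build a simple DataFrame."""
    data = {c: [str(c) + str(i) for i in index] for c in cols}
    return pd.DataFrame(data, index)
# Concatenate two DataFrames
df1 = make_df("ABC", [1, 2, 3])
df2 = make_df("DEF", [4, 5, 6])
print("df1 = \n", df1)
print("\n df2 = \n", df2)
# Stack the frames row-wise; a column missing from one frame is filled with NaN
df3 = pd.concat([df1, df2])
print("\n df3 = \n", df3)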
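
For DataFrames, pd.concat also has to decide what to do with columns that only some inputs have, and along which axis to stack. A minimal sketch, assuming two frames that share columns B and C (the frames are illustrative, built with the same make_df helper); it shows the join and axis parameters of pd.concat.

```python
import pandas as pd

def make_df(cols, index):
    """Build a simple DataFrame."""
    data = {c: [str(c) + str(i) for i in index] for c in cols}
    return pd.DataFrame(data, index)

df1 = make_df("ABC", [1, 2, 3])
df2 = make_df("BCD", [3, 4, 5])  # shares columns B and C with df1

# Row-wise with the default join="outer": union of columns, gaps become NaN
rows_outer = pd.concat([df1, df2])

# join="inner" keeps only the columns present in every input
rows_inner = pd.concat([df1, df2], join="inner")

# axis=1 places frames side by side, aligning them on the row index
side_by_side = pd.concat([make_df("AB", [1, 2]), make_df("CD", [1, 2])], axis=1)

print(rows_outer.columns.tolist())
print(rows_inner.columns.tolist())
print(side_by_side.shape)
```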


merge

import pandas as pd
# Prepare the data
df1 = pd.DataFrame({"name": list("ABCD"), "group": ["I", "II", "III", "II"]})
df2 = pd.DataFrame({"name": list("ABCD"), "score": [61, 78, 74, 98]})
df3 = pd.DataFrame({"group": ["I", "II", "III"], "leader": ["Alice", "Bob", "Cindy"]})
# df4 lists the skills each group needs; a group with several skills appears on several rows
df4 = pd.DataFrame({
    "group": ["I", "I", "II", "II", "II", "III", "III"],
    "skills": ["Linux", "Python", "Java", "Math", "English", "C++", "PHP"]})
print("df1 = \n", df1)
print("\n df2 = \n", df2)
print("\n df3 = \n", df3)
print("\n df4 = \n", df4)


One-to-one joins
import pandas as pd
# Prepare the data
df1 = pd.DataFrame({"name": list("ABCD"), "group": ["I", "II", "III", "II"]})
df2 = pd.DataFrame({"name": list("ABCD"), "score": [61, 78, 74, 98]})

# "name" is unique in both frames, so each row matches exactly one row
df5 = pd.merge(df1, df2)
print("df5 = \n", df5)


Many-to-one joins
import pandas as pd
# Prepare the data
df1 = pd.DataFrame({"name": list("ABCD"), "group": ["I", "II", "III", "II"]})
df2 = pd.DataFrame({"name": list("ABCD"), "score": [61, 78, 74, 98]})
df3 = pd.DataFrame({"group": ["I", "II", "III"], "leader": ["Alice", "Bob", "Cindy"]})

# "group" repeats in df1 but is unique in df3, so a leader can be copied onto several rows
df5 = pd.merge(df1, df3)
print("df5 = \n", df5)


Many-to-many joins
import pandas as pd
# Prepare the data
df1 = pd.DataFrame({"name": list("ABCD"), "group": ["I", "II", "III", "II"]})
df2 = pd.DataFrame({"name": list("ABCD"), "score": [61, 78, 74, 98]})
df3 = pd.DataFrame({"group": ["I", "II", "III"], "leader": ["Alice", "Bob", "Cindy"]})
# df4 lists the skills each group needs; a group with several skills appears on several rows
df4 = pd.DataFrame({
    "group": ["I", "I", "II", "II", "II", "III", "III"],
    "skills": ["Linux", "Python", "Java", "Math", "English", "C++", "PHP"]})

# "group" repeats in both frames, so every matching pair of rows is produced
df5 = pd.merge(df1, df4)
print("df5 = \n", df5)
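
The three join types differ only in whether the key is unique on each side, and pd.merge can check that assumption through its validate parameter. The sketch below reuses the same four frames; the result names (one_to_one, many_to_one, many_to_many) are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"name": list("ABCD"), "group": ["I", "II", "III", "II"]})
df2 = pd.DataFrame({"name": list("ABCD"), "score": [61, 78, 74, 98]})
df3 = pd.DataFrame({"group": ["I", "II", "III"], "leader": ["Alice", "Bob", "Cindy"]})
df4 = pd.DataFrame({
    "group": ["I", "I", "II", "II", "II", "III", "III"],
    "skills": ["Linux", "Python", "Java", "Math", "English", "C++", "PHP"],
})

one_to_one = pd.merge(df1, df2, validate="one_to_one")    # name is unique on both sides
many_to_one = pd.merge(df1, df3, validate="many_to_one")  # group is unique only in df3
many_to_many = pd.merge(df1, df4)                         # group repeats on both sides

# Each df1 row pairs with every df4 row of its group: 2 + 3 + 2 + 3 rows
print(len(one_to_one), len(many_to_one), len(many_to_many))

# Declaring the wrong relationship raises MergeError instead of silently duplicating rows
try:
    pd.merge(df1, df4, validate="many_to_one")
    caught = False
except pd.errors.MergeError:
    caught = True
print(caught)
```

Passing validate is optional, but it turns an unexpected many-to-many row explosion into an immediate error.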


Using the on parameter

The on parameter names the column (or list of columns) to use as the join key. The column must exist in both DataFrames.

import pandas as pd
# Prepare the data
df1 = pd.DataFrame({"name": list("ABCD"), "group": ["I", "II", "III", "II"]})
df2 = pd.DataFrame({"name": list("ABCD"), "score": [61, 78, 74, 98]})
df3 = pd.DataFrame({"group": ["I", "II", "III"], "leader": ["Alice", "Bob", "Cindy"]})
# df4 lists the skills each group needs; a group with several skills appears on several rows
df4 = pd.DataFrame({
    "group": ["I", "I", "II", "II", "II", "III", "III"],
    "skills": ["Linux", "Python", "Java", "Math", "English", "C++", "PHP"]})

# Join explicitly on the "name" column
df5 = pd.merge(df1, df2, on="name")
print("df5 = \n", df5)
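
Specifying on matters most when the frames share more than one column, because without it pd.merge joins on every shared column. A minimal sketch, assuming two small frames (left and right, made up for illustration) that share both name and group:

```python
import pandas as pd

left = pd.DataFrame({"name": list("AB"), "group": ["I", "II"], "score": [61, 78]})
right = pd.DataFrame({"name": list("AB"), "group": ["I", "III"], "age": [30, 41]})

# No "on": the key is every shared column, so name AND group must both match
default_keys = pd.merge(left, right)

# on="name": only name is the key; the other shared column gets _x / _y suffixes
by_name = pd.merge(left, right, on="name")

# suffixes= replaces the default _x / _y with more readable labels
labelled = pd.merge(left, right, on="name", suffixes=("_left", "_right"))

print(len(default_keys), len(by_name))
print(by_name.columns.tolist())
print(labelled.columns.tolist())
```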


Using the left_on and right_on parameters
import pandas as pd
# Prepare the data; the key column is called "name" in df1 but "my_name" in df2
df1 = pd.DataFrame({"name": list("ABCD"), "group": ["I", "II", "III", "II"]})
df2 = pd.DataFrame({"my_name": list("ABCD"), "score": [61, 78, 74, 98]})
df3 = pd.DataFrame({"group": ["I", "II", "III"], "leader": ["Alice", "Bob", "Cindy"]})
# df4 lists the skills each group needs; a group with several skills appears on several rows
df4 = pd.DataFrame({
    "group": ["I", "I", "II", "II", "II", "III", "III"],
    "skills": ["Linux", "Python", "Java", "Math", "English", "C++", "PHP"]})

# Name the key column on each side separately
df5 = pd.merge(df1, df2, left_on="name", right_on="my_name")
print("df5 = \n", df5)
# Both key columns are kept in the result, so drop the redundant one
print("\n df5.drop = \n", df5.drop("my_name", axis=1))
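
An alternative to dropping the redundant key afterwards is to rename it before merging. This is only a sketch of that option, using DataFrame.rename; the variable names dropped and renamed are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"name": list("ABCD"), "group": ["I", "II", "III", "II"]})
df2 = pd.DataFrame({"my_name": list("ABCD"), "score": [61, 78, 74, 98]})

# Approach from the text: merge on differently named keys, then drop one of them
dropped = pd.merge(df1, df2, left_on="name", right_on="my_name").drop(columns="my_name")

# Alternative: give both keys the same name first, then merge with on=
renamed = pd.merge(df1, df2.rename(columns={"my_name": "name"}), on="name")

print(dropped.columns.tolist())
print(renamed.columns.tolist())
```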


Using the left_index and right_index parameters
import pandas as pd
# Prepare the data
df1 = pd.DataFrame({"name": list("ABCD"), "group": ["I", "II", "III", "II"]})
df2 = pd.DataFrame({"my_name": list("ABCD"), "score": [61, 78, 74, 98]})
df3 = pd.DataFrame({"group": ["I", "II", "III"], "leader": ["Alice", "Bob", "Cindy"]})
# df4 lists the skills each group needs; a group with several skills appears on several rows
df4 = pd.DataFrame({
    "group": ["I", "I", "II", "II", "II", "III", "III"],
    "skills": ["Linux", "Python", "Java", "Math", "English", "C++", "PHP"]})

# Use the row index of each frame as the join key instead of a column
df5 = pd.merge(df1, df2, left_index=True, right_index=True)
print("df5 = \n", df5)
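
Merging on the default integer index pairs rows by position, which gives the intended result here only because both frames list the names in the same order. A minimal sketch of the difference, assuming a df2 whose rows are reversed (illustrative data) and using set_index to move the names into the index:

```python
import pandas as pd

df1 = pd.DataFrame({"name": list("ABCD"), "group": ["I", "II", "III", "II"]})
# Illustrative data: same scores as before, but rows listed in reverse order
df2 = pd.DataFrame({"my_name": list("DCBA"), "score": [98, 74, 78, 61]})

# Default RangeIndex: row 0 is paired with row 0, so A receives D's score
by_position = pd.merge(df1, df2, left_index=True, right_index=True)

# With the names as index labels, the index merge aligns by name
left = df1.set_index("name")
right = df2.set_index("my_name")
by_label = pd.merge(left, right, left_index=True, right_index=True)

# DataFrame.join merges on the index by default (as a left join)
joined = left.join(right)

print(by_position.loc[0, ["name", "my_name", "score"]].tolist())
print(by_label.loc["A", "score"], joined.loc["A", "score"])
```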


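Using the how parameter
  • Inner join: how="inner" keeps only the keys present in both DataFrames (the intersection). This is the default.
  • Outer join: how="outer" keeps the keys from either DataFrame (the union); cells with no match are filled with NaN.
  • Left join: how="left" keeps every row of the left DataFrame, plus the matching data from the right.
  • Right join: how="right" keeps every row of the right DataFrame, plus the matching data from the left.
import pandas as pd
# Prepare the data; the two frames have no names in common
df1 = pd.DataFrame({"name": list("ABCD"), "group": [1, 2, 3, 4]})
df2 = pd.DataFrame({"name": list("EFGH"), "score": [61, 78, 74, 98]})

# Outer join: all eight names are kept, with NaN where a frame has no data
df5 = pd.merge(df1, df2, how="outer")
print("df5 = \n", df5)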


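import pandas as pd
# Prepare the data; the two frames have no names in common
df1 = pd.DataFrame({"name": list("ABCD"), "group": [1, 2, 3, 4]})
df2 = pd.DataFrame({"name": list("EFGH"), "score": [61, 78, 74, 98]})

# Inner join: no name appears in both frames, so the result is empty
df5 = pd.merge(df1, df2, how="inner")
print("df5 = \n", df5)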
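
The four join types are easier to compare when the keys overlap only partly. The sketch below assumes a df2 that shares the names C and D with df1 (illustrative data) and uses the indicator parameter of pd.merge, which adds a _merge column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({"name": list("ABCD"), "group": [1, 2, 3, 4]})
df2 = pd.DataFrame({"name": list("CDEF"), "score": [61, 78, 74, 98]})  # shares C and D

# Run the same merge with each join type; indicator=True adds the _merge column
results = {
    how: pd.merge(df1, df2, how=how, indicator=True)
    for how in ["inner", "outer", "left", "right"]
}

for how, merged in results.items():
    print(how, sorted(merged["name"].tolist()))

# In the outer join, _merge is "left_only", "right_only", or "both" for each row
print(results["outer"][["name", "_merge"]])
```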
