14 Merging Data in Pandas with concat

Use cases:

Batch-merging Excel files that share the same format, adding rows to a DataFrame, and adding columns to a DataFrame.
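The batch-merge use case can be sketched roughly as follows. The directory, file names, and column names are invented for illustration, and CSV files stand in for Excel workbooks so that the sketch runs without an Excel engine installed. With real workbooks, pd.read_excel would take the place of pd.read_csv.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical setup: write three small files that share the same columns.
tmp_dir = Path(tempfile.mkdtemp())
for month in (1, 2, 3):
    monthly = pd.DataFrame({"month": [month, month], "sales": [10 * month, 20 * month]})
    monthly.to_csv(tmp_dir / f"sales_{month}.csv", index=False)

# Read every file into its own DataFrame, then merge them with a single concat call.
frames = [pd.read_csv(path) for path in sorted(tmp_dir.glob("sales_*.csv"))]
merged = pd.concat(frames, ignore_index=True)

print(merged.shape)
```

Reading all files first and calling concat once keeps the merge to a single copy of the data.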

The concat syntax in one sentence:

  • using a given join method (inner/outer),
  • along a given axis (axis=0/1),
  • combine several Pandas objects (DataFrame/Series) into one.

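concat syntax (main parameters): pandas.concat(objs, axis=0, join='outer', ignore_index=False)

  • objs: a list whose items can be DataFrames or Series; the two can be mixed
  • axis: defaults to 0, which stacks the objects row-wise; 1 places them side by side as columns
  • join: how the labels on the other axis are aligned; defaults to an outer join, and an inner join is also available
  • ignore_index: whether to discard the original index labels and renumber the result from 0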

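append syntax: DataFrame.append(other, ignore_index=False)

append only merges row-wise, never column-wise; it is effectively shorthand for a row-wise concat. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pandas.concat instead.

  • other: a single DataFrame, Series, or dict, or a list of them
  • ignore_index: whether to discard the original index labels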
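Since append is described above as shorthand for a row-wise concat, the same results can be obtained with pandas.concat alone. The following is a minimal sketch; the frames and the new_row helper are illustrative values, not part of the original article.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))

# Counterpart of df1.append(df2, ignore_index=True)
appended = pd.concat([df1, df2], ignore_index=True)

# Counterpart of df1.append({"A": 9, "B": 10}, ignore_index=True):
# wrap the dict in a one-row DataFrame before concatenating.
new_row = pd.DataFrame([{"A": 9, "B": 10}])
with_row = pd.concat([df1, new_row], ignore_index=True)

print(appended)
print(with_row)
```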

Reference documentation:

  • pandas.concat API documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
  • pandas.concat tutorial: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
  • DataFrame.append API documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

    import pandas as pd
    import warnings

    # Silence warnings in the notebook output
    warnings.filterwarnings('ignore')

I. Merging data with pandas.concat

    df1 = pd.DataFrame({
        'A': ['A0', 'A1', 'A2', 'A3'],
        'B': ['B0', 'B1', 'B2', 'B3'],
        'C': ['C0', 'C1', 'C2', 'C3'],
        'D': ['D0', 'D1', 'D2', 'D3'],
        'E': ['E0', 'E1', 'E2', 'E3'],
    })
    df1

        A   B   C   D   E
    0  A0  B0  C0  D0  E0
    1  A1  B1  C1  D1  E1
    2  A2  B2  C2  D2  E2
    3  A3  B3  C3  D3  E3

    df2 = pd.DataFrame({
        'A': ['A4', 'A5', 'A6', 'A7'],
        'B': ['B4', 'B5', 'B6', 'B7'],
        'C': ['C4', 'C5', 'C6', 'C7'],
        'D': ['D4', 'D5', 'D6', 'D7'],
        'F': ['F4', 'F5', 'F6', 'F7'],
    })
    df2

        A   B   C   D   F
    0  A4  B4  C4  D4  F4
    1  A5  B5  C5  D5  F5
    2  A6  B6  C6  D6  F6
    3  A7  B7  C7  D7  F7

1. The default concat, with axis=0, join='outer', ignore_index=False

    pd.concat([df1, df2])

        A   B   C   D    E    F
    0  A0  B0  C0  D0   E0  NaN
    1  A1  B1  C1  D1   E1  NaN
    2  A2  B2  C2  D2   E2  NaN
    3  A3  B3  C3  D3   E3  NaN
    0  A4  B4  C4  D4  NaN   F4
    1  A5  B5  C5  D5  NaN   F5
    2  A6  B6  C6  D6  NaN   F6
    3  A7  B7  C7  D7  NaN   F7

2. Use ignore_index=True to discard the original index

    pd.concat([df1, df2], ignore_index=True)

        A   B   C   D    E    F
    0  A0  B0  C0  D0   E0  NaN
    1  A1  B1  C1  D1   E1  NaN
    2  A2  B2  C2  D2   E2  NaN
    3  A3  B3  C3  D3   E3  NaN
    4  A4  B4  C4  D4  NaN   F4
    5  A5  B5  C5  D5  NaN   F5
    6  A6  B6  C6  D6  NaN   F6
    7  A7  B7  C7  D7  NaN   F7
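One practical reason to pass ignore_index=True is that the default result keeps each input's labels, so the index can contain duplicates. A small sketch with illustrative frames named left and right (not from the original article):

```python
import pandas as pd

left = pd.DataFrame({"A": ["A0", "A1"]})
right = pd.DataFrame({"A": ["A2", "A3"]})

kept = pd.concat([left, right])                      # index labels: 0, 1, 0, 1
reset = pd.concat([left, right], ignore_index=True)  # index labels: 0, 1, 2, 3

print(kept.index.is_unique)
print(reset.index.is_unique)

# With duplicate labels, a lookup by label returns one row per input frame.
print(kept.loc[0])
```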

3. Use join='inner' to drop the columns that do not appear in every input

    pd.concat([df1, df2], ignore_index=True, join="inner")

        A   B   C   D
    0  A0  B0  C0  D0
    1  A1  B1  C1  D1
    2  A2  B2  C2  D2
    3  A3  B3  C3  D3
    4  A4  B4  C4  D4
    5  A5  B5  C5  D5
    6  A6  B6  C6  D6
    7  A7  B7  C7  D7
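The join parameter always applies to the axis that is not being concatenated. With axis=0 it filters columns, as above; with axis=1 it filters row labels instead. A minimal sketch with two illustrative frames, a and b, whose indexes only partly overlap:

```python
import pandas as pd

a = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
b = pd.DataFrame({"B": ["B1", "B2", "B3"]}, index=[1, 2, 3])

# Outer join: every row label from either frame is kept; gaps become NaN.
outer = pd.concat([a, b], axis=1)

# Inner join: only the row labels present in both frames survive.
inner = pd.concat([a, b], axis=1, join="inner")

print(outer)
print(inner)
```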

4. Using axis=1 is equivalent to adding new columns

    df1

        A   B   C   D   E
    0  A0  B0  C0  D0  E0
    1  A1  B1  C1  D1  E1
    2  A2  B2  C2  D2  E2
    3  A3  B3  C3  D3  E3

A: Adding a single Series as a column

    s1 = pd.Series(list(range(4)), name="F")
    pd.concat([df1, s1], axis=1)

        A   B   C   D   E  F
    0  A0  B0  C0  D0  E0  0
    1  A1  B1  C1  D1  E1  1
    2  A2  B2  C2  D2  E2  2
    3  A3  B3  C3  D3  E3  3

B: Adding several Series as columns

    s2 = df1.apply(lambda x: x["A"] + "_GG", axis=1)
    s2

    0    A0_GG
    1    A1_GG
    2    A2_GG
    3    A3_GG
    dtype: object

    s2.name = "G"
    pd.concat([df1, s1, s2], axis=1)

        A   B   C   D   E  F      G
    0  A0  B0  C0  D0  E0  0  A0_GG
    1  A1  B1  C1  D1  E1  1  A1_GG
    2  A2  B2  C2  D2  E2  2  A2_GG
    3  A3  B3  C3  D3  E3  3  A3_GG

    # The list may contain only Series
    pd.concat([s1, s2], axis=1)

       F      G
    0  0  A0_GG
    1  1  A1_GG
    2  2  A2_GG
    3  3  A3_GG

    # DataFrames and Series can appear in the list in any order
    pd.concat([s1, df1, s2], axis=1)

       F   A   B   C   D   E      G
    0  0  A0  B0  C0  D0  E0  A0_GG
    1  1  A1  B1  C1  D1  E1  A1_GG
    2  2  A2  B2  C2  D2  E2  A2_GG
    3  3  A3  B3  C3  D3  E3  A3_GG
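The examples above set a name on each Series before concatenating, because that name becomes the column label. The sketch below, with illustrative data, shows what happens when the name is missing: pandas falls back to an integer label.

```python
import pandas as pd

df = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
unnamed = pd.Series([10, 20])            # no name set
named = pd.Series([10, 20], name="F")    # name becomes the column label

with_unnamed = pd.concat([df, unnamed], axis=1)
with_named = pd.concat([df, named], axis=1)

print(list(with_unnamed.columns))
print(list(with_named.columns))
```

Setting name up front, as s1 and s2 do, avoids a separate rename step afterwards.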

II. Merging data row-wise with DataFrame.append

The examples in this part call DataFrame.append, which is available only in pandas versions earlier than 2.0.

    df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
    df1

       A  B
    0  1  2
    1  3  4

    df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
    df2

       A  B
    0  5  6
    1  7  8

1. Appending one DataFrame to another

    df1.append(df2)

       A  B
    0  1  2
    1  3  4
    0  5  6
    1  7  8

2. Discarding the original index with ignore_index=True

    df1.append(df2, ignore_index=True)

       A  B
    0  1  2
    1  3  4
    2  5  6
    3  7  8

3. Rows can be added to a DataFrame one at a time

    # An empty DataFrame with a single column
    df = pd.DataFrame(columns=['A'])
    df

    A

A: Slow version

    for i in range(5):
        # Note: every iteration copies the whole DataFrame
        df = df.append({'A': i}, ignore_index=True)
    df

       A
    0  0
    1  1
    2  2
    3  3
    4  4

B: Faster version

    # The first argument is a list, which avoids the repeated copying
    pd.concat(
        [pd.DataFrame([i], columns=['A']) for i in range(5)],
        ignore_index=True
    )

       A
    0  0
    1  1
    2  2
    3  3
    4  4
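A related pattern, shown here as a sketch rather than as part of the original article, is to collect the rows as plain dicts and build the DataFrame once at the end. It follows the same idea as the faster version: a single construction step instead of a copy on every iteration. The rows list is an illustrative name.

```python
import pandas as pd

rows = []
for i in range(5):
    # Appending to a Python list is cheap and does not copy any DataFrame.
    rows.append({"A": i})

# Build the DataFrame in one step from the collected rows.
df = pd.DataFrame(rows)
print(df)
```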
