A summary of key points about pandas concat.
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))
df3 = pd.DataFrame(np.random.random((2, 3)), columns=list("ABC"))
Printing df1, df2, and df3 gives:
A B C
0 0.169679 0.250973 0.033926
1 0.517431 0.069529 0.117868
2 0.035198 0.693947 0.791094
A B D
0 0.285279 0.944059 0.433911
A B C
0 0.585748 0.851175 0.810830
1 0.901091 0.910537 0.475615
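All parameters at their defaults
The default is axis=0, which stacks the frames vertically (rows are appended along the index). On the other axis, the columns are the union of the inputs' columns. The index is not renumbered by default; each row keeps its original label.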
result = pd.concat([df1, df2])
Result:
A B C D
0 0.156784 0.940447 0.522436 NaN
1 0.728026 0.669282 0.495065 NaN
2 0.947834 0.150463 0.804081 NaN
0 0.891067 0.020196 NaN 0.242114
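A minimal sketch of the default behaviour, using small fixed values instead of random numbers so the result is easy to check. The names `left` and `right` are illustrative stand-ins for df1 and df2, not part of the original example:

```python
import pandas as pd

# Small fixed frames standing in for df1 and df2 (illustrative values).
left = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
right = pd.DataFrame({"A": [10], "B": [11], "D": [12]})

# Default call: axis=0, join='outer', ignore_index=False.
result = pd.concat([left, right])

print(list(result.columns))  # union of the column labels
print(list(result.index))    # original row labels are kept, so 0 appears twice
print(result.loc[0])         # selecting label 0 therefore returns two rows
```

Because the row labels are kept, label-based lookups on the result can return more than one row, which is worth keeping in mind before indexing into it.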
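Concatenating along each axis
With axis=0 the frames are stacked vertically: the columns are the union of the inputs' columns, and the row indexes are concatenated as-is without being renumbered.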
result = pd.concat([df1, df2], axis=0)
Result:
A B C D
0 0.790409 0.950151 0.125780 NaN
1 0.074662 0.856551 0.558453 NaN
2 0.558790 0.418458 0.553458 NaN
0 0.244919 0.550575 NaN 0.778046
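With axis=1 the frames are placed side by side: the columns are appended (columns with the same name are all kept), and the row index is the union of the inputs' indexes.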
result = pd.concat([df1, df2], axis=1)
Result:
A B C A B D
0 0.899548 0.893985 0.600403 0.665124 0.494773 0.18973
1 0.296687 0.954922 0.507403 NaN NaN NaN
2 0.280254 0.267325 0.375680 NaN NaN NaN
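The side-by-side case can be sketched with fixed values as well. The frames `left` and `right` below are illustrative, and the point of interest is what happens to the duplicated column names:

```python
import pandas as pd

# Illustrative frames with the same column layout as df1 and df2.
left = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
right = pd.DataFrame({"A": [10], "B": [11], "D": [12]})

# axis=1 places the frames next to each other and aligns them on the row index.
side_by_side = pd.concat([left, right], axis=1)

print(list(side_by_side.columns))  # duplicate names A and B are both kept
print(side_by_side.shape)          # three rows (union of indexes), six columns
print(side_by_side["A"])           # a duplicated name selects both "A" columns
```

Since `side_by_side["A"]` yields a two-column DataFrame rather than a Series, renaming the columns before concatenating may be preferable when the names collide.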
ignore_index (default False)
With axis=0, ignore_index=True discards the original row labels and renumbers the rows from 0 to n-1.
result = pd.concat([df1, df2], axis=0, ignore_index=True)
Result:
A B C D
0 0.142223 0.171115 0.345506 NaN
1 0.868534 0.969604 0.561111 NaN
2 0.769472 0.141292 0.846930 NaN
3 0.209132 0.726342 NaN 0.460136
With axis=1, ignore_index=True discards the column names and replaces them with the default integer labels 0 to n-1.
result = pd.concat([df1, df2], axis=1, ignore_index=True)
Result:
0 1 2 3 4 5
0 0.613608 0.699028 0.710746 0.158601 0.214546 0.70234
1 0.071271 0.058034 0.445593 NaN NaN NaN
2 0.433755 0.516567 0.791369 NaN NaN NaN
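A short sketch of both cases with fixed, illustrative data (`left` and `right` are stand-ins for df1 and df2). It shows that ignore_index only relabels the concatenation axis and leaves the other axis alone:

```python
import pandas as pd

# Illustrative frames with the same column layout as df1 and df2.
left = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
right = pd.DataFrame({"A": [10], "B": [11], "D": [12]})

# axis=0: the row labels are replaced, the column names are untouched.
rows = pd.concat([left, right], axis=0, ignore_index=True)

# axis=1: the column names are replaced, the row labels are untouched.
cols = pd.concat([left, right], axis=1, ignore_index=True)

print(list(rows.index))    # [0, 1, 2, 3]
print(list(rows.columns))  # ['A', 'B', 'C', 'D']
print(list(cols.columns))  # [0, 1, 2, 3, 4, 5]
print(list(cols.index))    # [0, 1, 2]
```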
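By default, concat takes the union of the two DataFrames' columns. The join_axes parameter sets which labels to use on the non-concatenation axis of the result. Note that join_axes was deprecated in pandas 0.25 and removed in pandas 1.0, so the calls in this section only run on older versions.
The default is join_axes=None. With axis=0 (vertical stacking), join_axes=[df1.columns] makes the result use df1's columns.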
result = pd.concat([df1, df2], axis=0, join_axes=[df1.columns])
Result:
A B C
0 0.056351 0.774601 0.379272
1 0.946589 0.068344 0.200789
2 0.876588 0.506720 0.210272
0 0.512249 0.523099 NaN
With axis=1 (side-by-side concatenation), join_axes=[df1.index] makes the result use df1's index.
result = pd.concat([df1, df2], axis=1, join_axes=[df1.index])
Result:
A B C A B D
0 0.536844 0.498911 0.374395 0.340025 0.640539 0.611227
1 0.321700 0.487316 0.829186 NaN NaN NaN
2 0.493442 0.368903 0.480279 NaN NaN NaN
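On pandas versions where join_axes is no longer available, a similar result can be obtained with reindex. The following is a sketch under that assumption, not the original author's code; `left` and `right` are illustrative frames, and `right` is given an extra row label (5) so that the effect on the index is visible:

```python
import pandas as pd

# Illustrative frames; right has a row label (5) that left does not have.
left = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
right = pd.DataFrame({"A": [10, 13], "B": [11, 14], "D": [12, 15]}, index=[0, 5])

# axis=0: concatenate, then keep only left's columns
# (roughly what join_axes=[df1.columns] did).
stacked = pd.concat([left, right], axis=0).reindex(columns=left.columns)

# axis=1: align right to left's row labels before concatenating
# (roughly what join_axes=[df1.index] did).
side_by_side = pd.concat([left, right.reindex(left.index)], axis=1)

print(list(stacked.columns))     # ['A', 'B', 'C']; column D is dropped
print(list(side_by_side.index))  # [0, 1, 2]; row label 5 is dropped
```

Reindexing `right` before the axis=1 concatenation, rather than reindexing the result, sidesteps any question about reindexing a frame whose column names are duplicated.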
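In concat, the join parameter accepts only 'inner' and 'outer'; 'left' and 'right' are not supported.
With join='inner', only the labels on the other axis that appear in every input are kept (intersection). With axis=1 that axis is the row index, so only the shared row labels remain.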
result = pd.concat([df1, df2], axis=1, join='inner')
Result:
A B C A B D
0 0.526198 0.231218 0.478691 0.682161 0.377862 0.722153
With join='outer' (the default), all labels on the other axis are kept (union), and missing entries are filled with NaN.
result = pd.concat([df1, df2], axis=1, join='outer')
Result:
A B C A B D
0 0.655303 0.132546 0.967381 0.400043 0.160096 0.268971
1 0.079759 0.210028 0.904587 NaN NaN NaN
2 0.172952 0.604146 0.531020 NaN NaN NaN
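The join parameter always applies to the axis that is not being concatenated. A small sketch with fixed, illustrative frames (`left` and `right` stand in for df1 and df2) makes this visible for both values of axis:

```python
import pandas as pd

# Illustrative frames with the same column layout as df1 and df2.
left = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
right = pd.DataFrame({"A": [10], "B": [11], "D": [12]})

# axis=0: join decides which columns are kept.
inner_rows = pd.concat([left, right], axis=0, join="inner")
outer_rows = pd.concat([left, right], axis=0, join="outer")

# axis=1: join decides which row labels are kept.
inner_cols = pd.concat([left, right], axis=1, join="inner")
outer_cols = pd.concat([left, right], axis=1, join="outer")

print(list(inner_rows.columns))  # only the shared columns A and B
print(list(outer_rows.columns))  # A, B, C, D
print(list(inner_cols.index))    # only the shared row label 0
print(list(outer_cols.index))    # 0, 1, 2
```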
result = pd.concat([df1, df2], axis=0, join='inner', join_axes=[df1.columns])
Result:
A B C
0 0.116892 0.321588 0.670490
1 0.670587 0.830011 0.467221
2 0.901773 0.857747 0.127813
0 0.354468 0.269192 NaN
result = pd.concat([df1, df2, df3], axis=0, join='inner', ignore_index=True, join_axes=[df1.columns])
Result:
A B C
0 0.422407 0.191703 0.951058
1 0.772422 0.868453 0.528624
2 0.752645 0.164527 0.400265
3 0.104067 0.747079 NaN
4 0.916764 0.083018 0.049442
5 0.943200 0.317038 0.404493
Reference:
https://www.cnblogs.com/guxh/p/9451532.html