Comparing two datasets in Python: the datacompy package, and fixing TypeError: 'NoneType' object is not iterable

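Goal: check whether two datasets are exactly the same

Datasets: df1, df2

Method 1: pandas

Subtract one dataset from the other

# df2 minus df1: rows of df1 appear twice in the concat, so keep=False drops them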
import pandas as pd
set_diff_df = pd.concat([df1, df2, df1]).drop_duplicates(keep=False)
print(set_diff_df)

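Result

Empty DataFrame

An empty result means every row of df2 also appears in df1. To confirm that the two datasets are the same, the reverse subtraction (df1 minus df2) must be empty as well.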
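The subtraction trick only works in one direction, so a two-way check is a safer test. The sketch below wraps the same concat/drop_duplicates idiom in a helper and runs it both ways. The sample frames and the helper name `rows_only_in` are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df1 = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame({"id": [1, 2, 3, 4], "value": [0.1, 0.2, 0.3, 0.4]})


def rows_only_in(left, right):
    """Return rows of `left` that do not appear anywhere in `right`."""
    # `right` is concatenated twice, so every one of its rows is a duplicate
    # and is removed by keep=False. Caveat: rows that are duplicated inside
    # `left` itself are removed as well.
    return pd.concat([left, right, right]).drop_duplicates(keep=False)


only_in_df1 = rows_only_in(df1, df2)  # df1 minus df2
only_in_df2 = rows_only_in(df2, df1)  # df2 minus df1

# The row sets agree only if both differences are empty
same_rows = only_in_df1.empty and only_in_df2.empty
print(only_in_df1)
print(only_in_df2)
print(same_rows)
```

Here df1 minus df2 is empty but df2 minus df1 still holds the row with id 4, so a one-way check alone would have been misleading. For a strict comparison that also takes row order, index, and dtypes into account, `df1.equals(df2)` is another option.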

Method 2: the datacompy package

It has to be installed first.

Detailed documentation for the package: https://capitalone.github.io/datacompy/install.html

Installed with Anaconda on Windows 10, Python 3:

conda install datacompy

Once the installation succeeds, run:

import datacompy
compare=datacompy.Compare(df1,df2,abs_tol=0.000001)
print(compare.report())

This raises an error:

TypeError: 'NoneType' object is not iterable

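It turned out that no join column had been specified, so datacompy had no way of knowing how to match rows between df1 and df2.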
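To see why a join key is needed, it can help to look at the same idea in plain pandas. The sketch below is only an analogy, not datacompy's actual implementation: it aligns two frames on a hypothetical `id` column with `merge` and then compares values row by row. The frame names and data are invented for illustration.

```python
import pandas as pd

# Hypothetical data: same rows, different order
left = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})
right = pd.DataFrame({"id": [3, 1, 2], "value": [30, 10, 20]})

# Align rows on the join key; indicator=True adds a `_merge` column that
# says whether each key was found in the left frame, the right frame, or both
aligned = left.merge(
    right, on="id", how="outer", suffixes=("_df1", "_df2"), indicator=True
)

all_matched = (aligned["_merge"] == "both").all()
values_equal = (aligned["value_df1"] == aligned["value_df2"]).all()
print(aligned)
print(all_matched, values_equal)
```

Without a key (or the index) to align on, there is no well-defined pairing of rows, which is the information the failing call was missing.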

The code below uses the index as the join key:

import datacompy
compare=datacompy.Compare(df1,df2,abs_tol=0.000001,on_index=True)
print(compare.report())

Success!

The report looks like this:

DataComPy Comparison
--------------------

DataFrame Summary
-----------------

  DataFrame  Columns  Rows
0       df1       12  2000
1       df2       12  2000

Column Summary
--------------

Number of columns in common: 12
Number of columns in df1 but not in df2: 0
Number of columns in df2 but not in df1: 0

Row Summary
-----------

Matched on: index
Any duplicates on match values: No
Absolute Tolerance: 1e-06
Relative Tolerance: 0
Number of rows in common: 2,000
Number of rows in df1 but not in df2: 0
Number of rows in df2 but not in df1: 0

Number of rows with some compared columns unequal: 0
Number of rows with all compared columns equal: 2,000

Column Comparison
-----------------

Number of columns compared with some values unequal: 0
Number of columns compared with all values equal: 12
Total number of values which compare unequal: 0

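A variable (column) in the data can also serve as the join key.

Pass the following argument to Compare in place of on_index=True:

join_columns=['variable_name']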
