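Python scripting: batch-comparing text files and diffing them by specific fields

Sometimes you run into this kind of requirement: there are two files (both hold table-like data, with some fields in common and some that differ), and you need to compare them and pull out the records that differ.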

Sample files

Format of each record in file A:

03090000   00049993   9222100502392220106000000020000029000170124500019054                 20170124 12:30:01622908347435512917       00049996   

Format of each record in file B:

01006530    00096900    000480 0124174505 6228480478369552177 000000004066 000000000000  00000000000 0200 000000 5411 00000021 100504754110404 003081009289 00 000000 01030000    000000 00 071 000000000005 000000000000 D00000000001 1 000 6 0 0124174510 01030000    0 03     00000000000  00010111001   

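File A contains a number of records, and so does file B. Some records in B carry an index number that does not appear in A, and those are the records we need to find. For example, the index field of a B record (the 14th whitespace-separated field, 003081009289 in the sample above, which is the field the code reads) corresponds to the last 12 characters of a field such as 9222100502392220106000000020000029000170124500019054 in A. The missing records are matched in bulk by splitting and slicing these strings.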
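As a minimal sketch of that slicing, the two sample records above can be taken apart as follows. The function names `a_keys` and `b_keys` are hypothetical and used only for illustration; the field positions are the ones the script in this post relies on.

```python
# Sample records copied from the "Sample files" section above.
a_record = ("03090000   00049993   "
            "9222100502392220106000000020000029000170124500019054"
            "                 20170124 12:30:01622908347435512917       00049996   ")
b_record = ("01006530    00096900    000480 0124174505 6228480478369552177 "
            "000000004066 000000000000  00000000000 0200 000000 5411 00000021 "
            "100504754110404 003081009289 00 000000 01030000    000000 00 071 "
            "000000000005 000000000000 D00000000001 1 000 6 0 0124174510 "
            "01030000    0 03     00000000000  00010111001   ")


def a_keys(record):
    """Return (merchant_id, index) from a file-A record (hypothetical helper)."""
    combined = record.split()[2]          # third whitespace-separated field
    return combined[4:19], combined[-12:]  # 15-char merchant ID, last 12 chars


def b_keys(record):
    """Return (merchant_id, index) from a file-B record (hypothetical helper)."""
    fields = record.split()
    return fields[12], fields[13]


a_merchant, a_index = a_keys(a_record)
b_merchant, b_index = b_keys(b_record)
print(a_merchant, a_index)
print(b_merchant, b_index)
```

Note that the two sample records belong to different merchants, so the B sample would be skipped by the merchant filter rather than reported as missing.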

Code

import os

# dates to be compared (YYMMDD)
dateArr = ["170124",
           "170125",
           "170130",
           "170206",
           "170211",
           "170228",
           "170304",
           "170309",
           "170314",
           "170321",
           "170325"]
# local paths: source data and result output
src_dir = "./src_data"
res_dir = "./res_data"

# the exact merchant ID to be concerned
gMchtId = "100502392220106"

# create the result directory if it does not exist yet
if not os.path.isdir(res_dir):
    os.makedirs(res_dir)

# read files and compare, then write the differing records
print("start to compare files...")

for dateStr in dateArr:
    print("comparing " + dateStr + " files")
    mic_file_name = "M_IC" + dateStr + "OTRAD100502392220106"
    acom_file_name = "no_chongzhengIND" + dateStr + "01ACOM"

    # set of index keys found in the mic file for this date
    micIndexSet = set()
    # read mic file and create index keys
    print("reading " + dateStr + " mic file")
    with open(os.path.join(src_dir, mic_file_name), 'r') as micFileStream:
        # process file line by line
        for micLineStr in micFileStream:
            # split the line into whitespace-separated fields
            micLineDataArray = micLineStr.split()
            # skip blank or truncated lines (a line read from a file still
            # contains its newline, so its length is never 0)
            if len(micLineDataArray) < 3:
                continue
            combinedInfo = micLineDataArray[2]
            micMchtId = combinedInfo[4:19]
            # skip other merchant ids
            if micMchtId != gMchtId:
                continue
            # the query index is the last 12 characters of the field
            micIndexSet.add(combinedInfo[-12:])

    # acom lines whose index is missing from the mic file
    resultLineStr = list()
    # read acom file and compare index keys
    print("reading " + dateStr + " acom file")
    with open(os.path.join(src_dir, acom_file_name), 'r') as acomFileStream:
        # process file line by line
        for acomLineStr in acomFileStream:
            acomLineDataArray = acomLineStr.split()
            # skip blank or truncated lines
            if len(acomLineDataArray) < 14:
                continue
            acomMchtId = acomLineDataArray[12]
            if acomMchtId != gMchtId:
                continue
            acomIndex = acomLineDataArray[13]
            # save the diffed lines (without the trailing newline)
            if acomIndex not in micIndexSet:
                resultLineStr.append(acomLineStr.rstrip('\r\n'))

    # write the result lines to file, one record per line
    print("write " + dateStr + " result file")
    with open(os.path.join(res_dir, dateStr + "_result"), 'w') as resultFileStream:
        for line in resultLineStr:
            resultFileStream.write(line + '\n')

print("compare over")
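The core of the script is a set-membership test: collect the index keys from file A into a set, then keep every B record whose key is not in it. The sketch below isolates that step so it can be tried without any files on disk. The function names and the generated records are made up for illustration; only the field positions follow the script.

```python
MERCHANT = "100502392220106"


def collect_mic_keys(mic_lines, merchant_id):
    """Build the set of 12-character index keys from file-A style lines."""
    keys = set()
    for line in mic_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or truncated line
        combined = fields[2]
        if combined[4:19] == merchant_id:
            keys.add(combined[-12:])
    return keys


def find_missing(acom_lines, mic_keys, merchant_id):
    """Return file-B style lines whose index key is absent from mic_keys."""
    missing = []
    for line in acom_lines:
        fields = line.split()
        if len(fields) < 14:
            continue  # blank or truncated line
        if fields[12] == merchant_id and fields[13] not in mic_keys:
            missing.append(line.rstrip("\r\n"))
    return missing


# Synthetic records (NOT real data): only the fields the code reads matter.
def make_mic_line(index):
    combined = "9222" + MERCHANT + "0" * 21 + index
    return "03090000   00049993   " + combined + "   20170124\n"


def make_acom_line(index, merchant=MERCHANT):
    return " ".join(["0"] * 12 + [merchant, index, "00"]) + "\n"


mic_lines = [make_mic_line("000000000001"), make_mic_line("000000000002")]
acom_lines = [
    make_acom_line("000000000001"),                       # present in A
    make_acom_line("000000000003"),                       # missing from A
    make_acom_line("000000000004", "100504754110404"),    # other merchant
    "\n",                                                 # blank line
]

missing = find_missing(acom_lines, collect_mic_keys(mic_lines, MERCHANT), MERCHANT)
for record in missing:
    print(record)
```

Using a set keeps each lookup at roughly constant time, so the cost grows with the total number of lines rather than with the product of the two file sizes.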

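Screenshot

(Image 1 from the original post)
The file names are assembled in bulk from the dates of the files in the folder, and the results are written to a separate folder. Python handles this at a decent speed.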
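The script hard-codes its list of dates. As a possible alternative, the dates could be read off the file names in the source folder. The sketch below is an assumption about how that might be done, not part of the original script: `discover_dates` is a hypothetical helper, and it assumes the file names follow exactly the two patterns used in the code above.

```python
import os
import re
import tempfile


def discover_dates(src_dir, merchant_id):
    """Return sorted YYMMDD strings for which both input files exist."""
    mic_pattern = re.compile(
        r"^M_IC(\d{6})OTRAD" + re.escape(merchant_id) + r"$")
    names = set(os.listdir(src_dir))
    dates = []
    for name in sorted(names):
        match = mic_pattern.match(name)
        if not match:
            continue
        date_str = match.group(1)
        # keep the date only if the matching acom file is also present
        if "no_chongzhengIND" + date_str + "01ACOM" in names:
            dates.append(date_str)
    return dates


# Demo on a throwaway directory holding empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["M_IC170124OTRAD100502392220106",
                 "no_chongzhengIND17012401ACOM",
                 "M_IC170125OTRAD100502392220106",   # no acom partner
                 "M_IC170130OTRAD100502392220106",
                 "no_chongzhengIND17013001ACOM"]:
        open(os.path.join(tmp, name), "w").close()
    found = discover_dates(tmp, "100502392220106")

print(found)
```

A date is returned only when both files are present, so a missing counterpart file is skipped instead of raising an error halfway through the batch.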
