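A common task: you have two files that both hold tabular data, where some fields overlap and others differ. You need to compare the two files and extract the records that differ between them.
Format of each record in file A: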
03090000 00049993 9222100502392220106000000020000029000170124500019054 20170124 12:30:01622908347435512917 00049996
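The sketch below makes the offsets concrete by pulling the two values the script needs out of the sample A record. It assumes what the script below assumes: the third whitespace-separated field carries the merchant ID at characters 4 to 18 and the 12-character index key at its end. The function name `parse_a_record` is hypothetical.

```python
def parse_a_record(line):
    """Return (merchant_id, index_key) for one A-file line, or None if it is too short."""
    fields = line.split()
    if len(fields) < 3:
        return None
    combined = fields[2]          # the long packed third field
    merchant_id = combined[4:19]  # 15-character merchant ID after a 4-character prefix
    index_key = combined[-12:]    # the last 12 characters are the lookup key
    return merchant_id, index_key


sample_a = ("03090000 00049993 9222100502392220106000000020000029000170124500019054 "
            "20170124 12:30:01622908347435512917 00049996")
print(parse_a_record(sample_a))  # ('100502392220106', '124500019054')
```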
Format of each record in file B:
01006530 00096900 000480 0124174505 6228480478369552177 000000004066 000000000000 00000000000 0200 000000 5411 00000021 100504754110404 003081009289 00 000000 01030000 000000 00 071 000000000005 000000000000 D00000000001 1 000 6 0 0124174510 01030000 0 03 00000000000 00010111001
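A matching sketch for file B. Following the script below, it assumes the merchant ID is the 13th whitespace-separated field and the 12-digit index key is the 14th; `parse_b_record` is a hypothetical name. Note that this sample line belongs to a different merchant than the one the script filters on, so the script would skip it.

```python
def parse_b_record(line):
    """Return (merchant_id, index_key) for one B-file line, or None if it is too short."""
    fields = line.split()
    if len(fields) < 14:
        return None
    return fields[12], fields[13]  # 13th field: merchant ID, 14th field: index key


sample_b = ("01006530 00096900 000480 0124174505 6228480478369552177 000000004066 "
            "000000000000 00000000000 0200 000000 5411 00000021 100504754110404 "
            "003081009289 00 000000 01030000 000000 00 071 000000000005 000000000000 "
            "D00000000001 1 000 6 0 0124174510 01030000 0 03 00000000000 00010111001")
print(parse_b_record(sample_b))  # ('100504754110404', '003081009289')
```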
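Both files contain many records. Some records in B carry an index number that does not appear anywhere in A, and those are the records we need to find. Only records that belong to one merchant ID are compared. The index of a B record is its 14th whitespace-separated field (a 12-digit value), and it must match the last 12 characters of the long third field of an A record (9222100502392220106000000020000029000170124500019054 in the sample above). The script below splits every line on whitespace, extracts these keys, and finds all the missing records in bulk.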
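At its core, the comparison is an anti-join: put A's keys into a set, then keep every B record whose key is not in that set. Set membership is constant time on average, so the work grows linearly with the number of lines instead of quadratically as a nested loop would. A minimal sketch with hypothetical keys and record labels:

```python
def find_unmatched(a_keys, b_records):
    """Return the B records whose key does not occur among the A keys.

    a_keys    -- iterable of index keys taken from file A
    b_records -- iterable of (key, record) pairs taken from file B
    """
    a_key_set = set(a_keys)  # average O(1) membership tests
    return [record for key, record in b_records if key not in a_key_set]


# hypothetical 12-character keys; "rec-2" has no partner in A
a_keys = ["000000000001", "000000000003"]
b_records = [("000000000001", "rec-1"),
             ("000000000002", "rec-2"),
             ("000000000003", "rec-3")]
print(find_unmatched(a_keys, b_records))  # ['rec-2']
```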
# dates to be compared (YYMMDD)
dateArr = ["170124", "170125", "170130", "170206", "170211", "170228",
           "170304", "170309", "170314", "170321", "170325"]
# local directories for the input data and the results (res_dir must already exist)
src_dir = "./src_data"
res_dir = "./res_data"
# the merchant ID we care about
gMchtId = "100502392220106"
# read the files, compare them, and write the unmatched records
print("start to compare file...")
for dateStr in dateArr:
    print("comparing " + dateStr + " files")
    mic_file_name = "M_IC" + dateStr + "OTRAD100502392220106"
    acom_file_name = "no_chongzhengIND" + dateStr + "01ACOM"
    # set of index keys found in the mic file (file A) for this date
    micIndexSet = set()
    # read the mic file and collect its index keys
    print("reading " + dateStr + " mic file")
    with open(src_dir + '/' + mic_file_name, 'r') as micFileStream:
        # process the file line by line
        for micLineStr in micFileStream:
            micLineDataArray = micLineStr.split()
            # skip blank or truncated lines; a line read from a file still ends
            # with '\n', so len(micLineStr) == 0 would never be true
            if len(micLineDataArray) < 3:
                continue
            # the third field packs several values; characters 4-18 are the merchant ID
            combinedInfo = micLineDataArray[2]
            micMchtId = combinedInfo[4:19]
            # skip records of other merchants
            if micMchtId != gMchtId:
                continue
            # the last 12 characters are the index key
            micIndex = combinedInfo[-12:]
            micIndexSet.add(micIndex)
    # B lines whose index key is missing from file A
    resultLineStr = list()
    # read the acom file (file B) and compare its index keys
    print("reading " + dateStr + " acom file")
    with open(src_dir + '/' + acom_file_name, 'r') as acomFileStream:
        for acomLineStr in acomFileStream:
            acomLineDataArray = acomLineStr.split()
            # skip blank or truncated lines (fields 13 and 14 are needed)
            if len(acomLineDataArray) < 14:
                continue
            # field 13 is the merchant ID
            acomMchtId = acomLineDataArray[12]
            if acomMchtId != gMchtId:
                continue
            # field 14 is the 12-digit index key
            acomIndex = acomLineDataArray[13]
            # keep the line if its key never appeared in file A
            if acomIndex not in micIndexSet:
                resultLineStr.append(acomLineStr)
    # write the unmatched lines; they already end with a newline (except possibly
    # the last line of the file), so strip it and add exactly one
    print("write " + dateStr + " result file")
    with open(res_dir + '/' + dateStr + "_result", 'w') as resultFileStream:
        for line in resultLineStr:
            resultFileStream.write(line.rstrip('\r\n') + '\n')
print("compare over")
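For Python 3, the per-date step of the script above could be packaged as a reusable function that also creates the output directory. This is a sketch under assumptions: `compare_one_date` is a hypothetical name, and it assumes the merchant ID embedded in the A file name is the same ID being filtered on, as it is in the script. The small demo builds synthetic files in a temporary directory rather than using real data.

```python
import tempfile
from pathlib import Path


def compare_one_date(src_dir, res_dir, date_str, mcht_id):
    """Write the B (acom) lines whose index key is missing from A (mic) for one date.

    Returns the number of lines written. File-name patterns mirror the script above.
    """
    src, res = Path(src_dir), Path(res_dir)
    res.mkdir(parents=True, exist_ok=True)  # create the output directory if needed

    # collect A's index keys for this merchant
    mic_keys = set()
    with open(src / ("M_IC" + date_str + "OTRAD" + mcht_id)) as mic:
        for line in mic:
            fields = line.split()
            if len(fields) >= 3 and fields[2][4:19] == mcht_id:
                mic_keys.add(fields[2][-12:])

    # stream B and write every line whose key is not in A
    written = 0
    with open(src / ("no_chongzhengIND" + date_str + "01ACOM")) as acom, \
            open(res / (date_str + "_result"), "w") as out:
        for line in acom:
            fields = line.split()
            if len(fields) >= 14 and fields[12] == mcht_id and fields[13] not in mic_keys:
                out.write(line.rstrip("\r\n") + "\n")
                written += 1
    return written


# demo with synthetic data: one A key, two B lines, one of which is unmatched
with tempfile.TemporaryDirectory() as tmp:
    mcht = "100502392220106"
    src = Path(tmp) / "src_data"
    src.mkdir()
    # A record: packed field = 4-char prefix + merchant ID + 12-char key
    (src / ("M_IC170124OTRAD" + mcht)).write_text("a b 9222" + mcht + "000000000001\n")
    # B records: 12 filler fields, then merchant ID and index key
    b_lines = [" ".join(["0"] * 12 + [mcht, key]) for key in ("000000000001", "000000000002")]
    (src / "no_chongzhengIND17012401ACOM").write_text("\n".join(b_lines) + "\n")
    demo_count = compare_one_date(src, Path(tmp) / "res_data", "170124", mcht)
    demo_output = (Path(tmp) / "res_data" / "170124_result").read_text()
print(demo_count, repr(demo_output))
```

Writing each unmatched line as it is found, instead of collecting them in a list first, keeps memory use bounded by the size of A's key set.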