日常工作中,时常会遇到因数据错误而引起的bug,这个问题,有时候会很头疼,费尽心力排查原因,最后发现非逻辑性问题,仅是由一个数据填错而引发的,真是觉得耗费时间,一次两次还能接受,但是对于一个数据量很大的工程来说,让你多次排查类似的错误,真是一件体力活。为了不再干这件苦差事,遂决定用python写一个数据检索的脚本,把错误提前暴露出来,解放了自己也提高了团队的工作效率。
环境
1. python3.6
2. win7 64位
python语法及函数使用请参照官方文档
python官方文档
python基础教程
定义windows控制台下的标准输出句柄和颜色宏
设置文本颜色,自定义格式化输出
脚本名称:data_review.py
import codecs
import ctypes
import sys
# 标准输出句柄
STD_INPUT_HANDLE = -10
STD_OUTPUT_HANDLE = -11
STD_ERROR_HANDLE = -12
# 前景色
FOREGROUND_BLACK = 0x0
FOREGROUND_BLUE = 0x01
FOREGROUND_GREEN = 0x02
FOREGROUND_RED = 0x04
FOREGROUND_YELLOW = 0x0e
FOREGROUND_INTENSITY = 0x08
# 背景色
BACKGROUND_BLUE = 0x10
BACKGROUND_GREEN = 0x20
BACKGROUND_RED = 0x40
BACKGROUND_YELLOW = 0xe0
BACKGROUND_INTENSITY = 0x80
# get handle
std_out_handle = ctypes.windll.kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
def set_text_color(color, handle = std_out_handle):
ctypes.windll.kernel32.SetConsoleTextAttribute(handle, color)
def reset_color():
set_text_color(FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE)
def print_red_msg(msg):
set_text_color(FOREGROUND_RED | FOREGROUND_INTENSITY)
print(msg)
reset_color()
def print_green_msg(msg):
set_text_color(FOREGROUND_GREEN | FOREGROUND_INTENSITY)
print(msg)
reset_color()
def print_blue_msg(msg):
set_text_color(FOREGROUND_BLUE | FOREGROUND_INTENSITY)
print(msg)
reset_color()
def print_yellow_msg(msg):
set_text_color(FOREGROUND_YELLOW | FOREGROUND_INTENSITY)
print(msg)
reset_color()
定义数据字典
以任务表为例
# 数据字典
Quest_Dict = {}
定义数据检索函数
这里需要留意codecs.open的使用,参考官方文档说明
codecs.open(filename, mode='r', encoding=None, errors='strict', buffering=1)
Open an encoded file using the given mode and return an instance of StreamReaderWriter, providing transparent encoding/decoding. The default file mode is 'r', meaning to open the file in read mode.
Note: Underlying encoded files are always opened in binary mode. No automatic conversion of '\n' is done on reading and writing. The mode argument may be any binary mode acceptable to the built-in open() function; the 'b' is automatically added.
encoding specifies the encoding which is to be used for the file. Any encoding that encodes to and decodes from bytes is allowed, and the data types supported by the file methods depend on the codec used.
errors may be given to define the error handling. It defaults to 'strict' which causes a ValueError to be raised in case an encoding error occurs.
buffering has the same meaning as for the built-in open() function. It defaults to line buffered.
对于codecs.open所使用的文件编码格式参照官方文档 7.2.3. Standard Encodings 中所列出的。我们这里文件使用的带BOM的UTF8的编码格式。注意 utf_8 和 utf_8_sig 的区别。
7.2.7. encodings.utf_8_sig — UTF-8 codec with BOM signature
This module implements a variant of the UTF-8 codec: On encoding a UTF-8 encoded BOM will be prepended to the UTF-8 encoded bytes. For the stateful encoder this is only done once (on the first write to the byte stream). For decoding an optional UTF-8 encoded BOM at the start of the data will be skipped.
如何检索数据
def data_review_func():
# start data review
print_green_msg('start data review ... ...')
# 检测结果
bRet = True
##############################################################
# 建立Quest字典
##############################################################
try:
fd = codecs.open('./data/Quest.csv','r','utf_8_sig')
except Exception as err:
print_red_msg(str(err))
return
try:
for line in fd:
szQuestID,szData = line.split(',',1)
key_ID = int(szQuestID)
if key_ID <= 0:
print_red_msg('Quest.csv: invaild index for -1')
continue
if key_ID in Quest_Dict:
print_yellow_msg('Quest.csv: {0} is repeat'.format(key_ID))
else:
Quest_Dict[key_ID] = szData
except Exception as err:
print_red_msg('Quest.csv: ' + str(err))
bRet = False
finally:
fd.close()
# 在RandQuest.csv中检索任务ID的有效性
try:
fd = codecs.open('./data/RandQuest.csv','r','utf_8_sig')
except Exception as err:
print_red_msg(str(err))
return
try:
for line in fd:
szDataList = line.split(',',-1)
for index in range(4,20,2):
if int(szDataList[index]) <= 0:
continue
if not int(szDataList[index]) in Quest_Dict:
print_red_msg('RandQuest.csv: Quest ID {0} is not found in Quest.csv'.format(szDataList[index]))
bRet = False
except Exception as err:
print_red_msg('RandQuest.csv: ' + str(err))
bRet = False
finally:
fd.close()
# check over
if bRet:
print_green_msg('everything is OK !')
else:
print_red_msg('there is something wrong !')
return
相关语法和函数说明参照
python官方文档
python基础教程
在data_review.py末尾添加执行函数
# 检查数据表
data_review_func()
window环境 可以新建一个bat脚本用于执行python命令,bat脚本内容如下
python data_review.py
pause
双击data_review.bat即开始执行data_review.py脚本
ps: 当输出的log很多时,终端无法一次性显示完全时,可以用管道符一页一页显示,具体做法:修改data_review.bat
python data_review.py | more
pause
输出结果按space键翻页 但log的文本颜色会被重置 (⊙o⊙)…
也可以将输出的log重定向到文件中,如:
python data_review.py >data_review_ret.txt
@echo off
echo.
echo 检查结果已保存在dat_review_ret.txt中
echo.
pause