Major pitfall: regular expressions with Python's re module
Goal:
From the log below:
I0616 02:07:06.236598 12705 solver.cpp:347] Iteration 58000, Testing net (#0)
I0616 02:07:07.477146 12715 data_layer.cpp:73] Restarting data prefetching from start.
I0616 02:07:09.770082 12715 data_layer.cpp:73] Restarting data prefetching from start.
I0616 02:08:18.208068 12715 data_layer.cpp:73] Restarting data prefetching from start.
I0616 02:08:19.506160 12705 solver.cpp:414] Test net output #0: accuracy_top1 = 0.900094
I0616 02:08:19.506186 12705 solver.cpp:414] Test net output #1: loss = 0.324082 (* 1 = 0.324082 loss)
extract the iteration number and the accuracy.
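import re

with open(log_file, 'r') as log_file2:  # log_file: path to the log shown above
    log = log_file2.read()
# The group names were lost from the original post; "iteration" and "accuracy" are assumed.
accuracy_pattern = r"Iteration (?P<iteration>\d+), Testing net \(#0\)[\d\D]*?Test net output #0: accuracy_top1 = (?P<accuracy>[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)"
# findall returns one tuple per match holding every group in order,
# so r[0] is the iteration and r[1] is the full accuracy number.
for r in re.findall(accuracy_pattern, log):
    iteration = int(r[0])
    print(iteration)
    accuracy = float(r[1]) * 100  # convert the fraction to a percentage
    print(accuracy)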
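As a possible alternative, re.finditer lets the two values be read by group name instead of by tuple position. This is only a sketch: the group names iteration and accuracy, the helper parse_accuracy, and the shortened SAMPLE_LOG are assumptions for illustration, and the simplified number pattern covers plain decimals only.

```python
import re

# Shortened copy of the log excerpt above, used only for this illustration.
SAMPLE_LOG = """\
I0616 02:07:06.236598 12705 solver.cpp:347] Iteration 58000, Testing net (#0)
I0616 02:07:07.477146 12715 data_layer.cpp:73] Restarting data prefetching from start.
I0616 02:08:19.506160 12705 solver.cpp:414] Test net output #0: accuracy_top1 = 0.900094
"""

PATTERN = re.compile(
    r"Iteration (?P<iteration>\d+), Testing net \(#0\)"   # escaped literal parentheses
    r"[\d\D]*?"                                            # shortest run of any characters, newlines included
    r"Test net output #0: accuracy_top1 = (?P<accuracy>\d+(?:\.\d*)?)"
)


def parse_accuracy(log):
    """Return a list of (iteration, accuracy in percent) pairs found in log."""
    return [
        (int(m.group("iteration")), float(m.group("accuracy")) * 100)
        for m in PATTERN.finditer(log)
    ]


results = parse_accuracy(SAMPLE_LOG)
print(results)
```

Reading groups by name keeps the code working even if more groups are later added to the pattern, which is where indexing into findall tuples tends to break.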
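There are two pitfalls:
1. Matching all the unwanted characters in between
Use [\d\D]*?: the union of digits and non-digits matches any character, newlines included, and *? repeats it zero or more times with the shortest possible match.
2. Matching Testing net (#0) fails
This is the big one. Unescaped ( and ) are regex metacharacters that delimit a capturing group (my original guess was operator precedence), so the pattern (#0) matches only the text #0 and the literal ( in the log never matches.
Solutions:
1. Write .#0. so that . stands in for each parenthesis
2. Use \( and \) to match the literal ( and )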
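Both pitfalls can be reproduced in a few lines. The snippet below is a minimal sketch on a made-up three-line string (text is hypothetical, not the real log); re.DOTALL is shown as a standard alternative to [\d\D] that the note itself does not use.

```python
import re

# Hypothetical three-line text standing in for the real log.
text = "Iteration 58000, Testing net (#0)\nmiddle line\naccuracy_top1 = 0.9"

# Pitfall 1: "." does not match "\n" by default, so ".*?" cannot cross lines.
dot_match = re.search(r"Iteration \d+.*?accuracy_top1", text)
any_match = re.search(r"Iteration \d+[\d\D]*?accuracy_top1", text)
dotall_match = re.search(r"Iteration \d+.*?accuracy_top1", text, re.DOTALL)

# Pitfall 2: unescaped parentheses form a capturing group, so the pattern
# "Testing net (#0)" looks for the text "Testing net #0" without parentheses.
unescaped_on_log = re.search(r"Testing net (#0)", text)
unescaped_on_plain = re.search(r"Testing net (#0)", "Testing net #0")

print(dot_match is None, any_match is not None, dotall_match is not None)
print(unescaped_on_log is None, unescaped_on_plain.group(1))
```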
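The two solutions are not quite equivalent, which the sketch below tries to show. The names loose, strict and auto are made up for this example, and re.escape is a third option from the standard library that the note does not mention.

```python
import re

line = "Iteration 58000, Testing net (#0)"

loose = re.compile(r"Testing net .#0.")            # solution 1: "." matches any single character
strict = re.compile(r"Testing net \(#0\)")         # solution 2: escaped literal parentheses
auto = re.compile(re.escape("Testing net (#0)"))   # re.escape inserts the backslashes

# All three patterns find the line from the log.
matches = [bool(p.search(line)) for p in (loose, strict, auto)]
print(matches)

# The loose form also accepts other characters around "#0"; the strict one does not.
other = "Testing net [#0]"
print(bool(loose.search(other)), bool(strict.search(other)))
```

Escaping is the safer choice when the surrounding characters matter; the dot form is shorter but matches more than intended.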