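Python re regex: matching the special character (

A nasty pitfall: Python re regular expressions

Requirement: given a training log like the following: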

I0616 02:07:06.236598 12705 solver.cpp:347] Iteration 58000, Testing net (#0)
I0616 02:07:07.477146 12715 data_layer.cpp:73] Restarting data prefetching from start.
I0616 02:07:09.770082 12715 data_layer.cpp:73] Restarting data prefetching from start.
I0616 02:08:18.208068 12715 data_layer.cpp:73] Restarting data prefetching from start.
I0616 02:08:19.506160 12705 solver.cpp:414]     Test net output #0: accuracy_top1 = 0.900094
I0616 02:08:19.506186 12705 solver.cpp:414]     Test net output #1: loss = 0.324082 (* 1 = 0.324082 loss)

Goal: extract the iteration number and the accuracy (accuracy_top1).

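import re

# log_file: path to the training log
with open(log_file, 'r') as f:
    log = f.read()

accuracy_pattern = r"Iteration (?P<iteration>\d+), Testing net \(#0\)[\d\D]*?Test net output #0: accuracy_top1 = (?P<accuracy>[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)"
# findall returns one tuple per match containing every group; the first two are iteration and accuracy
for r in re.findall(accuracy_pattern, log):
    iteration = int(r[0])
    print(iteration)
    accuracy = float(r[1]) * 100  # convert to a percentage
    print(accuracy)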
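To try the pattern without a log file, here is a minimal sketch that runs it over a few of the sample lines above, held in a string. Instead of indexing the tuples returned by findall (which also contain the three unnamed groups inside the number pattern), it uses re.finditer and reads the named groups directly. The group names iteration and accuracy and the helper name extract_accuracy are my own choices for illustration.

```python
import re

# A few sample lines from the training log above
SAMPLE_LOG = """\
I0616 02:07:06.236598 12705 solver.cpp:347] Iteration 58000, Testing net (#0)
I0616 02:07:07.477146 12715 data_layer.cpp:73] Restarting data prefetching from start.
I0616 02:08:19.506160 12705 solver.cpp:414]     Test net output #0: accuracy_top1 = 0.900094
I0616 02:08:19.506186 12705 solver.cpp:414]     Test net output #1: loss = 0.324082 (* 1 = 0.324082 loss)
"""

# Same pattern as in the post; the group names are illustrative
ACCURACY_PATTERN = re.compile(
    r"Iteration (?P<iteration>\d+), Testing net \(#0\)"
    r"[\d\D]*?"  # lazily skip any characters, newlines included
    r"Test net output #0: accuracy_top1 = "
    r"(?P<accuracy>[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)"
)


def extract_accuracy(log):
    """Return a list of (iteration, accuracy_in_percent) pairs found in the log text."""
    results = []
    for m in ACCURACY_PATTERN.finditer(log):
        iteration = int(m.group("iteration"))
        accuracy = float(m.group("accuracy")) * 100
        results.append((iteration, accuracy))
    return results


for iteration, accuracy in extract_accuracy(SAMPLE_LOG):
    print(iteration, round(accuracy, 4))  # prints: 58000 90.0094
```

Reading groups by name keeps the loop correct even if more groups are later added to the number part of the pattern.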

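There are two pitfalls:

1. Matching all the unwanted characters in between

Use     [\d\D]*?     here. [\d\D] is the union of digits and non-digits, i.e. any character including a newline (unlike ., which does not match \n by default), and *? repeats it zero or more times with a lazy (shortest) match, so each match stops at the first "Test net output #0" after its iteration instead of running on to the last one in the file.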
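The effect of this choice can be seen on a small toy string (not the real log) with two multi-line "test phases". The sketch contrasts the lazy [\d\D]*? with its greedy form and with plain .; the re.DOTALL flag is an alternative that the post itself does not use.

```python
import re

# Two test phases spread over several lines, like the training log
text = "Iteration 1\nfoo\naccuracy = 0.5\nIteration 2\nbar\naccuracy = 0.7"

lazy = re.findall(r"Iteration (\d+)[\d\D]*?accuracy = ([\d.]+)", text)
greedy = re.findall(r"Iteration (\d+)[\d\D]*accuracy = ([\d.]+)", text)
dot_only = re.findall(r"Iteration (\d+).*?accuracy = ([\d.]+)", text)
dot_all = re.findall(r"Iteration (\d+).*?accuracy = ([\d.]+)", text, re.DOTALL)

print(lazy)      # [('1', '0.5'), ('2', '0.7')]
print(greedy)    # [('1', '0.7')]: the greedy run swallows iteration 2
print(dot_only)  # []: '.' stops at the first newline
print(dot_all)   # [('1', '0.5'), ('2', '0.7')]
```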

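2. Matching Testing net (#0) fails

This is the big one. ( and ) are metacharacters in a regex: I guessed they meant something like precedence grouping, and in fact they define a capturing group. So the unescaped pattern Testing net (#0) matches the text "Testing net #0" without the parentheses, and the match against the log, which does contain them, fails.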
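A quick check of this failure on the log line itself: the unescaped pattern finds nothing in the real text, but it does match the same text with the parentheses removed, and the parentheses become group 1.

```python
import re

line = "Iteration 58000, Testing net (#0)"

# Unescaped: (#0) is a capturing group, so the pattern means the literal text "Testing net #0"
unescaped = re.search(r"Testing net (#0)", line)
print(unescaped)  # None

# The same pattern matches text that has no parentheses at all
no_parens = re.search(r"Testing net (#0)", "Testing net #0")
print(no_parens.group(1))  # #0
```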

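Solutions:

1. Use .#0. instead of (#0): . matches any single character except a newline, so it also matches ( and ) (at the cost of accepting any other character in those positions too).

2. Escape the parentheses: use \( to match a literal ( and \) to match a literal ).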
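Both fixes can be checked against the log line in a short sketch. It also shows re.escape(), a standard-library function that adds the backslashes for you; that third option is my addition and not part of the original post.

```python
import re

line = "Iteration 58000, Testing net (#0)"

# Fix 1: '.' matches any one character (except newline), including '(' and ')'
fix_dot = re.search(r"Iteration (\d+), Testing net .#0.", line)

# Fix 2: escape the parentheses so they are literal characters
fix_escape = re.search(r"Iteration (\d+), Testing net \(#0\)", line)

# Alternative (not in the post): let re.escape() escape the literal part
fix_re_escape = re.search(r"Iteration (\d+), " + re.escape("Testing net (#0)"), line)

for m in (fix_dot, fix_escape, fix_re_escape):
    print(m.group(1))  # 58000 each time
```

Escaping is the stricter of the two fixes: .#0. would also match something like "x#0y", while \(#0\) accepts only the literal parentheses.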
