System Environment:
Ubuntu 16
Version: Paddle: 2.3.1, PaddleOCR: latest version from git clone
Related components:
Running SER+RE joint inference raises an error.
Command Code:
export CUDA_VISIBLE_DEVICES=0
python3 tools/infer_vqa_token_ser_re.py -c configs/vqa/re/layoutxlm.yml -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/ Global.infer_img=doc/vqa/input/zh_val_21.jpg -c_ser configs/vqa/ser/layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
Complete Error Message:
- Traceback (most recent call last):
File "tools/infer_vqa_token_ser_re.py", line 193, in <module>
result = ser_re_engine(img_path)
File "tools/infer_vqa_token_ser_re.py", line 135, in __call__
ser_results, ser_inputs = self.ser_engine(img_path)
File "/data1/liushu/test_PPOCR/PaddleOCR/tools/infer_vqa_token_ser.py", line 98, in __call__
batch = transform(data, self.ops)
File "/data1/liushu/test_PPOCR/PaddleOCR/ppocr/data/imaug/__init__.py", line 51, in transform
data = op(data)
File "/data1/liushu/test_PPOCR/PaddleOCR/ppocr/data/imaug/label_ops.py", line 885, in __call__
ocr_info = self._load_ocr_info(data)
File "/data1/liushu/test_PPOCR/PaddleOCR/ppocr/data/imaug/label_ops.py", line 988, in _load_ocr_info
ocr_result = self.ocr_engine.ocr(data['image'], cls=False)
File "/data1/liushu/test_PPOCR/PaddleOCR/paddleocr.py", line 480, in ocr
dt_boxes, rec_res = self.__call__(img, cls)
File "/data1/liushu/test_PPOCR/PaddleOCR/tools/infer/predict_system.py", line 69, in __call__
dt_boxes, elapse = self.text_detector(img)
File "/data1/liushu/test_PPOCR/PaddleOCR/tools/infer/predict_det.py", line 218, in __call__
self.predictor.run()
OSError: In user code:
File "tools/export_model.py", line 172, in <module>
main()
File "tools/export_model.py", line 165, in main
sub_model_save_path, logger)
File "tools/export_model.py", line 99, in export_single_model
paddle.jit.save(model, save_path)
File "", line 2, in save
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/jit.py", line 744, in save
inner_input_spec)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 517, in concrete_program_specify_input_spec
*desired_input_spec)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 427, in get_concrete_program
concrete_program, partial_program_layer = self._program_cache[cache_key]
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 723, in __getitem__
self._caches[item] = self._build_once(item)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 714, in _build_once
**cache_key.kwargs)
File "", line 2, in from_func_spec
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 662, in from_func_spec
outputs = static_func(*inputs)
File "/paddle/debug/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 79, in forward
x = self.backbone(x)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/paddle/debug/PaddleOCR/ppocr/modeling/backbones/det_mobilenet_v3.py", line 146, in forward
x = self.conv(x)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/paddle/debug/PaddleOCR/ppocr/modeling/backbones/det_mobilenet_v3.py", line 179, in forward
x = self.conv(x)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/nn/layer/conv.py", line 677, in forward
use_cudnn=self._use_cudnn)
File "/usr/local/lib/python3.7/site-packages/paddle/nn/functional/conv.py", line 148, in _conv_nd
type=op_type, inputs=inputs, outputs=outputs, attrs=attrs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3184, in append_op
attrs=kwargs.get("attrs", None))
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2224, in __init__
for frame in traceback.extract_stack():
ExternalError: CUDNN error(4), CUDNN_STATUS_INTERNAL_ERROR.
[Hint: 'CUDNN_STATUS_INTERNAL_ERROR'. An internal cuDNN operation failed. ] (at /paddle/paddle/phi/backends/gpu/gpu_resources.cc:211)
[operator < conv2d_fusion > error]
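This did not happen before: running OCR+SER inference on its own works fine, but running OCR+SER+RE joint inference raises the error above.
The troubleshooting steps look simple when written down, but it took 3 hours to resolve. It nearly brought me to tears.
Still, solving this problem made me realize that errors come from just three sources: 1) logic errors and matrix-operation errors, 2) versions (environment configuration), 3) computing resources.
Of course, this is only a rough division; each category has finer subcategories of its own.
Going forward, I can use this framework to summarize which category each problem I run into belongs to, so that bugs become fewer and fewer, hehe.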
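The three-way split above can be turned into a small triage helper. The following is only a minimal sketch: the keyword lists and the function name `classify_error` are hypothetical, chosen for illustration, and which category a given message belongs to is a judgment call rather than a fixed rule.

```python
import re

# Hypothetical keyword lists for the three categories described above.
# Order matters: the first category with a matching pattern wins.
CATEGORY_PATTERNS = {
    "computing resources": [
        r"out of memory",
        # Assumption: filed under resources here, although this cuDNN
        # status can also stem from a CUDA/cuDNN version mismatch.
        r"CUDNN_STATUS_INTERNAL_ERROR",
        r"ResourceExhausted",
    ],
    "version/environment": [
        r"ModuleNotFoundError",
        r"ImportError",
        r"undefined symbol",
    ],
    "logic/matrix operation": [
        r"shape",
        r"dimension",
        r"IndexError",
        r"KeyError",
    ],
}


def classify_error(message):
    """Return the first category whose pattern matches, else 'uncategorized'."""
    for category, patterns in CATEGORY_PATTERNS.items():
        for pattern in patterns:
            if re.search(pattern, message, flags=re.IGNORECASE):
                return category
    return "uncategorized"


if __name__ == "__main__":
    last_line = "ExternalError: CUDNN error(4), CUDNN_STATUS_INTERNAL_ERROR."
    print(classify_error(last_line))
```

Feeding it the last line of each traceback you collect gives a rough tally per category, which is a starting point for the kind of summary described above, not a diagnosis.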