Early this year I spent some time on OpenVINO YOLOv5 quantization and eventually found an excellent GitHub tutorial that solved it neatly: Chen-MingChang/pytorch_YOLO_OpenVINO_demo (https://github.com/Chen-MingChang/pytorch_YOLO_OpenVINO_demo). A few issues, however, were still not fully resolved at the time.
OpenVINO 2021r4.1 was released recently, so I went back to see how well version 2021r4.1 supports YOLOv5.
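Install the Windows version of OpenVINO 2021.4.1 LTS, then install the dependencies for Accuracy Checker (AC) and the Post-training Optimization Tool (POT) following the official guide.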
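From here on I followed the steps in https://github.com/Chen-MingChang/pytorch_YOLO_OpenVINO_demo exactly: export the PyTorch weights to ONNX, then convert the ONNX model to OpenVINO IR with Model Optimizer.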
$ python3 models/export.py --weights yolov5l.pt --img-size 640
C:\temp\yolov5_ac_ov2021_4>python "c:\Program Files (x86)\intel\openvino_2021\deployment_tools\model_optimizer\mo.py" --input_model yolov5l_v4.onnx -s 255 --reverse_input_channels --output Conv_403,Conv_419,Conv_435
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: C:\temp\yolov5_ac_ov2021_4\yolov5l_v4.onnx
- Path for generated IR: C:\temp\yolov5_ac_ov2021_4\.
- IR output name: yolov5l_v4
- Log level: ERROR
- Batch: Not specified, inherited from the model
- Input layers: Not specified, inherited from the model
- Output layers: Conv_403,Conv_419,Conv_435
- Input shapes: Not specified, inherited from the model
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: 255.0
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: None
- Reverse input channels: True
ONNX specific parameters:
- Inference Engine found in: c:\Program Files (x86)\Intel\openvino_2021\python\python3.7\openvino
Inference Engine version: 2021.4.1-3926-14e67d86634-releases/2021/4
Model Optimizer version: 2021.4.1-3926-14e67d86634-releases/2021/4
[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: C:\temp\yolov5_ac_ov2021_4\yolov5l_v4.xml
[ SUCCESS ] BIN file: C:\temp\yolov5_ac_ov2021_4\yolov5l_v4.bin
[ SUCCESS ] Total execution time: 14.90 seconds.
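For reference, the `-s 255 --reverse_input_channels` options fold the YOLOv5 input preprocessing into the IR: `-s 255` divides the input by 255 and `--reverse_input_channels` swaps RGB and BGR, so the IR can take BGR pixel values straight from OpenCV. The NumPy sketch below checks that equivalence on a random 640x640 image. The helper function names are hypothetical, and it assumes the original ONNX model expects RGB input scaled to [0, 1], as the YOLOv5 repository's own preprocessing does.

```python
import numpy as np

def preprocess_for_onnx(bgr: np.ndarray) -> np.ndarray:
    """What the original ONNX model expects: RGB, float32 in [0, 1], NCHW."""
    rgb = bgr[:, :, ::-1]                                  # BGR (OpenCV order) -> RGB
    chw = rgb.transpose(2, 0, 1).astype(np.float32) / 255.0
    return chw[np.newaxis]                                 # add batch dim -> [1, 3, H, W]

def preprocess_for_ir(bgr: np.ndarray) -> np.ndarray:
    """What the IR built with -s 255 --reverse_input_channels takes:
    raw BGR pixels, only reordered from HWC to NCHW."""
    return bgr.transpose(2, 0, 1)[np.newaxis]

def ir_internal_preprocessing(x: np.ndarray) -> np.ndarray:
    """Emulates the two steps Model Optimizer folded into the IR:
    reverse the channel axis, then divide by the scale value 255."""
    return x[:, ::-1, :, :].astype(np.float32) / 255.0

img = np.random.randint(0, 256, size=(640, 640, 3), dtype=np.uint8)
a = preprocess_for_onnx(img)
b = ir_internal_preprocessing(preprocess_for_ir(img))
print(a.shape, np.allclose(a, b))
```

This is why the application code can feed the IR with the image exactly as `cv::imread`/`cv2.imread` returns it (after resizing), without doing the channel swap or normalization itself.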
Prepare the yolov5_640_ac.yml configuration file:
models:
  - name: yolo-v5
    launchers:
      - framework: dlsdk
        model: yolov5l_v4.xml
        weights: yolov5l_v4.bin
        adapter:
          type: yolo_v3
          anchors: "10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326"
          num: 9
          coords: 4
          classes: 80
          anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2], ]
          outputs:
            - Conv_435
            - Conv_419
            - Conv_403
    datasets:
      - name: ms_coco_detection_80_class_without_background
        data_source: val2017
        annotation_conversion:
          converter: mscoco_detection
          annotation_file: instances_val2017.json
          has_background: False
          sort_annotations: True
          use_full_label_map: False
        annotation: mscoco_det_80.pickle
        dataset_meta: mscoco_det_80.json
        preprocessing:
          - type: resize
            size: 640
        postprocessing:
          - type: resize_prediction_boxes
          - type: filter
            apply_to: prediction
            min_confidence: 0.001
            remove_filtered: True
          - type: nms
            overlap: 0.5
          - type: clip_boxes
            apply_to: prediction
        metrics:
          - type: map
            integral: 11point
            ignore_difficult: true
            presenter: print_scalar
          - type: coco_precision
            max_detections: 100
            threshold: 0.5
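The adapter section ties each output layer to three of the nine anchors, and the 255 output channels are 3 anchors x (4 box coordinates + 1 objectness + 80 classes). The Python sketch below reproduces that layout from the values in the config, assuming the adapter pairs the i-th entry of `outputs` with the i-th entry of `anchor_masks`; the grid sizes are the output shapes Accuracy Checker reports for Conv_403/Conv_419/Conv_435, and all variable names are just for illustration.

```python
# Values copied from the yolo_v3 adapter section of yolov5_640_ac.yml
ANCHORS = "10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326"
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
OUTPUTS = ["Conv_435", "Conv_419", "Conv_403"]      # same order as the masks
# Grid sizes from the output shapes [1, 255, H, W] printed by Accuracy Checker
GRID = {"Conv_403": 80, "Conv_419": 40, "Conv_435": 20}
INPUT_SIZE = 640
NUM_CLASSES = 80

# Parse the flat anchor list into nine (width, height) pairs in pixels
values = [int(v.strip()) for v in ANCHORS.split(",")]
pairs = list(zip(values[0::2], values[1::2]))

layout = {}
for name, mask in zip(OUTPUTS, ANCHOR_MASKS):
    grid = GRID[name]
    layout[name] = {
        "grid": grid,
        "stride": INPUT_SIZE // grid,                     # pixels per grid cell
        "channels": len(mask) * (4 + 1 + NUM_CLASSES),    # 3 x (box + obj + classes)
        "anchors": [pairs[i] for i in mask],
    }

for name, info in layout.items():
    print(name, info)
```

The coarsest map (20x20, stride 32) gets the largest anchors and the finest map (80x80, stride 8) gets the smallest, which is why the masks run from [6, 7, 8] down to [0, 1, 2].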
Run Accuracy Checker:
C:\temp\yolov5_ac_ov2021_4>accuracy_check -c yolov5_640_ac.yml -s ./ -td CPU
09:01:27 accuracy_checker WARNING: c:\Program Files (x86)\Intel\openvino_2021\python\python3.7\ngraph\utils\types.py:25: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
(NgraphType.boolean, np.bool),
Processing info:
model: yolo-v5
launcher: dlsdk
device: CPU
dataset: ms_coco_detection_80_class_without_background
OpenCV version: 4.5.2
Annotation for ms_coco_detection_80_class_without_background dataset will be loaded from mscoco_det_80.pickle
Loaded dataset info:
Dataset name: ms_coco_detection_80_class_without_background_1
Accuracy Checker version 0.8.7
Dataset size 256
Conversion parameters:
converter: mscoco_detection
annotation_file: PATH/instances_val2017.json
has_background: False
sort_annotations: True
use_full_label_map: False
ms_coco_detection_80_class_without_background dataset metadata will be loaded from mscoco_det_80.json
IE version: 2021.4.1-3926-14e67d86634-releases/2021/4
Loaded CPU plugin version:
CPU - MKLDNNPlugin: 2.1.2021.4.1-3926-14e67d86634-releases/2021/4
Found model yolov5l_v4.xml
Found weights yolov5l_v4.bin
Input info:
Layer name: images
precision: FP32
shape [1, 3, 640, 640]
Output info
Layer name: Conv_403
precision: FP32
shape: [1, 255, 80, 80]
Layer name: Conv_419
precision: FP32
shape: [1, 255, 40, 40]
Layer name: Conv_435
precision: FP32
shape: [1, 255, 20, 20]
09:01:28 accuracy_checker WARNING: c:\users\intel\anaconda3\lib\site-packages\accuracy_checker-0.8.7-py3.7.egg\accuracy_checker\metrics\metric_executor.py:168: DeprecationWarning: threshold option is deprecated. Please use abs_threshold instead
warnings.warn('threshold option is deprecated. Please use abs_threshold instead', DeprecationWarning)
256 objects processed in 133.234 seconds
map: 27.70%
coco_precision: 31.18%
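This accuracy is clearly wrong: reproducing it with the code from Chen-MingChang/pytorch_YOLO_OpenVINO_demo gave over 60%... Since many people online already run YOLOv5 inference with OpenVINO, the model's accuracy itself should be fine; a problem in Accuracy Checker itself seems more likely.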
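One thing that may be worth checking here (my assumption, not verified in this post): YOLOv5 decodes its raw head outputs differently from YOLOv3. The YOLOv5 v4.0 Detect layer computes the box center as (2*sigmoid(t) - 0.5 + cell) * stride and the size as anchor * (2*sigmoid(t))^2, while classic YOLOv3 uses (sigmoid(t) + cell) * stride and anchor * exp(t). If the `yolo_v3` adapter applies the YOLOv3 formulas to these outputs, the decoded box sizes would not match what the model was trained for. The sketch below decodes one raw prediction both ways; the function names are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_yolov3(t, cell, anchor, stride):
    """Classic YOLOv3: center = (sigmoid(t) + cell) * stride, size = anchor * exp(t)."""
    tx, ty, tw, th = t
    cx, cy = cell
    aw, ah = anchor
    x = (sigmoid(tx) + cx) * stride
    y = (sigmoid(ty) + cy) * stride
    return x, y, aw * math.exp(tw), ah * math.exp(th)

def decode_yolov5(t, cell, anchor, stride):
    """YOLOv5 v4.0 Detect head: center = (2*sigmoid(t) - 0.5 + cell) * stride,
    size = anchor * (2*sigmoid(t))**2."""
    tx, ty, tw, th = t
    cx, cy = cell
    aw, ah = anchor
    x = (2 * sigmoid(tx) - 0.5 + cx) * stride
    y = (2 * sigmoid(ty) - 0.5 + cy) * stride
    return x, y, aw * (2 * sigmoid(tw)) ** 2, ah * (2 * sigmoid(th)) ** 2

# Raw outputs for one anchor (116x90) in cell (10, 10) of the stride-32 map
raw = (0.0, 0.0, 1.0, 1.0)
v3 = decode_yolov3(raw, (10, 10), (116, 90), 32)
v5 = decode_yolov5(raw, (10, 10), (116, 90), 32)
print("yolov3-style:", [round(v, 1) for v in v3])
print("yolov5-style:", [round(v, 1) for v in v5])
```

With these inputs the centers agree but the widths and heights differ noticeably, which is the kind of mismatch that would pull down mAP without the model itself being wrong.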
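Next, prepare the POT configuration JSON. For the quantization method I chose the simplest one, DefaultQuantization: it does not check accuracy during quantization, it simply converts every FP32 operation that can be converted to INT8 into INT8.
Prepare the yolov5l_v4_int8_simple_mode.json configuration file: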
{
    "model": {
        "model_name": "yolov5l_v4_int8_cpu",
        "model": "yolov5l_v4_640.xml",
        "weights": "yolov5l_v4_640.bin"
    },
    "engine": {
        "type": "simplified",
        // you can specify path to directory with images or video file
        // also you can specify template for file names to filter images to load
        // templates are unix style
        "data_source": "val2017"
    },
    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 128
                }
            }
        ]
    }
}
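To make "convert FP32 to INT8" a bit more concrete: the POT log shows DefaultQuantization running MinMaxQuantization (ranges collected over `stat_subset_size` = 128 calibration images) followed by FastBiasCorrection, and according to the POT documentation the `performance` preset uses symmetric quantization. The NumPy sketch below shows only the min-max part with a simplified per-tensor symmetric INT8 scheme on synthetic data; POT's real FakeQuantize insertion, per-channel weight scales and bias correction are not modeled, and the function names are hypothetical.

```python
import numpy as np

def minmax_symmetric_scale(calib_batches, num_bits=8):
    """Per-tensor symmetric scale from the largest |value| seen during calibration."""
    max_abs = max(float(np.abs(b).max()) for b in calib_batches)
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    return max_abs / qmax

def fake_quantize(x, scale, num_bits=8):
    """Quantize to signed integers with a symmetric range, then dequantize to float."""
    qmax = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q.astype(np.float32) * scale

# Synthetic "activations" standing in for 128 calibration samples
rng = np.random.default_rng(0)
calib = [rng.normal(0.0, 1.0, size=(1, 255)).astype(np.float32) for _ in range(128)]

scale = minmax_symmetric_scale(calib)
x = calib[0]
err = float(np.abs(fake_quantize(x, scale) - x).max())
print(f"scale={scale:.5f}, max abs error={err:.5f}, bound scale/2={scale / 2:.5f}")
```

For values inside the calibrated range the rounding error is at most half a quantization step, which is why a representative calibration subset matters more than the quantization arithmetic itself.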
Run POT to quantize the model:
C:\temp\yolov5_ac_ov2021_4>pot -c yolov5l_v4_int8_simple_mode.json
10:12:41 accuracy_checker WARNING: c:\users\intel\appdata\roaming\python\python37\site-packages\defusedxml\__init__.py:30: DeprecationWarning: defusedxml.cElementTree is deprecated, import from defusedxml.ElementTree instead.
from . import cElementTree
10:12:41 accuracy_checker WARNING: c:\users\intel\anaconda3\lib\site-packages\pot-1.0-py3.7.egg\compression\algorithms\quantization\optimization\algorithm.py:39: UserWarning: Nevergrad package could not be imported. If you are planning to useany hyperparameter optimization algo, consider installing itusing pip. This implies advanced usage of the tool.Note that nevergrad is compatible only with Python 3.6+
'Nevergrad package could not be imported. If you are planning to use'
10:12:41 accuracy_checker WARNING: c:\users\intel\anaconda3\lib\site-packages\past\builtins\misc.py:4: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import Mapping
INFO:app.run:Output log dir: ./results\yolov5l_v4_int8_gpu_DefaultQuantization\2021-09-16_10-12-41
INFO:app.run:Creating pipeline:
Algorithm: DefaultQuantization
Parameters:
preset : performance
stat_subset_size : 128
target_device : CPU
model_type : None
dump_intermediate_model : False
exec_log_dir : ./results\yolov5l_v4_int8_gpu_DefaultQuantization\2021-09-16_10-12-41
===========================================================================
INFO:compression.statistics.collector:Start computing statistics for algorithms : DefaultQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: DefaultQuantization
INFO:compression.algorithms.quantization.default.algorithm:Start computing statistics for algorithm : ActivationChannelAlignment
INFO:compression.algorithms.quantization.default.algorithm:Computing statistics finished
INFO:compression.algorithms.quantization.default.algorithm:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection
INFO:compression.algorithms.quantization.default.algorithm:Computing statistics finished
INFO:compression.pipeline.pipeline:Finished: DefaultQuantization
===========================================================================
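Then evaluate the INT8 model with Accuracy Checker (yolov5_int8_640_ac.yml is the AC config pointing at the quantized IR yolov5l_v4_int8_cpu.xml/.bin):
C:\temp\yolov5_ac_ov2021_4>accuracy_check -c yolov5_int8_640_ac.yml -s ./ -td CPU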
10:17:13 accuracy_checker WARNING: c:\Program Files (x86)\Intel\openvino_2021\python\python3.7\ngraph\utils\types.py:25: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
(NgraphType.boolean, np.bool),
Processing info:
model: yolo-v5-int8
launcher: dlsdk
device: CPU
dataset: ms_coco_detection_80_class_without_background
OpenCV version: 4.5.2
Annotation for ms_coco_detection_80_class_without_background dataset will be loaded from mscoco_det_80.pickle
Loaded dataset info:
Dataset name: ms_coco_detection_80_class_without_background_1
Accuracy Checker version 0.8.7
Dataset size 256
Conversion parameters:
converter: mscoco_detection
annotation_file: PATH/instances_val2017.json
has_background: False
sort_annotations: True
use_full_label_map: False
ms_coco_detection_80_class_without_background dataset metadata will be loaded from mscoco_det_80.json
IE version: 2021.4.1-3926-14e67d86634-releases/2021/4
Loaded CPU plugin version:
CPU - MKLDNNPlugin: 2.1.2021.4.1-3926-14e67d86634-releases/2021/4
Found model yolov5l_v4_int8_cpu.xml
Found weights yolov5l_v4_int8_cpu.bin
Input info:
Layer name: images
precision: FP32
shape [1, 3, 640, 640]
Output info
Layer name: Conv_403
precision: FP32
shape: [1, 255, 80, 80]
Layer name: Conv_419
precision: FP32
shape: [1, 255, 40, 40]
Layer name: Conv_435
precision: FP32
shape: [1, 255, 20, 20]
10:17:13 accuracy_checker WARNING: c:\users\intel\anaconda3\lib\site-packages\accuracy_checker-0.8.7-py3.7.egg\accuracy_checker\metrics\metric_executor.py:168: DeprecationWarning: threshold option is deprecated. Please use abs_threshold instead
warnings.warn('threshold option is deprecated. Please use abs_threshold instead', DeprecationWarning)
256 objects processed in 106.666 seconds
map: 28.67%
coco_precision: 33.07%
Although the absolute accuracy is off, the INT8 model actually scores slightly higher than the FP32 model, which is interesting.
This part is relatively simple and mostly follows another expert's project:
C++ implementation of YOLOv5 deployment with OpenVINO
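The code post-processes the results of the three OpenVINO output layers (box coordinates, class label and confidence): it first discards detections with low confidence, then filters the boxes of different sizes that land on the same object a second time (non-maximum suppression), keeping only the box with the highest confidence.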
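The two post-processing steps described above, a confidence filter followed by non-maximum suppression, can be sketched in Python with NumPy. This is not the author's C++ code: it assumes the three outputs have already been decoded into pixel boxes in (x1, y1, x2, y2) format with one score and one class label per box, and the thresholds `conf_thresh=0.5` and `iou_thresh=0.45` are illustrative values, not ones taken from the project.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def filter_and_nms(boxes, scores, labels, conf_thresh=0.5, iou_thresh=0.45):
    """Step 1: drop low-confidence boxes. Step 2: greedy NMS per class,
    keeping the highest-confidence box among strongly overlapping ones."""
    mask = scores >= conf_thresh
    boxes, scores, labels = boxes[mask], scores[mask], labels[mask]
    kept = []
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        idx = idx[np.argsort(-scores[idx])]          # highest confidence first
        while idx.size > 0:
            best = idx[0]
            kept.append(best)
            overlaps = iou(boxes[best], boxes[idx[1:]])
            idx = idx[1:][overlaps < iou_thresh]     # drop boxes covering the same object
    kept = np.array(kept, dtype=int)
    return boxes[kept], scores[kept], labels[kept]

boxes = np.array([[100, 100, 200, 200],    # object A
                  [105, 102, 205, 198],    # object A again, slightly shifted
                  [300, 300, 380, 400],    # object B
                  [310, 310, 370, 390]],   # low confidence, removed in step 1
                 dtype=np.float32)
scores = np.array([0.90, 0.75, 0.80, 0.30], dtype=np.float32)
labels = np.array([0, 0, 0, 0])
out_boxes, out_scores, out_labels = filter_and_nms(boxes, scores, labels)
print(out_boxes.tolist(), out_scores.tolist())
```

Running NMS per class means two overlapping boxes of different classes both survive; whether that matches the C++ project's behavior is not something the post states.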
I won't paste the code here; the whole project is available: yolov5-ov2021: YOLOv5 inference implementation based on C++/OpenVINO 2021r4
Running the FP32 model, GPU inference
Running the INT8 model, GPU inference
Performance was measured with the benchmark_app tool bundled with OpenVINO 2021r4.
Intel Core i9-9900K (8 cores, 16 threads)
C:\Users\test\Documents\Intel\OpenVINO\inference_engine_cpp_samples_build\intel64\Release>benchmark_app.exe -m yolov5l_v4_640.xml -nireq 1 -nstreams 1 -b 1 -d CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
IE version ......... 2021.4.1
Build ........... 2021.4.1-3926-14e67d86634-releases/2021/4
[ INFO ] Device info:
CPU
MKLDNNPlugin version ......... 2021.4.1
Build ........... 2021.4.1-3926-14e67d86634-releases/2021/4
[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 151.36 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size was changed to: 1
[Step 6/11] Configuring input of the model
Network inputs:
images : U8 / NCHW
Network outputs:
Conv_403 : FP32 / NCHW
Conv_419 : FP32 / NCHW
Conv_435 : FP32 / NCHW
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 195.02 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'images' precision U8, dimensions (NCHW): 1 3 640 640
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'images' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[ INFO ] First inference took 182.24 ms
[Step 11/11] Dumping statistics report
Count: 337 iterations
Duration: 60327.10 ms
Latency: 177.38 ms
Throughput: 5.59 FPS
C:\Users\test\Documents\Intel\OpenVINO\inference_engine_cpp_samples_build\intel64\Release>benchmark_app.exe -m yolov5l_v4_640_int8_cpu.xml -nireq 1 -nstreams 1 -b 1 -d CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
IE version ......... 2021.4.1
Build ........... 2021.4.1-3926-14e67d86634-releases/2021/4
[ INFO ] Device info:
CPU
MKLDNNPlugin version ......... 2021.4.1
Build ........... 2021.4.1-3926-14e67d86634-releases/2021/4
[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 82.20 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size was changed to: 1
[Step 6/11] Configuring input of the model
Network inputs:
images : U8 / NCHW
Network outputs:
Conv_403 : FP32 / NCHW
Conv_419 : FP32 / NCHW
Conv_435 : FP32 / NCHW
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 409.73 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'images' precision U8, dimensions (NCHW): 1 3 640 640
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'images' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[ INFO ] First inference took 82.14 ms
[Step 11/11] Dumping statistics report
Count: 804 iterations
Duration: 60135.29 ms
Latency: 74.23 ms
Throughput: 13.37 FPS
Tiger Lake (TGL) integrated graphics, Gen12, 96 EU
C:\Users\test\Documents\Intel\OpenVINO\inference_engine_cpp_samples_build\intel64\Release>benchmark_app.exe -m yolov5l_v4_640.xml -nireq 1 -nstreams 1 -b 1 -d GPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
IE version ......... 2021.4.1
Build ........... 2021.4.1-3926-14e67d86634-releases/2021/4
[ INFO ] Device info:
GPU
clDNNPlugin version ......... 2021.4.1
Build ........... 2021.4.1-3926-14e67d86634-releases/2021/4
[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 137.94 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size was changed to: 1
[Step 6/11] Configuring input of the model
Network inputs:
images : U8 / NCHW
Network outputs:
Conv_403 : FP32 / NCHW
Conv_419 : FP32 / NCHW
Conv_435 : FP32 / NCHW
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 57321.88 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'images' precision U8, dimensions (NCHW): 1 3 640 640
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'images' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests using 1 streams for GPU, limits: 60000 ms duration)
[ INFO ] First inference took 89.20 ms
[Step 11/11] Dumping statistics report
Count: 593 iterations
Duration: 60131.64 ms
Latency: 113.72 ms
Throughput: 9.86 FPS
C:\Users\test\Documents\Intel\OpenVINO\inference_engine_cpp_samples_build\intel64\Release>benchmark_app.exe -m yolov5l_v4_640_int8_cpu.xml -nireq 1 -nstreams 1 -b 1 -d GPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
IE version ......... 2021.4.1
Build ........... 2021.4.1-3926-14e67d86634-releases/2021/4
[ INFO ] Device info:
GPU
clDNNPlugin version ......... 2021.4.1
Build ........... 2021.4.1-3926-14e67d86634-releases/2021/4
[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 65.74 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size was changed to: 1
[Step 6/11] Configuring input of the model
Network inputs:
images : U8 / NCHW
Network outputs:
Conv_403 : FP32 / NCHW
Conv_419 : FP32 / NCHW
Conv_435 : FP32 / NCHW
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 61720.85 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'images' precision U8, dimensions (NCHW): 1 3 640 640
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'images' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests using 1 streams for GPU, limits: 60000 ms duration)
[ INFO ] First inference took 34.45 ms
[Step 11/11] Dumping statistics report
Count: 1856 iterations
Duration: 60050.95 ms
Latency: 31.02 ms
Throughput: 30.91 FPS
CPU INT8/FP32 = 13.37/5.59 = 2.4X
GPU INT8/FP32 = 30.91/9.86 = 3.13X
GPU/CPU FP32 = 9.86/5.59 = 1.76X
GPU/CPU INT8 = 30.91/13.37 = 2.31X
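The throughput figures and ratios can be cross-checked against the raw counts in the reports: in these asynchronous runs with a single infer request, the reported FPS matches iterations divided by total duration. A small Python sketch using only numbers copied from the benchmark_app logs above:

```python
# (iterations, duration in ms, reported throughput in FPS) from the benchmark_app reports
runs = {
    "CPU FP32": (337, 60327.10, 5.59),
    "CPU INT8": (804, 60135.29, 13.37),
    "GPU FP32": (593, 60131.64, 9.86),
    "GPU INT8": (1856, 60050.95, 30.91),
}

# Throughput recomputed as iterations / wall-clock seconds
computed = {name: count / (ms / 1000.0) for name, (count, ms, _) in runs.items()}
reported = {name: fps for name, (_, _, fps) in runs.items()}

for name in runs:
    print(f"{name}: computed {computed[name]:.2f} FPS, reported {reported[name]:.2f} FPS")

def speedup(a: str, b: str) -> float:
    """Ratio of reported throughputs, the same way the ratios above are computed."""
    return reported[a] / reported[b]

for a, b in [("CPU INT8", "CPU FP32"), ("GPU INT8", "GPU FP32"),
             ("GPU FP32", "CPU FP32"), ("GPU INT8", "CPU INT8")]:
    print(f"{a} / {b} = {speedup(a, b):.2f}x")
```

Computing the ratios from the unrounded iteration counts instead of the reported FPS shifts the last digit in some cases, so the second decimal of these speedups should not be over-interpreted.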
That is a very nice speedup: YOLOv5l (v4) finally reaches 30 FPS on Intel integrated graphics :)
Finally, as usual, here are a few small pitfalls I ran into.