Use the OpenVINO Model Optimizer to convert a YOLOv5 model to the OpenVINO IR format for inference on Intel hardware.
Download the YOLOv5 code (ultralytics/yolov5) and export the weights to ONNX:
python export.py --weights yolov5s.pt --include onnx
This produces an ONNX model; next, use the mo Python API to generate OpenVINO FP32 and FP16 IR models:
import nncf
from openvino.tools import mo
from openvino.runtime import serialize
MODEL_NAME = "yolov5s"
MODEL_PATH = "weights/yolov5"
onnx_path = f"{MODEL_PATH}/{MODEL_NAME}.onnx"
# fp32 IR model
fp32_path = f"{MODEL_PATH}/FP32_openvino_model/{MODEL_NAME}_fp32.xml"
print(f"Export ONNX to OpenVINO FP32 IR to: {fp32_path}")
model = mo.convert_model(onnx_path)
serialize(model, fp32_path)
# fp16 IR model
fp16_path = f"{MODEL_PATH}/FP16_openvino_model/{MODEL_NAME}_fp16.xml"
print(f"Export ONNX to OpenVINO FP16 IR to: {fp16_path}")
model = mo.convert_model(onnx_path, compress_to_fp16=True)
serialize(model, fp16_path)
Prepare the training dataset in a format that can be used for calibration during quantization.
from utils.dataloaders import create_dataloader  # from the ultralytics/yolov5 repository
from utils.general import check_dataset          # from the ultralytics/yolov5 repository
DATASET_CONFIG = "./data/coco128.yaml"
def create_data_source():
    """
    Creates the COCO128 validation data loader. The method downloads the
    dataset if it does not exist.
    """
    data = check_dataset(DATASET_CONFIG)
    val_dataloader = create_dataloader(
        data["val"], imgsz=640, batch_size=1, stride=32, pad=0.5, workers=1
    )[0]
    return val_dataloader
data_source = create_data_source()
# Define the transformation method. This method should take a data item returned
# per iteration through the `data_source` object and transform it into the model's
# expected input that can be used for the model inference.
def transform_fn(data_item):
    # unpack the images tensor from the batch
    images = data_item[0]
    # convert the uint8 input tensor to float
    images = images.float()
    # scale pixel values to [0, 1]
    images = images / 255
    # convert the torch tensor to a numpy array
    images = images.cpu().detach().numpy()
    return images
# Wrap framework-specific data source into the `nncf.Dataset` object.
nncf_calibration_dataset = nncf.Dataset(data_source, transform_fn)
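The preprocessing done by transform_fn can be sanity-checked without the YOLOv5 dataloader. The sketch below mimics it in NumPy on a random uint8 batch; transform_fn_np and the batch shape are illustrative stand-ins for the torch tensors the real loader yields.

```python
import numpy as np

def transform_fn_np(images_u8):
    """NumPy analogue of transform_fn: uint8 NCHW batch -> float32 in [0, 1]."""
    images = images_u8.astype(np.float32)  # cast, like tensor.float()
    images = images / 255                  # scale pixel values to [0, 1]
    return images

# fake batch shaped like the YOLOv5 loader output: (batch, channels, height, width)
batch = np.random.randint(0, 256, size=(1, 3, 640, 640), dtype=np.uint8)
out = transform_fn_np(batch)
print(out.dtype, out.shape)
```

The same shape and value range is what the compiled model expects at inference time, so a quick check like this catches preprocessing mismatches before a full calibration run.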
Configure the quantization pipeline, e.g. choose an appropriate quantization algorithm and set the target accuracy.
In NNCF, the post-training quantization pipeline is represented by the nncf.quantize function (for the DefaultQuantization algorithm) and the nncf.quantize_with_accuracy_control function (for the AccuracyAwareQuantization algorithm). The quantization parameters preset, model_type, subset_size, fast_bias_correction, and ignored_scope are passed as function arguments.
subset_size = 300
preset = nncf.QuantizationPreset.MIXED
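For reference, the full argument set mentioned above might be filled in along these lines; this is a sketch with assumed default values, not tuned settings for this model.

```python
import nncf

# illustrative keyword arguments for nncf.quantize; adjust per model
quantization_args = dict(
    preset=nncf.QuantizationPreset.MIXED,  # symmetric weights, asymmetric activations
    subset_size=300,                       # number of calibration samples to use
    fast_bias_correction=True,             # cheaper variant of bias correction
    model_type=None,                       # e.g. nncf.ModelType.TRANSFORMER for transformers
    ignored_scope=None,                    # e.g. nncf.IgnoredScope(names=[...]) to skip layers
)
```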
Optimize the model for inference on Intel hardware, for example by applying post-training quantization or pruning techniques.
from openvino.runtime import Core
from openvino.runtime import serialize
core = Core()
ov_model = core.read_model(fp32_path)
quantized_model = nncf.quantize(
ov_model, nncf_calibration_dataset, preset=preset, subset_size=subset_size
)
nncf_int8_path = f"{MODEL_PATH}/NNCF_INT8_openvino_model/{MODEL_NAME}_int8.xml"
serialize(quantized_model, nncf_int8_path)
Compare the accuracy of the FP32 and INT8 models on the validation dataset to determine whether quantization has caused an accuracy loss.
from pathlib import Path
from val import run as validation_fn  # val.py from the ultralytics/yolov5 repository
print("Checking the accuracy of the original model:")
fp32_metrics = validation_fn(
data=DATASET_CONFIG,
weights=Path(fp32_path).parent,
batch_size=1,
workers=1,
plots=False,
device="cpu",
iou_thres=0.65,
)
fp32_ap5 = fp32_metrics[0][2]
fp32_ap_full = fp32_metrics[0][3]
print(f"mAP@0.5 = {fp32_ap5}")
print(f"mAP@0.5:0.95 = {fp32_ap_full}")
print("Checking the accuracy of the FP16 model:")
fp16_metrics = validation_fn(
data=DATASET_CONFIG,
weights=Path(fp16_path).parent,
batch_size=1,
workers=1,
plots=False,
device="cpu",
iou_thres=0.65,
)
fp16_ap5 = fp16_metrics[0][2]
fp16_ap_full = fp16_metrics[0][3]
print(f"mAP@0.5 = {fp16_ap5}")
print(f"mAP@0.5:0.95 = {fp16_ap_full}")
print("Checking the accuracy of the NNCF int8 model:")
int8_metrics = validation_fn(
data=DATASET_CONFIG,
weights=Path(nncf_int8_path).parent,
batch_size=1,
workers=1,
plots=False,
device="cpu",
iou_thres=0.65,
)
nncf_int8_ap5 = int8_metrics[0][2]
nncf_int8_ap_full = int8_metrics[0][3]
print(f"mAP@0.5 = {nncf_int8_ap5}")
print(f"mAP@0.5:0.95 = {nncf_int8_ap_full}")
Output:
Checking the accuracy of the original model:
mAP@0.5 = 0.7064319945599192
mAP@0.5:0.95 = 0.4716138340017886
Checking the accuracy of the FP16 model:
mAP@0.5 = 0.7064771913549115
mAP@0.5:0.95 = 0.47165677301239517
Checking the accuracy of the NNCF int8 model:
mAP@0.5 = 0.6900523281577972
mAP@0.5:0.95 = 0.45860702355897537
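From the numbers above, the quantization-induced accuracy loss can be quantified as a relative drop; this is pure arithmetic on the reported metrics.

```python
# mAP@0.5:0.95 values reported above
fp32_ap_full = 0.4716138340017886
int8_ap_full = 0.45860702355897537

# relative drop caused by INT8 quantization, in percent
drop_pct = (fp32_ap_full - int8_ap_full) / fp32_ap_full * 100
print(f"relative mAP@0.5:0.95 drop: {drop_pct:.2f}%")  # about 2.76%
```

A relative drop under 3% is often acceptable for detection workloads; if it were not, the AccuracyAwareQuantization pipeline (nncf.quantize_with_accuracy_control) would be the next thing to try.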
Compare the performance of the FP32, FP16, and INT8 models, for example by measuring inference time and memory usage, using OpenVINO's benchmark_app:
benchmark_app -m weights/yolov5/FP32_openvino_model/yolov5s_fp32.xml -d CPU -api async -t 15
benchmark_app -m weights/yolov5/FP16_openvino_model/yolov5s_fp16.xml -d CPU -api async -t 15
benchmark_app -m weights/yolov5/NNCF_INT8_openvino_model/yolov5s_int8.xml -d CPU -api async -t 15
Output:
Inference FP32 model (OpenVINO IR) on CPU:
[Step 11/11] Dumping statistics report
[ INFO ] Count: 2504 iterations
[ INFO ] Duration: 15067.63 ms
[ INFO ] Latency:
[ INFO ] Median: 47.65 ms
[ INFO ] Average: 47.99 ms
[ INFO ] Min: 40.73 ms
[ INFO ] Max: 74.31 ms
[ INFO ] Throughput: 166.18 FPS
Inference FP16 model (OpenVINO IR) on CPU:
[Step 11/11] Dumping statistics report
[ INFO ] Count: 2536 iterations
[ INFO ] Duration: 15069.53 ms
[ INFO ] Latency:
[ INFO ] Median: 47.11 ms
[ INFO ] Average: 47.38 ms
[ INFO ] Min: 38.03 ms
[ INFO ] Max: 65.95 ms
[ INFO ] Throughput: 168.29 FPS
Inference NNCF INT8 model (OpenVINO IR) on CPU:
[Step 11/11] Dumping statistics report
[ INFO ] Count: 7872 iterations
[ INFO ] Duration: 15113.06 ms
[ INFO ] Latency:
[ INFO ] Median: 61.17 ms
[ INFO ] Average: 61.23 ms
[ INFO ] Min: 52.75 ms
[ INFO ] Max: 93.93 ms
[ INFO ] Throughput: 520.87 FPS
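The throughput figures above translate into the following speedups; again, this is plain arithmetic on the reported FPS numbers.

```python
# throughput values reported by benchmark_app above
fp32_fps, fp16_fps, int8_fps = 166.18, 168.29, 520.87

print(f"FP16 vs FP32: {fp16_fps / fp32_fps:.2f}x")  # about 1.01x
print(f"INT8 vs FP32: {int8_fps / fp32_fps:.2f}x")  # about 3.13x
```

The FP16 IR showing essentially no gain is expected here: the OpenVINO CPU plugin upconverts FP16 IR to FP32 for execution, so FP16 mainly saves model size on disk, while the INT8 model delivers the roughly 3x throughput improvement.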