Using CUDA device0 _CudaDeviceProperties(name='NVIDIA GeForce RTX 3080', total_memory=10017MB)
Find Pytorch weight
Traceback (most recent call last):
  File "export.py", line 243, in <module>
    ckpt = torch.load(opt.weight, map_location=device)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 592, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 851, in _load
    result = unpickler.load()
ModuleNotFoundError: No module named 'models'
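A quick way to confirm the cause (a sketch; the checkpoint name is an example): a torch.save checkpoint is a zip archive whose pickle stream embeds fully qualified class names, so the string 'models' is baked into the file itself.

import zipfile

# Look inside the checkpoint's pickle stream for the module reference that
# torch.load tries (and fails) to import on a machine without the repo.
with zipfile.ZipFile('best.pt') as z:
    pkl = next(n for n in z.namelist() if n.endswith('data.pkl'))
    print(b'models' in z.read(pkl))  # True: the pickle references models.yolo.Model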
The fix: first convert the weights to a .onnx model with yolov5's own export.py, then convert the ONNX model to a TensorRT engine. Problem solved.
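Concretely, the first step boils down to a plain torch.onnx.export call, run from inside the training repo so that the 'models' package is importable. A minimal sketch whose shapes and names mirror the log below (opset 11, a 1x3x640x640 input named "images"):

import torch

# Run from inside the training repo so that 'models' is importable.
ckpt = torch.load('best.pt', map_location='cpu')
model = (ckpt.get('ema') or ckpt['model']).float().eval()
im = torch.zeros(1, 3, 640, 640)
torch.onnx.export(model, im, 'best.onnx', opset_version=11,
                  input_names=['images'], output_names=['output'])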
Find ONNX weight
TensorRT: starting export with TensorRT 8.4.0.6...
[08/24/2023-18:57:25] [TRT] [I] [MemUsageChange] Init CUDA: CPU +359, GPU +0, now: CPU 426, GPU 401 (MiB)
[08/24/2023-18:57:26] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 444 MiB, GPU 401 MiB
[08/24/2023-18:57:27] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 819 MiB, GPU 523 MiB
[08/24/2023-18:57:27] [TRT] [I] ----------------------------------------------------------------
[08/24/2023-18:57:27] [TRT] [I] Input filename: ../best.onnx
[08/24/2023-18:57:27] [TRT] [I] ONNX IR version: 0.0.6
[08/24/2023-18:57:27] [TRT] [I] Opset version: 11
[08/24/2023-18:57:27] [TRT] [I] Producer name: pytorch
[08/24/2023-18:57:27] [TRT] [I] Producer version: 1.9
[08/24/2023-18:57:27] [TRT] [I] Domain:
[08/24/2023-18:57:27] [TRT] [I] Model version: 0
[08/24/2023-18:57:27] [TRT] [I] Doc string:
[08/24/2023-18:57:27] [TRT] [I] ----------------------------------------------------------------
[08/24/2023-18:57:27] [TRT] [W] onnx2trt_utils.cpp:365: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
TensorRT: Network Description:
TensorRT: input "images" with shape (1, 3, 640, 640) and dtype DataType.FLOAT
TensorRT: output "output" with shape (1, 25200, 20) and dtype DataType.FLOAT
TensorRT: building FP16 engine in ../best.engine
[08/24/2023-18:57:29] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.3.0
[08/24/2023-18:57:29] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +637, GPU +268, now: CPU 1545, GPU 791 (MiB)
[08/24/2023-18:57:29] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +356, GPU +258, now: CPU 1901, GPU 1049 (MiB)
[08/24/2023-18:57:29] [TRT] [W] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.0.5
[08/24/2023-18:57:29] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/24/2023-18:58:37] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[08/24/2023-19:06:05] [TRT] [I] Detected 1 inputs and 4 output network tensors.
[08/24/2023-19:06:08] [TRT] [I] Total Host Persistent Memory: 218880
[08/24/2023-19:06:08] [TRT] [I] Total Device Persistent Memory: 1197056
[08/24/2023-19:06:08] [TRT] [I] Total Scratch Memory: 0
[08/24/2023-19:06:08] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 48 MiB, GPU 2470 MiB
[08/24/2023-19:06:08] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 29.1457ms to assign 9 blocks to 142 nodes requiring 25804804 bytes.
[08/24/2023-19:06:08] [TRT] [I] Total Activation Memory: 25804804
[08/24/2023-19:06:08] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +40, GPU +42, now: CPU 40, GPU 42 (MiB)
export.py:172: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
from cryptography.fernet import Fernet
TensorRT: export success, saved as ../best.engine
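For reference, the FP16 build the log above performs can be reproduced standalone with the TensorRT 8.x Python API. A sketch under the same file names as the log; the workspace size is an arbitrary example, and raising it unlocks the extra tactics mentioned in the workspace message above:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open('../best.onnx', 'rb') as f:
    if not parser.parse(f.read()):
        raise RuntimeError(str(parser.get_error(0)))
config = builder.create_builder_config()
config.max_workspace_size = 4 << 30          # 4 GiB; more workspace, more tactics
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)    # the FP16 engine from the log
engine = builder.build_serialized_network(network, config)
assert engine is not None, 'engine build failed'
with open('../best.engine', 'wb') as f:
    f.write(engine)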
A search online shows the main cause: the trained model was saved with torch.save(model, path) and loaded with model = torch.load(path). The pt-loading code in export.py is as follows:
if pt:
    logger.info("Find Pytorch weight")
    ckpt = torch.load(opt.weight, map_location=device)
    if opt.noema:
        model = ckpt['model']
    else:
        model = ckpt['ema'] if ckpt.get('ema') else ckpt['model']
    meta = get_meta_data(ckpt, model, meta)
    if opt.int8:
        zero_scale_fix(model, device)
        if model.__name__ != "EfficentYolo":
            for sub_fusion_list in op_concat_fusion_list[model.__name__]:
                ops = [get_module(model, op_name) for op_name in sub_fusion_list]
                concat_quant_amax_fuse(ops)
        for sub_fusion_list in op_concat_fusion_list[model.type]:
            ops = [get_module(model, op_name) for op_name in sub_fusion_list]
            concat_quant_amax_fuse(ops)
    model.float()
    if not opt.int8:
        model.fuse()
    model.to(device)
    model.eval()
    if opt.int8:
        quant_nn.TensorQuantizer.use_fb_fake_quant = True
    im = torch.zeros(1, 3, *imgsz).to(device)
    # Changes to the model's detect layer required to support ONNX export
    # model.detect.inplace = False
    if not (hasattr(model, 'type') and model.type in ['anchorfree', 'anchorbase']):
        model.type = 'anchorbase'
    model.detect.dynamic = dynamic
    model.detect.export = True  # reduce the number of outputs
    # Verify that the torch model runs correctly
    for _ in range(2):
        y = model(im)  # dry runs
    # Read the labels from the model and save them to labels.txt
    labels = str({i: l for i, l in enumerate(model.labels)})
    with open(file.parents[0] / 'labels.txt', 'w') as f:
        f.write(labels)
    logger.info("the torch model is very successful, it's no possible!")
    if 'onnx' in opt.include or 'trt' in opt.include:
        try:
            import tensorrt as trt
            if model.type == 'anchorfree':
                export_onnx(model, im, file, opt.opset, train=False, dynamic=False, simple=opt.simple)
            elif model.type == 'anchorbase':
                if int(trt.__version__[0]) == 7:  # TensorRT 7 handling https://github.com/ultralytics/yolov5/issues/6012
                    model.detect.inplace = False
                    grid = model.detect.anchor_grid
                    model.detect.anchor_grid = [a[..., :1, :1, :] for a in grid]
                    export_onnx(model, im, file, opt.opset, train=False, dynamic=False, simple=opt.simple)  # opset 12
                    model.detect.anchor_grid = grid
                else:  # TensorRT >= 8
                    export_onnx(model, im, file, opt.opset, train=False, dynamic=False, simple=opt.simple)  # opset 13
        except:
            logger.info("TRT ERROR, will custom onnx!")
            export_onnx(model, im, file, opt.opset, train=False, dynamic=False, simple=opt.simple)
        onnx_file = file.with_suffix('.onnx')
        add_meta_to_model(onnx_file, meta)
        if opt.int8:
            get_remove_qdq_onnx_and_cache(file.with_suffix('.onnx'))
            add_meta_to_model(str(onnx_file).replace('.onnx', '_wo_qdq.onnx'), meta)
        if 'trt' in opt.include:
            if opt.old:
                meta = False
            export_engine(onnx_file, None, meta=meta, half=opt.half, int8=opt.int8, workspace=opt.worker, encode=opt.encode, verbose=opt.verbose)
else:
    logger.info("Find ONNX weight")
    if not opt.old:
        meta = get_meta_data(file, None, meta)
        meta['half'] = opt.half
        meta['int8'] = opt.int8
        meta['encode'] = opt.encode
    if opt.old:
        meta = False
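To make the cause described above concrete: torch.save(model, path) pickles the whole Python object, which embeds the import path of its class (models.yolo.Model in YOLOv5), so torch.load needs that package importable. Saving only the state_dict avoids the dependency. A minimal sketch; Model and cfg are placeholders for however the architecture is constructed in your codebase:

import torch

# Save only the tensors; no class reference ends up in the file.
torch.save(model.state_dict(), 'best_state.pt')

# On any machine: rebuild the architecture explicitly, then load the weights.
model = Model(cfg)  # placeholder constructor
model.load_state_dict(torch.load('best_state.pt', map_location='cpu'))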
The likely explanation is that the checkpoint saved during training records more than the weights, including references to code locations in the training environment (here, the module path of the model class). When the checkpoint is moved to another machine, such as the one used for the trt conversion, those references can no longer be resolved. Converting to a portable ONNX model first and then to TRT sidesteps the issue.
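If re-exporting is not convenient, another common workaround is to make the original package importable before calling torch.load. A sketch, with the repo path as an assumption:

import sys
import torch

sys.path.insert(0, '/path/to/yolov5')  # assumed location of the training repo
ckpt = torch.load('best.pt', map_location='cpu')  # 'models' is now resolvable
model = ckpt['ema'] if ckpt.get('ema') else ckpt['model']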