The conversion is split into two steps with the following deployment scenario in mind: the model needs to run on an embedded board with limited memory.
(1) pth to wts
Refer to the following code:
import torch
from torch import nn
# import your own model definition here so torch.load can unpickle the saved model
import os
import struct

def main():
    net = torch.load('XXX.pth')  # load the pth file
    net = net.to('cuda:0')
    net.eval()
    f = open("XXX.wts", 'w')  # name the wts file yourself
    f.write("{}\n".format(len(net.state_dict().keys())))  # save the number of keys
    for k, v in net.state_dict().items():
        # print('key: ', k)
        # print('value: ', v.shape)
        vr = v.reshape(-1).cpu().numpy()
        f.write("{} {}".format(k, len(vr)))  # save each layer's name and its parameter count
        for vv in vr:
            f.write(" ")
            f.write(struct.pack(">f", float(vv)).hex())  # pack each weight into a hex string with struct
        f.write("\n")
    f.close()

if __name__ == '__main__':
    main()
The wts file has the following format:
First line: a single number, the count of keys in the state dict.
Each following line: the name of one layer parameter, the number of values, and then the values themselves as hex-encoded big-endian float32.
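As a hypothetical illustration (the tensor names and hex values here are made up, not taken from a real model), a wts file for a model whose state dict has 10 tensors could start like this:

10
conv1.weight 600 3e4ccccd bd4ccccd ...
conv1.bias 8 3d4ccccd ...

where 600 = 8 x 3 x 5 x 5 would be the flattened size of an 8-channel 5x5 convolution over a 3-channel input, and each hex group encodes one float32 value.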
(2) wts to TensorRT
For the commonly used models, the conversion can follow the linked reference.
But if you have a custom model that is not among the common ones, say a small semantic segmentation network with just a few layers that I built myself and that the existing parsers do not support, the only option is to hand-write the network with the TensorRT API.
For how to use the TensorRT API, see the linked reference.
Briefly, the idea behind converting wts to TensorRT is: read the weights from the wts file into a map keyed by parameter name, then rebuild the same network layer by layer with the TensorRT API, handing each layer its weights from the map.
Concretely:
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <string>

#include "NvInfer.h"

using namespace nvinfer1;

// Load the weights from the wts file into a map: parameter name -> Weights
std::map<std::string, Weights> loadWeights(const std::string file)
{
    std::cout << "Loading weights: " << file << std::endl;
    std::map<std::string, Weights> weightMap;

    // Open weights file
    std::ifstream input(file);
    assert(input.is_open() && "Unable to load weight file.");

    // Read number of weight blobs
    int32_t count;
    input >> count;
    assert(count > 0 && "Invalid weight map file.");

    while (count--)
    {
        Weights wt{DataType::kFLOAT, nullptr, 0};
        uint32_t size;

        // Read name and size of blob
        std::string name;
        input >> name >> std::dec >> size;
        wt.type = DataType::kFLOAT;

        // Load blob: each value is the hex string of a 32-bit float
        uint32_t* val = reinterpret_cast<uint32_t*>(malloc(sizeof(uint32_t) * size));
        for (uint32_t x = 0, y = size; x < y; ++x)
        {
            input >> std::hex >> val[x];
        }
        wt.values = val;
        wt.count = size;
        weightMap[name] = wt;
    }

    return weightMap;
}
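Before building the network you also need a builder, a builder config, and the weight map itself. A minimal sketch of what the snippets below assume already exists (gLogger is your ILogger implementation as in the TensorRT samples; the blob names and input size are placeholders, adjust them to your model):

static const char* INPUT_BLOB_NAME = "input";   // placeholder name
static const char* OUTPUT_BLOB_NAME = "output"; // placeholder name
static const int INPUT_H = 480;                 // placeholder height
static const int INPUT_W = 640;                 // placeholder width

IBuilder* builder = createInferBuilder(gLogger);
IBuilderConfig* config = builder->createBuilderConfig();
std::map<std::string, Weights> weightMap = loadWeights("XXX.wts");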
Next, create the network definition from the builder:
INetworkDefinition* network = builder->createNetworkV2(0U);
The input layer has to be defined explicitly; for example, for an input of size 3xHxW:
ITensor* data = network->addInput(INPUT_BLOB_NAME, DataType::kFLOAT, Dims3{3, INPUT_H, INPUT_W});
Then feed the input into a convolution layer:
IConvolutionLayer* conv1 = network->addConvolutionNd(*data, 8, DimsHW{5, 5}, weightMap["conv1.weight"], weightMap["conv1.bias"]); // change the map keys to match your own layer names
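The wts file only carries the weight values, so stride and padding have to be set on each layer to match the PyTorch model. A sketch, assuming conv1 used stride 1 with padding 2 and is followed by a ReLU (both are assumptions about the network, not taken from the original):

conv1->setStrideNd(DimsHW{1, 1});
conv1->setPaddingNd(DimsHW{2, 2}); // assumed "same" padding for the 5x5 kernel
IActivationLayer* relu1 = network->addActivation(*conv1->getOutput(0), ActivationType::kRELU);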
Chain the layers on in the same way until the last one, then name and mark its output:
deconv3->getOutput(0)->setName(OUTPUT_BLOB_NAME); // set the output tensor name
network->markOutput(*deconv3->getOutput(0)); // mark it as the network output
You also need to give the engine enough workspace memory; if it is too small the build fails, typically with a "could not find implementation for node" error.
builder->setMaxBatchSize(maxBatchSize);
config->setMaxWorkspaceSize(128 * (1 << 20)); // 128 MiB workspace
ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
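Once the engine is built, the host-side buffers that loadWeights allocated with malloc can be freed, and the engine can be serialized to a plan file for deployment on the board. A sketch (the engine file name is a placeholder):

// free the host copies of the weights allocated in loadWeights
for (auto& mem : weightMap)
{
    free(const_cast<void*>(mem.second.values));
}

// serialize the engine to disk; it can later be reloaded with IRuntime::deserializeCudaEngine
IHostMemory* serialized = engine->serialize();
std::ofstream plan("XXX.engine", std::ios::binary);
plan.write(reinterpret_cast<const char*>(serialized->data()), serialized->size());
serialized->destroy();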
A note on how to write the deconvolution layer: PyTorch's deconvolution (ConvTranspose2d) has both padding and output_padding, so how are they set in TensorRT?
padding corresponds to setPrePadding
output_padding corresponds to setPostPadding
For example, for a deconvolution layer with output channels = 16, kernel_size = 5x5,
stride = 2, padding = 2, output_padding = 1:
IDeconvolutionLayer* deconv1 = network->addDeconvolutionNd(*relu3->getOutput(0), 16, DimsHW{5,5}, weightMap["deconv1.weight"], weightMap["deconv1.bias"]);
deconv1->setStrideNd(DimsHW{2, 2});
deconv1->setPrePadding(DimsHW{2, 2});
deconv1->setPostPadding(DimsHW{1, 1});
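As a quick check of these values: PyTorch's ConvTranspose2d output size (with dilation 1) is H_out = (H_in - 1) * stride - 2 * padding + kernel_size + output_padding, so here H_out = 2 * (H_in - 1) - 4 + 5 + 1 = 2 * H_in, i.e. this deconvolution exactly doubles the spatial resolution.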
The full cpp file and CMakeLists.txt are on GitHub.