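Why convert a model with TorchScript?
a) TorchScript code can be invoked in its own interpreter, which is essentially a restricted Python interpreter. This interpreter does not acquire the Global Interpreter Lock (GIL), so many requests can be processed on the same instance simultaneously.
b) The format lets us save the whole model to disk and load it into another environment, such as a server written in a language other than Python.
c) TorchScript provides a representation on which compiler optimizations can be applied for more efficient execution.
d) TorchScript lets us interface with many backend/device runtimes.
1. Converting a model with trace
The trace method first runs the model once on the example inputs, records the operations executed during that run, and creates a torch.jit.ScriptModule instance. An example of converting a model with trace: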
import torch
import numpy as np
class MyCell_v1(torch.nn.Module):
    def __init__(self):
        super(MyCell_v1, self).__init__()
        self.linear = torch.nn.Linear(4, 4)
    def forward(self, x, h):
        new_h = torch.tanh(self.linear(x) + h)
        return new_h, new_h
my_cell_v1 = MyCell_v1()
x = torch.ones(3, 4)
h = torch.ones(3, 4)
# Convert the model object with trace
traced_cell_1 = torch.jit.trace(my_cell_v1, (x, h))
# Inspect the generated code
print(traced_cell_1.code)
# Run inference with the converted model
print(traced_cell_1(x, h))
# Run inference with the original model
print(my_cell_v1(x, h))
The output is as follows:
def forward(self,
    input: Tensor,
    h: Tensor) -> Tuple[Tensor, Tensor]:
  _0 = torch.add((self.linear).forward(input, ), h, alpha=1)
  _1 = torch.tanh(_0)
  return (_1, _1)
(tensor([[ 0.3909, 0.9382, -0.1499, 0.8349],
        [ 0.3909, 0.9382, -0.1499, 0.8349],
        [ 0.3909, 0.9382, -0.1499, 0.8349]], grad_fn=<TanhBackward>), tensor([[ 0.3909, 0.9382, -0.1499, 0.8349],
        [ 0.3909, 0.9382, -0.1499, 0.8349],
        [ 0.3909, 0.9382, -0.1499, 0.8349]], grad_fn=<TanhBackward>))
(tensor([[ 0.3909, 0.9382, -0.1499, 0.8349],
        [ 0.3909, 0.9382, -0.1499, 0.8349],
        [ 0.3909, 0.9382, -0.1499, 0.8349]], grad_fn=<TanhBackward>), tensor([[ 0.3909, 0.9382, -0.1499, 0.8349],
        [ 0.3909, 0.9382, -0.1499, 0.8349],
        [ 0.3909, 0.9382, -0.1499, 0.8349]], grad_fn=<TanhBackward>))
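Drawback of trace: because the model is executed only once, a model that contains loops or if statements cannot have all of its code branches covered.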
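The same limitation applies to loops: a traced loop is unrolled for the number of iterations seen during tracing. The sketch below illustrates this with a hypothetical helper, sum_rows, which is not part of the original example; tracing it may print a TracerWarning, which is expected here.

```python
import torch

def sum_rows(xs):
    # Add up the rows of xs one at a time; the trip count depends on xs.size(0)
    total = torch.zeros_like(xs[0])
    for i in range(xs.size(0)):
        total = total + xs[i]
    return total

# Tracing with 3 rows unrolls the loop into exactly 3 additions
traced_sum = torch.jit.trace(sum_rows, torch.ones(3, 4))
# Scripting compiles the loop itself, so the trip count stays dynamic
scripted_sum = torch.jit.script(sum_rows)

longer = torch.ones(5, 4)
eager_out = sum_rows(longer)          # all 5 rows are added
traced_out = traced_sum(longer)       # only the first 3 rows are added
scripted_out = scripted_sum(longer)   # all 5 rows are added
print(eager_out, traced_out, scripted_out)
```

With a 5-row input the traced function still adds only three rows, while the eager and scripted versions add all five.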
2. Converting a model with script
a) Where trace falls short (a Module that contains a branch)
class MyDecisionGate(torch.nn.Module):
    def forward(self, x):
        if x.sum() > 0:
            return x
        else:
            return -x
class MyCell_v2(torch.nn.Module):
    def __init__(self, dg):
        super(MyCell_v2, self).__init__()
        self.dg = dg
        self.linear = torch.nn.Linear(4, 4)
    def forward(self, x, h):
        new_h = torch.tanh(self.linear(self.dg(x)) + h)
        return new_h, new_h
dg = MyDecisionGate()
my_cell_v2 = MyCell_v2(dg)
i. Convert the model with trace; the traced model covers the if branch:
# x is a 3 x 4 matrix of ones, so x.sum() > 0 always holds
x = torch.ones(3, 4)
h = torch.ones(3, 4)
# Trace the model: trace runs my_cell_v2 once on the current x and h
# Because x.sum() > 0 holds, MyDecisionGate.forward takes the if branch and returns x itself; the else branch is never executed
traced_cell_2 = torch.jit.trace(my_cell_v2, (x, h))
# Inspect the traced MyDecisionGate: the if-else in forward is gone, so one branch is missing and inputs that should take it will be handled incorrectly
# This is how trace works: "Tracing does exactly what we said it would: run the code, record the operations that happen and construct a ScriptModule that does exactly that"
# trace only records the operations that were executed; the other branch never ran, so it was not recorded
print(traced_cell_2.dg.code)
print(traced_cell_2.code)
# Check the inference results: traced_cell_2 and my_cell_v2 produce the same output
print(traced_cell_2(x, h))
print(my_cell_v2(x, h))
The output is as follows:
def forward(self,
    x: Tensor) -> None:
  return None
def forward(self,
    x: Tensor,
    h: Tensor) -> Tuple[Tensor, Tensor]:
  _0 = self.linear
  _1 = (self.dg).forward(x, )
  _2 = torch.add((_0).forward(x, ), h, alpha=1)
  _3 = torch.tanh(_2)
  return (_3, _3)
(tensor([[0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581]], grad_fn=<TanhBackward>), tensor([[0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581]], grad_fn=<TanhBackward>))
(tensor([[0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581]], grad_fn=<TanhBackward>), tensor([[0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581]], grad_fn=<TanhBackward>))
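Conclusion: for the inputs x and h, execution should take the if branch of MyDecisionGate.forward. trace recorded the operations of that branch, so the traced model's output matches the original model's.
ii. Test the else branch on the model that was traced over the if branch:
# Now multiply x by -1
# Check the inference results: traced_cell_2 and my_cell_v2 produce different outputs
# traced_cell_2 recorded only the if branch of MyDecisionGate.forward and dropped the else branch
# so it treats every input as if it satisfied the if condition; when the else branch should run (x.sum() <= 0), the result is wrong
print(traced_cell_2(-x, h))
print(my_cell_v2(-x, h))
The output is as follows: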
(tensor([[-0.2671, 0.1883, 0.2345, 0.6458],
        [-0.2671, 0.1883, 0.2345, 0.6458],
        [-0.2671, 0.1883, 0.2345, 0.6458]], grad_fn=<TanhBackward>), tensor([[-0.2671, 0.1883, 0.2345, 0.6458],
        [-0.2671, 0.1883, 0.2345, 0.6458],
        [-0.2671, 0.1883, 0.2345, 0.6458]], grad_fn=<TanhBackward>))
(tensor([[0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581]], grad_fn=<TanhBackward>), tensor([[0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581]], grad_fn=<TanhBackward>))
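Conclusion: for the inputs -x and h, execution should take the else branch of MyDecisionGate.forward. Because the trace recorded the if branch, the traced model's output for -x differs from the original model's.
iii. Convert the model with trace; the traced model covers the else branch:
# Now multiply x by -1
# and re-run trace on the new x and h
traced_cell_3 = torch.jit.trace(my_cell_v2, (-x, h))
print(traced_cell_3.dg.code)
# Check the inference results: traced_cell_3 and my_cell_v2 produce the same output
print(traced_cell_3(-x, h))
print(my_cell_v2(-x, h))
The output is as follows: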
def forward(self,
    x: Tensor) -> Tensor:
  return torch.neg(x)
(tensor([[0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581]], grad_fn=<TanhBackward>), tensor([[0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581]], grad_fn=<TanhBackward>))
(tensor([[0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581]], grad_fn=<TanhBackward>), tensor([[0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581],
        [0.9647, 0.7493, 0.9096, 0.4581]], grad_fn=<TanhBackward>))
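Conclusion: for the inputs -x and h, execution should take the else branch of MyDecisionGate.forward. After re-running trace on these inputs, the converted model records the operations of the else branch, so its output matches the original model's.
Summary: "Tracing does exactly what we said it would: run the code, record the operations that happen and construct a ScriptModule that does exactly that". trace records only the operations encountered during execution; a branch that did not run is not recorded. trace is therefore unsuitable for Modules whose branches or loops depend on the input data.
b) Converting a model with script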
x = torch.ones(3, 4)
h = torch.ones(3, 4)
scripted_gate = torch.jit.script(MyDecisionGate())
my_cell_script = MyCell_v2(scripted_gate)
scripted_cell = torch.jit.script(my_cell_script)
print(scripted_gate.code)
print(scripted_cell.code)
The output is as follows:
def forward(self,
    x: Tensor) -> Tensor:
  _0 = bool(torch.gt(torch.sum(x, dtype=None), 0))
  if _0:
    _1 = x
  else:
    _1 = torch.neg(x)
  return _1
def forward(self,
    x: Tensor,
    h: Tensor) -> Tuple[Tensor, Tensor]:
  _0 = (self.linear).forward((self.dg).forward(x, ), )
  new_h = torch.tanh(torch.add(_0, h, alpha=1))
  return (new_h, new_h)
print(scripted_cell(x, h))
print(my_cell_script(x, h))
"""
(tensor([[0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964]], grad_fn=<TanhBackward>), tensor([[0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964]], grad_fn=<TanhBackward>))
(tensor([[0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964]], grad_fn=<TanhBackward>), tensor([[0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964]], grad_fn=<TanhBackward>))
"""
print(scripted_cell(-x, h))
print(my_cell_script(-x, h))
"""
(tensor([[0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964]], grad_fn=<TanhBackward>), tensor([[0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964]], grad_fn=<TanhBackward>))
(tensor([[0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964]], grad_fn=<TanhBackward>), tensor([[0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964],
        [0.8947, 0.5101, 0.5769, 0.8964]], grad_fn=<TanhBackward>))
"""
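Summary: script can handle control flow inside a Module.
3. Converting a model with a mix of trace and script
a) trace and script both operate on instances of torch.nn.Module and its subclasses (both also accept plain Python functions)
b) trace or script can be applied at the point where a Module object is created, choosing between them according to the Module's code logic
c) A traced module can contain scripted code, and a scripted module can contain traced code
d) If the Module contains if branches or loops, convert it with script; otherwise convert it with trace
"""
My own understanding, working from the inside out:
1. MyDecisionGate is a subclass of Module and contains an if statement, so it must be converted with script
2. MyCell_v2 is a subclass of Module. The MyDecisionGate it contains has an if branch, but that has already been converted with script, so MyCell_v2 itself can be converted with trace
3. MyRNNLoop_TraceScript is a subclass of Module and contains a for loop, so it must be converted with script
"""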
class MyRNNLoop_TraceScript(torch.nn.Module):
    def __init__(self):
        super(MyRNNLoop_TraceScript, self).__init__()
        self.cell = torch.jit.trace(MyCell_v2(torch.jit.script(MyDecisionGate())), (x, h))
    def forward(self, xs):
        h = torch.zeros(3, 4)
        y = torch.zeros(3, 4)
        for i in range(xs.size(0)):
            y, h = self.cell(xs[i], h)
        return y, h
rnn_loop = torch.jit.script(MyRNNLoop_TraceScript())
print(rnn_loop.code)
The output is as follows:
def forward(self,
    xs: Tensor) -> Tuple[Tensor, Tensor]:
  h = torch.zeros([3, 4], dtype=None, layout=None, device=None, pin_memory=None)
  y = torch.zeros([3, 4], dtype=None, layout=None, device=None, pin_memory=None)
  y0 = y
  h0 = h
  for i in range(torch.size(xs, 0)):
    _0 = (self.cell).forward(torch.select(xs, 0, i), h0, )
    y1, h1, = _0
    y0, h0 = y1, h1
  return (y0, h0)
4. Performance comparison of traced/scripted and plain models
a) Model converted with trace/script
class MyRNNLoop_TraceScript(torch.nn.Module):
    def __init__(self):
        super(MyRNNLoop_TraceScript, self).__init__()
        self.cell = torch.jit.trace(MyCell_v2(torch.jit.script(MyDecisionGate())), (x, h))
    def forward(self, xs):
        h = torch.zeros(3, 4)
        y = torch.zeros(3, 4)
        for i in range(xs.size(0)):
            y, h = self.cell(xs[i], h)
        return y, h
rnn_loop = torch.jit.script(MyRNNLoop_TraceScript())
xs = torch.randn(100, 3, 4)
# Measure the model's run time (%timeit is an IPython/Jupyter magic command)
print(rnn_loop(xs))
%timeit rnn_loop(xs)
The output is as follows:
(tensor([[-0.0442, -0.9005, 0.6578, 0.0710],
        [ 0.7052, 0.8484, 0.6964, -0.8223],
        [ 0.3576, -0.7521, 0.4776, 0.0805]], grad_fn=<TanhBackward>), tensor([[-0.0442, -0.9005, 0.6578, 0.0710],
        [ 0.7052, 0.8484, 0.6964, -0.8223],
        [ 0.3576, -0.7521, 0.4776, 0.0805]], grad_fn=<TanhBackward>))
13.6 ms ± 815 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
b) Model without trace/script conversion
class MyRNNLoop_Normal(torch.nn.Module):
    def __init__(self):
        super(MyRNNLoop_Normal, self).__init__()
        self.cell = MyCell_v2(MyDecisionGate())
    def forward(self, xs):
        h = torch.zeros(3, 4)
        y = torch.zeros(3, 4)
        for i in range(xs.size(0)):
            y, h = self.cell(xs[i], h)
        return y, h
rnn_loop_normal = MyRNNLoop_Normal()
# Measure the model's run time
print(rnn_loop_normal(xs))
%timeit rnn_loop_normal(xs)
The output is as follows:
(tensor([[ 0.0571, -0.8888, -0.8796, -0.8841],
        [ 0.8694, 0.3169, 0.6436, 0.4143],
        [ 0.6243, -0.7954, -0.8708, -0.6556]], grad_fn=<TanhBackward>), tensor([[ 0.0571, -0.8888, -0.8796, -0.8841],
        [ 0.8694, 0.3169, 0.6436, 0.4143],
        [ 0.6243, -0.7954, -0.8708, -0.6556]], grad_fn=<TanhBackward>))
30.6 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
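Because %timeit only exists in IPython/Jupyter, a comparison like the one above can be approximated in a plain Python script with the standard timeit module. The sketch below is only an illustration: TinyLoop is a hypothetical stand-in for the RNN loop, the iteration count is arbitrary, and absolute timings will differ by machine and PyTorch version.

```python
import timeit
import torch

class TinyLoop(torch.nn.Module):
    # Hypothetical stand-in for the RNN loop used in this section
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, xs: torch.Tensor) -> torch.Tensor:
        h = torch.zeros(3, 4)
        for i in range(xs.size(0)):
            h = torch.tanh(self.linear(xs[i]) + h)
        return h

eager_model = TinyLoop()
scripted_model = torch.jit.script(eager_model)
xs = torch.randn(100, 3, 4)

with torch.no_grad():
    # Warm-up calls so that one-time JIT work is not included in the timing
    for _ in range(3):
        eager_model(xs)
        scripted_model(xs)
    eager_s = timeit.timeit(lambda: eager_model(xs), number=20) / 20
    scripted_s = timeit.timeit(lambda: scripted_model(xs), number=20) / 20

print(f"eager:    {eager_s * 1e3:.3f} ms per call")
print(f"scripted: {scripted_s * 1e3:.3f} ms per call")
```

Timing both versions on the same module instance and the same input keeps the comparison fair; for a model this small the difference may be within noise.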
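5. Saving and loading traced/scripted models
a) The saved model file contains the code, parameters, attributes, and debug information; in other words, it contains everything the model needs
b) The file can therefore be run on its own in any separate process, even in an environment entirely unrelated to the Python runtime
i. Save the model
# Save the model (the ./models directory must already exist)
rnn_loop.save("./models/rnn_loop.pth")
ii. Load the model
# Load the model
loaded = torch.jit.load("./models/rnn_loop.pth")
iii. Test the model
# Test the model
print(loaded(xs))
%timeit loaded(xs)
The output is as follows: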
(tensor([[-0.0442, -0.9005, 0.6578, 0.0710],
        [ 0.7052, 0.8484, 0.6964, -0.8223],
        [ 0.3576, -0.7521, 0.4776, 0.0805]], grad_fn=<TanhBackward>), tensor([[-0.0442, -0.9005, 0.6578, 0.0710],
        [ 0.7052, 0.8484, 0.6964, -0.8223],
        [ 0.3576, -0.7521, 0.4776, 0.0805]], grad_fn=<TanhBackward>))
13.5 ms ± 3.78 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
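torch.jit.save and torch.jit.load also accept file-like objects, which makes it easy to check a save/load round trip without touching the filesystem. A minimal sketch, using a hypothetical Scale module rather than rnn_loop:

```python
import io
import torch

class Scale(torch.nn.Module):
    # Hypothetical module with a single parameter, used only for illustration
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.tensor([1.0, 2.0, 3.0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.weight

scripted = torch.jit.script(Scale())

# Serialize the code and parameters into an in-memory buffer
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)

# Rewind the buffer and load it back; map_location places the tensors on the CPU
buffer.seek(0)
restored = torch.jit.load(buffer, map_location="cpu")

x = torch.ones(2, 3)
print(restored(x))
```

The restored module no longer needs the Scale class definition, which is the property that lets the same file be loaded from a separate process.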
6. Summary
script(obj[, optimize, _frames_up, _rcb]) | Scripting a function or nn.Module will inspect the source code, compile it as TorchScript code using the TorchScript compiler, and return a ScriptModule or ScriptFunction.
trace(func, example_inputs[, optimize, …]) | Trace a function and return an executable or ScriptFunction that will be optimized using just-in-time compilation.
script_if_tracing(fn) | Compiles fn when it is first called during tracing.
trace_module(mod, inputs[, optimize, …]) | Trace a module and return an executable ScriptModule that will be optimized using just-in-time compilation.
fork(func, *args, **kwargs) | Creates an asynchronous task executing func and a reference to the value of the result of this execution.
wait(future) | Forces completion of a torch.jit.Future[T] asynchronous task, returning the result of the task.
ScriptModule() | A wrapper around C++ torch::jit::Module.
ScriptFunction | Functionally equivalent to a ScriptModule, but represents a single function and does not have any attributes or Parameters.
freeze(mod[, preserved_attrs, optimize_numerics]) | Freezing a ScriptModule will clone it and attempt to inline the cloned module's submodules, parameters, and attributes as constants in the TorchScript IR Graph.
save(m, f[, _extra_files]) | Save an offline version of this module for use in a separate process.
load(f[, map_location, _extra_files]) | Load a ScriptModule or ScriptFunction previously saved with torch.jit.save.
ignore([drop]) | This decorator indicates to the compiler that a function or method should be ignored and left as a Python function.
unused(fn) | This decorator indicates to the compiler that a function or method should be ignored and replaced with the raising of an exception.
isinstance(obj, target_type) | This function provides for container type refinement in TorchScript.
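1) torch.jit.trace
trace works well for code that operates only on a single tensor or on lists, dictionaries, and tuples of tensors. With torch.jit.trace and torch.jit.trace_module you can turn a model or Python function into a TorchScript ScriptModule or ScriptFunction. Given example inputs, trace runs the function and records every operation performed on tensors. It records only the operations reached by those example inputs and replays the same operations for any later input, so it is unsuitable for code with data-dependent control flow.
2) torch.jit.script
Compiles a function (via the decorator form or a direct call) or a Module that contains control flow.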
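Both ways of applying torch.jit.script to a plain function can be sketched as follows. The two functions, flip_if_negative and halve_repeatedly, are hypothetical examples introduced here for illustration.

```python
import torch

# Decorator form: the function is compiled when it is defined
@torch.jit.script
def flip_if_negative(x: torch.Tensor) -> torch.Tensor:
    if x.sum() > 0:
        return x
    else:
        return -x

def halve_repeatedly(x: torch.Tensor, steps: int) -> torch.Tensor:
    # The loop count is a runtime argument, so the loop must be preserved
    for i in range(steps):
        x = x / 2
    return x

# Function-call form: compile an existing function
halve_scripted = torch.jit.script(halve_repeatedly)

flipped = flip_if_negative(-torch.ones(2, 2))
halved = halve_scripted(torch.full((2,), 8.0), 3)
print(flip_if_negative.code)
print(flipped, halved)
```

Non-tensor arguments such as steps need a type annotation, because TorchScript otherwise assumes every argument is a Tensor.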
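3) torch.jit.trace_module
When a Module is passed to trace, only its forward method is run and traced by default. trace_module takes a dictionary (the keys are method names of the Module, the values are the example inputs for those methods), so several methods can be run and traced.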
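As a minimal sketch of the dictionary argument, the hypothetical Net module below has a second method, doubled, that forward never calls; both are traced in one trace_module call.

```python
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

    def doubled(self, x):
        # A second entry point that forward never calls
        return self.linear(x) * 2

net = Net()
example = torch.ones(3, 4)

# Keys are method names, values are the example inputs for each method
traced = torch.jit.trace_module(net, {"forward": example, "doubled": example})

print(traced(example))
print(traced.doubled(example))
```

Calling torch.jit.trace(net, example) instead would trace forward only, and traced.doubled would not be available on the result.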
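4) torch.jit.ignore
Ignores selected methods at compile time: a function decorated with torch.jit.ignore is not compiled by TorchScript and is instead executed directly by the Python interpreter. If a Module has a method marked with ignore, the compiled model cannot be exported with save.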
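A small sketch of torch.jit.ignore, assuming a hypothetical Normalizer module whose helper relies on plain Python code that is left to the Python interpreter:

```python
import torch

class Normalizer(torch.nn.Module):
    @torch.jit.ignore
    def max_abs(self, x: torch.Tensor) -> torch.Tensor:
        # Plain Python (list conversion plus a generator expression);
        # this body is not compiled and runs in the Python interpreter
        values = x.flatten().tolist()
        return torch.tensor(max(abs(v) for v in values))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The compiled forward calls back into Python for max_abs
        return x / self.max_abs(x)

scripted = torch.jit.script(Normalizer())
x = torch.tensor([1.0, -4.0, 2.0])
out = scripted(x)
print(out)
```

Because forward depends on a Python callback, this module only works inside a Python process, which is why such a model cannot be exported.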
5) torch.jit.unused
Ignores selected methods at compile time: a function decorated with torch.jit.unused is not compiled into TorchScript, and its body is replaced with a raised exception.
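The behaviour can be sketched with a hypothetical Net module whose describe method is marked unused: scripting succeeds, and an exception is raised only if the method is actually reached at run time.

```python
import torch

class Net(torch.nn.Module):
    def __init__(self, debug: bool):
        super().__init__()
        self.debug = debug

    @torch.jit.unused
    def describe(self, x: torch.Tensor) -> torch.Tensor:
        # Not compiled; TorchScript substitutes a raise for this body
        print("values:", x.tolist())
        return x

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.debug:
            x = self.describe(x)
        return x + 10

# With debug=False the unused method is never reached
quiet = torch.jit.script(Net(debug=False))
out = quiet(torch.zeros(2))

# With debug=True the call is reached and raises at run time
noisy = torch.jit.script(Net(debug=True))
try:
    noisy(torch.zeros(2))
    raised = False
except Exception:
    raised = True

print(out, raised)
```

Unlike an ignored method, an unused method leaves no Python callback in the compiled module, so this is the usual choice when the model still has to be saved.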
6) torch.jit.export
To compile a method that is not called from forward, along with every method it calls (recursively), decorate it with torch.jit.export.
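A minimal sketch of torch.jit.export, again with a hypothetical Net module: doubled_plus_one is never called from forward, yet it is available on the scripted module because of the decorator.

```python
import torch

class Net(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

    def helper(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2

    @torch.jit.export
    def doubled_plus_one(self, x: torch.Tensor) -> torch.Tensor:
        # Compiled because of @torch.jit.export; helper is compiled too,
        # since this method calls it
        return self.helper(x) + 1

scripted = torch.jit.script(Net())
out = scripted.doubled_plus_one(torch.ones(2))
print(out)
```

Without the decorator, scripting would compile only forward and whatever forward calls.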