【TVM Documentation Study】Compiling and Optimizing a Model with the Python Interface (AutoTVM)

This article is translated from: Compiling and Optimizing a Model with the Python Interface (AutoTVM), tvm 0.9.dev0 documentation

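In Compiling and Optimizing a Model with TVMC (tvm 0.9.dev0 documentation), we covered how to compile, run, and tune a pre-trained vision model, ResNet-50 v2, using TVM's command-line interface. TVM is more than a command-line tool, though. It is an optimizing framework with APIs in several languages, which gives you a great deal of flexibility when working with machine learning models.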

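In this tutorial we cover the same ground as with TVMC, but show how to do it with the Python API. By the end of this section, we will have used TVM's Python API to accomplish the following tasks:

  • Compile a pre-trained ResNet-50 v2 model for the TVM runtime.
  • Run a real image through the compiled model, and interpret the output and model performance.
  • Tune the model on a CPU using TVM.
  • Re-compile an optimized model using the tuning data collected by TVM.
  • Run the image through the optimized model, and compare the output and model performance.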

The goal of this section is to give you an overview of TVM's capabilities and of how to use them through the Python API.

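TVM is a deep learning compiler framework. It has a number of different modules for working with deep learning models and operators. In this tutorial we will work through how to load, compile, and optimize a model using the Python API.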

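We begin by importing our dependencies:

  • onnx, for loading and converting the model;
  • helper utilities for downloading test data;
  • the Python Image Library, for working with the image data;
  • numpy, for pre- and post-processing of the image data;
  • the TVM Relay framework;
  • the TVM graph executor.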

import onnx
from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np
import tvm.relay as relay
import tvm
from tvm.contrib import graph_executor

Downloading and Loading the ONNX Model

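For this tutorial we will work with ResNet-50 v2. ResNet-50 is a convolutional neural network that is 50 layers deep and is designed to classify images. The model we will use has been pre-trained on more than a million images covering 1000 different classes. The network takes input images of size 224x224. If you are interested in how the ResNet-50 model is structured, we recommend downloading Netron, a freely available ML model viewer.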

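TVM provides a helper library for downloading pre-trained models. You pass the module a model URL, a file name, and a model type, and TVM downloads the model and saves it to disk. For an ONNX model, you can then load it into memory with the onnx package (onnx.load).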

model_url = (
    "https://github.com/onnx/models/raw/main/"
    "vision/classification/resnet/model/"
    "resnet50-v2-7.onnx"
)

model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
onnx_model = onnx.load(model_path)

TVM supports many popular model formats. You can find a list in the Compile Deep Learning Models section of the TVM documentation (tvm 0.9.dev0 documentation).

Downloading, Preprocessing, and Loading the Test Image

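Each model expects particular tensor shapes, formats, and data types. For this reason, most models need some pre- and post-processing to make sure the input is valid and to interpret the output. TVMC uses NumPy's .npz format for both input and output data. This well-supported NumPy format serializes multiple arrays into a single file.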
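To illustrate the .npz format mentioned above, the sketch below uses NumPy alone, independent of TVMC. It writes two named arrays into one archive and reads them back. The array names `data` and `scores` are hypothetical placeholders, and an in-memory buffer stands in for a real file.

```python
import io

import numpy as np

# Two hypothetical arrays standing in for a model input and a model output.
arrays = {
    "data": np.zeros((1, 3, 224, 224), dtype="float32"),
    "scores": np.arange(1000, dtype="float32").reshape(1, 1000),
}

# np.savez stores several named arrays in a single .npz (zip) archive.
# A path such as "imagenet_cat.npz" would work the same way as this buffer.
buf = io.BytesIO()
np.savez(buf, **arrays)
buf.seek(0)

# np.load on an .npz returns a dict-like NpzFile keyed by array name.
with np.load(buf) as archive:
    restored = {name: archive[name] for name in archive.files}

for name, arr in restored.items():
    print(name, arr.shape, arr.dtype)
```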

As input for this tutorial we will use an image of a cat, but you can substitute any other image.


Download the image and convert it into a numpy array to use as input to the model:

img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")

# Resize it to 224x224
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")

# Our input image is in HWC layout while ONNX expects CHW input, so convert the array
img_data = np.transpose(img_data, (2, 0, 1))

# Normalize according to the ImageNet input specification
imagenet_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
imagenet_stddev = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
norm_img_data = (img_data / 255 - imagenet_mean) / imagenet_stddev

# Add the batch dimension, as we are expecting 4-dimensional input: NCHW.
img_data = np.expand_dims(norm_img_data, axis=0)
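The same preprocessing steps can be wrapped in a small function and checked on a synthetic image, without downloading anything. This is a sketch: the function name `preprocess` is hypothetical, and a random array stands in for the kitten picture. The final cast keeps the batch in float32, because subtracting the float64 mean/stddev arrays would otherwise promote it to float64.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
IMAGENET_STDDEV = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))


def preprocess(hwc_image: np.ndarray) -> np.ndarray:
    """Turn an HxWx3 RGB array (0-255) into a normalized 1x3xHxW float32 batch."""
    img = hwc_image.astype("float32")
    img = np.transpose(img, (2, 0, 1))                   # HWC -> CHW
    img = (img / 255 - IMAGENET_MEAN) / IMAGENET_STDDEV  # per-channel normalization
    return np.expand_dims(img, axis=0).astype("float32")  # CHW -> NCHW


# A random 224x224 RGB image stands in for the downloaded kitten picture.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3)).astype(np.uint8)
batch = preprocess(fake_image)
print(batch.shape, batch.dtype)
```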

Compiling the Model with Relay

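The next step is to compile the ResNet model. First, we import the model into Relay with the from_onnx importer. Then we build the model into a TVM library with standard optimizations. Finally, we create a TVM graph runtime module from the library.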

target = "llvm"

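Defining the correct target: specifying the correct target can have a large impact on the performance of the compiled module, because it lets the compiler use hardware features available on the target. For more information, see Auto-tuning a Convolutional Network for x86 CPU (tvm 0.9.dev0 documentation). We recommend identifying which CPU you are running and its optional features, and setting the target accordingly. For example, some processors can use target = "llvm -mcpu=skylake". For processors with the AVX-512 vector instruction set, use target = "llvm -mcpu=skylake-avx512".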
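One way to act on this advice on Linux is to read the CPU flags and map them to a target string. The sketch below assumes /proc/cpuinfo is available. Both helper names and the flag-to-mcpu mapping are hypothetical heuristics, not something TVM does for you. Check the resulting string against your own CPU before relying on it.

```python
def pick_llvm_target(cpu_flags: str) -> str:
    """Map a space-separated x86 CPU flag string to an LLVM target string.

    Hypothetical heuristic: AVX-512 -> skylake-avx512, AVX2 -> core-avx2,
    otherwise a generic "llvm" target.
    """
    flags = set(cpu_flags.split())
    if "avx512f" in flags:
        return "llvm -mcpu=skylake-avx512"
    if "avx2" in flags:
        return "llvm -mcpu=core-avx2"
    return "llvm"


def read_linux_cpu_flags(path: str = "/proc/cpuinfo") -> str:
    """Return the first 'flags' line of /proc/cpuinfo (Linux x86), or '' if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass
    return ""


target = pick_llvm_target(read_linux_cpu_flags())
print(target)
```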

# The input name may vary across model types. You can use a tool
# like Netron to check input names
input_name = "data"
shape_dict = {input_name: img_data.shape}

mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.device(str(target), 0)
module = graph_executor.GraphModule(lib["default"](dev))

Executing on the TVM Runtime

Now that the model is compiled, we can use the TVM runtime to make predictions with it. To run the model with TVM and make predictions, we need two things:

  • The compiled model, which we just produced.
  • Valid input to the model to make predictions on.

dtype = "float32"
module.set_input(input_name, img_data)
module.run()
output_shape = (1, 1000)
tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()

Collecting Basic Performance Data

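Here we collect some basic performance data for the unoptimized model, to compare against the tuned model later. To help account for CPU noise, we run the computation in multiple batches and repeat the measurement several times. We then gather basic statistics: the mean, median, and standard deviation.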

import timeit

timing_number = 10
timing_repeat = 10
unoptimized = (
    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
    * 1000
    / timing_number
)
unoptimized = {
    "mean": np.mean(unoptimized),
    "median": np.median(unoptimized),
    "std": np.std(unoptimized),
}

print(unoptimized)

Output:

{'mean': 496.2511969099978, 'median': 495.80396929999324, 'std': 0.7997811122746795}
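The same measurement runs again later for the optimized model, so it can be factored into a reusable helper. This is a sketch with a hypothetical helper name, `benchmark_ms`. It reports per-call latency in milliseconds exactly as the code above does. It uses `statistics.pstdev` because `np.std` defaults to the population standard deviation. A cheap Python expression stands in for `module.run`.

```python
import statistics
import timeit


def benchmark_ms(fn, repeat: int = 10, number: int = 10) -> dict:
    """Time fn() `repeat` times, each over `number` calls; report per-call latency in ms."""
    totals = timeit.Timer(fn).repeat(repeat=repeat, number=number)
    per_call_ms = [t * 1000 / number for t in totals]
    return {
        "mean": statistics.mean(per_call_ms),
        "median": statistics.median(per_call_ms),
        "std": statistics.pstdev(per_call_ms),  # population std, like np.std's default
    }


# Stand-in workload; in the tutorial this would be `lambda: module.run()`.
stats = benchmark_ms(lambda: sum(range(10_000)), repeat=5, number=20)
print(stats)
```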

Postprocessing the Output

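As mentioned earlier, each model has its own particular way of providing output tensors.

In our case, we need to post-process the output of ResNet-50 v2 into a more human-readable form, using the lookup table provided for the model.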

from scipy.special import softmax

# Download a list of labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")

with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]

# Open the output and read the output tensor
scores = softmax(tvm_output)
scores = np.squeeze(scores)
ranks = np.argsort(scores)[::-1]
for rank in ranks[0:5]:
    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))

Output:

class='n02123045 tabby, tabby cat' with probability=0.621103
class='n02123159 tiger cat' with probability=0.356379
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262
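The post-processing above relies on `scipy.special.softmax`. With its default `axis=None`, that function normalizes over the whole array, which is equivalent to a per-row softmax here because the batch size is 1. Below is a NumPy-only sketch of the same top-k step. The function names, the toy 1x4 output, and its labels are made up for illustration. Subtracting the maximum before exponentiating is the standard trick to avoid overflow.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def top_k(tvm_output: np.ndarray, labels: list, k: int = 5):
    """Return [(label, probability), ...] for the k most likely classes."""
    scores = np.squeeze(softmax(tvm_output))
    ranks = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in ranks]


# Toy 1x4 "model output" with made-up labels.
toy_output = np.array([[1.0, 3.0, 0.5, 2.0]], dtype="float32")
toy_labels = ["cat", "tabby", "dog", "tiger cat"]
for label, prob in top_k(toy_output, toy_labels, k=2):
    print("class='%s' with probability=%f" % (label, prob))
```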

Tuning the Model

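The model above was compiled to run on the TVM runtime, but without any platform-specific optimization. In this section, we show you how to use TVM to build a model optimized for your working platform.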

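In some cases, running inference with the compiled module may not give the expected performance. In such cases, we can use the auto-tuner to find a better configuration for the model and improve its performance.

Tuning in TVM is the process of optimizing a model to run faster on a given target. It differs from training or fine-tuning because it does not affect the model's accuracy, only its runtime performance. During tuning, TVM tries many different implementation variants of each operator to see which performs best. The results of these runs are stored in a tuning records file.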
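The idea of trying implementation variants and recording the results can be shown with a toy analogue in plain Python. This is only an illustration of the concept, not how AutoTVM works internally. AutoTVM generates and compiles real operator code for each configuration and uses a cost model to decide which configurations to measure. Here the hypothetical "operator" is a chunked vector sum, and each chunk size is one configuration.

```python
import json
import timeit

import numpy as np

# Hypothetical "operator": sum a vector in chunks of a configurable size.
# Every chunk size is one implementation variant; all give the same result
# up to floating-point rounding.
data = np.random.default_rng(0).random(1_000_000)


def chunked_sum(x: np.ndarray, chunk: int) -> float:
    return float(sum(x[i:i + chunk].sum() for i in range(0, x.size, chunk)))


candidates = [1_000, 10_000, 100_000, 1_000_000]  # the toy "configuration space"
records = []
for chunk in candidates:
    # Measure each variant a few times and keep its fastest run.
    cost = min(timeit.repeat(lambda: chunked_sum(data, chunk), repeat=3, number=1))
    records.append({"config": {"chunk": chunk}, "cost_s": cost})

best = min(records, key=lambda r: r["cost_s"])
# Like a tuning log, keep one JSON object per measured configuration.
log_lines = [json.dumps(r) for r in records]
print("best config:", best["config"])
```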

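In its simplest form, tuning requires you to provide three things:

  • the target specification of the device you intend to run this model on;
  • the path to an output file in which the tuning records will be stored;
  • a path to the model to be tuned.
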
import tvm.auto_scheduler as auto_scheduler
from tvm.autotvm.tuner import XGBTuner
from tvm import autotvm

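Set up some basic parameters for the runner. The runner takes the code compiled from a specific set of parameters, runs it, and measures its performance. The parameters are:

  • number: how many times the generated code is run per measurement; the costs of these runs are averaged.
  • repeat: how many measurements are taken for each configuration.
  • min_repeat_ms: the minimum duration of one measurement. If one repeat takes less time than this, number is increased automatically. This option is needed for accurate tuning on GPUs but not on CPUs. Setting it to 0 disables it.
  • timeout: an upper limit, in seconds, on how long each test run of a configuration may take.
  • enable_cpu_cache_flush: flushes the CPU cache between measurements, which brings the measured operator latency closer to what it would be during end-to-end inference.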

number = 10
repeat = 1
min_repeat_ms = 0  # since we're tuning on a CPU, can be set to 0
timeout = 10  # in seconds

# create a TVM runner
runner = autotvm.LocalRunner(
    number=number,
    repeat=repeat,
    timeout=timeout,
    min_repeat_ms=min_repeat_ms,
    enable_cpu_cache_flush=True,
)
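These parameters are easy to turn into a run count. According to the AutoTVM LocalRunner docstring, one candidate configuration is executed 1 + number × repeat times, and the first run is a discarded warm-up. Double-check this against your TVM version. The sketch below assumes min_repeat_ms=0, so number is never adjusted. The helper names are hypothetical.

```python
def runs_per_config(number: int, repeat: int) -> int:
    """Executions of one candidate: 1 warm-up run + number * repeat measured runs."""
    return 1 + number * repeat


def total_runs(number: int, repeat: int, n_trial: int) -> int:
    """Upper bound on kernel executions while tuning one task.

    Assumes min_repeat_ms=0 (number is not adjusted); failed builds only lower it.
    """
    return n_trial * runs_per_config(number, repeat)


# Values used in this tutorial: number=10, repeat=1, 10 trials per task.
print(runs_per_config(10, 1))
print(total_runs(10, 1, 10))
```

With the tutorial's settings this gives 11 executions per configuration, or at most 110 per task.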

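Create a simple structure to hold the tuning options. We use an XGBoost algorithm to guide the search.

For a production job, set the number of trials much higher than the value of 10 used here: we recommend 1500 for CPU and 3000-4000 for GPU. The number of trials needed depends on the particular model and processor. It is therefore worth evaluating performance across a range of values to find the best balance between tuning time and model optimization. Because tuning is time-consuming, we set the number of trials to 10 here, but we do not recommend a value this small.

The remaining options are:

  • early_stopping: stops tuning a task once this many trials have passed without finding a better configuration.
  • measure_option: indicates where the trial code will be built and where it will be run. In this case we use the LocalRunner we just created and a LocalBuilder.
  • tuning_records: specifies the file that the tuning data will be written to.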

tuning_option = {
    "tuner": "xgb",
    "trials": 10,
    "early_stopping": 100,
    "measure_option": autotvm.measure_option(
        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
    ),
    "tuning_records": "resnet-50-v2-autotuning.json",
}

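Defining the tuning search algorithm: by default, the search is guided by an XGBoost-based cost model (XGBTuner in the code below). Depending on the complexity of your model and the time available, you may want to choose a different algorithm.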

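Setting the tuning parameters: in this example, to save time, we set the number of trials to 10. early_stopping is set to 100, so with only 10 trials it never triggers. Setting these values higher may give you larger performance improvements, at the cost of longer tuning. The number of trials needed for a good result depends on the specifics of the model and the target platform.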
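The interaction between the trial budget and early stopping can be simulated with a list of made-up per-trial costs. This sketch mirrors the idea behind AutoTVM's early_stopping option, but it is not AutoTVM's exact code. The function name and the costs are hypothetical. In the tutorial loop, the trial budget is n_trial = min(trials, len(task.config_space)).

```python
def simulate_tuning(costs, n_trial, early_stopping):
    """Walk through measured costs (lower is better) like a simple tuner loop.

    Stops after n_trial measurements, or once `early_stopping` trials have
    passed since the best cost so far was found.
    Returns (best_cost, best_index, trials_run).
    """
    best_cost, best_index = float("inf"), -1
    trials_run = 0
    for i, cost in enumerate(costs[:n_trial]):
        trials_run += 1
        if cost < best_cost:
            best_cost, best_index = cost, i
        if i - best_index >= early_stopping:
            break
    return best_cost, best_index, trials_run


costs = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.9]  # made-up per-trial costs
print(simulate_tuning(costs, n_trial=7, early_stopping=3))    # (3.0, 2, 6): stops before 2.9
print(simulate_tuning(costs, n_trial=7, early_stopping=100))  # (2.9, 6, 7): never triggers
```

The first call shows the trade-off: a small early_stopping value saves measurements but can miss a better configuration that appears later.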

# begin by extracting the tasks from the onnx model
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

# Tune the extracted tasks sequentially.
for i, task in enumerate(tasks):
    prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
    tuner_obj = XGBTuner(task, loss_type="rank")
    tuner_obj.tune(
        n_trial=min(tuning_option["trials"], len(task.config_space)),
        early_stopping=tuning_option["early_stopping"],
        measure_option=tuning_option["measure_option"],
        callbacks=[
            autotvm.callback.progress_bar(tuning_option["trials"], prefix=prefix),
            autotvm.callback.log_to_file(tuning_option["tuning_records"]),
        ],
    )

Output:

[Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task  1/25]  Current/Best:   15.08/  19.42 GFLOPS | Progress: (4/10) | 7.45 s
[Task  1/25]  Current/Best:   16.95/  19.42 GFLOPS | Progress: (8/10) | 11.76 s
[Task  1/25]  Current/Best:   17.05/  19.42 GFLOPS | Progress: (10/10) | 12.64 s Done.

[Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task  2/25]  Current/Best:   12.61/  20.46 GFLOPS | Progress: (4/10) | 2.48 s
[Task  2/25]  Current/Best:   13.28/  20.46 GFLOPS | Progress: (8/10) | 3.59 s
[Task  2/25]  Current/Best:   13.12/  20.46 GFLOPS | Progress: (10/10) | 4.42 s Done.

[Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task  3/25]  Current/Best:   17.04/  17.04 GFLOPS | Progress: (4/10) | 2.95 s
[Task  3/25]  Current/Best:   23.87/  23.87 GFLOPS | Progress: (8/10) | 6.39 s
[Task  3/25]  Current/Best:   17.69/  23.87 GFLOPS | Progress: (10/10) | 7.17 s Done.

[Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task  4/25]  Current/Best:   17.23/  18.64 GFLOPS | Progress: (4/10) | 2.52 s
[Task  4/25]  Current/Best:   13.85/  22.40 GFLOPS | Progress: (8/10) | 4.04 s
[Task  4/25]  Current/Best:   10.79/  22.40 GFLOPS | Progress: (10/10) | 9.28 s Done.

[Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task  5/25]  Current/Best:   12.48/  21.12 GFLOPS | Progress: (4/10) | 2.49 s
[Task  5/25]  Current/Best:   14.24/  21.12 GFLOPS | Progress: (8/10) | 4.88 s
[Task  5/25]  Current/Best:   17.85/  21.12 GFLOPS | Progress: (10/10) | 5.63 s Done.

[Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task  6/25]  Current/Best:   11.62/  11.62 GFLOPS | Progress: (4/10) | 3.61 s
[Task  6/25]  Current/Best:   14.97/  19.32 GFLOPS | Progress: (8/10) | 5.40 s
[Task  6/25]  Current/Best:    4.89/  19.32 GFLOPS | Progress: (10/10) | 6.80 s Done.

[Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task  7/25]  Current/Best:   15.88/  15.88 GFLOPS | Progress: (4/10) | 3.25 s
[Task  7/25]  Current/Best:   13.90/  15.88 GFLOPS | Progress: (8/10) | 5.41 s
[Task  7/25]  Current/Best:   16.98/  20.14 GFLOPS | Progress: (10/10) | 6.19 s Done.

[Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task  8/25]  Current/Best:   15.27/  15.27 GFLOPS | Progress: (4/10) | 9.10 s
[Task  8/25]  Current/Best:    9.71/  15.27 GFLOPS | Progress: (8/10) | 12.85 s
[Task  8/25]  Current/Best:    5.26/  19.76 GFLOPS | Progress: (10/10) | 13.95 s Done.

[Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task  9/25]  Current/Best:   21.00/  23.28 GFLOPS | Progress: (4/10) | 2.26 s
[Task  9/25]  Current/Best:    6.83/  23.28 GFLOPS | Progress: (8/10) | 4.53 s
[Task  9/25]  Current/Best:    8.35/  23.28 GFLOPS | Progress: (10/10) | 5.16 s Done.

[Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 10/25]  Current/Best:    4.24/  13.50 GFLOPS | Progress: (4/10) | 2.65 s
[Task 10/25]  Current/Best:   18.35/  18.35 GFLOPS | Progress: (8/10) | 4.04 s
[Task 10/25]  Current/Best:    7.82/  18.35 GFLOPS | Progress: (10/10) | 4.73 s Done.

[Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 11/25]  Current/Best:   12.36/  15.69 GFLOPS | Progress: (4/10) | 3.37 s
[Task 11/25]  Current/Best:   15.19/  23.20 GFLOPS | Progress: (8/10) | 4.97 s
[Task 11/25]  Current/Best:   14.85/  23.33 GFLOPS | Progress: (10/10) | 5.73 s Done.

[Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 12/25]  Current/Best:   15.70/  20.95 GFLOPS | Progress: (4/10) | 3.11 s
[Task 12/25]  Current/Best:   13.96/  20.95 GFLOPS | Progress: (8/10) | 6.27 s
[Task 12/25]  Current/Best:    6.09/  20.95 GFLOPS | Progress: (10/10) | 7.64 s Done.

[Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 13/25]  Current/Best:   11.49/  19.01 GFLOPS | Progress: (4/10) | 3.13 s
[Task 13/25]  Current/Best:    9.63/  19.01 GFLOPS | Progress: (8/10) | 6.25 s
[Task 13/25]  Current/Best:   10.28/  19.01 GFLOPS | Progress: (10/10) | 7.64 s Done.

[Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 14/25]  Current/Best:   15.85/  15.85 GFLOPS | Progress: (4/10) | 3.17 s
[Task 14/25]  Current/Best:    5.76/  18.55 GFLOPS | Progress: (8/10) | 6.54 s
[Task 14/25]  Current/Best:   13.77/  18.55 GFLOPS | Progress: (10/10) | 7.30 s
[Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 15/25]  Current/Best:   18.00/  19.72 GFLOPS | Progress: (4/10) | 2.77 s
[Task 15/25]  Current/Best:    1.72/  23.47 GFLOPS | Progress: (8/10) | 4.84 s
[Task 15/25]  Current/Best:    7.08/  23.47 GFLOPS | Progress: (10/10) | 5.56 s
[Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 16/25]  Current/Best:    7.06/  11.45 GFLOPS | Progress: (4/10) | 4.34 s
[Task 16/25]  Current/Best:   17.36/  20.81 GFLOPS | Progress: (8/10) | 5.45 s
[Task 16/25]  Current/Best:   11.77/  20.81 GFLOPS | Progress: (10/10) | 7.54 s Done.

[Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 17/25]  Current/Best:   17.96/  21.18 GFLOPS | Progress: (4/10) | 3.15 s
[Task 17/25]  Current/Best:   16.94/  21.18 GFLOPS | Progress: (8/10) | 6.81 s
[Task 17/25]  Current/Best:   18.82/  21.18 GFLOPS | Progress: (10/10) | 7.71 s Done.

[Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 18/25]  Current/Best:    7.06/  21.85 GFLOPS | Progress: (4/10) | 6.66 s
[Task 18/25]  Current/Best:   13.46/  21.85 GFLOPS | Progress: (8/10) | 8.50 s
[Task 18/25]  Current/Best:    4.30/  21.85 GFLOPS | Progress: (10/10) | 10.63 s Done.

[Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 19/25]  Current/Best:   19.72/  19.72 GFLOPS | Progress: (4/10) | 3.58 s
[Task 19/25]  Current/Best:   11.06/  19.72 GFLOPS | Progress: (8/10) | 8.72 s
[Task 19/25]  Current/Best:   20.04/  20.04 GFLOPS | Progress: (10/10) | 10.02 s Done.

[Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 20/25]  Current/Best:   13.85/  17.00 GFLOPS | Progress: (4/10) | 2.22 s
[Task 20/25]  Current/Best:    6.31/  20.17 GFLOPS | Progress: (8/10) | 7.44 s
[Task 20/25]  Current/Best:   15.74/  20.17 GFLOPS | Progress: (10/10) | 8.15 s
[Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 21/25]  Current/Best:   14.35/  19.50 GFLOPS | Progress: (4/10) | 2.63 s
[Task 21/25]  Current/Best:   16.28/  19.50 GFLOPS | Progress: (8/10) | 5.50 s
[Task 21/25]  Current/Best:   10.70/  19.50 GFLOPS | Progress: (10/10) | 6.82 s
[Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 22/25]  Current/Best:   10.63/  19.50 GFLOPS | Progress: (4/10) | 3.26 s
[Task 22/25]  Current/Best:    2.71/  19.50 GFLOPS | Progress: (8/10) | 5.71 s
[Task 22/25]  Current/Best:   17.89/  19.50 GFLOPS | Progress: (10/10) | 6.49 s Done.

[Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 23/25]  Current/Best:   20.28/  20.28 GFLOPS | Progress: (4/10) | 4.05 s
[Task 23/25]  Current/Best:   22.31/  22.31 GFLOPS | Progress: (8/10) | 6.62 s
[Task 23/25]  Current/Best:   12.03/  22.31 GFLOPS | Progress: (10/10) | 7.65 s Done.

[Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 24/25]  Current/Best:    3.68/   3.68 GFLOPS | Progress: (4/10) | 50.27 s
[Task 24/25]  Current/Best:    2.41/   9.14 GFLOPS | Progress: (8/10) | 73.54 s
[Task 24/25]  Current/Best:    5.75/   9.14 GFLOPS | Progress: (10/10) | 75.42 s
[Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task 25/25]  Current/Best:    5.65/   5.65 GFLOPS | Progress: (4/10) | 23.04 s
[Task 25/25]  Current/Best:    3.50/   8.90 GFLOPS | Progress: (8/10) | 25.65 s
[Task 25/25]  Current/Best:    2.99/   8.90 GFLOPS | Progress: (10/10) | 26.52 s

For comparison, a longer run with 1000 trials per task produces output like the following:

# [Task  1/24]  Current/Best:   10.71/  21.08 GFLOPS | Progress: (60/1000) | 111.77 s Done.
# [Task  1/24]  Current/Best:    9.32/  24.18 GFLOPS | Progress: (192/1000) | 365.02 s Done.
# [Task  2/24]  Current/Best:   22.39/ 177.59 GFLOPS | Progress: (960/1000) | 976.17 s Done.
# [Task  3/24]  Current/Best:   32.03/ 153.34 GFLOPS | Progress: (800/1000) | 776.84 s Done.
# [Task  4/24]  Current/Best:   11.96/ 156.49 GFLOPS | Progress: (960/1000) | 632.26 s Done.
# [Task  5/24]  Current/Best:   23.75/ 130.78 GFLOPS | Progress: (800/1000) | 739.29 s Done.
# [Task  6/24]  Current/Best:   38.29/ 198.31 GFLOPS | Progress: (1000/1000) | 624.51 s Done.
# [Task  7/24]  Current/Best:    4.31/ 210.78 GFLOPS | Progress: (1000/1000) | 701.03 s Done.
# [Task  8/24]  Current/Best:   50.25/ 185.35 GFLOPS | Progress: (972/1000) | 538.55 s Done.
# [Task  9/24]  Current/Best:   50.19/ 194.42 GFLOPS | Progress: (1000/1000) | 487.30 s Done.
# [Task 10/24]  Current/Best:   12.90/ 172.60 GFLOPS | Progress: (972/1000) | 607.32 s Done.
# [Task 11/24]  Current/Best:   62.71/ 203.46 GFLOPS | Progress: (1000/1000) | 581.92 s Done.
# [Task 12/24]  Current/Best:   36.79/ 224.71 GFLOPS | Progress: (1000/1000) | 675.13 s Done.
# [Task 13/24]  Current/Best:    7.76/ 219.72 GFLOPS | Progress: (1000/1000) | 519.06 s Done.
# [Task 14/24]  Current/Best:   12.26/ 202.42 GFLOPS | Progress: (1000/1000) | 514.30 s Done.
# [Task 15/24]  Current/Best:   31.59/ 197.61 GFLOPS | Progress: (1000/1000) | 558.54 s Done.
# [Task 16/24]  Current/Best:   31.63/ 206.08 GFLOPS | Progress: (1000/1000) | 708.36 s Done.
# [Task 17/24]  Current/Best:   41.18/ 204.45 GFLOPS | Progress: (1000/1000) | 736.08 s Done.
# [Task 18/24]  Current/Best:   15.85/ 222.38 GFLOPS | Progress: (980/1000) | 516.73 s Done.
# [Task 19/24]  Current/Best:   15.78/ 203.41 GFLOPS | Progress: (1000/1000) | 587.13 s Done.
# [Task 20/24]  Current/Best:   30.47/ 205.92 GFLOPS | Progress: (980/1000) | 471.00 s Done.
# [Task 21/24]  Current/Best:   46.91/ 227.99 GFLOPS | Progress: (308/1000) | 219.18 s Done.
# [Task 22/24]  Current/Best:   13.33/ 207.66 GFLOPS | Progress: (1000/1000) | 761.74 s Done.
# [Task 23/24]  Current/Best:   53.29/ 192.98 GFLOPS | Progress: (1000/1000) | 799.90 s Done.
# [Task 24/24]  Current/Best:   25.03/ 146.14 GFLOPS | Progress: (1000/1000) | 1112.55 s Done.

Compiling an Optimized Model with Tuning Data

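The output of the tuning process above (the tuning records) is stored in resnet-50-v2-autotuning.json. The compiler uses these records to generate high-performance code for the model on the target you specified.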

Now that tuning data for the model has been collected, we can re-compile the model with the optimized operators to speed up computation.

with autotvm.apply_history_best(tuning_option["tuning_records"]):
    with tvm.transform.PassContext(opt_level=3, config={}):
        lib = relay.build(mod, target=target, params=params)

dev = tvm.device(str(target), 0)
module = graph_executor.GraphModule(lib["default"](dev))

Output:

Done.
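apply_history_best reads the tuning records file and picks the best measured configuration for each workload. For real logs, TVM provides autotvm.record.load_from_file to parse records. The simplified sketch below instead reads the JSON lines directly to show the selection idea. It assumes each line has an "input" field identifying the task and a "result" field that starts with a list of costs followed by an error code. This resembles AutoTVM's log layout, but treat the exact layout as an assumption. The sample records, task names, and helper name are all made up.

```python
import json
import os
import tempfile


def best_per_task(log_path):
    """Return {task_key: (best_mean_cost, config)} from a JSON-lines tuning log.

    Assumed record shape: {"input": [target, task_name, args, ...],
    "config": {...}, "result": [[cost, ...], error_no, ...]}.
    Only error-free results (error_no == 0) are considered.
    """
    best = {}
    with open(log_path) as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            costs, error_no = rec["result"][0], rec["result"][1]
            if error_no != 0:
                continue
            task_key = json.dumps(rec["input"][:3])  # target + task name + args
            mean_cost = sum(costs) / len(costs)
            if task_key not in best or mean_cost < best[task_key][0]:
                best[task_key] = (mean_cost, rec["config"])
    return best


# Made-up records: two configs for task_a (plus one failed run), one for task_b.
sample = [
    {"input": ["llvm", "task_a", [1]], "config": {"index": 3}, "result": [[0.004, 0.006], 0]},
    {"input": ["llvm", "task_a", [1]], "config": {"index": 7}, "result": [[0.002, 0.002], 0]},
    {"input": ["llvm", "task_a", [1]], "config": {"index": 9}, "result": [[1e9], 4]},
    {"input": ["llvm", "task_b", [2]], "config": {"index": 1}, "result": [[0.010], 0]},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy-records.json")
    with open(path, "w") as f:
        f.write("\n".join(json.dumps(r) for r in sample) + "\n")
    best = best_per_task(path)

for task, (cost, config) in best.items():
    print(task, cost, config)
```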

Run the optimized model and verify that it produces the same predictions as before:

dtype = "float32"
module.set_input(input_name, img_data)
module.run()
output_shape = (1, 1000)
tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()

scores = softmax(tvm_output)
scores = np.squeeze(scores)
ranks = np.argsort(scores)[::-1]
for rank in ranks[0:5]:
    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))

Output:

class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262

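The predictions match those of the unoptimized model; the probabilities differ only in the sixth decimal place. For reference, the original tutorial lists the output below. Its probabilities differ slightly from this run, but the top-five ranking is identical: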

# class='n02123045 tabby, tabby cat' with probability=0.610550
# class='n02123159 tiger cat' with probability=0.367181
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
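Rather than comparing the printed lines by eye, the check can be done in code. The sketch below uses a hypothetical helper, `same_top_k`, and feeds it the top-five probabilities printed before and after tuning in this run. In practice you would pass the full score vectors. The tutorial overwrites tvm_output, so you would first need to save a copy of the unoptimized scores under a name of your choosing.

```python
import numpy as np


def same_top_k(before: np.ndarray, after: np.ndarray, k: int = 5, atol: float = 1e-5) -> bool:
    """True if both score vectors rank the same top-k classes and agree within atol."""
    before, after = np.squeeze(before), np.squeeze(after)
    same_ranking = np.array_equal(np.argsort(before)[::-1][:k], np.argsort(after)[::-1][:k])
    return bool(same_ranking and np.allclose(before, after, atol=atol))


# Top-5 probabilities printed before and after tuning in this tutorial.
before = np.array([0.621103, 0.356379, 0.019712, 0.001215, 0.000262])
after = np.array([0.621104, 0.356378, 0.019712, 0.001215, 0.000262])
print(same_top_k(before, after))  # True
```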

Comparing the Tuned and Untuned Models

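Next, we collect the same basic performance data for the optimized model so we can compare it with the unoptimized one. You should see a performance improvement. How large it is depends on your hardware, the number of iterations, and other factors.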

import timeit

timing_number = 10
timing_repeat = 10
optimized = (
    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
    * 1000
    / timing_number
)
optimized = {"mean": np.mean(optimized), "median": np.median(optimized), "std": np.std(optimized)}


print("optimized: %s" % (optimized))
print("unoptimized: %s" % (unoptimized))

Output:

optimized: {'mean': 426.5695632400002, 'median': 426.31598235000183, 'std': 0.8991986364530805}
unoptimized: {'mean': 496.2511969099978, 'median': 495.80396929999324, 'std': 0.7997811122746795}
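The raw numbers are easier to read as a speedup and a relative latency reduction. The sketch below computes both from the dictionaries printed above; the helper name `compare` is hypothetical. For this run, mean latency falls from about 496 ms to about 427 ms. That is roughly a 1.16x speedup, or about 14% lower latency. Your figures will differ with hardware and trial count.

```python
def compare(optimized: dict, unoptimized: dict, key: str = "mean"):
    """Return (speedup, fractional latency reduction) for one summary statistic."""
    speedup = unoptimized[key] / optimized[key]
    reduction = (unoptimized[key] - optimized[key]) / unoptimized[key]
    return speedup, reduction


# Numbers from the run shown above (milliseconds per inference).
optimized = {"mean": 426.5695632400002, "median": 426.31598235000183}
unoptimized = {"mean": 496.2511969099978, "median": 495.80396929999324}

for key in ("mean", "median"):
    speedup, reduction = compare(optimized, unoptimized, key)
    print(f"{key}: {speedup:.2f}x faster, {reduction:.1%} lower latency")
```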

Summary

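In this tutorial, we gave a short example of how to use the TVM Python API to compile, run, and tune a model. We also discussed why inputs and outputs need pre- and post-processing. After tuning, we showed how to compare the performance of the unoptimized and optimized models.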

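Here we presented a simple example that runs ResNet-50 v2 locally. TVM supports many more features, however, including cross-compilation, remote execution, and profiling/benchmarking.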

Total running time of the script: (7 minutes 48.959 seconds)
