doc : (prototype) FX Graph Mode Post Training Static Quantization — PyTorch Tutorials 1.11.0+cu102 documentation

Author: Jerry Zhang

May 24, 2022

tag : translation study notes

topic : PyTorch quantization


(prototype) FX Graph Mode Post Training Static Quantization

This tutorial introduces the steps to perform post training static quantization (PTQ) in graph mode, based on torch.fx. The advantage of FX graph mode quantization is that quantization can be performed fully automatically on the model, although some effort may be required to make the model compatible with FX graph mode quantization (that is, symbolically traceable with torch.fx).

  • A separate tutorial will show how to make part of a model compatible with FX graph mode quantization.
  • There is also an FX Graph Mode Post Training Dynamic Quantization tutorial.

tl;dr: the FX graph mode quantization API looks like the following:

import torch
from torch.quantization import get_default_qconfig
# Note that this is temporary, we'll expose these functions to torch.quantization after the official release
from torch.quantization.quantize_fx import prepare_fx, convert_fx
float_model.eval()
qconfig = get_default_qconfig("fbgemm")
qconfig_dict = {"": qconfig}
def calibrate(model, data_loader):
    model.eval()
    with torch.no_grad():
        for image, target in data_loader:
            model(image)
prepared_model = prepare_fx(float_model, qconfig_dict)  # fuse modules and insert observers
calibrate(prepared_model, data_loader_test)  # run calibration on sample data
quantized_model = convert_fx(prepared_model)  # convert the calibrated model to a quantized model

1. Motivation of FX Graph Mode Quantization

Currently PyTorch only has eager mode quantization: Static Quantization with Eager Mode in PyTorch.

As you can see from that tutorial, the process involves a number of manual steps, including:

  • Explicitly quantizing and dequantizing activations, which is time consuming when floating point and quantized operations are mixed in a model.
  • Explicitly fusing modules, which requires manually identifying sequences of convolutions, batch norms, and relus, along with other fusion patterns.
  • Special handling is needed for PyTorch tensor operations (like add, concat, etc.).
  • Functionals do not have first class support (torch.nn.functional.conv2d and torch.nn.functional.linear are not quantized).

Most of these required modifications come from the underlying limitations of eager mode quantization. Eager mode works at the module level because it cannot inspect the code that is actually run (the forward function); quantization is achieved by module swapping, and since we do not know how modules are used in the forward function in eager mode, users have to insert QuantStub and DeQuantStub manually to mark the points where they want activations quantized or dequantized. In graph mode, we can inspect the actual code executed in forward (e.g., aten function calls) and achieve quantization through module and graph manipulations. Since graph mode has full visibility of the code that runs, the tool can automatically figure out which modules to fuse and where to insert observer calls, quantize/dequantize functions, etc., which lets us automate the whole quantization process.
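For contrast, here is a minimal sketch (illustrative only, not code from this tutorial) of the manual markup that eager mode quantization requires: the user inserts QuantStub/DeQuantStub by hand to mark quantization boundaries.

import torch
from torch.quantization import QuantStub, DeQuantStub

class EagerModeExample(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where float -> quantized
        self.conv = torch.nn.Conv2d(3, 8, 3)
        self.relu = torch.nn.ReLU()
        self.dequant = DeQuantStub()  # marks where quantized -> float

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

None of this markup is needed in FX graph mode, since the flow below operates on the traced graph directly.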

The advantages of FX graph mode quantization are:

  • Simple quantization flow with minimal manual steps
  • Unlocks the possibility of higher-level optimizations, such as automatic precision selection

2. Define Helper Functions and Prepare Dataset

We first do the necessary imports, define some helper functions, and prepare the data. These steps are identical to Static Quantization with Eager Mode in PyTorch.

To run the code in this tutorial with the whole ImageNet dataset, first download ImageNet by following the instructions in ImageNet Data, then unzip the downloaded file into the 'data_path' folder.

Download the torchvision resnet18 model and rename it to data/resnet18_pretrained_float.pth.

import numpy as np
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader
from torchvision import datasets
import torchvision.transforms as transforms
import os
import time
import sys
import torch.quantization

# Setup warnings
import warnings
warnings.filterwarnings(
    action='ignore',
    category=DeprecationWarning,
    module=r'.*'
)
warnings.filterwarnings(
    action='default',
    module=r'torch.quantization'
)

# Specify random seed for repeatable results
_ = torch.manual_seed(191009)


from torchvision.models.resnet import resnet18
from torch.quantization import get_default_qconfig, quantize_jit

class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)


def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res


def evaluate(model, criterion, data_loader):
    model.eval()
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    cnt = 0
    with torch.no_grad():
        for image, target in data_loader:
            output = model(image)
            loss = criterion(output, target)
            cnt += 1
            acc1, acc5 = accuracy(output, target, topk=(1, 5))
            top1.update(acc1[0], image.size(0))
            top5.update(acc5[0], image.size(0))
    print('')

    return top1, top5

def load_model(model_file):
    model = resnet18(pretrained=False)
    state_dict = torch.load(model_file)
    model.load_state_dict(state_dict)
    model.to("cpu")
    return model

def print_size_of_model(model):
    if isinstance(model, torch.jit.RecursiveScriptModule):
        torch.jit.save(model, "temp.p")
    else:
        torch.jit.save(torch.jit.script(model), "temp.p")
    print("Size (MB):", os.path.getsize("temp.p")/1e6)
    os.remove("temp.p")

def prepare_data_loaders(data_path):

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    dataset = torchvision.datasets.ImageNet(
        data_path, split="train",
        transform=transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ]))
    dataset_test = torchvision.datasets.ImageNet(
        data_path, split="val",
        transform=transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ]))

    train_sampler = torch.utils.data.RandomSampler(dataset)
    test_sampler = torch.utils.data.SequentialSampler(dataset_test)

    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=train_batch_size,
        sampler=train_sampler)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=eval_batch_size,
        sampler=test_sampler)

    return data_loader, data_loader_test

data_path = '~/.data/imagenet'
saved_model_dir = 'data/'
float_model_file = 'resnet18_pretrained_float.pth'

train_batch_size = 30
eval_batch_size = 50

data_loader, data_loader_test = prepare_data_loaders(data_path)
criterion = nn.CrossEntropyLoss()
float_model = load_model(saved_model_dir + float_model_file).to("cpu")
float_model.eval()

# deepcopy the model since we need to keep the original model around
import copy
model_to_quantize = copy.deepcopy(float_model)

3. Set model to eval mode

For post training quantization, we need to set the model to eval mode.

model_to_quantize.eval()

4. Specify how to quantize the model with qconfig_dict

qconfig_dict = {"" : default_qconfig}

We use the same qconfig as in eager mode quantization; a qconfig is just a named tuple of the observers for activations and weights. qconfig_dict is a dictionary with the following configurations:

qconfig = {
    " : qconfig_global,
    "sub" : qconfig_sub,
    "sub.fc" : qconfig_fc,
    "sub.conv": None
}
qconfig_dict = {
    # qconfig? means either a valid qconfig or None
    # optional, global config
    "": qconfig?,
    # optional, used for module and function types
    # could also be split into module_types and function_types if we prefer
    "object_type": [
        (torch.nn.Conv2d, qconfig?),
        (torch.nn.functional.add, qconfig?),
        ...,
    ],
    # optional, used for module names
    "module_name": [
        ("foo.bar", qconfig?)
        ...,
    ],
    # optional, matched in order, first match takes precedence
    "module_name_regex": [
        ("foo.*bar.*conv[0-9]+", qconfig?)
        ...,
    ],
    # priority (in increasing order): global, object_type, module_name_regex, module_name
    # qconfig == None means fusion and quantization should be skipped for anything
    # matching the rule

    # **api subject to change**
    # optional: specify the path for standalone modules
    # These modules are symbolically traced and quantized as one unit
    # so that the call to the submodule appears as one call_module
    # node in the forward graph of the GraphModule
    "standalone_module_name": [
        "submodule.standalone"
    ],
    "standalone_module_class": [
        StandaloneModuleClass
    ]
}

Utility functions related to qconfig can be found in the qconfig file (torch/quantization/qconfig.py).

qconfig = get_default_qconfig("fbgemm")
qconfig_dict = {"": qconfig}

5. Prepare the Model for Post Training Static Quantization

prepared_model = prepare_fx(model_to_quantize, qconfig_dict)

prepare_fx folds BatchNorm modules into the preceding Conv2d modules and inserts observers at the appropriate places in the model.

prepared_model = prepare_fx(model_to_quantize, qconfig_dict)
print(prepared_model.graph)
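Besides printing the graph, we can list the observer submodules directly. This is a small sketch that assumes the inserted observers follow the activation_post_process naming used by FX quantization:

# Sketch: list the observer modules that prepare_fx inserted
for name, module in prepared_model.named_modules():
    if "activation_post_process" in name:
        print(name, "->", type(module).__name__)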

6. Calibration

The calibration function is run after observers have been inserted into the model. The purpose of calibration is to run through some sample data that is representative of the workload (for example, a sample of the training data set) so that the observers in the model can collect statistics of the tensors; this information is later used to calculate the quantization parameters.

def calibrate(model, data_loader):
    model.eval()
    with torch.no_grad():
        for image, target in data_loader:
            model(image)
calibrate(prepared_model, data_loader_test)  # run calibration on sample data
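After calibration, each observer holds the statistics it collected. As a sketch (again assuming the activation_post_process naming), calculate_qparams() turns those statistics into a (scale, zero_point) pair:

# Sketch: quantization parameters derived from calibration statistics
for name, module in prepared_model.named_modules():
    if "activation_post_process" in name:
        scale, zero_point = module.calculate_qparams()
        print(name, "scale:", scale.item(), "zero_point:", zero_point.item())
        break  # just the first observer, for brevity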

7. Convert the Model to a Quantized Model

convert_fx takes the calibrated model and produces a quantized model.

quantized_model = convert_fx(prepared_model)
print(quantized_model)
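As a quick sanity check (my addition, not part of the original tutorial), the quantized model can be run directly on a random CPU tensor before the full evaluation:

# Smoke test: one random batch through the quantized model
example_input = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    out = quantized_model(example_input)
print(out.shape)  # expect torch.Size([1, 1000]) for an ImageNet resnet18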

8. Evaluation

We can now print the size and accuracy of the quantized model.

print("Size of model before quantization")
print_size_of_model(float_model)
print("Size of model after quantization")
print_size_of_model(quantized_model)
top1, top5 = evaluate(quantized_model, criterion, data_loader_test)
print("[before serilaization] Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))

fx_graph_mode_model_file_path = saved_model_dir + "resnet18_fx_graph_mode_quantized.pth"

# this does not run due to some errors loading the convrelu module:
# ModuleAttributeError: 'ConvReLU2d' object has no attribute '_modules'
# save the whole model directly
# torch.save(quantized_model, fx_graph_mode_model_file_path)
# loaded_quantized_model = torch.load(fx_graph_mode_model_file_path)

# save with state_dict
# torch.save(quantized_model.state_dict(), fx_graph_mode_model_file_path)
# import copy
# model_to_quantize = copy.deepcopy(float_model)
# prepared_model = prepare_fx(model_to_quantize, {"": qconfig})
# loaded_quantized_model = convert_fx(prepared_model)
# loaded_quantized_model.load_state_dict(torch.load(fx_graph_mode_model_file_path))

# save with script
torch.jit.save(torch.jit.script(quantized_model), fx_graph_mode_model_file_path)
loaded_quantized_model = torch.jit.load(fx_graph_mode_model_file_path)

top1, top5 = evaluate(loaded_quantized_model, criterion, data_loader_test)
print("[after serialization/deserialization] Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))

If you want better accuracy or performance, try changing the qconfig_dict; an example tweak is sketched below. We are also planning to add support for graph mode in the Numeric Suite so that you can easily determine the quantization sensitivity of different modules in a model: PyTorch Numeric Suite Tutorial
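For example, following the qconfig_dict format from section 4, a hypothetical tweak could skip quantizing a particularly accuracy-sensitive layer (the choice of conv1 here is only an illustration):

# Hypothetical qconfig_dict tweak: keep the global qconfig but disable
# fusion/quantization for resnet18's first conv layer
qconfig_dict_tweaked = {
    "": qconfig,
    "module_name": [("conv1", None)],  # None skips quantization for this module
}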

9. Debugging Quantized Model

We can also print the weights of the quantized and non-quantized conv modules to see the difference. We will first call fuse_fx explicitly to fuse the conv and bn modules in the model; note that fuse_fx only works in eval mode.

from torch.quantization.quantize_fx import fuse_fx

fused = fuse_fx(float_model)

conv1_weight_after_fuse = fused.conv1[0].weight[0]
conv1_weight_after_quant = quantized_model.conv1.weight().dequantize()[0]

print(torch.max(abs(conv1_weight_after_fuse - conv1_weight_after_quant)))
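A common way to summarize such weight differences in a single number (a sketch added here, not part of the original tutorial) is the signal-to-quantization-noise ratio:

# Sketch: SQNR (in dB) between the fused float weight and the dequantized
# quantized weight; higher means the quantized weight is closer to float
def sqnr(float_tensor, quant_tensor):
    signal = torch.norm(float_tensor)
    noise = torch.norm(float_tensor - quant_tensor)
    return 20 * torch.log10(signal / noise)

print(sqnr(fused.conv1[0].weight, quantized_model.conv1.weight().dequantize()))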

10. Comparison with Baseline Float Model and Eager Mode Quantization

scripted_float_model_file = "resnet18_scripted.pth"

print("Size of baseline model")
print_size_of_model(float_model)

top1, top5 = evaluate(float_model, criterion, data_loader_test)
print("Baseline Float Model Evaluation accuracy: %2.2f, %2.2f"%(top1.avg, top5.avg))
torch.jit.save(torch.jit.script(float_model), saved_model_dir + scripted_float_model_file)

In this section we compare the model quantized with FX graph mode quantization to the model quantized in eager mode. FX graph mode and eager mode produce very similar quantized models, so we expect the accuracy and speedup to be similar as well.

print("Size of Fx graph mode quantized model")
print_size_of_model(quantized_model)
top1, top5 = evaluate(quantized_model, criterion, data_loader_test)
print("FX graph mode quantized model Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))

from torchvision.models.quantization.resnet import resnet18
eager_quantized_model = resnet18(pretrained=True, quantize=True).eval()
print("Size of eager mode quantized model")
eager_quantized_model = torch.jit.script(eager_quantized_model)
print_size_of_model(eager_quantized_model)
top1, top5 = evaluate(eager_quantized_model, criterion, data_loader_test)
print("eager mode quantized model Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))
eager_mode_model_file = "resnet18_eager_mode_quantized.pth"
torch.jit.save(eager_quantized_model, saved_model_dir + eager_mode_model_file)

We can see that the model size and accuracy of the FX graph mode and eager mode quantized models are quite similar.

Running the models in AIBench (with single threading) gives the following results:

Scripted Float Model:
Self CPU time total: 192.48ms

Scripted Eager Mode Quantized Model:
Self CPU time total: 50.76ms

Scripted FX Graph Mode Quantized Model:
Self CPU time total: 50.63ms

As we can see for resnet18, both the FX graph mode and eager mode quantized models achieve a similar speedup over the floating point model, roughly 2-4x. However, the actual speedup over the floating point model may vary with the model, device, build, input batch size, threading, etc.
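If you do not have access to AIBench, a rough single-threaded timing comparison can be sketched as follows (a simplified stand-in; the numbers will not match AIBench):

# Rough, illustrative latency comparison on one input, single threaded
import time
torch.set_num_threads(1)
x = torch.randn(1, 3, 224, 224)

def bench(model, iters=50):
    with torch.no_grad():
        for _ in range(5):          # warmup
            model(x)
        start = time.time()
        for _ in range(iters):
            model(x)
    return (time.time() - start) / iters * 1000  # ms per forward pass

print("float model:     %.1f ms" % bench(float_model))
print("quantized model: %.1f ms" % bench(quantized_model))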
