Pytorch and Tensorflow are two widely used frameworks that have become today's standard in deep learning. Tensorflow, Google's brainchild released in 2015, has been the most popular deep learning framework ever since, but now there is a new kid on the block. With its first stable version released a few weeks back, Facebook's Pytorch has won the hearts of many researchers and developers thanks to its simplicity, dynamic graphs, and an overall more natural development experience compared to Tensorflow. When it comes to preferences, some prefer Tensorflow and some Pytorch; we're not here to judge, but to make the best of both.
At Styria.ai, we have had a lot of experience with Tensorflow, using it both to prototype solutions and to deploy complete deep learning pipelines that can support thousands of requests per second. Our experience taught us that Tensorflow's deployment component, Tensorflow Serving, is the right tool for deploying our deep learning pipelines. Serving offers some neat features, including automatic model switching, an integrated API, and blazing fast execution, since it is written in optimized C++.
During the past few months, we have been experimenting with Pytorch to train and evaluate our models. Everyone on our team who has tried developing in Pytorch found it superior to the Tensorflow development process. Pytorch code is more Pythonic and so simple to write that we managed to recreate one of our categorization pipelines in just two weeks, work that had taken months in Tensorflow!
Given the reasoning above, we concluded we should use Pytorch for model development, training, and evaluation, and Tensorflow in production (Pytorch has also become production-ready as of v1.0, but we haven't tested it yet). The only remaining problem was the bridge between the two libraries, and here we opted for ONNX, a universal format for deep learning models. ONNX enabled us to convert the Pytorch model to a frozen Tensorflow graph compatible with Tensorflow Serving. The full conversion script is here:
"""
Exports a pytorch model to an ONNX format, and then converts from the
ONNX to a Tensorflow serving protobuf file.
Running example:
python3 pytorch_to_tf_serving.py \
--onnx-file text.onnx \
--meta-file text.meta \
--export-dir serving_model/
"""
import logging
import argparse
import tensorflow as tf
from tensorflow.python.saved_model import utils as smutils
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import signature_def_utils
from tensorflow.python.saved_model import tag_constants
from onnx_tf.backend import prepare
import onnx
import torch
import torchvision
log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
def export_onnx(model, dummy_input, file, input_names, output_names,
num_inputs):
"""
Converts a Pytorch model to the ONNX format and saves the .onnx model file.
The first dimension of the input nodes are of size N, where N is the
minibatch size. This dimensions is here replaced by an arbitrary string
which the ONNX -> TF library interprets as the '?' dimension in Tensorflow.
This process is applied because the input minibatch size should be of an
arbitrary size.
:param model: Pytorch model instance with loaded weights
:param dummy_input: tuple, dummy input numpy arrays that the model
accepts in the inference time. E.g. for the Text+Image model, the
tuple would be (np.float32 array of N x W x H x 3, np.int64 array of
N x VocabDim). Actual numpy arrays values don't matter, only the shape
and the type must match the model input shape and type. N represents
the minibatch size and can be any positive integer. True batch size
is later handled when exporting the model from the ONNX to TF format.
:param file: string, Path to the exported .onnx model file
:param input_names: list of strings, Names assigned to the input nodes
:param output_names: list of strings, Names assigned to the output nodes
:param num_inputs: int, Number of model inputs (e.g. 2 for Text and Image)
"""
# List of onnx.export function arguments:
# https://github.com/pytorch/pytorch/blob/master/torch/onnx/utils.py
# ISSUE: https://github.com/pytorch/pytorch/issues/14698
torch.onnx.export(model, args=dummy_input, input_names=input_names,
output_names=output_names, f=file)
# Reload model to fix the batch size
model = onnx.load(file)
model = make_variable_batch_size(num_inputs, model)
onnx.save(model, file)
log.info("Exported ONNX model to {}".format(file))
def make_variable_batch_size(num_inputs, onnx_model):
"""
Changes the input batch dimension to a string, which makes it variable.
Tensorflow interpretes this as the "?" shape.
`num_inputs` must be specified because `onnx_model.graph.input` is a list
of inputs of all layers and not just model inputs.
:param num_inputs: int, Number of model inputs (e.g. 2 for Text and Image)
:param onnx_model: ONNX model instance
:return: ONNX model instance with variable input batch size
"""
for i in range(num_inputs):
onnx_model.graph.input[i].type.tensor_type.\
shape.dim[0].dim_param = 'batch_size'
return onnx_model
def export_tf_proto(onnx_file, meta_file):
"""
Exports the ONNX model to a Tensorflow Proto file.
The exported file will have a .meta extension.
:param onnx_file: string, Path to the .onnx model file
:param meta_file: string, Path to the exported Tensorflow .meta file
:return: tuple, input and output tensor dictionaries. Dictionaries have a
{tensor_name: TF_Tensor_op} structure.
"""
model = onnx.load(onnx_file)
# Convert the ONNX model to a Tensorflow graph
tf_rep = prepare(model)
output_keys = tf_rep.outputs
input_keys = tf_rep.inputs
tf_dict = tf_rep.tensor_dict
input_tensor_names = {key: tf_dict[key] for key in input_keys}
output_tensor_names = {key: tf_dict[key] for key in output_keys}
tf_rep.export_graph(meta_file)
log.info("Exported Tensorflow proto file to {}".format(meta_file))
return input_tensor_names, output_tensor_names
def export_for_serving(meta_path, export_dir, input_tensors, output_tensors):
"""
Exports the Tensorflow .meta model to a frozen .pb Tensorflow serving
format.
:param meta_path: string, Path to the .meta TF proto file.
:param export_dir: string, Path to directory where the serving model will
be exported.
:param input_tensor: dict, Input tensors dictionary of
{name: TF placeholder} structure.
:param output_tensors: dict, Output tensors dictionary of {name: TF tensor}
structure.
"""
g = tf.Graph()
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
graph_def = tf.GraphDef()
with g.as_default():
with open(meta_path, "rb") as f:
graph_def.ParseFromString(f.read())
# name argument must explicitly be set to an empty string, otherwise
# TF will prepend an `import` scope name on all operations
tf.import_graph_def(graph_def, name="")
tensor_info_inputs = {name: smutils.build_tensor_info(in_tensor)
for name, in_tensor in input_tensors.items()}
tensor_info_outputs = {name: smutils.build_tensor_info(out_tensor)
for name, out_tensor in output_tensors.items()}
prediction_signature = signature_def_utils.build_signature_def(
inputs=tensor_info_inputs,
outputs=tensor_info_outputs,
method_name=signature_constants.PREDICT_METHOD_NAME)
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
builder.add_meta_graph_and_variables(
sess, [tag_constants.SERVING],
signature_def_map={"predict_images": prediction_signature})
builder.save()
log.info("Input info:\n{}".format(tensor_info_inputs))
log.info("Output info:\n{}".format(tensor_info_outputs))
def main(args):
model = torchvision.models.alexnet(pretrained=True)
img_input = torch.randn(1, 3, 224, 224)
input_names = ['input_img']
output_names = ['confidences']
# Use a tuple if there are multiple model inputs
dummy_inputs = (img_input)
export_onnx(model, dummy_inputs, args.onnx_file,
input_names=input_names,
output_names=output_names,
num_inputs=len(dummy_inputs))
input_tensors, output_tensors = export_tf_proto(args.onnx_file,
args.meta_file)
export_for_serving(args.meta_file, args.export_dir, input_tensors,
output_tensors)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
'--onnx-file', help="File where to export the ONNX file", type=str,
required=True)
parser.add_argument(
'--meta-file', help="File where to export the Tensorflow meta file",
type=str, required=True)
parser.add_argument(
'--export-dir',
help="Folder where to export proto models for TF serving",
type=str, required=True)
args = parser.parse_args()
main(args)
The script converts the pre-trained AlexNet model to a Tensorflow Serving format. The idea is to first convert the Pytorch model to the ONNX format, and then convert from ONNX to Tensorflow Serving.
export_onnx is the function responsible for converting Pytorch models to the universal ONNX format. Most of the knowledge on the Pytorch -> ONNX conversion is here, so we won't go into much detail, but we do want to mention one detail not covered in the Pytorch documentation. Pytorch still does not support exporting ONNX models with a dynamic batch size, so we used a workaround for this deficiency. Since Tensorflow itself supports dynamic batch sizes, the trick lies in replacing the batch dimension with an arbitrary string value. make_variable_batch_size iterates over the first num_inputs nodes of the ONNX graph and replaces the first dimension with the string 'batch_size'. This instructs the ONNX -> Tensorflow converter to put ? as the batch dimension, resulting in support for dynamic batches in the Tensorflow model. The Pytorch -> ONNX converter supports multiple inputs and outputs, so we have also included code that handles this use case.
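If you want to verify that the workaround took effect before moving on to the Tensorflow conversion, you can inspect the input shapes of the saved ONNX model directly. A minimal sketch, assuming the model was exported to text.onnx as in the running example (adjust the path to your --onnx-file value):
import onnx

# Print the shape of every model input; a dynamic batch dimension shows up
# as the symbolic name 'batch_size' instead of a fixed integer
onnx_model = onnx.load('text.onnx')
for model_input in onnx_model.graph.input:
    dims = [d.dim_param if d.dim_param else d.dim_value
            for d in model_input.type.tensor_type.shape.dim]
    print(model_input.name, dims)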
To convert the ONNX model to a Tensorflow one, we will use the onnx-tensorflow library.
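If you don't have it installed yet, the library is available on PyPI (under the package name onnx-tf at the time of writing), so something along the lines of pip install onnx onnx-tf should be enough.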
Going through the code in more detail, you'll notice that the conversion has an extra step we haven't mentioned yet. The script does not make a direct ONNX -> Tensorflow Serving conversion; it first converts the ONNX model to a Tensorflow proto file. This extra step is required because the onnx-tensorflow library cannot convert directly to the Tensorflow Serving format. The final step is to generate the Tensorflow Serving format using the export_for_serving function.
If everything went successfully, you will see an output similar to this one:
INFO:__main__:Input info:
{'input_img': name: "input_img:0"
dtype: DT_FLOAT
tensor_shape {
  dim { size: -1 }
  dim { size: 3 }
  dim { size: 224 }
  dim { size: 224 }
}
}
INFO:__main__:Output info:
{'confidences': name: "add_8:0"
dtype: DT_FLOAT
tensor_shape {
  dim { size: -1 }
  dim { size: 1000 }
}
}
In this output, you can check whether the input and output dimensions match the expected values. The batch size dimension should be set to -1, indicating dynamic batch size support.
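A quick way to double-check these names and shapes later, directly from the exported model on disk, is the show mode of Tensorflow's saved_model_cli tool (the directory path below is a placeholder):
saved_model_cli show --dir /path/to/tf_serving_model_dir/ --all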
If the conversion was a success, you will find a newly created directory in which the Tensorflow Serving model is stored. saved_model_cli is a handy script that ships with the Tensorflow library, and besides inspecting models it can run model inference on arbitrary input data. Our model expects a batch of normalized float32 images, stored as a Numpy array and saved to disk in the .npy format:
import numpy as np
x = np.random.randn(64, 3, 224, 224).astype(np.float32)
np.save('input_img.npy', x)
To test the model inference, run:
saved_model_cli run --tag_set serve --signature_def predict_images --dir /path/to/tf_serving_model_dir/ --inputs input_img=/path/to/input_img.npy
The output should be a list of confidences for all categories.
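Once the exported model directory is picked up by a running Tensorflow Serving instance, clients can query it over gRPC or REST. The sketch below uses the REST API and assumes a server listening locally on the default REST port 8501 with the model loaded under the name alexnet; both the host/port and the model name are assumptions that depend on how your server was started:
import numpy as np
import requests

# Host, port and model name are placeholders; adjust them to your deployment
url = 'http://localhost:8501/v1/models/alexnet:predict'

# A single normalized float32 image in NCHW layout, matching the exported input
batch = np.random.randn(1, 3, 224, 224).astype(np.float32)

payload = {
    'signature_name': 'predict_images',  # signature name used in the export script
    'instances': batch.tolist(),
}
response = requests.post(url, json=payload)
confidences = response.json()['predictions']
print(len(confidences), len(confidences[0]))  # 1 batch element x 1000 categories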
Although the conversion is possible, there are some limitations that we ran into while producing the conversion script.
The Pytorch docs list the supported operations and state that this list is enough to convert some well-known deep learning models, such as ResNet, SuperResolution, and word_language_model. The range of operations is quite large; however, if you need an op that isn't supported, Pytorch lets you implement it as a custom op, provided you have the time and skill.
We discovered that some pooling operations do not support the NCHW input format when model inference runs on the CPU. The remedy for this issue is not simple, because Pytorch currently does not support the NHWC input format for many operations.
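Since the problem we hit is specific to CPU inference, one thing that may help is asking the onnx-tensorflow backend to prepare the graph for GPU execution. The device argument is part of the standard ONNX backend interface, but whether this actually sidesteps the NCHW limitation depends on your onnx-tf version and hardware, so treat it as a sketch rather than a guaranteed fix:
import onnx
from onnx_tf.backend import prepare

# Request GPU kernels instead of the CPU ones; 'text.onnx' is a placeholder path
onnx_model = onnx.load('text.onnx')
tf_rep = prepare(onnx_model, device='CUDA')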
Another limitation is tied to how Tensorflow implements padding in pooling and convolution operations. Tensorflow supports only two padding modes, VALID and SAME. Pytorch, on the other hand, accepts arbitrary padding values, which results in incompatibilities between Pytorch and Tensorflow layers. There are two solutions to this issue: