by Thalles Silva
Putting Machine Learning (ML) models to production has become a popular, recurrent topic. Many companies and frameworks offer different solutions that aim to tackle this issue.
To address this concern, Google released TensorFlow (TF) Serving in the hope of solving the problem of deploying ML models to production.
This piece offers a hands-on tutorial on serving a pre-trained Convolutional Semantic Segmentation Network. By the end of this article, you will be able to use TF Serving to deploy and make requests to a Deep CNN trained in TF. Also, I’ll present an overview of the main blocks of TF Serving, and I’ll discuss its APIs and how it all works.
One thing you will notice right away is that it requires very little code to actually serve a TF model. If you want to go along with the tutorial and run the example on your machine, follow it as is. But, if you only want to know about TensorFlow Serving, you can concentrate on the first two sections.
This piece emphasizes some of the work we are doing here at Daitan Group.
Let’s take some time to understand how TF Serving handles the full life-cycle of serving ML models. Here, we’ll go over (at a high level) each of the main building blocks of TF Serving. The goal of this section is to provide a soft introduction to the TF Serving APIs. For an in-depth overview, please head to the TF Serving documentation page.
TensorFlow Serving is composed of a few abstractions. These abstractions implement APIs for different tasks. The most important ones are Servable, Loader, Source, and Manager. Let’s go over how they interact.
In a nutshell, the serving life-cycle starts when TF Serving identifies a model on disk. The Source component takes care of that. It is responsible for identifying new models that should be loaded. In practice, it keeps an eye on the file system to identify when a new model version arrives on disk. When it sees a new version, it proceeds by creating a Loader for that specific version of the model.
In summary, the Loader knows almost everything about the model. That includes how to load it and how to estimate the model’s required resources, such as the requested RAM and GPU memory. The Loader has a pointer to the model on disk along with all the necessary meta-data for loading it. But there is a catch here: the Loader is not allowed to load the model just yet.
After creating the Loader, the Source sends it to the Manager as an Aspired Version.
Upon receiving the model’s Aspired Version, the Manager proceeds with the serving process. Here, there are two possibilities. One is that the first model version is pushed for deployment. In this situation, the Manager will make sure that the required resources are available. Once they are, the Manager gives the Loader permission to load the model.
The second is that we are pushing a new version of an existing model. In this case, the Manager has to consult the Version Policy plugin before going further. The Version Policy determines how the process of loading a new model version takes place.
Specifically, when loading a new version of a model, we can choose between preserving (1) availability or (2) resources. In the first case, we are interested in making sure our system is always available for incoming clients’ requests. Knowing that, the Manager allows the Loader to instantiate the new graph with the new weights.
At this point, we have two model versions loaded at the same time. But the Manager unloads the older version only after loading is complete and it is safe to switch between models.
On the other hand, if we want to save resources by not having the extra buffer (for the new version), we can choose to preserve resources. It might be useful for very heavy models to have a little gap in availability, in exchange for saving memory.
At the end, when a client requests a handle for the model, the Manager returns a handle to the Servable.
With this overview, we are set to dive into a real-world application. In the next sections, we describe how to serve a Convolutional Neural Network (CNN) using TF Serving.
The first step to serve an ML model built in TensorFlow is to make sure it is in the right format. To do that, TensorFlow provides the SavedModel class.
SavedModel is the universal serialization format for TensorFlow models. If you are familiar with TF, you have probably used the TensorFlow Saver to persist your model’s variables.
The TensorFlow Saver provides functionalities to save/restore the model’s checkpoint files to/from disk. In fact, SavedModel wraps the TensorFlow Saver and it is meant to be the standard way of exporting TF models for serving.
The SavedModel object has some nice features.
First, it lets you save more than one meta-graph to a single SavedModel object. In other words, it allows us to have different graphs for different tasks.
For instance, suppose you just finished training your model. In most situations, to perform inference, your graph doesn’t need some training-specific operations. These ops might include the optimizer’s variables, learning rate scheduling tensors, extra pre-processing ops, and so on.
Moreover, you might want to serve a quantized version of a graph for mobile deployment.
In this context, SavedModel allows you to save graphs with different configurations. In our example, we would have three different graphs with corresponding tags such as “training”, “inference”, and “mobile”. Also, these three graphs would all share the same set of variables — which emphasizes memory efficiency.
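To make this concrete, here is a minimal sketch of that idea using a toy graph and a hypothetical export path (the SavedModelBuilder utility is covered in more detail below). The first call stores the graph and its variables under one tag; the later calls just add extra tagged meta-graphs that reuse the same stored variables. In a real workflow, each tag would correspond to a differently-configured graph.

import tensorflow as tf

# toy graph standing in for a real model
x = tf.placeholder(tf.float32, shape=[None], name='x')
w = tf.Variable(2.0, name='w')
y = tf.identity(w * x, name='y')

# hypothetical export path; the directory must not exist yet
builder = tf.saved_model.builder.SavedModelBuilder('/tmp/toy_model/1')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # the first call saves the graph *and* the shared variables under the 'training' tag
    builder.add_meta_graph_and_variables(sess, ['training'])

# subsequent calls add extra meta-graphs that reuse those same variables
builder.add_meta_graph(['inference'])
builder.add_meta_graph(['mobile'])

builder.save()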
Not so long ago, when we wanted to deploy TF models on mobile devices, we needed to know the names of the input and output tensors for feeding and getting data to/from the model. This need forced programmers to search for the tensor they needed among all tensors of the graph. If the tensors were not properly named, the task could be very tedious.
To make things easier, SavedModel offers support for SignatureDefs. In summary, SignatureDefs define the signature of a computation supported by TensorFlow. It determines the proper input and output tensors for a computational graph. Simply put, with these signatures you can specify the exact nodes to use for input and output.
To use its built-in serving APIs, TF Serving requires models to include one or more SignatureDefs.
To create such signatures, we need to provide definitions for inputs, outputs, and the desired method name. Inputs and Outputs represent a mapping from string to TensorInfo objects (more on this later). Here, we define the default tensors for feeding and receiving data to and from a graph. The method_name parameter targets one of the TF high-level serving APIs.
Currently, there are three serving APIs: Classification, Predict, and Regression. Each signature definition matches a specific RPC API. The Classification SignatureDef is used for the Classify RPC API, the Predict SignatureDef is used for the Predict RPC API, and so on.
For the Classification signature, there must be an inputs tensor (to receive data) and at least one of two possible output tensors: classes and/or scores. The Regression SignatureDef requires exactly one tensor for input and another for output. Lastly, the Predict signature allows for a dynamic number of input and output tensors.
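As a rough sketch of these three flavours, TensorFlow ships helper functions in tf.saved_model.signature_def_utils. The placeholder tensors below are toy stand-ins (not part of the DeepLab model) just to show the shape of each call:

import tensorflow as tf

# toy placeholders standing in for real graph tensors
serialized = tf.placeholder(tf.string, shape=[None], name='tf_example')
classes = tf.placeholder(tf.string, shape=[None, 5], name='classes')
scores = tf.placeholder(tf.float32, shape=[None, 5], name='scores')
predictions = tf.placeholder(tf.float32, shape=[None, 1], name='predictions')
image = tf.placeholder(tf.uint8, shape=[None, None, 3], name='image')
mask = tf.placeholder(tf.int64, shape=[None, None], name='mask')

# Classification: one string input, classes and/or scores as outputs
classify_sig = tf.saved_model.signature_def_utils.classification_signature_def(
    examples=serialized, classes=classes, scores=scores)

# Regression: exactly one input and one output tensor
regress_sig = tf.saved_model.signature_def_utils.regression_signature_def(
    examples=serialized, predictions=predictions)

# Predict: any number of named inputs and outputs
predict_sig = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={'image': image}, outputs={'segmentation_map': mask})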
In addition, SavedModel supports assets storage for cases where ops initialization depends on external files. Also, it has mechanisms for clearing devices before creating the SavedModel.
Now, let’s see how can we do it in practice.
Before we begin, clone this TensorFlow DeepLab-v3 implementation from Github.
DeepLab is Google’s best semantic segmentation ConvNet. Basically, the network takes an image as input and outputs a mask-like image that separates certain objects from the background.
This version was trained on the Pascal VOC segmentation dataset. Thus, it can segment and recognize up to 20 classes. If you want to know more about Semantic Segmentation and DeepLab-v3, take a look at Diving into Deep Convolutional Semantic Segmentation Networks and Deeplab_V3.
All the files related to serving reside in ./deeplab_v3/serving/. There, you will find two important files: deeplab_saved_model.py and deeplab_client.ipynb.
Before going further, make sure to download the Deeplab-v3 pre-trained model. Head to the GitHub repository above, click on the checkpoints link, and download the folder named 16645/.
In the end, you should have a folder named tboard_logs/ with the 16645/ folder placed inside it.
Now, we need to create two Python virtual environments. One for Python 3 and another for Python 2. For each env, make sure to install the necessary dependencies. You can find them in the serving_requirements.txt and client_requirements.txt files.
We need two Python envs because our model, DeepLab-v3, was developed under Python 3. However, the TensorFlow Serving Python API is only published for Python 2. Therefore, to export the model and run TF serving, we use the Python 3 env. For running the client code using the TF Serving python API, we use the PIP package (only available for Python 2).
Note that you can forgo the Python 2 env by using the Serving APIs from bazel. Refer to the TF Serving Installation instructions for more details.
With that step complete, let’s start with what really matters.
To use SavedModel, TensorFlow provides an easy-to-use, high-level utility class called SavedModelBuilder. The SavedModelBuilder class provides functionalities to save multiple meta graphs, associated variables, and assets.
Let’s go through a running example of how to export a Deep Segmentation CNN model for serving.
As mentioned above, to export the model, we use the SavedModelBuilder class. It will generate a SavedModel protocol buffer file along with the model’s variables and assets (if necessary).
Let’s dissect the code.
The SavedModelBuilder receives (as input) the directory where to save the model’s data. Here, the export_path variable is the concatenation of export_path_base and the model_version. As a result, different model versions will be saved in separate directories inside the export_path_base folder.
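A minimal sketch of that path logic follows. The values here are assumptions standing in for the script's command-line FLAGS:

import os
import tensorflow as tf

# hypothetical values; in deeplab_saved_model.py they come from FLAGS
export_path_base = './serving/versions'
model_version = 1

# each version gets its own sub-directory, e.g. ./serving/versions/1
export_path = os.path.join(export_path_base, str(model_version))
builder = tf.saved_model.builder.SavedModelBuilder(export_path)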
Let’s say we have a baseline version of our model in production, but we want to deploy a new version of it. We have improved our model’s accuracy and want to offer this new version to our clients.
To export a different version of the same graph, we can just set FLAGS.model_version to a higher integer value. Then a different folder (holding the new version of our model) will be created inside the export_path_base folder.
Now, we need to specify the input and output Tensors of our model. To do that, we use SignatureDefs. Signatures define what type of model we want to export. It provides a mapping from strings (logical Tensor names) to TensorInfo objects. The idea is that, instead of referencing the actual tensor names for input/output, clients can refer to the logical names defined by the signatures.
For serving a Semantic Segmentation CNN, we are going to create a Predict Signature. Note that the build_signature_def() function takes the mapping for input and output tensors as well as the desired API.
A SignatureDef requires specification of: inputs, outputs, and method name. Note that we expect three values for inputs — an image, and two more tensors specifying its dimensions (height and width). For the outputs, we defined just one outcome — the segmentation output mask.
Note that the strings ‘image’, ‘height’, ‘width’ and ‘segmentation_map’ are not tensors. Instead, they are logical names that refer to the actual tensors input_tensor, image_height_tensor, and image_width_tensor. Thus, they can be any unique string you like.
Also, the mappings in the SignatureDefs relate to TensorInfo protobuf objects, not actual tensors. To create TensorInfo objects, we use the utility function: tf.saved_model.utils.build_tensor_info(tensor).
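Putting the last few paragraphs together, the Predict signature for our segmentation model looks roughly like this. It assumes input_tensor, image_height_tensor, image_width_tensor and a predictions tensor are already defined in the DeepLab graph (the last name is a placeholder for the model's output tensor):

# a sketch: the four tensors are assumed to exist in the loaded DeepLab graph
tensor_info_input = tf.saved_model.utils.build_tensor_info(input_tensor)
tensor_info_height = tf.saved_model.utils.build_tensor_info(image_height_tensor)
tensor_info_width = tf.saved_model.utils.build_tensor_info(image_width_tensor)
tensor_info_output = tf.saved_model.utils.build_tensor_info(predictions)

prediction_signature = tf.saved_model.signature_def_utils.build_signature_def(
    inputs={'image': tensor_info_input,
            'height': tensor_info_height,
            'width': tensor_info_width},
    outputs={'segmentation_map': tensor_info_output},
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)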
That is it. Now we call the add_meta_graph_and_variables() function to build the SavedModel protocol buffer object. Then we run the save() method, which persists a snapshot of our model, along with its variables and assets, to disk.
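A sketch of that final step. It assumes sess already holds the restored DeepLab-v3 weights and that prediction_signature is the SignatureDef built above; the signature key 'predict_images' is a hypothetical name that the client must later match:

with tf.Session() as sess:
    # (the trained weights are assumed restored here, e.g. via a tf.train.Saver)
    builder.add_meta_graph_and_variables(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        signature_def_map={'predict_images': prediction_signature})

builder.save()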
We can now run deeplab_saved_model.py to export our model.
If everything went well you will see the folder ./serving/versions/1. Note that the ‘1’ represents the current version of the model. Inside each version sub-directory, you will see the following files:
saved_model.pb or saved_model.pbtxt. This is the serialized SavedModel file. It includes one or more graph definitions of the model, as well as the signature definitions.
variables/. This folder holds the serialized variables of the graphs.
Now, we are ready to launch our model server. To do that, run:
$ tensorflow_model_server --port=9000 --model_name=deeplab --model_base_path=
The model_base_path refers to where the exported model was saved. Also, we do not specify the version folder in the path. Model version control is handled by TF Serving.
The client code is very straightforward. Take a look at it in: deeplab_client.ipynb.
First, we read the image we want to send to the server and convert it to the right format.
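Roughly, and assuming the exported graph takes a raw uint8 RGB image (the file path below is just a placeholder):

from PIL import Image
import numpy as np

# hypothetical local test image
image = np.asarray(Image.open('./test_image.jpg'), dtype=np.uint8)
height, width = image.shape[0], image.shape[1]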
Next, we create a gRPC stub. The stub allows us to call the remote server’s methods. To do that, we instantiate the beta_create_PredictionService_stub class of the prediction_service_pb2 module. At this point, the stub holds the necessary logic for calling remote procedures (from the server) as if they were local.
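A minimal sketch, assuming the model server from the previous section is listening on localhost:9000:

from grpc.beta import implementations
from tensorflow_serving.apis import prediction_service_pb2

channel = implementations.insecure_channel('localhost', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)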
Now, we need to create and set the request object. Since our server implements the TensorFlow Predict API, we need to parse a Predict request. To issue a Predict request, first, we instantiate the PredictRequest class from the predict_pb2 module. We also need to specify the model_spec.name and model_spec.signature_name parameters. The name param is the ‘model_name’ argument that we defined when we launched the server. And the signature_name refers to the logical name assigned to the signature_def_map() parameter of the add_meta_graph() routine.
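In code, that looks roughly like this. Here 'deeplab' matches the --model_name flag from the launch command, while 'predict_images' is the hypothetical signature key from the export sketch above:

from tensorflow_serving.apis import predict_pb2

request = predict_pb2.PredictRequest()
request.model_spec.name = 'deeplab'               # --model_name used at launch
request.model_spec.signature_name = 'predict_images'  # key from signature_def_map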
Next, we must supply the input data as defined in the server’s signature. Remember that, in the server, we defined a Predict API to expect an image as well as two scalars (the image’s height and width). To feed the input data into the request object, TensorFlow provides the utility tf.make_tensor_proto(). This method creates a TensorProto object from a numpy/Python object. We can use it to feed the image and its dimensions to the request object.
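Continuing the sketch, we copy the resulting TensorProto objects into the request under the logical names defined by the signature (the exact dtypes and shapes must match whatever the exported graph expects):

import tensorflow as tf

request.inputs['image'].CopyFrom(tf.make_tensor_proto(image))
request.inputs['height'].CopyFrom(tf.make_tensor_proto(int(height)))
request.inputs['width'].CopyFrom(tf.make_tensor_proto(int(width)))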
Looks like we are ready to call the server. To do that, we call the Predict() method (using the stub) and pass the request object as an argument.
For requests that return a single response, gRPC supports both synchronous and asynchronous calls. Thus, if you want to do some work while the request is being processed, you can call Predict.future() instead of Predict().
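Both variants, sketched below; the 30-second timeout is an arbitrary choice:

# blocking call
result = stub.Predict(request, 30.0)

# non-blocking alternative: do other work, then collect the result
# result_future = stub.Predict.future(request, 30.0)
# result = result_future.result()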
Now we can fetch and enjoy the results.
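A sketch of turning the response back into a NumPy array, using the 'segmentation_map' output name from the signature:

import tensorflow as tf

output_proto = result.outputs['segmentation_map']
segmentation_map = tf.make_ndarray(output_proto)  # TensorProto -> numpy array
print(segmentation_map.shape)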
Hope you liked this article. Thanks for reading!
Further reading:
How to train your own FaceID ConvNet using TensorFlow Eager execution (medium.freecodecamp.org)
Diving into Deep Convolutional Semantic Segmentation Networks and Deeplab_V3 (medium.freecodecamp.org)
Translated from: https://www.freecodecamp.org/news/how-to-deploy-tensorflow-models-to-production-using-tf-serving-4b4b78d41700/