Setting up TensorFlow Serving on Ubuntu 16.04 (Server side: Docker image, Client side: gRPC requests)

 

Contents

Preface: TensorFlow Serving Quick Run

Detailed Steps

Install / Upgrade Docker to the Latest Version

Preface

Steps

Issues

Install / Upgrade Nvidia-Docker 2.0

Preface

Steps

Issues

Install / Run a GPU serving image

Preface

Steps

Deploying a Real Project

Server Side

Client Side

References


Preface: TensorFlow Serving Quick Run

# Download the TensorFlow Serving Docker image and repo
docker pull tensorflow/serving
git clone https://github.com/tensorflow/serving
# Location of demo models
TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"

# Start TensorFlow Serving container and open the REST API port
docker run -t --rm -p 8501:8501 \
   -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
   -e MODEL_NAME=half_plus_two \
   tensorflow/serving &

# Query the model using the predict API
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
   -X POST http://localhost:8501/v1/models/half_plus_two:predict

# Returns => { "predictions": [2.5, 3.0, 4.5] }

Detailed Steps

Install / Upgrade Docker to the Latest Version

Preface

The official documentation lists several ways to install Docker; here we use the recommended "Install from the repository" approach.

Steps

  1. Install from the stable repository (a sketch of the typical commands follows this list): https://docs.docker.com/install/linux/docker-ce/ubuntu/#extra-steps-for-aufs
  2. Post-installation steps for Linux: https://docs.docker.com/install/linux/linux-postinstall/
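
For reference, a minimal sketch of the install-from-repository commands for Ubuntu 16.04, roughly following the guides linked above (check the official pages for the current key and repository details):

# Add Docker's official GPG key and the stable apt repository
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

# Install Docker CE and verify
sudo apt-get update
sudo apt-get install -y docker-ce
sudo docker run hello-world

# Post-installation: run docker without sudo (log out and back in afterwards)
sudo groupadd docker
sudo usermod -aG docker $USER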

Issues

1. Cannot find the Docker daemon

~$ docker run hello-world
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?. See 'docker run --help'.

Cause: the Docker service is not running. Start it with:

~$ service docker start

Or configure Docker to start automatically on boot:

systemctl enable docker    # start Docker automatically on boot
systemctl start docker     # start Docker
systemctl restart docker   # restart Docker

Install / Upgrade Nvidia-Docker 2.0

Preface

Read the prerequisites carefully; beyond that there is nothing special to watch out for.

Steps

  1. Follow the installation steps in the NVIDIA/nvidia-docker GitHub wiki (a sketch of the typical commands follows this list): https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(version-2.0)#prerequisites
  2. Test nvidia-smi with the latest official CUDA image (https://github.com/NVIDIA/nvidia-docker#ubuntu-140416041804-debian-jessiestretch)
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
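
For reference, a minimal sketch of the installation commands from that wiki for Ubuntu (assuming Docker CE is already installed; check the wiki for the current repository setup):

# Add the nvidia-docker package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd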

Issues

No issues encountered so far.

Install / Run a GPU serving image

Preface

When pulling the Docker image, note the difference between the CPU and GPU versions:

docker pull tensorflow/serving
docker pull tensorflow/serving:latest-gpu

Steps

Pull the Docker image

docker pull tensorflow/serving:latest-gpu

Pull the toy model

mkdir -p /tmp/tfserving
cd /tmp/tfserving
git clone https://github.com/tensorflow/serving

Deploy the toy model with TensorFlow Serving

docker run --runtime=nvidia -p 8501:8501 \
  --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu &

Send a prediction request

curl -d '{"instances": [1.0, 2.0, 5.0]}' \
  -X POST http://localhost:8501/v1/models/half_plus_two:predict
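
Besides the predict endpoint, the REST API also exposes a model status endpoint, which is handy for checking that the model actually loaded:

curl http://localhost:8501/v1/models/half_plus_two
# Returns the version state, e.g. "state": "AVAILABLE" once the model is ready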

Tips: Common Docker Commands

Command                          Description
docker container inspect         Display detailed information on one or more containers
docker ps -aq                    List all container IDs
docker ps -a                     List all containers, running or not
docker stop $(docker ps -aq)     Stop all containers
docker rm $(docker ps -aq)       Remove all containers
docker container logs            Fetch the logs of a container
docker container ls              List containers with detailed information
docker image ls                  List images with detailed information

Deploying a Real Project

The previous sections only exercised a toy model and did not cover how to export a trained model into a format the server can load. This part describes how to deploy your own trained model and send prediction requests from the client side (a sketch for checking the exported model layout follows below).
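
TensorFlow Serving expects the mounted model root to contain numbered version sub-directories, each holding a saved_model.pb and its variables/ folder. A minimal sketch for sanity-checking an export before starting the container (the path is just a placeholder; saved_model_cli ships with the tensorflow pip package):

# Expected layout of the exported model root:
#   /path/to/exported/model/root/
#   └── 1/
#       ├── saved_model.pb
#       └── variables/

# Inspect the exported signature(s) of version 1
saved_model_cli show --dir /path/to/exported/model/root/1 --all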

Server Side

1. Bring the model online

#!/usr/bin/env bash
# Start the Docker image of the TensorFlow Serving server
SERVER_MDL_PATH=/path/to/exported/model/root/   # bind-mount sources must be absolute paths
MODEL_NAME=model_name

# Note: the port parameter has the format -p host_ip:host_port:container_port

# - Option 1: Run the model on GPU:
docker run --runtime=nvidia \
-p 10.0.0.1:5501:8500 \
--mount type=bind,source=${SERVER_MDL_PATH},target=/models/${MODEL_NAME} \
-e MODEL_NAME=${MODEL_NAME} -t tensorflow/serving:latest-gpu &

# - Option 2: Run the model on CPU:
docker run \
-p 10.0.0.1:5501:8500 \
--mount type=bind,source=${SERVER_MDL_PATH},target=/models/${MODEL_NAME} \
-e MODEL_NAME=${MODEL_NAME} -t tensorflow/serving:latest &
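
If you also want the REST API (as in the toy example) available for this model, additionally publish container port 8501; a hedged variant of the GPU command above (the host IP and ports are just examples):

docker run --runtime=nvidia \
-p 10.0.0.1:5501:8500 -p 10.0.0.1:5502:8501 \
--mount type=bind,source=${SERVER_MDL_PATH},target=/models/${MODEL_NAME} \
-e MODEL_NAME=${MODEL_NAME} -t tensorflow/serving:latest-gpu &

# Check that the model has loaded
curl http://10.0.0.1:5502/v1/models/${MODEL_NAME}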

2. Get the server IP

# 2. Get the server IP address
# (Note: this does not apply directly to everyone; you may need extra steps to
#  set up your router first. Example tutorial: https://www.youtube.com/watch?v=7gVHVERECu4)

curl ifconfig.me

Issues

1. Ops of type int32 cannot be placed when deploying the model on GPU

Error message:

Invalid argument: Cannot assign a device for operation Variable: Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.

You will also see int32 mentioned somewhere in the message.

Workaround 1:

When exporting the model, set device_str = '/cpu:0' everywhere you have with tf.device(device_str):.

Workaround 2:

When training and exporting the model, change all int32 ops and placeholders to int64.

Client Side

1. Set up the environment and dependencies

# _____ Prerequisites _____
# 1. install Python 2.7 (Python 3 for Windows 7 or later)
# 2. install virtualenv
# 3. activate the virtualenv
# (you can find the commands of the above steps @ https://www.tensorflow.org/install/pip)
# 4. pip install tensorflow-serving-api
# (NOTE: this will also install the CPU build of tensorflow.
#        If you already have tensorflow-gpu on your machine, create a new virtualenv for this package;
#        do NOT install tensorflow-serving-api and tensorflow-gpu in the same virtualenv.)
# (NOTE: if you want to install a specific version (like v1.5 for old machine),
#        use command like
#            pip install 'tensorflow-serving-api-python3~=1.5.0'
#        for more commands, see https://pypi.org/project/tensorflow-serving-api-python3/)
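
A minimal sketch of these setup steps on Ubuntu (the virtualenv path is just an example):

# Create and activate a clean virtualenv, then install the client package
virtualenv ~/venvs/tfs-client
source ~/venvs/tfs-client/bin/activate
pip install tensorflow-serving-api                      # also pulls in the CPU build of tensorflow
# pip install 'tensorflow-serving-api-python3~=1.5.0'   # or pin an older release (see the PyPI link above)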

2. Adapt the script below to your served model (edit the places marked # todo)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 1/12/2018
# @Author  : Fangliang Bai
# @Software: PyCharm Professional

# _____ Prerequisites _____
# 1. install Python 2.7 (Python 3 for Windows 7 or later)
# 2. install virtualenv
# 3. activate the virtualenv
# (you can find the commands of the above steps @ https://www.tensorflow.org/install/pip)
# 4. pip install tensorflow-serving-api

import json
import numpy as np
from grpc.beta import implementations
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc, prediction_service_pb2

# 1. Create gRPC stub
server_ip = '0.0.0.0'       # IP of the machine running the gRPC ModelServer # todo
server_port = int(8500)     # gRPC port; if you published a different host port (e.g. 5501 above), use that one
channel = implementations.insecure_channel(server_ip, server_port)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
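# (Note: newer tensorflow-serving-api releases drop the grpc beta API used above; in that
#  case add `import grpc` and create the stub with
#      channel = grpc.insecure_channel('%s:%d' % (server_ip, server_port))
#      stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
#  using the prediction_service_pb2_grpc module already imported above.)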

# 2. Initial request variables
INPUT_WIDE_KEY = 'x_wide'   # input tensor1 name of the network # todo
INPUT_DEEP_KEY = 'x_deep'   # input tensor2 name of the network # todo
OUTPUT_KEY = 'output'       # output tensor name of the network # todo
WIDE_DIM = 10   # todo
DEEP_DIM = 10   # todo
x_wide_data = np.random.rand(100).reshape(-1, 10)   # input data1 # todo
x_deep_data = np.random.rand(100).reshape(-1, 10)   # input data2 # todo

# 3. Initial request
request = predict_pb2.PredictRequest()
request.model_spec.name = 'model_name'  # must match the MODEL_NAME passed to the Docker container # todo
request.model_spec.signature_name = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
request.inputs[INPUT_WIDE_KEY].CopyFrom(
    tf.contrib.util.make_tensor_proto(x_wide_data, shape=[10, WIDE_DIM], dtype=tf.float32))
request.inputs[INPUT_DEEP_KEY].CopyFrom(
    tf.contrib.util.make_tensor_proto(x_deep_data, shape=[10, DEEP_DIM], dtype=tf.float32))

# 4. Send request
res = stub.Predict(request, 10.0)   # 10s timeout
print(res.outputs[OUTPUT_KEY])

# # Request method 2 (NOT verified)
# # _____ NOTE _____ json cannot serialize numpy array data directly, so convert
# #                  numpy arrays to lists using the following class
#
#
# class NumpyEncoder(json.JSONEncoder):
#     def default(self, obj):
#         if isinstance(obj, np.ndarray):
#             return obj.tolist()
#         return json.JSONEncoder.default(self, obj)
#
#
# data1 = {"keep_prob": 1.0, "input_x": x_test[0]}
# data2 = {"keep_prob": 1.0, "input_x": x_test[1]}
# data3 = {"keep_prob": 1.0, "input_x": x_test[2]}
# param = {"instances": [data1, data2, data3]}
# param = json.dumps(param, cls=NumpyEncoder)
# res = requests.post('http://localhost:8501/v1/models/find_lemma_category:predict', data=param)  # requires import requests
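
Alternatively, the REST API avoids the numpy serialization issue entirely: build the JSON by hand and POST it with curl. This assumes the container also publishes the REST port 8501 (see the server-side variant above); the tensor names and values below are placeholders for your own model.

curl -d '{"instances": [{"x_wide": [0.1, 0.2], "x_deep": [0.3, 0.4]}]}' \
  -X POST http://10.0.0.1:5502/v1/models/model_name:predict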

Prediction requests from outside the local network

Approach 1: port redirection (see the sketch after the links below)

1. [Reset rules] https://wiki.archlinux.org/index.php/Iptables_(%E7%AE%80%E4%BD%93%E4%B8%AD%E6%96%87)#%E9%87%8D%E7%BD%AE%E8%A7%84%E5%88%99

2. [Reset rules] https://www.cnblogs.com/hongchenok/p/3577354.html

3. [Setup] http://www.voidcn.com/article/p-mgnvivhg-gr.html

4. [Setup] https://stackoverrun.com/cn/q/3981105

5. [Setup] https://www.oschina.net/question/141942_2237299

6. [Setup] https://blog.csdn.net/javaee_ssh/article/details/22167149
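
For illustration only, a rough sketch of the kind of iptables redirection those links describe, run on the gateway machine (the port, the serving-host IP and the need for ip_forward are assumptions that depend on your network):

# Forward external TCP traffic on port 5501 to the serving host
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A PREROUTING -p tcp --dport 5501 -j DNAT --to-destination 10.0.0.1:5501
sudo iptables -A FORWARD -p tcp -d 10.0.0.1 --dport 5501 -j ACCEPT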

References

  1. https://github.com/aws/sagemaker-tensorflow-container/blob/master/test/integ/container_tests/layers_prediction.py
  2. https://sthalles.github.io/serving_tensorflow_models/
