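DeepLabv3 + MobileNetV2 for semantic segmentation, wrapped in C++ and deployed to mobile, Linux, Windows, and other platforms (the most detailed guide ever)

Training

Installing and testing the DeepLab project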

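First, to make sure the version is supported, confirm that your TensorFlow version is 1.10 or later. My Linux machine runs TensorFlow 1.14, because that is the version I have always used.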
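
A quick way to check the version requirement is to compare version numbers as tuples rather than as strings (as strings, "1.9" would sort after "1.10"). The helper names below are my own, and the sketch only checks the lower bound of 1.10 mentioned above:

```python
def parse_version(version):
    """Turn '1.14.0' or '1.15.0-rc1' into a tuple of leading integers, e.g. (1, 14, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def is_supported(version, minimum=(1, 10)):
    """True if the version is at least `minimum` (hypothetical helper, lower bound only)."""
    return parse_version(version)[:len(minimum)] >= minimum


# Typical use (requires TensorFlow to be installed):
#   import tensorflow as tf
#   print(is_supported(tf.__version__))
```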

Clone the DeepLab project

git clone https://gitee.com/yujiahao123/models2

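This is a fork of DeepLab that I made for speed: cloning from TensorFlow's GitHub was painfully slow and kept getting interrupted. At first I didn't know about a tool as handy as Gitee, so I had my colleague Roland download the repository in Canada and email it to me, haha.

Add the project dependency paths

sudo gedit ~/.bashrc

Append the following line to the end of the file

export PYTHONPATH=/home/william/models/research/slim:/home/william/models/research:$PYTHONPATH

Change the paths here to your own,
then reload the environment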

source ~/.bashrc

Test DeepLab

cd /home/william/models/research/deeplab

Run

python model_test.py

If the final output is OK, the installation succeeded.

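Dataset processing

My data came from my colleague Roland (by the way, file transfers from Canada to China are really painful: a download that runs for hours and then fails at 80 percent is hard to take). It needed processing before it could be used, so I wrote some code to process it and make sure it matches the official Pascal VOC format.

Converting the data to TFRecord

Create the tfrecord directory

mkdir tfrecord

Pack the dataset prepared above into TFRecord files using build_voc2012_data.py. Run it from the directory /home/bai/models/research/deeplab/datasets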

python build_voc2012_data.py \
  --image_folder="/home/william/dataset/yoho/Images" \
  --semantic_segmentation_folder="/home/william/dataset/yoho/masks" \
  --list_folder="/home/william/dataset/yoho/index" \
  --image_format="jpg" \
  --output_dir="/home/william/dataset/yoho/tfrecord"
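  • image_folder: directory containing the dataset's original input images
  • semantic_segmentation_folder: directory containing the dataset's label masks
  • list_folder: directory containing the index files that split the dataset into a training set, a validation set, and so on
  • image_format: format of the input images (CamVid, for example, uses png)
  • output_dir: directory where the generated TFRecord files are stored (create it yourself)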
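
The index files in list_folder can be generated with a few lines of Python. This is only a sketch of one way to do it, not the processing code mentioned above: it assumes that each index file (train.txt, val.txt) lists one file name without extension per line, which is how I understand build_voc2012_data.py to read them, and the 20% validation fraction and the function names are made up for illustration.

```python
import os
import random


def split_stems(filenames, val_fraction=0.2, seed=0):
    """Split image file names into train/val lists of stems (names without extension)."""
    stems = sorted(os.path.splitext(name)[0] for name in filenames)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(stems)
    n_val = int(round(len(stems) * val_fraction))
    return {"train": sorted(stems[n_val:]), "val": sorted(stems[:n_val])}


def write_index_files(splits, list_folder):
    """Write one <split>.txt per split, one stem per line."""
    if not os.path.isdir(list_folder):
        os.makedirs(list_folder)
    for split_name, stems in splits.items():
        path = os.path.join(list_folder, split_name + ".txt")
        with open(path, "w") as f:
            f.write("\n".join(stems) + "\n")


# Typical use (paths are the ones from the command above):
#   names = os.listdir("/home/william/dataset/yoho/Images")
#   write_index_files(split_stems(names), "/home/william/dataset/yoho/index")
```

The number of lines in each index file is what goes into splits_to_sizes when the dataset is registered.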

Training the network

In datasets/data_generator.py, add a descriptor for the yoho dataset:

_YOHO_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 256,  # num of samples in images/training
        'val': 49,  # num of samples in images/validation
    },
    num_classes=3,
    ignore_label=255,
)

num_classes is 3 because yoho has 3 classes in total.

Register the dataset

Also in datasets/data_generator.py, add the name of the dataset:

_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'yoho': _YOHO_INFORMATION,  # our own dataset
}

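Modify the code

Because we are fine-tuning DeepLab on our own dataset, some of the code needs to be modified.

Modify train.py

There are a few options:

  • To use all of the pretrained weights, set initialize_last_layer=True
  • To use only the network backbone, set initialize_last_layer=False and last_layers_contain_logits_only=False
  • To use all of the pretrained weights except the logits, set initialize_last_layer=False and last_layers_contain_logits_only=True. This is the case for a custom dataset, where the number of classes differs (the exclude_list change in train_utils.py takes care of not loading the logits).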
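
The three options can be written down as a small lookup table that produces the matching train.py flags. The mode names here are my own labels, not anything defined by DeepLab; the flag names and values are the ones listed above.

```python
# Each mode maps to the (flag, value) pairs named in the list above.
FINE_TUNE_MODES = {
    "all_weights": [("initialize_last_layer", True)],
    "backbone_only": [("initialize_last_layer", False),
                      ("last_layers_contain_logits_only", False)],
    "all_but_logits": [("initialize_last_layer", False),
                       ("last_layers_contain_logits_only", True)],
}


def flags_for(mode):
    """Return the command-line flags for one of the (hypothetical) mode names."""
    return ["--{}={}".format(name, value) for name, value in FINE_TUNE_MODES[mode]]


print(flags_for("all_but_logits"))
```

For a custom dataset with a different number of classes, "all_but_logits" is the combination described in the last bullet.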

Modify train_utils.py

In the corresponding utils/train_utils.py, change the exclude_list setting so that the logits layer is not loaded when the pretrained weights are used:

exclude_list = ['global_step','logits'] 
if not initialize_last_layer:
 exclude_list.extend(last_layers)

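Download the pretrained weights

Because we have little data, fine-tuning a model that someone else has already trained is a good choice, in addition to data augmentation.
Download a pretrained model from the model zoo:
Since I plan to deploy the model to mobile devices, I chose MobileNetV2, which is designed specifically for mobile.
Download page: https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md
I used the pretrained model deeplabv3_mnv2_cityscapes_train.

Training

Almost everything we need is ready now, so the "alchemy" (training) can begin!
To train, run the train.py script under deeplab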

python train.py \
  --logtostderr \
  --train_logdir=/home/william/model/models-master/models-master/research/deeplab/exp/yoho/train3 \
  --dataset_dir=/home/william/dataset/yoho/tfrecord \
  --training_number_of_steps=100 \
  --train_split="train" \
  --model_variant="mobilenet_v2" \
  --output_stride=16 \
  --base_learning_rate=3e-5 \
  --train_crop_size=513,513 \
  --train_batch_size=2 \
  --dataset="yoho" \
  --tf_initial_checkpoint=/home/william/model/models-master/models-master/research/deeplab/exp/yoho/train2/model.ckpt-520
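
Long commands like this are easy to break when a line continuation goes missing, so it can be convenient to assemble them in Python instead. The sketch below is just a convenience of my own (the helper name is hypothetical); the flag names and values are a subset of those in the command above.

```python
def build_command(script, flags):
    """flags: list of (name, value) pairs; value None means a bare switch such as --logtostderr."""
    command = ["python", script]
    for name, value in flags:
        if value is None:
            command.append("--" + name)
        else:
            command.append("--{}={}".format(name, value))
    return command


train_flags = [
    ("logtostderr", None),
    ("training_number_of_steps", 100),
    ("train_split", "train"),
    ("model_variant", "mobilenet_v2"),
    ("output_stride", 16),
    ("train_crop_size", "513,513"),
    ("train_batch_size", 2),
    ("dataset", "yoho"),
]
command = build_command("train.py", train_flags)
print(" ".join(command))
```

The resulting list can be passed to subprocess.check_call(command, cwd=...) with cwd set to the deeplab directory, after adding the log, dataset, and checkpoint paths.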

Visualizing the test results

For visualization, run the vis.py script under deeplab

python vis.py \
  --vis_split="val" \
  --model_variant="mobilenet_v2" \
  --vis_crop_size=3000,2000 \
  --output_stride=16 \
  --checkpoint_dir="/home/william/model/models-master/models-master/research/deeplab/exp/yoho/train3" \
  --dataset="yoho" \
  --colormap_type="pascal" \
  --vis_logdir="/home/william/model/models-master/models-master/research/deeplab/exp/yoho/vis" \
  --dataset_dir="/home/william/dataset/yoho/tfrecord"

Once it finishes, we can look at how our model performs in the directory /home/william/model/models-master/models-master/research/deeplab/exp/yoho/vis
[Image 1]
At a glance, the results look decent.

Performance evaluation

For evaluation, run the eval.py script under deeplab

python eval.py \
  --eval_split="val" \
  --model_variant="mobilenet_v2" \
  --eval_crop_size=3000,2000 \
  --output_stride=16 \
  --dataset="yoho" \
  --checkpoint_dir="/home/william/model/models-master/models-master/research/deeplab/exp/yoho/train3" \
  --eval_logdir="/home/william/model/models-master/models-master/research/deeplab/exp/yoho/eval" \
  --dataset_dir="/home/william/dataset/yoho/tfrecord" \
  --max_number_of_evaluations=1

At the end it prints an mIoU value; the higher the value, the better the result. In my run the mIoU reached about 85, which is acceptable.
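
For readers who have not met the metric: mIoU (mean intersection over union) averages, over the classes, the ratio between correctly labeled pixels and all pixels that either belong to the class or were predicted as it. The sketch below computes it from a confusion matrix in plain Python; it illustrates the usual definition and is not the evaluation script's own code. The matrix values are made up.

```python
def mean_iou(confusion):
    """confusion[t][p] = number of pixels with true class t predicted as class p."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # true class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp    # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:                                       # skip classes that never occur
            ious.append(tp / float(denom))
    return sum(ious) / len(ious) if ious else 0.0


# Made-up 3-class example (3 classes, as in the yoho dataset).
example = [[5, 1, 0],
           [1, 3, 0],
           [0, 0, 2]]
print(mean_iou(example))
```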

Exporting the model

To export the model, run the export_model.py script under deeplab

python export_model.py \
  --logtostderr \
  --checkpoint_path="/home/william/model/models-master/models-master/research/deeplab/exp/yoho/train3/model.ckpt-100" \
  --export_path="/home/william/model/models-master/models-master/research/deeplab/exp/yoho/save/frozen_inference_graph.pb" \
  --model_variant="mobilenet_v2" \
  --num_classes=3

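This script converts the graph's variables into constants and produces the so-called frozen graph, that is, the frozen_inference_graph.pb file.

Deployment

Once we have the frozen_inference_graph.pb model file, we can deploy our semantic segmentation network on all kinds of platforms. Below I go through Android, Linux, Windows, and so on, to show how to turn something proposed in a paper into a real product.

Mobile

Model conversion

For mobile, TensorFlow provides TensorFlow Lite, a lightweight library suited to phones. To use TensorFlow Lite, the model must first be converted from .pb to .tflite, and the TensorFlow website provides a tool for this conversion.

The epic pitfall of model conversion

There is one big pitfall in converting the model. It cost me almost two weeks, which really hurt.
At first I converted the model directly, like this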

tflite_convert \
  --output_file=test.lite \
  --graph_def_file=frozen_inference_graph.pb \
  --input_arrays=ImageTensor \
  --output_arrays=SemanticPredictions \
  --input_shapes=1,3000,2000,3 \
  --inference_input_type=QUANTIZED_UINT8 \
  --inference_type=FLOAT \
  --mean_values=128 \
  --std_dev_values=128

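Never mind whether the conversion itself succeeds: even when it does, the model throws errors once it is actually deployed to a platform.
On Android, for example:
[Image 2]

This problem had me stuck for a long time. I asked colleagues, my supervisor, and the TensorFlow authors, and none of it solved my problem.
This is the issue I filed:
https://github.com/tensorflow/tensorflow/issues/42622
I searched for a long time and found no relevant material.
I was stuck here for ages.
Relying on others is not as good as relying on yourself.
In the end I stumbled on the following issue, which solved my problem:
https://github.com/tensorflow/tensorflow/issues/23747
The main cause is that TensorFlow Lite does not support the pre-processing and post-processing operations in the TensorFlow graph,
so converting the model directly leads to all sorts of strange errors.
The issue converts the model as shown below: sub_2 is the second layer and ResizeBilinear_2 is the second-to-last layer, so the pre-processing and post-processing are skipped.
I then implemented the pre-processing (uint8 to float32) and the post-processing (an argmax function) myself, and the problem was solved.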

tflite_convert \
    --output_file=./deeplabv3_513.tflite \
    --graph_def_file=frozen_inference_graph.pb \
    --input_arrays=sub_2 \
    --output_arrays=ResizeBilinear_2 \
    --input_shapes=1,3000,2000,3 \
    --inference_type=FLOAT
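
The two steps that this conversion leaves outside the model are small. Here is a plain-Python sketch of both, using the same normalization, (x - 128) / 128, and the same per-pixel argmax that the C++ and Python deployment code later in this post uses; the function names are my own.

```python
def preprocess(pixels):
    """Pre-processing: map uint8 values 0..255 to floats via (x - 128) / 128."""
    return [(p - 128.0) / 128.0 for p in pixels]


def argmax_labels(scores, num_classes):
    """Post-processing: scores is a flat list with num_classes values per pixel."""
    labels = []
    for start in range(0, len(scores), num_classes):
        pixel_scores = scores[start:start + num_classes]
        # Index of the largest score; on ties the first index wins.
        labels.append(max(range(num_classes), key=lambda k: pixel_scores[k]))
    return labels


print(preprocess([0, 128, 255]))
print(argmax_labels([0.1, 0.7, 0.2, 0.9, 0.05, 0.05], 3))
```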

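Final conversion method

Finally, here is how I do the conversion. I first add signatures for the input and output and save the model in SavedModel format. It is really the same thing as the frozen graph, except that it carries some information about the inputs and outputs.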

import tensorflow as tf


def export_saved_model(sess, input_tensor, output_tensor):
    """Save the session's graph as a SavedModel with a named prediction signature."""
    output_path = 'seg/'
    print('Exporting trained model to ', output_path)
    builder = tf.saved_model.builder.SavedModelBuilder(output_path)
    # Describe the input and output tensors so they can be looked up by name later.
    tensor_info_input = {
        'images': tf.saved_model.utils.build_tensor_info(input_tensor)}
    tensor_info_outputs = {
        'output': tf.saved_model.utils.build_tensor_info(output_tensor)}

    prediction_signature = (
        tf.saved_model.signature_def_utils.build_signature_def(
            inputs=tensor_info_input,
            outputs=tensor_info_outputs,
            method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME
        ))
    # Note: as in the original code, this op is created but not passed to the builder.
    legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')

    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            'predict_images': prediction_signature
        })
    builder.save()
    print('Successfully export model to %s' % output_path)

Then convert it into a .tflite model that TensorFlow Lite can use

tflite_convert \
    --output_file=./deeplabv3_513.tflite \
    --saved_model_dir=seg \
    --input_arrays=sub_2 \
    --output_arrays=ResizeBilinear_2 \
    --input_shapes=1,3000,2000,3 \
    --inference_type=FLOAT

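Quantization (optional)

After exporting the SavedModel, quantization can be applied as part of the conversion to a .tflite model. It makes your model smaller and faster.
Quantization code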

import tensorflow as tf

saved_model_dir = 'seg/'  # the SavedModel directory written by export_saved_model above
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
with open("converted_model.tflite", "wb") as f:
    f.write(tflite_quant_model)
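
To give an idea of why quantization shrinks the model: a float32 weight takes 4 bytes, while an 8-bit code takes 1. The sketch below shows a generic 8-bit affine quantization of a list of weights. It only illustrates the idea and is not TensorFlow Lite's exact scheme; it assumes the value range contains 0, and the weights are made up.

```python
def quantize(values):
    """Map floats to integer codes in 0..255 using a scale and a zero point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0            # avoid a zero scale when all values are equal
    zero_point = int(round(-lo / scale))        # the code that represents 0.0
    codes = [min(255, max(0, int(round(v / scale)) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate float values from the 8-bit codes."""
    return [scale * (c - zero_point) for c in codes]


weights = [-0.5, 0.0, 0.3, 1.5]                 # made-up example weights
codes, scale, zero_point = quantize(weights)
restored = dequantize(codes, scale, zero_point)
print(codes, restored)
```

Each restored value differs from the original by at most about half of the scale, which is the price paid for the smaller storage.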

Building TensorFlow Lite

First we need to download the TensorFlow project. For speed, we again fork the project to Gitee first

git clone https://gitee.com/yujiahao123/tensorflow   

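This gets us the TensorFlow source code fairly quickly.
Next, download the TensorFlow Lite dependencies.
Some of them are hosted abroad, so you may need a VPN or the download may fail.
Run the download_dependencies.sh script under tensorflow/tensorflow/lite/tools/make. It downloads the dependencies that Lite needs into the downloads folder under make.
Then install Bazel, a build tool similar to CMake, which we need in order to build the TensorFlow Lite library.
Running the commands below produces the TensorFlow Lite .so shared library.
32bit armeabi-v7a: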

bazel build -c opt --config=android_arm //tensorflow/lite:libtensorflowlite.so

64bit arm64-v8a:

bazel build -c opt --config=android_arm64 //tensorflow/lite:libtensorflowlite.so

With this shared library we can happily use its API!
We also need to add its header files. Here is the official note:
Currently, there is no straightforward way to extract all header files needed,
so you must include all header files in tensorflow/lite/ from the TensorFlow
repository. Additionally, you will need header files from
FlatBuffers and
Abseil.
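So we need the header files from tensorflow\lite as well as those of FlatBuffers and Abseil,
that is, the downloads\flatbuffers\include and downloads\absl directories under the directory we downloaded the dependencies into earlier.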

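Android C++ code

// Note: the original header names were lost when this page was extracted. The list
// below is reconstructed from what the code uses (JNI, Android logging, std::string, ...).
#include <jni.h>
#include <android/log.h>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <memory>
#include <string>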
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/kernels/register.h"
#include "opencv2/opencv.hpp"
using namespace std;
using namespace cv;

string jstringTostring(JNIEnv* env, jstring jstr) {
     
    char *rtn = NULL;
    jclass clsstring = env->FindClass("java/lang/String");
    jstring strencode = env->NewStringUTF("GB2312");
    jmethodID mid = env->GetMethodID(clsstring, "getBytes", "(Ljava/lang/String;)[B");
    jbyteArray barr = (jbyteArray) env->CallObjectMethod(jstr, mid, strencode);
    jsize alen = env->GetArrayLength(barr);
    jbyte *ba = env->GetByteArrayElements(barr, JNI_FALSE);
    if (alen > 0) {
     
        rtn = (char *) malloc(alen + 1);
        memcpy(rtn, ba, alen);
        rtn[alen] = 0;
    }
    env->ReleaseByteArrayElements(barr, ba, 0);
    string stemp(rtn);
    free(rtn);
    return stemp;
}

// Post-processing: return the index of the largest value, i.e. the class with the highest score
static int Argmax(float* array, int size) {
     
    float max_value = -10000;
    int max_index = 0;
    for (int32_t i = 0; i < size; i++) {
     
        if (array[i] > max_value) {
     
            max_value = array[i];
            max_index = i;
        }
    }
    return max_index;
}

Mat RunInference(string picname){
     
    unique_ptr<tflite::FlatBufferModel> model;
    unique_ptr<tflite::Interpreter> interpreter;
    model = tflite::FlatBufferModel::BuildFromFile("/sdcard/model/deeplabv3_513.tflite"); // the model we converted; I copied it straight onto the phone, but you can put it in the assets folder instead
    model->error_reporter();

    tflite::ops::builtin::BuiltinOpResolver resolver;
    tflite::InterpreterBuilder(*model,resolver)(& interpreter);

    interpreter->AllocateTensors();

    __android_log_print(ANDROID_LOG_INFO, "mydebug", "Success\n");


    int input = interpreter->inputs()[0];
    TfLiteIntArray* dims = interpreter->tensor(input)->dims;

    int height = dims->data[1];
    int width = dims->data[2];
    int channels = dims->data[3];

    Mat img = imread(picname);

    //__android_log_print(ANDROID_LOG_INFO, "mydebug", "height %d",img.rows);
    auto img_inputs = interpreter->typed_tensor<float>(input);
    // Pre-processing: normalize each value and write it into the input tensor
    for(int i = 0;i<img.cols*img.rows*3;i++){
     
        img_inputs[i] = (img.data[i]- 128.0)/128.0;
    }
    interpreter->Invoke();
    int output = interpreter->outputs()[0];
    TfLiteIntArray* output_dims = interpreter->tensor(output)->dims;

    for(int i = 0;i<4;i++){
     
        cout<<output_dims->data[i]<<endl;
    }
    float* outputsoftmax = interpreter->typed_output_tensor<float>(0);
    int* outputlabel = new int[3000*2000];
    for(int i = 0;i<height*width;i++){
     
        outputlabel[i] = Argmax(outputsoftmax+3*i,3);
    }

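    Mat mat = cv::Mat(3000, 2000, CV_8UC1);
    int b = 0;  // running index into the flat label array
    for (int i = 0; i < mat.rows; i++)
    {
        for (int j = 0; j < mat.cols; j++)
        {
            // Spread the labels 0, 1, 2 to 0, 100, 200 so they are visible in the color map
            mat.at<uchar>(i, j) = (uchar)(outputlabel[b] * 100);
            b++;
        }
    }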
    delete[] outputlabel;
    Mat im_color;
    applyColorMap(mat, im_color, cv::COLORMAP_JET); // apply a color map
    resize(im_color,im_color,cv::Size(320,480));
    return im_color;
}

extern "C" JNIEXPORT jintArray JNICALL
Java_com_example_myapplication_MainActivity_decodeFile(
        JNIEnv* env,
        jobject /* this */,jstring picname) {
     
    string picnam = jstringTostring(env,picname);
    Mat im_color = RunInference(picnam);

    __android_log_print(ANDROID_LOG_INFO, "mydebug", "h: %d",im_color.rows);
    cvtColor(im_color,im_color,CV_RGB2BGRA);
    jintArray result  = env->NewIntArray(im_color.cols * im_color.rows);
    env->SetIntArrayRegion(result,0,im_color.cols * im_color.rows,(jint *)im_color.data);
    return result;
}

Android Java code

protected void onCreate(Bundle savedInstanceState) {
     
        super.onCreate(savedInstanceState);
        verifyStoragePermissions(this);
        // Example of a call to a native method
        setContentView(R.layout.activity_main);
        int[] test = decodeFile("/sdcard/pic/000014_image.png"); // the C++ function we wrote above, called through JNI
        Bitmap result = Bitmap.createBitmap(320,480, Bitmap.Config.RGB_565);
        result.setPixels(test, 0, 320, 0, 0,320, 480);
        ImageView imageView = (ImageView) findViewById(R.id.imt_photo);
        imageView.setImageBitmap(result); // display the returned result
    }
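
The C++ side converts the image to four channels and hands the raw bytes to Java as an int array, and the Java side passes those ints to setPixels, which expects ARGB color values. The sketch below shows in Python why four bytes stored in B, G, R, A order read back as one ARGB int. It assumes a little-endian device, which is the usual case for Android on ARM, and the function names are my own.

```python
import struct


def bgra_bytes_to_argb_int(b, g, r, a):
    """Read four bytes stored in B, G, R, A order as one little-endian signed 32-bit int."""
    return struct.unpack("<i", bytes(bytearray([b, g, r, a])))[0]


def argb_components(value):
    """Split a 32-bit color int into its (alpha, red, green, blue) bytes."""
    value &= 0xFFFFFFFF  # view the signed int as unsigned
    return ((value >> 24) & 0xFF, (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


pixel = bgra_bytes_to_argb_int(0x10, 0x20, 0x30, 0xFF)
print(hex(pixel & 0xFFFFFFFF), argb_components(pixel))
```

The int is negative whenever alpha is 0xFF, which is normal for Java color values.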

Now let's look at the result.
This is the original image:
[Image 3]

This is the result on Android:
[Image 4]

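Deploying to iOS

This needs a Mac environment. I am still setting up a virtual machine, so this part is to be continued.

Desktop/PC

Python, common to Linux and Windows

Here is the code directly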

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt  # plt is used to display the image
import cv2 as cv

# tf.lite.Interpreter is used here, so the separate tflite_runtime package is not needed.
interpreter = tf.lite.Interpreter(model_path='/home/william/CLionProjects/DEMO/model/deeplabv3_513.tflite')
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
testpic = cv.imread("000014_image.png")
testpic = (1.0 * testpic - 128) / 128  # pre-processing: same normalization as in the C++ code
testpic = testpic.reshape(1, 3000, 2000, 3)
testpic = testpic.astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], testpic)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
test = output_data.argmax(axis=3)  # post-processing: per-pixel argmax over the class axis
test = test.reshape(3000, 2000)
plt.imshow(test)  # show the image
plt.axis('off')  # hide the axes
plt.show()

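Result

[Image 5]

Python is fine for experimenting with the model, but in a real project you still need C/C++ for efficiency.

Deploying to Linux

Deploying to Linux is simpler than deploying to mobile. After running the download_dependencies.sh script, run build_lib.sh in the make directory to produce a .so shared library that runs on Linux.
The Linux code is almost identical to the Android code. The Android version has an extra JNI layer, while the Linux version needs no JNI and is pure C++. I won't post the code here; message me privately if you need it.
This is the result on Linux:

[Image 6]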

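Deploying to Windows

Using the TensorFlow Lite C++ API on Windows is a hassle: I searched for several days and found no relevant material.
For Windows I did not use TensorFlow Lite. At the start I had not yet tried building for mobile or Linux and used Windows as my testbed, but the TensorFlow website has no material on how to build TensorFlow Lite from source on Windows.
Python on Windows does run, but how to call Python from C++ is another problem. TensorFlow Lite itself is written in C/C++, so having C++ call Python, which then calls C++ again, is problematic too. With no better option, I started using the full TensorFlow library directly. Here is the code.
Create a folder named my_inference under tensorflow/cc and put both the wrapper code and the outer interface code in it.
This is the TensorFlow wrapper layer:

// This is the TensorFlow inference wrapper library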
#include "direct.h"
#include "tf_inference_lib.h"

TfInferenceLib::TfInferenceLib(model_params_t* model_params,
    tensor_array_t* input_tensors,
    tensor_array_t* output_tensors)
    : m_input_tensors(input_tensors)
    , m_output_tensors(output_tensors)
{
     
    if (NULL == model_params) {
     
        MY_ERROR("model params is null \n");
    }
    m_model_params = new model_params_t;
    if (NULL == m_model_params) {
     
        MY_ERROR("Alloc local model params error!!!\n");
    }
    memcpy(m_model_params, model_params, sizeof(model_params_t));
}

TfInferenceLib::~TfInferenceLib()
{
     
    delete m_model_params;
}

std::string TfInferenceLib::getCurrentModelDir()
{
     
    char buffer[256];
    getcwd(buffer, 256);
    std::string strDir = buffer;
    std::cout << "Currrent dir is " << strDir << std::endl;
    return strDir;
}

void TfInferenceLib::TensorDebugInfo(std::string tensor_name, tensorflow::Tensor& tensor)
{
     
    std::cout << "tensor name"
              << ":" << tensor_name << std::endl;
    std::cout << "shape:[";
    int rank = tensor.dims();
    for (int i = 0; i < rank; i++) {
     
        std::cout << tensor.dim_size(i) << ",";
    }
    std::cout << "]" << std::endl;
    tensorflow::DataType eTensorType = tensor.dtype();
    std::cout << "Tensor type: " << eTensorType << std::endl;
    std::cout << "Tensor data size: " << tensor.tensor_data().size() << std::endl;
    std::cout << tensor.SummarizeValue(tensor.NumElements(), true);
    std::cout << std::endl
              << std::endl;
}

result_t TfInferenceLib::parseOutputTensors(std::vector<tensorflow::Tensor>& tVecOutputs,
    tensor_array_t* output_tensor_array)
{
     
    assert(tVecOutputs.size() == output_tensor_array->nArraySize);
    for (int i = 0; i < output_tensor_array->nArraySize; i++) {
     
        tensorflow::Tensor cur_tf_tensor = tVecOutputs[i];
        tensor_t* cur_tensor = &(output_tensor_array->pTensorArray[i]);
        tensor_params_t* cur_tensor_info = cur_tensor->pTensorInfo;
        cur_tensor_info->type = (tensor_types_t)cur_tf_tensor.dtype();
        cur_tensor_info->nElementSize = cur_tf_tensor.NumElements();
        cur_tensor_info->nDims = cur_tf_tensor.dims();
        for (int j = 0; j < cur_tensor_info->nDims; j++) {
     
            cur_tensor_info->pShape[j] = cur_tf_tensor.dim_size(j);
        }
        assert(cur_tensor_info->nLength == cur_tf_tensor.tensor_data().size());
        memcpy(cur_tensor->pValue, cur_tf_tensor.tensor_data().data(), cur_tensor_info->nLength);
    }
    return SUCCESS;
}

result_t TfInferenceLib::tfLoadSavedModel()
{
     
    tensorflow::Status load_status;
    std::string strModelAbsolutePath = getCurrentModelDir() + "/" + m_model_params->model_path;
    if (tensorflow::MaybeSavedModelDirectory(strModelAbsolutePath)) {
     
        std::cout << "Saved model is exists, path is " << strModelAbsolutePath << std::endl;
    } else {
     
        std::cout << "Saved model is not exists! path is " << strModelAbsolutePath << std::endl;
        return FAILED;
    }
    // Set the GPU memory parameters
    if (m_model_params->gpu_memory_faction < 0.0001 || m_model_params->gpu_memory_faction > 1.0) {
     
        m_model_params->gpu_memory_faction = 0.99;
    }
    m_session_options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(
        m_model_params->gpu_memory_faction);
    m_session_options.config.mutable_gpu_options()->set_allow_growth(true);
    //Load model
    MY_DEBUG("Begin to load mode , path is  %s\n", strModelAbsolutePath.c_str());
    //load_status = tensorflow::LoadEncSavedModel(
    //    m_session_options, m_run_options, strModelAbsolutePath,
    //    { m_model_params->paModelTagSet }, &m_bundle,
    //    m_model_params->bIsCipher,
    //    m_model_params->gpu_id);
    load_status = tensorflow::LoadSavedModel(
        m_session_options, m_run_options, strModelAbsolutePath,
        {
      m_model_params->paModelTagSet }, &m_bundle);
    if (load_status.ok()) {
     
        std::cout << "Load model succeed!" << std::endl;
        return SUCCESS;
    } else {
     
        std::cout << "Error load model :" << load_status << std::endl;
        return MODEL_LOAD_FAILED;
    }
    return SUCCESS;
}

result_t TfInferenceLib::tfInferenceTensors()
{
     
    tensorflow::Status run_status;
    // Get the model's signature definition
    const auto signature_def_map = m_bundle.meta_graph_def.signature_def();
    const auto signature_def = signature_def_map.at(m_input_tensors->pcSignatureDef);
    // Prepare the vector of input tensors
    std::vector<tensorflow::Tensor> input_tensor_vec;
    std::vector<std::string> vecInputTensorNames;
    for (int j = 0; j < m_input_tensors->nArraySize; ++j) {
     
        tensor_t* cur_tensor = &(m_input_tensors->pTensorArray[j]);
        tensor_params_t* cur_tensor_info = cur_tensor->pTensorInfo;
        tensorflow::TensorShape cur_tensor_shape;
        for (int k = 0; k < cur_tensor_info->nDims; ++k) {
     
            cur_tensor_shape.AddDim(cur_tensor_info->pShape[k]);
        }
        if (cur_tensor_info->type == DT_UINT8) {
     
            tensorflow::Tensor input_tensor(tensorflow::DT_UINT8, cur_tensor_shape);
            memcpy(input_tensor.flat<u8>().data(), cur_tensor->pValue, cur_tensor_info->nLength);
            input_tensor_vec.push_back(input_tensor);
        } else if (cur_tensor_info->type == DT_INT32) {
     
            tensorflow::Tensor input_tensor(tensorflow::DT_INT32, cur_tensor_shape);
            memcpy(input_tensor.flat<int>().data(), cur_tensor->pValue, cur_tensor_info->nLength);
            input_tensor_vec.push_back(input_tensor);
        } else if (cur_tensor_info->type == DT_FLOAT) {
     
            tensorflow::Tensor input_tensor(tensorflow::DT_FLOAT, cur_tensor_shape);
            memcpy(input_tensor.flat<float>().data(), cur_tensor->pValue, cur_tensor_info->nLength);
            input_tensor_vec.push_back(input_tensor);
        } else if (cur_tensor_info->type == DT_STRING) {
     
            tensorflow::Tensor input_tensor;
            std::unique_ptr<tensorflow::TensorProto> upProto(new tensorflow::TensorProto());
            upProto->set_dtype(tensorflow::DataType::DT_STRING);
            upProto->add_string_val(cur_tensor->pValue, cur_tensor_info->nLength);
            upProto->mutable_tensor_shape()->add_dim()->set_size(1);
            if (!input_tensor.FromProto(*(std::move(upProto)))) {
     
                printf("Tensor[%d]::FromProto failed\n", j);
                return FAILED;
            }
            input_tensor_vec.push_back(input_tensor);
        }
        vecInputTensorNames.push_back(signature_def.inputs().at(cur_tensor_info->aTensorName).name());
    }
    // Build the vector of (name, tensor) input pairs
    std::vector<std::pair<std::string, tensorflow::Tensor>> inputs;
    for (int i = 0; i < vecInputTensorNames.size(); i++) {
     
        inputs.push_back(std::make_pair(vecInputTensorNames[i], input_tensor_vec[i]));
    }
    // Look up the real (internal) names of the output tensors
    std::vector<std::string> vecWantOutTensorNames;
    std::vector<tensorflow::Tensor> tVecOutputs;
    for (int i = 0; i < m_output_tensors->nArraySize; i++) {
     
        std::string outputTensorName = m_output_tensors->pTensorArray[i].pTensorInfo->aTensorName;
        std::string outputTensorInternalName = signature_def.outputs().at(outputTensorName).name();
        vecWantOutTensorNames.push_back(outputTensorInternalName);
    }
    // Run inference
    MY_DEBUG("Begin to run session!\n");
    run_status = m_bundle.session->Run(inputs, vecWantOutTensorNames, {
     }, &tVecOutputs);
    if (run_status.ok()) {
     
        std::cout << "Session run succeed!!!\n";
    } else {
     
        std::cout << "Session run error: " << run_status << std::endl;
        return FAILED;
    }
    // Parse the outputs
    result_t post_status = parseOutputTensors(tVecOutputs, m_output_tensors);
    return post_status;
}

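This is the outer interface:

#define DLL_IMPLEMENT
// Note: the original header name was lost when this page was extracted. These are
// the standard headers that the code below uses.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>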
#include "my_interface.h"
#include "tf_inference_lib.h"
result_t my_alloc_tensors(tensor_params_array_t* tensors_params,
    tensor_array_t** tensors)
{
     
    MY_CHECK_NULL(tensors_params, PARAM_NULL);
    tensor_array_t* ptTensorArray = new tensor_array_t;
    MY_CHECK_NULL(ptTensorArray, MEMORY_MALLOC_FAILED);
    ptTensorArray->nArraySize = tensors_params->nArraySize;
    ptTensorArray->pTensorArray = new tensor_t[tensors_params->nArraySize];
    MY_CHECK_NULL(ptTensorArray->pTensorArray, MEMORY_MALLOC_FAILED);
    for (int i = 0; i < ptTensorArray->nArraySize; i++) {
     
        int nDataSize = 0;
        tensor_params_t* curTensorParam = &(tensors_params->pTensorParamArray[i]);
        tensor_t* curTensor = &(ptTensorArray->pTensorArray[i]);
        switch (curTensorParam->type) {
     
        case DT_FLOAT:
            nDataSize = sizeof(float);
            break;
        case DT_DOUBLE:
            nDataSize = sizeof(double);
            break;
        case DT_INT32:
            nDataSize = sizeof(int);
            break;
        case DT_UINT8:
            nDataSize = sizeof(u8);
            break;
        case DT_STRING:
            nDataSize = sizeof(s8);
            break;
        case DT_BOOL:
            nDataSize = sizeof(bool);
            break;
        default:
            nDataSize = 1;
            break;
        }
        curTensorParam->nElementSize = 1;
        for (int j = 0; j < curTensorParam->nDims; j++) {
     
            curTensorParam->nElementSize *= curTensorParam->pShape[j];
        }
        curTensorParam->nLength = curTensorParam->nElementSize * nDataSize;
        curTensor->pValue = new u8[curTensorParam->nLength];
        MY_CHECK_NULL(curTensor->pValue, MEMORY_MALLOC_FAILED);
        curTensor->pTensorInfo = new tensor_params_t;
        memcpy(curTensor->pTensorInfo, curTensorParam, sizeof(tensor_params_t));
    }
    strcpy(ptTensorArray->pcSignatureDef, tensors_params->pcSignatureDef);
    *tensors = ptTensorArray;
    return SUCCESS;
}
result_t release_tensors(tensor_array_t* tensors)
{
     
    MY_CHECK_NULL(tensors, PARAM_NULL);
    for (int i = 0; i < tensors->nArraySize; i++) {
     
        tensor_t* curTensor = &(tensors->pTensorArray[i]);
        delete[]((u8*)(curTensor->pValue));
        delete (curTensor->pTensorInfo);
    }
    delete[] tensors->pTensorArray;
    delete tensors;
    return SUCCESS;
}
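/**
 * Purpose: allocate memory for the input/output tensor arrays
 * Parameters:
 *     input_tensors_params(in) : parameter struct for the input tensors;
 *     output_tensors_params(in) : parameter struct for the output tensors;
 *     input_tensors(out) :        the allocated input tensor array;
 *     output_tensors(out) :       the allocated output tensor array;
 **/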
DLL_API result_t init_tensors(tensor_params_array_t* input_tensors_params,
    tensor_params_array_t* output_tensors_params,
    tensor_array_t** input_tensors,
    tensor_array_t** output_tensors)
{
     
    result_t result = SUCCESS;
    // Allocate memory for the input tensor array
    result = my_alloc_tensors(input_tensors_params, input_tensors);
    if (result != SUCCESS) {
     
        MY_ERROR("Alloc input tensors error !!!\n");
    }
    // Allocate memory for the output tensor array
    result = my_alloc_tensors(output_tensors_params, output_tensors);
    if (result != SUCCESS) {
        MY_ERROR("Alloc output tensors error !!!\n");
    }
    return result;
}
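/**
 * Purpose: release the memory of the allocated tensor arrays
 * Parameters:
 *     input_tensors(in) :        pointer to the allocated input tensor array;
 *     output_tensors(in) :       pointer to the allocated output tensor array;
 **/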
DLL_API result_t deinit_tensors(tensor_array_t* input_tensors,
    tensor_array_t* output_tensors)
{
     
    result_t res = SUCCESS;
    res = release_tensors(input_tensors);
    if (res != SUCCESS) {
     
        MY_ERROR("Release input tensors error!!!\n");
    }
    res = release_tensors(output_tensors);
    if (res != SUCCESS) {
     
        MY_ERROR("Release output tensors error!!!\n");
    }
    return SUCCESS;
}
result_t gpu_card_visible(char* visible_card_id_list)
{
     
    char env_name[] = "CUDA_VISIBLE_DEVICES";
    const char* new_cuda_value = visible_card_id_list;
    // const char* so that the empty string literal can be assigned in standard C++
    const char* old_cuda_value = getenv(env_name);
    if (NULL == old_cuda_value) {
        old_cuda_value = "";
    }
    std::cout << "old cuda value is :" << old_cuda_value << std::endl;
#ifdef _WIN32
    char tmp_env_exp[256] = {
      0 };
    sprintf(tmp_env_exp, "%s=%s", env_name, new_cuda_value);
    putenv(tmp_env_exp);
#else
    setenv(env_name, new_cuda_value, 1);
#endif
    char* new_seted_cuda_value = getenv(env_name);
    std::cout << "New setted cuda value is :" << new_seted_cuda_value;
    return SUCCESS;
}
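/**
 * Purpose: load the TensorFlow model according to the model parameters
 * Parameters:
 *     model_param(in) :          input parameters of the model
 *     input_tensors(in) :        pointer to the input tensor array;
 *     output_tensors(in) :       pointer to the output tensor array;
 *     model_handle(out) :        handle of the loaded model
 **/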
DLL_API result_t load_model(model_params_t* load_model_param,
    tensor_array_t* input_tensors,
    tensor_array_t* output_tensors,
    model_handle_t* load_model_handle)
{
     
    TfInferenceLib* tfInferenceLib = new TfInferenceLib(load_model_param, input_tensors, output_tensors);
    gpu_card_visible(load_model_param->visibleCard);
    tfInferenceLib->tfLoadSavedModel();
    load_model_handle->model_handle = tfInferenceLib;
    return SUCCESS;
}
/**
 * Purpose: release the model's memory
 * Parameters:
 *      model_handle(in): handle of the model to release
**/
DLL_API result_t release_model(model_handle_t* load_model_handle)
{
     
    TfInferenceLib* tfInferenceLib = (TfInferenceLib*)load_model_handle->model_handle;
    delete tfInferenceLib;
    return SUCCESS;
}
/**
 *  Purpose: run inference; the results are placed in output_tensors
 *  Parameters:
 *       model_handle(in) model handle
 **/
DLL_API result_t inference_tensors(model_handle_t* load_model_handle)
{
     
    TfInferenceLib* tfInferenceLib = (TfInferenceLib*)load_model_handle->model_handle;
    result_t res = tfInferenceLib->tfInferenceTensors();

    return res;
}

Then modify the BUILD file under the tensorflow/cc folder
and append the following at the end

cc_library(
    name = "my_tensorflow",
    srcs = [
        "my_inference/common.h",
        "my_inference/my_interface.cc",
        "my_inference/my_interface.h",
        "my_inference/tf_inference_lib.cc",
        "my_inference/tf_inference_lib.h",
    ],
    #linkshared = 1,
    deps = [
        ":cc_ops",
        ":client_session",
        ":coordinator",
        ":queue_runner",
        ":scope",
        "//tensorflow/cc/saved_model:constants",
        "//tensorflow/cc/saved_model:loader",
        "//tensorflow/cc/saved_model:signature_constants",
        "//tensorflow/cc/saved_model:tag_constants",
        "//tensorflow/core:core_cpu",
        "//tensorflow/core:framework",
        "//tensorflow/core:lib",
        "//tensorflow/core:lib_internal",
        "//tensorflow/core:protos_all_cc",
        "//tensorflow/core:tensorflow",
    ],
)

Then also modify the BUILD file in the tensorflow directory

tf_cc_shared_object(
    name = "tensorflow_cc",
    linkopts = select({
        "//tensorflow:macos": [
            "-Wl,-exported_symbols_list,$(location //tensorflow:tf_exported_symbols.lds)",
        ],
        "//tensorflow:windows": [],
        "//conditions:default": [
            "-z defs",
            "-Wl,--version-script,$(location //tensorflow:tf_version_script.lds)",
        ],
    }),
    per_os_targets = True,
    soversion = VERSION,
    visibility = ["//visibility:public"],
    # add win_def_file for tensorflow_cc
    win_def_file = select({
        # We need this DEF file to properly export symbols on Windows
        "//tensorflow:windows": ":tensorflow_filtered_def_file",
        "//conditions:default": None,
    }),
    deps = [
        "//tensorflow:tf_exported_symbols.lds",
        "//tensorflow:tf_version_script.lds",
        "//tensorflow/c:c_api",
        "//tensorflow/c/eager:c_api",
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/cc:scope",
        "//tensorflow/cc/profiler",
        "//tensorflow/cc:my_tensorflow",             # this is the interface we added
        "//tensorflow/core:tensorflow",
    ] + if_ngraph(["@ngraph_tf//:ngraph_tf"]),
)

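Build TensorFlow's tensorflow_cc target with Bazel. If all goes well, this produces tensorflow_cc.dll and tensorflow_cc.lib

Then we can bring the dynamic library and the static library into our own project and call our interface.
The outermost layer still needs one overall decodeFile interface, which I have not had time to write yet.

Calling the interface: below is an example of mine

// Note: the original header names were lost when this page was extracted. These are
// the standard headers that the code below uses.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>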
#include "opencv2/opencv.hpp"
#include "DCS_DeepLearningRegionDetection.h"

#define BATH_SIZE 1

int main(int argc, char **argv)
{
     
	// Request input and output memory
	tensor_params_array_t in_tensor_params_ar = {
      0 };
	tensor_params_array_t out_tensor_params_ar = {
      0 };
	tensor_array_t *input_tensor_array = NULL;
	tensor_array_t *output_tensor_array = NULL;

	// Set the parameters of the input tensor array
	in_tensor_params_ar.nArraySize = 1;
	strcpy(in_tensor_params_ar.pcSignatureDef, "predict_images");
	in_tensor_params_ar.pTensorParamArray = (tensor_params_t *)malloc(
		in_tensor_params_ar.nArraySize * sizeof(tensor_params_t));

	tensor_params_t *cur_in_tensor_params = &(in_tensor_params_ar.pTensorParamArray[0]);
	cur_in_tensor_params->nDims = 4;
	cur_in_tensor_params->type = DT_UINT8;
	cur_in_tensor_params->pShape[0] = BATH_SIZE;
	cur_in_tensor_params->pShape[1] = 3000;  //H
	cur_in_tensor_params->pShape[2] = 2000; //W
	cur_in_tensor_params->pShape[3] = 3;    //channel
	strcpy(cur_in_tensor_params->aTensorName, "ImageTensor");

	// Set the parameters of the output tensor array
	out_tensor_params_ar.nArraySize = 1;
	strcpy(out_tensor_params_ar.pcSignatureDef, "predict_images");
	out_tensor_params_ar.pTensorParamArray = (tensor_params_t *)malloc(
		out_tensor_params_ar.nArraySize * sizeof(tensor_params_t));
	tensor_params_t *cur_out_tensor_params0 = &(out_tensor_params_ar.pTensorParamArray[0]);
	cur_out_tensor_params0->type = DT_INT32;
	cur_out_tensor_params0->nDims = 3;
	cur_out_tensor_params0->pShape[0] = BATH_SIZE;
	cur_out_tensor_params0->pShape[1] = 3000;
	cur_out_tensor_params0->pShape[2] = 2000;
	strcpy(cur_out_tensor_params0->aTensorName, "SemanticPredictions");


	// Call the API to allocate memory for the tensor arrays
	if (SUCCESS != init_tensors(&in_tensor_params_ar, &out_tensor_params_ar,
		&input_tensor_array, &output_tensor_array))
	{
     
		printf("Open tensor memory error\n");
	}

	// Set the model loading parameters
	model_params_t tModelParams = {
      0 };
	model_handle_t tModelHandel = {
      0 };
	tModelParams.cpu_or_gpu = 0;

	strcpy(tModelParams.visibleCard, "0");
	//strcpy(tModelParam.visibleCard, "0,1");
	tModelParams.gpu_id = 0;

	tModelParams.gpu_memory_faction = 0.9;

	//tModelParams.bIsCipher = true;
	//strcpy(tModelParams.model_path, "models/object_detection_enc/1");

	tModelParams.bIsCipher = false;
	strcpy(tModelParams.model_path, "models/test/1");
	strcpy(tModelParams.paModelTagSet, "serve");

	// Call the API to load the model
	if (load_model(&tModelParams, input_tensor_array, output_tensor_array, &tModelHandel) != SUCCESS)
	{
     
		printf("Load Model error!!!\n");
	}


	cv::Mat bgrImage, rgbImage;
	bgrImage = cv::imread("test_data/000014_image.png");
	cv::cvtColor(bgrImage, rgbImage, cv::COLOR_BGR2RGB);
	int img_size = rgbImage.rows * rgbImage.cols * rgbImage.channels();

	tensor_t *cur_input_tensor = &(input_tensor_array->pTensorArray[0]);
	tensor_params_t *cur_input_tensor_info = cur_input_tensor->pTensorInfo;
	std::cout << "Cur tensor value length: " << cur_input_tensor_info->nLength << std::endl;
	assert(img_size == cur_input_tensor_info->nLength);
	memcpy(cur_input_tensor->pValue, rgbImage.ptr<unsigned char>(0), img_size);

	printf("Call api to inferencing.....\n");
	inference_tensors(&tModelHandel);
	printf("End inference!!!\n");

	// Read back the inference result and turn it into an image
	tensor_t * cur_output_tensor_class = &(output_tensor_array->pTensorArray[0]);

	int *seg = (int*)cur_output_tensor_class->pValue;

	cv::Mat mat = cv::Mat(3000, 2000, CV_8UC1);
	int b = 0;
	for (int i = 0; i < mat.rows; i++)
	{
     
		for (int j = 0; j < mat.cols; j++)
		{
     
			mat.at<uchar>(i, j) = (uchar)1.0*seg[b] * 100;
			b++;
		}
	}
	cv::Mat im_color;
	cv::applyColorMap(mat, im_color, cv::COLORMAP_JET);
	// Release the allocated tensor array memory
	deinit_tensors(input_tensor_array, output_tensor_array);

	release_model(&tModelHandel);

	free(in_tensor_params_ar.pTensorParamArray);
	free(out_tensor_params_ar.pTensorParamArray);

	system("pause");
}
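
The assert on img_size in the example only holds if the image has exactly as many bytes as the input tensor buffer. The buffer length is the product of the shape entries times the element size, as computed in my_alloc_tensors. Here is that arithmetic as a Python sketch; the element sizes are the typical values of the sizeof expressions in the C++ code, and the function name is my own.

```python
# Typical element sizes in bytes, mirroring the switch statement in my_alloc_tensors.
ELEMENT_SIZES = {
    "DT_FLOAT": 4, "DT_DOUBLE": 8, "DT_INT32": 4,
    "DT_UINT8": 1, "DT_STRING": 1, "DT_BOOL": 1,
}


def tensor_length_bytes(dtype, shape):
    """Number of elements (product of the shape) times the element size."""
    count = 1
    for dim in shape:
        count *= dim
    return count * ELEMENT_SIZES.get(dtype, 1)  # unknown types fall back to 1 byte


print(tensor_length_bytes("DT_UINT8", (1, 3000, 2000, 3)))   # input image tensor
print(tensor_length_bytes("DT_INT32", (1, 3000, 2000)))      # output label tensor
```

So the test image must be 3000 pixels high, 2000 pixels wide, with 3 channels, or the assert fails.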

This is the result on Windows, viewed with Image Watch in a Debug session:
[Image 7]

TFLite on Windows

TensorFlow does not officially provide a Windows version of TFLite. I found a Windows version that someone else implemented on GitHub:
https://github.com/qintao97/tensorflow_lite

To be continued. Only iOS is left.
