A Top-Down Walkthrough of ncnn, Part 1: Loading the param Model and bin File for Forward Propagation

I have recently been implementing a tensorflow-to-ncnn converter, and along the way I built up a decent understanding of the ncnn framework, which I'd like to share here.

For the details and steps of tensorflow2ncnn, see my GitHub:

https://github.com/hanzy88/tensorflow2ncnn

It currently supports converting CNN+FC networks. YOLOv3 based on both a full CNN and MobileNetV2 has been converted and tested successfully; apart from some loss of precision the results are essentially correct. Both were also run successfully on a Jetson Nano (the full-CNN YOLOv3 is so large that it runs extremely slowly there).

Back to the topic. When I first read the ncnn source I started from Mat and the other data structures, and although I read quite a bit I never formed a concrete picture of the whole flow. So this series reads ncnn's forward propagation top-down, starting from how the model and weights are loaded.

Contents

    • 1. Model and weight files
      • Demo: loading a model and extracting output
    • 2. Reading the param model
    • 3. Reading the bin file
    • 4. Forward propagation

1. Model and weight files

After converting a model to ncnn you normally end up with two files:

ncnn.param
ncnn.bin

param stores the model structure; bin stores the weights of ops such as convolutions.

The param file looks like this:

7767517
3 3
Input         input    0 1 data 0=4 1=4 2=1
InnerProduct  ip       1 1 data fc 0=10 1=1 2=80
Softmax       softmax  1 1 fc prob 0=0

The first line is the magic number, fixed at 7767517.
On the second line, the first number is the layer count and the second is the blob count. A blob is the data structure passed between layers; it is defined in blob.h and blob.cpp as follows:

class Blob
{
public:
    // empty
    Blob();

public:
#if NCNN_STRING
    // blob name
    std::string name;
#endif // NCNN_STRING
    // index of the layer which produces this blob as output
    int producer;
    // indices of the layers which consume this blob as input
    std::vector<int> consumers;
};

Lines 3 through the end record each layer's connectivity. Taking line 3 as an example:
column 1 is the layer's op type, column 2 is the layer name, columns 3 and 4 are the input and output blob counts, the strings after that are the input and output blob names, and the trailing numbers are the constant parameters passed to the layer. In detail:

[layer type] [layer name] [input count] [output count] [input blobs] [output blobs] [layer specific params]

layer type : type name, such as Convolution Softmax etc
layer name : name of this layer, must be unique among all layer names
input count : count of the blobs this layer needs as input
output count : count of the blobs this layer produces as output
input blobs : name list of all the input blob names, separated by space, must be unique among input blob names of all layers
output blobs : name list of all the output blob names, separated by space, must be unique among output blob names of all layers
layer specific params : key=value pair list, separated by space

Layer parameters:

0=1 1=2.5 -23303=2,2.0,3.0

The key index should be unique within each layer line; a pair can be omitted if the default value is used.

the meaning of existing param key index can be looked up at https://github.com/Tencent/ncnn/wiki/operation-param-weight-table

integer or float key : index 0 ~ 19
integer value : int
float value : float
integer array or float array key : -23300 minus index 0 ~ 19
integer array value : [array size],int,int,...,int
float array value : [array size],float,float,...,float
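
To make the encoding concrete, here is a minimal sketch of decoding one key=value token (a simplification for illustration only, not the actual ParamDict code; token is assumed to point at one whitespace-separated pair):

    // decode one "key=value" token, e.g. "1=2.5" or "-23303=2,2.0,3.0"
    int id = 0;
    char value_str[64];
    sscanf(token, "%d=%63s", &id, value_str);

    bool is_array = (id <= -23300);
    if (is_array)
        id = -id - 23300;   // -23303 -> array param stored at index 3
    // for arrays, value_str is "[size],v0,v1,...": the first number is the element count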

If the official description above still feels abstract, take the code in tensorflow2ncnn.cpp where I emit the parameters for my Range op as an example:

else if(node.op() == "Range"){
            const tensorflow::TensorProto& start = weights[node.input(0)];
            const tensorflow::TensorProto& limit = weights[node.input(1)];
            const tensorflow::TensorProto& delta = weights[node.input(2)];

            const int * start_data = reinterpret_cast<const int *>(start.int_val().begin());
            const int * limit_data = reinterpret_cast<const int *>(limit.int_val().begin());
            const int * delta_data = reinterpret_cast<const int *>(delta.int_val().begin());


            fprintf(pp, " 0=%d", *start_data);
            fprintf(pp, " 1=%d", *limit_data);
            fprintf(pp, " 2=%d", *delta_data);
        }

The keys 0, 1 and 2 here can be assigned however you like, but you must know exactly what each one stands for, so that the forward pass can fetch the values by the same indices (i.e. 0, 1, 2). Consequently, the same index 0 may mean different things in different layers.
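
For example, in the built-in Convolution layer index 0 means num_output and index 1 means kernel_w, while in Softmax index 0 means axis (see the operation-param-weight-table linked above).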

The bin file is laid out as follows:

  +---------+---------+---------+---------+---------+---------+
  | weight1 | weight2 | weight3 | weight4 | ....... | weightN |
  +---------+---------+---------+---------+---------+---------+
  ^         ^         ^         ^
  0x0      0x80      0x140     0x1C0

the model binary is the concatenation of all weight data, each weight buffer is aligned by 32bit.

Each weight buffer is structured as follows:

[flag] (optional)
[raw data]
[padding] (optional)

flag : unsigned int, little-endian, indicating the weight storage type, 0 => float32, 0x01306B47 => float16, otherwise => quantized int8, may be omitted if the layer implementation forced the storage type explicitly
raw data : raw weight data, little-endian, float32 data or float16 data or quantized table and indexes depending on the storage type flag
padding : padding space for 32bit alignment, may be omitted if already aligned
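
The 32-bit alignment simply means each weight buffer is padded so that the next one starts on a 4-byte boundary; a minimal sketch of the computation:

    // round 'size' bytes up to the next 4-byte (32-bit) boundary
    size_t aligned_size = (size + 3) & ~(size_t)3;
    size_t padding = aligned_size - size;   // 0 to 3 bytes of padding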

refer to:
https://github.com/Tencent/ncnn/wiki/param-and-model-file-structure

Demo: loading a model and extracting output

Once you have the model files, construct a Net object, load the param and the weights, feed the input, and you can then extract the output of any named blob:

    ncnn::Net net;
    net.load_param("ncnn.param");
    net.load_model("ncnn.bin");

    const int target_size = 227;
    int img_w = bgr.cols;
    int img_h = bgr.rows;
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR, bgr.cols, bgr.rows, target_size, target_size);

    ncnn::Extractor ex = net.create_extractor();
    ex.input("input", in);

    ncnn::Mat out;
    ex.extract("softmax", out);

(The readers for the param model and the bin file are both defined in net.h and net.cpp.)
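
To read values out of the output Mat you can index it directly; a minimal sketch for a 1D classification output:

    // out is a 1 x num_class Mat for a classification model
    std::vector<float> scores(out.w);
    for (int i = 0; i < out.w; i++)
        scores[i] = out[i];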

2. Reading the param model

Take Net::load_param(FILE* fp) as an example:

int Net::load_param(FILE* fp)
{
    // read the magic number
    int magic = 0;
    int nbr = fscanf(fp, "%d", &magic);
    if (nbr != 1)
    {
        LOG_HAN;
        fprintf(stderr, "issue with param file\n");
        return -1;
    }
    if (magic != 7767517)
    {
        fprintf(stderr, "param is too old, please regenerate\n");
        return -1;
    }

    // parse layer_count and blob_count
    int layer_count = 0;
    int blob_count = 0;
    nbr = fscanf(fp, "%d %d", &layer_count, &blob_count);
    if (nbr != 2 || layer_count <= 0 || blob_count <= 0)
    {
        //LOG_HAN;
        fprintf(stderr, "nbr %d, layer_count %d, blob_count %d\n", nbr, layer_count, blob_count);
        fprintf(stderr, "issue with param file\n");
        return -1;
    }

    layers.resize((size_t)layer_count);
    blobs.resize((size_t)blob_count);

    ParamDict pd;

    int blob_index = 0;
    for (int i=0; i<layer_count; i++)
    {
        char layer_type[257];
        char layer_name[257];
        int bottom_count = 0;
        int top_count = 0;
        int nscan = fscanf(fp, "%256s %256s %d %d", layer_type, layer_name, &bottom_count, &top_count);
        if (nscan != 4)
        {
            continue;
        }

        // instantiate the layer from its registered type name
        Layer* layer = create_layer(layer_type);
        if (!layer)
        {
            layer = create_custom_layer(layer_type);
        }
        if (!layer)
        {
            fprintf(stderr, "layer %s not exists or registered\n", layer_type);
            clear();
            return -1;
        }

        layer->type = std::string(layer_type);
        layer->name = std::string(layer_name);
//         fprintf(stderr, "new layer %d %s\n", i, layer_name);

        // resolve the input blobs and record them in the current layer
        layer->bottoms.resize(bottom_count);
        for (int j=0; j<bottom_count; j++)
        {
            char bottom_name[257];
            nscan = fscanf(fp, "%256s", bottom_name);
            if (nscan != 1)
            {
                continue;
            }

            int bottom_blob_index = find_blob_index_by_name(bottom_name);
            if (bottom_blob_index == -1)
            {
                // first time this blob name appears: allocate a new blob slot
                Blob& blob = blobs[blob_index];
                bottom_blob_index = blob_index;
                blob.name = std::string(bottom_name);
                blob_index++;
            }

            blobs[bottom_blob_index].consumers.push_back(i);
            layer->bottoms[j] = bottom_blob_index;
        }

        // create the output blobs and record them in the current layer
        layer->tops.resize(top_count);
        for (int j=0; j<top_count; j++)
        {
            Blob& blob = blobs[blob_index];

            char blob_name[257];
            nscan = fscanf(fp, "%256s", blob_name);
            if (nscan != 1)
            {
                continue;
            }

            blob.name = std::string(blob_name);
            blob.producer = i;
            layer->tops[j] = blob_index;

            blob_index++;
        }

        // layer specific params: the constant parameters written after the blob names
        // ParamDict::load_param reads out all the key=value pairs
        int pdlr = pd.load_param(fp);
        if (pdlr != 0)
        {
            fprintf(stderr, "ParamDict load_param failed\n");
            continue;
        }
		
        // call the load_param overridden by the concrete op layer
        // to pick out exactly the values it needs
        int lr = layer->load_param(pd);
        if (lr != 0)
        {
            fprintf(stderr, "layer load_param failed\n");
            continue;
        }

        layers[i] = layer;
    }

    return 0;
}

For loading each layer's specific parameters, again take the Range layer I defined as an example (in src/layer/tfrange.h and tfrange.cpp):

int TFRange::load_param(const ParamDict& pd)
{
    start = pd.get(0, 0);
    limit = pd.get(1, 1);
    delta = pd.get(2, 1);
    //fprintf(stderr, "slices: %d %d %d \n", start, limit, delta);
    return 0;
}

This retrieves the three values that were defined at conversion time.
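
With these three values loaded, the forward pass only has to materialize the sequence. A hypothetical sketch of what the body of TFRange::forward(const Mat& bottom_blob, Mat& top_blob, const Option& opt) might do (the actual tfrange.cpp may differ):

    // hypothetical sketch, assuming integer parameters with delta > 0
    int count = (limit - start + delta - 1) / delta;   // number of elements
    top_blob.create(count, 4u, opt.blob_allocator);
    if (top_blob.empty())
        return -100;

    float* outptr = top_blob;
    for (int i = 0; i < count; i++)
        outptr[i] = (float)(start + i * delta);
    return 0;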

3. Reading the bin file

Take Net::load_model(FILE* fp) as an example:

int Net::load_model(FILE* fp)
{
    if (layers.empty())
    {
        fprintf(stderr, "network graph not ready\n");
        return -1;
    }

    // load file
    int ret = 0;

    ModelBinFromStdio mb(fp); // wraps the file stream for reading weights; defined in modelbin.h and modelbin.cpp
    // load each layer's weights in order
    for (size_t i=0; i<layers.size(); i++)
    {
        Layer* layer = layers[i];

        int lret = layer->load_model(mb);
        if (lret != 0)
        {
            fprintf(stderr, "layer load_model %d failed\n", (int)i);
            ret = -1;
            break;
        }

        int cret = layer->create_pipeline(opt);
        if (cret != 0)
        {
            fprintf(stderr, "layer create_pipeline %d failed\n", (int)i);
            ret = -1;
            break;
        }
    }

    fuse_network();

    return ret;
}
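
Note that load_model does more than read weights: right after each layer's load_model, its create_pipeline(opt) is called, which is the layer's one-time setup hook before inference (whatever per-layer preparation the concrete implementation needs).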

Note that not every layer has a weight buffer in the bin file, just as not every layer has specific params in the param file; what gets written is decided per op at conversion time. See FusedBatchnorm in tensorflow2ncnn.cpp for reference:

            const tensorflow::TensorProto& scale = weights[node.input(1)];
            const tensorflow::TensorProto& B = weights[node.input(2)];
            const tensorflow::TensorProto& mean = weights[node.input(3)];
            const tensorflow::TensorProto& var = weights[node.input(4)];

            int channels = scale.tensor_shape().dim(0).size(); // data size
            //fprintf(stderr, "channels: %d\n", channels);
            int dtype = scale.dtype();

            switch (dtype){
                case 1: //float
                {
                    float * scale_tensor = (float *)malloc(sizeof(float) * channels);
                    float * mean_tensor = (float *)malloc(sizeof(float) * channels);
                    float * var_tensor = (float *)malloc(sizeof(float) * channels);
                    float * b_tensor = (float *)malloc(sizeof(float) * channels);
                    const float * scale_data = reinterpret_cast<const float *>(scale.tensor_content().c_str());
                    const float * mean_data = reinterpret_cast<const float *>(mean.tensor_content().c_str());
                    const float * var_data = reinterpret_cast<const float *>(var.tensor_content().c_str());
                    const float * b_data = reinterpret_cast<const float *>(B.tensor_content().c_str());
                    
                    // copy each channel into the temporary buffers; the converter
                    // then writes them to the bin file in slope/mean/var/bias order
                    for (int i=0; i<channels; i++)
                    {
                        scale_tensor[i] = scale_data[i];
                        mean_tensor[i] = mean_data[i];
                        var_tensor[i] = var_data[i];
                        b_tensor[i] = b_data[i];
                    }
                    // ... (truncated in this excerpt: fwrite of the buffers and free of the temporaries)

After the mean, variance and the scale/shift factors have been written, the BatchNorm layer (batchnorm.cpp) reads them back through load_model during forward propagation:

int BatchNorm::load_model(const ModelBin& mb)
{
    slope_data = mb.load(channels, 1);
    if (slope_data.empty())
        return -100;

    mean_data = mb.load(channels, 1);
    if (mean_data.empty())
        return -100;

    var_data = mb.load(channels, 1);
    if (var_data.empty())
        return -100;

    bias_data = mb.load(channels, 1);
    if (bias_data.empty())
        return -100;

    a_data.create(channels);
    if (a_data.empty())
        return -100;
    b_data.create(channels);
    if (b_data.empty())
        return -100;

    // precompute a and b so the forward pass is a single multiply-add per element
    for (int i=0; i<channels; i++)
    {
        float sqrt_var = sqrt(var_data[i] + eps);
        a_data[i] = bias_data[i] - slope_data[i] * mean_data[i] / sqrt_var;
        b_data[i] = slope_data[i] / sqrt_var;
    }

    return 0;
}
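
With a_data and b_data precomputed this way, the per-element forward pass collapses to y = b_data[c] * x + a_data[c], where b = slope / sqrt(var + eps) and a = bias - slope * mean / sqrt(var + eps).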

4. Forward propagation

After creating an ncnn::Extractor, first feed in the input:

// This part is simple: look up the index of the input blob by name and assign the input data to it
int Extractor::input(const char* blob_name, const Mat& in)
{
    int blob_index = net->find_blob_index_by_name(blob_name);
    if (blob_index == -1)
        return -1;

    return input(blob_index, in);
}

// create_extractor allocates blob_mats with blob_count entries, i.e. the cached output Mat of every blob
int Extractor::input(int blob_index, const Mat& in)
{
    if (blob_index < 0 || blob_index >= (int)blob_mats.size())
        return -1;

    blob_mats[blob_index] = in;

    return 0;
}

Next, extract the output. You can extract any blob defined in the network, and layers already computed by a previous extract are not recomputed; if a later extract touches layers that have not run yet, computing them still takes time.
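
Using the blob names from the param example in part 1, a minimal sketch:

    ncnn::Mat fc_out, prob_out;
    ex.extract("fc", fc_out);     // runs Input and InnerProduct
    ex.extract("prob", prob_out); // only Softmax still needs to run here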

Since the code is long, here is an annotated excerpt:

// likewise, look up the index of the output blob by name
int Extractor::extract(const char* blob_name, Mat& feat)
{
    int blob_index = net->find_blob_index_by_name(blob_name);
    if (blob_index == -1)
        return -1;

    return extract(blob_index, feat);
}

// blob_mats (sized blob_count in create_extractor) caches the Mat of every blob
int Extractor::extract(int blob_index, Mat& feat)
{
    if (blob_index < 0 || blob_index >= (int)blob_mats.size())
        return -1;

    int ret = 0;
    // dims == 0 means this blob has not been computed yet
    if (blob_mats[blob_index].dims == 0)
    {
        // find the layer that produces this blob and run the network up to it
        int layer_index = net->blobs[blob_index].producer;
        ret = net->forward_layer(layer_index, blob_mats, opt);
    }
    // fetch the cached result of the requested blob
    feat = blob_mats[blob_index];
    return ret;
}

forward_layer is what propagates data between layers:

int Net::forward_layer(int layer_index, std::vector<Mat>& blob_mats, Option& opt) const
{
    // recurse from the output layer back towards the inputs, caching each result in blob_mats
    const Layer* layer = layers[layer_index];

    // one_blob_only == true means this layer has exactly one input and one output
    if (layer->one_blob_only)
    {
        // load bottom blob
        int bottom_blob_index = layer->bottoms[0];
        int top_blob_index = layer->tops[0];

        if (blob_mats[bottom_blob_index].dims == 0)
        {
            // recursively compute the producing layer's output first
            int ret = forward_layer(blobs[bottom_blob_index].producer, blob_mats, opt);
            if (ret != 0)
                return ret;
        }

        Mat bottom_blob = blob_mats[bottom_blob_index];

        if (opt.lightmode)
        {
            // delete after taken in light mode
            blob_mats[bottom_blob_index].release();
            // deep copy for inplace forward if data is shared
            if (layer->support_inplace && *bottom_blob.refcount != 1)
            {
                bottom_blob = bottom_blob.clone();
            }
        }

        // forward: support_inplace lets the output Mat reuse the input Mat
        if (opt.lightmode && layer->support_inplace)
        {
            Mat& bottom_top_blob = bottom_blob;
            // call the forward_inplace overridden by the current layer
            int ret = layer->forward_inplace(bottom_top_blob, opt);
            if (ret != 0)
                return ret;

            // store top blob
            blob_mats[top_blob_index] = bottom_top_blob;
        }
        else
        {
            // otherwise call the forward overload with separate input and output Mats
            Mat top_blob;
            int ret = layer->forward(bottom_blob, top_blob, opt);
            if (ret != 0)
                return ret;

            // store top blob
            blob_mats[top_blob_index] = top_blob;
        }

    }
    else
    {
        // load bottom blobs
        // the multi-input and/or multi-output case
        std::vector<Mat> bottom_blobs(layer->bottoms.size());
        for (size_t i=0; i<layer->bottoms.size(); i++)
        {
            int bottom_blob_index = layer->bottoms[i];

            if (blob_mats[bottom_blob_index].dims == 0)
            {
                int ret = forward_layer(blobs[bottom_blob_index].producer, blob_mats, opt);
                if (ret != 0)
                    return ret;
            }

            bottom_blobs[i] = blob_mats[bottom_blob_index];

            if (opt.lightmode)
            {
                // delete after taken in light mode
                blob_mats[bottom_blob_index].release();
                // deep copy for inplace forward if data is shared
                if (layer->support_inplace && *bottom_blobs[i].refcount != 1)
                {
                    bottom_blobs[i] = bottom_blobs[i].clone();
                }
            }
        }

        // forward
        if (opt.lightmode && layer->support_inplace)
        {
            std::vector<Mat>& bottom_top_blobs = bottom_blobs;
            int ret = layer->forward_inplace(bottom_top_blobs, opt);
            if (ret != 0)
                return ret;

            // store top blobs
            for (size_t i=0; i<layer->tops.size(); i++)
            {
                int top_blob_index = layer->tops[i];

                blob_mats[top_blob_index] = bottom_top_blobs[i];
            }
        }
        else
        {
            std::vector<Mat> top_blobs(layer->tops.size());
            int ret = layer->forward(bottom_blobs, top_blobs, opt);
            if (ret != 0)
                return ret;

            // store top blobs
            for (size_t i=0; i<layer->tops.size(); i++)
            {
                int top_blob_index = layer->tops[i];

                blob_mats[top_blob_index] = top_blobs[i];
            }
        }
    }

    return 0;
}
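
One design choice worth noting above is opt.lightmode: once a bottom blob has been consumed, its Mat is released from blob_mats to keep peak memory low, and a deep copy is made only when an inplace layer would otherwise overwrite data that something else still references (refcount != 1).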

That is the end-to-end flow of ncnn's forward propagation driven by the param and bin files.

To be continued.
