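While working through Comprehensive Experiment 7-1 (YOLOv3) of "Intelligent Computing Systems" (《智能计算系统》), I ran into many problems because the lab manual leaves out parts of the procedure. The complete workflow is laid out below in the hope that it helps other readers:
Create a new v7 container (not v7-update1).
1. Complete nms_detection.h by implementing the following function: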
#define T half // ./plugin_yolov3_detection_helper.h
__mlu_func__ void nms_detection(
int& output_box_num,
T* output_data,
Addr dst,
T* input_data_score,
T* input_data_box,
Addr src,
T *buffer,
int buffer_size,
T* sram,
SplitMode split_mode,
int input_box_num,
int input_stride,
int output_stride,
int keepNum,
T thresh_iou,
T thresh_score,
int save_method){...}
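Only the following case needs to be handled:
src == NRAM
split_mode == NMS_BLOCK
save_method == 1
MODE == 1
In this case the system runs in single-core mode, the input data resides in NRAM and satisfies the vector alignment requirements, the compute space is large enough, and results are saved in the layout score---, x1---, y1---, x2---, y2---.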
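The save layout named above can be pictured with a small host-side sketch. It is only an illustration in plain Python, not BANG C, and it assumes that each of the five fields occupies one contiguous run of `stride` elements; the names `planar_offset` and `pack_planar` are hypothetical and do not come from the lab code.

```python
FIELDS = ("score", "x1", "y1", "x2", "y2")


def planar_offset(field, box, stride):
    """Flat index of one field of one box.

    Assumption: field k of box i is stored at k * stride + i, i.e. all
    scores first, then all x1 values, and so on.
    """
    return field * stride + box


def pack_planar(boxes, stride):
    """Pack (score, x1, y1, x2, y2) tuples into one flat planar buffer."""
    if len(boxes) > stride:
        raise ValueError("stride must be at least the number of boxes")
    buf = [0.0] * (len(FIELDS) * stride)
    for i, box in enumerate(boxes):
        for k, value in enumerate(box):
            buf[planar_offset(k, i, stride)] = value
    return buf
```

With `stride = 4`, the score of box 1 lands at index 1 and its y2 value at index 17, so every field can be read or written as one vector.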
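Under the conditions listed above, during the box-saving stage each box found by the search is first saved to the NRAM_save space. Once the number of stored boxes M reaches a threshold N (output_data_AddrType == NRAM ? N = 0 : N = 256), they can be copied to the destination in one batch. When the current max_box_Score <= thresh_score: if the destination is SRAM/GDRAM, copy the data in NRAM_save to the destination and then break; if the destination is NRAM, break directly.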
if (output_data_AddrType != NRAM && output_box_num != 0)
{
...
if ((M == N) || (max_box_Score <= thresh_score))
{
__memcpy(...)
...
}
}
if (max_box_Score <= thresh_score) break;
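The control flow of this save stage can be tried out on the host with a minimal Python sketch of greedy NMS with a staging buffer. It is not the BANG C implementation: `staged` stands in for NRAM_save, `dst` for the SRAM/GDRAM destination, and the function names, the use of -inf to mark suppressed boxes, and the final flush after the loop are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_staged(scores, boxes, keep_num, thresh_iou, thresh_score,
               dst_is_nram=False, batch=256):
    """Greedy NMS that writes results through a staging buffer.

    Returns (dst, flushes): the kept (score, x1, y1, x2, y2) records and
    the number of batch copies from the staging buffer to dst.
    """
    scores = list(scores)          # work on a copy; suppressed entries become -inf
    dst, staged, flushes = [], [], 0
    kept = 0
    while kept < keep_num and scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        max_score = scores[best]
        below = max_score <= thresh_score
        if not below:
            record = (max_score,) + tuple(boxes[best])
            (dst if dst_is_nram else staged).append(record)
            kept += 1
            scores[best] = float("-inf")
            # Suppress every box that overlaps the selected one too much.
            for i in range(len(scores)):
                if iou(boxes[best], boxes[i]) > thresh_iou:
                    scores[i] = float("-inf")
        # Mirrors the snippet above: copy out when the batch is full or
        # when the search is about to stop.
        if not dst_is_nram and kept != 0:
            if staged and (len(staged) == batch or below):
                dst.extend(staged)
                staged.clear()
                flushes += 1
        if below:
            break
    if staged:                     # assumption: flush leftovers when keep_num is reached
        dst.extend(staged)
        flushes += 1
    return dst, flushes
```

With the default `batch=256` a small input is copied out once, when the score threshold stops the search; with `dst_is_nram=True` records go straight to `dst` and no batch copy happens.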
2. Copy /opt/code_chap_7_student/yolov3/bangc/PluginYolov3DetectionOutputOp into the directory /opt/code_chap_7_student/env/Cambricon-CNPlugin-MLU270/pluginops.
3. Initialize the environment
Every time you enter the system, go to the env directory and run source env.sh:
cd /opt/code_chap_7_student/env
source env.sh
4. Build
cd /opt/code_chap_7_student/env/Cambricon-CNPlugin-MLU270
./build_cnplugin.sh
5. Copy ./build/libcnplugin.so to ../neuware/lib64:
cp ./build/libcnplugin.so /opt/code_chap_7_student/env/neuware/lib64
6. Copy ./pluginops/PluginYolov3DetectionOutputOp/cnplugin.h to ../neuware/include:
cp ./pluginops/PluginYolov3DetectionOutputOp/cnplugin.h /opt/code_chap_7_student/env/neuware/include
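Complete the files in the /opt/code_chap_7_student/yolov3/tf-implementation/tf-1.14-detectionoutput directory.
For a detailed description of this part, see the material on implementing custom operators in TensorFlow.
1. MLULib wrapper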
/*
mlu_lib_ops.h //line 924
mlu_lib_ops.cc //line 1918
*/
tensorflow::Status CreateYolov3DetectionOutputOp(
MLUBaseOp** op,
MLUTensor** input_tensors,
MLUTensor** output_tensors,
cnmlPluginYolov3DetectionOutputOpParam_t param){...}
tensorflow::Status ComputeYolov3DetectionOutputOp(
MLUBaseOp* op,
MLUCnrtQueue* queue,
void* inputs[],
int input_num,
void* outputs[],
int output_num){...}
2. MLUOp wrapper
/*
mlu_ops.h //line 530
*/
struct MLUYolov3DetectionOutputOpParam{};
DECLARE_OP_CLASS(MLUYolov3DetectionOutput);
/*
yolov3detectionoutput.cc //line 12
*/
Status MLUYolov3DetectionOutput::CreateMLUOp(std::vector<MLUTensor *> &inputs, std::vector<MLUTensor *> &outputs, void *param){...}
Status MLUYolov3DetectionOutput::Compute(const std::vector<void *> &inputs,
const std::vector<void *> &outputs, cnrtQueue_t queue){...}
3. MLUStream wrapper
/*
mlu_stream.h //line 141
*/
Status Yolov3DetectionOutput(
OpKernelContext* ctx,
Tensor* tensor_input0,
Tensor* tensor_input1,
Tensor* tensor_input2,
int batchNum,
int inputNum,
int classNum,
int maskGroupNum,
int maxBoxNum,
int netw,
int neth,
float confidence_thresh,
float nms_thresh,
int* inputWs,
int* inputHs,
float* biases,
Tensor* output1,
Tensor* output2){...}
4. MLUOpKernel wrapper
/*
yolov3_detection_output_op_mlu.h //line 49
*/
void ComputeOnMLU(OpKernelContext* context) override{...}
/*
yolov3_detection_output_op.cc //line 23
*/
namespace tensorflow{...}
5. Operator registration
/*
image_ops.cc //line 1007
*/
REGISTER_OP("Yolov3DetectionOutput"){...}
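1. Modify BUILD (already done)
2. Copy each file under /opt/code_chap_7_student/yolov3/tf-implementation/tf-1.14-detectionoutput into its corresponding folder, for example with cp (the relative paths below assume the current directory is /opt/code_chap_7_student/yolov3):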
cp ./tf-implementation/tf-1.14-detectionoutput/BUILD ../env/tensorflow-v1.10/tensorflow/core/kernels/BUILD
cp ./tf-implementation/tf-1.14-detectionoutput/image_ops.cc ../env/tensorflow-v1.10/tensorflow/core/ops/image_ops.cc
cp ./tf-implementation/tf-1.14-detectionoutput/yolov3_detection_output_op.cc ../env/tensorflow-v1.10/tensorflow/core/kernels/yolov3_detection_output_op.cc
cp ./tf-implementation/tf-1.14-detectionoutput/yolov3_detection_output_op_mlu.h ../env/tensorflow-v1.10/tensorflow/core/kernels/yolov3_detection_output_op_mlu.h
cp ./tf-implementation/tf-1.14-detectionoutput/mlu_lib_ops.cc ../env/tensorflow-v1.10/tensorflow/stream_executor/mlu/mlu_api/lib_ops/mlu_lib_ops.cc
cp ./tf-implementation/tf-1.14-detectionoutput/mlu_lib_ops.h ../env/tensorflow-v1.10/tensorflow/stream_executor/mlu/mlu_api/lib_ops/mlu_lib_ops.h
cp ./tf-implementation/tf-1.14-detectionoutput/mlu_ops.h ../env/tensorflow-v1.10/tensorflow/stream_executor/mlu/mlu_api/ops/mlu_ops.h
cp ./tf-implementation/tf-1.14-detectionoutput/yolov3detectionoutput.cc ../env/tensorflow-v1.10/tensorflow/stream_executor/mlu/mlu_api/ops/yolov3detectionoutput.cc
cp ./tf-implementation/tf-1.14-detectionoutput/mlu_stream.h ../env/tensorflow-v1.10/tensorflow/stream_executor/mlu/mlu_stream.h
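The nine cp commands above can also be written as one table-driven script, which makes a mistyped destination easier to spot. The following is a minimal sketch: the file-to-directory table is taken from the commands above, while the function name `install_sources` and its up-front checks are hypothetical additions.

```python
import shutil
from pathlib import Path

# File name -> destination directory, relative to the TensorFlow source root.
DESTINATIONS = {
    "BUILD": "tensorflow/core/kernels",
    "image_ops.cc": "tensorflow/core/ops",
    "yolov3_detection_output_op.cc": "tensorflow/core/kernels",
    "yolov3_detection_output_op_mlu.h": "tensorflow/core/kernels",
    "mlu_lib_ops.cc": "tensorflow/stream_executor/mlu/mlu_api/lib_ops",
    "mlu_lib_ops.h": "tensorflow/stream_executor/mlu/mlu_api/lib_ops",
    "mlu_ops.h": "tensorflow/stream_executor/mlu/mlu_api/ops",
    "yolov3detectionoutput.cc": "tensorflow/stream_executor/mlu/mlu_api/ops",
    "mlu_stream.h": "tensorflow/stream_executor/mlu",
}


def install_sources(src_dir, tf_root):
    """Copy every completed file to its place; fail before copying anything
    if a source file or a destination directory is missing."""
    src_dir, tf_root = Path(src_dir), Path(tf_root)
    missing = [name for name in DESTINATIONS if not (src_dir / name).is_file()]
    if missing:
        raise FileNotFoundError("missing source files: " + ", ".join(missing))
    bad_dirs = sorted({rel for rel in DESTINATIONS.values()
                       if not (tf_root / rel).is_dir()})
    if bad_dirs:
        raise NotADirectoryError("missing directories: " + ", ".join(bad_dirs))
    copied = []
    for name, rel in DESTINATIONS.items():
        target = tf_root / rel / name
        shutil.copy2(src_dir / name, target)
        copied.append(target)
    return copied
```

In this lab it would be called as install_sources("/opt/code_chap_7_student/yolov3/tf-implementation/tf-1.14-detectionoutput", "/opt/code_chap_7_student/env/tensorflow-v1.10").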
3. Build the framework
rm -rf /root/.cache/bazel/_bazel_root
cd /opt/code_chap_7_student/env/tensorflow-v1.10
./build_tensorflow-v1.10_mlu.sh
Before building, delete the /root/.cache/bazel/_bazel_root folder first. If the build still reports an error, delete /root/.cache/bazel/_bazel_root/* and retry.
1. pb -> pbtxt
1) Copy yolov3_int8_bang_shape_new.pb from /opt/Cambricon-Test/models/yolov3/ to the /opt/code_chap_7_student/yolov3/yolov3-bcl/demo directory:
cd /opt/code_chap_7_student/yolov3/yolov3-bcl/demo
cp /opt/Cambricon-Test/models/yolov3/yolov3_int8_bang_shape_new.pb ./
2) Convert the .pb file to .pbtxt:
python /opt/code_chap_7_student/tools/pb_to_pbtxt/pb_to_pbtxt.py yolov3_int8_bang_shape_new.pb yolov3_int8_bang_shape_new.pbtxt
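2. Edit yolov3_int8_bang_shape_new.pbtxt and add node{...}
Because the .pbtxt file is very large, opening and editing it caused my connection to drop, so the content can be added with shell commands instead. For example, put node{...} into a file named pb_node_append.txt (which also contains library{...}), like this: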
//pb_node_append.txt
node {...}
library {...}
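Then run the commands below. sed deletes everything from the line that starts with library to the end of the file, and cat appends the new content. (A closing pattern containing \n never matches a single line, so the original range already ran to the end of the file; the address $ states this directly.)
sed -i '/^library/,$d' yolov3_int8_bang_shape_new.pbtxt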
cat ./pb_node_append.txt >> yolov3_int8_bang_shape_new.pbtxt
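For readers who prefer a script to sed, the same edit can be sketched in Python. It streams the file line by line, so the large .pbtxt is never loaded into an editor or into memory at once. The function name `replace_library_tail` is hypothetical, and the sketch assumes that everything from the first line starting with library to the end of the file should be replaced by the contents of pb_node_append.txt.

```python
import os
import shutil
import tempfile


def replace_library_tail(pbtxt_path, append_path):
    """Drop everything from the first line starting with 'library' onward,
    then append the contents of append_path. Returns the number of lines kept."""
    directory = os.path.dirname(os.path.abspath(pbtxt_path))
    kept = 0
    with open(pbtxt_path, "r", encoding="utf-8") as src, \
            tempfile.NamedTemporaryFile("w", dir=directory, delete=False,
                                        encoding="utf-8") as tmp:
        for line in src:
            if line.startswith("library"):
                break              # the rest of the file is discarded
            tmp.write(line)
            kept += 1
        with open(append_path, "r", encoding="utf-8") as extra:
            shutil.copyfileobj(extra, tmp)
    # Swap the rewritten file into place only after it is complete.
    os.replace(tmp.name, pbtxt_path)
    return kept
```

Writing to a temporary file in the same directory and renaming it at the end means an interrupted run leaves the original .pbtxt untouched.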
3. pbtxt -> pb
python /opt/code_chap_7_student/tools/pbtxt_to_pb/pbtxt_to_pb.py ./yolov3_int8_bang_shape_new.pbtxt yolov3_int8.pb
4. In ./run_evaluate.sh, set MODEL_PATH="./yolov3_int8.pb"
5. Run ./run_aicse.sh
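VI. Problems You May Encounter
1. If fetching fails when ./build_tensorflow-v1.10_mlu.sh runs during the framework build, delete /root/.cache/bazel/_bazel_root/* and retry.
2. If an error of the form "*** cannot find" appears, the environment variables have not been initialized: run source env.sh in the ./env directory. Note that this must be done every time you log in and enter the development container.
3. Errors like the following during the framework build: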
Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v2 failed (Exit 1): bash failed: error executing command
/opt/code_chap_7_student/env/tensorflow-v1.10/tensorflow/python/keras/api/BUILD:28:1: Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v1 failed (Exit 1): bash failed: error executing command
In this case, check whether ./build/libcnplugin.so was copied to ../neuware/lib64 (that is, /opt/code_chap_7_student/env/neuware/lib64) after CNPlugin was built.
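A quick way to confirm that the copy actually happened is to compare the freshly built library with the installed one byte for byte. The sketch below uses only the Python standard library; the helper name `library_is_installed` is hypothetical.

```python
import filecmp
from pathlib import Path


def library_is_installed(built, installed):
    """True only if both files exist and have identical contents."""
    built, installed = Path(built), Path(installed)
    if not (built.is_file() and installed.is_file()):
        return False
    # shallow=False forces a content comparison instead of trusting
    # file size and modification time.
    return filecmp.cmp(built, installed, shallow=False)
```

For this lab the two arguments would be /opt/code_chap_7_student/env/Cambricon-CNPlugin-MLU270/build/libcnplugin.so and /opt/code_chap_7_student/env/neuware/lib64/libcnplugin.so; a False result means step 5 needs to be repeated before rebuilding the framework.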
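VII. Miscellaneous
1. Developer manual download: Documentation Center (文档中心) - Cambricon Developer Community (寒武纪开发者社区)
2. BUILD file syntax: "bazel C++语法入门" (Introduction to Bazel C++ syntax) - Jianshu (简书)
3. PB file format: "Tensorflow模型持久化与恢复" (TensorFlow model persistence and restoration), jinying2224's blog on CSDN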