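SemanticFusion: personal notes on swapping in and training other networks

A summary of common problems when replacing the network used by SemanticFusion (SF):

There are two approaches: direct replacement and partial replacement. The first is quick and convenient; the second is something you have to master if you want to learn Caffe.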

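-------------------------divider----------------------------

Approach 1: direct replacement

The caffe_semanticfusion module can be replaced directly by another Caffe-based semantic segmentation network. Rename the other network's source tree (e.g. deeplab-public-ver2 -> caffe_semanticfusion), put it in place of SF's original module, train on top of that, and copy the resulting caffemodel to the corresponding location.

If the network is RGB-only and does not use depth, change the paths in /semanticfusion/src/main.cpp.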

//caffe.Init("../caffe_semanticfusion/models/nyu_rgbd/inference.prototxt","../caffe_semanticfusion/models/nyu_rgbd/inference.caffemodel");
  // This is for the RGB network
  caffe.Init("../caffe_semanticfusion/models/nyu_rgb/inference.prototxt","../caffe_semanticfusion/models/nyu_rgb/inference.caffemodel");

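For example, replacing it directly with DeepLab v2:

You may run into a few problems, though:

1. cuDNN version mismatch; the fix is described further down.

2.  ./include/caffe/common.cuh(9): error: function "atomicAdd(double *, double)" has already been defined (solution: here)

3. Undefined references to Mat_* symbols at link time (solution). These symbols belong to the matio library, so the link step is probably not picking up libmatio.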

../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarCreate'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_CreateVer'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarWrite'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarFree'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarReadInfo'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_Close'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarReadDataLinear'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_Open'
collect2: error: ld returned 1 exit status
caffe_semanticfusion/tools/CMakeFiles/upgrad

4. Permissions / library conflicts. The simple way out: switch to root with sudo su, build caffe_semanticfusion first, and then build the outer semanticfusion.

/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFReadDirectory@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFWriteEncodedStrip@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFIsTiled@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFOpen@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFReadEncodedStrip@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFSetField@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFWriteScanline@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFGetField@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFScanlineSize@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFNumberOfStrips@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFSetWarningHandler@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFSetErrorHandler@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFReadEncodedTile@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFReadRGBATile@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFClose@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFRGBAImageOK@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFClientOpen@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFReadRGBAStrip@LIBTIFF_4.0'

 

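-------------------------divider----------------------------

Approach 2: partial replacement

If you want to add to or modify the network yourself, I recommend installing the Meld diff tool (available via apt-get install), studying the various open-source Caffe networks, and swapping in the relevant layers and data.

What follows is partial replacement, not direct replacement.

The approach above is the most convenient, but if you want to learn Caffe you have to learn to design things yourself.

1. Build the Python interface of caffe_semanticfusion

The Caffe inside SF was originally built with CMake, configured through CMakeLists.txt. I instead configure it with the Makefile.config from my local Caffe and run make all -j8 directly inside caffe_semanticfusion, without creating a build directory or using CMake, because SF's Caffe does not build the Python interface, the tools, and so on. You can of course modify CMakeLists.txt instead.

This step usually runs into the following problem: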

PROTOC src/caffe/proto/caffe.proto
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/solvers/nesterov_solver.cpp
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/net.hpp:10,
                 from ./include/caffe/solver.hpp:7,
                 from ./include/caffe/sgd_solvers.hpp:7,
                 from src/caffe/solvers/nesterov_solver.cpp:3:
./include/caffe/util/cudnn.hpp: In function ‘const char* cudnnGetErrorString(cudnnStatus_t)’:
./include/caffe/util/cudnn.hpp:21:10: warning: enumeration value ‘CUDNN_STATUS_RUNTIME_PREREQUISITE_MISSING’ not handled in switch [-Wswitch]
   switch (status) {
          ^
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::setConvolutionDesc(cudnnConvolutionStruct**, cudnnTensorDescriptor_t, cudnnFilterDescriptor_t, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:108:70: error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t, int, int, int, int, int, int, cudnnConvolutionMode_t, cudnnDataType_t)’
       pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION));
                                                                      ^
./include/caffe/util/cudnn.hpp:15:28: note: in definition of macro ‘CUDNN_CHECK’
     cudnnStatus_t status = condition; \
                            ^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
                 from ./include/caffe/util/device_alternate.hpp:40,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/net.hpp:10,
                 from ./include/caffe/solver.hpp:7,
                 from ./include/caffe/sgd_solvers.hpp:7,
                 from src/caffe/solvers/nesterov_solver.cpp:3:
/usr/local/cuda/include/cudnn.h:500:27: note: declared here
 cudnnStatus_t CUDNNWINAPI cudnnSetConvolution2dDescriptor( cudnnConvolutionDescriptor_t convDesc,
                           ^
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/net.hpp:10,
                 from ./include/caffe/solver.hpp:7,
                 from ./include/caffe/sgd_solvers.hpp:7,
                 from src/caffe/solvers/nesterov_solver.cpp:3:
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::createPoolingDesc(cudnnPoolingStruct**, caffe::PoolingParameter_PoolMethod, cudnnPoolingMode_t*, int, int, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:127:41: error: too few arguments to function ‘cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)’
         pad_h, pad_w, stride_h, stride_w));
                                         ^
./include/caffe/util/cudnn.hpp:15:28: note: in definition of macro ‘CUDNN_CHECK’
     cudnnStatus_t status = condition; \
                            ^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
                 from ./include/caffe/util/device_alternate.hpp:40,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/net.hpp:10,
                 from ./include/caffe/solver.hpp:7,
                 from ./include/caffe/sgd_solvers.hpp:7,
                 from src/caffe/solvers/nesterov_solver.cpp:3:
/usr/local/cuda/include/cudnn.h:952:27: note: declared here
 cudnnStatus_t CUDNNWINAPI cudnnSetPooling2dDescriptor(
                           ^
Makefile:552: recipe for target '.build_release/src/caffe/solvers/nesterov_solver.o' failed
make: *** [.build_release/src/caffe/solvers/nesterov_solver.o] Error 1

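This is a cuDNN version mismatch; see the reference linked here for the fix. All I did was replace the relevant files here with the ones from my working Caffe installation.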
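Before swapping files it can help to confirm which cuDNN version the compiler is actually seeing. The sketch below is only an illustration: it assumes the header defines the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros (newer cuDNN releases may keep them in a separate cudnn_version.h), and it parses an inline sample string rather than the real /usr/local/cuda/include/cudnn.h shown in the log.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text, or None."""
    found = {}
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+CUDNN_%s\s+(\d+)" % key
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None  # macro missing: not a header this sketch understands
        found[key] = int(match.group(1))
    return (found["MAJOR"], found["MINOR"], found["PATCHLEVEL"])


# Inline sample standing in for the real header contents (values are made up).
sample_header = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
"""
print(cudnn_version(sample_header))
```

To check a real installation, read the header file into a string and pass it to cudnn_version.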

make all -j8

make pycaffe -j8

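Once the Python interface was built, I used draw_net.py to draw the network structure of inference.prototxt.

Note: every time you add something new, run make clean first to remove the files from the previous build.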
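If pycaffe is not built yet, a quick text-only overview of a prototxt can still be useful. The following is a rough sketch and not a substitute for draw_net.py: it uses only the standard library, understands just the simple subset of prototxt syntax that appears in these notes (top-level layer { ... } blocks with quoted name and type values), and the sample text is a shortened stand-in for a real file.

```python
import re

# Comment, brace, double-quoted string, single-quoted string, colon, bare word.
TOKEN = re.compile(r"""#[^\n]*|[{}]|"[^"]*"|'[^']*'|:|[^\s{}:"']+""")


def list_layers(prototxt_text):
    """Return [(name, type)] for every top-level `layer { ... }` block."""
    tokens = [t for t in TOKEN.findall(prototxt_text) if not t.startswith("#")]
    layers, depth, current = [], 0, None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "{":
            depth += 1
            if depth == 1 and i > 0 and tokens[i - 1] == "layer":
                current = {}  # entering a new layer block
        elif tok == "}":
            if depth == 1 and current is not None:
                layers.append((current.get("name"), current.get("type")))
                current = None
            depth -= 1
        elif (depth == 1 and current is not None and tok in ("name", "type")
              and i + 2 < len(tokens) and tokens[i + 1] == ":"):
            current[tok] = tokens[i + 2].strip("\"'")
            i += 2  # skip the colon and the value
        i += 1
    return layers


sample_prototxt = """
name: "zebraNet"
layer {
  name: "data"
  type: "DenseImageData"
  #type: "SegmentationData"
  top: "data"
  include { phase: TRAIN }
}
layer {  bottom: "data"  top: "conv1_1"  name: "conv1_1"  type: "Convolution"
  convolution_param {    num_output: 64    pad: 1    kernel_size: 3  }}
layer { bottom: 'conv1_1' top: 'conv1_1' name: 'bn1_1' type: "BN" }
"""
for layer_name, layer_type in list_layers(sample_prototxt):
    print(layer_name, layer_type)
```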

 

Partial screenshot: [image 1 of the original post]

 

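2. Build semanticfusion

I only use the RGB network here, without depth, so in /src/main.cpp switch to the second (RGB) call below: comment out the RGB-D call and uncomment the RGB one. Mind the paths: the call must point at wherever your models actually live (caffe_semanticfusion/models in my tree).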

// This is for the RGB-D network
  caffe.Init("../caffe_semanticfusion/models/nyu_rgbd/inference.prototxt","../caffe_semanticfusion/models/nyu_rgbd/inference.caffemodel");
  // This is for the RGB network
  //caffe.Init("../caffe/models/nyu_rgb/inference.prototxt","../caffe/models/nyu_rgb/inference.caffemodel");

cd semanticfusion
mkdir build
cd build
cmake ..
make -j16

Sometimes you hit a batch of cv-related link errors. Running make clean and then rebuilding as root (sudo su) resolves them.

../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarCreate'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_CreateVer'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarWrite'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarFree'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarReadInfo'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_Close'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarReadDataLinear'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_Open'
collect2: error: ld returned 1 exit status
caffe_semanticfusion/tools/CMakeFiles/upgrad

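3. Build the training set

Workflow:

- take the photos

- annotate the images with labelme

- resize the images

- compute the mean: R_mean is 90.402622, G_mean is 76.220016, B_mean is 72.216413, versus the commonly used [104, 117, 123] (those common values are conventionally listed in B, G, R order)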
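The mean computation in the last step can be sketched as follows. This is a minimal, standard-library-only illustration rather than the script I actually used: images are passed in as plain lists of (R, G, B) tuples, loading real files (for example with OpenCV or Pillow) is left out, and the helper names are hypothetical. The reordering helper reflects the assumption that the mean_value entries in transform_param are applied in B, G, R order, because Caffe loads images through OpenCV.

```python
def channel_means(images):
    """Average R, G and B over every pixel of every image.

    `images` is an iterable of images; each image is an iterable of
    (R, G, B) tuples. Loading real files is left to the caller.
    """
    totals = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for r, g, b in image:
            totals[0] += r
            totals[1] += g
            totals[2] += b
            count += 1
    if count == 0:
        raise ValueError("no pixels given")
    return tuple(total / count for total in totals)


def to_bgr_order(rgb_means):
    """Reorder (R, G, B) means to (B, G, R) for the mean_value lines."""
    r, g, b = rgb_means
    return (b, g, r)


# Two tiny made-up "images": one with two pixels, one with a single pixel.
tiny_dataset = [[(10, 20, 30), (30, 40, 50)], [(20, 30, 40)]]
rgb_means = channel_means(tiny_dataset)
print(rgb_means)                # (20.0, 30.0, 40.0)
print(to_bgr_order(rgb_means))  # (40.0, 30.0, 20.0)
```

Accumulating running sums keeps memory use flat, which matters once the dataset no longer fits in RAM.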

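Hmm. After some searching, it turns out Caffe does not support multi-label classification out of the box and would need to be modified, so I am not using the LMDB format for now (there is a discussion of this question on Zhihu). I use the NYUv2 training data instead, which saves the photographing, annotating, and resizing steps.

Note that the official NYUv2 dataset has 459 classes; I use the 13-class version of NYUv2.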

 

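4. Network input

Because the labels are semantic images, I read the data and labels with the ImageSegDataLayer layer from open-source code. I took it from the DeepLab v2 code; there are others, such as DenseImageData (SegNet). The layer is brought in by replacing files (replacing with ImageSegDataLayer succeeded for me; DenseImageData did not):

image_seg_data_layer.hpp/cpp,

base_data_layer.hpp/cpp/cu,

data_transformer.cpp/hpp,

io.cpp/hpp (the function parameters differ between versions, which produces the error below)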

error: no matching function for call to ‘ReadImageToCVMat(std::__cxx11::basic_string&, const int&, const int&, bool, bool)’
                                     new_height, new_width, false, true);

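Editing caffe.proto and similar changes extend Caffe itself. (Problems with data_layers are solved by checking the paths or by replacing the files.)

Meld makes the file comparison very convenient. I replaced files and rebuilt caffe_semanticfusion as I went, running make clean after each error and trying again, and gradually got to know Caffe's directory layout and how to use it.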
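Segmentation data layers of this kind read a plain-text list file (the source parameter) that pairs each image with its label image. A small sketch for generating such a list is shown below. It assumes one "image_path label_path" pair per line, with paths relative to root_folder; the exact convention (leading slash, absolute paths) depends on the layer implementation you copied, so check its .cpp file. The function name, directory names and file extensions are hypothetical.

```python
import os
import tempfile


def build_list(image_dir, label_dir, root_folder, image_ext=".jpg", label_ext=".png"):
    """Return 'image label' lines for every image that has a matching label."""
    lines = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != image_ext:
            continue
        label_path = os.path.join(label_dir, stem + label_ext)
        if not os.path.isfile(label_path):
            continue  # skip images that have no label file
        image_rel = os.path.relpath(os.path.join(image_dir, name), root_folder)
        label_rel = os.path.relpath(label_path, root_folder)
        lines.append(("%s %s" % (image_rel, label_rel)).replace(os.sep, "/"))
    return lines


# Demo on a throwaway directory tree (file names are made up).
with tempfile.TemporaryDirectory() as root:
    for sub in ("images", "labels"):
        os.makedirs(os.path.join(root, sub))
    for rel in ("images/0001.jpg", "images/0002.jpg", "labels/0001.png"):
        open(os.path.join(root, rel), "w").close()
    demo_lines = build_list(os.path.join(root, "images"),
                            os.path.join(root, "labels"), root)
    print("\n".join(demo_lines))
```

Writing the returned lines to train.txt gives a file that can be referenced from the data layer's source field.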

 

5. Modify the original network configuration and train

Add a data layer and a loss layer...

name: "zebraNet"

layer {
  name: "data"
  type: "DenseImageData"
  #type: "SegmentationData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_value: 104.008
    mean_value: 116.669
    mean_value: 122.675
  }
  dense_image_data_param {
    root_folder: "/home/zebrajiahao/DL_dataset/nyu"
    source: "/home/zebrajiahao/DL_dataset/nyu/train.txt"
    batch_size: 2
    shuffle: true
    crop_width:224
    crop_height:224
  }
}


# 224 x 224
# conv1_1
layer {  bottom: "data"  top: "conv1_1"  name: "conv1_1"  type: "Convolution"
  param { lr_mult: 1.0 decay_mult: 1.0}
  param { lr_mult: 2.0 decay_mult: 0.0}
  convolution_param {    num_output: 64    pad: 1    kernel_size: 3  }}
layer { bottom: 'conv1_1' top: 'conv1_1' name: 'bn1_1' type: "BN"
  bn_param { bn_mode: INFERENCE scale_filler { type: 'constant' value: 1 }
             shift_filler { type: 'constant' value: 0.001 } } }

################################## original network #######################

# seg-score
layer { name: 'class_score_nyu' type: "Convolution" bottom: 'deconv1_2' top: 'class_score'
  param { lr_mult: 1.0 decay_mult: 1.0}
  param { lr_mult: 2.0 decay_mult: 0.0}
  convolution_param { num_output: 14 kernel_size: 1
    weight_filler {      type: "gaussian"      std: 0.01    }
    bias_filler {      type: "constant"      value: 0    }} }


layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "class_score"
  bottom: "label"
  top: "loss"
}

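Start training:

I continue training from the existing weights; later I will try transfer learning with some layers' parameters frozen. The terminal output is saved to a file.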

./build/tools/caffe train --solver=/home/zebrajiahao/semanticfusion/caffe_semanticfusion/models/solver.prototxt  -weights=./inference.caffemodel 2>&1 | tee /home/zebrajiahao/zebranet.log
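Once the output is captured with tee, the loss curve can be pulled out of the log with a few lines of Python. This is a sketch under the assumption that the solver prints lines containing "Iteration N, loss = X" (some Caffe versions insert timing information between the iteration number and the loss, which the pattern tolerates). The sample log is made up for illustration and is not output from my run.

```python
import re

LOSS_LINE = re.compile(r"Iteration (\d+)\b.*?, loss = ([-+0-9.eE]+)")


def parse_losses(log_text):
    """Return [(iteration, loss)] for every solver loss line in a Caffe log."""
    points = []
    for line in log_text.splitlines():
        match = LOSS_LINE.search(line)
        if match:
            points.append((int(match.group(1)), float(match.group(2))))
    return points


# Made-up sample in the usual glog layout; only the first and third lines match.
sample_log = """\
I0101 10:00:00.000000  1234 solver.cpp:228] Iteration 0, loss = 2.63906
I0101 10:00:00.000000  1234 solver.cpp:244]     Train net output #0: loss = 2.63906 (* 1 = 2.63906 loss)
I0101 10:00:09.000000  1234 solver.cpp:228] Iteration 20, loss = 1.8125
I0101 10:00:09.000000  1234 sgd_solver.cpp:106] Iteration 20, lr = 0.001
"""
print(parse_losses(sample_log))
```

For the real log, read /home/zebrajiahao/zebranet.log into a string and pass it to parse_losses; the resulting pairs can be plotted with any tool.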

 

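If you use a different network model (such as pspnet), though, running it sometimes fails with has no field named "XXXX", and you then have to add some layers by hand.

For example, Message type "caffe.BNParameter" has no field named "slope_filler":

Adding a new layer generally takes two parts, a declaration and an implementation, which go in a .hpp and a .cpp file respectively; if there is a CUDA implementation there is also a .cu file. The .hpp header goes under /caffe/include/caffe/layers/, and the .cpp and .cu files go under /caffe/src/caffe/layers.

The steps for adding a layer to Caffe are:

1. Add the corresponding LayerParameter message entry in ./src/caffe/proto/caffe.proto, making sure the ID does not collide with an existing one.

2. Add the layer's declaration under ./include/caffe/layers/ by writing the .hpp file (model it on the other layers).

3. Add the layer's .cu and .cpp files under ./src/caffe/layers/.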
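Step 1 is easy to get wrong when merging fields from two Caffe forks, so a quick duplicate-ID check can save a confusing protoc error. The sketch below is a rough text scan, not a real protobuf parser: it assumes proto2-style fields (optional, repeated, required), ignores the possibility of braces inside comments, and would also count fields of any message nested inside the one being checked. The field number 139 in the sample is invented for illustration.

```python
import re
from collections import defaultdict

FIELD = re.compile(
    r"^\s*(?:optional|repeated|required)\s+[\w.]+\s+(\w+)\s*=\s*(\d+)",
    re.MULTILINE,
)


def message_body(proto_text, message_name):
    """Return the text between the braces of `message <name> { ... }`."""
    start = re.search(r"\bmessage\s+%s\s*\{" % re.escape(message_name), proto_text)
    if start is None:
        raise ValueError("message %s not found" % message_name)
    depth, begin = 1, start.end()
    for pos in range(begin, len(proto_text)):
        if proto_text[pos] == "{":
            depth += 1
        elif proto_text[pos] == "}":
            depth -= 1
            if depth == 0:
                return proto_text[begin:pos]
    raise ValueError("unbalanced braces in message %s" % message_name)


def duplicate_ids(proto_text, message_name="LayerParameter"):
    """Map each field number used more than once to the field names using it."""
    by_id = defaultdict(list)
    for field_name, number in FIELD.findall(message_body(proto_text, message_name)):
        by_id[int(number)].append(field_name)
    return {number: names for number, names in by_id.items() if len(names) > 1}


# Shortened, made-up excerpt; the clashing ID 139 is hypothetical.
sample_proto = """
message LayerParameter {
  optional string name = 1;
  optional string type = 2;
  optional BNParameter bn_param = 139;
  optional DenseImageDataParameter dense_image_data_param = 139;
}
"""
print(duplicate_ids(sample_proto))
```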

 

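Here we simply take the original implementation from the network model you are swapping in or referring to (such as pspnet) and use it to replace or modify the current Caffe:

1. Compare the pspnet tree with caffe_semanticfusion in Meld. Under the two paths above, bn_layer.hpp/.cpp show up in blue, meaning the file exists on both sides but the contents differ (green means the file is missing on one side). The "slope"-related code is on the pspnet side, and you can replace all the related files directly (after checking that the differences will not cause errors).

2. Also check whether caffe.proto needs changing, mainly in two places: adding the layer type and adding the layer parameters.

 

Note: replacing things yourself will run into plenty of problems, but this is how you learn by imitation, and it is the foundation for writing your own layers later. I hope you enjoy the journey.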

 


 

 

-- Training trick 1 -> learning rate

Thanks to everyone for sharing. I mainly referred to the following:

1. https://blog.csdn.net/Xmo_jiao/article/details/78012504 (dataset preparation)

2. https://blog.csdn.net/sinat_23853639/article/details/82834194 (the whole semantic segmentation workflow)

3. https://blog.csdn.net/u013241583/article/details/90058732 (adding the DenseImageData data layer)

4. https://blog.csdn.net/qq_43019117/article/details/82587285 (adding the ImageSegData data layer)


 
