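Common problems when swapping other networks into SemanticFusion (SF): a summary
There are two approaches: direct replacement and partial replacement. The first is quick and convenient; the second is something you have to master if you want to learn Caffe.
-------------------------divider----------------------------
Approach 1: direct replacement
The caffe_semanticfusion module can be replaced directly by another Caffe-based semantic segmentation network. Rename the other network's directory (e.g. deeplab-public-ver2 -> caffe_semanticfusion), swap it in for SF's original module, train on top of it, and put the resulting caffemodel in the corresponding location.
If the network is RGB-only and uses no depth information, change the paths in /semanticfusion/src/main.cpp: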
//caffe.Init("../caffe_semanticfusion/models/nyu_rgbd/inference.prototxt","../caffe_semanticfusion/models/nyu_rgbd/inference.caffemodel");
// This is for the RGB network
caffe.Init("../caffe_semanticfusion/models/nyu_rgb/inference.prototxt","../caffe_semanticfusion/models/nyu_rgb/inference.caffemodel");
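For example, when replacing it directly with deeplabv2, you may run into some problems:
1. cuDNN version mismatch; the fix is described further down.
2. ./include/caffe/common.cuh(9): error: function "atomicAdd(double *, double)" has already been defined. This usually means the CUDA toolkit already provides atomicAdd for double on the target GPU architecture, so the copy defined in common.cuh has to be guarded or removed.
3. Undefined references to Mat_* symbols at link time. These symbols belong to the matio library, so it is worth checking that libmatio is installed and linked: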
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarCreate'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_CreateVer'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarWrite'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarFree'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarReadInfo'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_Close'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarReadDataLinear'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_Open'
collect2: error: ld returned 1 exit status
caffe_semanticfusion/tools/CMakeFiles/upgrad
4. Permission / library conflicts. The simple way out: run sudo su and, as root, build caffe_semanticfusion first and then the outer semanticfusion.
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFReadDirectory@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFWriteEncodedStrip@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFIsTiled@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFOpen@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFReadEncodedStrip@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFSetField@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFWriteScanline@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFGetField@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFScanlineSize@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFNumberOfStrips@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFSetWarningHandler@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFSetErrorHandler@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFReadEncodedTile@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFReadRGBATile@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFClose@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFRGBAImageOK@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFClientOpen@LIBTIFF_4.0'
/opt/ros/kinetic/lib/x86_64-linux-gnu/libopencv_imgcodecs3.so.3.3.1: undefined reference to `TIFFReadRGBAStrip@LIBTIFF_4.0'
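-------------------------divider----------------------------
Approach 2: partial replacement
If you want to add to or modify the network yourself, I recommend installing the Meld diff tool (via apt-get install), studying the various open-source Caffe networks, and replacing the relevant layers and data.
What follows is partial replacement, not direct replacement. The approach above is the most convenient, but if you want to learn Caffe you have to learn to design things yourself.
1. Build the Python interface of caffe_semanticfusion
The Caffe inside SF was originally built with cmake and configured through CMakeLists.txt. I instead configured it with the Makefile.config from my local Caffe and ran make all -j8 directly inside caffe_semanticfusion, without creating a build directory for cmake, because SF's Caffe does not build the Python interface, the tools, and so on. You can of course modify CMakeLists.txt instead.
This step usually runs into the following problem: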
PROTOC src/caffe/proto/caffe.proto
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/solvers/nesterov_solver.cpp
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from ./include/caffe/net.hpp:10,
from ./include/caffe/solver.hpp:7,
from ./include/caffe/sgd_solvers.hpp:7,
from src/caffe/solvers/nesterov_solver.cpp:3:
./include/caffe/util/cudnn.hpp: In function ‘const char* cudnnGetErrorString(cudnnStatus_t)’:
./include/caffe/util/cudnn.hpp:21:10: warning: enumeration value ‘CUDNN_STATUS_RUNTIME_PREREQUISITE_MISSING’ not handled in switch [-Wswitch]
switch (status) {
^
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::setConvolutionDesc(cudnnConvolutionStruct**, cudnnTensorDescriptor_t, cudnnFilterDescriptor_t, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:108:70: error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t, int, int, int, int, int, int, cudnnConvolutionMode_t, cudnnDataType_t)’
pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION));
^
./include/caffe/util/cudnn.hpp:15:28: note: in definition of macro ‘CUDNN_CHECK’
cudnnStatus_t status = condition; \
^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
from ./include/caffe/util/device_alternate.hpp:40,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from ./include/caffe/net.hpp:10,
from ./include/caffe/solver.hpp:7,
from ./include/caffe/sgd_solvers.hpp:7,
from src/caffe/solvers/nesterov_solver.cpp:3:
/usr/local/cuda/include/cudnn.h:500:27: note: declared here
cudnnStatus_t CUDNNWINAPI cudnnSetConvolution2dDescriptor( cudnnConvolutionDescriptor_t convDesc,
^
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from ./include/caffe/net.hpp:10,
from ./include/caffe/solver.hpp:7,
from ./include/caffe/sgd_solvers.hpp:7,
from src/caffe/solvers/nesterov_solver.cpp:3:
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::createPoolingDesc(cudnnPoolingStruct**, caffe::PoolingParameter_PoolMethod, cudnnPoolingMode_t*, int, int, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:127:41: error: too few arguments to function ‘cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)’
pad_h, pad_w, stride_h, stride_w));
^
./include/caffe/util/cudnn.hpp:15:28: note: in definition of macro ‘CUDNN_CHECK’
cudnnStatus_t status = condition; \
^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
from ./include/caffe/util/device_alternate.hpp:40,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from ./include/caffe/net.hpp:10,
from ./include/caffe/solver.hpp:7,
from ./include/caffe/sgd_solvers.hpp:7,
from src/caffe/solvers/nesterov_solver.cpp:3:
/usr/local/cuda/include/cudnn.h:952:27: note: declared here
cudnnStatus_t CUDNNWINAPI cudnnSetPooling2dDescriptor(
^
Makefile:552: recipe for target '.build_release/src/caffe/solvers/nesterov_solver.o' failed
make: *** [.build_release/src/caffe/solvers/nesterov_solver.o] Error 1
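This is a cuDNN version mismatch. My fix was simply to replace the affected files here with the ones from a Caffe installation that already builds successfully on this machine, then rebuild: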
make all -j8
make pycaffe -j8
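Before swapping files it can help to confirm which cuDNN version the installed header declares, since the "too few arguments" errors in the log suggest the header is newer than the one this Caffe snapshot was written against. The following is a minimal sketch, not part of the original workflow: the function name parse_cudnn_version is made up for illustration, and it assumes the version macros CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL sit in the header at the path shown in the log (newer cuDNN releases keep them in cudnn_version.h instead).

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) read from cuDNN header text, or None."""
    fields = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+%s\s+(\d+)" % name,
                          header_text, re.MULTILINE)
        if match is None:
            return None  # macro missing: probably the wrong header file
        fields.append(int(match.group(1)))
    return tuple(fields)


if __name__ == "__main__":
    # Path taken from the error log above; adjust if cuDNN lives elsewhere.
    path = "/usr/local/cuda/include/cudnn.h"
    try:
        with open(path) as handle:
            print(parse_cudnn_version(handle.read()))
    except OSError:
        print("header not found: " + path)
```

If the reported major version is higher than the one the bundled include/caffe/util/cudnn.hpp targets, replacing the cuDNN wrapper files as described above is the expected remedy.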
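After building the Python interface, I used draw_net.py to draw the network structure of inference.prototxt.
Note: after every addition, run make clean to remove the output of the previous make.
[Screenshot: part of the drawn network structure]
2. Build semanticfusion
I only use the RGB network here, without depth. In /src/main.cpp the stock code (shown below) has the RGB-D call active; comment it out and enable the RGB call underneath it instead. Watch the path: the commented RGB line points at ../caffe/, which has to become ../caffe_semanticfusion/.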
// This is for the RGB-D network
caffe.Init("../caffe_semanticfusion/models/nyu_rgbd/inference.prototxt","../caffe_semanticfusion/models/nyu_rgbd/inference.caffemodel");
// This is for the RGB network
//caffe.Init("../caffe/models/nyu_rgb/inference.prototxt","../caffe/models/nyu_rgb/inference.caffemodel");
cd semanticfusion
mkdir build
cd build
cmake ..
make -j16
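Sometimes a group of link errors like the following appears (the Mat_* symbols come from the matio library rather than OpenCV). Running make clean and rebuilding as root via sudo su resolved it for me.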
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarCreate'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_CreateVer'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarWrite'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarFree'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarReadInfo'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_Close'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_VarReadDataLinear'
../lib/libcaffe.so.1.0.0-rc3: undefined reference to `Mat_Open'
collect2: error: ld returned 1 exit status
caffe_semanticfusion/tools/CMakeFiles/upgrad
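3. Build the training set
Workflow:
- take the photos
- annotate the images with labelme
- resize the images
- compute the mean: R_mean is 90.402622, G_mean is 76.220016, B_mean is 72.216413, versus the commonly used [104, 117, 123]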
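The mean computation in the last step can be sketched as follows. This is an illustrative version using NumPy only, not the script actually used for the numbers above; the function name channel_means is hypothetical, and the two tiny synthetic arrays stand in for real images loaded as H x W x 3 RGB arrays.

```python
import numpy as np


def channel_means(images):
    """Mean of R, G and B over every pixel of an iterable of HxWx3 RGB arrays."""
    total = np.zeros(3, dtype=np.float64)
    pixels = 0
    for img in images:
        arr = np.asarray(img, dtype=np.float64)
        total += arr.reshape(-1, 3).sum(axis=0)  # per-channel sum of this image
        pixels += arr.shape[0] * arr.shape[1]
    return total / pixels


# Two tiny synthetic "images" stand in for the real dataset.
demo = [np.full((2, 2, 3), 10, dtype=np.uint8),
        np.full((2, 2, 3), 30, dtype=np.uint8)]
r_mean, g_mean, b_mean = channel_means(demo)
print("R_mean is %f, G_mean is %f, B_mean is %f" % (r_mean, g_mean, b_mean))
```

Accumulating sums per image avoids holding the whole dataset in memory. When copying the result into transform_param, keep in mind that Caffe prototxt files conventionally list mean_value in B, G, R order for images read through OpenCV, so the order may need to be reversed.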
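After looking into it, I found that Caffe does not support multi-label classification out of the box and would need changes, so I am not using the lmdb format for now; there is a discussion of this problem on Zhihu. I use the NYUv2 training data instead, which skips the shooting, annotation and resizing steps.
Note: the NYUv2 dataset from the official site has 459 classes; I use the 13-class version of NYUv2.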
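4. Network input
Because the labels are semantic images, I read the data and labels with the ImageSegDataLayer from open-source code. I took it from the deeplabv2 code; there are other options such as DenseImageData (SegNet). The layer is brought in by replacing the files listed below (this worked for ImageSegDataLayer; DenseImageData did not succeed):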
image_seg_data_layer.hpp/cpp,
base_data_layer.hpp/cpp/cu,
data_transformer.cpp/hpp,
io.cpp/hpp (the function parameters do not match, which gives the error below)
error: no matching function for call to ‘ReadImageToCVMat(std::__cxx11::basic_string&, const int&, const int&, bool, bool)’
new_height, new_width, false, true);
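Extend Caffe itself by editing caffe.proto and so on. (Problems with data_layers were solved by checking paths or by replacing files.)
I compared files with Meld, which is very convenient. I replaced files and rebuilt caffe_semanticfusion as I went, ran make clean after each error and tried again, and gradually got to know Caffe's directory layout and usage.
5. Modify the original network configuration and train
Add a data layer and a loss layer...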
name: "zebraNet"
layer {
name: "data"
type: "DenseImageData"
#type: "SegmentationData"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_value: 104.008
mean_value: 116.669
mean_value: 122.675
}
dense_image_data_param {
root_folder: "/home/zebrajiahao/DL_dataset/nyu"
source: "/home/zebrajiahao/DL_dataset/nyu/train.txt"
batch_size: 2
shuffle: true
crop_width:224
crop_height:224
}
}
# 224 x 224
# conv1_1
layer { bottom: "data" top: "conv1_1" name: "conv1_1" type: "Convolution"
param { lr_mult: 1.0 decay_mult: 1.0}
param { lr_mult: 2.0 decay_mult: 0.0}
convolution_param { num_output: 64 pad: 1 kernel_size: 3 }}
layer { bottom: 'conv1_1' top: 'conv1_1' name: 'bn1_1' type: "BN"
bn_param { bn_mode: INFERENCE scale_filler { type: 'constant' value: 1 }
shift_filler { type: 'constant' value: 0.001 } } }
################################## original network #######################
# seg-score
layer { name: 'class_score_nyu' type: "Convolution" bottom: 'deconv1_2' top: 'class_score'
param { lr_mult: 1.0 decay_mult: 1.0}
param { lr_mult: 2.0 decay_mult: 0.0}
convolution_param { num_output: 14 kernel_size: 1
weight_filler { type: "gaussian" std: 0.01 }
bias_filler { type: "constant" value: 0 }} }
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "class_score"
bottom: "label"
top: "loss"
}
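The source file named in the data layer above (train.txt) is a plain list of samples. A minimal sketch for generating it is shown below, under two assumptions that should be checked against the data layer actually compiled in: each line holds an image path and a label path separated by a space, and the layer joins root_folder and each path by plain string concatenation (so the paths start with a slash when root_folder has no trailing one). The directory names /images and /labels and the function name build_list_lines are hypothetical.

```python
import os


def build_list_lines(image_names, image_dir="/images", label_dir="/labels",
                     label_ext=".png"):
    """Pair each image file name with a label file that has the same stem."""
    lines = []
    for name in sorted(image_names):
        stem, _ = os.path.splitext(name)
        lines.append("%s/%s %s/%s%s" % (image_dir, name, label_dir, stem, label_ext))
    return lines


# In practice image_names would come from os.listdir() on the image folder.
lines = build_list_lines(["0002.jpg", "0001.jpg"])
print("\n".join(lines))
```

Writing the joined lines to train.txt gives one "image label" pair per row, sorted by file name so that repeated runs produce the same list.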
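Start training~~~
I continue training from the existing weights; later I will try transfer learning with the parameters of some layers frozen. The terminal output is also saved to a file.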
./build/tools/caffe train --solver=/home/zebrajiahao/semanticfusion/caffe_semanticfusion/models/solver.prototxt -weights=./inference.caffemodel 2>&1 | tee /home/zebrajiahao/zebranet.log
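Once the output is saved with tee, the loss values can be pulled out of the log for plotting or a quick check of convergence. The sketch below assumes the solver prints lines containing "Iteration N, loss = X", which is the usual Caffe format; the sample log lines are made up for illustration and are not from the run above, and parse_losses is a hypothetical helper name.

```python
import re

# Matches e.g. "... Iteration 20, loss = 1.5"; newer Caffe versions insert
# timing info between the iteration number and the loss, hence the ".*?".
LOSS_RE = re.compile(r"Iteration (\d+).*?, loss = ([0-9.eE+-]+)")


def parse_losses(log_text):
    """Return [(iteration, loss), ...] for every solver loss line in log_text."""
    results = []
    for line in log_text.splitlines():
        match = LOSS_RE.search(line)
        if match:
            results.append((int(match.group(1)), float(match.group(2))))
    return results


sample = (
    "I0101 10:00:00.000000  1234 solver.cpp:228] Iteration 0, loss = 2.63906\n"
    "I0101 10:00:00.000000  1234 solver.cpp:244]     Train net output #0: loss = 2.63906 (* 1 = 2.63906 loss)\n"
    "I0101 10:00:05.000000  1234 solver.cpp:228] Iteration 20, loss = 1.5\n"
)
print(parse_losses(sample))
```

For the real log, read /home/zebrajiahao/zebranet.log and pass its contents to parse_losses. Lines that report per-output losses are skipped because they do not contain an iteration number.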
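If you use another network model (such as pspnet), running it sometimes fails with: has no field named "XXXX". In that case you have to add some layers by hand.
For example, Message type "caffe.BNParameter" has no field named "slope_filler":
Adding a new layer generally takes two parts, a declaration and an implementation, which go into a .hpp and a .cpp file respectively; if there is a CUDA implementation there is also a .cu file. The .hpp header goes under /caffe/include/caffe/layers/, and the .cpp and .cu files go under /caffe/src/caffe/layers.
The steps for adding a layer to Caffe are:
1. Add the corresponding LayerParameter message in ./src/caffe/proto/caffe.proto; make sure the ID is not already in use.
2. Add the layer's declaration under ./include/caffe/layers/ by writing the hpp file (model it on the other layers).
3. Add the layer's cu and cpp files under ./src/caffe/layers/.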
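The "ID must not be repeated" rule in step 1 is easy to break when merging fields from another fork's caffe.proto. A minimal sketch of a check is shown below: it scans the body of a single message (field numbers only need to be unique within one message) and reports numbers used more than once. The field names and the number 137 in the sample are invented for illustration, and duplicate_field_ids is a hypothetical helper, not part of Caffe.

```python
import re
from collections import defaultdict

# Matches proto2 field declarations such as "optional BNParameter bn_param = 137;"
FIELD_RE = re.compile(
    r"^\s*(?:optional|repeated|required)\s+[\w.]+\s+(\w+)\s*=\s*(\d+)",
    re.MULTILINE)


def duplicate_field_ids(message_body):
    """Map each reused field number to the names of the fields sharing it."""
    by_id = defaultdict(list)
    for name, number in FIELD_RE.findall(message_body):
        by_id[int(number)].append(name)
    return {num: names for num, names in by_id.items() if len(names) > 1}


# Hypothetical excerpt of a LayerParameter body with a clashing ID.
sample_body = """
  optional string name = 1;
  optional BNParameter bn_param = 137;
  optional DenseImageDataParameter dense_image_data_param = 137;
"""
print(duplicate_field_ids(sample_body))
```

An empty result means no clash inside that message; protoc would also reject a duplicate, but checking first saves a rebuild cycle.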
————————————————
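Here we simply take the original implementation from the network model you are swapping in (or using as reference), such as pspnet, and use it to replace or modify the current Caffe:
1. Compare the pspnet tree with caffe_semanticfusion in Meld. Under the two paths above, bn_layer.hpp/.cpp show up in blue, meaning the files share a name but differ in content (green means the file is missing on one side). The "slope" related code is on the pspnet side, and you can replace all the related files directly (after checking that the other differences will not cause errors).
2. Also check whether caffe.proto needs to change, mainly in two places: adding the layer type and adding the layer parameters.
Note: doing the replacement yourself will raise quite a few problems, but this is how you learn by imitation, and it is the basis for writing your own layers later. I hope you enjoy the journey.
-- Training trick 1 -> learning rate
Thanks to everyone for sharing. I mainly referred to the following: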
1. https://blog.csdn.net/Xmo_jiao/article/details/78012504 (dataset processing)
2. https://blog.csdn.net/sinat_23853639/article/details/82834194 (the whole semantic segmentation workflow)
3. https://blog.csdn.net/u013241583/article/details/90058732 (adding the DenseImageData data layer)
4. https://blog.csdn.net/qq_43019117/article/details/82587285 (adding the ImageSegData data layer)