DRBox: a record of pitfalls and fixes

Project link: https://github.com/liulei01/DRBox

I. Before we start
1. CAFFEROOT: the caffe folder obtained from GitHub (DRBox-master), for example /workspace/DRBOX-master. It contains the examples, include, models, python, scripts, src and other folders.

II. Compiling
// Modify Makefile.config according to your Caffe installation. In my case I simply copied Makefile.config and Makefile.config.example from another caffe directory on my machine into my CAFFEROOT.

cp Makefile.config.example Makefile.config # I skipped this step because I had already copied the file
make -j8
make py

It failed!

PROTOC src/caffe/proto/caffe.proto
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/data_transformer.cpp
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/data_transformer.hpp:6,
                 from src/caffe/data_transformer.cpp:8:
./include/caffe/util/cudnn.hpp: In function ‘const char* cudnnGetErrorString(cudnnStatus_t):
./include/caffe/util/cudnn.hpp:21:10: warning: enumeration value ‘CUDNN_STATUS_RUNTIME_PREREQUISITE_MISSING’ not handled in switch [-Wswitch]
   switch (status) {
          ^
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::setConvolutionDesc(cudnnConvolutionStruct**, cudnnTensorDescriptor_t, cudnnFilterDescriptor_t, int, int, int, int):
./include/caffe/util/cudnn.hpp:113:70: error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t, int, int, int, int, int, int, cudnnConvolutionMode_t, cudnnDataType_t)’
       pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION));
                                                                      ^
./include/caffe/util/cudnn.hpp:15:28: note: in definition of macro ‘CUDNN_CHECK’
     cudnnStatus_t status = condition; \
                            ^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
                 from ./include/caffe/util/device_alternate.hpp:40,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/data_transformer.hpp:6,
                 from src/caffe/data_transformer.cpp:8:
/usr/local/cuda/include/cudnn.h:500:27: note: declared here
 cudnnStatus_t CUDNNWINAPI cudnnSetConvolution2dDescriptor( cudnnConvolutionDescriptor_t convDesc,
                           ^
Makefile:575: recipe for target '.build_release/src/caffe/data_transformer.o' failed
make: *** [.build_release/src/caffe/data_transformer.o] Error 1

The cause is that the cuDNN wrapper code in this version of caffe does not match the cuDNN version installed on the system.
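To confirm the mismatch before changing anything, it can help to read the version macros out of the installed cuDNN header. The sketch below is only an illustration: the helper names are made up here, and it assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers (newer cuDNN releases keep them in cudnn_version.h rather than cudnn.h). The version-6 threshold is taken from the guard used in the replacement cudnn.hpp further down.

```python
import re

def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cuDNN header text, or None if a macro is absent."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#\s*define\s+" + name + r"\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)

def needs_new_signature(version):
    """True when the version is 6.0.0 or later, where the replacement
    cudnn.hpp passes the extra cudnnDataType_t argument."""
    return version is not None and version[0] >= 6
```

To try it on a real machine, read the header named in the log (/usr/local/cuda/include/cudnn.h) and pass its text to parse_cudnn_version.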
Fix found through Baidu:
Reference: https://blog.csdn.net/u011070171/article/details/52292680

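1. Replace ./include/caffe/util/cudnn.hpp under CAFFEROOT with the cuDNN wrapper from the latest caffe, i.e. the corresponding cudnn.hpp.
2. Replace every file in ./include/caffe/layers whose name starts with cudnn, e.g. cudnn_conv_layer.hpp, with the file of the same name from the latest caffe.
3. Replace every file in ./src/caffe/layers whose name starts with cudnn, e.g. cudnn_lrn_layer.cu, cudnn_pooling_layer.cpp, cudnn_sigmoid_layer.cu, with the file of the same name from the latest caffe.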
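The three replacement steps can be sketched as a small script. This is an illustration under assumptions: copy_cudnn_sources is a made-up helper, new_caffe_root stands for wherever a newer caffe checkout lives, and the source directory is taken to be src/caffe/layers as in upstream caffe. Back up CAFFEROOT before trying anything like this.

```python
import glob
import os
import shutil

def copy_cudnn_sources(new_caffe_root, caffe_root):
    """Copy cuDNN-related files from a newer caffe tree over those in caffe_root.

    Returns the list of destination paths that were written.
    """
    patterns = [
        os.path.join("include", "caffe", "util", "cudnn.hpp"),  # step 1
        os.path.join("include", "caffe", "layers", "cudnn_*"),  # step 2
        os.path.join("src", "caffe", "layers", "cudnn_*"),      # step 3
    ]
    written = []
    for pattern in patterns:
        for src in sorted(glob.glob(os.path.join(new_caffe_root, pattern))):
            rel = os.path.relpath(src, new_caffe_root)
            dst = os.path.join(caffe_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # overwrites the old file of the same name
            written.append(dst)
    return written
```

Passing only the first pattern would correspond to doing step 1 alone.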

PS: Step 1 alone was enough for my build to succeed. I took cudnn.hpp from a newer caffe; the file I used as the replacement is reproduced below.
cudnn.hpp taken from: https://github.com/zl535320706/20180925_caffe/blob/master/include/caffe/util/cudnn.hpp

#ifndef CAFFE_UTIL_CUDNN_H_
#define CAFFE_UTIL_CUDNN_H_
#ifdef USE_CUDNN

#include <cudnn.h>

#include "caffe/common.hpp"
#include "caffe/proto/caffe.pb.h"

#define CUDNN_VERSION_MIN(major, minor, patch) \
    (CUDNN_VERSION >= (major * 1000 + minor * 100 + patch))

#define CUDNN_CHECK(condition) \
  do { \
    cudnnStatus_t status = condition; \
    CHECK_EQ(status, CUDNN_STATUS_SUCCESS) << " "\
      << cudnnGetErrorString(status); \
  } while (0)

inline const char* cudnnGetErrorString(cudnnStatus_t status) {
  switch (status) {
    case CUDNN_STATUS_SUCCESS:
      return "CUDNN_STATUS_SUCCESS";
    case CUDNN_STATUS_NOT_INITIALIZED:
      return "CUDNN_STATUS_NOT_INITIALIZED";
    case CUDNN_STATUS_ALLOC_FAILED:
      return "CUDNN_STATUS_ALLOC_FAILED";
    case CUDNN_STATUS_BAD_PARAM:
      return "CUDNN_STATUS_BAD_PARAM";
    case CUDNN_STATUS_INTERNAL_ERROR:
      return "CUDNN_STATUS_INTERNAL_ERROR";
    case CUDNN_STATUS_INVALID_VALUE:
      return "CUDNN_STATUS_INVALID_VALUE";
    case CUDNN_STATUS_ARCH_MISMATCH:
      return "CUDNN_STATUS_ARCH_MISMATCH";
    case CUDNN_STATUS_MAPPING_ERROR:
      return "CUDNN_STATUS_MAPPING_ERROR";
    case CUDNN_STATUS_EXECUTION_FAILED:
      return "CUDNN_STATUS_EXECUTION_FAILED";
    case CUDNN_STATUS_NOT_SUPPORTED:
      return "CUDNN_STATUS_NOT_SUPPORTED";
    case CUDNN_STATUS_LICENSE_ERROR:
      return "CUDNN_STATUS_LICENSE_ERROR";
#if CUDNN_VERSION_MIN(6, 0, 0)
    case CUDNN_STATUS_RUNTIME_PREREQUISITE_MISSING:
      return "CUDNN_STATUS_RUNTIME_PREREQUISITE_MISSING";
#endif
#if CUDNN_VERSION_MIN(7, 0, 0)
    case CUDNN_STATUS_RUNTIME_IN_PROGRESS:
      return "CUDNN_STATUS_RUNTIME_IN_PROGRESS";
    case CUDNN_STATUS_RUNTIME_FP_OVERFLOW:
      return "CUDNN_STATUS_RUNTIME_FP_OVERFLOW";
#endif
  }
  return "Unknown cudnn status";
}

namespace caffe {

namespace cudnn {

template <typename Dtype> class dataType;
template<> class dataType<float>  {
 public:
  static const cudnnDataType_t type = CUDNN_DATA_FLOAT;
  static float oneval, zeroval;
  static const void *one, *zero;
};
template<> class dataType<double> {
 public:
  static const cudnnDataType_t type = CUDNN_DATA_DOUBLE;
  static double oneval, zeroval;
  static const void *one, *zero;
};

template <typename Dtype>
inline void createTensor4dDesc(cudnnTensorDescriptor_t* desc) {
  CUDNN_CHECK(cudnnCreateTensorDescriptor(desc));
}

template <typename Dtype>
inline void setTensor4dDesc(cudnnTensorDescriptor_t* desc,
    int n, int c, int h, int w,
    int stride_n, int stride_c, int stride_h, int stride_w) {
  CUDNN_CHECK(cudnnSetTensor4dDescriptorEx(*desc, dataType<Dtype>::type,
        n, c, h, w, stride_n, stride_c, stride_h, stride_w));
}

template <typename Dtype>
inline void setTensor4dDesc(cudnnTensorDescriptor_t* desc,
    int n, int c, int h, int w) {
  const int stride_w = 1;
  const int stride_h = w * stride_w;
  const int stride_c = h * stride_h;
  const int stride_n = c * stride_c;
  setTensor4dDesc<Dtype>(desc, n, c, h, w,
                         stride_n, stride_c, stride_h, stride_w);
}

template <typename Dtype>
inline void createFilterDesc(cudnnFilterDescriptor_t* desc,
    int n, int c, int h, int w) {
  CUDNN_CHECK(cudnnCreateFilterDescriptor(desc));
#if CUDNN_VERSION_MIN(5, 0, 0)
  CUDNN_CHECK(cudnnSetFilter4dDescriptor(*desc, dataType<Dtype>::type,
      CUDNN_TENSOR_NCHW, n, c, h, w));
#else
  CUDNN_CHECK(cudnnSetFilter4dDescriptor_v4(*desc, dataType<Dtype>::type,
      CUDNN_TENSOR_NCHW, n, c, h, w));
#endif
}

template <typename Dtype>
inline void createConvolutionDesc(cudnnConvolutionDescriptor_t* conv) {
  CUDNN_CHECK(cudnnCreateConvolutionDescriptor(conv));
}

template <typename Dtype>
inline void setConvolutionDesc(cudnnConvolutionDescriptor_t* conv,
    cudnnTensorDescriptor_t bottom, cudnnFilterDescriptor_t filter,
    int pad_h, int pad_w, int stride_h, int stride_w) {
#if CUDNN_VERSION_MIN(6, 0, 0)
  CUDNN_CHECK(cudnnSetConvolution2dDescriptor(*conv,
      pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION,
      dataType<Dtype>::type));
#else
    CUDNN_CHECK(cudnnSetConvolution2dDescriptor(*conv,
      pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION));
#endif
}

template <typename Dtype>
inline void createPoolingDesc(cudnnPoolingDescriptor_t* pool_desc,
    PoolingParameter_PoolMethod poolmethod, cudnnPoolingMode_t* mode,
    int h, int w, int pad_h, int pad_w, int stride_h, int stride_w) {
  switch (poolmethod) {
  case PoolingParameter_PoolMethod_MAX:
    *mode = CUDNN_POOLING_MAX;
    break;
  case PoolingParameter_PoolMethod_AVE:
    *mode = CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING;
    break;
  default:
    LOG(FATAL) << "Unknown pooling method.";
  }
  CUDNN_CHECK(cudnnCreatePoolingDescriptor(pool_desc));
#if CUDNN_VERSION_MIN(5, 0, 0)
  CUDNN_CHECK(cudnnSetPooling2dDescriptor(*pool_desc, *mode,
        CUDNN_PROPAGATE_NAN, h, w, pad_h, pad_w, stride_h, stride_w));
#else
  CUDNN_CHECK(cudnnSetPooling2dDescriptor_v4(*pool_desc, *mode,
        CUDNN_PROPAGATE_NAN, h, w, pad_h, pad_w, stride_h, stride_w));
#endif
}

template <typename Dtype>
inline void createActivationDescriptor(cudnnActivationDescriptor_t* activ_desc,
    cudnnActivationMode_t mode) {
  CUDNN_CHECK(cudnnCreateActivationDescriptor(activ_desc));
  CUDNN_CHECK(cudnnSetActivationDescriptor(*activ_desc, mode,
                                           CUDNN_PROPAGATE_NAN, Dtype(0)));
}

}  // namespace cudnn

}  // namespace caffe

#endif  // USE_CUDNN
#endif  // CAFFE_UTIL_CUDNN_H_

III. Error when creating the dataset

cd $CAFFEROOT
./data/Ship-Opt/create_data.sh

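It failed:
error while loading shared libraries: libhdf5_hl.so.100: cannot open shared object file: No such file or directory
Cause: the dynamic loader cannot find the shared library.
Fix:
Method 1:
1. Open a terminal in your home directory and run sudo gedit .bashrc
2. Add the Anaconda lib path to the LD_LIBRARY_PATH environment variable by appending the following to the end of .bashrc and saving: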

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{anaconda_dir}/lib
For example, mine: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/hp/anaconda3/lib

A side note

$ which python   # shows which Anaconda installation the current python comes from
/home/hp/anaconda3/bin/python

3. Open a terminal in CAFFEROOT (this matters!) and run source ~/.bashrc

Method 2:
Each time before running anything under CAFFEROOT, enter the following directly in the terminal; this avoids editing .bashrc

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/hp/anaconda3/lib
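Either method only works if the directory you add really contains the missing file. A minimal sketch for checking that, assuming a hypothetical helper named find_shared_library and a Linux-style search path:

```python
import os

def find_shared_library(lib_name, search_path):
    """Return every directory in a search path (entries joined by os.pathsep,
    ':' on Linux) that contains a file named lib_name."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits
```

A call such as find_shared_library("libhdf5_hl.so.100", os.environ.get("LD_LIBRARY_PATH", "")) lists the matching directories. An empty result does not prove the library is missing, because the loader also searches its default directories and cache; it only shows that LD_LIBRARY_PATH is not what provides it.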

IV. Error during training
It failed:
AttributeError: ‘module’ object has no attribute ‘PriorRBoxParameter’
Fix: similar to section III
1. Open .bashrc and point the caffe Python path at DRBox-master, i.e.

export PYTHONPATH=/your/path/DRBox-master/python:$PYTHONPATH

2. After saving, open a terminal in CAFFEROOT (this matters!) and run source ~/.bashrc.
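This error usually means Python imported a different caffe than the one built in DRBox-master. A minimal sketch for checking which file a module name resolves to; resolve_module_path is a hypothetical helper, not part of caffe:

```python
import importlib.util

def resolve_module_path(module_name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin
```

After source ~/.bashrc, resolve_module_path("caffe") should point somewhere under DRBox-master/python. If it points at another caffe installation, the PYTHONPATH change has not taken effect in that shell.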

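V. Error during training
It failed:
TypeError: ‘>’ not supported between instances of ‘builtin_function_or_method’ and ‘int’
Fix:
Open CAFFEROOT/python/caffe/model_libs.py
The statement assert len > 0 on line 16 compares the builtin function len itself with an integer, which Python 3 rejects. Comment it out so that it reads:

# assert len > 0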
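The reason this only fails under Python 3 is that Python 2 allowed ordering comparisons between unrelated types, so comparing the function len with 0 passed silently. A small sketch that reproduces the behaviour (comparison_fails is a name invented for this illustration):

```python
def comparison_fails(value):
    """Return True if `value > 0` raises TypeError, as it does for a
    builtin function object in Python 3."""
    try:
        value > 0
    except TypeError:
        return True
    return False

print(comparison_fails(len))       # the builtin function itself
print(comparison_fails(len([1])))  # the integer returned by calling it
```

Under Python 3 the first call reports a failure and the second does not, which is why the bare name len in the assertion is the problem rather than the comparison as such.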

VI. Error during training
NameError: name ‘xrange’ is not defined
Cause:
Python 3 has no xrange(); its range() behaves like Python 2's xrange().
Fix:
Replace every xrange( ) call with range( ).
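Doing the replacement by hand across several files is error-prone, so here is a rough sketch of a textual rewrite. The helper name replace_xrange is invented for this note, and the approach is purely textual: it would also rewrite xrange( inside strings or comments, so review the result.

```python
import re

# Match the identifier xrange followed by an opening parenthesis,
# but not names that merely end in "xrange" such as my_xrange.
_XRANGE_CALL = re.compile(r"\bxrange\s*\(")

def replace_xrange(source):
    """Return Python source text with xrange( calls rewritten as range(."""
    return _XRANGE_CALL.sub("range(", source)
```

To apply it, read each affected .py file, pass its text through replace_xrange, and write the result back.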

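VII. Error during training
It failed:
Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal
Cause:
The GPU ids requested by the script do not match the GPUs actually present on the machine. I also suspected an oversized batch, so I reduced the batch sizes at the same time.
Fix:
Edit the training script examples/rbox/rbox_pascal_ship_opt.py, at around line 166:

gpus = "0, 1"             # I only use 1, so I changed this to "1"
batch_size = 16           # changed to 4
accum_batch_size = 16	  # changed to 4
test_batch_size = 64	  # changed to 16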

PS
Do not set the following in the .py file; it conflicts with the gpus setting. (Don't ask how I know.)

os.environ["CUDA_VISIBLE_DEVICES"] = "0" 
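A quick way to catch an invalid device ordinal before launching training is to compare the gpus string against the number of visible devices. The sketch below assumes you supply device_count yourself (for example by counting the lines printed by nvidia-smi -L); invalid_gpu_ids is a hypothetical helper. One plausible explanation for the conflict mentioned above is that CUDA_VISIBLE_DEVICES renumbers the visible devices starting from 0, so with a single visible device any id other than 0 becomes invalid; that explanation is an assumption, not something verified here.

```python
def invalid_gpu_ids(gpus, device_count):
    """Return the ids in a comma-separated gpus string that are not
    valid device ordinals for a machine with device_count visible GPUs."""
    ids = [int(part) for part in gpus.split(",") if part.strip()]
    return [gpu_id for gpu_id in ids if gpu_id < 0 or gpu_id >= device_count]
```

For instance, on a machine with one visible GPU, the default string "0, 1" reports id 1 as invalid.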

Huge pitfall ahead!

/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFReadDirectory@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFWriteEncodedStrip@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFIsTiled@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFOpen@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFReadEncodedStrip@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFSetField@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFWriteScanline@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFGetField@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFScanlineSize@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFNumberOfStrips@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFSetWarningHandler@LIBTIFF_4.0'
/usr/local/lib/libopencv_imgcodecs.so: undefined reference to `TIFFSetErrorHandler@LIBTIFF_4.0'
// Far too many to list; in short, a pile of errors like these was reported at the ./build_release/lib..... stage
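When the linker prints dozens of lines like these, grouping them makes the pattern easier to see. The sketch below is a hypothetical helper (summarize_undefined_refs is not part of any tool) that groups the undefined symbols by the library that needs them and the symbol version tag. Here everything collapses into a single group, libopencv_imgcodecs.so wanting LIBTIFF_4.0, which suggests the libtiff found at link time is not the one OpenCV was built against; that reading is an assumption, not something established by this log.

```python
import re
from collections import defaultdict

_UNDEFINED_REF = re.compile(
    r"^(?P<lib>\S+): undefined reference to `(?P<symbol>[^@']+)@(?P<version>[^']+)'"
)

def summarize_undefined_refs(linker_output):
    """Group undefined versioned symbols by (library, symbol version)."""
    groups = defaultdict(list)
    for line in linker_output.splitlines():
        match = _UNDEFINED_REF.match(line.strip())
        if match:
            key = (match.group("lib"), match.group("version"))
            groups[key].append(match.group("symbol"))
    return dict(groups)
```

Feeding it the saved output of make gives one dictionary entry per library and version tag, with the list of affected symbols as the value.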

How did this come about? At exactly this step:
Before training for ship, you should replace src/caffe/util/rbox_util.cpp with src/caffe/util/rbox_util.cpp.ship and rebuild the code. The reason is that we ignore the head and tail of a ship to make the problem easier.

cd $CAFFEROOT
mv src/caffe/util/rbox_util.cpp src/caffe/util/rbox_util.cpp.old
mv src/caffe/util/rbox_util.cpp.ship src/caffe/util/rbox_util.cpp
make -j8
python examples/rbox/rbox_pascal_ship_opt.py

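1. Rename src/caffe/util/rbox_util.cpp to src/caffe/util/rbox_util.cpp.old
2. Rename src/caffe/util/rbox_util.cpp.ship to src/caffe/util/rbox_util.cpp
3. Simply running make -j8 to rebuild the code had no effect, so I ran make clean.
That was the original sin! I spent more than a day stuck in this pit, always with the LIBTIFF_4.0 errors above.
Fix:
1. I tried everything I could think of:
1) sudo make -j8 // not a permissions problem
2) rebuilding the virtual environment // still failed
3) removing export PATH from .bashrc // no luck
4) conda uninstall libtiff // it cannot really be removed; removing it causes bigger problems
5) in the Makefile under CAFFEROOT, searching for std++ and appending something after it // still no good
... too many more to list
2. After a long time tinkering with a senior PhD labmate, countless failures and countless complaints, this is what finally worked:
Delete every copy of the DRBox-master project on your disk, copy a fresh, unmodified DRBox-master to a new drive, and start over from scratch!
My guess is that after the first successful make, I renamed src/caffe/util/rbox_util.cpp and related files, then ran make clean and rebuilt with make -j8, and some leftovers in the folders or files were never cleaned out.
Be bold, wipe it and start again!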

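Recap of the climb out

1. Download the DRBox-master project, and copy the Makefile, Makefile.config and Makefile.config.example from another working caffe project on this machine into CAFFEROOT, overwriting the originals;
2. Prepare the data and the model:
~ Copy the ship images into CAFFEROOT/data/Ship-Opt/train_data, and put the corresponding trainval into CAFFEROOT/data/Ship-Opt (every name in trainval must exist in train_data);
~ Create models/VGGNet/ under CAFFEROOT and put the VGG model there, including the prototxt and one caffemodel;
3. As in section II: replace ./include/caffe/util/cudnn.hpp under CAFFEROOT with the cuDNN wrapper from the latest caffe, i.e. the corresponding cudnn.hpp.
4. Because I am doing ship detection, rename the files by hand; do not use mv
Rename src/caffe/util/rbox_util.cpp to src/caffe/util/rbox_util.cpp.old
Rename src/caffe/util/rbox_util.cpp.ship to src/caffe/util/rbox_util.cpp
5. Be sure to finish step 4 before compiling: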

make -j8
make py

6. As in section IV:
Add the following to .bashrc and run source ~/.bashrc

export PYTHONPATH=/your/path/DRBox-master/python:$PYTHONPATH

7. As in section III:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/hp/anaconda3/lib

8. Create the LMDB dataset:
In /data/Ship-Opt/create_data.sh, set root_dir=CAFFEROOT and data_root_dir=CAFFEROOT/data/Ship-Opt/train_data

cd $CAFFEROOT
./data/Ship-Opt/create_data.sh

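On success, the folder Ship-Opt/Ship-Opt_trainval_lmdb is generated under examples
If creating the link fails, copy the folder Ship-Opt_trainval_lmdb (containing data.mdb and lock.mdb) from data/Ship-Opt/train_data/Ship-Opt/lmdb directly into examples/Ship-Opt
9. Training: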

cd $CAFFEROOT
python examples/rbox/rbox_pascal_ship_opt.py

10. To be continued...
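The requirement in step 2 (every name in trainval must exist in train_data) is easy to get wrong and cheap to check before building the dataset. The sketch below is a guess at a workable check, not DRBox's own tooling: missing_trainval_entries is a hypothetical helper, and it assumes each line of the trainval file holds one or more whitespace-separated names relative to train_data.

```python
import os

def missing_trainval_entries(trainval_path, data_dir):
    """Return the names listed in the trainval file that do not exist under data_dir."""
    missing = []
    with open(trainval_path) as handle:
        for line in handle:
            for name in line.split():
                if not os.path.exists(os.path.join(data_dir, name)):
                    missing.append(name)
    return missing
```

An empty list means every listed name was found; anything else is worth fixing before running create_data.sh.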

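[Notes]
1. Make sure every print in CAFFEROOT/scripts/create_annoset_r.py is written with parentheses ()
2. Make sure the assert len > 0 in CAFFEROOT/python/caffe/model_libs.py is removed
3. Make sure every xrange( ) call is replaced with range( )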
