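Environment setup required for Faster R-CNN: https://blog.csdn.net/M_Z_G_Y/article/details/81180390
1. Data, code, and files to download:
Data: Pascal VOC2007 dataset
Code: https://github.com/CharlesShang/TFFRCNN
Files: VGG16.npy and VGGnet_fast_rcnn_iter_70000.ckpt (https://download.csdn.net/download/m_z_g_y/10582980)
2. Training and testing
Test directly with the trained model from the paper: demo.py (in the faster_rcnn folder). Build the libraries under lib first: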
cd ./lib
make
# In demo.py, add the following two lines below the imports
import glob
plt.switch_backend('agg')
# Change the last few lines to the following:
for im_name in im_names:
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Demo for {:s}'.format(im_name)
    demo(sess, net, im_name)
    plt.savefig(im_name)
    # plt.show()
python demo.py --model model/VGGnet_fast_rcnn_iter_70000.ckpt
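One thing to watch in the loop above: plt.savefig(im_name) writes to whatever path im_name holds. If im_name is the path of the input image (an assumption about how demo.py builds im_names, so check your copy), the detection plot would overwrite the original picture. A minimal sketch of a helper that derives a separate output path; result_path and the results folder name are hypothetical and not part of TFFRCNN:

```python
import os


def result_path(im_name, out_dir="results"):
    """Map an input image path to '<out_dir>/<stem>_det.png'.

    Hypothetical helper, not part of TFFRCNN: it keeps the detection
    plot from being written over the input image.
    """
    stem = os.path.splitext(os.path.basename(im_name))[0]
    return os.path.join(out_dir, stem + "_det.png")


if __name__ == "__main__":
    print(result_path("data/demo/000456.jpg"))
```

In the loop, plt.savefig(result_path(im_name)) would then take the place of plt.savefig(im_name); the output folder has to exist before saving.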
To train the model yourself, use train_net.py:
python ./faster_rcnn/train_net.py --gpu 0 --restore 0 --weights /root/hujiahui/TFFRCNN-master/data/pretrain_model/VGG_16.npy --imdb voc_2007_trainval --iters 70000 --cfg /root/hujiahui/TFFRCNN-master/experiments/cfgs/faster_rcnn_end2end.yml --network VGGnet_train --set EXP_DIR exp_dir
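The training command above hard-codes absolute paths from one machine. As a sketch, the same argument list can be assembled from a single repository root. build_train_cmd and its parameters are hypothetical names; the flags are the ones used in the command above, and the weights file name is copied from that command, so it should match the file you actually downloaded:

```python
import os


def build_train_cmd(repo_root, gpu=0, iters=70000, exp_dir="exp_dir"):
    """Return the train_net.py command as an argument list.

    Hypothetical helper: only the paths depend on repo_root, the flags
    mirror the command shown in the text.
    """
    weights = os.path.join(repo_root, "data", "pretrain_model", "VGG_16.npy")
    cfg = os.path.join(repo_root, "experiments", "cfgs",
                       "faster_rcnn_end2end.yml")
    return [
        "python", "./faster_rcnn/train_net.py",
        "--gpu", str(gpu),
        "--restore", "0",
        "--weights", weights,
        "--imdb", "voc_2007_trainval",
        "--iters", str(iters),
        "--cfg", cfg,
        "--network", "VGGnet_train",
        "--set", "EXP_DIR", exp_dir,
    ]


if __name__ == "__main__":
    # Print the command; the list can also be passed to subprocess.call.
    print(" ".join(build_train_cmd("/root/hujiahui/TFFRCNN-master")))
```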
3. Pitfalls encountered:
(1) tensorflow.python.framework.errors_impl.NotFoundError: ./lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE. To fix this, modify make.sh in the lib folder. After the change it looks like this:
#!/usr/bin/env bash
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
echo $TF_INC
CUDA_PATH=/usr/local/cuda/
cd roi_pooling_layer
nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52
## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc \
# roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64
# for gcc5-built tf
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
roi_pooling_op.cu.o -I $TF_INC -L $TF_LIB -ltensorflow_framework -D GOOGLE_CUDA=1 \
-fPIC $CXXFLAGS -lcudart -L $CUDA_PATH/lib64
cd ..
# add building psroi_pooling layer
cd psroi_pooling_layer
nvcc -std=c++11 -c -o psroi_pooling_op.cu.o psroi_pooling_op_gpu.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52
g++ -std=c++11 -shared -o psroi_pooling.so psroi_pooling_op.cc \
psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64
## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o psroi_pooling.so psroi_pooling_op.cc \
# psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64
cd ..
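(2) tensorflow.python.framework.errors_impl.NotFoundError: ./faster_rcnn/../lib/psroi_pooling_layer/psroi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE. If the error appears again, this time for psroi_pooling.so, keep modifying make.sh in the lib folder so that the psroi_pooling build also links against tensorflow_framework. After the change it looks like this: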
#!/usr/bin/env bash
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
echo $TF_INC
CUDA_PATH=/usr/local/cuda/
cd roi_pooling_layer
nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52
## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc \
# roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64
# for gcc5-built tf
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
roi_pooling_op.cu.o -I $TF_INC -L $TF_LIB -ltensorflow_framework -D GOOGLE_CUDA=1 \
-fPIC $CXXFLAGS -lcudart -L $CUDA_PATH/lib64
cd ..
# add building psroi_pooling layer
cd psroi_pooling_layer
nvcc -std=c++11 -c -o psroi_pooling_op.cu.o psroi_pooling_op_gpu.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52
g++ -std=c++11 -shared -o psroi_pooling.so psroi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
psroi_pooling_op.cu.o -I $TF_INC -L $TF_LIB -ltensorflow_framework -D GOOGLE_CUDA=1 \
-fPIC $CXXFLAGS -lcudart -L $CUDA_PATH/lib64
## if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o psroi_pooling.so psroi_pooling_op.cc \
# psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64
cd ..
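Both fixes add -L $TF_LIB -ltensorflow_framework to the g++ line. The missing symbol demangles to the typeinfo for tensorflow::OpKernel, so a plausible reading is that the installed TensorFlow keeps its core symbols in a separate libtensorflow_framework shared library, and a custom op that is not linked against it fails to load. A minimal sketch to check whether that library is present in your install; has_framework_lib and tensorflow_lib_dir are hypothetical helpers, and tf.sysconfig.get_lib() is the same call make.sh uses:

```python
import os


def has_framework_lib(lib_dir):
    """Return True if lib_dir holds a libtensorflow_framework shared library."""
    return any(name.startswith("libtensorflow_framework")
               for name in os.listdir(lib_dir))


def tensorflow_lib_dir():
    """Return tf.sysconfig.get_lib(), or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.sysconfig.get_lib()


if __name__ == "__main__":
    lib_dir = tensorflow_lib_dir()
    if lib_dir is None:
        print("TensorFlow is not importable in this Python")
    elif has_framework_lib(lib_dir):
        print("libtensorflow_framework found in " + lib_dir)
    else:
        print("no libtensorflow_framework in " + lib_dir)
```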
(3) TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType…… (the proper fix looked like a hassle to me, so I changed the code directly). Around line 155 of lib/fast_rcnn/train.py:
# load vgg16
if self.pretrained_model is not None and not restore:
    print 'Loading pretrained model weights from {:s}'.format(self.pretrained_model)
    self.net.load(self.pretrained_model, sess, True)
    # Original code, commented out:
    # try:
    #     print 'Loading pretrained model weights from {:s}'.format(self.pretrained_model)
    #     self.net.load(self.pretrained_model, sess, True)
    # except:
    #     raise 'Check your pretrained model {:s}'.format(self.pretrained_model)
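The commented-out block raises a plain string, and Python only allows exception classes or instances to be raised, which is likely related to the TypeError. Instead of deleting the try/except, another option is to raise a real exception type that still names the file. A sketch that runs on its own, where load_weights is a hypothetical stand-in for self.net.load:

```python
def load_weights(path):
    """Hypothetical stand-in for self.net.load(...); fails on a missing file."""
    with open(path, "rb") as f:
        return f.read()


def load_pretrained(path):
    """Load weights, re-raising any failure as an error that names the file."""
    try:
        return load_weights(path)
    except Exception as err:
        # Raise an exception instance, never a bare string.
        raise RuntimeError(
            "Check your pretrained model {:s}: {}".format(path, err))


if __name__ == "__main__":
    try:
        load_pretrained("no_such_file.npy")
    except RuntimeError as err:
        print(err)
```

This keeps the underlying cause in the message, which the workaround of removing the try/except also achieves, only without the hint about which file to check.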
(4) If every network layer is skipped during training, i.e. the log prints ignore……, the downloaded VGG16.npy differs somewhat from the VGG_imagenet.npy that the paper's code expects. The load function in lib/networks/network.py needs a small modification:
def load(self, data_path, session, ignore_missing=False):
    data_dict = np.load(data_path).item()
    for key in data_dict:
        with tf.variable_scope(key, reuse=True):
            for subkey in data_dict[key]:
                try:
                    # Original code, commented out:
                    # var = tf.get_variable(subkey)
                    # session.run(var.assign(data_dict[key][subkey]))
                    # print "assign pretrain model "+subkey+ " to "+key
                    # Each layer is stored as [weights, biases], so index by position
                    var = tf.get_variable("weights")
                    session.run(var.assign(data_dict[key][0]))
                    var = tf.get_variable("biases")
                    session.run(var.assign(data_dict[key][1]))
                    print "assign pretrain model " + " to " + key
                except ValueError:
                    print "ignore " + key
                    if not ignore_missing:
                        raise
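Whether this change is needed depends on how the .npy file stores each layer. The original loop expects a dict per layer, looked up by subkey names such as weights and biases, while the modified loop expects a two-element sequence [weights, biases]. A minimal sketch for telling the two apart before patching network.py; layer_layout and summarize are hypothetical helpers, and the toy dict below only stands in for the real file contents:

```python
def layer_layout(entry):
    """Classify one layer entry of the pretrained-weights dict."""
    if isinstance(entry, dict):
        return "dict"   # e.g. {"weights": ..., "biases": ...}
    if isinstance(entry, (list, tuple)) and len(entry) == 2:
        return "pair"   # e.g. [weights, biases]
    return "unknown"


def summarize(data_dict):
    """Map each layer name to its layout."""
    return {key: layer_layout(value) for key, value in data_dict.items()}


if __name__ == "__main__":
    # Toy stand-in for np.load(path).item(); real entries hold arrays.
    toy = {"conv1_1": [[0.1], [0.0]], "fc6": {"weights": [0.2], "biases": [0.0]}}
    for name, layout in sorted(summarize(toy).items()):
        print(name + " " + layout)
```

On a real file, the dict would come from np.load(data_path).item() as in load; newer NumPy versions additionally require allow_pickle=True for that call. If the layers report "pair", the positional indexing shown above applies; if they report "dict", the original loop should already work.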
(5) Various missing dependencies, such as yaml and skimage:
sudo apt-get install python-skimage
sudo apt-get install python-yaml
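To find out which modules are missing before a long run fails partway through, a quick import check can help. A minimal sketch; missing_modules is a hypothetical helper, and the two module names are the ones mentioned above:

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # yaml and skimage are the two modules named in the text.
    print(missing_modules(["yaml", "skimage"]))
```

An empty list means both modules import cleanly in the Python interpreter that ran the check, which should be the same one used for train_net.py.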