Environment: Ubuntu 16.04 + cuDNN 7.0 + CUDA 9.0.176
PyTorch 1.0 + tensorflow-gpu 1.9.0 + Python 3.6.5 + Anaconda3
Reference code: https://github.com/ruotianluo/pytorch-faster-rcnn
Compile the nms, roi_pooling, and roi_align modules (roi_align comes from facebookresearch/maskrcnn-benchmark), then build the COCO Python API:
$ cd pytorch-faster-rcnn/lib
$ python setup.py build develop
$ cd ../
$ cd data
$ git clone https://github.com/pdollar/coco.git
$ cd coco/PythonAPI
$ make
$ cd ../../..
Download the PASCAL VOC 2007 data directly from the command line:
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
Extract the archives in the current folder. This automatically creates a VOCdevkit folder:
$ tar xvf VOCtrainval_06-Nov-2007.tar
$ tar xvf VOCtest_06-Nov-2007.tar
$ tar xvf VOCdevkit_08-Jun-2007.tar
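The later tracebacks in this post show the code reading from data/VOCdevkit2007, so the extracted VOCdevkit folder has to be reachable at that path. Below is a minimal sketch that assumes you extracted the archives outside data/ and want a symlink. The helper name link_voc_devkit is hypothetical and is not part of the repo:

```python
import os


def link_voc_devkit(extracted_devkit, data_dir="data", link_name="VOCdevkit2007"):
    """Check an extracted VOCdevkit folder and symlink it as data/VOCdevkit2007.

    Hypothetical helper (not part of pytorch-faster-rcnn). `extracted_devkit` is
    the VOCdevkit folder created by the tar commands above.
    """
    voc2007 = os.path.join(extracted_devkit, "VOC2007")
    # The PASCAL VOC loader reads annotations, split lists and images from these folders.
    for sub in ("Annotations", "ImageSets", "JPEGImages"):
        path = os.path.join(voc2007, sub)
        if not os.path.isdir(path):
            raise FileNotFoundError("missing folder: " + path)
    os.makedirs(data_dir, exist_ok=True)
    link = os.path.join(data_dir, link_name)
    # Only create the link if nothing (not even a broken link) is already there.
    if not os.path.lexists(link):
        os.symlink(os.path.abspath(extracted_devkit), link)
    return link
```

From the repo root, `link_voc_devkit("VOCdevkit")` creates data/VOCdevkit2007 pointing at the extracted folder. If the link already exists, the helper leaves it untouched.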
VGG16 model (pretrained weights for transfer learning, placed in data/imagenet_weights):
$ mkdir -p data/imagenet_weights
$ cd data/imagenet_weights
$ wget -v http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz
$ tar -xzvf vgg_16_2016_08_28.tar.gz  # extract
$ mv vgg_16.ckpt vgg16.ckpt  # rename
$ cd ../..
1. Download the pretrained models and weights
(from pytorch_vgg and caffe)
Commands:
$ mkdir -p data/imagenet_weights
$ cd data/imagenet_weights
$ python # open python in terminal and run the following Python code
import torch
from torch.utils.model_zoo import load_url
from torchvision import models
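sd = load_url("https://s3-us-west-2.amazonaws.com/jcjohns-models/vgg16-00b39a1b.pth")
# This checkpoint stores the first two fully connected layers as classifier.1 and classifier.4,
# while torchvision's VGG16 names them classifier.0 and classifier.3, so move them.
sd['classifier.0.weight'] = sd['classifier.1.weight']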
sd['classifier.0.bias'] = sd['classifier.1.bias']
del sd['classifier.1.weight']
del sd['classifier.1.bias']
sd['classifier.3.weight'] = sd['classifier.4.weight']
sd['classifier.3.bias'] = sd['classifier.4.bias']
del sd['classifier.4.weight']
del sd['classifier.4.bias']
torch.save(sd, "vgg16.pth")
exit()
$ cd ../..
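The eight manual assignments and deletions above only move the two fully connected layers from classifier.1/classifier.4 to classifier.0/classifier.3. The sketch below does the same renaming as a reusable function. The names rename_state_dict_keys and VGG16_FC_RENAMES are hypothetical. Because the function works on any dict, you can check it without torch:

```python
def rename_state_dict_keys(state_dict, renames):
    """Return a copy of state_dict with keys renamed according to renames (old -> new)."""
    out = dict(state_dict)
    for old, new in renames.items():
        out[new] = out.pop(old)  # a KeyError here means the checkpoint layout differs
    return out


# The same renaming as the interactive session above.
VGG16_FC_RENAMES = {
    "classifier.1.weight": "classifier.0.weight",
    "classifier.1.bias": "classifier.0.bias",
    "classifier.4.weight": "classifier.3.weight",
    "classifier.4.bias": "classifier.3.bias",
}
```

In the Python session this would replace the manual lines with `torch.save(rename_state_dict_keys(sd, VGG16_FC_RENAMES), "vgg16.pth")`. The input dict is not modified.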
2. Training:
$ ./experiments/scripts/train_faster_rcnn.sh [GPU_ID] [DATASET] [NET]
# GPU_ID is the GPU you want to train on
# NET in {vgg16, res50, res101, res152} is the network arch to use
# DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh
# Examples:
$ ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16
$ ./experiments/scripts/train_faster_rcnn.sh 1 coco res101
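If you launch training from Python, for example in a batch of experiments, you can check the three positional arguments against the values listed in the usage comment before the script starts. This is a sketch only. train_command is a hypothetical helper, and the allowed values are copied from the comment above:

```python
NETS = ("vgg16", "res50", "res101", "res152")
DATASETS = ("pascal_voc", "pascal_voc_0712", "coco")


def train_command(gpu_id, dataset, net):
    """Build the argv for train_faster_rcnn.sh, rejecting values not in the usage comment."""
    if dataset not in DATASETS:
        raise ValueError("unknown DATASET %r, expected one of %s" % (dataset, DATASETS))
    if net not in NETS:
        raise ValueError("unknown NET %r, expected one of %s" % (net, NETS))
    return ["./experiments/scripts/train_faster_rcnn.sh", str(int(gpu_id)), dataset, net]
```

From the repo root, `subprocess.run(train_command(0, "pascal_voc", "vgg16"), check=True)` runs the first example above and raises if the script exits with an error.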
3. Visualization:
$ tensorboard --logdir=tensorboard/vgg16/voc_2007_trainval/ --port=7001 &
Errors encountered during training:
1. ZeroDivisionError:
solution:
Reference:
https://blog.csdn.net/duanyajun987/article/details/83790384
Delete the cached files: the data/cache folder and the output folder.
2. AssertionError:
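Deleting data/cache most likely helps because the repo saves the parsed ground-truth annotations (roidb) as pickle files in data/cache and reloads them whenever they exist. Files left over from an earlier dataset or class list are therefore reused instead of being rebuilt. The sketch below clears both folders from the repo root. clear_caches is a hypothetical helper. Note that removing output also removes any trained snapshots:

```python
import os
import shutil


def clear_caches(repo_root=".", folders=("data/cache", "output")):
    """Delete cached roidb files and training outputs; return the folders removed."""
    removed = []
    for folder in folders:
        path = os.path.join(repo_root, folder)
        if os.path.isdir(path):
            shutil.rmtree(path)  # removes the folder and everything inside it
            removed.append(folder)
    return removed
```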
solution:
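My tensorboardX version was 1.6; switching to 1.2 fixed it.
3. I trained on only two VOC2007 classes, person and car, and treated everything else as background.
Error: KeyError
First, modify the classes in pytorch-faster-rcnn-master/lib/datasets/pascal_voc.py: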
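To see whether an environment needs the same downgrade, you can compare the installed tensorboardX version with 1.2, the release that worked in this PyTorch 1.0 setup. This is a sketch under assumptions. The helper names are hypothetical, parse_version only handles plain numeric versions (packaging.version would be more robust), and the module may not report `__version__`, in which case None is returned:

```python
def parse_version(version):
    """Turn a plain numeric version such as '1.6' or '1.2.0' into a 3-tuple of ints."""
    parts = [int(part) for part in version.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))  # pad so '1.2' == '1.2.0'


def installed_tensorboardx_version():
    """Return tensorboardX's version string, or None if it is missing or unreported."""
    try:
        import tensorboardX
    except ImportError:
        return None
    return getattr(tensorboardX, "__version__", None)


def should_downgrade(version, known_good="1.2"):
    """True if version is newer than the release that worked in this setup."""
    return parse_version(version) > parse_version(known_good)
```

If `should_downgrade(installed_tensorboardx_version())` is true, `pip install tensorboardX==1.2` pins the working version.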
self._classes = ('__background__',  # always index 0
                 'aeroplane', 'bicycle', 'bird', 'boat',
                 'bottle', 'bus', 'car', 'cat', 'chair',
                 'cow', 'diningtable', 'dog', 'horse',
                 'motorbike', 'person', 'pottedplant',
                 'sheep', 'sofa', 'train', 'tvmonitor')
Change it so that only three classes remain: '__background__', 'person', and 'car'.
Then run:
$ ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16
Error: KeyError
solution:
Reference:
https://blog.csdn.net/duanyajun987/article/details/83790384
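First, double-check the contents of self._classes in pytorch-faster-rcnn-master/lib/datasets/pascal_voc.py.
Next, find a line like the following:
objs = diff_objs (or non_diff_objs)
and add this code right below it, so that objects whose class is not in self._classes are skipped instead of being looked up (the lookup is what raises the KeyError):
cls_objs = [obj for obj in objs if obj.find('name').text in self._classes]
objs = cls_objs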
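The self-contained sketch below shows why the KeyError happens and why the filter fixes it. pascal_voc.py maps each object's class name to an index through a dictionary built from self._classes. Once the tuple is cut down to three classes, an object labeled dog has no entry in that dictionary. The function imitates that lookup on an inline annotation, with the added filter applied. kept_class_indices, the sample XML and the simplified difficult handling are assumptions and are not the repo's code. The filter here also applies .lower().strip() so that it matches the lookup:

```python
import xml.etree.ElementTree as ET

CLASSES = ('__background__', 'person', 'car')  # must match self._classes

SAMPLE_XML = """<annotation>
  <object><name>person</name><difficult>0</difficult></object>
  <object><name>dog</name><difficult>0</difficult></object>
  <object><name>car</name><difficult>1</difficult></object>
</annotation>"""


def kept_class_indices(xml_text, classes=CLASSES, use_diff=False):
    """Return the class index of every object that survives the filters."""
    class_to_ind = dict(zip(classes, range(len(classes))))
    objs = ET.fromstring(xml_text).findall('object')
    if not use_diff:
        # Drop objects marked as difficult (the objs = non_diff_objs step).
        objs = [obj for obj in objs if int(obj.find('difficult').text) == 0]
    # The added filter: skip objects whose class is not being trained.
    objs = [obj for obj in objs if obj.find('name').text.lower().strip() in classes]
    # Without the filter above, 'dog' would raise KeyError on this lookup.
    return [class_to_ind[obj.find('name').text.lower().strip()] for obj in objs]
```

With the defaults, only the person box survives: the dog is filtered out and the car is marked as difficult. With use_diff=True, the car is kept as well.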
Test command:
$ ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16
Demo command:
$ python ./tools/demo.py --net vgg16 --dataset pascal_voc
Error: RuntimeError
Cause: the classes in demo.py had not been changed.
solution:
① CLASSES = ('__background__', 'person', 'car'), in the same order as self._classes in pytorch-faster-rcnn-master/lib/datasets/pascal_voc.py
② In the network-loading code:
# load network
if demonet == 'vgg16':
    net = vgg16()
elif demonet == 'res101':
    net = resnetv1(num_layers=101)
else:
    raise NotImplementedError
net.create_architecture(3, tag='default', anchor_scales=[8, 16, 32])
change the number of classes passed to create_architecture from the original 21 to 3.
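The RuntimeError is most likely the trained checkpoint (3 output classes) not fitting a network built for 21. Instead of hard-coding 3, one option is to derive the count from CLASSES and compare the tuple with the training classes before building the network. This is a sketch. check_demo_classes is a hypothetical helper, and the class order is assumed to be the one set in pascal_voc.py:

```python
def check_demo_classes(train_classes, demo_classes):
    """Return the class count to pass to create_architecture, or raise if the tuples differ."""
    train_classes, demo_classes = tuple(train_classes), tuple(demo_classes)
    if train_classes != demo_classes:
        raise ValueError("demo.py CLASSES %s do not match training classes %s"
                         % (demo_classes, train_classes))
    return len(demo_classes)


TRAIN_CLASSES = ('__background__', 'person', 'car')  # self._classes in pascal_voc.py
CLASSES = ('__background__', 'person', 'car')        # CLASSES in demo.py
num_classes = check_demo_classes(TRAIN_CLASSES, CLASSES)
# In demo.py: net.create_architecture(num_classes, tag='default', anchor_scales=[8, 16, 32])
```

The order matters as well as the count. If the names are swapped, the detections get the wrong labels even though the checkpoint loads without error, so the helper rejects any difference.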
1. Download the pretrained ResNet101 weights
$ mkdir -p data/imagenet_weights
$ cd data/imagenet_weights
# download from my gdrive (link in pytorch-resnet)
$ mv resnet101-caffe.pth res101.pth
$ cd ../..
2. Training
Command:
$ ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc res101
Error: KeyError
Loading initial model weights from data/imagenet_weights/res101.pth
Traceback (most recent call last):
  File "./tools/trainval_net.py", line 164, in <module>
    max_iters=args.max_iters)
  File "/home/yuxin/pytorch-faster-rcnn-master/tools/…/lib/model/train_val.py", line 393, in train_net
    sw.train_model(max_iters)
  File "/home/yuxin/pytorch-faster-rcnn-master/tools/…/lib/model/train_val.py", line 254, in train_model
    lr, last_snapshot_iter, stepsizes, np_paths, ss_paths = self.initialize(
  File "/home/yuxin/pytorch-faster-rcnn-master/tools/…/lib/model/train_val.py", line 196, in initialize
    self.net.load_pretrained_cnn(torch.load(self.pretrained_model))
  File "/home/yuxin/pytorch-faster-rcnn-master/tools/…/lib/nets/resnet_v1.py", line 195, in load_pretrained_cnn
    for k in list(self.resnet.state_dict())})
  File "/home/yuxin/pytorch-faster-rcnn-master/tools/…/lib/nets/resnet_v1.py", line 195, in <dictcomp>
    for k in list(self.resnet.state_dict())})
KeyError: 'bn1.num_batches_tracked'
Command exited with non-zero status 1
5.25user 0.45system 0:05.75elapsed 99%CPU (0avgtext+0avgdata 646840maxresident)k
32inputs+5072outputs (0major+162986minor)pagefaults 0swaps
solution:
Reference:
https://github.com/ruotianluo/pytorch-faster-rcnn/pull/126/files
Modify the code in lib/nets/network.py:
Modify the code in lib/nets/resnet_v1.py:
3. Training results
4. Demo test results
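The post does not reproduce the diff from that pull request. Some context: since PyTorch 0.4.1, BatchNorm layers register a num_batches_tracked buffer. The model's state_dict therefore contains keys such as bn1.num_batches_tracked that the older res101.pth checkpoint lacks, and the dict comprehension in load_pretrained_cnn (visible in the traceback) fails on the first missing key. One hedged workaround is to copy only the keys the checkpoint actually has. The sketch below shows the idea on plain dicts. filter_pretrained is a hypothetical name, and this is not necessarily what the PR does:

```python
def filter_pretrained(pretrained, model_keys):
    """Split model_keys into entries found in the checkpoint and keys it lacks."""
    model_keys = list(model_keys)
    matched = {k: pretrained[k] for k in model_keys if k in pretrained}
    missing = [k for k in model_keys if k not in pretrained]
    return matched, missing


# Possible use inside load_pretrained_cnn (assumption, not the PR's code):
#   matched, missing = filter_pretrained(state_dict, self.resnet.state_dict().keys())
#   self.resnet.load_state_dict(matched, strict=False)  # missing buffers keep their defaults
```

Check that `missing` contains only num_batches_tracked entries. With strict=False, any genuinely missing weight would otherwise be skipped silently.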
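For a custom dataset, change the classes accordingly and build the dataset in VOC style. The txt files in VOC2007/ImageSets/Main can be generated as follows: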
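import os
import random

# Run this from inside the VOC2007 folder.
trainval_percent = 0.8  # fraction of all images that go into trainval (the rest become test)
train_percent = 0.7     # fraction of trainval that goes into train (the rest become val)
xmlfilepath = 'Annotations'
txtsavepath = os.path.join('ImageSets', 'Main')
os.makedirs(txtsavepath, exist_ok=True)

total_xml = [f for f in os.listdir(xmlfilepath) if f.endswith('.xml')]
num = len(total_xml)
indices = range(num)  # renamed from `list` to avoid shadowing the built-in
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = set(random.sample(indices, tv))
train = set(random.sample(sorted(trainval), tr))  # sets make the membership tests below fast

with open(os.path.join(txtsavepath, 'trainval.txt'), 'w') as ftrainval, \
        open(os.path.join(txtsavepath, 'test.txt'), 'w') as ftest, \
        open(os.path.join(txtsavepath, 'train.txt'), 'w') as ftrain, \
        open(os.path.join(txtsavepath, 'val.txt'), 'w') as fval:
    for i in indices:
        name = total_xml[i][:-4] + '\n'  # strip the '.xml' extension
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)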
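The split script should produce four consistent lists: train and val together make up trainval, train and val do not overlap, and trainval and test do not overlap. The sketch below checks those properties after the script has run. read_ids and check_splits are hypothetical helpers:

```python
import os

SPLITS = ("trainval", "train", "val", "test")


def read_ids(path):
    """Read one image id per line, ignoring blank lines."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def check_splits(main_dir):
    """Check that the ImageSets/Main split files are consistent; return their sizes."""
    ids = {name: read_ids(os.path.join(main_dir, name + ".txt")) for name in SPLITS}
    if ids["train"] | ids["val"] != ids["trainval"]:
        raise ValueError("train + val does not equal trainval")
    if ids["train"] & ids["val"]:
        raise ValueError("train and val overlap")
    if ids["trainval"] & ids["test"]:
        raise ValueError("trainval and test overlap")
    return {name: len(ids[name]) for name in SPLITS}
```

From the VOC2007 folder, `check_splits("ImageSets/Main")` returns the size of each list. For N images, trainval should hold int(0.8 * N) ids and train should hold int(0.7 * len(trainval)).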
Error: FileNotFoundError
Writing apple VOC results file
Traceback (most recent call last):
  File "./tools/test_net.py", line 141, in <module>
    test_net(net, imdb, filename, max_per_image=args.max_per_image)
  File "/home/yuxin/pytorch-faster-rcnn-master-apple-peach/tools/…/lib/model/test.py", line 181, in test_net
    imdb.evaluate_detections(all_boxes, output_dir)
  File "/home/yuxin/pytorch-faster-rcnn-master-apple-peach/tools/…/lib/datasets/pascal_voc.py", line 289, in evaluate_detections
    self._write_voc_results_file(all_boxes)
  File "/home/yuxin/pytorch-faster-rcnn-master-apple-peach/tools/…/lib/datasets/pascal_voc.py", line 215, in _write_voc_results_file
    with open(filename, 'wt') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/yuxin/pytorch-faster-rcnn-master-apple-peach/data/VOCdevkit2007/results/VOC2007/Main/comp4_f8353590-5788-445f-ae42-49d75c459364_det_test_apple.txt'
Command exited with non-zero status 1
166.28user 13.22system 0:48.49elapsed 370%CPU (0avgtext+0avgdata 2863148maxresident)k
0inputs+2184outputs (0major+807151minor)pagefaults 0swaps
solution:
Add a results/VOC2007/Main/dummy file to your dataset's devkit folder. You can copy it from the original VOCdevkit2007.
Training process:
Testing process:
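The traceback shows open() failing because results/VOC2007/Main does not exist in the custom devkit. The dummy file matters only because it makes that folder exist. Creating the folder directly should have the same effect. In the sketch below, ensure_results_dir is a hypothetical helper, and devkit_path is whatever data/VOCdevkit2007 points to:

```python
import os


def ensure_results_dir(devkit_path, year="2007"):
    """Create <devkit>/results/VOC<year>/Main if needed and return its path."""
    path = os.path.join(devkit_path, "results", "VOC" + year, "Main")
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path
```

From the repo root, `ensure_results_dir("data/VOCdevkit2007")` prepares the folder before you run test_faster_rcnn.sh.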