Faster R-CNN + my own head-detection dataset. Hopefully this goes smoothly~
Let's get to work!
1. Gather the materials
https://github.com/rbgirshick/py-faster-rcnn
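The remote server cannot get past the firewall, which is annoying, so I tracked down some of the materials from various Baidu Cloud shares.
2. Run the demo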
./tools/demo.py
The demo ran successfully.
3. Train on the VOC2007 dataset with the pretrained VGG16 model
./experiments/scripts/faster_rcnn_alt_opt.sh 0 VGG16 pascal_voc
Reading the training log
Training was interrupted:
/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py:100: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
fg_inds, size=fg_rois_per_this_image, replace=False)
/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py:113: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
bg_inds, size=bg_rois_per_this_image, replace=False)
/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py:120: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
labels[fg_rois_per_this_image:] = 0
/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py:176: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py:177: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
I0218 18:06:15.150728 364 solver.cpp:229] Iteration 0, loss = 4.44656
I0218 18:06:15.150751 364 solver.cpp:245] Train net output #0: loss_bbox = 0.412176 (* 1 = 0.412176 loss)
I0218 18:06:15.150758 364 solver.cpp:245] Train net output #1: loss_cls = 4.03438 (* 1 = 4.03438 loss)
I0218 18:06:15.150761 364 sgd_solver.cpp:106] Iteration 0, lr = 0.001
F0218 18:06:15.154310 364 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Partway through training, the GPU ran out of memory.
According to posts online, making batch_size a bit smaller should fix this.
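The solver lines in the log above follow a fixed pattern, so the loss values and the fatal error can be pulled out automatically. The following is a minimal sketch, assuming the log format is exactly what appears in the excerpt above; the function name `parse_solver_log` and the regular expressions are my own and are not part of Caffe or py-faster-rcnn.

```python
import re

# Patterns written against the log excerpt above (an assumption, not a Caffe API).
ITER_RE = re.compile(r"Iteration (\d+), loss = ([0-9.eE+-]+)")
OUTPUT_RE = re.compile(r"Train net output #\d+: (\w+) = ([0-9.eE+-]+)")


def parse_solver_log(lines):
    """Return (records, fatal): per-iteration losses and fatal check failures."""
    records = []
    fatal = []
    for line in lines:
        match = ITER_RE.search(line)
        if match:
            records.append({
                "iteration": int(match.group(1)),
                "loss": float(match.group(2)),
                "outputs": {},
            })
            continue
        match = OUTPUT_RE.search(line)
        if match and records:
            # Attach loss_bbox / loss_cls to the most recent iteration record.
            records[-1]["outputs"][match.group(1)] = float(match.group(2))
            continue
        # glog marks fatal lines with a leading "F".
        if line.startswith("F") and "Check failed" in line:
            fatal.append(line.split("] ", 1)[-1])
    return records, fatal


sample = [
    "I0218 18:06:15.150728 364 solver.cpp:229] Iteration 0, loss = 4.44656",
    "I0218 18:06:15.150751 364 solver.cpp:245] Train net output #0: loss_bbox = 0.412176 (* 1 = 0.412176 loss)",
    "I0218 18:06:15.150758 364 solver.cpp:245] Train net output #1: loss_cls = 4.03438 (* 1 = 4.03438 loss)",
    "I0218 18:06:15.150761 364 sgd_solver.cpp:106] Iteration 0, lr = 0.001",
    "F0218 18:06:15.154310 364 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory",
]
records, fatal = parse_solver_log(sample)
print(records)
print(fatal)
```

Here the total loss at iteration 0 is simply loss_bbox plus loss_cls (each with weight 1), and the crash comes right after the first iteration, before any real training has happened.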
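I ran the default Faster R-CNN demo on our own images and, well, the results were pretty poor.
After running overnight, stage 1 finished, and then an error was raised.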
Solving...
/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py:100: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
fg_inds, size=fg_rois_per_this_image, replace=False)
/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py:113: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
bg_inds, size=bg_rois_per_this_image, replace=False)
/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py:120: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
labels[fg_rois_per_this_image:] = 0
/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py:176: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "./tools/train_faster_rcnn_alt_opt.py", line 195, in train_fast_rcnn
max_iters=max_iters)
File "/home/robot3/zrcai/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 160, in train_net
model_paths = sw.train_model(max_iters)
File "/home/robot3/zrcai/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 101, in train_model
self.solver.step(1)
File "/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/layer.py", line 144, in forward
blobs = self._get_next_minibatch()
File "/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/layer.py", line 63, in _get_next_minibatch
return get_minibatch(minibatch_db, self._num_classes)
File "/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py", line 55, in get_minibatch
num_classes)
File "/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py", line 125, in _sample_rois
roidb['bbox_targets'][keep_inds, :], num_classes)
File "/home/robot3/zrcai/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py", line 176, in _get_bbox_regression_labels
bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
ValueError: could not broadcast input array from shape (4) into shape (0)
Reportedly this is mainly caused by the files I modified.
Fix to try:
https://github.com/rbgirshick/py-faster-rcnn/issues/52
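To see how a message like "could not broadcast input array from shape (4) into shape (0)" can arise at all, here is a small standalone reproduction. It is only a sketch of one possible cause, assuming the class index points past the end of the target array (for example because the number of classes does not match the labels); the traceback does not confirm that this is what happened in my run. The function below is a simplified stand-in written for illustration, not the code from minibatch.py.

```python
import numpy as np


def bbox_regression_labels(bbox_target_data, num_classes):
    """Expand (N, 5) rows of [class, dx, dy, dw, dh] into (N, 4 * num_classes)."""
    clss = bbox_target_data[:, 0]
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    for ind in np.where(clss > 0)[0]:
        cls = int(clss[ind])      # explicit int avoids the non-integer index warning
        start = 4 * cls
        end = start + 4
        # If cls >= num_classes, start:end lies past the last column, so the
        # slice is empty (shape (0,)) and the 4 values cannot be assigned.
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
    return bbox_targets


rows = np.array([[2, 0.1, 0.2, 0.3, 0.4]], dtype=np.float32)

ok = bbox_regression_labels(rows, num_classes=3)   # class 2 fits in 3 classes

try:
    bbox_regression_labels(rows, num_classes=2)    # class 2 does not fit in 2 classes
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If this is the cause, the things to check are that the class list in the dataset code and the class counts in the network definition agree, and that the stale caches from an earlier configuration have been removed.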
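To check more quickly whether my changes were correct, I reduced the number of iterations.
The iteration counts are set in the file py-faster-rcnn/tools/train_faster_rcnn_alt_opt.py. The default is:
max_iters = [80000, 40000, 80000, 40000]
The four values are the iteration counts for RPN stage 1, Fast R-CNN stage 1, RPN stage 2, and Fast R-CNN stage 2, respectively.
I changed them to:
max_iters = [20000, 10000, 20000, 10000]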
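The mapping between the four values and the four training stages can be written out explicitly. This is just an illustrative sketch: the stage labels and the helper `scale_iters` are my own names, and the real change is still made by editing the `max_iters` list by hand in train_faster_rcnn_alt_opt.py.

```python
# Stage order as described above; the labels are only for display.
stages = [
    "RPN stage 1",
    "Fast R-CNN stage 1",
    "RPN stage 2",
    "Fast R-CNN stage 2",
]
default_iters = [80000, 40000, 80000, 40000]


def scale_iters(iters, factor):
    """Scale every stage by the same factor, keeping at least 1 iteration."""
    return [max(1, int(n * factor)) for n in iters]


# A quarter of the default schedule, for a quick sanity check of the pipeline.
quick_iters = scale_iters(default_iters, 0.25)

for name, n in zip(stages, quick_iters):
    print("%-20s %6d iterations" % (name, n))
```

Scaling all four stages by the same factor keeps the 2:1 ratio between the RPN and Fast R-CNN stages, so the shortened run exercises the same pipeline as the full one.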
I made the change and retrained, but it still failed.
I0219 19:25:45.681346 6976 solver.cpp:229] Iteration 0, loss = 1.40653
I0219 19:25:45.681371 6976 solver.cpp:245] Train net output #0: loss_bbox = 0.365611 (* 1 = 0.365611 loss)
I0219 19:25:45.681376 6976 solver.cpp:245] Train net output #1: loss_cls = 1.04092 (* 1 = 1.04092 loss)
I0219 19:25:45.681381 6976 sgd_solver.cpp:106] Iteration 0, lr = 0.001
F0219 19:25:45.684870 6976 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
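Posts online say that changing the batch size can help.
The terms epoch, iteration, and batchsize come up all the time in deep learning. Here is my understanding of how the three differ:
(1) batchsize: the batch size. Deep learning models are usually trained with SGD, meaning that each training step uses batchsize samples drawn from the training set;
(2) iteration: 1 iteration is one training step on batchsize samples;
(3) epoch: 1 epoch is one pass over all the samples in the training set.
For example, if the training set has 1000 samples and batchsize=10, then one pass over the whole training set takes:
100 iterations, which is 1 epoch.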
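The arithmetic in the example above can be checked with a few lines of code. This is a minimal sketch; the function names are mine, and rounding up for a final partial batch is an assumption about how the last batch is counted.

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    """Number of iterations needed to see every sample once (last batch may be partial)."""
    return int(math.ceil(num_samples / float(batch_size)))


def epochs_covered(num_iterations, num_samples, batch_size):
    """How many passes over the data a given number of iterations corresponds to."""
    return num_iterations * batch_size / float(num_samples)


print(iterations_per_epoch(1000, 10))   # the example from the text
print(epochs_covered(100, 1000, 10))
```

Note that a smaller batchsize lowers the memory needed per iteration but raises the number of iterations per epoch, so the iteration counts may need to grow to see the same amount of data.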
Strategy for resolving "error == cudaSuccess (2 vs. 0) out of memory":
Try to shrink the network filters and the inputs.
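To get a feel for how much "shrinking the inputs" helps, the sketch below estimates the number of pixels fed to the network after rescaling. It assumes the usual scheme of resizing the shorter side to a target size while capping the longer side, with 600 and 1000 as the default values (as I recall them from the TRAIN.SCALES and TRAIN.MAX_SIZE settings; treat these as assumptions). The smaller values 400 and 700 and the 375x500 image size are hypothetical examples, and the functions are my own, not the repository's.

```python
def rescale_factor(height, width, target_size, max_size):
    """Scale so the shorter side equals target_size, unless the longer side would exceed max_size."""
    short_side = min(height, width)
    long_side = max(height, width)
    scale = float(target_size) / short_side
    if round(scale * long_side) > max_size:
        scale = float(max_size) / long_side
    return scale


def resized_pixels(height, width, target_size, max_size):
    """Pixel count of the image after rescaling."""
    scale = rescale_factor(height, width, target_size, max_size)
    return int(round(height * scale)) * int(round(width * scale))


# Hypothetical 375x500 image, assumed default settings vs. a smaller setting.
default_px = resized_pixels(375, 500, target_size=600, max_size=1000)
smaller_px = resized_pixels(375, 500, target_size=400, max_size=700)
print(default_px, smaller_px, float(smaller_px) / default_px)
```

Activation memory in the convolutional layers grows roughly with the number of input pixels, so cutting the pixel count by about half should noticeably reduce GPU memory use, at the cost of making small heads even smaller in the input.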
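Let's first try running with ZF and see whether that works.
Note: the caches must be deleted before retraining.
Delete all .pyc files under the py-faster-rcnn folder, the cache folder under the data folder, and the annotations_cache folder under data/VOCdevkit2007.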
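The cleanup described above is easy to forget, so it can be scripted. The following is a minimal sketch; `clean_caches` is a helper name of my own, the paths are the ones listed above, and `dry_run=True` only lists what would be deleted so that nothing is removed by accident.

```python
import os
import shutil


def clean_caches(repo_root, dry_run=True):
    """Remove .pyc files and the dataset caches under repo_root; return the affected paths."""
    removed = []
    # 1. All compiled Python files anywhere under the repository.
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            if name.endswith(".pyc"):
                path = os.path.join(dirpath, name)
                removed.append(path)
                if not dry_run:
                    os.remove(path)
    # 2. The two cache folders named in the note above.
    cache_dirs = [
        os.path.join("data", "cache"),
        os.path.join("data", "VOCdevkit2007", "annotations_cache"),
    ]
    for rel in cache_dirs:
        path = os.path.join(repo_root, rel)
        if os.path.isdir(path):
            removed.append(path)
            if not dry_run:
                shutil.rmtree(path)
    return removed
```

A typical use would be to call `clean_caches("py-faster-rcnn")` first to inspect the list, then call it again with `dry_run=False` before starting a new training run.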
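ZF did not run through. OK, let's try the medium-sized VGG instead.
Both ZF and VGG got stuck at stage 2. I'll leave it running and see how it looks when I get up tomorrow morning.
It finally finished running!
The next step is to increase the iteration counts: change max_iters back to [80000, 40000, 80000, 40000].
Then I randomly picked a few images and tested them with demo.py. The results on small heads and densely packed heads are not very good.
For example:
The small heads toward the back, which show up as a cluster of black dots, are not detected.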
Improvements under consideration:
1. Tune the hyperparameters
2. Cross-training
3. Modify the network