R-CNN是一个非常优秀的目标检测模型,虽然相比今天很多state-of-the-art方法,它的精度和效率都有略显不足,但是该模型是很多算法的基础思想。本文以官网文本为基础,只做本地运行的修改和中文说明:http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb
细节信息可参考作者论文:Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014. Arxiv 2013.
---Last update 2015年6月8日
1.下载训练好的R-CNN模型,也可以自己使用Caffe训练一个自己的模型。预训练模型基于Imagenet数据集,并在ILSVRC13上进行微调,输出200个检测分类。
下载方法:~/caffe-master$ ./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13
2.下载Selective Search,并运行matlab编译相关mex文件。
(1) 下载方法:https://github.com/sergeyk/selective_search_ijcv_with_python ,下载后解压,改名,并复制到 ~/caffe-master/python/selective_search_ijcv_with_python/
(2) 编译方法:启动matlab客户端,并运行~/caffe-master/python/selective_search_ijcv_with_python/demo.m ,无报错信息运行后关闭matlab即可。
3.执行python/detect.py报错时,可参考如下修改方法:
(1) 报错信息:OSError: [Errno 2] No such file or directory
修改文件:~/caffe-master/python/selective_search_ijcv_with_python/selective_search.py
修改前:mc = "matlab -nojvm -r \"try; {}; catch; exit; end; exit\"".format(command)
修改后:mc = "/usr/local/MATLAB/R2014a/bin/matlab -nojvm -r \"try; {}; catch; exit; end; exit\"".format(command)
(2) 报错信息:ValueError: 'axis' entry 2 is out of bounds (-2, 2)
修改文件:~/caffe-master/python/caffe/detector.py
修改前:predictions = out[self.outputs[0]].squeeze(axis=(2, 3))
修改后:predictions = out[self.outputs[0]].squeeze()
创建临时目录,并导入检测样本。检测样本可以同时导入多个,但会被作为一个样本进行处理,这种方式适合多预处理融合。
% cd '/home/ouxinyu/caffe-master'
! mkdir -p _temp
! echo examples/images/fish-bike.jpg > _temp/det_input.txt
调用Selective Search进行Region Proposal,然后调用Caffe进行分类预测。默认运行于GPU模式,若需要运行于CPU模式,可去掉--gpu
! python/detect.py --crop_mode=selective_search --pretrained_model=models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5
下面的内容没什么问题,路径继续改改,说明直接贴原作的....
Running this outputs a DataFrame with the filenames, selected windows, and their detection scores to an HDF5 file. (We only ran on one image, so the filenames will all be the same.)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_hdf('_temp/det_output.h5', 'df')
print(df.shape)
print(df.iloc[0])
1570 regions were proposed with the R-CNN configuration of selective search. The number of proposals will vary from image to image based on its contents and size -- selective search isn't scale invariant.
In general, detect.py is most efficient when running on a lot of images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results. Simply list an image per line in the images_file, and it will process all of them.
Although this guide gives an example of R-CNN ImageNet detection, detect.py is clever enough to adapt to different Caffe models’ input dimensions, batch size, and output categories. You can switch the model definition and pretrained model as desired. Refer to python detect.py --help for the parameters to describe your data set. There's no need for hardcoding.
Anyway, let's now load the ILSVRC13 detection class names and make a DataFrame of the predictions. Note you'll need the auxiliary ilsvrc2012 data fetched by data/ilsvrc12/get_ilsvrc12_aux.sh.
with open('data/ilsvrc12/det_synset_words.txt') as f:
labels_df = pd.DataFrame([
{
'synset_id': l.strip().split(' ')[0],
'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
}
for l in f.readlines()
])
labels_df.sort('synset_id')
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])
print(predictions_df.iloc[0])
Let's look at the activations.
plt.gray()
plt.matshow(predictions_df.values)
plt.xlabel('Classes')
plt.ylabel('Windows')
Now let's take max across all windows and plot the top classes.
max_s = predictions_df.max(0)
max_s.sort(ascending=False)
print(max_s[:10])
The top detections are in fact a person and bicycle. Picking good localizations is a work in progress; we pick the top-scoring person and bicycle detections.
# Find, print, and display the top detections: person and bicycle.
i = predictions_df['person'].argmax()
j = predictions_df['bicycle'].argmax()
# Show top predictions for top detection.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Top detection:')
print(f.order(ascending=False)[:5])
print('')
# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.order(ascending=False)[:5])
# Show top detection in red, second-best top detection in blue.
im = plt.imread('examples/images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()
det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))
det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))
That's cool. Let's take all 'bicycle' detections and NMS them to get rid of overlapping windows.
def nms_detections(dets, overlap=0.3):
"""
Non-maximum suppression: Greedily select high-scoring detections and
skip detections that are significantly covered by a previously
selected detection.
This version is translated from Matlab code by Tomasz Malisiewicz,
who sped up Pedro Felzenszwalb's code.
Parameters
----------
dets: ndarray
each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
overlap: float
minimum overlap ratio (0.3 default)
Output
------
dets: ndarray
remaining after suppression.
"""
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
ind = np.argsort(dets[:, 4])
w = x2 - x1
h = y2 - y1
area = (w * h).astype(float)
pick = []
while len(ind) > 0:
i = ind[-1]
pick.append(i)
ind = ind[:-1]
xx1 = np.maximum(x1[i], x1[ind])
yy1 = np.maximum(y1[i], y1[ind])
xx2 = np.minimum(x2[i], x2[ind])
yy2 = np.minimum(y2[i], y2[ind])
w = np.maximum(0., xx2 - xx1)
h = np.maximum(0., yy2 - yy1)
wh = w * h
o = wh / (area[i] + area[ind] - wh)
ind = ind[np.nonzero(o <= overlap)[0]]
return dets[pick, :]
scores = predictions_df['bicycle']
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
dets = np.hstack((windows, scores[:, np.newaxis]))
nms_dets = nms_detections(dets)
Show top 3 NMS'd detections for 'bicycle' in the image and note the gap between the top scoring box (red) and the remaining boxes.
plt.imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
currentAxis.add_patch(
plt.Rectangle((det[0], det[1]), det[2]-det[0], det[3]-det[1],
fill=False, edgecolor=c, linewidth=5)
)
print 'scores:', nms_dets[:3, 4]
This was an easy instance for bicycle as it was in the class's training set. However, the person result is a true detection since this was not in the set for that class.
You should try out detection on an image of your own next!
(Remove the temp directory to clean up, and we're done.)
!rm -rf _temp