Caffe Python Interface in Practice: Source-Code Walkthrough of the Official Detection Tutorial

This post is one of a series of source-code walkthrough notes for the official documentation.

Note 1: This post annotates the source of one of the .ipynb files under caffe_root/examples/, aiming to speed up a beginner's learning through source-code comments.
Note 2: The English narration from the original tutorial is kept as-is, to encourage beginners to get used to reading first-hand English material; this builds the ability to read technical literature, an essential skill for a senior programmer.
Note 3: It is recommended to run the code in a Jupyter Notebook environment alongside these annotated sources.

R-CNN is a state-of-the-art detector that classifies region proposals by a finetuned Caffe model. For the full details of the R-CNN system and model, refer to its project site and the paper:

Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014. Arxiv 2013.

In this example, we do detection by a pure Caffe edition of the R-CNN model for ImageNet. The R-CNN detector outputs class scores for the 200 detection classes of ILSVRC13. Keep in mind that these are raw one vs. all SVM scores, so they are not probabilistically calibrated or exactly comparable across classes. Note that this off-the-shelf model is simply for convenience, and is not the full R-CNN model.

Let’s run detection on an image of a bicyclist riding a fish bike in the desert (from the ImageNet challenge—no joke).

First, we’ll need region proposals and the Caffe R-CNN ImageNet model:

  • Selective Search is the region proposer used by R-CNN. The selective_search_ijcv_with_python Python module takes care of extracting proposals through the selective search MATLAB implementation. To install it, download the module and name its directory selective_search_ijcv_with_python, run the demo in MATLAB to compile the necessary functions, then add it to your PYTHONPATH for importing. (If you have your own region proposals prepared, or would rather not bother with this step, detect.py accepts a list of images and bounding boxes as CSV; see the sketch after this list.)

  • Run ./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13 to get the Caffe R-CNN ImageNet model.
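For reference, here is a minimal sketch of the CSV route. It assumes the column layout that detect.py's list mode reads (a filename column plus ymin/xmin/ymax/xmax window coordinates); the coordinates below are invented for illustration, so confirm the expected format against ./detect.py --help and the script source before relying on it.

import pandas as pd

# Hypothetical hand-made proposals for one image (coordinates are made up).
windows = pd.DataFrame({
    'filename': ['images/fish-bike.jpg'] * 2,
    'ymin': [50, 120], 'xmin': [30, 200],
    'ymax': [300, 380], 'xmax': [400, 480],
})
windows.to_csv('_temp/det_windows.csv', index=False)
# Then call detect.py with --crop_mode=list and this CSV as the input file.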

With that done, we’ll call the bundled detect.py to generate the region proposals and run the network. For an explanation of the arguments, do ./detect.py --help.

!mkdir -p _temp  # temporary working directory; remember to delete it after the experiment
!echo `pwd`/images/fish-bike.jpg > _temp/det_input.txt
!../python/detect.py --crop_mode=selective_search --pretrained_model=../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=../models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5

This run was in GPU mode. For CPU mode detection, call detect.py without the --gpu argument.
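For example, the same call in CPU mode simply drops the flag:

!../python/detect.py --crop_mode=selective_search --pretrained_model=../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=../models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --raw_scale=255 _temp/det_input.txt _temp/det_output.h5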

Running this outputs a DataFrame with the filenames, selected windows, and their detection scores to an HDF5 file.
(We only ran on one image, so the filenames will all be the same.)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# df is a 1570x5 DataFrame with five columns: prediction (the 200 class scores
# for each proposal window), plus the ymin, xmin, ymax and xmax coordinates.
# Stacking df.prediction.values yields a 1570x200 score matrix.
df = pd.read_hdf('_temp/det_output.h5', 'df')
print(df.shape)
print(df.iloc[0])

1570 regions were proposed with the R-CNN configuration of selective search. The number of proposals will vary from image to image based on its contents and size – selective search isn’t scale invariant.

In general, detect.py is most efficient when running on a lot of images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results.
Simply list an image per line in the images_file, and it will process all of them.
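A sketch of batching, assuming a second image such as images/cat.jpg (which ships with the Caffe examples) is available:

!echo `pwd`/images/fish-bike.jpg >  _temp/det_input.txt
!echo `pwd`/images/cat.jpg       >> _temp/det_input.txt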

Although this guide gives an example of R-CNN ImageNet detection, detect.py is clever enough to adapt to different Caffe models’ input dimensions, batch size, and output categories. You can switch the model definition and pretrained model as desired. Refer to python detect.py --help for the parameters to describe your data set. There’s no need for hardcoding.
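As an illustration, pointing the same command at a different bundled model definition, say bvlc_reference_caffenet (a 1000-class classifier rather than a detector, chosen here only as an example), should run unmodified, since detect.py reads the network's dimensions from the prototxt:

!../scripts/download_model_binary.py ../models/bvlc_reference_caffenet
!../python/detect.py --crop_mode=selective_search --pretrained_model=../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel --model_def=../models/bvlc_reference_caffenet/deploy.prototxt --raw_scale=255 _temp/det_input.txt _temp/det_output_caffenet.h5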

Anyway, let’s now load the ILSVRC13 detection class names and make a DataFrame of the predictions. Note you’ll need the auxiliary ILSVRC12 data fetched by data/ilsvrc12/get_ilsvrc_aux.sh.

with open('../data/ilsvrc12/det_synset_words.txt') as f:
    labels_df = pd.DataFrame([
        {
            'synset_id': l.strip().split(' ')[0],  # first token on each line is the synset id
            'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]  # keep only the first comma-separated synonym as the class label
        }
        for l in f.readlines()
    ])
# predictions_df will have one column per class (200 in all) and one row per
# proposal window; the class names become its column labels.
labels_df = labels_df.sort_values('synset_id')  # sort ascending by synset id and assign the result back
# (older pandas spelled this labels_df.sort('synset_id'); the original left the result unassigned,
# which only worked because the synset file is already in sorted order)
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])  # stack the 1570 per-window score vectors into a 1570x200 matrix
print(predictions_df.iloc[0])  # the 200 class scores of proposal window 0, indexed by class name

Let’s look at the activations.

plt.gray()
plt.matshow(predictions_df.values)  # .values is the bare 1570x200 score matrix, without labels
plt.xlabel('Classes')
plt.ylabel('Windows')  # one length-200 score vector per proposal window, 1570 rows in total

Now let’s take max across all windows and plot the top classes.

max_s = predictions_df.max(0)  # columnwise max over the 1570x200 matrix: the best score each class attains across all windows
max_s = max_s.sort_values(ascending=False)  # older pandas sorted in place via max_s.sort(ascending=False)
print(max_s[:10])

The top detections are in fact a person and bicycle.
Picking good localizations is a work in progress; we pick the top-scoring person and bicycle detections.

# Find, print, and display the top detections: person and bicycle.
i = predictions_df['person'].argmax()   # position of the window with the highest 'person' score among the 1570
j = predictions_df['bicycle'].argmax()  # likewise for 'bicycle'

# Show top predictions for top detection.
# Recall df is the 1570x5 DataFrame: prediction (200 class scores per window),
# plus the ymin, xmin, ymax and xmax window coordinates.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])  # window i's 200 scores, indexed by class name
print('Top detection:')
print(f.sort_values(ascending=False)[:5])  # the five highest-scoring classes (older pandas used f.order)
print('')

# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.sort_values(ascending=False)[:5])

# Show top detection in red, second-best top detection in blue.
im = plt.imread('images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()

det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']  # (corner, width, height); matplotlib image coordinates put the origin at the top-left
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))

det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))

That’s cool. Let’s take all ‘bicycle’ detections and NMS them to get rid of overlapping windows.

def nms_detections(dets, overlap=0.3):
    """
    Non-maximum suppression: Greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets: ndarray
        each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
    overlap: float
        maximum allowed overlap (IoU) with an already-selected box;
        boxes overlapping by more than this are suppressed (0.3 default)

    Output
    ------
    dets: ndarray
        remaining after suppression.
    """
    x1 = dets[:, 0]  # x1 coordinates of all detection boxes, shape (n,)
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    ind = np.argsort(dets[:, 4])  # indices that sort the scores ascending; the last entry is the highest-scoring box

    w = x2 - x1
    h = y2 - y1
    area = (w * h).astype(float)  # area of each proposal box

    pick = []
    while len(ind) > 0:
        i = ind[-1]     # index of the highest-scoring box still in play
        pick.append(i)  # keep it
        ind = ind[:-1]  # drop it from the candidate list

        # Intersection of the selected box with every remaining lower-scoring box:
        # elementwise max of the top-left corners and min of the bottom-right corners
        # (a scalar broadcast against a vector yields a vector of len(ind) regions).
        xx1 = np.maximum(x1[i], x1[ind])
        yy1 = np.maximum(y1[i], y1[ind])
        xx2 = np.minimum(x2[i], x2[ind])
        yy2 = np.minimum(y2[i], y2[ind])

        # Clamp negative widths/heights (disjoint boxes) to zero.
        w = np.maximum(0., xx2 - xx1)
        h = np.maximum(0., yy2 - yy1)

        wh = w * h
        o = wh / (area[i] + area[ind] - wh)  # IoU of box i with each remaining box: intersection over union

        ind = ind[np.nonzero(o <= overlap)[0]]  # keep only boxes whose IoU with the selected box is within the threshold; they survive to the next round

    return dets[pick, :]
scores = predictions_df['bicycle']  # the 1570 'bicycle' scores, one per proposal box
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
dets = np.hstack((windows, scores.values[:, np.newaxis]))  # append the score column (via .values, since newer pandas removed 2-D indexing on a Series): each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
nms_dets = nms_detections(dets)  # apply non-maximum suppression: keep high-scoring boxes that overlap the already-kept ones only slightly
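A quick sanity check of the function on hand-made boxes (all values invented for illustration): two heavily overlapping boxes collapse to the higher-scoring one, while a disjoint box survives.

toy = np.array([
    [  0,   0, 100, 100, 0.9],  # kept: highest score
    [ 10,  10, 110, 110, 0.8],  # suppressed: IoU with the first box is about 0.68 > 0.3
    [200, 200, 300, 300, 0.7],  # kept: no overlap with the first box
])
print(nms_detections(toy))  # expect the first and third rows to remain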

Show top 3 NMS’d detections for ‘bicycle’ in the image and note the gap between the top scoring box (red) and the remaining boxes.

plt.imshow(im)
currentAxis = plt.gca()  # handle to the current axes, used to add the detection rectangles
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
    currentAxis.add_patch(
        plt.Rectangle((det[0], det[1]), det[2]-det[0], det[3]-det[1],
        fill=False, edgecolor=c, linewidth=5)
    )
print('scores:', nms_dets[:3, 4])

This was an easy instance for bicycle as it was in the class’s training set. However, the person result is a true detection, since this image was not in the training set for that class.

You should try out detection on an image of your own next!

(Remove the temp directory to clean up, and we’re done.)

!rm -rf _temp
