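Contents
Notes on the Faster R-CNN paper
Main components
This figure is taken from the Faster R-CNN paper. The main components are as follows:
CNN layers. The structure of the convolutional network is as follows:
Faster R-CNN convolutional layers
There are 13 convolutional layers, each followed by a ReLU activation, and 4 pooling layers. This is the Caffe prototxt of the CNN part: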
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
}
}
layer {
name: "relu1_1"
type: "ReLU"
bottom: "conv1_1"
top: "conv1_1"
}
layer {
name: "conv1_2"
type: "Convolution"
bottom: "conv1_1"
top: "conv1_2"
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
}
}
layer {
name: "relu1_2"
type: "ReLU"
bottom: "conv1_2"
top: "conv1_2"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1_2"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
# intermediate layers omitted here #
layer {
name: "conv5_3"
type: "Convolution"
bottom: "conv5_2"
top: "conv5_3"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
}
}
layer {
name: "relu5_3"
type: "ReLU"
bottom: "conv5_3"
top: "conv5_3"
}
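The layer parameters in the prototxt above (3x3 kernels with pad 1 and stride 1, 2x2 max pooling with stride 2) fix how the spatial size changes through the network. A minimal sketch of that arithmetic, using the standard output-size formula (W - F + 2P)/S + 1; the function names are made up for illustration, and the sketch uses floor division on even sizes, whereas Caffe's pooling layer rounds up on odd sizes:

```python
def conv_out(size, kernel, pad, stride):
    """Output width/height of a convolution or pooling layer (floor division)."""
    return (size + 2 * pad - kernel) // stride + 1


def vgg16_conv5_3_size(size):
    """Spatial size at conv5_3: 3x3/pad-1 convs keep the size, 4 pools halve it."""
    for _ in range(4):                                      # pool1 .. pool4
        size = conv_out(size, kernel=3, pad=1, stride=1)    # any 3x3 conv: unchanged
        size = conv_out(size, kernel=2, pad=0, stride=2)    # 2x2 max pool: halved
    return conv_out(size, kernel=3, pad=1, stride=1)        # conv5_x: unchanged


print(conv_out(600, 3, 1, 1))     # 600
print(vgg16_conv5_3_size(608))    # 38, i.e. 608 / 16
```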
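As the prototxt shows, every convolution uses a kernel size of 3 and a pad of 1. From the output-size relation in cs231n#conv, (W - 3 + 2)/1 + 1 = W, so a convolution leaves the width and height of its input unchanged. The pooling layers use kernel size = 2 and stride = 2 with max pooling, so each pooling halves the width and height. With 4 pooling layers in total, the final convolutional output has 512 channels (VGG16) and the feature map is 1/16 the size of the rescaled input image. The final output of the convolutional layers is 'conv5_3'. It is fed one way into the RPN to compute the candidate boxes, and the other way into ROI pooling to extract the corresponding feature map regions for classification.
Region Proposal Networks (RPN)
This is the network in the model responsible for generating 'boxes'. Its input is an n×n sliding window over the CNN feature map, and its output is the boxes believed to contain an object together with their scores. The effective coverage of one sliding window is 228x228 pixels. After the anchor mapping each window becomes 9 boxes (by default 3 scales, (8, 16, 32) in the code, combined with the 3 aspect ratios [0.5:1, 1:1, 2:1]). The figure below, taken from the original paper for VGG, shows the sizes of the boxes produced by the anchors and how they adapt to wide and tall shapes. For a typical image the feature map has about 2,400 sliding-window positions, so the total number of anchors is around 20K (For a convolutional feature map of a size W × H (typically ~2,400), there are WHk anchors in total.) The anchor design is a key point: the image does not have to be resized to different sizes with the features recomputed each time, because the predictions of all anchors are based on the same features (The design of multiscale anchors is a key component for sharing features without extra cost for addressing scales.)
The RPN takes a 512xHxW feature map and, after one convolution, splits into 2 branches: one produces the scores of the K boxes (cls, 2 values per box: the FG and BG scores), and the other produces the corresponding boxes (bbox, 4 values per box describing the rectangle). The network structure is as follows: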
layer {
name: "rpn_conv/3x3"
type: "Convolution"
bottom: "conv5_3"
top: "rpn/output"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 512
kernel_size: 3 pad: 1 stride: 1
weight_filler { type: "gaussian" std: 0.01 }
bias_filler { type: "constant" value: 0 }
}
}
layer {
name: "rpn_relu/3x3"
type: "ReLU"
bottom: "rpn/output"
top: "rpn/output"
}
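Suppose the original training image has shape (3, h_original, w_original) and each batch holds one image. After resizing it becomes (1, 3, h_resized, w_resized), and after the CNN convolution and pooling operations it becomes (1, 512, h_conv, w_conv), where w_resized/16 = w_conv and h_resized/16 = h_conv. After 'rpn_conv/3x3' (F = 3, P = 1, S = 1) the size is still (1, 512, h_conv, w_conv), but the content has been transformed from the image feature map into the base values for the RPN (one extra transformation of the CNN feature map so that it suits the RPN loss). The number of sliding windows equals w_conv×h_conv, and the total number of anchors is w_conv×h_conv×k (k = 9). In other words, a single RPN convolution covers the feature map of the whole image when generating proposals, which is very fast thanks to the parallelism of the GPU. The output of 'rpn_conv/3x3' is the input of both 'rpn_bbox_pred' and 'rpn_cls_score'. 'rpn_cls_score' outputs shape (1, 18, h_conv, w_conv), where 18 corresponds to 2 scores for each of the 9 anchors. Because, for an input blob of shape (N, C, H, W), NxHxW must equal the number of predictions/labels, a reshape is needed here (with parameter shape { dim: 0 dim: 2 dim: -1 dim: 0 }), so before the cls loss and the softmax are computed the shape becomes (1, 2, 9×h_conv, w_conv). See the explanation in softmax_loss_layer.cpp:
The anchor scores of the whole image go one way into the loss computation and the other way through a softmax that gives the FG and BG probabilities. 'rpn_cls_prob' outputs (1, 2, 9*h_conv, w_conv); reshaping it back to (1, 18, h_conv, w_conv) gives the probabilities of the 9 anchors of every window, which are passed to the proposal layer together with the corresponding boxes. The other branch of the 'rpn_conv/3x3' output goes into 'rpn_bbox_pred', which computes the corresponding boxes (1, 36, h_conv, w_conv). 'rpn_bbox_pred' feeds the box loss on one side and the proposal layer on the other. The proposal layer combines the input probabilities and boxes to generate proposals for the ROI layer. The overall flow is as follows: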
RPN network
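The reshape around the softmax is easier to follow with concrete arrays. Below is a small NumPy sketch, not the Caffe layers themselves: the function name and the random input are made up for illustration. It reshapes (1, 18, H, W) scores to (1, 2, 9*H, W), applies a softmax over the 2-class axis, and reshapes back.

```python
import numpy as np


def rpn_cls_prob(scores, num_anchors=9):
    """scores: (1, 2*A, H, W) -> BG/FG probabilities with the same shape."""
    n, c, h, w = scores.shape
    assert c == 2 * num_anchors
    # (1, 18, H, W) -> (1, 2, 9*H, W): axis 1 now holds the BG/FG pair
    x = scores.reshape(n, 2, num_anchors * h, w)
    x = x - x.max(axis=1, keepdims=True)        # for numerical stability
    e = np.exp(x)
    prob = e / e.sum(axis=1, keepdims=True)     # softmax over the 2 classes
    return prob.reshape(n, c, h, w)             # back to (1, 18, H, W)


rng = np.random.default_rng(0)
scores = rng.normal(size=(1, 18, 4, 5)).astype(np.float32)
prob = rpn_cls_prob(scores)
# With this layout, channel a is the BG probability of anchor a
# and channel a + 9 is its FG probability, so each pair sums to 1.
pair_sum = prob[:, :9] + prob[:, 9:]
print(prob.shape, np.allclose(pair_sum, 1.0))
```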
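A brief introduction to the Caffe code framework
Overall structure of Caffe
To understand the implementation details of Faster R-CNN you need to understand the structure of Caffe and how to write your own custom layer.
Source layout
src holds the implementation of Caffe.
The structure is as follows:
Construction of Solver and Net
Solver is a base class that encapsulates Caffe's externally visible training and testing operations, similar to an optimizer in TensorFlow. On top of it sit solvers such as SGD and Adam, which differ in how the parameters are updated during backpropagation. Besides constructing the SGDSolver class directly, it can also be created from Python: self.solver = caffe.SGDSolver(solver_prototxt). The shared basic operations all live in the Solver class.
Let's look at the structure and operations through the construction of an SGDSolver. The SGDSolver constructor is implemented directly in the header file. It mainly clears the history, update and temporary parameter buffers; the main work is done in Solver.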
template <typename Dtype>
class SGDSolver : public Solver<Dtype> {
public:
explicit SGDSolver(const SolverParameter& param)
: Solver<Dtype>(param) { PreSolve(); }
explicit SGDSolver(const string& param_file)
: Solver<Dtype>(param_file) { PreSolve(); }
virtual inline const char* type() const { return "SGD"; }
template <typename Dtype>
void SGDSolver<Dtype>::PreSolve() {
// Initialize the history
const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params();
history_.clear();
update_.clear();
temp_.clear();
for (int i = 0; i < net_params.size(); ++i) {
const vector<int>& shape = net_params[i]->shape();
history_.push_back(shared_ptr<Blob<Dtype> >(new Blob<Dtype>(shape)));
update_.push_back(shared_ptr<Blob<Dtype> >(new Blob<Dtype>(shape)));
temp_.push_back(shared_ptr<Blob<Dtype> >(new Blob<Dtype>(shape)));
}
}
// history maintains the historical momentum data.
// update maintains update related data and is not needed in snapshots.
// temp maintains other information that might be needed in computation
// of gradients/updates and is not needed in snapshots
vector<shared_ptr<Blob<Dtype> > > history_, update_, temp_;
Next, the construction of Solver. By default root_solver = nullptr. void ReadSolverParamsFromTextFileOrDie(const string& param_file, SolverParameter* param) mainly deserializes the proto text into a SolverParameter object and handles compatibility with older versions. The main code is in Init.
template <typename Dtype>
Solver<Dtype>::Solver(const string& param_file, const Solver* root_solver)
: net_(), callbacks_(), root_solver_(root_solver),
requested_early_exit_(false) {
SolverParameter param;
ReadSolverParamsFromTextFileOrDie(param_file, &param);
Init(param);
}
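Init() performs the necessary initialization and checks, for example of iter_ and current_step_. The two are related by this->current_step_ = this->iter_ / this->param_.stepsize(); stepsize is specified in solver.prototxt and controls when the learning rate changes.
template <typename Dtype>
void Solver<Dtype>::Init(const SolverParameter& param) {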
CHECK(Caffe::root_solver() || root_solver_)
<< "root_solver_ needs to be set for all non-root solvers";
LOG_IF(INFO, Caffe::root_solver()) << "Initializing solver from parameters: "
<< std::endl << param.DebugString();
param_ = param;
CHECK_GE(param_.average_loss(), 1) << "average_loss should be non-negative.";
CheckSnapshotWritePermissions();
if (Caffe::root_solver() && param_.random_seed() >= 0) {
Caffe::set_random_seed(param_.random_seed());
}
// Scaffolding code
InitTrainNet();
if (Caffe::root_solver()) {
InitTestNets();
LOG(INFO) << "Solver scaffolding done.";
}
iter_ = 0;
current_step_ = 0;
}
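The relation between iter_, stepsize and the learning rate can be sketched in a few lines. This assumes the solver uses Caffe's "step" learning-rate policy, where the rate is base_lr * gamma ^ floor(iter / stepsize); the numeric values below are example values chosen for illustration, not read from a particular solver.prototxt.

```python
def step_learning_rate(base_lr, gamma, stepsize, it):
    """Learning rate under a 'step' policy: drop by gamma every stepsize iterations."""
    current_step = it // stepsize          # same integer division as current_step_
    return base_lr * gamma ** current_step


for it in (0, 49999, 50000, 100000):
    print(it, step_learning_rate(base_lr=0.001, gamma=0.1, stepsize=50000, it=it))
```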
Next, look at InitTrainNet(). Pseudocode is used here to highlight the key points and the flow; the log line below shows which path the code takes:
solver.cpp:81] Creating training net from train_net file: models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt
void Solver::InitTrainNet() {
// check the training parameters: which training net is configured, whether a train net file is given, and so on
deserialize train net file -> net_param
net_.reset(new Net(net_param));
}
The key part is the initialization of Net. The extracted pseudocode is as follows:
void Net::Init(const NetParameter& in_param) {
// filter the parameters
FilterNet(in_param, &filtered_param);
// Create a copy of filtered_param with splits added where necessary.
NetParameter param;
InsertSplits(filtered_param, &param);
memory_used_ = 0;
// set the input blobs
for (int input_id = 0; input_id < param.input_size(); ++input_id) {
const int layer_id = -1;
// inputs have fake layer ID -1; set up the input data blobs
// Helper for Net::Init: add a new input or top blob to the net. (Inputs have
// layer_id == -1, tops have layer_id >= 0.)
// builds the key members: vector<shared_ptr<Blob<Dtype> > > blobs_ (@brief the blobs storing intermediate results between the layer.), blob_names_, blob_need_backward_, net_input_blob_indices_, net_input_blobs_ and so on
AppendTop(param, layer_id, input_id, &available_blobs, &blob_name_to_idx);
}
for (int layer_id = 0; layer_id < param.layer_size(); ++layer_id) {
// construct each layer. This uses the factory pattern: macros put the constructor functions into a registry. The layer sets up its blobs_ here, and later the net shares references to those blobs along several dimensions
layers_.push_back(LayerRegistry<Dtype>::CreateLayer(layer_param));
// Figure out this layer's input and output
for (int bottom_id = 0; bottom_id < layer_param.bottom_size();
++bottom_id) {
// build the input blobs of each layer; bottom_vecs_ and blobs_ share the blob objects through pointers
const int blob_id = AppendBottom(param, layer_id, bottom_id,&available_blobs, &blob_name_to_idx);
// If a blob needs backward, this layer should provide it.
need_backward |= blob_need_backward_[blob_id];
}
// set up the outputs of each layer; top_vecs_ and blobs_ share the blob objects through pointers
for (int top_id = 0; top_id < num_top; ++top_id) {
AppendTop(param, layer_id, top_id, &available_blobs,&blob_name_to_idx);
}
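// depending on the network, layer->AutoTopBlobs() creates the automatically generated top blobs
// call the setup function of each layer
layers_[layer_id]->SetUp(bottom_vecs_[layer_id], top_vecs_[layer_id]);
// set the backward-propagation flag according to whether a learning rate is set for each parameter in the layer, and build the parameters of each layer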
for (int param_id = 0; param_id < num_param_blobs; ++param_id) {
layers_[layer_id]->set_param_propagate_down(param_id, param_need_backward);
AppendParam(param, layer_id, param_id);
}
}
// Handle force_backward if needed.
for (int layer_id = layers_.size() - 1; layer_id >= 0; --layer_id) {
set layer_contributes_loss flag
set layer_need_backward_
}
// In the end, all remaining blobs are considered output blobs.
for (set<string>::iterator it = available_blobs.begin();
it != available_blobs.end(); ++it) {
net_output_blobs_.push_back(blobs_[blob_name_to_idx[*it]].get());
net_output_blob_indices_.push_back(blob_name_to_idx[*it]);
}
LOG_IF(INFO, Caffe::root_solver()) << "Network initialization done.";
}
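This completes the initialization chain solver -> net -> layer. How the implementation of each concrete layer (convolution, pooling, custom layers) plugs into the framework is analysed later. The whole process is illustrated below:
SGDSolver construction
One training step
Once the network is built, training can start. The usual training procedure is: read a batch of data -> forward propagation -> compute the loss against the ground truth -> backpropagate the partial derivatives to every trainable layer and update the parameters according to the training strategy.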
while (cur < max_repeat){
data, result_ground_truth = read_data()
result_calc = forward_propagation(data);
loss = calc_loss(result_calc, result_ground_truth);
dws = compute_partial_derivative_4w(loss)
update_w_by_strategy()
}
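The pseudocode loop above can be made concrete with a toy problem. The sketch below is not Caffe code: it fits a two-parameter linear model with plain SGD in NumPy, and the data, learning rate and variable names are all made up for illustration. The four stages (read data, forward, loss, backward and update) map one to one onto the pseudocode.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])   # the weights the toy model should recover
w = np.zeros(2)                  # trainable parameters
lr = 0.1
losses = []

for step in range(200):
    x = rng.normal(size=(16, 2))          # read a batch of data
    y = x @ true_w                        # its ground truth
    pred = x @ w                          # forward propagation
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)        # loss against the ground truth
    grad = x.T @ err / len(x)             # backward: d(loss)/d(w)
    w -= lr * grad                        # update by the SGD strategy
    losses.append(loss)

print(losses[0] > losses[-1], np.round(w, 2))
```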
Caffe wraps one training iteration as a step, and SGDSolver calls Solver's Step directly. The key parts of the code are extracted below:
void Solver::Step(int iters) {
end_iter = cur + iters
while (cur < end_iter){
clear_up()
insert_test_if_need()
hookup_before()
Dtype loss = 0;
for (int i = 0; i < param_.iter_size(); ++i) {
loss += net_->ForwardBackward(bottom_vec);
}
loss /= param_.iter_size();
// average the loss across iterations for smoothed reporting. If average_loss is n, the losses_ container stores the previous n loss values, and smoothed_loss_ is effectively the average of those losses
UpdateSmoothedLoss(loss, start_iter, average_loss);
hookup_after()
ApplyUpdate();
take_snapshot_if_necessary()
}
}
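The two kinds of averaging in Step can be sketched separately: the loss of one step is the mean over iter_size forward/backward passes, and the reported loss is a moving average over the last average_loss steps. The class below is a simplified stand-in written for illustration (a plain sliding window), not a port of Caffe's UpdateSmoothedLoss.

```python
from collections import deque


class SmoothedLoss:
    """Moving average over the most recent `average_loss` step losses."""

    def __init__(self, average_loss):
        self.window = deque(maxlen=average_loss)

    def update(self, loss):
        self.window.append(loss)          # the oldest value drops out automatically
        return sum(self.window) / len(self.window)


def step_loss(batch_losses):
    """Average the losses of the iter_size forward/backward passes of one step."""
    return sum(batch_losses) / len(batch_losses)


smooth = SmoothedLoss(average_loss=3)
history = [smooth.update(step_loss(b)) for b in ([4.0, 2.0], [2.0], [1.0], [0.0])]
print(history)   # [3.0, 2.5, 2.0, 1.0]
```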
Clearly the key is net_'s ForwardBackward(const vector<Blob<Dtype>*>& bottom).
First, look at Net's ForwardBackward(const vector<Blob<Dtype>*>& bottom):
Dtype ForwardBackward(const vector<Blob<Dtype>* > & bottom) {
Dtype loss;
Forward(bottom, &loss);
Backward();
return loss;
}
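One thing looks a little odd here: the vector (bottom_vec) declared in Step(int iters) is passed in as bottom without anything being put into it, so the copy loop in Forward leaves net_input_blobs_ with nothing copied into it.
template <typename Dtype>
const vector<Blob<Dtype>*>& Net<Dtype>::Forward(
const vector<Blob<Dtype>*> & bottom, Dtype* loss) {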
// Copy bottom to internal bottom
for (int i = 0; i < bottom.size(); ++i) {
net_input_blobs_[i]->CopyFrom(*bottom[i]);
}
return ForwardPrefilled(loss);
}
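ForwardPrefilled(Dtype* loss) in turn calls ForwardFromTo(int start, int end). A forward pass over the whole network is needed here, so the call is *loss = ForwardFromTo(0, layers_.size() - 1);. With the redundant checks and debug output removed, the code is very compact: it chains the layers in order for the forward pass and accumulates their losses, so each layer only needs to implement its own Forward function.
Dtype Net::ForwardFromTo(int start, int end) {
Dtype loss = 0;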
for (int i = start; i <= end; ++i) {
// LOG(ERROR) << "Forwarding " << layer_names_[i];
Dtype layer_loss = layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
loss += layer_loss;
}
return loss;
}
After Forward(bottom, &loss); completes, backpropagation follows with Backward(). Apart from printing debug information, Backward() just calls BackwardFromTo(layers_.size() - 1, 0);
void Net::BackwardFromTo(int start, int end) {
for (int i = start; i >= end; --i) {
if (layer_need_backward_[i]) {
layers_[i]->Backward(top_vecs_[i], bottom_need_backward_[i], bottom_vecs_[i]);
}
}
}
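Each layer implements its own Backward function (this is the function to write when customising a Caffe layer). From the loss derivatives (error gradient) of the blobs above it, the layer computes the derivatives with respect to its own inputs; propagate_down indicates whether the loss derivative should be computed for the corresponding 'bottom'. The function prototype is as follows: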
/**
* @brief Given the top blob error gradients, compute the bottom blob error
* gradients.
*
* @param top
* the output blobs, whose diff fields store the gradient of the error
* with respect to themselves
* @param propagate_down
* a vector with equal length to bottom, with each index indicating
* whether to propagate the error gradients down to the bottom blob at
* the corresponding index
* @param bottom
* the input blobs, whose diff fields will store the gradient of the error
* with respect to themselves after Backward is run
*
* The Backward wrapper calls the relevant device wrapper function
* (Backward_cpu or Backward_gpu) to compute the bottom blob diffs given the
* top blob diffs.
*
* Your layer should implement Backward_cpu and (optionally) Backward_gpu.
*/
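inline void Backward(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom);
After this backward pass, bottom_vecs_ holds the derivative information. One point worth noting: net_ contains all the information (derivatives, parameters, intermediate inputs and outputs), while bottom_vecs_ only points to some of the blocks in blobs_.
/// @brief the blobs storing intermediate results between the layer.
vector<shared_ptr<Blob<Dtype> > > blobs_;
/// bottom_vecs stores the vectors containing the input for each layer.
/// They don't actually host the blobs (blobs_ does), so we simply store
/// pointers.
vector<vector<Blob<Dtype>*> > bottom_vecs_;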
bottom_vecs_[layer_id].push_back(blobs_[blob_id].get());
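This completes one forward pass to compute the loss and one backward pass to compute the error gradient. What remains is how the parameters are updated; take plain SGD as an example:
template <typename Dtype>
void SGDSolver<Dtype>::ApplyUpdate() {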
Dtype rate = GetLearningRate();
ClipGradients();
for (int param_id = 0; param_id < this->net_->learnable_params().size();
++param_id) {
Normalize(param_id);
Regularize(param_id);
ComputeUpdateValue(param_id, rate);
}
this->net_->Update();
}
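For what clip gradient means in Caffe, see the linked discussion "What does clip gradient mean in Caffe?". Roughly speaking it is a speed limit on the gradients, and it does not affect the main flow.
Every learnable parameter goes through Normalize and Regularize, and is then updated. Earlier, during Init, AppendParam(net_param, layer_id, param_id); was called for each layer to build the mapping: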
params_.push_back(layers_[layer_id]->blobs()[param_id]);
if (xx condition){
...
const int learnable_param_id = learnable_params_.size();
learnable_params_.push_back(params_[net_param_id].get());
...
}
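Updating the parameters is an axpy operation on the learnable blobs. In CPU mode it typically calls the BLAS routine cblas_daxpy(N, alpha, X, 1, Y, 1), and in GPU mode cublasSaxpy(Caffe::cublas_handle(), N, &alpha, X, 1, Y, 1). The operation data = alpha*diff + data completes the parameter update:
Blob parameter update from the error gradient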
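Both operations are small enough to write out in NumPy. This is a sketch for illustration rather than Caffe's code: clip_gradients rescales all gradients when their joint L2 norm exceeds a threshold, which is my reading of the "speed limit", and axpy is the in-place y = alpha * x + y. The coefficient -1 in the usage below assumes the learning rate has already been folded into diff before the update.

```python
import numpy as np


def clip_gradients(diffs, clip_threshold):
    """Scale all gradients in place if their joint L2 norm exceeds the threshold."""
    l2norm = np.sqrt(sum(float(np.sum(d ** 2)) for d in diffs))
    if clip_threshold > 0 and l2norm > clip_threshold:
        scale = clip_threshold / l2norm
        for d in diffs:
            d *= scale
    return l2norm


def axpy(alpha, x, y):
    """y = alpha * x + y, in place (what a BLAS axpy routine computes)."""
    y += alpha * x


data = np.array([1.0, 2.0])
diff = np.array([3.0, 4.0])                       # joint L2 norm is 5
norm = clip_gradients([diff], clip_threshold=1.0) # diff is rescaled to norm 1
axpy(-1.0, diff, data)                            # data = data - diff
print(norm, diff, data)
```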
At this point one iteration, forward -> loss & backward -> update, is broadly clear.
template <typename Dtype>
class LayerRegistry {
public:
// function pointer type definition
typedef shared_ptr<Layer<Dtype> > (*Creator)(const LayerParameter&);
typedef std::map<string, Creator> CreatorRegistry;
static CreatorRegistry& Registry() {
// global map used to find a layer's creator function pointer by name
static CreatorRegistry* g_registry_ = new CreatorRegistry();
return *g_registry_;
}
// Adds a creator, i.e. registers a layer type
static void AddCreator(const string& type, Creator creator) {
// check exist ...
registry[type] = creator;
}
// Get a layer using a LayerParameter, i.e. construct a new layer object
static shared_ptr<Layer<Dtype> > CreateLayer(const LayerParameter& param) {
// routine checks
return registry[type](param);
}
private:
// private constructor: keeps the registry a singleton that is never instantiated
LayerRegistry() {}
};
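LayerRegistry holds the registered entries and is filled through LayerRegisterer. The code is as follows:
template <typename Dtype>
class LayerRegisterer {
public:
LayerRegisterer(const string& type,
shared_ptr<Layer<Dtype> > (*creator)(const LayerParameter&)) {
LayerRegistry<Dtype>::AddCreator(type, creator);
}
};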
#define REGISTER_LAYER_CREATOR(type, creator) \
static LayerRegisterer<float> g_creator_f_##type(#type, creator<float>); \
static LayerRegisterer<double> g_creator_d_##type(#type, creator<double>)
#define REGISTER_LAYER_CLASS(type) \
template <typename Dtype> \
shared_ptr<Layer<Dtype> > Creator_##type##Layer(const LayerParameter& param) \
{ \
return shared_ptr<Layer<Dtype> >(new type##Layer<Dtype>(param)); \
} \
REGISTER_LAYER_CREATOR(type, Creator_##type##Layer)
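As soon as the LayerRegisterer constructor is called, the creator is put into the LayerRegistry class factory, and objects can be instantiated afterwards. Caffe uses macros to generate this code, and that is how custom layers are added to the framework; see the comments in layer_factory.hpp.
layer_factory.hpp
In other words, adding the REGISTER_LAYER_CLASS macro to the layer's cpp file is enough. Similarly, when building nginx with a self-written plug-in, which phases the plug-in covers is also decided by code whose compilation is controlled through this kind of macro.
roi_pooling_layer.cpp
template <typename Dtype>
shared_ptr<Layer<Dtype> > Creator_ROIPoolingLayer(const LayerParameter& param)
{
return shared_ptr<Layer<Dtype> >(new ROIPoolingLayer<Dtype>(param));
}
// this calls the LayerRegisterer constructor, which adds the creator to LayerRegistry; one registerer is created for float and one for double
static LayerRegisterer<float> g_creator_f_ROIPooling("ROIPooling", Creator_ROIPoolingLayer<float>);
static LayerRegisterer<double> g_creator_d_ROIPooling("ROIPooling", Creator_ROIPoolingLayer<double>);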
layer {
name: 'input-data'
# layer type
type: 'Python'
top: 'data'
top: 'im_info'
top: 'gt_boxes'
python_param {
# Python module (file)
module: 'roi_data_layer.layer'
# class inside the module
layer: 'RoIDataLayer'
# parameters passed to the Python layer
param_str: "'num_classes': 21"
}
}
The above is the definition of a simple Python layer. Leaving its meaning aside for now, look at the interface first: as with a C++ layer, forward, backward, setup and reshape need to be implemented.
class RoIDataLayer(caffe.Layer):
    def setup(self, bottom, top):
        """Setup the RoIDataLayer."""
        layer_params = yaml.load(self.param_str_)
        # parameters defined in the prototxt are passed into the code
        self._num_classes = layer_params['num_classes']
        ...
    def forward(self, bottom, top):
        """Get blobs and copy them into this layer's top blob vector."""
        blobs = self._get_next_minibatch()
        for blob_name, blob in blobs.iteritems():
            top_ind = self._name_to_top_map[blob_name]
            # Reshape net's input blobs
            top[top_ind].reshape(*(blob.shape))
            # Copy data into net's input blobs
            top[top_ind].data[...] = blob.astype(np.float32, copy=False)
    def backward(self, top, propagate_down, bottom):
        """This layer does not propagate gradients."""
        pass
    def reshape(self, bottom, top):
        """Reshaping happens during the call to forward."""
        pass
Of course, a Python layer can only run in CPU mode and cannot use the GPU efficiently, so a suitable trade-off has to be made in practice.
Faster R-CNN code analysis
time ./tools/train_net.py --gpu ${GPU_ID} \
--solver models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt \
--weights data/imagenet_models/${NET}.v2.caffemodel \
--imdb ${TRAIN_IMDB} \
--iters ${ITERS} \
--cfg experiments/cfgs/faster_rcnn_end2end.yml \
${EXTRA_ARGS}
time ./tools/test_net.py --gpu ${GPU_ID} \
--def models/${PT_DIR}/${NET}/faster_rcnn_end2end/test.prototxt \
--net ${NET_FINAL} \
--imdb ${TEST_IMDB} \
--cfg experiments/cfgs/faster_rcnn_end2end.yml \
${EXTRA_ARGS}
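The training entry point is train_net.py and the test entry point is test_net.py. The important logic extracted from train_net.py is as follows:
import caffe
self.solver = caffe.SGDSolver(solver_prototxt)
while self.solver.iter < max_iters:
    # Make one SGD update
    self.solver.step(1)
    take_snapshot_if_necessary()
return model_paths
The SGDSolver initialization and the Step flow were covered earlier. The line import caffe already brings in everything that is needed, yet there is no caffe.py file anywhere in Caffe's python directory. In fact import can import not only a .py file but also a directory, as long as the directory contains an __init__.py (I would not have learned about this Python feature without studying Caffe; see what-is-init-py-for). The directory structure of python/caffe:
python/caffe
Look at __init__.py:
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver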
from ._caffe import set_mode_cpu, set_mode_gpu, set_device, Layer, get_solver, layer_type_list, set_random_seed
from ._caffe import __version__
from .proto.caffe_pb2 import TRAIN, TEST
from .classifier import Classifier
from .detector import Detector
from . import io
from .net_spec import layers, params, NetSpec, to_proto
As can be seen, SGDSolver is obtained from pycaffe:
from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
RMSPropSolver, AdaDeltaSolver, AdamSolver
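The "import a directory" mechanism is easy to try in isolation. The sketch below builds a throwaway package in a temporary directory whose __init__.py re-exports a class from a private submodule, the same shape as caffe/__init__.py re-exporting names from pycaffe. The package name toypkg, the submodule _impl and the stub class are all made up for this illustration.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "toypkg")          # hypothetical package directory
os.makedirs(pkg_dir)

# A private submodule that holds the real definition.
with open(os.path.join(pkg_dir, "_impl.py"), "w") as f:
    f.write("class SGDSolver(object):\n    kind = 'SGD'\n")

# __init__.py makes the directory importable and re-exports the name.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from ._impl import SGDSolver\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
toypkg = importlib.import_module("toypkg")      # imports the directory as a package
print(toypkg.SGDSolver.kind)
```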
_caffe.so is compiled from _caffe.cpp. Looking at the code of _caffe.cpp, it is a Python module built with Boost.Python that maps Python functions and classes onto C++ functions and classes. The key parts of the code are:
namespace bp = boost::python;
// Selecting mode.
void set_mode_gpu() { Caffe::set_mode(Caffe::GPU); }
// the module name is _caffe, so it compiles to the Python module _caffe.so
BOOST_PYTHON_MODULE(_caffe) {
// attribute of the imported caffe module
bp::scope().attr("__version__") = AS_STRING(CAFFE_VERSION);
// function mapping
bp::def("set_mode_gpu", &set_mode_gpu);
// class mapping; bp::no_init means no constructor is exposed to Python
bp::class_<Solver<Dtype>, shared_ptr<Solver<Dtype> >, boost::noncopyable>(
"Solver", bp::no_init)
// property mapping
.add_property("net", &Solver<Dtype>::net)
.add_property("test_nets", bp::make_function(&Solver<Dtype>::test_nets,
bp::return_internal_reference<>()))
.add_property("iter", &Solver<Dtype>::iter)
.def("solve", static_cast<void (Solver<Dtype>::*)(const char*)>(
&Solver<Dtype>::Solve), SolveOverloads())
// key function
.def("step", &Solver<Dtype>::Step)
.def("restore", &Solver<Dtype>::Restore)
.def("snapshot", &Solver<Dtype>::Snapshot);
// SGDSolver inherits from Solver and needs a constructor taking a string: explicit SGDSolver(const string& param_file) : Solver(param_file) { PreSolve(); }
bp::class_<SGDSolver<Dtype>, bp::bases<Solver<Dtype> >,
shared_ptr<SGDSolver<Dtype> >, boost::noncopyable>(
"SGDSolver", bp::init<string>());
}
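This completes the chain of the whole flow from Python to C++.
input-data layer
The purpose of this layer is to read in the data and preprocess it. Its outputs are: the image content (index 0, name 'data'); the image height and width plus the scale factor (index 1, name 'im_info'); and the labels with the ground-truth box information (index 2, name 'gt_boxes'), as shown in the figure.
input-data outputs
data/im_info/gt_boxes are fed to the 'rpn-data' layer, which produces what the score loss needs; data is fed to the convolutional layers; gt_boxes is fed to the 'roi-data' layer (which combines it with the proposals to output ROIs); and im_info is fed to the 'proposal' layer to generate proposals.
layer {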
name: 'input-data'
type: 'Python'
top: 'data'
top: 'im_info'
top: 'gt_boxes'
python_param {
module: 'roi_data_layer.layer'
layer: 'RoIDataLayer'
param_str: "'num_classes': N"
}
}
The code is in roi_data_layer/layer.py:
def forward(self, bottom, top):
    """Get blobs and copy them into this layer's top blob vector."""
    # get the blob data as key-value pairs; the name decides which top each blob is written to
    blobs = self._get_next_minibatch()
    for blob_name, blob in blobs.iteritems():
        top_ind = self._name_to_top_map[blob_name]
        # Reshape net's input blobs
        top[top_ind].reshape(*(blob.shape))
        # Copy data into net's input blobs
        top[top_ind].data[...] = blob.astype(np.float32, copy=False)
In _get_next_minibatch, USE_PREFETCH is off by default; the author found it of little use ('So far I haven't found this useful; likely more engineering work is required'). The code checks whether the batch being fetched starts a new epoch and, if so, shuffles. For better performance the shuffle groups the images into landscape and portrait. What it obtains are roidb records, and get_minibatch in minibatch.py turns them into the complete data. One point to note: the effective configuration is config.py merged with the experiments/cfgs/faster_rcnn_end2end.yml given in the script, so check the log for the values actually in effect ('IMS_PER_BATCH': 1).
def _get_next_minibatch_inds(self):
    """Return the roidb indices for the next minibatch."""
    if self._cur + cfg.TRAIN.IMS_PER_BATCH >= len(self._roidb):
        self._shuffle_roidb_inds()
    # _perm holds the shuffled order of the indices
    db_inds = self._perm[self._cur:self._cur + cfg.TRAIN.IMS_PER_BATCH]
    self._cur += cfg.TRAIN.IMS_PER_BATCH
    return db_inds
def _get_next_minibatch(self):
    """Return the blobs to be used for the next minibatch.
    If cfg.TRAIN.USE_PREFETCH is True, then blobs will be computed in a
    separate process and made available through self._blob_queue.
    """
    if cfg.TRAIN.USE_PREFETCH:
        return self._blob_queue.get()
    else:
        # roidb indices of this batch
        db_inds = self._get_next_minibatch_inds()
        # roidb records
        minibatch_db = [self._roidb[i] for i in db_inds]
        # turn the records into the image data, box information, label information, and image size and scale information
        return get_minibatch(minibatch_db, self._num_classes)
def _shuffle_roidb_inds(self):
    """Randomly permute the training roidb."""
    # Make minibatches from images that have similar aspect ratios (i.e. both tall and thin or both short and wide) in order to avoid wasting computation on zero-padding.
    if cfg.TRAIN.ASPECT_GROUPING:
        widths = np.array([r['width'] for r in self._roidb])
        heights = np.array([r['height'] for r in self._roidb])
        horz = (widths >= heights)
        vert = np.logical_not(horz)
        # landscape images
        horz_inds = np.where(horz)[0]
        # portrait images
        vert_inds = np.where(vert)[0]
        inds = np.hstack((
            np.random.permutation(horz_inds),
            np.random.permutation(vert_inds)))
        # groups of 2; in the vast majority of groups both images have the same orientation
        inds = np.reshape(inds, (-1, 2))
        row_perm = np.random.permutation(np.arange(inds.shape[0]))
        # shuffle in units of pairs and flatten back to one dimension, so neighbours have the same shape; the pairs of two are presumably because the default is __C.TRAIN.IMS_PER_BATCH = 2
        inds = np.reshape(inds[row_perm, :], (-1,))
        self._perm = inds
    else:
        self._perm = np.random.permutation(np.arange(len(self._roidb)))
    self._cur = 0
Here is some log output to help understand the code:
horz = [ True True True ..., True True True], vert = [False False False ..., False False False]
horz_inds = [ 0 1 2 ..., 186205 186206 186207], vert_inds = [ 6 43 65 ..., 186176 186186 186194]
inds = [163257 59770 49424 ..., 56475 31817 126653]
inds = [[163257 59770]
[ 49424 41168]
[156295 1803]
...,
[ 99367 20315]
[142904 56475]
[ 31817 126653]]
row_perm = [77629 51661 58201 ..., 91810 47169 48787]
inds = [118195 143322 121405 ..., 19415 18933 26468]
This returns the roidb indices of one batch; the corresponding records are looked up in _roidb and get_minibatch reads them. The following is pseudocode:
def get_minibatch(roidb, num_classes):
    """Given a roidb, construct a minibatch sampled from it."""
    num_images = len(roidb)
    # Sample random scales to use for each image in this batch
    # SCALES actually contains a single value, 600; it is written this way to support scaling to several sizes
    random_scale_inds = npr.randint(0, high=len(cfg.TRAIN.SCALES),
                                    size=num_images)
    # num_images here is IMS_PER_BATCH, which the yml sets to 1
    rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
    fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)
    # Get the input image blob, formatted for caffe
    # pass in the roidb records and the scale indices
    im_blob, im_scales = _get_image_blob(roidb, random_scale_inds)
    # data blob laid out as (batch index, C, H, W)
    blobs = {'data': im_blob}
    # Faster R-CNN mainly uses the RPN path
    if cfg.TRAIN.HAS_RPN:
        gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
        gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
        # label boxes times the scale factor = boxes in the coordinates of the rescaled input
        gt_boxes[:, 0:4] = roidb[0]['boxes'][gt_inds, :] * im_scales[0]
        # assign the class labels alongside
        gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
        blobs['gt_boxes'] = gt_boxes
        # 'im_info' = (H, W, im_scale)
        blobs['im_info'] = np.array(
            [[im_blob.shape[2], im_blob.shape[3], im_scales[0]]],
            dtype=np.float32)
    return blobs
_get_image_blob is in minibatch.py. It handles the scaling and converts the image data read by OpenCV imread into a blob:
def _get_image_blob(roidb, scale_inds):
    """Builds an input blob from the images in the roidb at the specified
    scales.
    """
    num_images = len(roidb)
    processed_ims = []
    im_scales = []
    for i in xrange(num_images):
        im = cv2.imread(roidb[i]['image'])
        # target_size = 600
        target_size = cfg.TRAIN.SCALES[scale_inds[i]]
        # rescale; returns the image and the scale factor
        im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, target_size,
                                        cfg.TRAIN.MAX_SIZE)
        im_scales.append(im_scale)
        processed_ims.append(im)
    # Create a blob to hold the input images
    # format conversion
    blob = im_list_to_blob(processed_ims)
    return blob, im_scales
prep_im_for_blob and im_list_to_blob are both functions in blob under utils:
def im_list_to_blob(ims):
    """Convert a list of images into a network input.
    Assumes images are already prepared (means subtracted, BGR order, ...).
    """
    # an image has shape H * W * channels; take the largest extent over all images (np.array([(100, 5, 3), (110, 4, 3)]).max(axis=0) --> array([110, 5, 3]))
    max_shape = np.array([im.shape for im in ims]).max(axis=0)
    num_images = len(ims)
    blob = np.zeros((num_images, max_shape[0], max_shape[1], 3),
                    dtype=np.float32)
    for i in xrange(num_images):
        im = ims[i]
        # (index, H, W, C)
        blob[i, 0:im.shape[0], 0:im.shape[1], :] = im
    # Move channels (axis 3) to axis 1
    # Axis order will become: (batch elem, channel, height, width)
    channel_swap = (0, 3, 1, 2)
    # reorder the axes of the array
    blob = blob.transpose(channel_swap)
    return blob
def prep_im_for_blob(im, pixel_means, target_size, max_size):
    """Mean subtract and scale an image for use in a blob."""
    # type(im) = numpy array, uint8 -> float
    im = im.astype(np.float32, copy=False)
    # mean subtraction as preprocessing
    im -= pixel_means
    im_shape = im.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    # scale factor: original W/H * scale = target image size; the short side is scaled to 600
    im_scale = float(target_size) / float(im_size_min)
    # Prevent the biggest axis from being more than MAX_SIZE
    # the image has a maximum size, 1000 by default; if the scale above would exceed it, scale by the maximum allowed size instead
    if np.round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / float(im_size_max)
    im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
                    interpolation=cv2.INTER_LINEAR)
    return im, im_scale
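The scale rule in prep_im_for_blob does not need an image to try out. The sketch below isolates only the scale computation (the function name compute_im_scale is made up for illustration), using the defaults mentioned above, a 600-pixel short side and a 1000-pixel cap on the long side:

```python
def compute_im_scale(height, width, target_size=600, max_size=1000):
    """Scale factor that brings the short side to target_size, capped by max_size."""
    size_min = min(height, width)
    size_max = max(height, width)
    scale = float(target_size) / size_min
    # If the long side would exceed max_size, scale by the long side instead.
    if round(scale * size_max) > max_size:
        scale = float(max_size) / size_max
    return scale


print(compute_im_scale(375, 500))    # 1.6: short side 375 -> 600, long side 500 -> 800
print(compute_im_scale(300, 1000))   # 1.0: 2.0 would give a 2000-pixel long side
```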
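At this point the input layer is mostly clear. It also explains why the forward pass seen earlier did not assign the input blobs (Dtype ForwardBackward(const vector<Blob<Dtype>*>& bottom)): the 'input-data' layer fills its own top blobs in forward.
layer {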
name: 'rpn-data'
type: 'Python'
bottom: 'rpn_cls_score'
bottom: 'gt_boxes'
bottom: 'im_info'
bottom: 'data'
top: 'rpn_labels'
top: 'rpn_bbox_targets'
top: 'rpn_bbox_inside_weights'
top: 'rpn_bbox_outside_weights'
python_param {
module: 'rpn.anchor_target_layer'
layer: 'AnchorTargetLayer'
param_str: "'feat_stride': 16"
}
}
rpn-data
There is only one parameter, the stride. The class is AnchorTargetLayer in anchor_target_layer, and it implements setup and forward. This layer outputs the boxes and labels used to compute the loss below. It is not trainable, so backward and reshape are empty implementations. Starting with setup, the code is as follows:
def setup(self, bottom, top):
    layer_params = yaml.load(self.param_str_)
    # not specified in the prototxt, so the default anchor scales are used
    anchor_scales = layer_params.get('scales', (8, 16, 32))
    # the K (9) boxes of one sliding-window position, as (top-left, bottom-right) coordinates
    self._anchors = generate_anchors(scales=np.array(anchor_scales))
    self._num_anchors = self._anchors.shape[0]
    self._feat_stride = layer_params['feat_stride']
    # allow boxes to sit over the edge by a small amount
    self._allowed_border = layer_params.get('allowed_border', 0)
    height, width = bottom[0].data.shape[-2:]
    A = self._num_anchors
    # labels
    top[0].reshape(1, 1, A * height, width)
    # bbox_targets
    top[1].reshape(1, A * 4, height, width)
    # bbox_inside_weights
    top[2].reshape(1, A * 4, height, width)
    # bbox_outside_weights
    top[3].reshape(1, A * 4, height, width)
generate_anchors is in generate_anchors.py and is implemented with numpy:
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """
    # base anchor: np array [0, 0, 15, 15]
    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    # expand by aspect ratio: wide, square and tall boxes
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    # expand the size on top of the base anchor: x8, x16, x32
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in xrange(ratio_anchors.shape[0])])
    return anchors
def _ratio_enum(anchor, ratios):
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """
    # convert to w, h and centre coordinates
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    # original area
    size = w * h
    # the base anchor is a square; with side n, new w = n/√ratio and new h = n*√ratio. The new sides keep the area roughly unchanged (ignoring the rounding) and give h/w = ratio, so with a nearly constant area the width and height follow the configured ratio, a bit like stretching or squashing the box
    size_ratios = size / ratios
    ws = np.round(np.sqrt(size_ratios))
    hs = np.round(ws * ratios)
    # convert back to corner coordinates, the inverse of _whctrs
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
# w and h are each multiplied by the scale, so the area grows by the square of each element of scales
def _scale_enum(anchor, scales):
    """
    Enumerate a set of anchors for each scale wrt an anchor.
    """
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
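To see the 9 anchors as numbers, here is a compact, self-contained rewrite of the same enumeration for Python 3. It is a sketch that follows the logic of the functions above; the helpers whctrs and mkanchors stand in for _whctrs and _mkanchors, whose bodies are not shown in the excerpt, so their exact form here is my assumption.

```python
import numpy as np


def whctrs(anchor):
    """(x1, y1, x2, y2) -> width, height, x centre, y centre."""
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    return w, h, anchor[0] + 0.5 * (w - 1), anchor[1] + 0.5 * (h - 1)


def mkanchors(ws, hs, x_ctr, y_ctr):
    """Widths/heights around a centre -> (N, 4) corner boxes."""
    ws = np.asarray(ws, dtype=np.float64).reshape(-1, 1)
    hs = np.asarray(hs, dtype=np.float64).reshape(-1, 1)
    return np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                      x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))


def generate_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    ratios = np.asarray(ratios, dtype=np.float64)
    scales = np.asarray(scales, dtype=np.float64)
    base = np.array([0, 0, base_size - 1, base_size - 1], dtype=np.float64)
    # Step 1: one box per aspect ratio, area roughly constant, h / w = ratio.
    w, h, x_ctr, y_ctr = whctrs(base)
    ws = np.round(np.sqrt(w * h / ratios))
    hs = np.round(ws * ratios)
    ratio_anchors = mkanchors(ws, hs, x_ctr, y_ctr)
    # Step 2: every ratio box is enlarged by every scale.
    out = []
    for a in ratio_anchors:
        w, h, x_ctr, y_ctr = whctrs(a)
        out.append(mkanchors(w * scales, h * scales, x_ctr, y_ctr))
    return np.vstack(out)


anchors = generate_anchors()
print(anchors.shape)      # (9, 4)
print(anchors[0])         # ratio 0.5, scale 8: a wide 184 x 96 box
print(anchors[3])         # ratio 1, scale 8: a 128 x 128 square
```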
Next is forward. The code is fairly complex, so pseudocode is extracted to show the idea and the method.
def forward(self, bottom, top):
    # Algorithm:
    #
    # for each (H, W) location i
    #   generate 9 anchor boxes centered on cell i
    #   apply predicted bbox deltas at cell i to each of the 9 anchors
    # filter out-of-image anchors
    # measure GT overlap
    assert bottom[0].data.shape[0] == 1, \
        'Only single item batches are supported'
    # map of shape (..., H, W); these are the box scores, shape = (1, 18, H, W)
    height, width = bottom[0].data.shape[-2:]
    # GT boxes (x1, y1, x2, y2, label)
    gt_boxes = bottom[1].data
    # im_info
    im_info = bottom[2].data[0, :]
    # 1. Generate proposals from bbox deltas and shifted anchors
    # the idea is to generate a series of shifts and add every shift to the 9 anchors, which gives the 9 boxes at every position
    shift_x = np.arange(0, width) * self._feat_stride
    shift_y = np.arange(0, height) * self._feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # after meshgrid shift_x = [[ 0 16 32 ..., 560 576 592] [ 0 16 32 ..., 560 576 592] [ 0 16 32 ..., 560 576 592] ..., [ 0 16 32 ..., 560 576 592] [ 0 16 32 ..., 560 576 592] [ 0 16 32 ..., 560 576 592]]
    # shift_y = [[ 0 0 0 ..., 0 0 0] [ 16 16 16 ..., 16 16 16] [ 32 32 32 ..., 32 32 32] ..., [560 560 560 ..., 560 560 560] [576 576 576 ..., 576 576 576] [592 592 592 ..., 592 592 592]]
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()
    # after the transpose, each row is one shift
    # add A anchors (1, A, 4) to
    # cell K shifts (K, 1, 4) to get
    # shift anchors (K, A, 4)
    # reshape to (K*A, 4) shifted anchors
    A = self._num_anchors
    K = shifts.shape[0]
    # numpy broadcasting: every anchor in _anchors is added to every shift
    all_anchors = (self._anchors.reshape((1, A, 4)) +
                   shifts.reshape((1, K, 4)).transpose((1, 0, 2)))
    # K shifts, A boxes per shift
    all_anchors = all_anchors.reshape((K * A, 4))
    total_anchors = int(K * A)
    # only keep anchors inside the image
    inds_inside = np.where(
        (all_anchors[:, 0] >= -self._allowed_border) &
        (all_anchors[:, 1] >= -self._allowed_border) &
        (all_anchors[:, 2] < im_info[1] + self._allowed_border) &  # width
        (all_anchors[:, 3] < im_info[0] + self._allowed_border)    # height
    )[0]
    # keep only inside anchors
    anchors = all_anchors[inds_inside, :]
    # label: 1 is positive, 0 is negative, -1 is dont care
    labels = np.empty((len(inds_inside), ), dtype=np.float32)
    labels.fill(-1)
    # overlaps between the anchors and the gt boxes
    # overlaps (ex, gt): the overlap (IoU) of every anchor with every gt box, shape [number of anchors, number of gt boxes]
    overlaps = bbox_overlaps(
        np.ascontiguousarray(anchors, dtype=np.float),
        np.ascontiguousarray(gt_boxes, dtype=np.float))
    # for every anchor, the index of the gt box with the highest overlap
    argmax_overlaps = overlaps.argmax(axis=1)
    # look up the overlap by index: the overlap of each anchor with its best gt box
    max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]
    # from the gt box side: the index of the anchor that overlaps each box best
    gt_argmax_overlaps = overlaps.argmax(axis=0)
    # the best overlap value for every gt box
    gt_max_overlaps = overlaps[gt_argmax_overlaps,
                               np.arange(overlaps.shape[1])]
    # all anchors that reach a gt box's best overlap (ties included)
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
    if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
        # assign bg labels first so that positive labels can clobber them
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
    # fg label: for each gt, anchor with highest overlap
    labels[gt_argmax_overlaps] = 1
    # fg label: above threshold IOU
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
    if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
    # subsample positive labels if we have too many
    # ideally FG and BG each make up half; if FG falls short, BG fills the rest
    num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:
        disable_inds = npr.choice(
            fg_inds, size=(len(fg_inds) - num_fg), replace=False)
        labels[disable_inds] = -1
    # subsample negative labels if we have too many
    num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        disable_inds = npr.choice(
            bg_inds, size=(len(bg_inds) - num_bg), replace=False)
        labels[disable_inds] = -1
    # compute the dx, dy, dw, dh offsets between each anchor and its ground-truth box
    bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
    bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
    bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)
    bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
    if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
        # uniform weighting of examples (given non-uniform sampling)
        num_examples = np.sum(labels >= 0)
        positive_weights = np.ones((1, 4)) * 1.0 / num_examples
        negative_weights = np.ones((1, 4)) * 1.0 / num_examples
    else:
        assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
        positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                            np.sum(labels == 1))
        negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                            np.sum(labels == 0))
    bbox_outside_weights[labels == 1, :] = positive_weights
    bbox_outside_weights[labels == 0, :] = negative_weights
    # map up to original set of anchors
    labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
    bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
    bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
    bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)
    # labels
    labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
    labels = labels.reshape((1, 1, A * height, width))
    top[0].reshape(*labels.shape)
    top[0].data[...] = labels
    # bbox_targets
    bbox_targets = bbox_targets \
        .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
    top[1].reshape(*bbox_targets.shape)
    top[1].data[...] = bbox_targets
    # bbox_inside_weights
    bbox_inside_weights = bbox_inside_weights \
        .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
    assert bbox_inside_weights.shape[2] == height
    assert bbox_inside_weights.shape[3] == width
    top[2].reshape(*bbox_inside_weights.shape)
    top[2].data[...] = bbox_inside_weights
    # bbox_outside_weights
    bbox_outside_weights = bbox_outside_weights \
        .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
    assert bbox_outside_weights.shape[2] == height
    assert bbox_outside_weights.shape[3] == width
    top[3].reshape(*bbox_outside_weights.shape)
    top[3].data[...] = bbox_outside_weights
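The two central steps of forward, enumerating every shifted anchor and labelling anchors by overlap, can be tried on a toy grid. The sketch below is a simplification written for illustration: it uses two hand-picked base anchors instead of the 9 generated ones, computes IoU in plain NumPy instead of the compiled bbox_overlaps, takes 0.7 and 0.3 as assumed values for the positive and negative overlap thresholds, and leaves out the inside-image filter, the subsampling and the bbox targets.

```python
import numpy as np


def shift_anchors(base_anchors, height, width, feat_stride):
    """Place every base anchor at every feature-map cell: (K*A, 4)."""
    sx, sy = np.meshgrid(np.arange(width) * feat_stride,
                         np.arange(height) * feat_stride)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)
    # (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4), then flatten to (K*A, 4)
    return (base_anchors[None, :, :] + shifts[:, None, :]).reshape(-1, 4)


def iou_matrix(boxes, gt):
    """IoU of every box (N, 4) with every gt box (M, 4), +1 pixel convention."""
    area_b = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    area_g = (gt[:, 2] - gt[:, 0] + 1) * (gt[:, 3] - gt[:, 1] + 1)
    iw = (np.minimum(boxes[:, None, 2], gt[None, :, 2])
          - np.maximum(boxes[:, None, 0], gt[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], gt[None, :, 3])
          - np.maximum(boxes[:, None, 1], gt[None, :, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    return inter / (area_b[:, None] + area_g[None, :] - inter)


def label_anchors(anchors, gt, pos_thresh=0.7, neg_thresh=0.3):
    """1 = foreground, 0 = background, -1 = ignored."""
    overlaps = iou_matrix(anchors, gt)
    max_overlaps = overlaps.max(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[max_overlaps < neg_thresh] = 0              # background first
    gt_max = overlaps.max(axis=0)                      # best overlap per gt box
    labels[np.where(overlaps == gt_max)[0]] = 1        # best anchor(s) per gt box
    labels[max_overlaps >= pos_thresh] = 1             # high-overlap anchors
    return labels


base = np.array([[0, 0, 15, 15], [-8, -8, 23, 23]], dtype=np.float64)  # toy anchors
all_anchors = shift_anchors(base, height=2, width=3, feat_stride=16)
gt = np.array([[16, 0, 31, 15]], dtype=np.float64)
labels = label_anchors(all_anchors, gt)
print(all_anchors.shape, labels)
```

Only the anchor that coincides with the ground-truth box is labelled foreground here; every other anchor overlaps it by less than 0.3 and becomes background.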
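* proposal layer
* roi-data layer
Afterword
Every article explaining Faster R-CNN pays homage to Ross Girshick, so I will too: he really is impressive, and the paper is written with great depth.
Author: db24cc
Link: https://www.jianshu.com/p/00a6a6efd83d
Source: Jianshu
Copyright on Jianshu belongs to the author. For reproduction in any form, please contact the author for permission and cite the source.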