Caffe's solver class provides six optimization algorithms, selected in the solver configuration file with the type keyword:
Stochastic Gradient Descent (type: "SGD")
AdaDelta (type: "AdaDelta")
Adaptive Gradient (type: "AdaGrad")
Adam (type: "Adam")
Nesterov's Accelerated Gradient (type: "Nesterov")
RMSprop (type: "RMSProp")
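The solver.prototxt file:
The solver does the following:
It sets up the optimization objective and creates the training and test networks from their prototxt files (usually train.prototxt and test.prototxt).
It optimizes iteratively, running forward and backward passes and updating the parameters.
It evaluates the test networks periodically.
It displays the state of the model and the solver during optimization.
Solver parameters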
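1) base_lr
The initial learning rate of the network, usually a floating-point number. If the learning rate is too large, training may fail to converge; if it is too small, convergence is slow. Choosing this value carefully therefore matters.
2) lr_policy
The rule by which the learning rate changes during training, given as a string. The policies listed in caffe.proto are fixed, step, exp, inv, multistep, poly, and sigmoid.
3) gamma
A parameter of the learning-rate schedule. It must be set when the chosen lr_policy uses it, and is usually a real number.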
4) stepsize
The number of iterations between successive "steps" of training, used by policies such as step. This value is a positive integer.
5) stepvalue
One of potentially many iteration counts at which training moves on to the next "step", used by the multistep policy. This value is a positive integer. Several stepvalue entries are often present, each giving the iteration of the next step.
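6) max_iter
The maximum number of iterations, which tells the solver when to stop training; a positive integer. If it is too small, training stops before convergence; if it is too large, training time is wasted after the loss has stopped improving.
7) momentum
The weight given to the previous update when computing the current one; a real-valued fraction.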
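8) weight_decay
The weight-decay (regularization) term, used to reduce overfitting.
9) solver_mode
Selects training on the CPU or the GPU.
10) snapshot
The snapshot interval: how often, in iterations, the model and solverstate are saved. A positive integer.
11) snapshot_prefix
The prefix for snapshot files, that is, the naming prefix of the model and solverstate files; it also includes the path.
12) net
The path to the network prototxt (train and val).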
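13) test_iter
The number of test iterations run at each test_interval. Suppose the test set contains 10000 images. Processing them all at once is inefficient, so the test data is split into batches of batch_size samples each. With batch_size=100, 100 iterations are needed to cover all 10000 samples, so test_iter is set to 100.
14) test_interval
The test interval: how many training iterations to run between tests.
15) display
How often, in iterations, to print results.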
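16) iter_size
This parameter multiplied by the batch size in train.prototxt gives the batch size actually used: gradient descent takes one step only after batch_size * iter_size images have been read. This works around the batch-size limit imposed by GPU memory, because several iterations can be accumulated into a large batch even when a single batch is small.
17) average_loss
The number of forward passes whose loss is averaged for display.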
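The learning-rate parameters above (base_lr, lr_policy, gamma, stepsize, stepvalue) combine into a schedule. The following is a minimal sketch in plain Python, assuming the formulas given in the comments of caffe.proto; the function names are hypothetical, and power is a solver field from caffe.proto that the list above does not cover. Two small helpers also reproduce the test_iter and iter_size arithmetic.

```python
import math


def learning_rate(policy, it, base_lr, gamma=0.0, power=0.0,
                  stepsize=1, stepvalues=(), max_iter=1):
    """Return the learning rate at iteration `it` (hypothetical helper)."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1.0 + gamma * it) ** (-power)
    if policy == "multistep":
        # Count how many stepvalue boundaries have been reached so far.
        current_step = sum(1 for s in stepvalues if it >= s)
        return base_lr * gamma ** current_step
    if policy == "poly":
        return base_lr * (1.0 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr / (1.0 + math.exp(-gamma * (it - stepsize)))
    raise ValueError("unknown lr_policy: " + policy)


def num_test_batches(num_test_samples, test_batch_size):
    """Value of test_iter needed to cover the whole test set once."""
    return math.ceil(num_test_samples / test_batch_size)


def effective_batch_size(batch_size, iter_size):
    """Number of samples that contribute to one parameter update."""
    return batch_size * iter_size
```

For example, with base_lr 0.01, gamma 0.1 and stepsize 100, the step policy divides the rate by ten every 100 iterations.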
template<>
void caffe_cpu_gemm<float>(const CBLAS_TRANSPOSE TransA,
const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
const float alpha, const float* A, const float* B, const float beta,
float* C) {
int lda = (TransA == CblasNoTrans) ? K : M;
int ldb = (TransB == CblasNoTrans) ? N : K;
cblas_sgemm(CblasRowMajor, TransA, TransB, M, N, K, alpha, A, lda, B,
ldb, beta, C, N);
}
template<>
void caffe_cpu_gemm<double>(const CBLAS_TRANSPOSE TransA,
const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
const double alpha, const double* A, const double* B, const double beta,
double* C) {
int lda = (TransA == CblasNoTrans) ? K : M;
int ldb = (TransB == CblasNoTrans) ? N : K;
cblas_dgemm(CblasRowMajor, TransA, TransB, M, N, K, alpha, A, lda, B,
ldb, beta, C, N);
}
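Function: C = alpha * op(A) * op(B) + beta * C
A, B, C: the matrices, stored as one-dimensional arrays (A and B are inputs; C is both input and output)
CblasRowMajor: the data is in row-major order (two-dimensional data is also stored in a one-dimensional array)
TransA, TransB: whether to transpose A and B (CblasTrans or CblasNoTrans); op(A) denotes A after the optional transpose
M: the number of rows of op(A) and C
N: the number of columns of op(B) and C
K: the number of columns of op(A) and the number of rows of op(B)
lda: the row length of A as stored, K without transpose and M with transpose
ldb: the row length of B as stored, N without transpose and K with transpose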
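To make the roles of M, N, K, lda and ldb concrete, here is a minimal pure-Python sketch of the same computation on flat row-major lists. It illustrates the indexing only and is not how BLAS is implemented; the function name gemm and the boolean transpose flags are assumptions made for this example.

```python
def gemm(trans_a, trans_b, M, N, K, alpha, A, B, beta, C):
    """C = alpha * op(A) * op(B) + beta * C on flat row-major lists."""
    lda = M if trans_a else K  # row length of A as stored
    ldb = K if trans_b else N  # row length of B as stored
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                # op(A)[i][k] and op(B)[k][j], read from the stored layout.
                a = A[k * lda + i] if trans_a else A[i * lda + k]
                b = B[j * ldb + k] if trans_b else B[k * ldb + j]
                acc += a * b
            C[i * N + j] = alpha * acc + beta * C[i * N + j]
    return C
```

Passing B stored as its transpose together with trans_b=True gives the same product, which is how the inner product layer further down multiplies by its N x K weight matrix.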
template <>
void caffe_cpu_gemv<float>(const CBLAS_TRANSPOSE TransA, const int M,
const int N, const float alpha, const float* A, const float* x,
const float beta, float* y) {
cblas_sgemv(CblasRowMajor, TransA, M, N, alpha, A, N, x, 1, beta, y, 1);
}
template <>
void caffe_cpu_gemv<double>(const CBLAS_TRANSPOSE TransA, const int M,
const int N, const double alpha, const double* A, const double* x,
const double beta, double* y) {
cblas_dgemv(CblasRowMajor, TransA, M, N, alpha, A, N, x, 1, beta, y, 1);
}
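Function: y = alpha * op(A) * x + beta * y
x and y are vectors; A is a matrix
M: the number of rows of A
N: the number of columns of A
The argument 1 passed to cblas_sgemv is the stride of x and of y, so every element is used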
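The matrix-vector product can be sketched the same way. This is a minimal pure-Python illustration assuming unit strides, as in the Caffe wrapper; the name gemv and the boolean flag are hypothetical. Note that M and N always describe A as stored, so the transposed case produces a vector of length N.

```python
def gemv(trans_a, M, N, alpha, A, x, beta, y):
    """y = alpha * op(A) * x + beta * y; A is an M x N row-major flat list."""
    rows, cols = (N, M) if trans_a else (M, N)  # shape of op(A)
    for i in range(rows):
        acc = 0.0
        for j in range(cols):
            # op(A)[i][j] is A[j][i] when transposed, A[i][j] otherwise.
            a = A[j * N + i] if trans_a else A[i * N + j]
            acc += a * x[j]
        y[i] = alpha * acc + beta * y[i]
    return y
```

With trans_a=True and x set to all ones, the result is the column sums of A, which is the pattern the inner product layer uses for its bias gradient.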
template <>
void caffe_axpy<float>(const int N, const float alpha, const float* X,
float* Y) { cblas_saxpy(N, alpha, X, 1, Y, 1); }
template <>
void caffe_axpy<double>(const int N, const double alpha, const double* X,
double* Y) { cblas_daxpy(N, alpha, X, 1, Y, 1); }
Function: Y = alpha * X + Y
N: the number of elements in X and Y
template <typename Dtype>
void caffe_set(const int N, const Dtype alpha, Dtype* Y) {
if (alpha == 0) {
memset(Y, 0, sizeof(Dtype) * N); // NOLINT(caffe/alt_fn)
return;
}
for (int i = 0; i < N; ++i) {
Y[i] = alpha;
}
}
template void caffe_set<int>(const int N, const int alpha, int* Y);
template void caffe_set<float>(const int N, const float alpha, float* Y);
template void caffe_set<double>(const int N, const double alpha, double* Y);
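Function: initializes Y with the constant alpha
The function void *memset(void *s, int c, size_t n) is typically used to initialize newly allocated memory. It sets each of the first n bytes of the memory pointed to by s to the value c (converted to unsigned char); n is the size of the block in bytes. Because it works byte by byte, caffe_set uses it only when alpha == 0.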
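Why does caffe_set take the memset shortcut only for alpha == 0? The sketch below mimics memset's byte-wise fill with Python's struct module and shows what a 4-byte value looks like afterwards. The helper name memset_bytes is hypothetical, and a little-endian 4-byte layout is assumed.

```python
import struct


def memset_bytes(byte_value, num_bytes):
    """Mimic memset: every byte of the buffer receives the same value."""
    return bytes([byte_value & 0xFF]) * num_bytes


# Filling a 4-byte int with byte value 1 yields 0x01010101, not 1.
int_from_ones = struct.unpack("<i", memset_bytes(1, 4))[0]

# Filling with byte value 0 gives an exact zero for both int and float.
int_from_zeros = struct.unpack("<i", memset_bytes(0, 4))[0]
float_from_zeros = struct.unpack("<f", memset_bytes(0, 4))[0]
```

Only the all-zero byte pattern happens to represent the intended element value for every Dtype, so any other alpha needs the explicit loop.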
template <>
void caffe_add_scalar(const int N, const float alpha, float* Y) {
for (int i = 0; i < N; ++i) {
Y[i] += alpha;
}
}
template <>
void caffe_add_scalar(const int N, const double alpha, double* Y) {
for (int i = 0; i < N; ++i) {
Y[i] += alpha;
}
}
Function: adds the constant alpha to every element of Y
template <typename Dtype>
void caffe_copy(const int N, const Dtype* X, Dtype* Y) {
if (X != Y) {
if (Caffe::mode() == Caffe::GPU) {
#ifndef CPU_ONLY
// NOLINT_NEXT_LINE(caffe/alt_fn)
CUDA_CHECK(cudaMemcpy(Y, X, sizeof(Dtype) * N, cudaMemcpyDefault));
#else
NO_GPU;
#endif
} else {
memcpy(Y, X, sizeof(Dtype) * N); // NOLINT(caffe/alt_fn)
}
}
}
template void caffe_copy<int>(const int N, const int* X, int* Y);
template void caffe_copy<unsigned int>(const int N, const unsigned int* X,
unsigned int* Y);
template void caffe_copy<float>(const int N, const float* X, float* Y);
template void caffe_copy<double>(const int N, const double* X, double* Y);
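Function: copies the N elements of X to Y (nothing is copied when X and Y are the same pointer)
The function void *memcpy(void *dest, const void *src, size_t n) copies n bytes from the memory pointed to by src to the memory pointed to by dest.
template <>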
void caffe_scal<float>(const int N, const float alpha, float *X) {
cblas_sscal(N, alpha, X, 1);
}
template <>
void caffe_scal<double>(const int N, const double alpha, double *X) {
cblas_dscal(N, alpha, X, 1);
}
Function: X = alpha * X
N: the number of elements in X
template <>
void caffe_cpu_axpby<float>(const int N, const float alpha, const float* X,
const float beta, float* Y) {
cblas_saxpby(N, alpha, X, 1, beta, Y, 1);
}
template <>
void caffe_cpu_axpby<double>(const int N, const double alpha, const double* X,
const double beta, double* Y) {
cblas_daxpby(N, alpha, X, 1, beta, Y, 1);
}
Function: Y = alpha * X + beta * Y
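The three level-1 routines above (axpy, scal, axpby) differ only in which operand they scale and overwrite. A minimal pure-Python sketch, assuming unit strides and using the list length in place of the explicit N argument; the function names are shortened for illustration:

```python
def axpy(alpha, X, Y):
    """Y = alpha * X + Y, updating Y in place."""
    for i in range(len(X)):
        Y[i] += alpha * X[i]
    return Y


def scal(alpha, X):
    """X = alpha * X, updating X in place."""
    for i in range(len(X)):
        X[i] *= alpha
    return X


def axpby(alpha, X, beta, Y):
    """Y = alpha * X + beta * Y, updating Y in place."""
    for i in range(len(X)):
        Y[i] = alpha * X[i] + beta * Y[i]
    return Y
```

Setting beta to 0 in axpby overwrites Y with alpha * X, which is how the Euclidean loss layer later writes its gradient.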
template <>
void caffe_add<float>(const int n, const float* a, const float* b,
float* y) {
vsAdd(n, a, b, y);
}
template <>
void caffe_add<double>(const int n, const double* a, const double* b,
double* y) {
vdAdd(n, a, b, y);
}
template <>
void caffe_sub<float>(const int n, const float* a, const float* b,
float* y) {
vsSub(n, a, b, y);
}
template <>
void caffe_sub<double>(const int n, const double* a, const double* b,
double* y) {
vdSub(n, a, b, y);
}
template <>
void caffe_mul<float>(const int n, const float* a, const float* b,
float* y) {
vsMul(n, a, b, y);
}
template <>
void caffe_mul<double>(const int n, const double* a, const double* b,
double* y) {
vdMul(n, a, b, y);
}
template <>
void caffe_div<float>(const int n, const float* a, const float* b,
float* y) {
vsDiv(n, a, b, y);
}
template <>
void caffe_div<double>(const int n, const double* a, const double* b,
double* y) {
vdDiv(n, a, b, y);
}
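Function: these four functions implement element-wise addition, subtraction, multiplication, and division (y[i] = a[i] + b[i], a[i] - b[i], a[i] * b[i], and a[i] / b[i])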
template <>
void caffe_powx<float>(const int n, const float* a, const float b,
float* y) {
vsPowx(n, a, b, y);
}
template <>
void caffe_powx<double>(const int n, const double* a, const double b,
double* y) {
vdPowx(n, a, b, y);
}
template <>
void caffe_sqr<float>(const int n, const float* a, float* y) {
vsSqr(n, a, y);
}
template <>
void caffe_sqr<double>(const int n, const double* a, double* y) {
vdSqr(n, a, y);
}
template <>
void caffe_sqrt<float>(const int n, const float* a, float* y) {
vsSqrt(n, a, y);
}
template <>
void caffe_sqrt<double>(const int n, const double* a, double* y) {
vdSqrt(n, a, y);
}
template <>
void caffe_exp<float>(const int n, const float* a, float* y) {
vsExp(n, a, y);
}
template <>
void caffe_exp<double>(const int n, const double* a, double* y) {
vdExp(n, a, y);
}
template <>
void caffe_log<float>(const int n, const float* a, float* y) {
vsLn(n, a, y);
}
template <>
void caffe_log<double>(const int n, const double* a, double* y) {
vdLn(n, a, y);
}
template <>
void caffe_abs<float>(const int n, const float* a, float* y) {
vsAbs(n, a, y);
}
Function: these are also element-wise operations: y[i] = a[i]^b, y[i] = a[i]^2, y[i] = sqrt(a[i]), y[i] = exp(a[i]), y[i] = ln(a[i]), and y[i] = |a[i]|
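The element-wise functions in this part of the file all follow one pattern: apply a scalar operation to each element and write the result to y. A minimal pure-Python sketch of that pattern, returning a new list instead of writing to an output pointer; the table and helper names are hypothetical:

```python
import math

# Scalar operations corresponding to caffe_add, caffe_sub, caffe_mul, caffe_div.
BINARY_OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
    "div": lambda x, y: x / y,
}

# Scalar operations corresponding to caffe_sqr, caffe_sqrt, caffe_exp,
# caffe_log (natural logarithm), and caffe_abs.
UNARY_OPS = {
    "sqr": lambda x: x * x,
    "sqrt": math.sqrt,
    "exp": math.exp,
    "log": math.log,
    "abs": abs,
}


def apply_binary(name, a, b):
    """y[i] = a[i] <op> b[i] for every i."""
    return [BINARY_OPS[name](x, y) for x, y in zip(a, b)]


def apply_unary(name, a):
    """y[i] = op(a[i]) for every i."""
    return [UNARY_OPS[name](x) for x in a]


def powx(a, b):
    """y[i] = a[i] ** b, with a scalar exponent b as in caffe_powx."""
    return [x ** b for x in a]
```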
unsigned int caffe_rng_rand() {
return (*caffe_rng())();
}
Function: returns a random unsigned integer drawn from Caffe's random number generator
template <typename Dtype>
Dtype caffe_nextafter(const Dtype b) {
return boost::math::nextafter<Dtype>(
b, std::numeric_limits<Dtype>::max());
}
template
float caffe_nextafter(const float b);
template
double caffe_nextafter(const double b);
Function: returns the closest representable value to b in the direction of the largest representable value.
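The same idea can be tried in Python with math.nextafter, which is in the standard library from Python 3.9 onward. This is a small sketch for doubles only; the wrapper name next_toward_max is hypothetical.

```python
import math
import sys


def next_toward_max(b):
    """Next representable double after b, moving toward the largest double."""
    return math.nextafter(b, sys.float_info.max)
```

For b = 1.0 the step equals the machine epsilon; for b = 0.0 the result is the smallest positive subnormal double.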
template <>
double caffe_cpu_strided_dot<double>(const int n, const double* x,
const int incx, const double* y, const int incy) {
return cblas_ddot(n, x, incx, y, incy);
}
Function: returns the inner product of vector x and vector y.
incx, incy: the strides, that is, every incx-th element of x and every incy-th element of y is used.
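A minimal pure-Python sketch of a strided dot product, assuming positive strides; the function name strided_dot is an assumption for this example:

```python
def strided_dot(n, x, incx, y, incy):
    """Sum of x[i * incx] * y[i * incy] for i in 0..n-1."""
    return sum(x[i * incx] * y[i * incy] for i in range(n))
```

With incx = 2, only every second element of x takes part, so n counts the element pairs that are multiplied rather than the length of the underlying arrays.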
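template <>
int caffe_cpu_hamming_distance<float>(const int n, const float* x,
const float* y) {
int dist = 0;
for (int i = 0; i < n; ++i) {
dist += __builtin_popcount(static_cast<uint32_t>(x[i]) ^
static_cast<uint32_t>(y[i]));
}
return dist;
}
Function: returns the Hamming distance between x and y. (The Hamming distance between two equal-length strings is the number of positions at which the corresponding characters differ. Here each element is cast to a 32-bit unsigned integer and the differing bits are counted.)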
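The bit-counting step can be sketched in pure Python as XOR followed by a population count. The sketch assumes non-negative inputs that fit in 32 bits, mirroring the cast to uint32_t; the function name is hypothetical.

```python
def hamming_distance(x, y):
    """Count differing bits after truncating each value to a 32-bit unsigned int."""
    dist = 0
    for a, b in zip(x, y):
        # XOR leaves a 1 bit wherever the two integers differ.
        differing = (int(a) ^ int(b)) & 0xFFFFFFFF
        dist += bin(differing).count("1")
    return dist
```

For instance, 1.0 and 3.0 truncate to the integers 1 and 3, which differ in exactly one bit.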
template <>
float caffe_cpu_asum<float>(const int n, const float* x) {
return cblas_sasum(n, x, 1);
}
template <>
double caffe_cpu_asum<double>(const int n, const double* x) {
return cblas_dasum(n, x, 1);
}
Function: computes the sum of the absolute values of all elements of vector x.
template <>
void caffe_cpu_scale<float>(const int n, const float alpha, const float *x,
float* y) {
cblas_scopy(n, x, 1, y, 1);
cblas_sscal(n, alpha, y, 1);
}
template <>
void caffe_cpu_scale<double>(const int n, const double alpha, const double *x,
double* y) {
cblas_dcopy(n, x, 1, y, 1);
cblas_dscal(n, alpha, y, 1);
}
Function: y = alpha * x (x is copied into y, then y is scaled, so x is left unchanged)
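A minimal pure-Python sketch of the last two routines, assuming unit strides; asum and scale are shortened, hypothetical names. The scale sketch follows the same copy-then-scale order as the C++ code.

```python
def asum(x):
    """Sum of absolute values (the L1 norm) of x."""
    return sum(abs(v) for v in x)


def scale(alpha, x, y):
    """y = alpha * x, leaving x unchanged."""
    for i in range(len(x)):
        y[i] = x[i]      # copy step (cblas_?copy)
    for i in range(len(x)):
        y[i] *= alpha    # scale step (cblas_?scal)
    return y
```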
#ifndef CAFFE_INNER_PRODUCT_LAYER_HPP_
#define CAFFE_INNER_PRODUCT_LAYER_HPP_
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
namespace caffe {
/**
* @brief Also known as a "fully-connected" layer, computes an inner product
* with a set of learned weights, and (optionally) adds biases.
*
* TODO(dox): thorough documentation for Forward, Backward, and proto params.
*/
template <typename Dtype>
class InnerProductLayer : public Layer<Dtype> {
 public:
  explicit InnerProductLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual inline const char* type() const { return "InnerProduct"; }
  virtual inline int ExactNumBottomBlobs() const { return 1; }
  virtual inline int ExactNumTopBlobs() const { return 1; }
 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  int M_;  // number of samples
  int K_;  // feature length of a single sample
  int N_;  // number of output neurons
  bool bias_term_;  // whether a bias term is included
  Blob<Dtype> bias_multiplier_;  // bias multiplier (a vector of ones)
  bool transpose_;  ///< if true, assume transposed weights
};
} // namespace caffe
#endif // CAFFE_INNER_PRODUCT_LAYER_HPP_
#include <vector>
#include "caffe/filler.hpp"
#include "caffe/layers/inner_product_layer.hpp"
#include "caffe/util/math_functions.hpp"
namespace caffe {
template <typename Dtype>
void InnerProductLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  // Number of outputs, read from the prototxt.
  const int num_output = this->layer_param_.inner_product_param().num_output();
  // Bias flag, read from the prototxt.
  bias_term_ = this->layer_param_.inner_product_param().bias_term();
  // Transpose flag, read from the prototxt.
  transpose_ = this->layer_param_.inner_product_param().transpose();
  N_ = num_output;
  const int axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.inner_product_param().axis());  // axis = 1 by default
  // Dimensions starting from "axis" are "flattened" into a single
  // length K_ vector. For example, if bottom[0]'s shape is (N, C, H, W),
  // and axis == 1, N inner products with dimension CHW are performed.
  K_ = bottom[0]->count(axis);  // in that example K_ = C*H*W, the product of axes 1 to 3
  // Check if we need to set up the weights
  if (this->blobs_.size() > 0) {
    LOG(INFO) << "Skipping parameter initialization";
  } else {
    if (bias_term_) {  // the layer has a bias term
      this->blobs_.resize(2);  // blobs_[0] and blobs_[1]
    } else {
      this->blobs_.resize(1);  // blobs_[0] only
    }
    // Initialize the weights: shape N_ x K_ (K_ x N_ when transpose_ is set)
    vector<int> weight_shape(2);
    if (transpose_) {
      weight_shape[0] = K_;
      weight_shape[1] = N_;
    } else {
      weight_shape[0] = N_;
      weight_shape[1] = K_;
    }
    this->blobs_[0].reset(new Blob<Dtype>(weight_shape));  // blobs_[0] holds the weights of the fully connected layer
    // fill the weights
    shared_ptr<Filler<Dtype> > weight_filler(GetFiller<Dtype>(
        this->layer_param_.inner_product_param().weight_filler()));
    weight_filler->Fill(this->blobs_[0].get());
    // If necessary, initialize and fill the bias term
    if (bias_term_) {
      vector<int> bias_shape(1, N_);  // N_ is the number of outputs
      this->blobs_[1].reset(new Blob<Dtype>(bias_shape));  // blobs_[1] holds the bias of the fully connected layer
      shared_ptr<Filler<Dtype> > bias_filler(GetFiller<Dtype>(
          this->layer_param_.inner_product_param().bias_filler()));
      bias_filler->Fill(this->blobs_[1].get());
    }
  }  // parameter initialization
  // Marks, for each parameter blob of this layer, whether its gradient is computed.
  this->param_propagate_down_.resize(this->blobs_.size(), true);
}
template <typename Dtype>
void InnerProductLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  // Figure out the dimensions
  const int axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.inner_product_param().axis());  // axis = 1 by default
  // CanonicalAxisIndex normalizes the axis index, converting a negative
  // index into the equivalent non-negative one.
  // count(int) returns the number of elements from the given axis to the end.
  // The first axis is the number of samples, M_, which is independent of the
  // fully connected layer; the remaining axes hold the input features.
  const int new_K = bottom[0]->count(axis);  // product of axes 1 to 3
  CHECK_EQ(K_, new_K)  // K_ is the feature length of one sample, C*H*W
      << "Input size incompatible with inner product parameters.";
  // The first "axis" dimensions are independent inner products; the total
  // number of these is M_, the product over these dimensions.
  M_ = bottom[0]->count(0, axis);  // number of samples: axes [0, 1), the size of axis 0
  // The top shape will be the bottom shape with the flattened axes dropped,
  // and replaced by a single axis with dimension num_output (N_).
  vector<int> top_shape = bottom[0]->shape();
  top_shape.resize(axis + 1);
  top_shape[axis] = N_;  // number of output units
  top[0]->Reshape(top_shape);
  // Set up the bias multiplier
  if (bias_term_) {
    vector<int> bias_shape(1, M_);  // M_ is the number of samples
    bias_multiplier_.Reshape(bias_shape);
    // Set every entry of the bias multiplier to 1.
    caffe_set(M_, Dtype(1), bias_multiplier_.mutable_cpu_data());
  }
}
// The layer computes y = x * W^T + b
// x is the input, with shape M_ x K_
// y is the output, with shape M_ x N_
// W is the weight matrix, stored as N_ x K_ (K_ x N_ when transpose_ is set)
// b is the bias, with N_ elements
// A batch contains several samples; the weights and bias are the same for every sample in the batch
template <typename Dtype>
void InnerProductLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();  // CPU pointer to bottom[0]
  Dtype* top_data = top[0]->mutable_cpu_data();  // writable CPU pointer to top[0]
  const Dtype* weight = this->blobs_[0]->cpu_data();  // layer weights, blobs_[0]
  // bottom_data is M x K, op(weight) is K x N, top_data is M x N
  // top_data = 1 * bottom_data * op(weight) + 0 * top_data
  caffe_cpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasNoTrans : CblasTrans,
      M_, N_, K_, (Dtype)1.,
      bottom_data, weight, (Dtype)0., top_data);
  if (bias_term_) {
    // top_data = 1 * bias_multiplier_ (M x 1) * blobs_[1] (1 x N) + 1 * top_data,
    // which adds the bias b to every row of y
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, 1, (Dtype)1.,
        bias_multiplier_.cpu_data(),
        this->blobs_[1]->cpu_data(), (Dtype)1., top_data);
  }
}
template <typename Dtype>
void InnerProductLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  // param_propagate_down_[0]: whether the gradient with respect to the weights is needed
  if (this->param_propagate_down_[0]) {
    const Dtype* top_diff = top[0]->cpu_diff();  // shape M_ x N_; each row is the error term of one sample
    const Dtype* bottom_data = bottom[0]->cpu_data();  // input data
    // Gradient with respect to weight
    if (transpose_) {
      // weight_diff (K x N) += bottom_data' * top_diff
      caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans,
          K_, N_, M_,
          (Dtype)1., bottom_data, top_diff,
          (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
    } else {
      // weight_diff (N x K) += top_diff' * bottom_data
      caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans,
          N_, K_, M_,
          (Dtype)1., top_diff, bottom_data,
          (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
    }
  }
  // Gradient of the bias: bias_diff += top_diff' * bias_multiplier
  if (bias_term_ && this->param_propagate_down_[1]) {  // whether the gradient with respect to the bias is needed
    const Dtype* top_diff = top[0]->cpu_diff();
    // Gradient with respect to bias
    caffe_cpu_gemv<Dtype>(CblasTrans, M_, N_, (Dtype)1., top_diff,
        bias_multiplier_.cpu_data(), (Dtype)1.,
        this->blobs_[1]->mutable_cpu_diff());
  }
  // Gradient with respect to the input data
  if (propagate_down[0]) {  // whether the gradient with respect to bottom is needed
    const Dtype* top_diff = top[0]->cpu_diff();
    // Gradient with respect to bottom data
    if (transpose_) {
      // bottom_diff = 1 * top_diff * weight' + 0 * bottom_diff
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans,
          M_, K_, N_,
          (Dtype)1., top_diff, this->blobs_[0]->cpu_data(),
          (Dtype)0., bottom[0]->mutable_cpu_diff());
    } else {
      // bottom_diff = 1 * top_diff * weight + 0 * bottom_diff
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans,
          M_, K_, N_,
          (Dtype)1., top_diff, this->blobs_[0]->cpu_data(),
          (Dtype)0., bottom[0]->mutable_cpu_diff());
    }
  }
}
#ifdef CPU_ONLY
STUB_GPU(InnerProductLayer);
#endif
INSTANTIATE_CLASS(InnerProductLayer);
REGISTER_LAYER_CLASS(InnerProduct);
} // namespace caffe
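The forward and backward passes of the inner product layer reduce to three matrix products. The sketch below restates them in pure Python on nested lists for the default case where transpose_ is false (weights stored as N x K). The function names are hypothetical. Unlike the C++ code, which accumulates into the existing weight and bias diffs, the sketch returns only the newly computed gradients.

```python
def ip_forward(x, W, b):
    """y = x * W^T + b. x: M rows of K values; W: N rows of K values; b: N values."""
    return [[sum(xi[k] * w[k] for k in range(len(w))) + bj
             for w, bj in zip(W, b)]
            for xi in x]


def ip_backward(x, W, top_diff):
    """Return (weight_diff, bias_diff, bottom_diff) for top_diff of shape M x N."""
    M, K, N = len(x), len(x[0]), len(W)
    # weight_diff = top_diff^T * x, shape N x K
    w_diff = [[sum(top_diff[m][n] * x[m][k] for m in range(M))
               for k in range(K)] for n in range(N)]
    # bias_diff = top_diff^T * ones(M), that is, the column sums of top_diff
    b_diff = [sum(top_diff[m][n] for m in range(M)) for n in range(N)]
    # bottom_diff = top_diff * W, shape M x K
    x_diff = [[sum(top_diff[m][n] * W[n][k] for n in range(N))
               for k in range(K)] for m in range(M)]
    return w_diff, b_diff, x_diff
```

The column sum in bias_diff is what the multiplication by bias_multiplier_, a vector of ones, achieves in the C++ version.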
#ifndef CAFFE_EUCLIDEAN_LOSS_LAYER_HPP_
#define CAFFE_EUCLIDEAN_LOSS_LAYER_HPP_
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/layers/loss_layer.hpp"
namespace caffe {
template <typename Dtype>
class EuclideanLossLayer : public LossLayer<Dtype> {
 public:
  explicit EuclideanLossLayer(const LayerParameter& param)
      : LossLayer<Dtype>(param), diff_() {}
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual inline const char* type() const { return "EuclideanLoss"; }
  /**
   * Unlike most loss layers, in the EuclideanLossLayer we can backpropagate
   * to both inputs -- override to return true and always allow force_backward.
   */
  virtual inline bool AllowForceBackward(const int bottom_index) const {
    return true;
  }
 protected:
  /// @copydoc EuclideanLossLayer
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  Blob<Dtype> diff_;
};
} // namespace caffe
#endif // CAFFE_EUCLIDEAN_LOSS_LAYER_HPP_
#include <vector>
#include "caffe/layers/euclidean_loss_layer.hpp"
#include "caffe/util/math_functions.hpp"
namespace caffe {
template <typename Dtype>
void EuclideanLossLayer<Dtype>::Reshape(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  LossLayer<Dtype>::Reshape(bottom, top);
  // bottom[0] and bottom[1] must have the same per-sample size.
  CHECK_EQ(bottom[0]->count(1), bottom[1]->count(1))
      << "Inputs must have the same dimension.";
  // diff_ stores the difference of the two bottoms and has the same shape as bottom[0].
  diff_.ReshapeLike(*bottom[0]);
}
template <typename Dtype>
void EuclideanLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  int count = bottom[0]->count();
  caffe_sub(
      count,
      bottom[0]->cpu_data(),
      bottom[1]->cpu_data(),
      diff_.mutable_cpu_data());  // diff_ = bottom[0] - bottom[1]
  Dtype dot = caffe_cpu_dot(count, diff_.cpu_data(), diff_.cpu_data());  // dot = ||diff_||^2
  Dtype loss = dot / bottom[0]->num() / Dtype(2);  // loss: divide by the number of samples, then by 2
  top[0]->mutable_cpu_data()[0] = loss;
}
template <typename Dtype>
void EuclideanLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  for (int i = 0; i < 2; ++i) {
    if (propagate_down[i]) {  // for the label input, propagate_down is false
      const Dtype sign = (i == 0) ? 1 : -1;  // because diff_ = bottom[0] - bottom[1]
      // top[0]->cpu_diff()[0] is the loss weight, 1 by default
      // bottom[i]->num() is the number of samples
      const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
      caffe_cpu_axpby(
          bottom[i]->count(),              // count
          alpha,                           // alpha
          diff_.cpu_data(),                // a
          Dtype(0),                        // beta
          bottom[i]->mutable_cpu_diff());  // b
      // Result: bottom[i] diff = alpha * diff_
    }
  }
}
#ifdef CPU_ONLY
STUB_GPU(EuclideanLossLayer);
#endif
INSTANTIATE_CLASS(EuclideanLossLayer);
REGISTER_LAYER_CLASS(EuclideanLoss);
} // namespace caffe
import caffe
import numpy as np


class EuclideanLossLayer(caffe.Layer):
    """
    Compute the Euclidean Loss in the same manner as the C++ EuclideanLossLayer
    to demonstrate the class interface for developing layers in Python.
    """

    def setup(self, bottom, top):
        # check input pair
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        # check input dimensions match
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # difference is shape of inputs
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # loss output is scalar
        top[0].reshape(1)

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff**2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            if i == 0:
                sign = 1
            else:
                sign = -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num
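top[0]->cpu_diff()[0]: In backpropagation, top holds the values passed back from the layer above, so top[0]->cpu_diff() is the error passed down from the higher layer. But this is the loss layer, the last layer, so how can there be a layer above it? Notice that the code uses top[0]->cpu_diff()[0] rather than the whole of top[0]->cpu_diff(). When Caffe passes the error down to the lower layers, the user can scale it by a factor, and that factor is stored in the first element of top[0]->cpu_diff(), that is, top[0]->cpu_diff()[0]. The user sets this factor by adding a loss_weight parameter to the layer definition.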
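To see where the loss weight enters, here is a minimal pure-Python sketch of the Euclidean loss on nested lists, with the scaling factor passed explicitly as loss_weight. It is a simplified stand-in for the value Caffe stores in top[0]->cpu_diff()[0], and the function names are hypothetical.

```python
def euclidean_forward(pred, label):
    """Return (loss, diff) with loss = sum(diff^2) / N / 2 over N samples."""
    n = len(pred)
    diff = [[p - t for p, t in zip(p_row, t_row)]
            for p_row, t_row in zip(pred, label)]
    loss = sum(d * d for row in diff for d in row) / n / 2.0
    return loss, diff


def euclidean_backward(diff, loss_weight=1.0):
    """Gradients for both inputs: +loss_weight * diff / N and its negation."""
    n = len(diff)
    grad_pred = [[loss_weight * d / n for d in row] for row in diff]
    grad_label = [[-g for g in row] for row in grad_pred]
    return grad_pred, grad_label
```

Doubling loss_weight doubles both gradients while the forward loss value computed here stays the same.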
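Blob is the basic class through which data flows in Caffe: data is passed between the layers of a network in Blobs.
For a detailed introduction, see http://blog.csdn.net/fengbingchun/article/details/59106613
shared_ptr<SyncedMemory> data_;
shared_ptr<SyncedMemory> diff_;
shared_ptr<SyncedMemory> shape_data_;
vector<int> shape_;
int count_;
int capacity_;
Blob is only a basic data structure, so it has relatively few member variables. The first is data_, a shared_ptr (a smart pointer from the Boost library) to a SyncedMemory; it owns the memory that stores the data, which is mainly used during forward propagation. Likewise, diff_ stores the gradients used to update the data. shape_data_ and shape_ both store the shape of the Blob: shape_ as a vector<int>, and shape_data_ as a SyncedMemory copy of the same values. count_ is the number of elements in the Blob, that is, num * channels * height * width. capacity_ is the number of elements for which memory is currently allocated; it can differ from count_ because a Blob may be reshaped.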
template <typename Dtype>
class Blob {
 public:
  Blob()
      : data_(), diff_(), count_(0), capacity_(0) {}
  /// @brief Deprecated; use Blob(const vector<int>& shape).
  explicit Blob(const int num, const int channels, const int height,
      const int width);
  explicit Blob(const vector<int>& shape);
  /// @brief Deprecated; use Reshape(const vector<int>& shape).
  void Reshape(const int num, const int channels, const int height,
      const int width);
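Blob is one of the most basic classes. Its constructor allocates memory to store the data, and Reshape is called from a Layer's Reshape or Forward operation to adjust the dimensions. When the size of a Blob changes, memory is reallocated only if the existing allocation is too small, and excess memory is never freed. Reshaping an input blob and then immediately calling Net::Backward is an error; either Net::Forward or Net::Reshape must be called first to propagate the new input shape to the higher layers.
The Blob class overloads count() several times, to compute either the total volume of the Blob or that of a slice: the product of the shape entries from one axis up to (but not including) another.
inline int count(int start_axis, int end_axis)
Blob axis indices may also be negative, counting from the end, much as in Python:
inline int CanonicalAxisIndex(int axis_index)
The four basic dimensions of a Blob (num, channels, height, width) can be accessed directly as shape(0), shape(1), shape(2), and shape(3).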
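The two helpers just described, count over an axis range and negative axis indexing, can be sketched in a few lines of Python operating on a plain shape tuple. The function names are adapted for this example, and the range check is an assumption about reasonable behavior rather than a copy of Caffe's error handling.

```python
def canonical_axis_index(shape, axis_index):
    """Map an axis index in [-len(shape), len(shape)) to a non-negative index."""
    num_axes = len(shape)
    if not -num_axes <= axis_index < num_axes:
        raise IndexError("axis %d out of range for %d axes" % (axis_index, num_axes))
    return axis_index + num_axes if axis_index < 0 else axis_index


def count(shape, start_axis=0, end_axis=None):
    """Product of shape[start_axis:end_axis]; the whole volume by default."""
    if end_axis is None:
        end_axis = len(shape)
    total = 1
    for dim in shape[start_axis:end_axis]:
        total *= dim
    return total
```

For a blob of shape (64, 3, 28, 28), count(shape, 1) gives the per-sample feature length K_ used by the inner product layer, and count(shape, 0, 1) gives the number of samples M_.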