Caffe's loss layers by now cover essentially all of the commonly used loss functions. The choice of loss is usually tied to the output activation (e.g. softmax with a log loss, sigmoid with cross-entropy), and a regularization term is sometimes added; together they determine how the error signal is propagated during backpropagation.
1.
contrastive_loss, implemented by contrastive_loss_layer. Judging from the code, its input is a pair of samples to be verified, e.g. two face images that may belong to the same person (a positive pair) or to different people (a negative pair). Among Caffe's examples, the siamese example uses this loss.
The concrete mathematical form of this loss, as given in the layer documentation, is E = \frac{1}{2N}\sum_{n=1}^{N}\left[\, y\,d + (1-y)\,\max(margin - d,\ 0) \,\right], where d = \|a_n - b_n\|_2^2 (see Section 3 for details).
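A minimal self-contained sketch of this documented formula (my own code, not Caffe's; the function name and toy data are invented for illustration):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Sketch of the documented contrastive loss:
// E = 1/(2N) * sum_n [ y_n * d_n + (1 - y_n) * max(margin - d_n, 0) ],
// where d_n = ||a_n - b_n||_2^2.
double contrastive_loss(const std::vector<std::vector<double> >& a,
                        const std::vector<std::vector<double> >& b,
                        const std::vector<int>& y, double margin) {
  double loss = 0.0;
  const int N = static_cast<int>(a.size());
  for (int n = 0; n < N; ++n) {
    double d = 0.0;  // squared Euclidean distance of the n-th pair
    for (size_t c = 0; c < a[n].size(); ++c) {
      const double diff = a[n][c] - b[n][c];
      d += diff * diff;
    }
    loss += y[n] ? d : std::max(margin - d, 0.0);
  }
  return loss / (2.0 * N);
}

int main() {
  // One similar pair (y = 1) and one dissimilar pair (y = 0), margin = 1:
  // the similar pair contributes d = 0.25, the dissimilar pair contributes
  // max(1 - 1.0, 0) = 0, so the total is 0.25 / (2 * 2) = 0.0625.
  std::vector<std::vector<double> > a = {{0.0, 0.0}, {0.0, 0.0}};
  std::vector<std::vector<double> > b = {{0.3, 0.4}, {0.6, 0.8}};
  std::vector<int> y = {1, 0};
  std::printf("loss = %f\n", contrastive_loss(a, b, y, 1.0));
  return 0;
}

Note that, as discussed in Section 3, the actual layer by default squares the margin term; this sketch follows the documented formula verbatim.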
2.
euclidean_loss, implemented by euclidean_loss_layer; the loss is the squared error between prediction and target, E = \frac{1}{2N}\sum_{n=1}^{N}\|\hat{y}_n - y_n\|_2^2.
hinge_loss, implemented by hinge_loss_layer; the loss is E = \frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K}\max(0,\ 1 - \delta\{l_n = k\}\,t_{nk})^p, where t_{nk} are the scores, l_n is the true label, \delta\{l_n = k\} is +1 when k = l_n and -1 otherwise, and p = 1 (the default L1 hinge) or p = 2 (squared hinge).
infogain_loss, implemented by infogain_loss_layer; the loss is E = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K}H_{l_n,k}\log(\hat{p}_{n,k}), where H is the infogain matrix (with H = I it reduces to the multinomial logistic loss).
multinomial_logistic_loss, implemented by multinomial_logistic_loss_layer; the loss is E = -\frac{1}{N}\sum_{n=1}^{N}\log(\hat{p}_{n,l_n}), the negative log-probability of the true class.
sigmoid_cross_entropy, implemented by sigmoid_cross_entropy_loss_layer; the loss is E = -\frac{1}{N}\sum_{n=1}^{N}\left[p_n\log\hat{p}_n + (1 - p_n)\log(1 - \hat{p}_n)\right], with \hat{p}_n = \sigma(x_n).
softmax_loss, implemented by softmax_loss_layer; it applies a softmax to the scores and then the multinomial logistic loss, E = -\frac{1}{N}\sum_{n=1}^{N}\log(\hat{p}_{n,l_n}), in a single, numerically more stable layer. A toy numeric sketch of two of these formulas follows this list.
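To make these formulas concrete, here is a small self-contained sketch (my own code, not Caffe's; the scores and labels are toy values) that evaluates the softmax and hinge losses on a batch of N = 2 samples with K = 3 classes:

#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
  // Toy batch: N = 2 samples, K = 3 classes; raw scores and true labels.
  const int N = 2, K = 3;
  const double scores[2][3] = {{2.0, 1.0, 0.1}, {0.5, 2.5, 0.3}};
  const int labels[2] = {0, 1};

  double softmax_loss = 0.0, hinge_loss = 0.0;
  for (int n = 0; n < N; ++n) {
    // softmax_loss: softmax over the scores, then the negative
    // log-probability of the true class: -log(exp(s_l) / sum_k exp(s_k)).
    double denom = 0.0;
    for (int k = 0; k < K; ++k) denom += std::exp(scores[n][k]);
    softmax_loss += -std::log(std::exp(scores[n][labels[n]]) / denom);

    // hinge_loss with p = 1 (Caffe's default L1 norm): penalize the
    // true-class score for being below +1 and every other score for
    // being above -1.
    for (int k = 0; k < K; ++k) {
      const double t = (k == labels[n]) ? scores[n][k] : -scores[n][k];
      hinge_loss += std::max(0.0, 1.0 - t);
    }
  }
  std::printf("softmax_loss = %f\n", softmax_loss / N);
  std::printf("hinge_loss   = %f\n", hinge_loss / N);
  return 0;
}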
3.
Input:
shape: (N×C×1×1), features a ∈ [−∞, +∞]
shape: (N×C×1×1), features b ∈ [−∞, +∞]
shape: (N×1×1×1), binary similarity label y ∈ [0, 1]
Output:
shape: (1×1×1×1), the computed contrastive loss
The contrastive loss is E = \frac{1}{2N}\sum_{n=1}^{N}\left[\, y\,d + (1-y)\,\max(margin - d,\ 0) \,\right], where d = \|a_n - b_n\|_2^2.
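As a quick numeric check (an invented example, with margin = 1): for a pair a_n = (1, 1), b_n = (1.3, 1.4) we get d = 0.3^2 + 0.4^2 = 0.25; a similar pair (y = 1) then contributes d = 0.25 to the sum, while a dissimilar pair (y = 0) contributes \max(1 - 0.25,\ 0) = 0.75, pushing the features apart until d reaches the margin.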
Suitable scenario: training Siamese networks. The CPU implementation, contrastive_loss_layer.cpp, is reproduced below:
#include <algorithm>
#include <vector>

#include "caffe/layer.hpp"
#include "caffe/loss_layers.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"

// Computes the contrastive loss: E = 1/(2N) * sum_n [ y*d + (1-y)*max(margin - d, 0) ],
// where d = ||a_n - b_n||_2^2. Detailed reference in the official documentation:
// http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1ContrastiveLossLayer.html

namespace caffe {

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::LayerSetUp(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  LossLayer<Dtype>::LayerSetUp(bottom, top);
  CHECK_EQ(bottom[0]->channels(), bottom[1]->channels());
  CHECK_EQ(bottom[0]->height(), 1);
  CHECK_EQ(bottom[0]->width(), 1);
  CHECK_EQ(bottom[1]->height(), 1);
  CHECK_EQ(bottom[1]->width(), 1);
  CHECK_EQ(bottom[2]->channels(), 1);
  CHECK_EQ(bottom[2]->height(), 1);
  CHECK_EQ(bottom[2]->width(), 1);
  diff_.Reshape(bottom[0]->num(), bottom[0]->channels(), 1, 1);
  diff_sq_.Reshape(bottom[0]->num(), bottom[0]->channels(), 1, 1);
  dist_sq_.Reshape(bottom[0]->num(), 1, 1, 1);
  // vector of ones used to sum along channels
  summer_vec_.Reshape(bottom[0]->channels(), 1, 1, 1);
  for (int i = 0; i < bottom[0]->channels(); ++i)
    summer_vec_.mutable_cpu_data()[i] = Dtype(1);
}

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  int count = bottom[0]->count();
  // diff_ = a - b (element-wise)
  caffe_sub(
      count,
      bottom[0]->cpu_data(),  // a
      bottom[1]->cpu_data(),  // b
      diff_.mutable_cpu_data());  // a_i - b_i
  const int channels = bottom[0]->channels();
  // margin for dissimilar pairs, default = 1.0
  Dtype margin = this->layer_param_.contrastive_loss_param().margin();
  // legacy_version = false (the default) uses (margin - d)^2;
  // legacy_version = true uses (margin - d^2). This is kept to
  // reproduce existing models and results.
  bool legacy_version =
      this->layer_param_.contrastive_loss_param().legacy_version();
  Dtype loss(0.0);
  for (int i = 0; i < bottom[0]->num(); ++i) {
    // dot product of the pair difference with itself:
    // the squared Euclidean distance of the i-th pair
    dist_sq_.mutable_cpu_data()[i] = caffe_cpu_dot(channels,
        diff_.cpu_data() + (i*channels), diff_.cpu_data() + (i*channels));
    if (static_cast<int>(bottom[2]->cpu_data()[i])) {  // similar pairs
      loss += dist_sq_.cpu_data()[i];
    } else {  // dissimilar pairs
      // if legacy_version = true, uses (margin - d^2); else uses (margin - d)^2
      if (legacy_version) {
        loss += std::max(margin - dist_sq_.cpu_data()[i], Dtype(0.0));
      } else {
        Dtype dist = std::max(margin - sqrt(dist_sq_.cpu_data()[i]),
                              Dtype(0.0));
        loss += dist*dist;
      }
    }
  }
  loss = loss / static_cast<Dtype>(bottom[0]->num()) / Dtype(2);
  top[0]->mutable_cpu_data()[0] = loss;
}

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  Dtype margin = this->layer_param_.contrastive_loss_param().margin();
  bool legacy_version =
      this->layer_param_.contrastive_loss_param().legacy_version();
  for (int i = 0; i < 2; ++i) {
    if (propagate_down[i]) {
      const Dtype sign = (i == 0) ? 1 : -1;
      const Dtype alpha = sign * top[0]->cpu_diff()[0] /
          static_cast<Dtype>(bottom[i]->num());
      int num = bottom[i]->num();
      int channels = bottom[i]->channels();
      for (int j = 0; j < num; ++j) {
        Dtype* bout = bottom[i]->mutable_cpu_diff();
        if (static_cast<int>(bottom[2]->cpu_data()[j])) {  // similar pairs
          caffe_cpu_axpby(
              channels,
              alpha,
              diff_.cpu_data() + (j*channels),
              Dtype(0.0),
              bout + (j*channels));
        } else {  // dissimilar pairs
          Dtype mdist(0.0);
          Dtype beta(0.0);
          if (legacy_version) {
            mdist = margin - dist_sq_.cpu_data()[j];
            beta = -alpha;
          } else {
            Dtype dist = sqrt(dist_sq_.cpu_data()[j]);
            mdist = margin - dist;
            beta = -alpha * mdist / (dist + Dtype(1e-4));
          }
          if (mdist > Dtype(0.0)) {
            caffe_cpu_axpby(
                channels,
                beta,
                diff_.cpu_data() + (j*channels),
                Dtype(0.0),
                bout + (j*channels));
          } else {
            caffe_set(channels, Dtype(0), bout + (j*channels));
          }
        }
      }
    }
  }
}

#ifdef CPU_ONLY
STUB_GPU(ContrastiveLossLayer);
#endif

INSTANTIATE_CLASS(ContrastiveLossLayer);
REGISTER_LAYER_CLASS(ContrastiveLoss);

}  // namespace caffe
Notice from the code that the default branch does not implement the documented cost function above verbatim: for dissimilar pairs it uses the true Euclidean distance \sqrt{d} and squares the margin term. This is presumably a deliberate refinement in Caffe, and this form of the cost function may work better in practice.
Concretely, the code computes, per pair,
loss = y\,d + (1-y)\,\max(margin - \sqrt{d},\ 0)^2 (the default), or
loss = y\,d + (1-y)\,\max(margin - d,\ 0) (legacy_version: true),
with d = \|a_n - b_n\|_2^2, summed over pairs and divided by 2N.
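For reference, the gradient that Backward_cpu implements for the default (non-legacy) branch can be derived from the per-pair loss (my own derivation; the 1e-4 added to \sqrt{d} in the code is a guard against division by zero):

\frac{\partial E}{\partial a_n} =
\begin{cases}
\frac{1}{N}\,(a_n - b_n), & y = 1 \\
-\frac{1}{N}\,\frac{margin - \sqrt{d}}{\sqrt{d}}\,(a_n - b_n), & y = 0,\ \sqrt{d} < margin \\
0, & y = 0,\ \sqrt{d} \ge margin
\end{cases}

with \frac{\partial E}{\partial b_n} = -\frac{\partial E}{\partial a_n}; in the code the whole expression is additionally scaled by top[0]->cpu_diff()[0] (the loss weight), which appears inside the factor alpha.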