Reading Caffe's softmax loss source code

(1) softmax loss

<1> The softmax loss has the following functional form:

    L = -\log f(z_k), \qquad f(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}        (1)

Here z_i is the input to the softmax, f(z_i) is the softmax output, and k is the index of the ground-truth label.

<2> The derivative of the softmax loss with respect to its input z_j:

    \frac{\partial L}{\partial z_j} = f(z_j) - \mathbb{1}[j = k]
      = \begin{cases} f(z_k) - 1, & j = k \\ f(z_j), & j \neq k \end{cases}        (2)

When j == k, the variable of differentiation z_k appears both in the numerator and in the denominator sum of f(z_k); otherwise only the denominator depends on the variable z_j.

The derivative of a sum is the sum of the derivatives, so writing L = -\log f(z_k) = -z_k + \log\sum_i e^{z_i} and differentiating the log-sum term with respect to one of its elements gives:

    \frac{\partial}{\partial z_j} \log\sum_i e^{z_i} = \frac{e^{z_j}}{\sum_i e^{z_i}} = f(z_j)

The -z_k term contributes an additional -1 only when j = k, which recovers formula (2).
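As a quick numeric check (the numbers below are my own illustration, not taken from the original post), take z = (2, 1, 0) with ground-truth label k = 0:

    f(z) \approx (0.665,\ 0.245,\ 0.090), \qquad
    L = -\log 0.665 \approx 0.41, \qquad
    \frac{\partial L}{\partial z} \approx (-0.335,\ 0.245,\ 0.090)

Only the label's entry gets a negative gradient, and the three entries sum to zero, as expected from formula (2).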

(2) The Forward_cpu() function in softmax_loss_layer.cpp:

template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // The forward pass computes the softmax prob values.
  // Run the internal softmax layer's forward pass; its output is stored in prob_.
  softmax_layer_->Forward(softmax_bottom_vec_, softmax_top_vec_);
  const Dtype* prob_data = prob_.cpu_data();
  // A loss layer normally has two input blobs: the network's prediction blob
  // (bottom[0]) and the label blob (bottom[1]).
  const Dtype* label = bottom[1]->cpu_data();
  // dim = N*C*H*W / N = C*H*W
  int dim = prob_.count() / outer_num_;
  // count is the number of valid samples that contribute to the loss.
  int count = 0;
  Dtype loss = 0;
  for (int i = 0; i < outer_num_; ++i) {
    for (int j = 0; j < inner_num_; j++) {
      // Read the label.
      const int label_value = static_cast<int>(label[i * inner_num_ + j]);
      // If this sample's label equals the ignore_label_ parameter set for
      // SoftmaxWithLoss in the prototxt, the sample takes part in neither the
      // forward nor the backward pass.
      if (has_ignore_label_ && label_value == ignore_label_) {
        continue;
      }
      // Check that label_value >= 0.
      DCHECK_GE(label_value, 0);
      // Check that label_value < prob_.shape(softmax_axis_) = C.
      DCHECK_LT(label_value, prob_.shape(softmax_axis_));
      // Within the softmax output channels, take the log of the probability in
      // the channel indexed by label_value; this corresponds to formula (1).
      loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j],
                           Dtype(FLT_MIN)));
      // One more valid sample.
      ++count;
    }
  }
  // The loss shown in the training log is the accumulated loss divided by the
  // normalizer (by default the number of valid samples).
  top[0]->mutable_cpu_data()[0] = loss / get_normalizer(normalization_, count);
  if (top.size() == 2) {
    top[1]->ShareData(prob_);
  }
}
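To make the index arithmetic prob_data[i * dim + label_value * inner_num_ + j] concrete, here is a minimal standalone sketch of the same loop over a toy N x C x H x W probability array (the data and names are invented for illustration; this is not Caffe code):

  #include <algorithm>
  #include <cfloat>
  #include <cmath>
  #include <cstdio>
  #include <vector>

  int main() {
    const int N = 1, C = 3, H = 1, W = 2;         // outer_num_ = N, inner_num_ = H*W
    const int outer_num = N, inner_num = H * W, dim = C * H * W;
    // prob[n][c][h][w] in row-major order: two spatial positions, three classes each.
    std::vector<double> prob = {0.7, 0.1,         // class 0 at positions 0 and 1
                                0.2, 0.6,         // class 1
                                0.1, 0.3};        // class 2
    std::vector<int> label = {0, 1};              // ground-truth class per position
    double loss = 0.0;
    int count = 0;
    for (int i = 0; i < outer_num; ++i) {
      for (int j = 0; j < inner_num; ++j) {
        const int label_value = label[i * inner_num + j];
        // Same index arithmetic as in Forward_cpu above.
        loss -= std::log(std::max(prob[i * dim + label_value * inner_num + j], DBL_MIN));
        ++count;
      }
    }
    std::printf("loss = %f\n", loss / count);     // average over valid samples
    return 0;
  }

Each spatial position picks the probability of its own label (0.7 and 0.6 here), so the averaged loss is -(log 0.7 + log 0.6) / 2 ≈ 0.434.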

 

(3) The Backward_cpu() function in softmax_loss_layer.cpp:

template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down[0]) {
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const Dtype* prob_data = prob_.cpu_data();
    // Copy the softmax output prob_ into the diff (gradient) blob of bottom[0].
    caffe_copy(prob_.count(), prob_data, bottom_diff);
    const Dtype* label = bottom[1]->cpu_data();
    int dim = prob_.count() / outer_num_;
    int count = 0;
    for (int i = 0; i < outer_num_; ++i) {
      for (int j = 0; j < inner_num_; ++j) {
        const int label_value = static_cast<int>(label[i * inner_num_ + j]);
        if (has_ignore_label_ && label_value == ignore_label_) {
          // Ignored samples get zero gradient in every channel.
          for (int c = 0; c < bottom[0]->shape(softmax_axis_); ++c) {
            bottom_diff[i * dim + c * inner_num_ + j] = 0;
          }
        } else {
          // Corresponds to formula (2): subtract 1 from the diff at the
          // label's channel; the other channels keep the softmax output.
          bottom_diff[i * dim + label_value * inner_num_ + j] -= 1;
          ++count;
        }
      }
    }
    // Scale gradient: top[0]->cpu_diff()[0] holds this layer's loss weight
    // (1 by default), so the gradient is scaled by loss_weight / normalizer,
    // matching the normalization applied in the forward pass.
    Dtype loss_weight = top[0]->cpu_diff()[0] /
        get_normalizer(normalization_, count);
    caffe_scal(prob_.count(), loss_weight, bottom_diff);
  }
}
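The backward rule, "softmax output minus a one-hot label vector", can be verified numerically. The following self-contained sketch (again not Caffe code, just an illustration of formula (2)) compares the analytic gradient with a central finite difference:

  #include <cmath>
  #include <cstdio>
  #include <vector>

  // Loss from formula (1) for a single sample with label k.
  static double loss(const std::vector<double>& z, int k) {
    double denom = 0.0;
    for (double v : z) denom += std::exp(v);
    return -std::log(std::exp(z[k]) / denom);
  }

  int main() {
    std::vector<double> z = {2.0, 1.0, 0.0};
    const int k = 0;                               // ground-truth label
    double denom = 0.0;
    for (double v : z) denom += std::exp(v);
    for (int j = 0; j < 3; ++j) {
      // Analytic gradient from formula (2): f(z_j) - [j == k].
      double analytic = std::exp(z[j]) / denom - (j == k ? 1.0 : 0.0);
      // Central finite-difference estimate for comparison.
      std::vector<double> zp = z, zm = z;
      zp[j] += 1e-6;  zm[j] -= 1e-6;
      double numeric = (loss(zp, k) - loss(zm, k)) / 2e-6;
      std::printf("j=%d analytic=%.6f numeric=%.6f\n", j, analytic, numeric);
    }
    return 0;
  }

The two columns agree to several decimal places, which is exactly what Backward_cpu exploits: it reuses prob_ as the gradient and only subtracts 1 at the label's index before scaling.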

 
