Adding L1 and L2 regularization to YOLO

Momentum update:

The momentum update formula is:

V_{t+1} = moment * V_t + α * l.delta

W_{t+1} = W_t + V_{t+1}

First, look at how Caffe performs the momentum update:

case Caffe::CPU: {
    // history = local_rate * diff + momentum * history
    caffe_cpu_axpby(net_params[param_id]->count(), local_rate,
                    net_params[param_id]->cpu_diff(), momentum,
                    history_[param_id]->mutable_cpu_data());
    // copy the current momentum into diff
    caffe_copy(net_params[param_id]->count(), history_[param_id]->cpu_data(),
               net_params[param_id]->mutable_cpu_diff());
    break;
}

In yolo's notation, this corresponds to:

l.weight_updates = l.weight_updates * moment + learning_rate * l.delta

which matches the momentum formula above. With that correspondence in mind, look at yolo's code.

In fact, yolo's weight update applies L2 regularization by default, so to explain the momentum part first, start with the bias update code:

axpy_cpu(l.n, learning_rate/batch, l.bias_updates, 1, l.biases, 1);
scal_cpu(l.n, momentum, l.bias_updates, 1);

In math form:

l.bias_updates = l.bias_updates * moment + l.delta  // l.delta was accumulated during backward

l.biases = l.biases + learn_rate * l.bias_updates

l.bias_updates = l.bias_updates * moment

Rearranged into the form of the opening momentum equations, the applied step is:

l.bias_updates = l.bias_updates * moment * learn_rate + learn_rate * l.delta

l.biases = l.biases + l.bias_updates

The only difference from Caffe is the extra learn_rate factor on the momentum term.

Next, the weight update, which adds L2 regularization:

axpy_cpu(l.nweights, -decay*batch, l.weights, 1, l.weight_updates, 1);
axpy_cpu(l.nweights, learning_rate/batch, l.weight_updates, 1, l.weights, 1);
scal_cpu(l.nweights, momentum, l.weight_updates, 1);

In math form, after combining:

l.weight_updates = l.weight_updates * moment + net.input * l.delta  // computed during backward: the gradient at the current w

l.weight_updates = learn_rate * l.weight_updates - learn_rate * decay * l.weights  // the added L2 regularization term

l.weights = l.weights + l.weight_updates

Compared with the Caffe version, the momentum term picks up an extra learn_rate factor, so when learn_rate is very small, yolo's momentum effect is weaker at each update.

L1 regularization in YOLO:

With L2 already in place, adding L1 is simple; the code is as follows:

int size = l.size*l.size*l.n*l.c;
cuda_pull_array(l.weights_gpu, l.weights, size);  // being lazy here: no dedicated kernel, and the extra cost is negligible
for (int i = 0; i < size; ++i)
    l.sign_weights[i] = (l.weights[i] > 0) ? 1 : -1;  // sign of each weight
cuda_push_array(l.sign_weights_gpu, l.sign_weights, size);  // push so it can be used on the GPU
axpy_gpu(size, -decay*batch, l.sign_weights_gpu, 1, l.weight_updates_gpu, 1);  // the L1 regularization gradient
axpy_gpu(size, learning_rate/batch, l.weight_updates_gpu, 1, l.weights_gpu, 1);
scal_gpu(size, momentum, l.weight_updates_gpu, 1);

 
