cs231n study notes: questions recorded along the way

(1) Why the dataset is made zero-mean during data preprocessing:

If all input data are positive, then during gradient backpropagation the gradients on the weights of a neuron will be either all positive or all negative, so each update pushes every weight in the same direction.

In practice, however, because of the batch size, the gradients of all samples in a batch are summed when the weights are updated, so the final update can still have mixed signs.

This has implications on the dynamics during gradient descent, because if the data coming into a neuron is always positive (e.g. x > 0 elementwise in f = w^T x + b), then the gradients on the weights w will during backpropagation become either all positive or all negative (depending on the gradient of the whole expression f). This could introduce undesirable zig-zagging dynamics in the gradient updates for the weights. However, notice that once these gradients are added up across a batch of data, the final update for the weights can have variable signs, somewhat mitigating this issue. Therefore, this is an inconvenience, but it has less severe consequences compared to the saturated activation problem above.
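The sign argument above can be checked numerically: for a single neuron f = w^T x + b, the per-sample gradient on w is (dL/df) * x, so with all-positive x every component shares the sign of dL/df, while the batch-summed gradient can mix signs. A minimal sketch (the sample count, dimensions, and random values are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Batch of 4 samples, 3 features, all strictly positive (e.g. raw pixel values).
x = rng.uniform(0.1, 1.0, size=(4, 3))

# Hypothetical upstream gradient dL/df, one scalar per sample.
df = rng.standard_normal(4)

# Per-sample gradient on w for f = w^T x + b is (dL/df) * x.
per_sample_grads = df[:, None] * x

# With x > 0 elementwise, each sample's gradient has a single sign.
for g in per_sample_grads:
    s = np.sign(g)
    assert np.all(s == s[0])

# Summed across the batch, the update can have mixed signs.
batch_grad = per_sample_grads.sum(axis=0)
print("per-sample signs:\n", np.sign(per_sample_grads))
print("batch-update signs:", np.sign(batch_grad))

# Zero-centering removes the constraint: centered inputs are no longer
# all positive, so even a single sample's gradient can mix signs.
x_centered = x - x.mean(axis=0)
print("centered data has both signs:", (x_centered > 0).any() and (x_centered < 0).any())
```

Subtracting the per-feature mean (`x - x.mean(axis=0)`) is the standard zero-centering step the notes refer to; after it, single-sample gradients are no longer locked to one sign.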
