1. PyTorch & Torch
- default parameters:
eps=1e-05
momentum=0.1
running_avg = momentum * new + (1 - momentum) * old
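
The update rule above can be sketched in plain Python to make the weighting concrete. This is only an illustration of the formula, not PyTorch's actual implementation; the function name `update_running_stat` and the sample batch means are made up for this note.

```python
def update_running_stat(old, new, momentum=0.1):
    """PyTorch-convention update: `momentum` is the weight on the NEW batch statistic.

    `update_running_stat` is a hypothetical helper, not a PyTorch API.
    """
    return momentum * new + (1.0 - momentum) * old


# Start from 0 and feed a constant batch mean of 1.0 (made-up values).
running_mean = 0.0
history = []
for batch_mean in [1.0, 1.0, 1.0]:
    running_mean = update_running_stat(running_mean, batch_mean)
    history.append(running_mean)

print(history)
```

With momentum=0.1 each step moves the running value only 10% of the way toward the new batch statistic, so the old value dominates.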
2. TensorFlow
- default parameters:
slim.batch_norm(
    decay=0.999,
    epsilon=0.001,
    ...)
tf.layers.batch_normalization(
    inputs,
    axis=-1,
    momentum=0.99,
    epsilon=0.001,
    ...)
decay: Decay for the moving average. Reasonable values for decay are close
to 1.0, typically in the multiple-nines range: 0.999, 0.99, 0.9, etc.
Lower decay value (recommend trying decay=0.9) if model experiences
reasonably good training performance but poor validation and/or test
performance. Try zero_debias_moving_mean=True for improved stability.
shadow_variable = decay * old + (1 - decay) * new
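
To see why the docstring suggests lowering decay when validation performance lags, the shadow-variable formula can be simulated directly. The sketch below is plain Python rather than TensorFlow code; `update_shadow` and `steps_to_reach` are hypothetical helpers, and the constant input of 1.0 is an arbitrary choice for illustration.

```python
def update_shadow(old, new, decay=0.999):
    """TensorFlow-convention update: `decay` is the weight on the OLD value.

    Hypothetical helper that mirrors the formula in this note.
    """
    return decay * old + (1.0 - decay) * new


def steps_to_reach(target_fraction, decay):
    """Count updates until a shadow value that starts at 0.0 reaches
    `target_fraction` of a constant input of 1.0 (illustrative setup)."""
    shadow, steps = 0.0, 0
    while shadow < target_fraction:
        shadow = update_shadow(shadow, 1.0, decay)
        steps += 1
    return steps


for decay in (0.9, 0.99, 0.999):
    print(decay, steps_to_reach(0.5, decay))
```

A larger decay needs many more updates before the moving statistics catch up with the batch statistics, which is one plausible reason a short training run with decay=0.999 can train well yet evaluate poorly.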
3. Darknet YOLOv2
momentum = 0.99
eps = .000001
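The running average always gives the old value the larger weight and the value newly computed in the current iteration the smaller weight.
For momentum, TensorFlow and PyTorch use exactly opposite meanings (TensorFlow's momentum/decay weights the old value, PyTorch's momentum weights the new value), but the resulting update is the same.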
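
The relationship between the two conventions can be checked with a small sketch: a PyTorch-style momentum m produces the same update as a TensorFlow-style decay of 1 - m. The function names below are hypothetical and the numbers are arbitrary; this only restates the two formulas from this note in plain Python.

```python
import math


def torch_style(old, new, momentum):
    # PyTorch convention: momentum weights the NEW value.
    return momentum * new + (1.0 - momentum) * old


def tf_style(old, new, decay):
    # TensorFlow convention: decay (or `momentum` in tf.layers) weights the OLD value.
    return decay * old + (1.0 - decay) * new


def torch_momentum_to_tf_decay(momentum):
    # Hypothetical conversion helper: the two parameters are complements.
    return 1.0 - momentum


old, new = 2.0, 5.0  # arbitrary example values
a = torch_style(old, new, momentum=0.1)
b = tf_style(old, new, decay=torch_momentum_to_tf_decay(0.1))
print(math.isclose(a, b))
```

Note that the defaults listed above are not equivalent to each other: PyTorch's momentum=0.1 corresponds to decay=0.9, while the TensorFlow defaults of 0.99 and 0.999 correspond to PyTorch momentum values of 0.01 and 0.001. When porting a model between frameworks, the value has to be converted explicitly.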