An L2 regularization coefficient that is too high causes vanishing gradients

 

 lamda = 3  # L2 regularization penalty coefficient
 w_grad = (np.dot(self.input.T, grad) + self.lamda * self.w) / self.batch_size

Here the regularization coefficient is set to 3.
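To show where that `w_grad` line lives, here is a minimal sketch of a fully-connected layer's backward pass with the L2 term, assuming the layer stores `input`, `batch_size`, `lamda`, and `w` as attributes (the attribute names follow the snippet; the class itself is hypothetical):

```python
import numpy as np

class Linear:
    def __init__(self, in_dim, out_dim, lamda):
        self.w = np.random.randn(in_dim, out_dim) * 0.01
        self.lamda = lamda  # L2 penalty coefficient

    def forward(self, x):
        self.input = x
        self.batch_size = x.shape[0]
        return np.dot(x, self.w)

    def backward(self, grad):
        # data gradient plus the L2 term lamda * w, averaged over the batch
        self.w_grad = (np.dot(self.input.T, grad) + self.lamda * self.w) / self.batch_size
        # gradient passed back to the previous layer
        return np.dot(grad, self.w.T)

np.random.seed(0)
layer = Linear(4, 3, lamda=3)
out = layer.forward(np.random.randn(8, 4))
upstream = layer.backward(np.ones_like(out))
print(layer.w_grad.shape, upstream.shape)
```

The L2 term is added only to the weight gradient, not to the gradient passed upstream, which matches the formula in the snippet.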

With a network of 4 ReLU hidden layers, this directly causes the gradients to vanish:

1%| | 2/200 [00:17<28:09, 8.53s/it] loss = 0.23035251606429571
accuracy = 0.09375
gradient mean = -1.452481815897801e-11
2%|▏ | 3/200 [00:26<28:26, 8.66s/it] loss = 0.23077135760414888
accuracy = 0.1015625
gradient mean = 1.422842658558051e-14
2%|▏ | 4/200 [00:34<27:49, 8.52s/it] loss = 0.23046438461223917
accuracy = 0.10546875
gradient mean = -8.111952281250118e-18
2%|▎ | 5/200 [00:41<26:23, 8.12s/it] loss = 0.2301827048850293
accuracy = 0.12109375
gradient mean = -6.3688796773963155e-21
3%|▎ | 6/200 [00:49<25:47, 7.98s/it] loss = 0.23023365984639205
accuracy = 0.125
gradient mean = -1.2646968613522145e-23
4%|▎ | 7/200 [00:56<25:00, 7.77s/it] loss = 0.23074116618703105
accuracy = 0.08984375
gradient mean = 7.443049613238094e-26
4%|▍ | 8/200 [01:03<24:34, 7.68s/it] loss = 0.23025406010680918
accuracy = 0.11328125
gradient mean = 5.544761930793375e-29
4%|▍ | 9/200 [01:11<24:14, 7.62s/it] loss = 0.23057808569519062
accuracy = 0.08984375
gradient mean = -2.505663387779514e-30
5%|▌ | 10/200 [01:19<24:35, 7.76s/it] loss = 0.23014966000613057
accuracy = 0.10546875
gradient mean = -1.588181439704063e-31

The mean gradient keeps shrinking, falling all the way from the order of 1e-5 down to 1e-31, while the loss stays flat and the accuracy never rises above chance level.
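A simplified way to see why a large lamda crushes the gradients: even if we ignore the data term entirely, each SGD step scales every weight by `1 - lr * lamda / batch_size`, so the weights (and with them the ReLU activations and the backpropagated gradients, compounded across the 4 layers) decay geometrically. A rough sketch, where `lr` and `batch_size` are illustrative assumptions, not values from the original run:

```python
# Pure weight-decay part of the update: w <- w - lr * (lamda * w) / batch_size.
# After `steps` updates the weights are scaled by factor**steps.
def weight_scale_after(steps, lamda, lr=0.5, batch_size=128):
    factor = 1.0 - lr * lamda / batch_size
    return factor ** steps

for lam in (3, 1):
    print(lam, weight_scale_after(1000, lamda=lam))
```

With lamda = 3 the surviving weight scale is several orders of magnitude smaller than with lamda = 1, and in a deep ReLU network this shrinkage multiplies layer by layer.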

Changing lamda to 1 alleviates the problem:

0%| | 0/200 [00:00

The lamda value is there to improve the model's generalization, but it must not be set too high, or it will likewise cause vanishing gradients; nor should it be set too low, which can lead to exploding gradients.
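For reference, the L2 gradient term in the snippet above is the derivative of a penalty of `lamda / (2 * batch_size)` times the squared weight norm added to the loss. A tiny sketch (the helper name is illustrative, not from the original code):

```python
import numpy as np

# Penalty whose derivative w.r.t. each w is lamda * w / batch_size,
# i.e. exactly the regularization term that appears in w_grad.
def l2_penalty(weights, lamda, batch_size):
    return lamda / (2 * batch_size) * sum(np.sum(w ** 2) for w in weights)

# 4 ones squared sum to 4, so the penalty is 3 / (2 * 4) * 4 = 1.5
print(l2_penalty([np.ones((2, 2))], lamda=3.0, batch_size=4))
```

Monitoring this penalty alongside the data loss makes it easy to spot when the regularization term is dominating the objective, which is the regime where the weights get driven toward zero.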
