NN Training Tricks: Comparing Four Parameter-Optimization Methods (SGD, Momentum, AdaGrad, Adam) on the MNIST Dataset

The previous posts analyzed each of these parameter-optimization methods individually; this post puts them side by side. The code follows Chapter 6 of Saito's "red fish book" (Deep Learning from Scratch).

The experiment trains on the 60,000 training images of the MNIST dataset with a 5-layer fully connected network (4 hidden layers, 100 neurons each) for 2,000 iterations. The figure below shows how the training loss changes over the iterations:

SGD is clearly the slowest to converge, while AdaGrad drops fastest and also ends up with the highest recognition accuracy. This ordering is not guaranteed in general, though; it also depends on the data and the problem.
[Figure: training loss vs. iteration count for SGD, Momentum, AdaGrad, and Adam]
Part of the training log (the mini-batch loss is printed every 100 iterations):

===========iteration:1200===========
SGD:0.2986528195291609
Momentum:0.1037981040196782
AdaGrad:0.0668137679448615
Adam:0.05010293181776089
===========iteration:1300===========
SGD:0.17833478097202
Momentum:0.06128433751079029
AdaGrad:0.01779291355463178
Adam:0.036788168826807605
===========iteration:1400===========
SGD:0.30288604165486865
Momentum:0.07708723420976107
AdaGrad:0.036239187352732696
Adam:0.03584596636673899
===========iteration:1500===========
SGD:0.21648932214740826
Momentum:0.11593046640138721
AdaGrad:0.033343153287890816
Adam:0.039999528396092415
===========iteration:1600===========
SGD:0.23519516569365168
Momentum:0.06509188355944322
AdaGrad:0.0377409654184555
Adam:0.05803067028715449
===========iteration:1700===========
SGD:0.28851197390150085
Momentum:0.14561108131745754
AdaGrad:0.07160438141432544
Adam:0.07280250583341145
===========iteration:1800===========
SGD:0.14382629146685216
Momentum:0.03977221072571262
AdaGrad:0.015159891599626725
Adam:0.019623602905335474
===========iteration:1900===========
SGD:0.19067465612724083
Momentum:0.053986168113818435
AdaGrad:0.03665586658910679
Adam:0.038508895473566646
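
To make the comparison concrete, here is a minimal sketch of the four update rules, written with the same update(params, grads) interface that the script below imports from optimizer.py. This is my own illustration based on the standard formulas, not the book's exact code; the optimizer.py bundled with the book's download is the authoritative version and may differ in details such as default hyperparameters.

# optimizer_sketch.py -- illustration of the four update rules (not the book's code)
import numpy as np


class SGD:
    """Vanilla stochastic gradient descent: W <- W - lr * dW."""
    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        for key in params.keys():
            params[key] -= self.lr * grads[key]


class Momentum:
    """SGD plus a velocity term: v <- momentum * v - lr * dW; W <- W + v."""
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = None

    def update(self, params, grads):
        if self.v is None:
            self.v = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params.keys():
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]


class AdaGrad:
    """Per-parameter learning rates: divide each step by the root of the
    accumulated squared gradients, so frequently-updated weights slow down."""
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None

    def update(self, params, grads):
        if self.h is None:
            self.h = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params.keys():
            self.h[key] += grads[key] * grads[key]
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)


class Adam:
    """Combines momentum (first moment) with per-parameter scaling (second
    moment), plus bias correction for the early iterations."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.t = 0
        self.m = None
        self.v = None

    def update(self, params, grads):
        if self.m is None:
            self.m = {key: np.zeros_like(val) for key, val in params.items()}
            self.v = {key: np.zeros_like(val) for key, val in params.items()}
        self.t += 1
        for key in params.keys():
            self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * grads[key]
            self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * grads[key] ** 2
            m_hat = self.m[key] / (1 - self.beta1 ** self.t)
            v_hat = self.v[key] / (1 - self.beta2 ** self.t)
            params[key] -= self.lr * m_hat / (np.sqrt(v_hat) + 1e-7)

The key difference is visible directly in the code: AdaGrad and Adam divide each step by a running estimate of the squared gradients, so the effective learning rate adapts per parameter, which is what lets them drop the loss faster than plain SGD in this experiment.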

Main code (the complete code ships with the red fish book; it can be downloaded from the book's page on the Turing Community site):

# coding: utf-8
# OptimizerCompare.py

import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from MultiLayerNet import MultiLayerNet
from util import smooth_curve
from optimizer import *


# 0: Load the MNIST data ==========
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True)

train_size = x_train.shape[0]
batch_size = 128
max_iterations = 2000

# 1: Experiment setup ==========
optimizers = {}
optimizers['SGD'] = SGD()
optimizers['Momentum'] = Momentum()
optimizers['AdaGrad'] = AdaGrad()
optimizers['Adam'] = Adam()
# optimizers['RMSprop'] = RMSprop()

networks = {}
train_loss = {}
for key in optimizers.keys():
    networks[key] = MultiLayerNet(
        input_size=784, hidden_size_list=[100, 100, 100, 100],
        output_size=10)
    train_loss[key] = []

# 2: Start training ==========
for i in range(max_iterations):
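    # randomly sample a mini-batch of batch_size images from the training set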
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    for key in optimizers.keys():
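        # compute the gradients by backpropagation and update this network's parameters with its optimizer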
        grads = networks[key].gradient(x_batch, t_batch)
        optimizers[key].update(networks[key].params, grads)

        loss = networks[key].loss(x_batch, t_batch)
        train_loss[key].append(loss)

    if i % 100 == 0:
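        # every 100 iterations, print the current mini-batch loss for each optimizer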
        print("===========" + "iteration:" + str(i) + "===========")
        for key in optimizers.keys():
            loss = networks[key].loss(x_batch, t_batch)
            print(key + ":" + str(loss))

# 3: Plot the results ==========
markers = {"SGD": "o", "Momentum": "x", "AdaGrad": "s", "Adam": "D"}
x = np.arange(max_iterations)
for key in optimizers.keys():
    plt.plot(x, smooth_curve(train_loss[key]),
             marker=markers[key], markevery=100, label=key)
plt.xlabel("iterations")
plt.ylabel("loss")
plt.ylim(0, 1)
plt.legend()
plt.show()
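
The plotting step relies on smooth_curve from the book's util.py to smooth the very noisy per-iteration loss before drawing the curves. If you want to reproduce the figure without the book's helper, a simple moving average works as a stand-in; moving_average below and its window parameter are my own names, not part of the book's code.

import numpy as np

def moving_average(x, window=25):
    # Smooth a 1-D loss history with a simple moving average.
    # Returns an array of the same length, so it can be plotted against
    # np.arange(max_iterations) just like smooth_curve above.
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode='same')

In the plotting loop, plt.plot(x, moving_average(train_loss[key]), ...) then produces essentially the same smoothed curves.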
