On LeNet's Test-Set Accuracy Being Stuck at 0.1

Contents

  • 1. Background
  • 2. Solution
  • 3. Further Discussion

1. Background

I built LeNet to learn the FashionMNIST dataset, with batch size 64, learning rate 0.05, and 10 epochs of training/testing:

import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from Experiment import Experiment as E  # custom training/evaluation helper (not shown)


class LeNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # (1, 28, 28) -> (6, 28, 28) -> (6, 14, 14)
            nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            # (6, 14, 14) -> (16, 10, 10) -> (16, 5, 5)
            nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Flatten(),  # 16 * 5 * 5 = 400
            nn.Linear(400, 120), nn.Sigmoid(),
            nn.Linear(120, 84), nn.Sigmoid(),
            nn.Linear(84, 10),
        )

    def forward(self, x):
        return self.net(x)


train_data = torchvision.datasets.FashionMNIST('/mnt/mydataset', train=True, transform=ToTensor(), download=True)
test_data = torchvision.datasets.FashionMNIST('/mnt/mydataset', train=False, transform=ToTensor(), download=True)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=4)
test_loader = DataLoader(test_data, batch_size=64, num_workers=4)

lenet = LeNet()
e = E(train_loader, test_loader, lenet, 10, 0.05)
e.main()

Training/testing results on an NVIDIA GeForce RTX 3090:

Epoch 1
--------------------------------------------------
Train Avg Loss: 2.307199, Train Accuracy: 0.099383
Test  Avg Loss: 2.311069, Test  Accuracy: 0.100000

Epoch 2
--------------------------------------------------
Train Avg Loss: 2.306746, Train Accuracy: 0.101433
Test  Avg Loss: 2.304611, Test  Accuracy: 0.100000

Epoch 3
--------------------------------------------------
Train Avg Loss: 2.306037, Train Accuracy: 0.099633
Test  Avg Loss: 2.304882, Test  Accuracy: 0.100000

Epoch 4
--------------------------------------------------
Train Avg Loss: 2.306285, Train Accuracy: 0.098717
Test  Avg Loss: 2.304786, Test  Accuracy: 0.100000

Epoch 5
--------------------------------------------------
Train Avg Loss: 2.305623, Train Accuracy: 0.100350
Test  Avg Loss: 2.312366, Test  Accuracy: 0.100000

Epoch 6
--------------------------------------------------
Train Avg Loss: 2.305421, Train Accuracy: 0.100033
Test  Avg Loss: 2.304870, Test  Accuracy: 0.100000

Epoch 7
--------------------------------------------------
Train Avg Loss: 2.304797, Train Accuracy: 0.102300
Test  Avg Loss: 2.307974, Test  Accuracy: 0.100000

Epoch 8
--------------------------------------------------
Train Avg Loss: 2.304658, Train Accuracy: 0.102000
Test  Avg Loss: 2.302825, Test  Accuracy: 0.100000

Epoch 9
--------------------------------------------------
Train Avg Loss: 2.303870, Train Accuracy: 0.102867
Test  Avg Loss: 2.304149, Test  Accuracy: 0.100000

Epoch 10
--------------------------------------------------
Train Avg Loss: 2.302295, Train Accuracy: 0.106083
Test  Avg Loss: 2.301472, Test  Accuracy: 0.100000

--------------------------------------------------
29408.7 samples/sec
--------------------------------------------------

Done!

As the log shows, LeNet's test accuracy stays at exactly 0.1, i.e. the model is guessing uniformly at random among FashionMNIST's 10 classes. My guess was that the default parameter initialization is to blame, so a different initialization scheme may be needed.
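One way to probe this hypothesis is to run a single forward/backward pass and inspect the gradient that survives the chain of sigmoids back to the first convolution. The sketch below uses random inputs and labels rather than the actual `Experiment` helper, so the numbers are only illustrative:

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(400, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10),
)

x = torch.randn(64, 1, 28, 28)            # one fake batch
y = torch.randint(0, 10, (64,))           # fake labels
loss = nn.CrossEntropyLoss()(net(x), y)
loss.backward()

# With default init plus saturating sigmoids, the gradient reaching
# the first conv layer tends to be very small.
print(loss.item())
print(net[0].weight.grad.abs().mean().item())
```

If that first-layer gradient is orders of magnitude smaller than the last layer's, the early layers barely move under SGD, which would explain the flat loss curve.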

2. Solution

Keeping all other hyperparameters the same, apply Xavier initialization:

def init_net(m):
    if isinstance(m, (nn.Linear, nn.Conv2d)):
        nn.init.xavier_uniform_(m.weight)


lenet = LeNet()
lenet.apply(init_net)
e = E(train_loader, test_loader, lenet, 10, 0.05)
e.main()
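For a linear layer, `xavier_uniform_` samples weights from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), chosen so that the variance of activations and gradients stays roughly constant from layer to layer. A quick sanity check of that bound, using the sizes of LeNet's first fully connected layer:

```python
import math

import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(400, 120)           # same shape as LeNet's first FC layer
nn.init.xavier_uniform_(layer.weight)

a = math.sqrt(6 / (400 + 120))        # Glorot/Bengio uniform bound
print(layer.weight.abs().max().item() <= a)  # True: every weight lies in [-a, a]
```

Because the bound scales with 1/sqrt(fan_in + fan_out), wider layers get smaller initial weights, which keeps the sigmoid inputs away from the saturated regions.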

Training/testing again on the NVIDIA GeForce RTX 3090:

Epoch 1
--------------------------------------------------
Train Avg Loss: 2.307741, Train Accuracy: 0.097667
Test  Avg Loss: 2.302877, Test  Accuracy: 0.100000

Epoch 2
--------------------------------------------------
Train Avg Loss: 2.305074, Train Accuracy: 0.104150
Test  Avg Loss: 2.306036, Test  Accuracy: 0.100000

Epoch 3
--------------------------------------------------
Train Avg Loss: 2.302001, Train Accuracy: 0.107317
Test  Avg Loss: 2.296702, Test  Accuracy: 0.100000

Epoch 4
--------------------------------------------------
Train Avg Loss: 2.274006, Train Accuracy: 0.164833
Test  Avg Loss: 2.182108, Test  Accuracy: 0.285000

Epoch 5
--------------------------------------------------
Train Avg Loss: 1.605953, Train Accuracy: 0.461283
Test  Avg Loss: 1.216395, Test  Accuracy: 0.561700

Epoch 6
--------------------------------------------------
Train Avg Loss: 1.080627, Train Accuracy: 0.593417
Test  Avg Loss: 0.990674, Test  Accuracy: 0.624100

Epoch 7
--------------------------------------------------
Train Avg Loss: 0.922300, Train Accuracy: 0.654517
Test  Avg Loss: 0.889618, Test  Accuracy: 0.677200

Epoch 8
--------------------------------------------------
Train Avg Loss: 0.847294, Train Accuracy: 0.685733
Test  Avg Loss: 0.831707, Test  Accuracy: 0.690800

Epoch 9
--------------------------------------------------
Train Avg Loss: 0.799123, Train Accuracy: 0.702950
Test  Avg Loss: 0.792512, Test  Accuracy: 0.702100

Epoch 10
--------------------------------------------------
Train Avg Loss: 0.752247, Train Accuracy: 0.719433
Test  Avg Loss: 0.752238, Test  Accuracy: 0.719900

--------------------------------------------------
29523.7 samples/sec
--------------------------------------------------

Done!

By the 10th epoch, the test accuracy has already climbed to about 0.72.

3. Further Discussion

It is not entirely clear that the default initialization is the root cause: in fact, switching to the Adam optimizer also resolves the problem. For deeper networks such as NiN, however, changing the optimizer alone does not help, whereas Xavier initialization still does.
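The `Experiment` class is not shown here, but as a sketch, swapping in Adam within an ordinary PyTorch training step would look like this (the stand-in linear model and the `lr=1e-3` default are my assumptions, not the original setup):

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)    # instead of SGD with lr=0.05
loss_fn = nn.CrossEntropyLoss()

# one training step on random data
x = torch.randn(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(loss.item())
```

Adam's per-parameter adaptive step sizes can rescue layers whose raw gradients are tiny, which may be why it masks a poor initialization in shallow networks.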

At this point in my studies, tuning deep networks still feels a bit like alchemy. If anyone has deeper insight into this problem, please leave a comment.
