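Deep Learning: Feedforward Neural Networks (2): Automatic Gradient Computation and Optimization Problems

This post continues from the previous one:
Deep Learning, Chapter 3 Feedforward Neural Networks, Experiment 5: PyTorch Implementation

Contents

  • 4.3 Automatic gradient computation and predefined operators
    • 4.3.1 Re-implementing the feedforward network with predefined operators
    • 4.3.2 Completing the Runner class
    • 4.3.3 Model training
    • 4.3.4 Performance evaluation
  • 4.4 Optimization problems
    • 4.4.1 Parameter initialization
    • 4.4.2 The vanishing gradient problem
      • 4.4.2.1 Building the model
      • 4.4.2.2 Training with a Sigmoid-type function
    • 4.4.3 The dead ReLU problem
      • 4.4.3.1 Training with ReLU
      • 4.4.3.2 Training with Leaky ReLU
  • Exercises
    • 1. Re-implement the binary classification task with PyTorch's predefined operators.
    • 2. Add a hidden layer with 3 neurons, run the binary classification again, and compare with 1.
    • 3. [Discussion] Hand-written versus automatic gradient computation: compare them in terms of performance, results, and other aspects, and give your own view.
  • Packages to import
  • Supplement
    • 1. torch.optim.SGD()
    • 2. torch.nn.init
    • 3. Causes of vanishing and exploding gradients, and remedies
  • Summary
  • References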

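4.3 Automatic gradient computation and predefined operators

Assembling a neural network from modules works reasonably well, but writing the gradient computation of every module by hand is tedious and error-prone. Deep learning frameworks ship with automatic gradient computation, so we can concentrate on the model architecture instead of spending effort on deriving gradients.

PyTorch provides the nn.Module class to make it quick to implement custom layers and models. Both layers and models are built by extending nn.Module; a model is just a special kind of layer.

An operator that subclasses nn.Module can directly call other nn.Module operators internally. PyTorch automatically detects the nested nn.Module operators, computes their gradients, and updates their parameters during optimization.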
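To make "automatic gradient computation" less of a black box, here is a minimal sketch of reverse-mode differentiation on scalars in plain Python. It is not how PyTorch is implemented; the `Value` class and its methods are hypothetical names introduced only for this illustration. It records how each result was computed and then applies the chain rule backwards, which is the idea behind calling `backward()` on a loss.

```python
import math


class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes self.grad to the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def sigmoid(self):
        s = 1.0 / (1.0 + math.exp(-self.data))
        out = Value(s, (self,))

        def backward_fn():
            self.grad += s * (1.0 - s) * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Collect nodes in topological order, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


# One sigmoid neuron: out = sigmoid(w * x + b)
w, x, b = Value(0.5), Value(2.0), Value(-0.3)
out = (w * x + b).sigmoid()
out.backward()

# Analytic gradient for comparison: d out / d w = out * (1 - out) * x
expected_dw = out.data * (1.0 - out.data) * x.data
print(w.grad, expected_dw)
```

The gradient accumulated in `w.grad` agrees with the hand-derived formula, without the formula ever being written inside the model code.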


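What is torch.nn.Module in PyTorch? A brief introduction.

torch.nn.Module is the base class of all neural network modules, and every model should subclass it. Almost every custom operation in PyTorch is implemented by subclassing nn.Module and overriding __init__ (initialization) and forward (the forward pass): submodules are constructed in __init__ and wired together in forward. Most layer classes have a counterpart function in torch.nn.functional. The class defines the required arguments and holds the module's parameters, and its forward method passes those arguments and parameters to the corresponding torch.nn.functional function.

Notes:

  • Layers with learnable parameters (fully connected layers, convolutional layers, and so on) are normally created in the constructor __init__().

  • forward must be overridden. It implements what the model does and is where the connections between layers are defined.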

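4.3.1 Re-implementing the feedforward network with predefined operators

Below we use PyTorch's predefined operators to re-implement the binary classification task.
The main predefined operator used is torch.nn.Linear.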

CLASS torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)

The torch.nn.Linear operator accepts an input tensor of shape $[*, H_{in}]$, where $*$ stands for any number of additional leading dimensions, multiplies it by the transpose of a weight matrix of shape $[out\_features, in\_features]$, and produces an output tensor of shape $[*, H_{out}]$. torch.nn.Linear has a bias parameter by default; pass bias=False to disable it. The implementation is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.init import constant_, normal_, uniform_


class Model_MLP_L2_V2(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Model_MLP_L2_V2, self).__init__()
        # First linear layer: weights ~ N(0, 1), biases set to 0
        self.fc1 = nn.Linear(input_size, hidden_size)
        normal_(self.fc1.weight, mean=0., std=1.)
        constant_(self.fc1.bias, val=0.0)
        # Second linear layer, initialized the same way
        self.fc2 = nn.Linear(hidden_size, output_size)
        normal_(self.fc2.weight, mean=0., std=1.)
        constant_(self.fc2.bias, val=0.0)
        self.act_fn = torch.sigmoid

    # Forward pass
    def forward(self, inputs):
        z1 = self.fc1(inputs)
        a1 = self.act_fn(z1)
        z2 = self.fc2(a1)
        a2 = self.act_fn(z2)
        return a2
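The shape convention of torch.nn.Linear (weight stored as [out_features, in_features], output computed as x times the transposed weight plus bias) can be checked with a small plain-Python sketch. The `linear` helper below is a hypothetical stand-in written for illustration, not the library implementation, and it only handles a 2-D input.

```python
def linear(x, weight, bias=None):
    """x: [batch, in_features]; weight: [out_features, in_features].

    Returns a [batch, out_features] list of lists, i.e. x @ weight^T + bias.
    """
    out = []
    for row in x:
        # One dot product per output feature
        y = [sum(w * v for w, v in zip(w_row, row)) for w_row in weight]
        if bias is not None:
            y = [yi + bi for yi, bi in zip(y, bias)]
        out.append(y)
    return out


x = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # batch of 3, in_features = 2
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]  # [5, 2]
bias = [0.0, 0.0, 0.0, 0.0, 1.0]

y = linear(x, weight, bias)
print(len(y), len(y[0]))  # batch size and out_features
print(y[0])
```

With input_size=2 and hidden_size=5 as in the model above, each 2-dimensional sample is mapped to a 5-dimensional hidden vector.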

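4.3.2 Completing the Runner class

Building on the RunnerV2_1 class from the previous section, the RunnerV2_2 class in this section uses automatic gradient computation during training. When saving the model it obtains the parameters with state_dict, and when loading the model it restores them with load_state_dict.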

import torch


class RunnerV2_2(object):
    def __init__(self, model, optimizer, metric, loss_fn, **kwargs):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric

        # Track the evaluation metric during training
        self.train_scores = []
        self.dev_scores = []

        # Track the loss during training
        self.train_loss = []
        self.dev_loss = []

    def train(self, train_set, dev_set, **kwargs):
        # Number of training epochs; defaults to 0 if not given
        num_epochs = kwargs.get("num_epochs", 0)
        # Logging frequency; defaults to 100 if not given
        log_epochs = kwargs.get("log_epochs", 100)
        # Path for saving the model; defaults to "best_model.pdparams"
        save_path = kwargs.get("save_path", "best_model.pdparams")

        # Custom logging function; defaults to None
        custom_print_log = kwargs.get("custom_print_log", None)

        # Best dev score seen so far
        best_score = 0
        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            # Switch to training mode (evaluate() below switches to eval mode)
            self.model.train()

            X, y = train_set
            # Model predictions
            logits = self.model(X)
            # Cross-entropy loss
            trn_loss = self.loss_fn(logits, y)
            self.train_loss.append(trn_loss.item())
            # Evaluation metric
            trn_score = self.metric(logits, y).item()
            self.train_scores.append(trn_score)

            # Compute parameter gradients automatically
            trn_loss.backward()
            if custom_print_log is not None:
                # Custom per-layer logging, called after backward() and before step()
                custom_print_log(self)

            # Update the parameters
            self.optimizer.step()
            # Clear the gradients
            self.optimizer.zero_grad()

            dev_score, dev_loss = self.evaluate(dev_set)
            # Save the model if the current score is the best so far
            if dev_score > best_score:
                self.save_model(save_path)
                print(f"[Evaluate] best accuracy performence has been updated: {best_score:.5f} --> {dev_score:.5f}")
                best_score = dev_score

            if log_epochs and epoch % log_epochs == 0:
                print(f"[Train] epoch: {epoch}/{num_epochs}, loss: {trn_loss.item()}")

    # Evaluation: 'torch.no_grad()' disables gradient computation and storage
    @torch.no_grad()
    def evaluate(self, data_set):
        # Switch to evaluation mode
        self.model.eval()

        X, y = data_set
        # Model outputs
        logits = self.model(X)
        # Loss
        loss = self.loss_fn(logits, y).item()
        self.dev_loss.append(loss)
        # Evaluation metric
        score = self.metric(logits, y).item()
        self.dev_scores.append(score)
        return score, loss

    def predict(self, X):
        # Switch to evaluation mode
        self.model.eval()
        return self.model(X)

    # Get the parameters with 'model.state_dict()' and save them
    def save_model(self, saved_path):
        torch.save(self.model.state_dict(), saved_path)

    # Load the parameters with 'model.load_state_dict()'
    def load_model(self, model_path):
        state_dict = torch.load(model_path)
        self.model.load_state_dict(state_dict)
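The training loop above calls backward(), then optimizer.step(), then optimizer.zero_grad(). The plain-Python sketch below mimics that sequence on a single parameter to show what a vanilla SGD step does and why the gradient is cleared afterwards: in PyTorch, backward() adds to the stored gradient instead of overwriting it. The `Param` class, the toy loss, and the helper names are hypothetical and exist only for this illustration.

```python
class Param:
    """A single scalar parameter with an accumulated gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


def backward(param, target):
    # Gradient of the toy loss (value - target)^2, accumulated into .grad
    param.grad += 2.0 * (param.value - target)


def sgd_step(params, lr):
    # Vanilla SGD update: value <- value - lr * grad
    for p in params:
        p.value -= lr * p.grad


def zero_grad(params):
    for p in params:
        p.grad = 0.0


# Same order as in RunnerV2_2.train: backward, step, zero_grad
w = Param(1.0)
for _ in range(3):
    backward(w, target=0.0)
    sgd_step([w], lr=0.25)
    zero_grad([w])
print(w.value)

# Without zero_grad, two backward calls add up to twice the gradient
stale = Param(1.0)
backward(stale, target=0.0)
backward(stale, target=0.0)
print(stale.grad)
```

Clearing the gradient after each step keeps every update based on the current batch only.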

4.3.3 Model training

Instantiate the RunnerV2_2 class and pass in the training configuration. The code is as follows:

from metric import accuracy

# Model
input_size = 2
hidden_size = 5
output_size = 1
model = Model_MLP_L2_V2(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

# Loss function
loss_fn = F.binary_cross_entropy

# Optimizer
learning_rate = 0.2
optimizer = torch.optim.SGD(model.parameters(), learning_rate)

# Evaluation metric
metric = accuracy

# Other settings
epoch_num = 1000
saved_path = 'best_model.pdparams'

# Instantiate RunnerV2_2 and pass in the training configuration
runner = RunnerV2_2(model, optimizer, metric, loss_fn)
runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=epoch_num, log_epochs=50, save_path="best_model.pdparams")

Output:

[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.50000
[Train] epoch: 0/1000, loss: 0.8091480135917664
[Evaluate] best accuracy performence has been updated: 0.50000 --> 0.51250
[Evaluate] best accuracy performence has been updated: 0.51250 --> 0.52500
[Evaluate] best accuracy performence has been updated: 0.52500 --> 0.54375
[Evaluate] best accuracy performence has been updated: 0.54375 --> 0.59375
[Evaluate] best accuracy performence has been updated: 0.59375 --> 0.60000
[Evaluate] best accuracy performence has been updated: 0.60000 --> 0.62500
[Evaluate] best accuracy performence has been updated: 0.62500 --> 0.65625
[Evaluate] best accuracy performence has been updated: 0.65625 --> 0.69375
[Evaluate] best accuracy performence has been updated: 0.69375 --> 0.70625
[Evaluate] best accuracy performence has been updated: 0.70625 --> 0.71250
[Evaluate] best accuracy performence has been updated: 0.71250 --> 0.72500
[Evaluate] best accuracy performence has been updated: 0.72500 --> 0.73750
[Train] epoch: 50/1000, loss: 0.5222678184509277
[Evaluate] best accuracy performence has been updated: 0.73750 --> 0.74375
[Train] epoch: 100/1000, loss: 0.48690247535705566
[Evaluate] best accuracy performence has been updated: 0.74375 --> 0.75000
[Evaluate] best accuracy performence has been updated: 0.75000 --> 0.75625
[Train] epoch: 150/1000, loss: 0.4598698616027832
[Train] epoch: 200/1000, loss: 0.4409264028072357
[Evaluate] best accuracy performence has been updated: 0.75625 --> 0.76250
[Evaluate] best accuracy performence has been updated: 0.76250 --> 0.76875
[Train] epoch: 250/1000, loss: 0.4285047650337219
[Evaluate] best accuracy performence has been updated: 0.76875 --> 0.77500
[Train] epoch: 300/1000, loss: 0.4206601083278656
[Train] epoch: 350/1000, loss: 0.4157894551753998
[Train] epoch: 400/1000, loss: 0.41277965903282166
[Train] epoch: 450/1000, loss: 0.4109122157096863
[Train] epoch: 500/1000, loss: 0.4097384512424469
[Train] epoch: 550/1000, loss: 0.4089827537536621
[Train] epoch: 600/1000, loss: 0.4084780216217041
[Train] epoch: 650/1000, loss: 0.40812379121780396
[Train] epoch: 700/1000, loss: 0.4078601896762848
[Train] epoch: 750/1000, loss: 0.4076516032218933
[Train] epoch: 800/1000, loss: 0.40747684240341187
[Train] epoch: 850/1000, loss: 0.40732353925704956
[Train] epoch: 900/1000, loss: 0.40718430280685425
[Train] epoch: 950/1000, loss: 0.4070546627044678

Visualize how the loss and accuracy on the training and dev sets change during training.

import matplotlib.pyplot as plt
# Plot the training and dev metrics over the epochs
def plot(runner, fig_name):
    plt.figure(figsize=(10,5))
    epochs = [i for i in range(len(runner.train_scores))]

    plt.subplot(1,2,1)
    plt.plot(epochs, runner.train_loss, color='#e4007f', label="Train loss")
    plt.plot(epochs, runner.dev_loss, color='#f19ec2', linestyle='--', label="Dev loss")
    # Axis labels and legend
    plt.ylabel("loss", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='upper right', fontsize='x-large')

    plt.subplot(1,2,2)
    plt.plot(epochs, runner.train_scores, color='#e4007f', label="Train accuracy")
    plt.plot(epochs, runner.dev_scores, color='#f19ec2', linestyle='--', label="Dev accuracy")
    # Axis labels and legend
    plt.ylabel("score", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='lower right', fontsize='x-large')
    
    plt.savefig(fig_name)
    plt.show()

plot(runner, 'fw-acc.pdf')

Output:
[Figure 1: training and dev loss (left) and accuracy (right) versus epoch]

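4.3.4 Performance evaluation

Evaluate the best model saved during training on the test data, and look at its accuracy and loss on the test set. The code is as follows:

# Model evaluation: restore the best checkpoint into the model, then test
runner.model.load_state_dict(torch.load("best_model.pdparams"))
score, loss = runner.evaluate([X_test, y_test])
print("[Test] score/loss: {:.4f}/{:.4f}".format(score, loss))

Output: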

[Test] score/loss: 0.7950/0.4897
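The score above comes from the `accuracy` function imported from the author's `metric` module, whose source is not shown in this post. As a rough sketch of what such a metric computes for binary classification, assuming the model output is a probability that is thresholded at 0.5, a plain-Python version could look like this. The name `binary_accuracy` and the threshold are assumptions, not the author's code.

```python
def binary_accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the 0/1 label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    correct = sum(int(pred == int(label)) for pred, label in zip(preds, labels))
    return correct / len(labels)


# Four predictions, three of which fall on the correct side of the threshold
acc = binary_accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0])
print(acc)
```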

Building on the task above: add a hidden layer with 3 neurons, run the binary classification again, and compare the two results.

import torch.nn as nn
import torch.nn.functional as F
from torch.nn.init import constant_, normal_, uniform_
import torch
from metric import accuracy
import matplotlib.pyplot as plt
from dataset import make_moons
 
# Sample 1000 data points
n_samples = 1000
X, y = make_moons(n_samples=n_samples, shuffle=True, noise=0.5)
num_train = 640
num_dev = 160
num_test = 200
X_train, y_train = X[:num_train], y[:num_train]
X_dev, y_dev = X[num_train:num_train + num_dev], y[num_train:num_train + num_dev]
X_test, y_test = X[num_train + num_dev:], y[num_train + num_dev:]
y_train = y_train.reshape([-1, 1])
y_dev = y_dev.reshape([-1, 1])
y_test = y_test.reshape([-1, 1])
 
 
class Model_MLP_L2_V4(nn.Module):
    def __init__(self, input_size, hidden_size, hidden_size_3, output_size):
        super(Model_MLP_L2_V4, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        normal_(self.fc1.weight, mean=0., std=1.)
        constant_(self.fc1.bias, val=0.0)
        self.fc2 = nn.Linear(hidden_size, hidden_size_3)
        normal_(self.fc2.weight, mean=0., std=1.)
        constant_(self.fc2.bias, val=0.0)
        self.fc3 = nn.Linear(hidden_size_3, output_size)
        normal_(self.fc3.weight, mean=0., std=1.)
        constant_(self.fc3.bias, val=0.0)
        self.act_fn = torch.sigmoid
 
    # Forward pass
    def forward(self, inputs):
        z1 = self.fc1(inputs.to(torch.float32))
        a1 = self.act_fn(z1)
        z2 = self.fc2(a1)
        a2 = self.act_fn(z2)
        z3 = self.fc3(a2)
        a3 = self.act_fn(z3)
        return a3
 
 
# RunnerV2_2 is identical to the class defined in Section 4.3.2 and is not repeated here.
 
 
# Model
input_size = 2
hidden_size = 5
hidden_size_3 = 3
output_size = 1
model = Model_MLP_L2_V4(input_size=input_size, hidden_size=hidden_size, hidden_size_3=hidden_size_3, output_size=output_size)

# Loss function
loss_fn = F.binary_cross_entropy

# Optimizer
learning_rate = 0.2
optimizer = torch.optim.SGD(model.parameters(), learning_rate)

# Evaluation metric
metric = accuracy

# Other settings
epoch_num = 1000
saved_path = 'best_model.pdparams'

# Instantiate RunnerV2_2 and pass in the training configuration
runner = RunnerV2_2(model, optimizer, metric, loss_fn)
runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=epoch_num, log_epochs=50, save_path="best_model.pdparams")
 
 
# plot() is identical to the function defined in Section 4.3.3 and is not repeated here.
 
 
plot(runner, 'fw-acc.pdf')
 
# Model evaluation: restore the best checkpoint into the model, then test
runner.model.load_state_dict(torch.load("best_model.pdparams"))
score, loss = runner.evaluate([X_test, y_test])
print("[Test] score/loss: {:.4f}/{:.4f}".format(score, loss))

Output:

outer_circ_x.shape: torch.Size([500]) outer_circ_y.shape: torch.Size([500])
outer_circ_x.shape: torch.Size([500]) inner_circ_y.shape: torch.Size([500])
after concat shape: torch.Size([1000])
X shape: torch.Size([1000, 2])
y shape: torch.Size([1000])
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.45000
[Train] epoch: 0/1000, loss: 0.6945466995239258
[Evaluate] best accuracy performence has been updated: 0.45000 --> 0.48750
[Evaluate] best accuracy performence has been updated: 0.48750 --> 0.54375
[Evaluate] best accuracy performence has been updated: 0.54375 --> 0.59375
[Evaluate] best accuracy performence has been updated: 0.59375 --> 0.63750
[Evaluate] best accuracy performence has been updated: 0.63750 --> 0.66250
[Evaluate] best accuracy performence has been updated: 0.66250 --> 0.69375
[Evaluate] best accuracy performence has been updated: 0.69375 --> 0.73125
[Evaluate] best accuracy performence has been updated: 0.73125 --> 0.75000
[Evaluate] best accuracy performence has been updated: 0.75000 --> 0.76875
[Evaluate] best accuracy performence has been updated: 0.76875 --> 0.78125
[Evaluate] best accuracy performence has been updated: 0.78125 --> 0.80000
[Evaluate] best accuracy performence has been updated: 0.80000 --> 0.81250
[Evaluate] best accuracy performence has been updated: 0.81250 --> 0.81875
[Train] epoch: 50/1000, loss: 0.6421956419944763
[Train] epoch: 100/1000, loss: 0.5900207757949829
[Train] epoch: 150/1000, loss: 0.5342556238174438
[Train] epoch: 200/1000, loss: 0.4871096611022949
[Train] epoch: 250/1000, loss: 0.45485734939575195
[Train] epoch: 300/1000, loss: 0.43542352318763733
[Train] epoch: 350/1000, loss: 0.4243297576904297
[Train] epoch: 400/1000, loss: 0.4180295467376709
[Train] epoch: 450/1000, loss: 0.41435056924819946
[Train] epoch: 500/1000, loss: 0.41209086775779724
[Train] epoch: 550/1000, loss: 0.4106135964393616
[Train] epoch: 600/1000, loss: 0.40958476066589355
[Train] epoch: 650/1000, loss: 0.40882715582847595
[Train] epoch: 700/1000, loss: 0.4082435965538025
[Train] epoch: 750/1000, loss: 0.40777772665023804
[Train] epoch: 800/1000, loss: 0.4073949456214905
[Train] epoch: 850/1000, loss: 0.4070724546909332
[Train] epoch: 900/1000, loss: 0.4067947268486023
[Train] epoch: 950/1000, loss: 0.4065505862236023
[Test] score/loss: 0.8150/0.4290

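[Figure 2: training and dev loss (left) and accuracy (right) versus epoch, with the extra hidden layer]

Comparing the two runs, the model with the extra hidden layer reaches a higher test accuracy (0.8150 versus 0.7950) and a lower test loss (0.4290 versus 0.4897). Each configuration was run only once, so this gain is indicative rather than conclusive.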

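4.4 Optimization problems

In this section we use experiments to expose optimization problems in neural network models and think about how to address them.

4.4.1 Parameter initialization

Before a neural network can be trained, its parameters have to be initialized. If the weights and biases of every layer are all initialized to 0, then in the first forward pass all hidden neurons produce the same activation, and in backpropagation all their weights receive the same update. The hidden neurons therefore never become different from one another. This is the symmetric weight phenomenon.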
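The symmetry argument can be checked numerically. The sketch below is plain Python with hand-written backpropagation for a tiny sigmoid network and a single sample; the helper name `hidden_grads` and all numbers are made up for illustration. It compares the gradients received by three hidden units when they all start from the same values and when they start from different values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_grads(W1, b1, W2, b2, x, y):
    """Gradient of the BCE loss w.r.t. each hidden unit's incoming weights."""
    # Forward pass: one hidden layer, sigmoid everywhere
    a1 = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    a2 = sigmoid(sum(w * a for w, a in zip(W2, a1)) + b2)
    # Backward pass: for sigmoid + binary cross-entropy, dL/dz2 = a2 - y
    dz2 = a2 - y
    grads = []
    for j in range(len(W1)):
        dz1 = dz2 * W2[j] * a1[j] * (1.0 - a1[j])
        grads.append([dz1 * xi for xi in x])
    return grads


x, y = [0.3, -1.2], 1.0

# Every hidden unit starts from the same weights: all units get the same gradient
same = hidden_grads([[0.5, 0.5]] * 3, [0.0] * 3, [0.5] * 3, 0.0, x, y)
symmetric = all(g == same[0] for g in same)

# Different starting weights: every unit gets its own gradient
diff = hidden_grads([[0.5, -0.2], [0.1, 0.9], [-0.7, 0.3]], [0.0] * 3,
                    [0.4, -0.6, 0.2], 0.0, x, y)
broken = len({tuple(g) for g in diff}) == 3

print(symmetric, broken)
```

Units that start identical receive identical updates and therefore stay identical, which is why random initialization is used to break the symmetry.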

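Next, we initialize all model parameters to 0 and look at the result. A new class is defined here in which the parameters of both linear layers are all set to 0.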

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.init import constant_, normal_, uniform_
 
 
class Model_MLP_L2_V4(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Model_MLP_L2_V4, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        constant_(self.fc1.weight, val=0.0)
        constant_(self.fc1.bias, val=0.0)
        self.fc2 = nn.Linear(hidden_size, output_size)
        constant_(self.fc2.weight, val=0.0)
        constant_(self.fc2.bias, val=0.0)
        self.act_fn = torch.sigmoid
 
    # Forward pass
    def forward(self, inputs):
        z1 = self.fc1(inputs)
        a1 = self.act_fn(z1)
        z2 = self.fc2(a1)
        a2 = self.act_fn(z2)
        return a2


# Print every named parameter of the model held by the runner
def print_weights(runner):
    print('The weights of the Layers:')
    for item in runner.model.named_parameters():
        print(item)

Train the model with the Runner class:

# Model
input_size = 2
hidden_size = 5
output_size = 1
model = Model_MLP_L2_V4(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

# Loss function
loss_fn = F.binary_cross_entropy

# Optimizer
learning_rate = 0.2  # 5e-2
optimizer = torch.optim.SGD(model.parameters(), learning_rate)

# Evaluation metric
metric = accuracy

# Other settings
epoch = 2000
saved_path = 'best_model.pdparams'

# Instantiate RunnerV2_2 and pass in the training configuration
runner = RunnerV2_2(model, optimizer, metric, loss_fn)
runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=5, log_epochs=50, save_path="best_model.pdparams", custom_print_log=print_weights)

Output:

The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0., 0., 0., 0., 0.], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[0., 0., 0., 0., 0.]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([0.], requires_grad=True))
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.50000
[Train] epoch: 0/5, loss: 0.6931472420692444
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0., 0., 0., 0., 0.], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[0., 0., 0., 0., 0.]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([2.7940e-10], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0., 0., 0., 0., 0.], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[0., 0., 0., 0., 0.]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([5.5879e-10], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0., 0., 0., 0., 0.], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[0., 0., 0., 0., 0.]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([8.3819e-10], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([0., 0., 0., 0., 0.], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[0., 0., 0., 0., 0.]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([1.1176e-09], requires_grad=True))

[Figure 3]
Here I increased the number of training epochs to 5000.
Output:

# Only the last five printouts are shown
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-0.7841,  1.3522],
        [-0.7841,  1.3522],
        [-0.7841,  1.3522],
        [-0.7841,  1.3522],
        [-0.7841,  1.3522]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([-0.2695, -0.2695, -0.2695, -0.2695, -0.2695], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-1.6042, -1.6042, -1.6042, -1.6042, -1.6042]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([3.4718], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-0.7841,  1.3522],
        [-0.7841,  1.3522],
        [-0.7841,  1.3522],
        [-0.7841,  1.3522],
        [-0.7841,  1.3522]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([-0.2695, -0.2695, -0.2695, -0.2695, -0.2695], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-1.6043, -1.6043, -1.6043, -1.6043, -1.6043]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([3.4719], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-0.7840,  1.3521],
        [-0.7840,  1.3521],
        [-0.7840,  1.3521],
        [-0.7840,  1.3521],
        [-0.7840,  1.3521]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([-0.2695, -0.2695, -0.2695, -0.2695, -0.2695], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-1.6043, -1.6043, -1.6043, -1.6043, -1.6043]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([3.4721], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-0.7840,  1.3521],
        [-0.7840,  1.3521],
        [-0.7840,  1.3521],
        [-0.7840,  1.3521],
        [-0.7840,  1.3521]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([-0.2695, -0.2695, -0.2695, -0.2695, -0.2695], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-1.6044, -1.6044, -1.6044, -1.6044, -1.6044]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([3.4722], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-0.7840,  1.3520],
        [-0.7840,  1.3520],
        [-0.7840,  1.3520],
        [-0.7840,  1.3520],
        [-0.7840,  1.3520]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([-0.2695, -0.2695, -0.2695, -0.2695, -0.2695], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-1.6044, -1.6044, -1.6044, -1.6044, -1.6044]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([3.4723], requires_grad=True))

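[Figure 4]

Judging from the output, for roughly the first 2000 epochs the weights stay at 0, or change so little that the metrics do not move. One tempting reading is that w lingers at a local minimum and then, around epoch 2000, jumps to the global minimum, after which performance improves. The printouts do not actually show that. What they show is that the gradient at the all-zero initialization is almost zero (in the 5-epoch run above, fc2.bias moves by only about 2.8e-10 per step), so training crawls along a flat region before the parameters start to move. Note also that even after 5000 epochs all five rows of fc1.weight, all five entries of fc1.bias, and all five entries of fc2.weight are identical: the hidden neurons are still copies of one another, which is exactly the symmetric weight phenomenon.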
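One possible explanation for the long flat start, assuming the training labels are close to balanced between the two classes (the post does not state the class ratio), is that the all-zero initialization is then a near-stationary point of the loss. The plain-Python sketch below evaluates the loss and the gradients of a sigmoid network at that point; the function name `zero_init_grads` and the label lists are illustrative assumptions.

```python
import math


def zero_init_grads(labels):
    """Loss and gradients of a one-hidden-layer sigmoid MLP with all parameters 0."""
    n = len(labels)
    a1 = 0.5  # sigmoid(0): activation of every hidden unit
    a2 = 0.5  # sigmoid(0): network output for every sample
    loss = -sum(y * math.log(a2) + (1 - y) * math.log(1 - a2) for y in labels) / n
    grad_b2 = sum(a2 - y for y in labels) / n  # dL/db2
    grad_w2 = a1 * grad_b2                     # dL/dW2, the same for every hidden unit
    grad_w1 = 0.0                              # proportional to W2, which is 0
    return loss, grad_b2, grad_w2, grad_w1


# 640 training labels, as in this post's split
balanced = zero_init_grads([0, 1] * 320)
skewed = zero_init_grads([0] * 400 + [1] * 240)

print(balanced)
print(skewed)
```

The loss at this point is ln 2, which matches the 0.6931 logged at epoch 0. With balanced labels every gradient vanishes, so only tiny numerical effects move the parameters; with skewed labels the output bias gets a clear gradient right away, while the first-layer weights still receive none until the second-layer weights leave 0.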

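We can raise the learning rate so that $w$ moves away from this region sooner.

Here I changed the learning rate to 5 and the number of training epochs to 1000:
Output: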

The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-0.5812,  1.0367],
        [-0.5812,  1.0367],
        [-0.5812,  1.0367],
        [-0.5812,  1.0367],
        [-0.5812,  1.0367]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([-0.3856, -0.3856, -0.3856, -0.3856, -0.3856], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-2.0969, -2.0969, -2.0969, -2.0969, -2.0969]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([4.0437], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-0.6149,  1.0128],
        [-0.6149,  1.0128],
        [-0.6149,  1.0128],
        [-0.6149,  1.0128],
        [-0.6149,  1.0128]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([-0.4510, -0.4510, -0.4510, -0.4510, -0.4510], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-2.0390, -2.0390, -2.0390, -2.0390, -2.0390]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([4.1869], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-0.5812,  1.0367],
        [-0.5812,  1.0367],
        [-0.5812,  1.0367],
        [-0.5812,  1.0367],
        [-0.5812,  1.0367]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([-0.3856, -0.3856, -0.3856, -0.3856, -0.3856], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-2.0970, -2.0970, -2.0970, -2.0970, -2.0970]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([4.0437], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-0.6149,  1.0128],
        [-0.6149,  1.0128],
        [-0.6149,  1.0128],
        [-0.6149,  1.0128],
        [-0.6149,  1.0128]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([-0.4510, -0.4510, -0.4510, -0.4510, -0.4510], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-2.0391, -2.0391, -2.0391, -2.0391, -2.0391]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([4.1868], requires_grad=True))
The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-0.5812,  1.0367],
        [-0.5812,  1.0367],
        [-0.5812,  1.0367],
        [-0.5812,  1.0367],
        [-0.5812,  1.0367]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([-0.3857, -0.3857, -0.3857, -0.3857, -0.3857], requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-2.0970, -2.0970, -2.0970, -2.0970, -2.0970]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([4.0437], requires_grad=True))

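[Figure 5]
With the larger learning rate the weights move away from 0 much sooner. The last five printouts alternate between two sets of values, which suggests that with a learning rate this large the parameters oscillate instead of settling.

In the 5-epoch run with zero initialization, the binary classification accuracy is about 50%, which means the model has not learned anything, and the training and dev losses barely decrease.

To avoid the symmetric weight phenomenon, the parameters of a neural network can be initialized from a Gaussian or a uniform distribution.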

import torch
import matplotlib.pyplot as plt

# 10000 samples from N(0, 1) and 10000 samples from the uniform distribution on [-1, 1)
gaussian_weights = torch.normal(mean=0.0, std=1.0, size=[10000])
uniform_weights = torch.empty(10000).uniform_(-1, 1)
# Plot the two parameter distributions
plt.figure()
plt.subplot(1, 2, 1)
plt.title('Gaussian Distribution')
plt.hist(gaussian_weights.numpy(), bins=200, density=True, color='#f19ec2')
plt.subplot(1, 2, 2)
plt.title('Uniform Distribution')
plt.hist(uniform_weights.numpy(), bins=200, density=True, color='#e4007f')
plt.savefig('fw-gaussian-uniform.pdf')
plt.show()

Output:
[Figure 6: histograms of the Gaussian (left) and uniform (right) samples]
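As a numeric companion to the histograms, the sketch below draws the same two kinds of samples with Python's standard `random` module (used here instead of torch purely to keep the example dependency-free) and compares their empirical mean and standard deviation with the theoretical values. The seed and sample size are arbitrary choices.

```python
import math
import random

random.seed(0)
n = 100_000
gaussian = [random.gauss(0.0, 1.0) for _ in range(n)]
uniform = [random.uniform(-1.0, 1.0) for _ in range(n)]


def mean_std(values):
    """Empirical mean and (population) standard deviation."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)


g_mean, g_std = mean_std(gaussian)
u_mean, u_std = mean_std(uniform)

# Theory: N(0, 1) has std 1; U(-1, 1) has std 1 / sqrt(3), about 0.577
print(g_mean, g_std)
print(u_mean, u_std, 1.0 / math.sqrt(3.0))
```

Both distributions are centered at 0, but the uniform one on [-1, 1] is bounded and has a smaller spread than the standard Gaussian.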

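4.4.2 The vanishing gradient problem

When building a neural network, adding layers should in theory improve the network's fitting capacity. In practice, as the network gets deeper the parameters become harder to learn, and the vanishing gradient problem tends to appear.

Sigmoid-type functions saturate: in the saturated regions their derivative is close to 0, so the error signal shrinks every time it passes through a layer. When the network is very deep, the gradient keeps decaying and may vanish altogether, which makes the whole network hard to train. This is the vanishing gradient problem.
There are many ways to mitigate vanishing gradients in deep networks. One simple and effective approach is to use an activation function with a larger derivative, such as ReLU.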
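A quick back-of-the-envelope check shows how fast the activation derivatives alone can shrink a gradient. The plain-Python sketch below multiplies the activation derivative across 4 hidden layers, the depth used in the experiment that follows. It deliberately ignores the weight matrices, which also enter the real product, so it illustrates the trend and is not a full gradient computation.

```python
import math


def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


# The sigmoid derivative peaks at z = 0 with value 0.25
best_case = sigmoid_grad(0.0) ** 4   # four layers, most favorable input
saturated = sigmoid_grad(5.0) ** 4   # four layers, inputs in the saturated region
relu_path = relu_grad(2.0) ** 4      # four layers of active ReLU units

print(best_case, saturated, relu_path)
```

Even in the most favorable case the sigmoid factors multiply to under 0.4% after four layers, while active ReLU units pass the gradient through unchanged.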

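Below, a simple experiment is used to observe vanishing gradients in a feedforward network and how to mitigate them.

4.4.2.1 Building the model

Define a feedforward network with 4 hidden layers and 1 output layer, where the activation function is selected by an argument. The code is as follows: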

class Model_MLP_L5(nn.Module):
    # w_init and b_init are callables that initialize a tensor in place
    def __init__(self, input_size, output_size, act='sigmoid',
                 w_init=lambda w: normal_(w, mean=0.0, std=0.01),
                 b_init=lambda b: constant_(b, val=1.0)):
        super(Model_MLP_L5, self).__init__()
        self.fc1 = torch.nn.Linear(input_size, 3)
        self.fc2 = torch.nn.Linear(3, 3)
        self.fc3 = torch.nn.Linear(3, 3)
        self.fc4 = torch.nn.Linear(3, 3)
        self.fc5 = torch.nn.Linear(3, output_size)
        # Activation function used by the hidden layers
        if act == 'sigmoid':
            self.act = torch.sigmoid
        elif act == 'relu':
            self.act = F.relu
        elif act == 'lrelu':
            self.act = F.leaky_relu
        else:
            raise ValueError("Please enter sigmoid relu or lrelu!")
        # Initialize the weights and biases of the linear layers
        self.init_weights(w_init, b_init)

    # Initialize the weights and biases of the linear layers
    def init_weights(self, w_init, b_init):
        # Iterate over all submodules with 'modules()'
        for m in self.modules():
            # Apply the given initializers to every linear layer
            if isinstance(m, nn.Linear):
                w_init(m.weight)
                b_init(m.bias)

    def forward(self, inputs):
        outputs = self.fc1(inputs)
        outputs = self.act(outputs)
        outputs = self.fc2(outputs)
        outputs = self.act(outputs)
        outputs = self.fc3(outputs)
        outputs = self.act(outputs)
        outputs = self.fc4(outputs)
        outputs = self.act(outputs)
        outputs = self.fc5(outputs)
        outputs = torch.sigmoid(outputs)
        return outputs

4.4.2.2 Training with a Sigmoid-type function

Use a Sigmoid-type function as the activation function. To make the vanishing gradient easy to observe, only one optimization epoch is run. The code is as follows:

Define the gradient-printing function:

def print_grads(runner):
    # Print the L2 norm of the gradient of each layer's weight matrix
    print('The gradient of the Layers:')
    for name, item in runner.model.named_parameters():
        # Weight matrices are 2-D; item.grad is the gradient filled in by backward()
        if len(item.size()) == 2:
            print(name, torch.norm(input=item.grad, p=2))

# Learning rate
lr = 0.01
# Define the network with sigmoid as the activation function
model = Model_MLP_L5(input_size=2, output_size=1, act='sigmoid')
# Define the optimizer
optimizer = torch.optim.SGD(lr=lr, params=model.parameters())
# Define the loss function (binary cross-entropy)
loss_fn = F.binary_cross_entropy
# Define the evaluation metric
metric = accuracy
# Use print_grads as the custom logging function
custom_print_log = print_grads

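Instantiate the RunnerV2_2 class and pass in the training configuration. The code is as follows:

# Instantiate the Runner class
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

Train the model and print the $\ell_2$ norm of each layer's gradient. The code is as follows:

# Start training
runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=1, log_epochs=None, save_path="best_model.pdparams", custom_print_log=custom_print_log)

Output: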

The gradient of the Layers:
fc1.weight tensor(1.0447, grad_fn=)
fc2.weight tensor(1.2803, grad_fn=)
fc3.weight tensor(0.8694, grad_fn=)
fc4.weight tensor(1.0071, grad_fn=)
fc5.weight tensor(0.5389, grad_fn=)
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.50000

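The figure below shows the ℓ2 norm of each layer's gradient for different activation functions. With Sigmoid as the activation function in a 5-layer fully connected feedforward network, the gradient shrinks every time it passes back through a layer, and by the time it reaches the first layer it has almost completely vanished. With ReLU, the vanishing-gradient effect is alleviated and the parameters of every layer receive a gradient. Note that the values in the listing above carry a grad_fn, so they are norms of the weight tensors rather than of their .grad attributes; the per-layer decay has to be read from gradient norms, not from that listing.
[Figure: ℓ2 norm of each layer's gradient for different activation functions]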
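
To check this kind of per-layer decay directly, one can print the norm of each weight's .grad after a single backward pass. The following is a minimal sketch, not the model used in this post: the layer widths, the random data and the nn.Sequential stand-in for Model_MLP_L5 are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the 5-layer network: sigmoid after every linear layer
sizes = [2, 3, 3, 3, 3, 1]
layers = []
for i in range(5):
    layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.Sigmoid()]
net = nn.Sequential(*layers)

# Random binary-classification data (assumed, only to obtain a loss)
X = torch.randn(64, 2)
y = torch.randint(0, 2, (64, 1)).float()

loss = nn.functional.binary_cross_entropy(net(X), y)
loss.backward()

# L2 norm of the gradient (not of the weight itself) for every weight matrix
grad_norms = {
    name: p.grad.norm(p=2).item()
    for name, p in net.named_parameters()
    if p.dim() == 2
}
for name, value in grad_norms.items():
    print(name, value)
```

The values printed this way are plain floats without a grad_fn, which is a quick way to tell gradient norms apart from weight norms.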

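4.4.3 The dying ReLU problem

The ReLU activation function alleviates the vanishing-gradient problem to some extent, but in some situations it is prone to the dying ReLU problem, which makes the network hard to train. The cause is that ReLU outputs exactly 0 whenever x<0. If, after an unlucky parameter update during training, a ReLU neuron is no longer activated on any training sample (its output is always 0), then the gradient of that neuron's own parameters is always 0 and the neuron can never be activated again in later training. A simple and effective remedy is to replace the activation function with a ReLU variant such as Leaky ReLU or ELU.

4.4.3.1 Training the model with ReLU

Use the multi-layer fully connected feedforward network defined in Section 4.4.2 with ReLU as the activation function, and observe the dying ReLU phenomenon and how to mitigate it. When the bias of a layer is initialized to a negative value that is large relative to the weights, the pre-activation of that layer is negative for essentially every input, so the ReLU output is 0 and the dying ReLU phenomenon appears.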
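
The mechanism can be reproduced on a single layer. This is a minimal sketch under assumed settings (one nn.Linear layer with PyTorch's default weight initialization, inputs in [-1, 1), bias set to -8.0); it is not the post's Model_MLP_L5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(2, 3)
nn.init.constant_(layer.bias, -8.0)   # large negative bias, as described in the text

# Inputs in [-1, 1); with the default weight init |w . x| stays far below 8,
# so every pre-activation is negative
x = torch.rand(16, 2) * 2 - 1

out = torch.relu(layer(x))
out.sum().backward()

relu_outputs_all_zero = bool((out == 0).all())
weight_grad_all_zero = bool((layer.weight.grad == 0).all())
bias_grad_all_zero = bool((layer.bias.grad == 0).all())
print(relu_outputs_all_zero, weight_grad_all_zero, bias_grad_all_zero)
```

Because every gradient is exactly 0, an optimizer step leaves the layer unchanged, which is why such neurons do not recover.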

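# Define the network and initialize the biases to a large negative value
model = Model_MLP_L5(input_size=2, output_size=1, act='relu', b_init=lambda b: constant_(b, val=-8.0))
# Rebuild the optimizer so that it updates the parameters of the new model
optimizer = torch.optim.SGD(model.parameters(), lr=lr)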

Instantiate the RunnerV2_2 class, start training, and print the ℓ2 norm of each layer's gradient. The code is as follows:

# Instantiate the Runner class
runner = RunnerV2_2(model, optimizer, metric, loss_fn)
# Start training
runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=1, log_epochs=0, save_path="best_model.pdparams", custom_print_log=custom_print_log)

Output:

The gradient of the Layers:
fc1.weight tensor(0.8176, grad_fn=)
fc2.weight tensor(0.9802, grad_fn=)
fc3.weight tensor(0.9874, grad_fn=)
fc4.weight tensor(1.0451, grad_fn=)
fc5.weight tensor(0.4850, grad_fn=)
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.50000

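With ReLU as the activation function and the condition above satisfied, the dying ReLU problem is expected: the gradients of the ReLU neurons stay at 0 throughout training and the parameters cannot be updated. The listing above does not show zeros because the printed values carry a grad_fn, which means they are norms of the weights rather than of the gradients.

For the dying ReLU problem, a simple and effective remedy is to replace the activation function with a ReLU variant such as Leaky ReLU or ELU. Next, observe the gradients after switching the activation function to Leaky ReLU.

4.4.3.2 Training the model with Leaky ReLU

Switch the activation function to Leaky ReLU, train the model, and observe the gradients. The code is as follows: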

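# Redefine the network with the Leaky ReLU activation function
model = Model_MLP_L5(input_size=2, output_size=1, act='lrelu', b_init=lambda b: constant_(b, val=-8.0))
# Rebuild the optimizer so that it updates the parameters of the new model
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
 
# Instantiate the Runner class
runner = RunnerV2_2(model, optimizer, metric, loss_fn)
 
# Start training
runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=1, log_epochs=None, save_path="best_model.pdparams", custom_print_log=custom_print_log)

Output: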

The gradient of the Layers:
fc1.weight tensor(0.7638, grad_fn=)
fc2.weight tensor(1.1522, grad_fn=)
fc3.weight tensor(1.0465, grad_fn=)
fc4.weight tensor(1.0655, grad_fn=)
fc5.weight tensor(0.5679, grad_fn=)
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.50000
[Train] epoch: 0/1, loss: 0.7000465989112854

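After switching the activation function to Leaky ReLU, the dying ReLU problem is mitigated: gradients are no longer zero and the parameters can be updated normally. However, because the slope of Leaky ReLU for x<0 is only 0.01 by default, the gradient becomes smaller and smaller during backpropagation as the network gets deeper. To reduce this effect, increase the slope of Leaky ReLU for x<0.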
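
The effect of the slope can be seen on a few scalar inputs. This sketch uses torch.nn.functional.leaky_relu and its negative_slope parameter; the value 0.2 is an arbitrary choice for illustration, and how Model_MLP_L5 would expose this parameter is not shown in this post.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 1.0], requires_grad=True)

# Default slope for x < 0 is 0.01
F.leaky_relu(x).sum().backward()
default_grad = x.grad.clone()

# A larger slope lets more gradient through on the negative side
x.grad = None
F.leaky_relu(x, negative_slope=0.2).sum().backward()
steeper_grad = x.grad.clone()

print(default_grad)
print(steeper_grad)
```

For negative inputs the gradient equals the slope itself, so each Leaky ReLU layer with negative pre-activations multiplies the backpropagated gradient by that slope.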

Exercises

1. Re-implement the binary classification task with pytorch's predefined operators.

Code:

import torch.nn as nn
import torch.nn.functional as F
import os
import torch
from abc import abstractmethod
import math
import numpy as np


n_samples = 1000
X, y = make_moons(n_samples=n_samples, shuffle=True, noise=0.15)

num_train = 640
num_dev = 160
num_test = 200

X_train, y_train = X[:num_train], y[:num_train]
X_dev, y_dev = X[num_train:num_train + num_dev], y[num_train:num_train + num_dev]
X_test, y_test = X[num_train + num_dev:], y[num_train + num_dev:]

y_train = y_train.reshape([-1,1])
y_dev = y_dev.reshape([-1,1])
y_test = y_test.reshape([-1,1])

class Model_MLP_L2_V4(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Model_MLP_L2_V4, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        w=torch.normal(0,0.1,size=(hidden_size,input_size),requires_grad=True)
        self.fc1.weight = nn.Parameter(w)

        self.fc2 = nn.Linear(hidden_size, output_size)
        w = torch.normal(0, 0.1, size=(output_size, hidden_size), requires_grad=True)
        self.fc2.weight = nn.Parameter(w)

        # Use 'torch.sigmoid' as the Logistic activation function
        self.act_fn = torch.sigmoid

    # Forward computation
    def forward(self, inputs):
        z1 = self.fc1(inputs.to(torch.float32))
        a1 = self.act_fn(z1)
        z2 = self.fc2(a1)
        a2 = self.act_fn(z2)
        return a2


# def print_weights(runner):
#     print('The weights of the Layers:')
#
#     for item in runner.model.sublayers():
#         print(item.full_name()
#         for param in item.parameters():
#             print(param.numpy())


class RunnerV2_2(object):
    def __init__(self, model, optimizer, metric, loss_fn, **kwargs):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric

        # Record how the evaluation metric changes during training
        self.train_scores = []
        self.dev_scores = []

        # Record how the loss changes during training
        self.train_loss = []
        self.dev_loss = []

    def train(self, train_set, dev_set, **kwargs):
        # Number of training epochs; defaults to 0 if not given
        num_epochs = kwargs.get("num_epochs", 0)
        # Logging frequency; defaults to 100 if not given
        log_epochs = kwargs.get("log_epochs", 100)
        # Path for saving the model; defaults to "best_model.pdparams" if not given
        save_path = kwargs.get("save_path", "best_model.pdparams")

        # Log-printing function; defaults to None if not given
        custom_print_log = kwargs.get("custom_print_log", None)

        # Best score seen so far
        best_score = 0
        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            # evaluate() leaves the model in eval mode, so switch back to training mode each epoch
            self.model.train()
            X, y = train_set

            # Get the model predictions
            logits = self.model(X.to(torch.float32))
            # Compute the cross-entropy loss
            trn_loss = self.loss_fn(logits, y)
            self.train_loss.append(trn_loss.item())
            # Compute the evaluation metric
            trn_score = self.metric(logits, y).item()
            self.train_scores.append(trn_score)

            # Compute the parameter gradients automatically
            trn_loss.backward()
            if custom_print_log is not None:
                # Print the gradient of each layer
                custom_print_log(self)

            # Update the parameters
            self.optimizer.step()
            # Clear the gradients
            self.optimizer.zero_grad()

            dev_score, dev_loss = self.evaluate(dev_set)
            # If the current score is the best so far, save the model
            if dev_score > best_score:
                self.save_model(save_path)
                print(f"[Evaluate] best accuracy performence has been updated: {best_score:.5f} --> {dev_score:.5f}")
                best_score = dev_score

            if log_epochs and epoch % log_epochs == 0:
                print(f"[Train] epoch: {epoch}/{num_epochs}, loss: {trn_loss.item()}")

    # Evaluation: 'torch.no_grad()' disables gradient computation and storage
    @torch.no_grad()
    def evaluate(self, data_set):
        # Switch the model to evaluation mode
        self.model.eval()

        X, y = data_set
        # Compute the model output
        logits = self.model(X)
        # Compute the loss
        loss = self.loss_fn(logits, y).item()
        self.dev_loss.append(loss)
        # Compute the evaluation metric
        score = self.metric(logits, y).item()
        self.dev_scores.append(score)
        return score, loss

    # Prediction: 'torch.no_grad()' disables gradient computation and storage
    @torch.no_grad()
    def predict(self, X):
        # Switch the model to evaluation mode
        self.model.eval()
        return self.model(X)

    # Get the model parameters with 'model.state_dict()' and save them
    def save_model(self, saved_path):
        torch.save(self.model.state_dict(), saved_path)

    # Load the model parameters with 'model.load_state_dict()'
    def load_model(self, model_path):
        state_dict = torch.load(model_path)
        self.model.load_state_dict(state_dict)


# Set up the model
input_size = 2
hidden_size = 5
output_size = 1
model = Model_MLP_L2_V4(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

# Set up the loss function
loss_fn = F.binary_cross_entropy

# Set up the optimizer
learning_rate = 0.2 #5e-2
optimizer = torch.optim.SGD(model.parameters(),lr=learning_rate)

# Set up the evaluation metric
metric = accuracy

# Other settings
epoch = 2000
saved_path = 'best_model.pdparams'

# Instantiate the RunnerV2_2 class and pass in the training configuration
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

runner.train([X_train, y_train], [X_dev, y_dev], num_epochs = epoch, log_epochs=50, save_path=saved_path)

plot(runner, 'fw-acc.pdf')

# Model evaluation
runner.load_model(saved_path)
score, loss = runner.evaluate([X_test, y_test])
print("[Test] score/loss: {:.4f}/{:.4f}".format(score, loss))

The listing above relies on a few helper functions, which must be defined before it runs:

make_moons function:

import math
import torch
# make_moons function added for this experiment
def make_moons(n_samples=1000, shuffle=True, noise=None):
    n_samples_out = n_samples // 2
    n_samples_in = n_samples - n_samples_out
    # Points on the outer and inner half circles
    outer_circ_x = torch.cos(torch.linspace(0, math.pi, n_samples_out))
    outer_circ_y = torch.sin(torch.linspace(0, math.pi, n_samples_out))
    inner_circ_x = 1 - torch.cos(torch.linspace(0, math.pi, n_samples_in))
    inner_circ_y = 0.5 - torch.sin(torch.linspace(0, math.pi, n_samples_in))
    X = torch.stack(
        [torch.cat([outer_circ_x, inner_circ_x]),
         torch.cat([outer_circ_y, inner_circ_y])],
         dim=1
    )
    # Label 0 for the outer half circle, 1 for the inner one
    y = torch.cat(
        [torch.zeros([n_samples_out]), torch.ones([n_samples_in])]
    )
    if shuffle:
        idx = torch.randperm(X.shape[0])
        X = X[idx]
        y = y[idx]
    if noise is not None:
        # Add Gaussian noise with standard deviation `noise`
        X += noise * torch.randn_like(X)

    return X, y

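accuracy function:

def accuracy(preds, labels):
    # preds.shape[1] == 1 means binary classification; preds.shape[1] > 1 means multi-class classification
    if preds.shape[1] == 1:
        # Threshold the predicted probabilities at 0.5
        preds = (preds >= 0.5).to(torch.float32)

    else:
        # Take the class with the highest score
        preds = torch.argmax(preds, dim=1).int()

    return torch.mean((preds == labels).float())

plot function: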

import matplotlib.pyplot as plt
def plot(runner, fig_name):
    plt.figure(figsize=(10, 5))
    epochs = [i for i in range(len(runner.train_scores))]

    # Left panel: training and dev loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, runner.train_loss, color='#e4007f', label="Train loss")
    plt.plot(epochs, runner.dev_loss, color='#f19ec2', linestyle='--', label="Dev loss")
    # Axis labels and legend
    plt.ylabel("loss", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='upper right', fontsize='x-large')

    # Right panel: training and dev accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, runner.train_scores, color='#e4007f', label="Train accuracy")
    plt.plot(epochs, runner.dev_scores, color='#f19ec2', linestyle='--', label="Dev accuracy")
    # Axis labels and legend
    plt.ylabel("score", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='lower right', fontsize='x-large')
    plt.savefig(fig_name)
    plt.show()

Output:

[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.46250
[Train] epoch: 0/2000, loss: 0.7185732126235962
[Train] epoch: 50/2000, loss: 0.6900826096534729
[Evaluate] best accuracy performence has been updated: 0.46250 --> 0.46875
[Evaluate] best accuracy performence has been updated: 0.46875 --> 0.47500
[Evaluate] best accuracy performence has been updated: 0.47500 --> 0.49375
[Evaluate] best accuracy performence has been updated: 0.49375 --> 0.50625
[Evaluate] best accuracy performence has been updated: 0.50625 --> 0.52500
[Evaluate] best accuracy performence has been updated: 0.52500 --> 0.53125
[Train] epoch: 100/2000, loss: 0.6788896918296814
[Evaluate] best accuracy performence has been updated: 0.53125 --> 0.55000
[Evaluate] best accuracy performence has been updated: 0.55000 --> 0.55625
[Evaluate] best accuracy performence has been updated: 0.55625 --> 0.56250
[Evaluate] best accuracy performence has been updated: 0.56250 --> 0.56875
[Evaluate] best accuracy performence has been updated: 0.56875 --> 0.58125
[Evaluate] best accuracy performence has been updated: 0.58125 --> 0.59375
[Evaluate] best accuracy performence has been updated: 0.59375 --> 0.60625
[Evaluate] best accuracy performence has been updated: 0.60625 --> 0.61250
[Evaluate] best accuracy performence has been updated: 0.61250 --> 0.61875
[Evaluate] best accuracy performence has been updated: 0.61875 --> 0.63750
[Evaluate] best accuracy performence has been updated: 0.63750 --> 0.64375
[Evaluate] best accuracy performence has been updated: 0.64375 --> 0.65000
[Evaluate] best accuracy performence has been updated: 0.65000 --> 0.65625
[Evaluate] best accuracy performence has been updated: 0.65625 --> 0.66875
[Evaluate] best accuracy performence has been updated: 0.66875 --> 0.67500
[Evaluate] best accuracy performence has been updated: 0.67500 --> 0.68125
[Evaluate] best accuracy performence has been updated: 0.68125 --> 0.68750
[Evaluate] best accuracy performence has been updated: 0.68750 --> 0.69375
[Evaluate] best accuracy performence has been updated: 0.69375 --> 0.70000
[Evaluate] best accuracy performence has been updated: 0.70000 --> 0.70625
[Evaluate] best accuracy performence has been updated: 0.70625 --> 0.71250
[Evaluate] best accuracy performence has been updated: 0.71250 --> 0.72500
[Evaluate] best accuracy performence has been updated: 0.72500 --> 0.73125
[Evaluate] best accuracy performence has been updated: 0.73125 --> 0.73750
[Evaluate] best accuracy performence has been updated: 0.73750 --> 0.75000
[Evaluate] best accuracy performence has been updated: 0.75000 --> 0.75625
[Evaluate] best accuracy performence has been updated: 0.75625 --> 0.76875
[Evaluate] best accuracy performence has been updated: 0.76875 --> 0.78125
[Train] epoch: 150/2000, loss: 0.6434668898582458
[Train] epoch: 200/2000, loss: 0.5647386312484741
[Evaluate] best accuracy performence has been updated: 0.78125 --> 0.78750
[Train] epoch: 250/2000, loss: 0.4727853834629059
[Evaluate] best accuracy performence has been updated: 0.78750 --> 0.79375
[Evaluate] best accuracy performence has been updated: 0.79375 --> 0.80000
[Evaluate] best accuracy performence has been updated: 0.80000 --> 0.80625
[Train] epoch: 300/2000, loss: 0.40855950117111206
[Evaluate] best accuracy performence has been updated: 0.80625 --> 0.81250
[Evaluate] best accuracy performence has been updated: 0.81250 --> 0.81875
[Evaluate] best accuracy performence has been updated: 0.81875 --> 0.82500
[Evaluate] best accuracy performence has been updated: 0.82500 --> 0.83125
[Evaluate] best accuracy performence has been updated: 0.83125 --> 0.83750
[Evaluate] best accuracy performence has been updated: 0.83750 --> 0.85000
[Evaluate] best accuracy performence has been updated: 0.85000 --> 0.85625
[Train] epoch: 350/2000, loss: 0.36973756551742554
[Evaluate] best accuracy performence has been updated: 0.85625 --> 0.86250
[Evaluate] best accuracy performence has been updated: 0.86250 --> 0.86875
[Train] epoch: 400/2000, loss: 0.3448896110057831
[Train] epoch: 450/2000, loss: 0.3279474079608917
[Evaluate] best accuracy performence has been updated: 0.86875 --> 0.87500
[Evaluate] best accuracy performence has been updated: 0.87500 --> 0.88125
[Train] epoch: 500/2000, loss: 0.3161616027355194
[Evaluate] best accuracy performence has been updated: 0.88125 --> 0.88750
[Train] epoch: 550/2000, loss: 0.30801355838775635
[Train] epoch: 600/2000, loss: 0.3024461269378662
[Evaluate] best accuracy performence has been updated: 0.88750 --> 0.89375
[Train] epoch: 650/2000, loss: 0.2986697256565094
[Evaluate] best accuracy performence has been updated: 0.89375 --> 0.90000
[Train] epoch: 700/2000, loss: 0.2961108088493347
[Train] epoch: 750/2000, loss: 0.29436975717544556
[Evaluate] best accuracy performence has been updated: 0.90000 --> 0.90625
[Train] epoch: 800/2000, loss: 0.2931761145591736
[Train] epoch: 850/2000, loss: 0.2923488914966583
[Train] epoch: 900/2000, loss: 0.291767954826355
[Train] epoch: 950/2000, loss: 0.29135289788246155
[Train] epoch: 1000/2000, loss: 0.29105034470558167
[Train] epoch: 1050/2000, loss: 0.290824294090271
[Train] epoch: 1100/2000, loss: 0.29065054655075073
[Train] epoch: 1150/2000, loss: 0.2905128598213196
[Train] epoch: 1200/2000, loss: 0.29040008783340454
[Train] epoch: 1250/2000, loss: 0.29030489921569824
[Train] epoch: 1300/2000, loss: 0.2902221083641052
[Train] epoch: 1350/2000, loss: 0.29014816880226135
[Train] epoch: 1400/2000, loss: 0.290080726146698
[Train] epoch: 1450/2000, loss: 0.290018230676651
[Train] epoch: 1500/2000, loss: 0.2899594306945801
[Train] epoch: 1550/2000, loss: 0.28990358114242554
[Train] epoch: 1600/2000, loss: 0.2898501753807068
[Train] epoch: 1650/2000, loss: 0.2897987961769104
[Train] epoch: 1700/2000, loss: 0.2897491455078125
[Train] epoch: 1750/2000, loss: 0.28970101475715637
[Train] epoch: 1800/2000, loss: 0.2896542549133301
[Train] epoch: 1850/2000, loss: 0.28960874676704407
[Train] epoch: 1900/2000, loss: 0.28956443071365356
[Train] epoch: 1950/2000, loss: 0.28952115774154663
[Test] score/loss: 0.9050/0.2641


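2. Add a hidden layer with 3 neurons, implement the binary classification again, and compare with Exercise 1.

To add a hidden layer with 3 neurons, one more hidden-layer size parameter is needed.
Initialization: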

input_size = 2
hidden_size = 5
# Added: size of the second hidden layer
hidden_size2 = 3
output_size = 1
model = Model_MLP_L2_V4(input_size=input_size, hidden_size=hidden_size,hidden_size2=hidden_size2, output_size=output_size)

The model also needs the extra layer:


class Model_MLP_L2_V4(torch.nn.Module):
    def __init__(self, input_size, hidden_size, hidden_size2, output_size):
        super(Model_MLP_L2_V4, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        w1=torch.normal(0,0.1,size=(hidden_size,input_size),requires_grad=True)
        self.fc1.weight = nn.Parameter(w1)

        self.fc2 = nn.Linear(hidden_size, hidden_size2)
        w2 = torch.normal(0, 0.1, size=(hidden_size2, hidden_size), requires_grad=True)
        self.fc2.weight = nn.Parameter(w2)

        self.fc3 = nn.Linear(hidden_size2, output_size)
        w3 = torch.normal(0, 0.1, size=(output_size, hidden_size2), requires_grad=True)
        self.fc3.weight = nn.Parameter(w3)

        # Use 'torch.sigmoid' as the Logistic activation function
        self.act_fn = torch.sigmoid

    # Forward computation
    def forward(self, inputs):
        z1 = self.fc1(inputs.to(torch.float32))
        a1 = self.act_fn(z1)
        z2 = self.fc2(a1)
        a2 = self.act_fn(z2)
        z3 = self.fc3(a2)
        a3 = self.act_fn(z3)
        return a3

Output:

[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.51250
[Train] epoch: 0/2000, loss: 0.6943337321281433
[Train] epoch: 50/2000, loss: 0.6928403973579407
[Train] epoch: 100/2000, loss: 0.6928308606147766
[Train] epoch: 150/2000, loss: 0.6928211450576782
[Train] epoch: 200/2000, loss: 0.6928109526634216
[Train] epoch: 250/2000, loss: 0.6928002238273621
[Train] epoch: 300/2000, loss: 0.6927885413169861
[Train] epoch: 350/2000, loss: 0.692775547504425
[Train] epoch: 400/2000, loss: 0.6927613019943237
[Train] epoch: 450/2000, loss: 0.6927453875541687
[Train] epoch: 500/2000, loss: 0.6927274465560913
[Train] epoch: 550/2000, loss: 0.6927071213722229
[Train] epoch: 600/2000, loss: 0.6926836967468262
[Train] epoch: 650/2000, loss: 0.6926565170288086
[Train] epoch: 700/2000, loss: 0.6926250457763672
[Train] epoch: 750/2000, loss: 0.692588210105896
[Train] epoch: 800/2000, loss: 0.6925449371337891
[Train] epoch: 850/2000, loss: 0.6924934983253479
[Train] epoch: 900/2000, loss: 0.6924319267272949
[Train] epoch: 950/2000, loss: 0.6923579573631287
[Train] epoch: 1000/2000, loss: 0.6922681927680969
[Train] epoch: 1050/2000, loss: 0.6921581029891968
[Train] epoch: 1100/2000, loss: 0.6920220255851746
[Train] epoch: 1150/2000, loss: 0.691851794719696
[Train] epoch: 1200/2000, loss: 0.6916364431381226
[Train] epoch: 1250/2000, loss: 0.6913599967956543
[Train] epoch: 1300/2000, loss: 0.6909999847412109
[Train] epoch: 1350/2000, loss: 0.6905229687690735
[Train] epoch: 1400/2000, loss: 0.6898793578147888
[Train] epoch: 1450/2000, loss: 0.688992440700531
[Train] epoch: 1500/2000, loss: 0.6877409219741821
[Train] epoch: 1550/2000, loss: 0.6859274506568909
[Evaluate] best accuracy performence has been updated: 0.51250 --> 0.51875
[Evaluate] best accuracy performence has been updated: 0.51875 --> 0.52500
[Evaluate] best accuracy performence has been updated: 0.52500 --> 0.53750
[Evaluate] best accuracy performence has been updated: 0.53750 --> 0.54375
[Evaluate] best accuracy performence has been updated: 0.54375 --> 0.55625
[Evaluate] best accuracy performence has been updated: 0.55625 --> 0.56875
[Evaluate] best accuracy performence has been updated: 0.56875 --> 0.57500
[Evaluate] best accuracy performence has been updated: 0.57500 --> 0.59375
[Evaluate] best accuracy performence has been updated: 0.59375 --> 0.60000
[Evaluate] best accuracy performence has been updated: 0.60000 --> 0.60625
[Evaluate] best accuracy performence has been updated: 0.60625 --> 0.61875
[Evaluate] best accuracy performence has been updated: 0.61875 --> 0.62500
[Evaluate] best accuracy performence has been updated: 0.62500 --> 0.63125
[Evaluate] best accuracy performence has been updated: 0.63125 --> 0.63750
[Train] epoch: 1600/2000, loss: 0.6832197904586792
[Evaluate] best accuracy performence has been updated: 0.63750 --> 0.65000
[Evaluate] best accuracy performence has been updated: 0.65000 --> 0.65625
[Evaluate] best accuracy performence has been updated: 0.65625 --> 0.66875
[Evaluate] best accuracy performence has been updated: 0.66875 --> 0.68125
[Evaluate] best accuracy performence has been updated: 0.68125 --> 0.69375
[Evaluate] best accuracy performence has been updated: 0.69375 --> 0.70000
[Evaluate] best accuracy performence has been updated: 0.70000 --> 0.70625
[Evaluate] best accuracy performence has been updated: 0.70625 --> 0.71250
[Evaluate] best accuracy performence has been updated: 0.71250 --> 0.71875
[Evaluate] best accuracy performence has been updated: 0.71875 --> 0.72500
[Evaluate] best accuracy performence has been updated: 0.72500 --> 0.73125
[Evaluate] best accuracy performence has been updated: 0.73125 --> 0.73750
[Evaluate] best accuracy performence has been updated: 0.73750 --> 0.74375
[Evaluate] best accuracy performence has been updated: 0.74375 --> 0.75000
[Evaluate] best accuracy performence has been updated: 0.75000 --> 0.75625
[Evaluate] best accuracy performence has been updated: 0.75625 --> 0.76250
[Evaluate] best accuracy performence has been updated: 0.76250 --> 0.76875
[Evaluate] best accuracy performence has been updated: 0.76875 --> 0.77500
[Evaluate] best accuracy performence has been updated: 0.77500 --> 0.78125
[Evaluate] best accuracy performence has been updated: 0.78125 --> 0.79375
[Evaluate] best accuracy performence has been updated: 0.79375 --> 0.80000
[Evaluate] best accuracy performence has been updated: 0.80000 --> 0.80625
[Evaluate] best accuracy performence has been updated: 0.80625 --> 0.81250
[Train] epoch: 1650/2000, loss: 0.6790398955345154
[Evaluate] best accuracy performence has been updated: 0.81250 --> 0.81875
[Evaluate] best accuracy performence has been updated: 0.81875 --> 0.82500
[Evaluate] best accuracy performence has been updated: 0.82500 --> 0.83125
[Evaluate] best accuracy performence has been updated: 0.83125 --> 0.83750
[Evaluate] best accuracy performence has been updated: 0.83750 --> 0.84375
[Train] epoch: 1700/2000, loss: 0.6723501086235046
[Train] epoch: 1750/2000, loss: 0.661268413066864
[Train] epoch: 1800/2000, loss: 0.6425549387931824
[Train] epoch: 1850/2000, loss: 0.6118574142456055
[Train] epoch: 1900/2000, loss: 0.5674184560775757
[Train] epoch: 1950/2000, loss: 0.515984296798706
[Test] score/loss: 0.8100/0.6730

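Compared with Exercise 1, the accuracy on the test set did not improve: it dropped from 0.9050 to 0.8100, and the test loss rose from 0.2641 to 0.6730. The log shows why. The training loss stays near 0.693 for roughly the first 1500 epochs and is still falling quickly at epoch 1950 (0.5160), so the deeper Sigmoid network has not converged within 2000 epochs.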

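3. [Thinking question] Custom gradient computation versus automatic gradient computation: compare them in terms of computational performance, results, and other aspects, and give your own view.

Use the Model_MLP_L2 class from the earlier experiment.

Activation function: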

def backward(self, grads):
	outputs_grad_inputs = torch.multiply(self.outputs, (1.0 - self.outputs))
	return torch.multiply(grads, outputs_grad_inputs)

Loss function:

def backward(self):
	loss_grad_predicts = -1.0 * (self.labels / self.predicts - (1 - self.labels) / (1 - self.predicts)) / self.num
	self.model.backward(loss_grad_predicts)

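Automatic differentiation:
[Figure: results with automatic differentiation]
Custom differentiation:
[Figure: results with custom differentiation]

Custom gradient computation gets the result in one step, and the computation is fast.
Comparing the two runs, their results differ considerably, not only in test-set accuracy but also in the weight gradients. Part of this difference may come from the two runs starting from different random initial weights rather than from the differentiation method, since a correct hand-written backward pass and autograd compute the same derivatives up to floating-point error.
So, if you have the time and enjoy the mathematics, deriving the gradients by hand is worth doing.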
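
A direct way to compare the two approaches is to feed both the same inputs and compare the resulting gradients. The sketch below applies the two backward formulas shown above to random pre-activations and labels (assumed data, float64 for a tight comparison) and checks them against autograd.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Assumed toy data: pre-activations z and binary labels
z = torch.randn(8, 1, dtype=torch.float64, requires_grad=True)
labels = torch.randint(0, 2, (8, 1)).to(torch.float64)

# Automatic differentiation
predicts = torch.sigmoid(z)
loss = F.binary_cross_entropy(predicts, labels)
loss.backward()
auto_grad = z.grad

# Hand-written backward pass, using the two formulas above
with torch.no_grad():
    num = labels.shape[0]
    loss_grad_predicts = -1.0 * (labels / predicts - (1 - labels) / (1 - predicts)) / num
    outputs_grad_inputs = predicts * (1.0 - predicts)
    manual_grad = loss_grad_predicts * outputs_grad_inputs

max_diff = (auto_grad - manual_grad).abs().max().item()
print(max_diff)
```

With identical inputs the two gradients agree to floating-point precision, so differences between full training runs are more likely to come from initialization or other settings.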

Packages to import

Deep Learning, Chapter 3 Linear Classification, Experiment 4: Softmax regression in pytorch, iris classification task (Part 2)
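Supplementary notes

1. torch.optim.SGD()

SGD is one of the algorithms (optimizers) in optim: stochastic gradient descent.

torch.optim is a library that implements a variety of optimization algorithms. Most commonly used methods are supported, and the interface is general enough that more sophisticated methods can be integrated in the future.

In the multilayer perceptron chapter of Dive into Deep Learning: updater = torch.optim.SGD(params, lr=lr). Here updater is an optimizer object.

Stochastic gradient descent (SGD) is an improvement on batch gradient descent. Compared with batch gradient descent, SGD computes the gradient from a single randomly chosen sample at each update, so each update is fast. However, with such frequent updates the results are unstable and fluctuate a lot, so nowadays SGD usually refers to mini-batch gradient descent: a random batch of samples is drawn and used to update the parameters. A mini-batch typically contains 50 to 256 samples. torch.optim.SGD itself does not sample data; it applies the update to whatever gradient is stored in .grad, so in this experiment, where the whole training set is passed in at once, it performs full-batch gradient descent.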
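
The update that torch.optim.SGD applies (without momentum or weight decay) is w <- w - lr * grad. A minimal sketch on an assumed toy loss, sum(w^2), whose gradient is 2 * w:

```python
import torch

w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
w0 = w.detach().clone()
lr = 0.1

optimizer = torch.optim.SGD([w], lr=lr)

loss = (w ** 2).sum()          # gradient is 2 * w
loss.backward()

# The plain SGD rule, written by hand before the optimizer changes w
expected = w0 - lr * w.grad

optimizer.step()               # w <- w - lr * w.grad
optimizer.zero_grad()          # clear the gradient for the next iteration

print(w.detach(), expected)
```

The same step/zero_grad pair is what the Runner's training loop calls after backward().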

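2. torch.nn.init

Here it is mainly used to initialize parameters, for example:

Method: torch.nn.init.uniform_(tensor, a=0.0, b=1.0)
Purpose: fills the input tensor with values drawn from the uniform distribution U(a, b)
Parameters:
* tensor - an n-dimensional torch.Tensor
* a - lower bound of the uniform distribution
* b - upper bound of the uniform distribution

Method: torch.nn.init.normal_(tensor, mean=0.0, std=1.0)
Purpose: fills the input tensor with values drawn from the normal distribution with the given mean and standard deviation
Parameters:
* tensor - an n-dimensional torch.Tensor
* mean - mean of the normal distribution
* std - standard deviation of the normal distribution

Method: torch.nn.init.constant_(tensor, val)
Purpose: fills the input tensor with the value val
Parameters:
* tensor - an n-dimensional torch.Tensor
* val - the value used to fill the tensor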
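
A short usage sketch of the three in-place initializers. The tensor shape and the specific bounds are arbitrary, and wrapping an initializer with functools.partial so it can be passed around as a one-argument function (for example as a b_init argument) is an assumption about how a model might consume it.

```python
from functools import partial

import torch
import torch.nn as nn
from torch.nn.init import constant_, normal_, uniform_

torch.manual_seed(0)

w = torch.empty(3, 2)

uniform_(w, a=-0.5, b=0.5)               # values drawn from U(-0.5, 0.5)
in_range = bool(((w >= -0.5) & (w <= 0.5)).all())

normal_(w, mean=0.0, std=0.1)            # values drawn from a normal distribution

constant_(w, val=-8.0)                   # every entry set to -8.0
all_constant = bool((w == -8.0).all())

# An initializer with its arguments bound can be applied to any parameter later
b_init = partial(constant_, val=-8.0)
layer = nn.Linear(2, 3)
b_init(layer.bias)

print(in_range, all_constant, layer.bias.data)
```

All three functions modify the tensor in place, which is what the trailing underscore indicates.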

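3. Causes of vanishing and exploding gradients, and how to address them

First, what vanishing and exploding gradients are:

  • Deep neural networks are trained with backpropagation, which applies the chain rule. Computing the gradient of each layer therefore involves a product of many factors, and the deeper the network, the longer that product.
  • If most of the factors are smaller than 1, the product tends to 0. This is the vanishing gradient: the parameters of the layers closest to the input barely change.
  • If most of the factors are larger than 1, the product can grow without bound. This is the exploding gradient.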
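
A little arithmetic makes the point concrete. The factor 0.25 is the maximum of the sigmoid derivative; 1.5 is an arbitrary factor larger than 1 chosen for illustration.

```python
# Product of `depth` identical per-layer factors
products = {}
for depth in (1, 5, 10, 20):
    shrinking = 0.25 ** depth   # every factor below 1: the product collapses towards 0
    growing = 1.5 ** depth      # every factor above 1: the product blows up
    products[depth] = (shrinking, growing)
    print(depth, shrinking, growing)
```

Real networks have different factors in every layer, but the trend is the same: the product scales roughly geometrically with depth.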

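Causes of vanishing gradients: (1) too many hidden layers; (2) an unsuitable activation function (this more often causes vanishing gradients, but it can also cause exploding gradients)
Causes of exploding gradients: (1) too many hidden layers; (2) initial weight values that are too large

Remedies:

Vanishing gradients:

  1. Sigmoid is prone to this; replacing the activation function with ReLU helps.
  2. Initialize the weights with Gaussian initialization.

Exploding gradients:

  1. Set a gradient-clipping threshold; if a gradient exceeds the threshold, set it to the threshold value.
  2. Use ReLU, maxout, etc. instead of sigmoid.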
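
Gradient clipping is available in torch.nn.utils. The sketch below uses clip_grad_norm_, which rescales all gradients so that their combined L2 norm does not exceed max_norm; the per-element thresholding described in item 1 corresponds to torch.nn.utils.clip_grad_value_. The model, the data and the threshold 1.0 are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(2, 1)
X = torch.randn(32, 2) * 100.0      # deliberately large inputs to produce large gradients
y = torch.randn(32, 1)

loss = nn.functional.mse_loss(model(X), y)
loss.backward()

max_norm = 1.0   # assumed clipping threshold
# Returns the total gradient norm measured before clipping
total_norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)

# Recompute the total norm after clipping
total_norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))

print(float(total_norm_before), float(total_norm_after))
```

In a training loop the clipping call goes between loss.backward() and optimizer.step().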

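Summary

  1. This experiment covered the use of torch.nn.Module in detail, as well as the differences between custom and automatic gradient computation.
  2. It also gave more practice in converting code between paddle and pytorch.
  3. It showed how to analyze the results when parameters in a feedforward network are 0.

References

NNDL Experiment 4 (Part 1)
NNDL Experiment 5: Feedforward Neural Networks (2) Automatic Gradient Computation & Optimization Problems
torch.optim.SGD()
torch.nn.init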

