
LEARNING PYTORCH WITH EXAMPLES

  • I. Tensors
    • 1. Warm-up: numpy
    • 2. PyTorch: Tensors
  • II. Autograd
    • 1. PyTorch: Tensors and autograd
    • 2. PyTorch: Defining new autograd functions
  • III. nn module
    • 1. PyTorch: nn
    • 2. PyTorch: optim
    • 3. PyTorch: Custom nn Modules
    • 4. PyTorch: Control Flow + Weight Sharing

This section introduces the fundamental concepts of PyTorch through a set of self-contained examples.

At its core, PyTorch provides two main features:

  • An n-dimensional Tensor, similar to numpy, but which can run on GPUs
  • Automatic differentiation for building and training neural networks

We will use a fully-connected ReLU network as our running example. The network has a single hidden layer and is trained with gradient descent to fit randomly generated data by minimizing the Euclidean distance between the network output and the true output.
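Written out explicitly (using the variable names that appear in the code below: x is the input, y the target, and w1, w2 the weights of the two layers), every example in this tutorial computes the same forward pass and minimizes the same loss:

    h = x w_1, \qquad h_{relu} = \max(h, 0), \qquad \hat{y} = h_{relu} w_2, \qquad L = \sum_{i,j} (\hat{y}_{ij} - y_{ij})^2

that is, the loss is the squared Euclidean distance between the prediction and the target, summed over the batch.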

I. Tensors

1. Warm-up: numpy

Before introducing PyTorch, we first implement the network using numpy.

# -*- coding: utf-8 -*-
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()  # np.square() squares each element
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
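The backprop block above is just the chain rule for this loss, written out by hand. With L = \sum (\hat{y} - y)^2, \hat{y} = h_{relu} w_2, h_{relu} = \max(h, 0) and h = x w_1, the quantities the code computes are:

    \frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y), \qquad
    \frac{\partial L}{\partial w_2} = h_{relu}^{\top} \frac{\partial L}{\partial \hat{y}}, \qquad
    \frac{\partial L}{\partial h_{relu}} = \frac{\partial L}{\partial \hat{y}}\, w_2^{\top},

    \frac{\partial L}{\partial h} = \frac{\partial L}{\partial h_{relu}} \odot \mathbf{1}[h \ge 0], \qquad
    \frac{\partial L}{\partial w_1} = x^{\top} \frac{\partial L}{\partial h}

which correspond line-for-line to grad_y_pred, grad_w2, grad_h_relu, grad_h, and grad_w1 in the loop.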

2. PyTorch: Tensors

Numpy cannot utilize GPUs to accelerate its numerical computations, and for modern deep neural networks GPUs often provide speedups of 50x or greater.
A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they are also useful as a generic tool for scientific computing.
Unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a PyTorch Tensor on a GPU, you simply need to cast it to a CUDA datatype (or, equivalently, create it on or move it to a CUDA device).
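As a minimal sketch of device placement (assuming a CUDA-capable GPU is available; the variable names are only illustrative), a Tensor can be created directly on a device or moved to one with .to():

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a = torch.randn(3, 4, device=device)   # created directly on the chosen device
b = torch.randn(3, 4).to(device)       # created on the CPU, then moved
c = a + b                              # the computation runs on `device`
print(c.device)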

Here we use PyTorch Tensors to fit a two-layer network to random data. Like the numpy example above, we still need to manually implement the forward and backward passes.

# -*- coding: utf-8 -*-

import torch

dtype = torch.float
# A torch.Tensor is a multi-dimensional matrix containing elements of a single data type.
# Torch defines nine CPU tensor types and nine GPU tensor types.
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)  # clamp() restricts every element to the range [min, max]; with min=0 this is ReLU
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    # torch.Tensor.item() returns the value of a one-element (zero-dimensional) tensor
    # as a standard Python number, e.g. when reading off a loss or an accuracy value.
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

II. Autograd

1. PyTorch: Tensors and autograd

In the examples above we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but it can quickly get very hairy for large, complex networks.

Thankfully, the autograd package in PyTorch provides automatic differentiation to automate the computation of backward passes in neural networks.
When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to easily compute gradients.

Each Tensor represents a node in the computational graph. If x is a Tensor with x.requires_grad=True, then x.grad is another Tensor holding the gradient of x with respect to some scalar value.
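For instance, here is a minimal sketch of autograd acting on a single scalar-valued expression (the variable names are only illustrative):

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
s = (x ** 2).sum()   # s is a scalar: x1^2 + x2^2 + x3^2
s.backward()         # fills x.grad with ds/dx
print(x.grad)        # tensor([2., 4., 6.]), i.e. 2 * x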

Below we use PyTorch Tensors and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass.

# -*- coding: utf-8 -*-
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
# Setting requires_grad=False indicates that we do not need to compute
# gradients with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors; these are
    # exactly the same operations we used to compute the forward pass with
    # Tensors above, but we do not need to keep references to intermediate
    # values since we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    # Now loss is a Tensor of shape (1,), and
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of the loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because the weights have requires_grad=True, but we don't need to track
    # this operation in autograd.
    # An alternative way is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares the storage with
    # tensor, but doesn't track history.
    # You can also use torch.optim.SGD to achieve this.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

2. PyTorch: Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.

We can easily define our own autograd operator by defining a subclass of torch.autograd.Function and implementing the forward and backward functions. We can then use the new autograd operator by constructing an instance and calling it like a function, passing Tensors containing input data.

In the example below we define our own custom autograd function for performing the ReLU nonlinearity, and use it to implement our two-layer network.

# -*- coding: utf-8 -*-
import torch

class MyReLU(torch.autograd.Function):  # inherits from torch.autograd.Function
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes which operate on Tensors.
    """
    @staticmethod
    # forward() is declared as a static method: it can be called without
    # instantiating the class (MyReLU.forward(...)) as well as on an instance
    # (MyReLU().forward(...)).
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        翻译:ctx一个可以用来存放反向传播计算的上下文信息对象,你可以用ctx.save_for_backward()缓存反向传播中的任意对象。
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

III. nn module

1. PyTorch: nn

Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks, raw autograd can be a bit too low-level.

When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters that are optimized during training.

In TensorFlow, packages like Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.

In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.
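As a small illustration of what a Module is (the shapes here are just for demonstration), a single nn.Linear layer maps input Tensors to output Tensors and keeps its weight and bias as internal, learnable parameters:

import torch

layer = torch.nn.Linear(3, 2)                # a Module computing y = x @ W.T + b
x = torch.randn(5, 3)                        # a batch of 5 inputs with 3 features each
y = layer(x)                                 # output Tensor of shape (5, 2)
print(y.shape)                               # torch.Size([5, 2])
print(layer.weight.shape, layer.bias.shape)  # internal state: (2, 3) and (2,)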

In this example we use the nn package to implement our two-layer network:

# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers.
# nn.Sequential is a Module which contains other Modules, and applies them in
# sequence to produce its output. Each Linear Module computes output from input
# using a linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
# The reduction argument (string, optional) specifies how to reduce the output:
# 'none' | 'mean' | 'sum'.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module
    # objects override the __call__ operator so you can call them like
    # functions. When doing so you pass a Tensor of input data to the Module
    # and it produces a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the
    # learnable parameters of the model. Internally, the parameters of each
    # Module are stored in Tensors with requires_grad=True, so this call will
    # compute gradients for all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

2. PyTorch: optim

Up to this point we have updated the weights of our models by manually mutating the Tensors holding the learnable parameters (wrapping the update in torch.no_grad() or using .data to avoid tracking history in autograd). This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks with more sophisticated optimizers such as AdaGrad, RMSProp, or Adam.

The optim package abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms.
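Because every optimizer exposes the same construct / zero_grad / backward / step pattern, switching algorithms is essentially a one-line change. A minimal sketch (the parameter values are only illustrative):

import torch

params = [torch.randn(10, requires_grad=True)]

# Any of these can be swapped in; only the construction differs.
optimizer = torch.optim.SGD(params, lr=1e-2, momentum=0.9)
# optimizer = torch.optim.RMSprop(params, lr=1e-3)
# optimizer = torch.optim.Adam(params, lr=1e-3)

loss = (params[0] ** 2).sum()   # some scalar loss
optimizer.zero_grad()           # clear any previously accumulated gradients
loss.backward()                 # compute gradients of the loss w.r.t. params
optimizer.step()                # update the parameters in place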

In the example below we use the nn package to define our model as before, but this time we optimize the model with the Adam algorithm provided by the optim package:

# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many
# other optimization algorithms. The first argument to the Adam constructor
# tells the optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e. not overwritten) whenever .backward() is
    # called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters.
    optimizer.step()

3. PyTorch: Custom nn Modules

Sometimes you will want to specify models that are more complex than a sequence of existing Modules. For these cases you can define your own Module by subclassing nn.Module and defining a forward method that receives input Tensors and produces output Tensors, using other Modules or other autograd operations on Tensors.

In this example we implement our two-layer network as a custom Module subclass:

# -*- coding: utf-8 -*-
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as member variables.
        在构造器中,我们初始化两个nn.Linear modules,并赋值变量
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return a Tensor of output data. We can use Modules defined in the constructor as well as arbitrary operators on Tensors.
        前向传播函数中接收输入数据的tensor,并返回输出数据的tensor。我们可以使用构造器中定义的模型和任意tensors的operators
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

4. PyTorch: Control Flow + Weight Sharing

As an example of dynamic graphs and weight sharing, we implement a very strange model: a fully-connected ReLU network that on each forward pass chooses a random number between 1 and 4 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers.

Since each forward pass builds a dynamic computation graph, we can use normal Python flow control to implement the loop, and we can implement weight sharing among the innermost layers simply by reusing the same Module multiple times when defining the forward pass.

# -*- coding: utf-8 -*-
import random
import torch

class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use in the forward pass.
        构造器中,我们构造了三个nn.Linear实例,这三个实例用在前向传播中
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3 and reuse the middle_linear Module that many times to compute hidden layer representations.
        前向传播中,我们随机选择0,1,2,3并重复多次使用中间层去计算隐藏层的表示。
        Since each forward pass builds a dynamic computation graph, we can use normal Python control-flow operators like loops or conditional statements when defining the forward pass of the model.
        每个前向传播构建一个动态计算图.定义前向传播时,我们使用Python标准控制流操作符(比如loop循环或conditional statements条件语句)
        Here we also see that it is perfectly safe to reuse the same Module many times when defining a computational graph. This is a big improvement from LuaTorch, where each Module could be used only once.
        定义计算图时,重复使用相同模型是非常安全的。这相对于LuaTorch是一个很大的进步,在LuaTorch中,每个模型只能用一次。
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):  # reuse the same middle_linear Module 0 to 3 times
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model
# with vanilla stochastic gradient descent is tough, so we use momentum.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
