Understanding torch.nn.MSELoss() and torch.optim.SGD

Table of Contents

      • A Simple Example
      • An Intuitive Implementation of MSELoss
      • An Intuitive Implementation of SGD
      • References

A Simple Example

import torch
import torch.nn as nn

x = torch.randn(10, 3)
y = torch.randn(10, 2)
# Build a fully connected layer.
linear = nn.Linear(3, 2)

# Build loss function and optimizer.
criterion = nn.MSELoss()

# Use stochastic gradient descent as the optimizer, with a learning rate of 0.01.
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# Forward pass.
pred = linear(x)

# Compute loss.
loss = criterion(pred, y)
print('loss:', loss.item())

# Backward pass.
loss.backward()
print('dL/dw: ', linear.weight.grad)
print('dL/db: ', linear.bias.grad)

# 1-step gradient descent.
optimizer.step()

# Print out the loss after 1-step gradient descent.
pred = linear(x)
loss = criterion(pred, y)
print(loss.item())
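
If the forward/backward/step cycle above is repeated, the loss keeps decreasing. Below is a minimal sketch (not part of the original example) of such a loop, reusing x, y, linear, criterion, and optimizer from above; note that PyTorch accumulates gradients, so optimizer.zero_grad() has to be called before each backward pass.

# Sketch: a short training loop; gradients must be cleared each iteration,
# otherwise they accumulate across backward() calls.
for step in range(5):
    optimizer.zero_grad()              # clear accumulated gradients
    pred = linear(x)                   # forward pass
    loss = criterion(pred, y)          # compute MSE loss
    loss.backward()                    # compute dL/dw and dL/db
    optimizer.step()                   # SGD update: p <- p - lr * p.grad
    print('step {}: loss = {:.4f}'.format(step, loss.item()))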

An Intuitive Implementation of MSELoss

# MSELoss() is equivalent to:
def mseLoss(pred, y):
    return ((pred - y) ** 2).mean()

$MSE=\frac{1}{m}\sum_{i=1}^{m}(\hat{y_i}-y_i)^2$
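
To see that the hand-written function really matches nn.MSELoss (whose default reduction is 'mean'), one can compare the two on the same tensors; the check below is just a sketch reusing pred and y from the example above.

# Sketch: the manual implementation and nn.MSELoss give the same value.
manual = mseLoss(pred, y)
builtin = nn.MSELoss()(pred, y)
print(torch.allclose(manual, builtin))  # True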

An Intuitive Implementation of SGD

optimizer.step()
# optimizer.step() is equivalent to:
linear.weight.data.sub_(0.01 * linear.weight.grad.data)
linear.bias.data.sub_(0.01 * linear.bias.grad.data)

Here, 0.01 is the learning rate lr, and the sub_() method subtracts in place, just as the t_() method transposes in place.
The initial values of w and b are chosen randomly, and are then updated according to
$w^1 \leftarrow w^0 - \eta \left.\frac{dL}{dw}\right|_{w=w^0,\,b=b^0}$

$b^1 \leftarrow b^0 - \eta \left.\frac{dL}{db}\right|_{w=w^0,\,b=b^0}$
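
The same equivalence can be checked numerically. The sketch below (the names linear2, optimizer2, w1, and b1 are only for illustration) computes the manual update on copies of the parameters and compares it with what one optimizer.step() of plain SGD (lr=0.01, no momentum) produces, reusing x and y from the example.

# Sketch: one optimizer.step() of plain SGD matches the manual update rule.
linear2 = nn.Linear(3, 2)
optimizer2 = torch.optim.SGD(linear2.parameters(), lr=0.01)

loss2 = nn.MSELoss()(linear2(x), y)
loss2.backward()

# Manual update: w1 = w0 - lr * dL/dw, b1 = b0 - lr * dL/db
w1 = linear2.weight.data - 0.01 * linear2.weight.grad.data
b1 = linear2.bias.data - 0.01 * linear2.bias.grad.data

optimizer2.step()
print(torch.allclose(w1, linear2.weight.data))  # True
print(torch.allclose(b1, linear2.bias.data))    # True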

References

pytorch-tutorials
