【PyTorch in Practice (2)】Gradients and Optimization Algorithms

1. A Simple Example of Computing Gradients

import torch

x = torch.tensor([1., 2.], requires_grad=True)
y = x[0] ** 2 + x[1] ** 2
print('y = {}'.format(y))
y.backward()
print('grad = {}'.format(x.grad))
# Output:
# y = 5.0
# grad = tensor([2., 4.])
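
Here y = x[0]² + x[1]², so ∂y/∂x_i = 2·x_i; at x = (1, 2) this gives y = 1 + 4 = 5 and a gradient of (2, 4), which matches the printed output.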

Notes:

  • When constructing the tensor x, set requires_grad=True so that x is taken into account when gradients are computed
  • Call backward() to compute the gradients; the result is stored in x.grad (see the sketch below for how gradients accumulate across calls)
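
One detail worth illustrating with a minimal sketch (not from the original post): x.grad accumulates across repeated backward() calls rather than being overwritten, so it has to be cleared, e.g. with x.grad.zero_() or an optimizer's zero_grad(), before computing a fresh gradient.

import torch

x = torch.tensor([1., 2.], requires_grad=True)

y = (x ** 2).sum()
y.backward()
print(x.grad)    # tensor([2., 4.]) -- gradient from the first backward()

y = (x ** 2).sum()
y.backward()
print(x.grad)    # tensor([4., 8.]) -- accumulated with the previous gradient

x.grad.zero_()   # clear the stored gradient before the next backward() pass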

2. A Simple Example of Gradient Descent

import torch
import torch.optim

x = torch.tensor([3., 2.], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1, momentum=0)  # SGD implements the plain gradient descent update
for step in range(11):
    if step:
        optimizer.zero_grad()  # clear the gradients stored in the previous iteration
        y.backward()           # compute the gradients
        optimizer.step()       # update the values of x
    y = x[0] ** 2 + x[1] ** 2
    print('step {}: x = {}, y= {}'.format(step, x.tolist(), y))
"""
Output:
step 0: x = [3.0, 2.0], y= 13.0
step 1: x = [2.4000000953674316, 1.600000023841858], y= 8.320000648498535
step 2: x = [1.9200000762939453, 1.2799999713897705], y= 5.32480001449585
step 3: x = [1.5360000133514404, 1.0239999294281006], y= 3.407871961593628
step 4: x = [1.2288000583648682, 0.8191999197006226], y= 2.1810381412506104
step 5: x = [0.9830400347709656, 0.6553599238395691], y= 1.3958643674850464
step 6: x = [0.7864320278167725, 0.5242879390716553], y= 0.8933531641960144
step 7: x = [0.629145622253418, 0.41943034529685974], y= 0.5717460513114929
step 8: x = [0.5033165216445923, 0.33554428815841675], y= 0.36591750383377075
step 9: x = [0.40265321731567383, 0.26843541860580444], y= 0.23418718576431274
step 10: x = [0.32212257385253906, 0.2147483378648758], y= 0.1498797982931137
"""

Basic gradient descent has the following drawbacks:

  • It may get stuck in a local minimum or at a saddle point -----> fix: introduce momentum
  • It may oscillate around the optimum without ever effectively approaching it -----> fix: adjust the learning rate dynamically (see the sketch after this list)
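
For illustration, here is a minimal sketch (not part of the original post) that combines both fixes on the same quadratic: SGD with a momentum term plus a StepLR scheduler that shrinks the learning rate over time. The particular values momentum=0.9, step_size=5 and gamma=0.5 are arbitrary choices for this example.

import torch
import torch.optim

x = torch.tensor([3., 2.], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1, momentum=0.9)  # momentum helps move past saddle points
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)  # halve the lr every 5 steps

for step in range(11):
    optimizer.zero_grad()
    y = x[0] ** 2 + x[1] ** 2
    y.backward()
    optimizer.step()
    scheduler.step()  # advance the learning rate schedule
    print('step {}: lr = {}, x = {}, y = {}'.format(
        step, scheduler.get_last_lr()[0], x.tolist(), y.item()))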

3. Optimization Algorithms

PyTorch provides several optimizers, such as torch.optim.SGD, torch.optim.RMSprop, torch.optim.Adam, and so on.
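
As a minimal sketch (not from the original post), switching optimizers only changes the construction line; here Adam, with an example learning rate of 0.1, minimizes the same quadratic as above.

import torch
import torch.optim

x = torch.tensor([3., 2.], requires_grad=True)
optimizer = torch.optim.Adam([x], lr=0.1)  # adaptive per-parameter step sizes

for step in range(100):
    optimizer.zero_grad()
    y = x[0] ** 2 + x[1] ** 2
    y.backward()
    optimizer.step()

y = x[0] ** 2 + x[1] ** 2  # recompute y at the final x
print('x = {}, y = {}'.format(x.tolist(), y.item()))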
