We begin with a problem that is not uncommon in feature learning: features that occur infrequently.
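For reference, the per-coordinate AdaGrad update that the code below implements accumulates squared gradients and scales each step by their inverse square root:

$$s_t = s_{t-1} + g_t^2, \qquad x_t = x_{t-1} - \frac{\eta}{\sqrt{s_t + \epsilon}}\, g_t$$

Here $\epsilon$ is a small constant for numerical stability. Coordinates whose gradients are frequently large accumulate a large $s_t$ and thus receive a small effective learning rate, while infrequently updated coordinates keep a larger one.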
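To see why this helps with infrequent features, here is a small illustrative sketch (hypothetical, not from the original text): one coordinate receives a gradient every step, the other only occasionally, and AdaGrad automatically gives the rare coordinate the larger effective step size.

```python
import torch

eta, eps = 0.1, 1e-6
s = torch.zeros(2)  # accumulated squared gradients, one entry per coordinate
for t in range(100):
    # coordinate 0 gets a gradient every step; coordinate 1 only every 10th step
    g = torch.tensor([1.0, 1.0 if t % 10 == 0 else 0.0])
    s += g ** 2
step = eta / torch.sqrt(s + eps)  # per-coordinate effective learning rate
print(step)  # the frequent coordinate gets the smaller step: step[0] < step[1]
```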
%matplotlib inline
import math
import torch
from d2l import torch as d2l

def adagrad_2d(x1, x2, s1, s2):
    eps = 1e-6
    g1, g2 = 0.2 * x1, 4 * x2
    # accumulate the squared gradient of each coordinate
    s1 += g1 ** 2
    s2 += g2 ** 2
    # scale each step by the inverse square root of the accumulated sum
    x1 -= eta / math.sqrt(s1 + eps) * g1
    x2 -= eta / math.sqrt(s2 + eps) * g2
    return x1, x2, s1, s2

def f_2d(x1, x2):
    return 0.1 * x1 ** 2 + 2 * x2 ** 2

eta = 0.4
d2l.show_trace_2d(f_2d, d2l.train_2d(adagrad_2d))
epoch 20, x1: -2.382563, x2: -0.158591
Raising the learning rate to 2 yields much better behavior. This already indicates that the decrease in the learning rate can be rather aggressive, even in the noise-free case, and we need to make sure that the parameters converge appropriately.
eta = 2
d2l.show_trace_2d(f_2d,d2l.train_2d(adagrad_2d))
epoch 20, x1: -0.002295, x2: -0.000000
As with the momentum method, AdaGrad needs to maintain, for each parameter, a state variable of the same shape as the parameter.
def init_adagrad_states(feature_dim):
    # one state tensor per parameter, same shape as the parameter
    s_w = torch.zeros((feature_dim, 1))
    s_b = torch.zeros(1)
    return (s_w, s_b)

def adagrad(params, states, hyperparams):
    eps = 1e-6
    for p, s in zip(params, states):
        with torch.no_grad():
            s[:] += torch.square(p.grad)
            p[:] -= hyperparams['lr'] * p.grad / torch.sqrt(s + eps)
        p.grad.data.zero_()
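As a quick sanity check (a hypothetical example, not from the original text), we can apply one update to a toy parameter with a known gradient and verify it against the closed-form AdaGrad step; the `adagrad` definition is repeated here so the snippet is self-contained.

```python
import torch

def adagrad(params, states, hyperparams):
    # AdaGrad update (repeated from above so this snippet runs on its own)
    eps = 1e-6
    for p, s in zip(params, states):
        with torch.no_grad():
            s[:] += torch.square(p.grad)
            p[:] -= hyperparams['lr'] * p.grad / torch.sqrt(s + eps)
        p.grad.data.zero_()

w = torch.tensor([1.0, 2.0], requires_grad=True)
w.grad = torch.tensor([0.5, -0.5])  # pretend backprop produced this gradient
s = torch.zeros(2)
adagrad([w], [s], {'lr': 0.1})
# first step: s = g**2, so the update is lr * g / sqrt(g**2 + eps) ≈ lr * sign(g)
print(w)  # ≈ [0.9, 2.1]
print(s)  # [0.25, 0.25]
```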
Here we use a larger learning rate to train the model.
data_iter, feature_dim = d2l.get_data_ch11(batch_size=10)
d2l.train_ch11(adagrad, init_adagrad_states(feature_dim),
               {'lr': 0.1}, data_iter, feature_dim);
loss: 0.243, 0.005 sec/epoch
We can also train the model directly with the AdaGrad implementation provided by the deep learning framework.
trainer = torch.optim.Adagrad
d2l.train_concise_ch11(trainer,{'lr':0.1},data_iter)
loss: 0.242, 0.005 sec/epoch
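The same optimizer can also be used outside of d2l's helper functions. A minimal standalone sketch (the toy data and model here are made up for illustration):

```python
import torch

# fit a toy linear-regression problem with torch.optim.Adagrad
torch.manual_seed(0)
X = torch.randn(100, 2)
y = X @ torch.tensor([2.0, -3.4]) + 4.2

net = torch.nn.Linear(2, 1)
loss_fn = torch.nn.MSELoss()
trainer = torch.optim.Adagrad(net.parameters(), lr=0.1)

losses = []
for epoch in range(50):
    trainer.zero_grad()
    loss = loss_fn(net(X).squeeze(-1), y)
    loss.backward()
    trainer.step()
    losses.append(loss.item())
print(losses[0], losses[-1])  # the loss decreases over training
```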