Most machine learning algorithms ultimately come down to building an optimization model: an optimization method minimizes an objective (or loss) function in order to train the best possible model. A good optimization algorithm speeds up convergence (a network trained without a suitable optimizer takes longer to reach the same loss) and can even reach a better, smaller loss value. In short, optimization algorithms let you train models quickly and efficiently.
In PyTorch, optimization algorithms are provided by the torch.optim module.
1. Optimizer Usage
optimizer = torch.optim.XXX(model.parameters(), ...)  # XXX is any optimizer class listed below; the parameters to optimize are always the first argument
##### Common usage #####
for input, target in dataset:
    optimizer.zero_grad()                  # clear gradients left over from the previous step
    output = model(input)                  # forward pass
    loss = loss_fn(output, target)         # compute the loss
    loss.backward()                        # backpropagate to fill each parameter's .grad
    optimizer.step()                       # update the parameters
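For reference, here is a fully self-contained sketch of the loop above. The toy linear model, MSE loss, and random data are illustrative stand-ins, and SGD is just one possible choice of optimizer.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                      # toy model (illustrative)
loss_fn = nn.MSELoss()                        # toy loss (illustrative)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
dataset = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(5)]  # fake data

for input, target in dataset:
    optimizer.zero_grad()                     # clear old gradients
    loss = loss_fn(model(input), target)      # forward pass + loss
    loss.backward()                           # backpropagation fills .grad
    optimizer.step()                          # parameter update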
##### Conjugate Gradient and LBFGS #####
for input, target in dataset:
    def closure():
        # these algorithms re-evaluate the function several times per step,
        # so the forward/backward pass is wrapped in a closure that returns the loss
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        return loss
    optimizer.step(closure)
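A self-contained sketch of the closure-based loop looks much the same; here LBFGS is used with the same kind of toy model and random data (all names and values are illustrative).

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0)
dataset = [(torch.randn(16, 10), torch.randn(16, 1)) for _ in range(5)]

for input, target in dataset:
    def closure():
        # LBFGS re-evaluates the model several times per step,
        # so the forward/backward pass lives inside the closure
        optimizer.zero_grad()
        loss = loss_fn(model(input), target)
        loss.backward()
        return loss
    optimizer.step(closure)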
2. SGD
torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)
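A construction sketch with illustrative hyperparameter values; note that nesterov=True requires a nonzero momentum and zero dampening.

import torch
model = torch.nn.Linear(10, 1)   # illustrative model
# plain SGD with Nesterov momentum and a little weight decay
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4, nesterov=True)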
3. Adam
torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
Paper: "Adam: A Method for Stochastic Optimization"
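Adam keeps running estimates of the first and second moments of the gradients and adapts each parameter's step size from them; amsgrad=True switches on the AMSGrad variant. A construction sketch with illustrative values:

import torch
model = torch.nn.Linear(10, 1)   # illustrative model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), weight_decay=1e-4)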
4. Adadelta
torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)
Paper: "ADADELTA: An Adaptive Learning Rate Method"
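Adadelta adapts each parameter's step size from running averages of squared gradients and squared updates, so it is relatively insensitive to the initial learning rate; lr only scales the computed update before it is applied. Illustrative sketch:

import torch
model = torch.nn.Linear(10, 1)   # illustrative model
optimizer = torch.optim.Adadelta(model.parameters(), rho=0.9, eps=1e-6)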
5. Adagrad
torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0, initial_accumulator_value=0)
Paper: "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization"
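Adagrad accumulates the squared gradients of all past steps, so each parameter's effective learning rate only ever shrinks over training. Illustrative sketch:

import torch
model = torch.nn.Linear(10, 1)   # illustrative model
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01, lr_decay=1e-4)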
6. RMSprop
torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)
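RMSprop divides the gradient by a running average of its recent magnitude (the decay of that average is controlled by alpha); centered=True additionally normalizes by an estimate of the gradient's variance. Illustrative sketch:

import torch
model = torch.nn.Linear(10, 1)   # illustrative model
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01,
                                alpha=0.99, momentum=0.9)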
7. LBFGS
torch.optim.LBFGS(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-05, tolerance_change=1e-09, history_size=100, line_search_fn=None)
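LBFGS is a quasi-Newton method: it approximates curvature from the last history_size gradient evaluations, is memory-hungry, and must be stepped with a closure as shown in Section 1. A construction sketch with illustrative values ('strong_wolfe' is the only built-in line search besides None):

import torch
model = torch.nn.Linear(10, 1)   # illustrative model
optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0,
                              history_size=100, line_search_fn='strong_wolfe')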
8. ASGD
torch.optim.ASGD(params, lr=0.01, lambd=0.0001, alpha=0.75, t0=1000000.0, weight_decay=0)
Paper: "Acceleration of Stochastic Approximation by Averaging"
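ASGD is SGD with averaging of the iterates (Polyak-Ruppert style); the averaging starts around step t0. Illustrative sketch:

import torch
model = torch.nn.Linear(10, 1)   # illustrative model
optimizer = torch.optim.ASGD(model.parameters(), lr=0.01, t0=1e6)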
9. Adamax
torch.optim.Adamax(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
A variant of Adam based on the infinity norm.
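Since Adamax replaces Adam's second-moment estimate with an infinity-norm based one, it is constructed just like Adam. Illustrative sketch:

import torch
model = torch.nn.Linear(10, 1)   # illustrative model
optimizer = torch.optim.Adamax(model.parameters(), lr=2e-3, betas=(0.9, 0.999))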
10. SparseAdam
torch.optim.SparseAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08)
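SparseAdam is a lazy variant of Adam that only updates the moment estimates and parameters that actually appear in the (sparse) gradient; it is typically paired with an embedding layer created with sparse=True. Illustrative sketch:

import torch
embedding = torch.nn.Embedding(1000, 64, sparse=True)   # produces sparse gradients
optimizer = torch.optim.SparseAdam(embedding.parameters(), lr=1e-3)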
11. Rprop
torch.optim.Rprop(params, lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50))
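Rprop (resilient backpropagation) uses only the sign of each gradient component: a parameter's step size is multiplied by etas[1] when the gradient sign stays the same, multiplied by etas[0] when it flips, and clamped to the step_sizes range. Illustrative sketch:

import torch
model = torch.nn.Linear(10, 1)   # illustrative model
optimizer = torch.optim.Rprop(model.parameters(), lr=0.01,
                              etas=(0.5, 1.2), step_sizes=(1e-6, 50))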
- https://pytorch.org/docs/stable/optim.html