With Ray Tune you can tune PyTorch hyperparameters automatically. This is a step-by-step, hand-holding tutorial that rewrites an ordinary training script, one change at a time, into one that tunes itself:
pip install -i https://mirrors.aliyun.com/pypi/simple/ 'ray[default]'
Note that the package to install is 'ray[default]'. Later Ray releases reorganized the code, so this install target is specified uniformly here; otherwise you will see all sorts of warnings at runtime.
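A quick optional sanity check (my sketch, not from the original) to confirm the install imports cleanly:

# Optional: verify that Ray imports and starts without warnings.
import ray
ray.init()              # start a local Ray instance
print(ray.__version__)  # prints the installed version
ray.shutdown()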
To keep things easy to follow, the example here has no extra bells and whistles: random numbers serve as the training data, a three-layer perceptron is thrown together as the model, and we run it once just to make sure it works. Suppose the original training code looks like this:
import torch
import torch.nn as nn
import numpy as np

class LinearRegressionModel(nn.Module):
    def __init__(self, input_shape, linear1, linear2, output_shape):
        super(LinearRegressionModel, self).__init__()
        self.linear1 = nn.Linear(input_shape, linear1)
        self.linear2 = nn.Linear(linear1, linear2)
        self.linear3 = nn.Linear(linear2, output_shape)

    def forward(self, x):
        l1 = self.linear1(x)
        l2 = self.linear2(l1)
        l3 = self.linear3(l2)
        return l3

def train_model(x_train, y_train, linear1, linear2):
    # Set up the model, optimizer, and loss function
    model = LinearRegressionModel(x_train.shape[1], linear1, linear2, 1)
    epochs = 1000            # train for 1000 epochs
    learning_rate = 0.01     # learning rate
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  # optimizer
    criterion = nn.MSELoss() # use MSE as the loss; the goal is to minimize it
    loss_list = []
    for epoch in range(epochs):
        epoch += 1
        optimizer.zero_grad()               # clear gradients
        outputs = model(x_train)            # forward pass
        loss = criterion(outputs, y_train)  # compute the loss
        loss.backward()                     # backward pass
        loss_list.append(loss.detach().numpy())
        optimizer.step()                    # update the weights
    mean_loss = np.mean(loss_list)
    print("loss: ", mean_loss)

if __name__ == '__main__':
    x_train = torch.randn(100, 4)  # 100 random 4-dim samples as the training X
    y_train = torch.randn(100, 1)  # random labels for the training set
    train_model(x_train, y_train, 32, 8)
Make sure this runs without errors before going any further, or you will be chasing error after error later. After a run we get a value such as loss: 0.6363947. From here on, the goal is to find a pair of parameters linear1 and linear2 (the hidden-layer sizes in the model) that makes this loss as small as possible!
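To make the goal concrete: without Tune, you would hand-roll something like the loop below (a sketch that assumes train_model is changed to return mean_loss instead of printing it). Ray Tune automates exactly this search:

# Hypothetical manual search; assumes train_model returns mean_loss.
best_loss, best_params = float("inf"), None
for linear1 in [2, 3, 4]:
    for linear2 in [2, 4, 8, 16]:
        loss = train_model(x_train, y_train, linear1, linear2)
        if loss < best_loss:
            best_loss, best_params = loss, (linear1, linear2)
print("best:", best_params, best_loss)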
Only the following changes are needed, starting with one new import:

from ray import tune

There are 3 places to modify in total; the modified code looks like this:
import torch
import torch.nn as nn
import numpy as np
from ray import tune  # change 1: import the package

class LinearRegressionModel(nn.Module):
    def __init__(self, input_shape, linear1, linear2, output_shape):
        super(LinearRegressionModel, self).__init__()
        self.linear1 = nn.Linear(input_shape, linear1)
        self.linear2 = nn.Linear(linear1, linear2)
        self.linear3 = nn.Linear(linear2, output_shape)

    def forward(self, x):
        l1 = self.linear1(x)
        l2 = self.linear2(l1)
        l3 = self.linear3(l2)
        return l3

def train_model(x_train, y_train, linear1, linear2):
    # Set up the model, optimizer, and loss function
    model = LinearRegressionModel(x_train.shape[1], linear1, linear2, 1)
    epochs = 1000            # train for 1000 epochs
    learning_rate = 0.01     # learning rate
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  # optimizer
    criterion = nn.MSELoss() # use MSE as the loss; the goal is to minimize it
    loss_list = []
    for epoch in range(epochs):
        epoch += 1
        optimizer.zero_grad()               # clear gradients
        outputs = model(x_train)            # forward pass
        loss = criterion(outputs, y_train)  # compute the loss
        loss.backward()                     # backward pass
        loss_list.append(loss.detach().numpy())
        optimizer.step()                    # update the weights
    mean_loss = np.mean(loss_list)
    # print("loss: ", mean_loss)     # change 2: drop print() calls so they don't clutter the console output
    tune.report(my_loss=mean_loss)   # change 3: add tune.report(xxx=value); the metric name xxx is up to you

if __name__ == '__main__':
    x_train = torch.randn(100, 4)  # 100 random 4-dim samples as the training X
    y_train = torch.randn(100, 1)  # random labels for the training set
    train_model(x_train, y_train, 32, 8)
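One thing to notice: the metric is reported only once, after the whole training loop, which is why every trial in the results table later shows iter = 1. As a variation (my sketch, not from the original), the report could move inside the epoch loop so Tune sees per-epoch progress; newer Ray releases have since reworked this reporting API, so check the docs for your installed version:

# Variation (sketch): report the loss every epoch instead of once at the end.
for epoch in range(epochs):
    optimizer.zero_grad()
    outputs = model(x_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    tune.report(my_loss=loss.item())  # one report per epoch, so iter counts up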
This step mainly specifies the parameter search space and a few tuning rules. Two things change: a config dict takes over the entire parameter list, and the code in main that trained the model directly gets commented out, replaced by a launch through Ray. The modified code, with comments, is as follows:
import torch
import torch.nn as nn
import numpy as np
from ray import tune

class LinearRegressionModel(nn.Module):
    def __init__(self, input_shape, linear1, linear2, output_shape):
        super(LinearRegressionModel, self).__init__()
        self.linear1 = nn.Linear(input_shape, linear1)
        self.linear2 = nn.Linear(linear1, linear2)
        self.linear3 = nn.Linear(linear2, output_shape)

    def forward(self, x):
        l1 = self.linear1(x)
        l2 = self.linear2(l1)
        l3 = self.linear3(l2)
        return l3

def train_model(config):  # change 1: every tunable parameter now arrives via config
    # Set up the model, optimizer, and loss function
    # (x_train and y_train are read from the globals defined under __main__)
    model = LinearRegressionModel(x_train.shape[1], config['linear1'], config['linear2'], 1)
    epochs = 1000            # train for 1000 epochs
    learning_rate = 0.01     # learning rate
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  # optimizer
    criterion = nn.MSELoss() # use MSE as the loss; the goal is to minimize it
    loss_list = []
    for epoch in range(epochs):
        epoch += 1
        optimizer.zero_grad()               # clear gradients
        outputs = model(x_train)            # forward pass
        loss = criterion(outputs, y_train)  # compute the loss
        loss.backward()                     # backward pass
        loss_list.append(loss.detach().numpy())
        optimizer.step()                    # update the weights
    mean_loss = np.mean(loss_list)
    tune.report(my_loss=mean_loss)

if __name__ == '__main__':
    x_train = torch.randn(100, 4)  # 100 random 4-dim samples as the training X
    y_train = torch.randn(100, 1)  # random labels for the training set
    # train_model(x_train, y_train, 32, 8)  # change 2: no longer launched this way; comment it out
    # change 3: the wrapped launch below
    config = {
        "linear1": tune.sample_from(lambda _: np.random.randint(2, 5)),  # custom sampling
        "linear2": tune.choice([2, 4, 8, 16]),  # choose randomly from the given values
    }
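    # (Aside, not in the original: Tune has more search-space primitives,
    #  e.g. tune.loguniform(1e-4, 1e-1) for a log-uniform float, or
    #  tune.grid_search([2, 4, 8, 16]) to sweep every value exhaustively.)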
    result = tune.run(  # run the search; from here Tune explores config automatically
        train_model,                       # the trainable function
        resources_per_trial={"cpu": 8, },  # resources reserved per trial
        config=config,
        num_samples=20,                    # number of trials to sample from the search space
    )
    # print the final results
    print("======================== Result =========================")
    print(result.results_df)
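A quick aside on resources_per_trial (my note, not from the original): it is the resource reservation for each trial, so {"cpu": 8} on an 8-core machine runs trials one at a time, while a smaller reservation such as {"cpu": 2} would let four trials run concurrently; add a "gpu" key if your model trains on a GPU.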
Once it runs through, you will get results like the following:
+-------------------------+------------+-------+-----------+-----------+--------+------------------+-----------+
| Trial name | status | loc | linear1 | linear2 | iter | total time (s) | my_loss |
|-------------------------+------------+-------+-----------+-----------+--------+------------------+-----------|
| train_model_559fb_00000 | TERMINATED | | 2 | 8 | 1 | 0.325345 | 1.05185 |
| train_model_559fb_00001 | TERMINATED | | 3 | 16 | 1 | 0.32275 | 1.06859 |
| train_model_559fb_00002 | TERMINATED | | 4 | 8 | 1 | 0.292687 | 1.04742 |
| train_model_559fb_00003 | TERMINATED | | 2 | 16 | 1 | 0.291877 | 1.06596 |
| train_model_559fb_00004 | TERMINATED | | 3 | 2 | 1 | 0.298721 | 1.04942 |
| train_model_559fb_00005 | TERMINATED | | 4 | 2 | 1 | 0.294107 | 1.06227 |
| train_model_559fb_00006 | TERMINATED | | 2 | 16 | 1 | 0.310592 | 1.04826 |
| train_model_559fb_00007 | TERMINATED | | 4 | 2 | 1 | 0.31578 | 1.0608 |
| train_model_559fb_00008 | TERMINATED | | 3 | 2 | 1 | 0.286066 | 1.05879 |
| train_model_559fb_00009 | TERMINATED | | 3 | 8 | 1 | 0.290412 | 1.05573 |
| train_model_559fb_00010 | TERMINATED | | 2 | 2 | 1 | 0.282055 | 1.04826 |
| train_model_559fb_00011 | TERMINATED | | 4 | 4 | 1 | 0.288696 | 1.05269 |
| train_model_559fb_00012 | TERMINATED | | 2 | 2 | 1 | 0.311075 | 1.06867 |
| train_model_559fb_00013 | TERMINATED | | 3 | 2 | 1 | 0.312876 | 1.06269 |
| train_model_559fb_00014 | TERMINATED | | 3 | 2 | 1 | 0.273914 | 1.06514 |
| train_model_559fb_00015 | TERMINATED | | 4 | 2 | 1 | 0.270813 | 1.05434 |
| train_model_559fb_00016 | TERMINATED | | 4 | 16 | 1 | 0.292606 | 1.06485 |
| train_model_559fb_00017 | TERMINATED | | 3 | 16 | 1 | 0.286667 | 1.05534 |
| train_model_559fb_00018 | TERMINATED | | 4 | 8 | 1 | 0.272169 | 1.06625 |
| train_model_559fb_00019 | TERMINATED | | 2 | 8 | 1 | 0.270342 | 1.0556 |
+-------------------------+------------+-------+-----------+-----------+--------+------------------+-----------+
The final result.results_df records every configuration together with its result.
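The analysis object returned by tune.run can also pick out the winner directly; a minimal sketch (the metric name matches the tune.report call above, and the printed dict is illustrative):

# Fetch the best configuration found, minimizing the reported metric.
best_config = result.get_best_config(metric="my_loss", mode="min")
print(best_config)  # e.g. {'linear1': 4, 'linear2': 8}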
If you set a breakpoint near the end, you can also reach the visual dashboard: the console prints View the Ray dashboard at http://127.0.0.1:8265, and opening that link shows the live status of the running tuning job.
Getting the program to run end to end is a real confidence boost; more tuning content and tricks will be unlocked later on.