Optuna: Getting Started with the Hyperparameter Optimization Framework, with Parameter Visualization

Optuna: A Hyperparameter Optimization Framework

Optuna is an automatic hyperparameter optimization software framework, designed particularly for machine learning. It features an imperative, define-by-run style API. Thanks to this API, code written with Optuna is highly modular, and users can construct hyperparameter search spaces dynamically.
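For example, a define-by-run search space can branch on values suggested earlier in the same trial. A minimal sketch (the classifier choice, the parameter names, and the dataset are illustrative, assuming scikit-learn is available):

import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def objective(trial):
    # The search space is built while the trial runs: which parameters
    # exist depends on the classifier suggested first.
    classifier = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier == "SVC":
        c = trial.suggest_float("svc_c", 1e-3, 1e3, log=True)
        model = SVC(C=c)
    else:
        max_depth = trial.suggest_int("rf_max_depth", 2, 32)
        model = RandomForestClassifier(max_depth=max_depth)
    X, y = load_iris(return_X_y=True)
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)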

Key features

Optuna has the following modern features:

  • A lightweight, versatile, and platform-agnostic architecture
  • Parallel, distributed optimization (a minimal sketch follows this list)
  • Pruning of unpromising trials
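A rough sketch of the distributed workflow (the SQLite file and study name here are illustrative; any SQLAlchemy-supported database works): several processes share one study through an RDB storage, and each simply calls optimize() on it.

import optuna

# Hypothetical shared storage; run this same script in several
# processes and they will draw trials from the same study.
study = optuna.create_study(
    study_name="shared-example",
    storage="sqlite:///example.db",
    load_if_exists=True,
)
study.optimize(lambda trial: (trial.suggest_float("x", -10, 10) - 2) ** 2,
               n_trials=20)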

Basic concepts

We use the terms study and trial as follows:

  • Study: an optimization session based on an objective function
  • Trial: a single execution of the objective function

A small optimization example

import optuna

# The function to optimize
def objective(trial):
    # suggest_uniform() samples the parameter uniformly from the given range
    x = trial.suggest_uniform('x', -10, 10)
    return (x - 2) ** 2

# To start the optimization, create a study object and pass the objective
# function to its .optimize method
study = optuna.create_study()
study.optimize(objective, n_trials=100)

Result:

[I 2021-06-13 11:03:32,868] A new study created in memory with name: no-name-92b95d43-a1f4-4520-91af-9ab759adfd93
···
[I 2021-06-13 11:03:33,126] Trial 99 finished with value: 3.0832545651220173 and parameters: {'x': 3.755919862955601}. Best is trial 89 with value: 2.1818699199841563e-05.
# Get the best parameters
study.best_params

Result:

{'x': 1.9953289509529613}
  • When Optuna is used for hyperparameter search in machine learning, the objective function is usually the model's loss or accuracy (a toy sketch follows at the end of this example).
# Get the best objective value
study.best_value

Result:

2.1818699199841563e-05
# Get the best trial:
study.best_trial

Result:

FrozenTrial(number=89, values=[2.1818699199841563e-05], datetime_start=datetime.datetime(2021, 6, 13, 11, 3, 33, 94211), datetime_complete=datetime.datetime(2021, 6, 13, 11, 3, 33, 96606), params={'x': 1.9953289509529613}, distributions={'x': UniformDistribution(high=10.0, low=-10.0)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=89, state=TrialState.COMPLETE, value=None)
# Get all trials
study.trials

Result:

[FrozenTrial(number=0, values=[8.902710095411337], datetime_start=datetime.datetime(2021, 6, 13, 11, 3, 32, 869319), datetime_complete=datetime.datetime(2021, 6, 13, 11, 3, 32, 871302), params={'x': -0.9837409564858905}, distributions={'x': UniformDistribution(high=10.0, low=-10.0)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=0, state=TrialState.COMPLETE, value=None),
···
 FrozenTrial(number=99, values=[3.0832545651220173], datetime_start=datetime.datetime(2021, 6, 13, 11, 3, 33, 124493), datetime_complete=datetime.datetime(2021, 6, 13, 11, 3, 33, 126463), params={'x': 3.755919862955601}, distributions={'x': UniformDistribution(high=10.0, low=-10.0)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=99, state=TrialState.COMPLETE, value=None)]
# Get the number of trials:
len(study.trials)

Result:

100
# After optimization has finished, calling optimize() again continues the process
study.optimize(objective, n_trials=100)

Result:

[I 2021-06-13 11:03:33,169] Trial 100 finished with value: 1.537235278040247 and parameters: {'x': 0.7601470740284366}. Best is trial 89 with value: 2.1818699199841563e-05.
···
[I 2021-06-13 11:03:33,486] Trial 199 finished with value: 1.1332274505781779 and parameters: {'x': 0.9354684360817769}. Best is trial 89 with value: 2.1818699199841563e-05.
# Get the updated number of trials (after the second optimization):
len(study.trials)

Result:

200
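Picking up the note above: when the objective returns an accuracy rather than an error, create the study with direction="maximize" so that larger values count as better. A toy sketch (the quadratic below is a hypothetical stand-in for a real validation accuracy):

import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    # Hypothetical surrogate for a model's validation accuracy.
    accuracy = 1.0 - (lr - 0.01) ** 2
    return accuracy

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)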

An optimization example with PyTorch

GitHub link

In this example, we use PyTorch and FashionMNIST to optimize the validation accuracy of clothing-image classification. We optimize both the neural network architecture and the optimizer configuration. Because training on the full FashionMNIST dataset would take too long, we use a subset of it.

Import the required packages

import os
import optuna
from optuna.trial import TrialState
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
from torchvision import datasets
from torchvision import transforms

Set the parameters

DEVICE = torch.device("cpu")
BATCHSIZE = 128
CLASSES = 10      # 10-way classification
DIR = os.getcwd() # e.g. '/Users/zhou/code/optuna'
EPOCHS = 10
LOG_INTERVAL = 10
N_TRAIN_EXAMPLES = BATCHSIZE * 30  #128*30
N_VALID_EXAMPLES = BATCHSIZE * 10  #128*10

Define the model and its parameter search space

def define_model(trial):
    # We optimize the number of layers, the number of hidden units in
    # each layer, and the dropout ratio of each layer.
    n_layers = trial.suggest_int("n_layers", 1, 3)  # 1 to 3 layers
    layers = []

    in_features = 28 * 28
    for i in range(n_layers):
        out_features = trial.suggest_int("n_units_l{}".format(i), 4, 128)  # 4-128 hidden units
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
        # The larger the dropout ratio, the more information is discarded,
        # so the loss falls and the accuracy rises more slowly.
        p = trial.suggest_float("dropout_l{}".format(i), 0.2, 0.5)  # dropout ratio 0.2-0.5
        layers.append(nn.Dropout(p))

        in_features = out_features
    layers.append(nn.Linear(in_features, CLASSES))
    layers.append(nn.LogSoftmax(dim=1))

    return nn.Sequential(*layers)

Load the dataset

def get_mnist():
    # Load FashionMNIST dataset.
    train_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST(DIR, train=True, download=True, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE,
        shuffle=True,
    )
    valid_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST(DIR, train=False, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE,
        shuffle=True,
    )

    return train_loader, valid_loader

The objective function

def objective(trial):

    # Generate the model.
    model = define_model(trial).to(DEVICE)

    # Generate the optimizer.
    optimizer_name = trial.suggest_categorical("optimizer", ["Adam", "RMSprop", "SGD"])
    # Learning rate between 1e-5 and 1e-1, sampled on a log scale
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    optimizer = getattr(optim, optimizer_name)(model.parameters(), lr=lr)

    # Get the FashionMNIST dataset.
    train_loader, valid_loader = get_mnist()

    # Training of the model.
    for epoch in range(EPOCHS):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            # Limit the training data for faster epochs.
            if batch_idx * BATCHSIZE >= N_TRAIN_EXAMPLES:  # batch_idx*128 >= 30*128
                break

            data, target = data.view(data.size(0), -1).to(DEVICE), target.to(DEVICE)

            optimizer.zero_grad()              # clear the gradients from the previous step
            output = model(data)
            loss = F.nll_loss(output, target)  # compute the loss
            loss.backward()                    # backpropagate to compute the current gradients
            optimizer.step()                   # update the network parameters from the gradients

        # Evaluate the model.
        model.eval()
        correct = 0
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(valid_loader):
                # Limit the validation data.
                if batch_idx * BATCHSIZE >= N_VALID_EXAMPLES:  # batch_idx*128 >= 10*128
                    break
                data, target = data.view(data.size(0), -1).to(DEVICE), target.to(DEVICE)
                output = model(data)
                # Get the index of the max log-probability.
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()

        accuracy = correct / min(len(valid_loader.dataset), N_VALID_EXAMPLES)

        trial.report(accuracy, epoch)

        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    return accuracy


if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=100, timeout=600)

    pruned_trials = study.get_trials(deepcopy=False, states=[TrialState.PRUNED])
    complete_trials = study.get_trials(deepcopy=False, states=[TrialState.COMPLETE])

    print("Study statistics: ")
    print("  Number of finished trials: ", len(study.trials))
    print("  Number of pruned trials: ", len(pruned_trials))
    print("  Number of complete trials: ", len(complete_trials))

    print("Best trial:")
    trial = study.best_trial

    print("  Value: ", trial.value)

    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
[I 2021-06-13 11:03:34,345] A new study created in memory with name: no-name-710ccb27-9c8c-4aff-8ec5-bdecee1b6c9e
···
[I 2021-06-13 11:05:22,702] Trial 98 pruned.
[I 2021-06-13 11:05:23,009] Trial 99 pruned.


Study statistics: 
  Number of finished trials:  100
  Number of pruned trials:  75
  Number of complete trials:  25
Best trial:
  Value:  0.84453125
  Params: 
    n_layers: 2
    n_units_l0: 122
    dropout_l0: 0.27085269973254716
    n_units_l1: 110
    dropout_l1: 0.4061131541476391
    optimizer: Adam
    lr: 0.008711744845419098
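To analyze a finished run programmatically rather than reading the logs, the study can be exported as a table (a small sketch, assuming pandas is installed):

# Export every trial (parameters, values, states) to a pandas DataFrame.
df = study.trials_dataframe()
print(df[["number", "value", "state"]].head())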

Preparing a study for the visualizations

import optuna

# This verbosity change is just to simplify the notebook output.
optuna.logging.set_verbosity(optuna.logging.WARNING)

study = optuna.create_study(direction='maximize', pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=100)

Contour plots

  • After study.optimize has finished, calling optuna.visualization.plot_contour with the study and the parameters to visualize returns a contour plot.
  • Omitting params is also fine: Optuna then plots the relationships between all parameters, which in this case is equivalent.
optuna.visualization.plot_contour(study)
  • For example, to inspect the relationship between the parameters n_layers and lr and their contribution to the objective value, just execute the following statement:

Selecting the parameters to visualize

optuna.visualization.plot_contour(study, params=['n_layers', 'lr'])
(figure: contour plot of n_layers vs. lr)
optuna.visualization.plot_contour(study, params=['n_units_l0', 'n_units_l1'])
(figure: contour plot of n_units_l0 vs. n_units_l1)

Visualizing the optimization history

from optuna.visualization import plot_optimization_history
plot_optimization_history(study)
(figure: optimization history plot)
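These optuna.visualization functions return plotly figure objects, so a chart can also be saved and opened later in a browser (a sketch; the file name is arbitrary):

fig = plot_optimization_history(study)
fig.write_html("optimization_history.html")  # open this file in a browser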

Visualizing the learning curves of the trials

from optuna.visualization import plot_intermediate_values

plot_intermediate_values(study)
(figure: intermediate values plot)

Visualizing high-dimensional parameter relationships

from optuna.visualization import plot_parallel_coordinate

plot_parallel_coordinate(study)
(figure: parallel coordinate plot)

Selecting the parameters to visualize

plot_parallel_coordinate(study, params=['lr', 'n_layers'])
(figure: parallel coordinate plot of lr and n_layers)

Visualizing individual parameters

from optuna.visualization import plot_slice

plot_slice(study)
(figure: slice plot of all parameters)

Selecting the parameters to visualize

plot_slice(study, params=['n_units_l0', 'n_units_l1'])
(figure: slice plot of n_units_l0 and n_units_l1)

Visualizing parameter importances

from optuna.visualization import plot_param_importances

plot_param_importances(study)
(figure: parameter importances plot)
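The same importances are also available as plain numbers (a sketch; by default Optuna computes them with an fANOVA-based evaluator, which requires scikit-learn):

from optuna.importance import get_param_importances

# Returns a dict mapping each parameter name to its importance score.
for name, score in get_param_importances(study).items():
    print("{}: {:.3f}".format(name, score))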
