Optuna is a hyperparameter tuning tool that works with most machine learning frameworks: scikit-learn, XGBoost, LightGBM, PyTorch, and so on.
Its tuning rests on two main ideas:
1 Sampling algorithms
Using the history of suggested parameter values and their evaluated objective values, the sampler keeps narrowing the search space toward regions whose parameters yield better objective values.
import optuna

# The default sampler is TPESampler.
study = optuna.create_study()
print(f"Sampler is {study.sampler.__class__.__name__}")

# Random search.
study = optuna.create_study(sampler=optuna.samplers.RandomSampler())
print(f"Sampler is {study.sampler.__class__.__name__}")

# CMA-ES.
study = optuna.create_study(sampler=optuna.samplers.CmaEsSampler())
print(f"Sampler is {study.sampler.__class__.__name__}")
2 Pruning algorithms
Pruning automatically terminates unpromising trials early in training (i.e., automated early stopping).
Activating a pruner
To enable pruning, call report() and should_prune() after each step of iterative training: report() periodically reports intermediate objective values, and should_prune() decides whether to terminate a trial that fails to meet a predefined condition.
import logging
import sys

import optuna
import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection

def objective(trial):
    iris = sklearn.datasets.load_iris()
    classes = list(set(iris.target))
    train_x, valid_x, train_y, valid_y = sklearn.model_selection.train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=0)

    alpha = trial.suggest_float("alpha", 1e-5, 1e-1, log=True)
    clf = sklearn.linear_model.SGDClassifier(alpha=alpha)

    for step in range(100):
        clf.partial_fit(train_x, train_y, classes=classes)

        # Report intermediate objective value (validation error).
        intermediate_value = 1.0 - clf.score(valid_x, valid_y)
        trial.report(intermediate_value, step)

        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.TrialPruned()

    return 1.0 - clf.score(valid_x, valid_y)

# Add stream handler of stdout to show the messages.
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)
For optuna.samplers.RandomSampler, optuna.pruners.MedianPruner works best; for optuna.samplers.TPESampler, optuna.pruners.HyperbandPruner works best.
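As a minimal sketch of that second pairing (reusing the objective defined above):

study = optuna.create_study(
    sampler=optuna.samplers.TPESampler(),
    pruner=optuna.pruners.HyperbandPruner(),
)
study.optimize(objective, n_trials=20)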
When Optuna is used for machine learning, the objective function usually returns the model's loss or accuracy.
study.best_params  # dict of the best hyperparameter values found
study.best_value   # best objective value achieved
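Note that a study minimizes its objective by default; if the objective returns accuracy instead of loss, create the study with the optimization direction flipped (a one-line sketch):

study = optuna.create_study(direction="maximize")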
With the optional step and log arguments, integer and float parameters can be discretized or sampled on a log scale.
step is easy to understand: for an integer parameter it is the step size, and for a float it is the discretization granularity (binning). For example, suggest_int("num_units", 10, 100, step=5) only proposes 10, 15, ..., 100.
log was less obvious at first, so I checked Optuna's source code:
对于float:If log is true, the value is sampled from the range in the log domain.
Otherwise, the value is sampled from the range in the linear domain.
That was still confusing, so let's look at how NumPy handles this. NumPy provides three related spacing functions:
logspace: Similar to geomspace, but with endpoints specified using log and base.
linspace: Similar to geomspace, but with arithmetic instead of geometric progression.
geomspace: Similar to logspace, but with endpoints specified directly.
An example makes the difference clear:
import numpy as np

np.linspace(0.02, 2.0, num=20)
np.geomspace(0.02, 2.0, num=20)
np.logspace(0.02, 2.0, num=20)
linspace yields an arithmetic progression:
[ 0.02 0.12421053 0.22842105 0.33263158 0.43684211 0.54105263
0.64526316 0.74947368 0.85368421 0.95789474 1.06210526 1.16631579
1.27052632 1.37473684 1.47894737 1.58315789 1.68736842 1.79157895
1.89578947 2. ]
geomspace yields a geometric progression:
[0.02 , 0.0254855 , 0.03247553, 0.04138276, 0.05273302,
0.06719637, 0.08562665, 0.1091119 , 0.13903856, 0.17717336,
0.22576758, 0.28768998, 0.36659614, 0.46714429, 0.59527029,
0.75853804, 0.96658605, 1.23169642, 1.56951994, 2.]
logspace, by default, raises a base (10 unless specified) to the given endpoints, i.e., it computes $base^{start}$ and $base^{end}$. Here $start = 10^{0.02} \approx 1.047$ and $end = 10^{2} = 100$:
[ 1.04712855 1.33109952 1.69208062 2.15095626 2.73427446
3.47578281 4.41838095 5.61660244 7.13976982 9.07600522
11.53732863 14.66613875 18.64345144 23.69937223 30.12640904
38.29639507 48.68200101 61.88408121 78.6664358 100. ]
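Two takeaways. First, logspace is just linspace applied in the exponent:

import numpy as np

# True: logspace(a, b, n) equals base ** linspace(a, b, n), with base=10
print(np.allclose(np.logspace(0.02, 2.0, num=20),
                  10 ** np.linspace(0.02, 2.0, num=20)))

Second, back to Optuna: log=True means the value is drawn uniformly in the exponent (the log domain) and then exponentiated, so small magnitudes are explored as densely as large ones. A hedged sketch of the idea, not Optuna's actual implementation:

low, high = 1e-5, 1e-1
# Linear domain: uniform between low and high.
linear_sample = np.random.uniform(low, high)
# Log domain: uniform in log space, then exponentiate.
log_sample = np.exp(np.random.uniform(np.log(low), np.log(high)))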
Code example:
import optuna

def objective(trial):
    # Categorical parameter
    optimizer = trial.suggest_categorical("optimizer", ["MomentumSGD", "Adam"])
    # Integer parameter
    num_layers = trial.suggest_int("num_layers", 1, 3)
    # Integer parameter (log)
    num_channels = trial.suggest_int("num_channels", 32, 512, log=True)
    # Integer parameter (discretized)
    num_units = trial.suggest_int("num_units", 10, 100, step=5)
    # Floating point parameter
    dropout_rate = trial.suggest_float("dropout_rate", 0.0, 1.0)
    # Floating point parameter (log)
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    # Floating point parameter (discretized)
    drop_path_rate = trial.suggest_float("drop_path_rate", 0.0, 1.0, step=0.1)
In Optuna, the search space is defined with ordinary Python syntax, so it can contain conditionals and loops: you can branch or loop depending on the suggested parameter values.
# Branching
import sklearn.ensemble
import sklearn.svm

def objective(trial):
    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier_name == "SVC":
        svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
        classifier_obj = sklearn.svm.SVC(C=svc_c)
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = sklearn.ensemble.RandomForestClassifier(max_depth=rf_max_depth)
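The snippet stops once the classifier is built. One hedged way to complete it, following the pattern of the official examples (Iris and 3-fold cross-validation are illustrative choices, not from the original):

import sklearn.datasets
import sklearn.model_selection

    # Appended at the end of objective(), after classifier_obj is built:
    x, y = sklearn.datasets.load_iris(return_X_y=True)
    score = sklearn.model_selection.cross_val_score(classifier_obj, x, y, cv=3)
    return score.mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)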
# Loops
import torch
import torch.nn as nn

def create_model(trial, in_size):
    n_layers = trial.suggest_int("n_layers", 1, 3)

    layers = []
    for i in range(n_layers):
        n_units = trial.suggest_int("n_units_l{}".format(i), 4, 128, log=True)
        layers.append(nn.Linear(in_size, n_units))
        layers.append(nn.ReLU())
        in_size = n_units
    layers.append(nn.Linear(in_size, 10))

    return nn.Sequential(*layers)
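A quick way to sanity-check the generated architecture outside a study is optuna.trial.FixedTrial, which replays a fixed set of parameter values (the parameter values and in_size below are arbitrary illustrations):

import optuna

trial = optuna.trial.FixedTrial({"n_layers": 2, "n_units_l0": 16, "n_units_l1": 32})
model = create_model(trial, in_size=20)
print(model)  # Sequential: Linear(20, 16), ReLU, Linear(16, 32), ReLU, Linear(32, 10)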
A note on the number of parameters
The difficulty of optimization grows roughly exponentially with the number of parameters; that is, the number of trials needed grows exponentially as parameters are added (for intuition: with 10 candidate values per parameter, 5 parameters already span $10^5$ grid points). So do not add unnecessary parameters.
Reference:
1. Optuna official documentation
2. Optuna GitHub examples
3. Difference in output between numpy linspace and numpy logspace
4. np.geomspace