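1) Install swig. On macOS, running brew install swig is enough.
2) Raise the memory limit. auto-sklearn's model fit is memory-hungry; set memory_limit (given in MB) to 300000, otherwise fit raises an error.
1) Load the data and import the libraries, again using the iris dataset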
"""
Created on Sat Apr 16 15:26:21 2022
@author: johnny
"""
import autosklearn
from autosklearn.classification import AutoSklearnClassifier
from sklearn.metrics import accuracy_score
# Load the data
from sklearn.datasets import load_iris
data = load_iris()
x = data.data
y = data.target
# Hold out 30% of the samples as a test set
from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.3, random_state=0)
2) Configure the model
# define search
model = AutoSklearnClassifier(memory_limit=300000,
                              time_left_for_this_task=5*60,
                              per_run_time_limit=50,
                              n_jobs=4,
                              tmp_folder='/Users/johnny/Downloads/CreditMaster/temp/autosklearn_classification_example_tmp')
# perform the search and time it (elapsed wall-clock seconds)
import time
start = time.time()
model.fit(train_x, train_y)
end = time.time()
print(end - start)
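time_left_for_this_task is the overall time budget for the task, in seconds; here it is set to 5 minutes. If nothing is specified for this parameter, the search runs for one hour.
per_run_time_limit caps the time allotted to each individual model evaluation, here 50 seconds.
ensemble_size and initial_configurations_via_metalearning can be used to fine-tune the classifier. By default, the search above builds an ensemble of the best-performing models. To guard against overfitting, this behaviour can be turned off by setting ensemble_size = 1 and initial_configurations_via_metalearning = 0.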
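The settings described above can be sketched as follows. This is a minimal sketch, not a tuned configuration: the dictionary name single_model_settings and the variable single_model are hypothetical, and it assumes the installed auto-sklearn release still accepts ensemble_size as a constructor argument, as the version used in this post does (later releases may deprecate or move it, so check the documentation for your version). The import is guarded so the snippet still runs where auto-sklearn is not installed.

```python
# Hypothetical settings dict; the keys are the parameter names discussed above.
single_model_settings = {
    "time_left_for_this_task": 5 * 60,             # overall budget, in seconds
    "per_run_time_limit": 50,                      # budget per model evaluation, in seconds
    "ensemble_size": 1,                            # keep a single model instead of an ensemble
    "initial_configurations_via_metalearning": 0,  # no meta-learning warm start
}

try:
    from autosklearn.classification import AutoSklearnClassifier
except ImportError:
    # auto-sklearn is not installed; only the settings dict is available
    AutoSklearnClassifier = None

if AutoSklearnClassifier is not None:
    # Assumption: this release accepts all four keyword arguments
    single_model = AutoSklearnClassifier(**single_model_settings)
```

The resulting estimator would then be fitted with single_model.fit(train_x, train_y), exactly like the model above.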
3) Results
Run time: 311.1847469806671 seconds
# Summary of the search
print(model.sprint_statistics())
# Model leaderboard
print(model.leaderboard())
print(model.sprint_statistics())
auto-sklearn results:
Dataset name: 2b87dd22-be00-11ec-83d9-acde48001122
Metric: accuracy
Best validation score: 0.971429
Number of target algorithm runs: 57
Number of successful target algorithm runs: 57
Number of crashed target algorithm runs: 0
Number of target algorithms that exceeded the time limit: 0
Number of target algorithms that exceeded the memory limit: 0
print(model.leaderboard())
          rank  ensemble_weight                 type      cost  duration
model_id
56           1             0.12                  lda  0.028571  6.529774
55           2             0.06                  qda  0.028571  6.576634
43           3             0.06        liblinear_svc  0.057143  4.191880
27           4             0.04                  qda  0.057143  2.138941
53           5             0.04  k_nearest_neighbors  0.085714  5.523627
52           6             0.12                  qda  0.085714  4.785708
6            7             0.02        random_forest  0.085714  6.293557
9            8             0.04           libsvm_svc  0.085714  3.043304
48           9             0.06        liblinear_svc  0.114286  4.556017
45          10             0.04        liblinear_svc  0.114286  5.430604
35          11             0.02                  lda  0.114286  0.922487
3           12             0.02        random_forest  0.114286  5.901496
23          13             0.02   passive_aggressive  0.114286  1.694792
20          14             0.02          extra_trees  0.114286  2.763696
13          15             0.04                  mlp  0.114286  4.000428
5           16             0.08   passive_aggressive  0.114286  2.807672
4           17             0.02        random_forest  0.114286  5.275651
28          18             0.04        liblinear_svc  0.114286  2.893949
34          19             0.02          extra_trees  0.114286  5.547182
49          20             0.04        liblinear_svc  0.142857  5.844000
46          21             0.08             adaboost  0.171429  5.250963
The output shows that 57 models were run in total, with a best validation score of 0.97.
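To relate the leaderboard to the summary, here is a small sanity check on the numbers printed above. It assumes that the cost column is 1 minus the validation accuracy, which matches the reported best score. The validation-set size of 35 is not printed anywhere in the output; it is an inferred value that happens to make every cost a whole number of misclassified samples, so treat it as an assumption.

```python
# Values copied from the leaderboard above
costs = [0.028571, 0.057143, 0.085714, 0.114286, 0.142857, 0.171429]
weights = [0.12, 0.06, 0.06, 0.04, 0.04, 0.12, 0.02, 0.04, 0.06, 0.04, 0.02,
           0.02, 0.02, 0.02, 0.04, 0.08, 0.02, 0.04, 0.02, 0.04, 0.08]

# Assumption: cost = 1 - validation accuracy
best_score = 1 - min(costs)
print(f"best validation score: {best_score:.6f}")

# Assumption: 35 validation samples (inferred, not reported by auto-sklearn)
n_validation = 35
misclassified = [round(c * n_validation) for c in costs]
print("misclassified samples per distinct cost:", misclassified)

# The ensemble weights of the 21 listed models should add up to 1
total_weight = round(sum(weights), 2)
print("sum of ensemble weights:", total_weight)
```

Under these assumptions, the two top-ranked models (lda and qda) each misclassify a single validation sample, which is why they share the same cost.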
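from sklearn.metrics import confusion_matrix, accuracy_score
import seaborn as sns
# Predict on the held-out test set
y_pred = model.predict(test_x)
# Test-set accuracy
m1_acc_score = accuracy_score(test_y, y_pred)
print(m1_acc_score)
# Confusion matrix: rows are true labels, columns are predicted labels
conf_matrix = confusion_matrix(test_y, y_pred)
sns.heatmap(conf_matrix, annot=True)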
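The orientation of the confusion matrix depends on the argument order, so a dependency-free sketch of the tally may help when reading the heatmap. The helper confusion_counts and the label lists below are hypothetical illustrations, not output from the model above; the sketch follows scikit-learn's convention of true labels on the rows and predicted labels on the columns.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in range(n_classes)] for t in range(n_classes)]

# Hypothetical labels, for illustration only
y_true_demo = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred_demo = [0, 0, 1, 2, 2, 2, 2, 2]

matrix = confusion_counts(y_true_demo, y_pred_demo, n_classes=3)
for row in matrix:
    print(row)

# Accuracy is the diagonal (correct predictions) divided by the total count
correct = sum(matrix[i][i] for i in range(3))
accuracy = correct / len(y_true_demo)
print(accuracy)
```

Swapping the two arguments transposes the matrix, which leaves the accuracy unchanged but reverses the reading of the off-diagonal cells.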
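auto-sklearn is an ensemble-based automated machine learning method, and it is quite handy when you want to save yourself some effort.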