Exploring Automated Machine Learning (AutoML): Reading the TPOT Documentation

http://epistasislab.github.io/tpot
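I spent half a day exploring automated machine learning toolkits, mainly TPOT. Other well-known options include auto-sklearn, DataRobot (paid), and Auto-WEKA, which is Java-based and has a graphical interface. More here: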
https://www.evget.com/article/2017/10/30/27128.html

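  1. Overview:

    • TPOT searches for pipelines with a genetic algorithm (genetic programming), improving candidates generation by generation
    • By contrast, auto-sklearn (which I still haven't managed to install on a Mac, haha) is based on Bayesian optimization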
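
To make the genetic-algorithm idea concrete, here is a toy evolutionary loop in plain Python. It is not TPOT's implementation: TPOT evolves scikit-learn pipelines, while this sketch evolves bit strings with a made-up fitness (the number of 1 bits). The names `evolve` and `fitness` and all the defaults are illustrative; only `generations` and `population_size` are borrowed from the TPOT parameter names.

```python
import random


def fitness(individual):
    """Toy fitness: count of 1 bits (TPOT would use a cross-validated score)."""
    return sum(individual)


def evolve(generations=5, population_size=20, genome_length=16,
           mutation_rate=0.1, seed=0):
    """Run a minimal genetic algorithm and return the best individual found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: splice the two parents at a random cut point.
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        # Parents survive unchanged, so the best fitness never decreases.
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(fitness(best), best)
```

More generations or a larger population means more candidates evaluated, which is the same trade-off between search quality and run time that TPOT's `generations` and `population_size` control.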
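  2. How are models evaluated (we usually want several metrics rather than a single criterion)?

    • Cross-validation, via sklearn.model_selection.cross_val_score
    • You can pick a built-in scoring method or define your own, e.g. tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, scoring='roc_auc'). Pass a scorer name or a scorer callable here; sklearn.metrics.auc is not one, since it computes the area under a curve from x/y points rather than scoring predictions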
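
Since one metric is rarely enough, it can help to try the scoring options with `cross_val_score` directly before handing them to TPOT. The sketch below is my own illustration, not taken from the TPOT docs: the digits dataset and the `LogisticRegression` stand-in model are arbitrary choices. The custom scorer is built with scikit-learn's `make_scorer`; I am assuming TPOT's `scoring` parameter accepts the same kind of scorer callable that scikit-learn does.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
# Stand-in model; any scikit-learn estimator or pipeline would do here.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Custom scorer: macro-averaged F1 wrapped into the scorer(estimator, X, y) form.
macro_f1 = make_scorer(f1_score, average="macro")

# Evaluate the same model under several metrics with 5-fold cross-validation.
results = {}
for name, scoring in [("accuracy", "accuracy"),
                      ("f1_macro (built-in)", "f1_macro"),
                      ("f1_macro (custom)", macro_f1)]:
    scores = cross_val_score(model, X, y, cv=5, scoring=scoring)
    results[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

The built-in 'f1_macro' string and the hand-made scorer should report the same numbers, which is a quick sanity check that a custom scorer does what you expect.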
  3. Versions (built-in configurations)

    • Default TPOT: the standard configuration
    • TPOT light: does not search everything; it sticks to simple, fast models and preprocessors, so it returns simpler pipelines more quickly
    • TPOT MDR: "TPOT will search over a series of feature selectors and Multifactor Dimensionality Reduction models to find a series of operators that maximize prediction accuracy. The TPOT MDR configuration is specialized for genome-wide association studies (GWAS), and is described in detail online." In short, it is built for genomics tasks
    • TPOT sparse: built for sparse matrices
  4. Notable points about the API

    • The first few parameters (generations, population_size, offspring_size, mutation_rate, crossover_rate) all control the genetic algorithm
    • scoring: 'accuracy', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'neg_log_loss', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc'; you can also write your own
    • n_jobs: supports parallel processing
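    • max_time_mins / max_eval_time_mins: max_time_mins caps the total optimization time, so a run with many generations doesn't drag on forever (hahaha); max_eval_time_mins caps the time spent evaluating a single pipeline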
    • random_state: random seed
    • config_dict: which configuration to use (I found 'TPOT light' works very well). Possible inputs are:
      Python dictionary: TPOT will use your custom configuration, or
      string 'TPOT light': TPOT will use a built-in configuration with only fast models and preprocessors, or
      string 'TPOT MDR': TPOT will use a built-in configuration specialized for genomic studies, or
      string 'TPOT sparse': TPOT will use a configuration dictionary with a one-hot encoder and the operators normally included in TPOT that also support sparse matrices, or
      None: TPOT will use the default TPOTClassifier configuration.
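
For the "Python dictionary" option, a custom configuration might look like the sketch below. The layout (operator import path as the key, a dict of hyperparameter name to candidate values as the value) follows my reading of the TPOT docs; the particular operators and value ranges are arbitrary picks for illustration, and `count_settings` is a hypothetical helper that only shows how large the search space per operator is.

```python
from math import prod

# Hypothetical custom configuration: keys are import paths of operators,
# values map each hyperparameter name to the candidate values to search.
custom_config = {
    "sklearn.naive_bayes.GaussianNB": {},
    "sklearn.neighbors.KNeighborsClassifier": {
        "n_neighbors": range(1, 11),
        "weights": ["uniform", "distance"],
    },
    "sklearn.preprocessing.StandardScaler": {},
}


def count_settings(config):
    """Count the hyperparameter combinations each operator contributes."""
    return {name: prod(len(values) for values in params.values())
            for name, params in config.items()}


for operator, n in count_settings(custom_config).items():
    print(f"{operator}: {n} setting(s)")

# Assumed usage, following the config_dict description above:
# tpot = TPOTClassifier(generations=5, population_size=20, config_dict=custom_config)
```

Keeping the dictionary small is one way to get 'TPOT light'-style speed while still choosing the operators yourself.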
  • Attributes
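fitted_pipeline_: the result, i.e. the best pipeline found, fitted on the training data
pareto_front_fitted_pipelines_: the fitted pipelines on the Pareto front
evaluated_individuals_: the pipelines evaluated during the search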
  • Functions:
fit(features, classes[, sample_weight, groups]): Run the TPOT optimization process on the given training data.
predict(features): Use the optimized pipeline to predict the classes for a feature set.
predict_proba(features): Use the optimized pipeline to estimate the class probabilities for a feature set.
score(testing_features, testing_classes): Returns the optimized pipeline's score on the given testing data using the user-specified scoring function.
export(output_file_name): Export the optimized pipeline as Python code.
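
The attribute name pareto_front_fitted_pipelines_ refers to a Pareto front: the set of candidates that no other candidate beats on every objective at once. As I understand it, TPOT trades off the cross-validation score against pipeline complexity, but treat that as my assumption. The sketch below shows only the concept in plain Python; the records are made up for illustration and are not TPOT output, and `dominates` and `pareto_front` are hypothetical helpers.

```python
# Hypothetical (pipeline name, operator count, CV score) records, made up for illustration.
candidates = [
    ("GaussianNB", 1, 0.84),
    ("KNeighborsClassifier", 1, 0.97),
    ("StandardScaler -> LogisticRegression", 2, 0.96),
    ("PCA -> KNeighborsClassifier", 2, 0.98),
    ("PCA -> StandardScaler -> KNeighborsClassifier", 3, 0.98),
]


def dominates(a, b):
    """True if a is at least as good as b on both objectives and better on one."""
    _, a_size, a_score = a
    _, b_size, b_score = b
    return (a_size <= b_size and a_score >= b_score
            and (a_size < b_size or a_score > b_score))


def pareto_front(records):
    """Keep the records that no other record dominates."""
    return [r for r in records
            if not any(dominates(other, r) for other in records)]


for name, size, score in pareto_front(candidates):
    print(f"{size} operator(s), score {score:.2f}: {name}")
```

With these made-up numbers, only the one-operator and two-operator nearest-neighbor pipelines survive: every other candidate is both no simpler and no more accurate than one of them.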

Example: see the website

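import time

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Load the digits dataset and hold out 25% of it for testing.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

s = time.time()  # start the timer before the search

# 'TPOT light' restricts the search to fast models and preprocessors.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      config_dict='TPOT light')
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')  # write the best pipeline out as Python code

print(time.time() - s)  # elapsed time in seconds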
