Scikit-learn: Model Selection and Hyperparameter Tuning with Grid Search

http://blog.csdn.net/pipisorry/article/details/52268947

Scikit-learn: Parallel Hyperparameter Tuning with Grid Search

Grid Search: Searching for estimator parameters

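scikit-learn provides Pipeline (for chaining estimators) and grid search (for searching for the best parameters), which together support parallel parameter tuning. The old sklearn.grid_search module was deprecated in 0.18; GridSearchCV and RandomizedSearchCV now live in sklearn.model_selection.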

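For example, when classifying text with scikit-learn: how many words should the vectorizer keep? During preprocessing, words whose document frequency exceeds max_df are filtered out, so what should max_df be? Should TfidfTransformer use tf only, or tf-idf? How many iterations should the classifier run, and what learning rate should it use? “Try them one by one in a loop”: that is essentially what grid search does.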
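A minimal sketch of that loop for the text-classification questions above, assuming a Pipeline of CountVectorizer, TfidfTransformer and SGDClassifier. The toy corpus, the choice of SGDClassifier and all parameter values are made up for illustration; each question maps to one entry of the grid.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy corpus invented for illustration: label 0 = sports, 1 = technology
docs = [
    "the team won the football match",
    "a late goal decided the match",
    "the striker scored twice in the league game",
    "fans cheered as the team lifted the cup",
    "the coach praised the defense after the game",
    "the tennis player won the final set",
    "the new laptop has a faster processor",
    "this phone ships with more memory and storage",
    "the software update fixes a security bug",
    "the chip maker announced a new processor",
    "developers released a new version of the library",
    "the cloud server crashed after the update",
]
labels = [0] * 6 + [1] * 6

pipeline = Pipeline([
    ("vect", CountVectorizer()),             # which words to keep
    ("tfidf", TfidfTransformer()),           # tf only, or tf-idf
    ("clf", SGDClassifier(random_state=0)),  # linear classifier trained by SGD
])

# A parameter of a step is addressed as <step name>__<parameter name>
param_grid = {
    "vect__max_df": [0.5, 1.0],          # drop words found in more than this fraction of documents
    "vect__max_features": [None, 20],    # keep at most this many words
    "tfidf__use_idf": [True, False],     # tf-idf vs plain tf
    "clf__alpha": [1e-4, 1e-3],          # regularization; also drives the default 'optimal' learning-rate schedule
    "clf__max_iter": [50, 200],          # maximum passes over the training data
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(docs, labels)
print(search.best_params_)
print(search.best_score_)
```

All 2 × 2 × 2 × 2 × 2 = 32 combinations are tried, each with 3-fold cross-validation.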

某小皮



Tuning the hyper-parameters of an estimator

Hyper-parameters are parameters that are not directly learnt within estimators. In scikit-learn they are passed as arguments to the constructor of the estimator classes.

It is possible and recommended to search the hyper-parameter space for the best cross-validation score (see Cross-validation: evaluating estimator performance).

Any parameter provided when constructing an estimator may be optimized in this manner. Specifically, to find the names and current values for all parameters for a given estimator, use:

estimator.get_params()
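For instance, with an SVC (used here only as an example estimator), get_params lists the constructor arguments that a search can tune, and set_params is how a search changes them:

```python
from sklearn.svm import SVC

svc = SVC()
params = svc.get_params()              # dict of every constructor argument and its current value
print(params["C"], params["kernel"])   # 1.0 rbf

svc.set_params(C=10, kernel="linear")  # what a search does for each candidate setting
print(svc.get_params()["C"])           # 10
```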

A search consists of:

  • an estimator (regressor or classifier such as sklearn.svm.SVC());
  • a parameter space;
  • a method for searching or sampling candidates;
  • a cross-validation scheme; and
  • a score function.
GridSearchCV exhaustively considers all parameter combinations, while RandomizedSearchCV can sample a given number of candidates from a parameter space with a specified distribution.
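The difference can be seen directly in the candidate generators that the two searches rely on, ParameterGrid and ParameterSampler. A small sketch; the value lists and distributions are arbitrary examples:

```python
from scipy.stats import expon
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Exhaustive: every combination of the listed values
grid = {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.0001]}
all_candidates = list(ParameterGrid(grid))
print(len(all_candidates))  # 8 = 4 * 2

# Randomized: a fixed budget of n_iter draws from the distributions
distributions = {"C": expon(scale=100), "gamma": expon(scale=0.1)}
sampled = list(ParameterSampler(distributions, n_iter=5, random_state=0))
print(len(sampled))  # 5
```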

Exhaustive grid search: GridSearchCV

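Grid search: pick a few candidate values for each parameter, then traverse every combination of those values, like the points of a grid. Pros: the implementation is simple brute force, and if the whole grid can be traversed, the result is fairly reliable. Cons: it is very time-consuming, because the number of combinations is the product of the number of values per parameter; for expensive models such as neural networks, usually only a few combinations can be tried.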

param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 ]
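A sketch of running this param_grid, assuming an SVC on the iris dataset (the dataset and cv=5 are choices made here). Each dict is a separate sub-grid, so gamma is only combined with the rbf kernel:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = [
    {"C": [1, 10, 100, 1000], "kernel": ["linear"]},
    {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.0001], "kernel": ["rbf"]},
]

search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(len(search.cv_results_["params"]))  # 12 candidates: 4 linear + 4 * 2 rbf
print(search.best_params_, search.best_score_)
```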

A good example: see Nested versus non-nested cross-validation for an example of grid search within a cross-validation loop on the iris dataset.
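The idea behind that example, sketched with cross_val_score wrapped around a GridSearchCV: the inner loop picks parameters, the outer loop scores the whole tuning procedure on data it never saw. The grid values and fold counts are arbitrary choices here:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)  # used for parameter selection
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)  # used for performance estimation

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=inner_cv)

# Non-nested: the same data choose the parameters and score them (optimistically biased)
non_nested_score = search.fit(X, y).best_score_

# Nested: each outer fold reruns the whole grid search on its training part only
nested_scores = cross_val_score(search, X, y, cv=outer_cv)
print(non_nested_score, nested_scores.mean())
```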

Randomized parameter optimization: RandomizedSearchCV

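Random search: instead of evaluating every combination, a fixed number of candidates (n_iter) is drawn at random, and each one is trained and scored. If every parameter is given as a list, candidates are sampled from the grid without replacement; if any parameter is given as a distribution, sampling is done with replacement.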

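sklearn.model_selection.RandomizedSearchCV(estimator, param_distributions, n_iter=10, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score='raise', return_train_score=True)

(This is the signature from scikit-learn 0.18/0.19. In current releases fit_params and iid have been removed, n_jobs defaults to None, error_score defaults to np.nan, and return_train_score defaults to False.)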

Randomized search has two main benefits over an exhaustive search:

  • A budget can be chosen independent of the number of parameters and possible values.
  • Adding parameters that do not influence the performance does not decrease efficiency.
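For example, a search space for an SVC can mix continuous distributions (for C and gamma) with discrete lists, which are sampled uniformly:

param_distributions = {'C': scipy.stats.expon(scale=100), 'gamma': scipy.stats.expon(scale=.1),
  'kernel': ['rbf'], 'class_weight': ['balanced', None]}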
In principle, any object that provides an rvs (random variate sample) method can be passed as a distribution to sample a value from.
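A sketch of RandomizedSearchCV on iris (the dataset is a choice made here) that mixes a scipy.stats distribution, plain lists, and a hand-written distribution object. LogUniform is a hypothetical helper, not part of scikit-learn; it only needs an rvs method accepting random_state, which is how scikit-learn draws from distributions:

```python
from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.utils import check_random_state


class LogUniform:
    """Hypothetical helper: draws 10**u with u uniform on [low, high]."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def rvs(self, random_state=None):
        # Accepts None, an int seed, or a RandomState, like scipy.stats distributions
        rng = check_random_state(random_state)
        return 10 ** rng.uniform(self.low, self.high)


X, y = load_iris(return_X_y=True)

param_distributions = {
    "C": expon(scale=100),               # scipy.stats distribution
    "gamma": LogUniform(-4, 0),          # custom object: gamma between 1e-4 and 1
    "kernel": ["rbf"],                   # lists are sampled uniformly
    "class_weight": ["balanced", None],
}

search = RandomizedSearchCV(SVC(), param_distributions, n_iter=10, cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The budget is n_iter=10 candidates no matter how many parameters or values the space contains.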

Example: Comparing randomized search and grid search for hyperparameter estimation compares the usage and efficiency of randomized search and grid search.

Tips for parameter search

Specifying an objective metric

[Evaluation metrics and methods for machine learning models]
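By default a search uses the estimator's own score method (accuracy for classifiers, R² for regressors). A sketch of overriding it with the scoring parameter, including a multi-metric search where refit names the metric used to pick best_params_; the SVC, iris data and grid are assumptions for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10]}

# Single metric: optimize macro-averaged F1 instead of the default accuracy
f1_search = GridSearchCV(SVC(), param_grid, scoring="f1_macro", cv=5).fit(X, y)

# Several metrics: all are recorded, refit="f1" decides which one selects the best candidate
multi_search = GridSearchCV(
    SVC(), param_grid, scoring={"acc": "accuracy", "f1": "f1_macro"}, refit="f1", cv=5
).fit(X, y)

print(f1_search.best_params_)
print(multi_search.cv_results_["mean_test_acc"])
print(multi_search.cv_results_["mean_test_f1"])
```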

Composite estimators and parameter spaces

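A custom estimator class must implement get_params, set_params(**params), fit(X, y), predict(new_samples) and score(X, y_true). get_params and set_params can be inherited from BaseEstimator (from sklearn.base import BaseEstimator), and score from a mixin such as sklearn.base.ClassifierMixin.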
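A minimal sketch of such a class. CentroidClassifier is hypothetical: it assigns each sample to the nearest class mean under an L^p distance, and p is the hyperparameter tuned by GridSearchCV. get_params/set_params come from BaseEstimator and score (accuracy) from ClassifierMixin:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV


class CentroidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical example: nearest class mean under an L^p distance."""

    def __init__(self, p=2.0):
        # Store constructor arguments unchanged so get_params/set_params work
        self.p = p

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Sum of |x - c|^p ranks centroids the same way as the L^p distance
        dists = (np.abs(X[:, None, :] - self.centroids_[None, :, :]) ** self.p).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]


X, y = load_iris(return_X_y=True)
search = GridSearchCV(CentroidClassifier(), {"p": [1.0, 2.0, 3.0]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```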

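Use a Pipeline (see Pipeline: chaining estimators) to search over the parameters of several steps at once; a parameter of a step is addressed as <step name>__<parameter name>.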
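Besides tuning parameters inside steps, the grid can also swap a whole step. A sketch assuming a scaler, PCA and SVC on iris; the step names "scale", "reduce" and "clf" are chosen here, and "passthrough" skips the step:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("reduce", PCA()), ("clf", SVC())])

param_grid = [
    # Keep PCA and tune its number of components together with SVC's C
    {"reduce": [PCA()], "reduce__n_components": [2, 3], "clf__C": [1, 10]},
    # Replace the whole step: no dimensionality reduction at all
    {"reduce": ["passthrough"], "clf__C": [1, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(len(search.cv_results_["params"]))  # 6 = 2 * 2 + 2
print(search.best_params_)
```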

Model selection: development and evaluation

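Use a development set (i.e., a validation set) for model selection: split the data into a development set (to be fed to the GridSearchCV instance) and a held-out evaluation set that is used only to compute the final performance metrics.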
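A sketch of that split, assuming iris, a 70/30 stratified split and an arbitrary SVC grid. Only the development part ever reaches GridSearchCV:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Development set for the search, evaluation set kept aside
X_dev, X_eval, y_dev, y_eval = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5)
search.fit(X_dev, y_dev)  # cross-validation happens inside the development set

# The refitted best model is scored once on data it has never seen
eval_acc = search.score(X_eval, y_eval)
print(search.best_params_, eval_acc)
```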

Parallelism

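GridSearchCV and RandomizedSearchCV evaluate each parameter setting independently, so the computation can be run in parallel by using the keyword n_jobs=-1 (use all CPU cores).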
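A sketch comparing a sequential and a parallel run of the same search (SVC on iris, grid chosen arbitrarily). Parallelism only changes wall-clock time; the cross-validation results are the same:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

# n_jobs=1: the 16 candidates x 5 folds = 80 fits run one after another
serial = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=1).fit(X, y)

# n_jobs=-1: the same 80 fits are distributed over all available CPU cores
parallel = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1).fit(X, y)

print(np.allclose(serial.cv_results_["mean_test_score"],
                  parallel.cv_results_["mean_test_score"]))  # True
```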

Robustness to failure

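If fitting fails for some parameter setting, the whole grid search can fail (this is what happens with error_score='raise', the default in older releases; since scikit-learn 0.22 the default is np.nan). Setting error_score=0 (or np.nan) makes the search robust to such failures: it issues a warning, sets the score for that fold to 0 (or NaN), and completes the search.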

Choosing which hyperparameters to tune

[Machine learning model selection: choosing hyperparameters to tune]

GridSearch flowchart

[Figure 1: grid search flowchart]



Alternatives to brute force parameter search

Model-specific cross-validation

...

Information criterion

Some models can offer an information-theoretic closed-form formula of the optimal estimate of the regularization parameter by computing a single regularization path (instead of several when using cross-validation).

Here is the list of models benefiting from the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) for automated model selection:

linear_model.LassoLarsIC([criterion, ...]) Lasso model fit with Lars using BIC or AIC for model selection
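A sketch of LassoLarsIC on the bundled diabetes dataset (the dataset is a choice made here). A single LARS path is computed, and alpha_ is the point on it that minimizes the chosen criterion, with no cross-validation loop:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoLarsIC

X, y = load_diabetes(return_X_y=True)

bic_model = LassoLarsIC(criterion="bic").fit(X, y)
aic_model = LassoLarsIC(criterion="aic").fit(X, y)

print("alpha chosen by BIC:", bic_model.alpha_)
print("alpha chosen by AIC:", aic_model.alpha_)
print((bic_model.coef_ != 0).sum(), "non-zero coefficients with BIC")
```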

For the theory, see PRML (Bishop, Pattern Recognition and Machine Learning).

Out of Bag Estimates

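Ensemble methods that train each base model on a subsample of the data (for example, bootstrap samples in random forests) leave part of the training data out of each model, so that part can be used directly for model selection without an extra, independent validation set. This left-out portion can be used to estimate the generalization error without having to rely on a separate validation set. This estimate comes “for free” as no additional data is needed and can be used for model selection.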

ensemble.RandomForestClassifier([...]) A random forest classifier.
ensemble.RandomForestRegressor([...]) A random forest regressor.
ensemble.ExtraTreesClassifier([...]) An extra-trees classifier.
ensemble.ExtraTreesRegressor([n_estimators, ...]) An extra-trees regressor.
ensemble.GradientBoostingClassifier([loss, ...]) Gradient Boosting for classification.
ensemble.GradientBoostingRegressor([loss, ...]) Gradient Boosting for regression.
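A sketch of using the out-of-bag score of a random forest to choose max_features without any validation split; the iris data, n_estimators=200 and the candidate values are assumptions for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Each tree is grown on a bootstrap sample; the samples it did not see give the OOB estimate
oob_scores = {}
for max_features in [1, 2, 4]:
    forest = RandomForestClassifier(
        n_estimators=200, max_features=max_features, oob_score=True, random_state=0
    ).fit(X, y)
    oob_scores[max_features] = forest.oob_score_  # OOB accuracy

best = max(oob_scores, key=oob_scores.get)
print(oob_scores, "best max_features:", best)
```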

Bayesian Optimization

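Bayesian optimization takes into account the results already observed for earlier parameter settings when choosing the next setting to try, so it usually saves a lot of time. Compared with grid search, it is like an old ox next to a sports car. For the underlying theory, see the paper Practical Bayesian Optimization of Machine Learning Algorithms. Here are two Python libraries that implement Bayesian hyperparameter tuning and are ready to use:

  • jaberg/hyperopt: relatively simple.
  • fmfn/BayesianOptimization: more complex, supports parallel tuning.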
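To show the mechanism, here is a toy sketch built only on scikit-learn and SciPy; it is not the API of hyperopt or BayesianOptimization. It assumes a Gaussian-process surrogate (Matern kernel) and the expected-improvement acquisition function, and tunes log10(C) of an SVC on iris. The objective function and all settings are hypothetical choices:

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)


def objective(log_c):
    """Hypothetical objective: mean 5-fold CV accuracy of an SVC with C = 10**log_c."""
    return cross_val_score(SVC(C=10 ** log_c), X, y, cv=5).mean()


candidates = np.linspace(-3, 3, 200).reshape(-1, 1)  # search space: log10(C) in [-3, 3]
rng = np.random.RandomState(0)
tried = [float(v) for v in rng.uniform(-3, 3, size=3)]  # a few random starting points
scores = [objective(c) for c in tried]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                              normalize_y=True, random_state=0)
for _ in range(7):
    # 1. Fit the surrogate model to every result observed so far
    gp.fit(np.array(tried).reshape(-1, 1), scores)
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    # 2. Expected improvement over the best score so far (maximization)
    best_so_far = max(scores)
    z = (mu - best_so_far) / sigma
    ei = (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)
    # 3. Evaluate the real objective only at the most promising candidate
    next_c = float(candidates[np.argmax(ei), 0])
    tried.append(next_c)
    scores.append(objective(next_c))

best_idx = int(np.argmax(scores))
print("best C:", 10 ** tried[best_idx], "CV accuracy:", scores[best_idx])
```

Only 10 real evaluations are spent here; each new point is chosen from the surrogate built on all earlier results. That history is exactly what grid search ignores.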

Grid search examples

[Auto-scaling scikit-learn with Apache Spark]


ref: [3.2. Tuning the hyper-parameters of an estimator]

[Parallel hyperparameter tuning in Python: scikit-learn grid_search]

[Parameter estimation using grid search with cross-validation]


