GridSearch & Kfold & cross validation

what’s cross validation?

Cross-validation is a technique for assessing how well the results of a statistical analysis generalize to an independent data set. It is mostly used when the goal is prediction and one needs to estimate how accurately a predictive model will perform. The main reason to prefer cross-validation over conventional validation (a single fixed split into training and test sets) is that there is often not enough data to set aside a separate test set without a significant loss of modeling or testing capability.

Cross-validation is also known as rotation estimation.

summary of cross validation

  • estimates how well the results of a statistical analysis generalize to an independent data set
  • used to evaluate a predictive model effectively
  • most useful when there is not enough data for a separate training/test split
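As a rough illustration of the idea, the sketch below scores one model with 5-fold cross-validation. The iris data set, the decision tree classifier and `cv=5` are assumptions chosen only to keep the example self-contained; they are not taken from the original post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy data set (assumption: any labelled data set would do here)
X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=0)

# Each of the 5 folds is used once for validation and the rest for training
scores = cross_val_score(clf, X, y, cv=5)

print("fold accuracies:", scores)
print("mean accuracy: %f" % scores.mean())
```

Every sample ends up in a validation fold exactly once, so no data has to be permanently set aside just for scoring.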

what’s the grid search?

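Grid search is a method for parameter selection: it tries every combination of values from a predefined grid of candidate parameters and keeps the combination that scores best, typically under cross-validation.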
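To make the "grid" concrete, here is a small sketch that only enumerates the candidate combinations, using the same parameter grid as the practice section. The helper `expand_grid` is a hypothetical name introduced for illustration and is not part of scikit-learn.

```python
from itertools import product

parameters = {
    "max_depth": [1, 3, 5, 15, None],
    "criterion": ["gini", "entropy"],
    "splitter": ["random", "best"],
}

def expand_grid(grid):
    """Yield one dict per combination of the listed values (hypothetical helper)."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

candidates = list(expand_grid(parameters))
print(len(candidates))  # 5 * 2 * 2 = 20 combinations
print(candidates[0])    # first combination in the grid
```

A grid search fits and scores the model once per candidate (and once per fold when cross-validation is used), so the cost grows quickly as values are added to the grid.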

kfold?

K-fold is a method that splits the data into k folds. Each fold serves once as the validation set while the remaining k-1 folds are used for training.
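The split can be inspected directly. This sketch assumes ten toy samples and `n_splits=5`, both picked only to keep the printed indices short.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # ten toy samples (assumption)
kfold = KFold(n_splits=5)         # no shuffling: folds are consecutive blocks

# Collect (train indices, validation indices) for every fold
folds = [(train_idx.tolist(), val_idx.tolist())
         for train_idx, val_idx in kfold.split(X)]

for i, (train_idx, val_idx) in enumerate(folds):
    print("fold %d: train=%s validate=%s" % (i, train_idx, val_idx))
```

Without shuffling, the first validation fold holds samples 0 and 1, the second holds 2 and 3, and so on. Passing `shuffle=True` is usually worth considering when the data is ordered by class or time of collection.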

what’s the role of the training/validation/test sets?

The validation set is not a fixed split: it is carved out of the training data by k-fold cross-validation. The test set is held out and used only for the final score.
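A minimal sketch of how the three roles can fit together, assuming the iris data set, an 80/20 train/test split and a depth-3 decision tree (all illustrative choices, not from the original post):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Test set: held out once and not touched during model selection
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)

# Validation: each fold of the training data takes a turn as the validation set
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
val_scores = cross_val_score(clf, X_train, y_train, cv=kfold)
print("mean validation accuracy: %f" % val_scores.mean())

# Training: refit on all training data, then score once on the test set
clf.fit(X_train, y_train)
print("test accuracy: %f" % clf.score(X_test, y_test))
```

The validation scores guide choices such as `max_depth`, while the test score is read only once at the end as an estimate of performance on unseen data.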

practice

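	# Assumes X_train, y_train, X_test and y_test are already defined.
	from sklearn.metrics import accuracy_score, make_scorer
	from sklearn.model_selection import GridSearchCV, KFold
	from sklearn.tree import DecisionTreeClassifier

	# Assumption: the parameter names below match a decision tree classifier.
	classifier = DecisionTreeClassifier()
	kfold = KFold(n_splits=10)

	parameters = {"max_depth": [1, 3, 5, 15, None], "criterion": ["gini", "entropy"], "splitter": ["random", "best"]}

	scoring_fnc = make_scorer(accuracy_score)
	print("parameters:", parameters)
	# scoring must be passed by keyword in recent scikit-learn releases
	grid = GridSearchCV(classifier, parameters, scoring=scoring_fnc, cv=kfold)
	grid = grid.fit(X_train, y_train)
	# best_estimator_ is refit on the whole training set with the best parameters
	best_clf = grid.best_estimator_
	print('best score: %f' % grid.best_score_)
	print('best parameters:')
	for key in parameters:
	    print('\t%s: %s' % (key, best_clf.get_params()[key]))
	print('test score: %f' % best_clf.score(X_test, y_test))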

the code is here.

summary

  1. grid search finds the best parameters
  2. cross-validation evaluates the model more thoroughly when data is limited
  3. grid search and cross-validation are used together: each parameter combination is scored by cross-validation.


reference

  1. https://amueller.github.io/ml-training-intro/slides/03-cross-validation-grid-search.html#1

  2. https://stackabuse.com/cross-validation-and-grid-search-for-model-selection-in-python/

  3. https://towardsdatascience.com/why-and-how-to-cross-validate-a-model-d6424b45261f
