DS Interview Question--Cross Validation

Q: What is cross-validation? How do you do it right?

A:

Cross-validation is a technique for evaluating predictive models and estimating how accurately they will perform in practice. It works by partitioning the original sample into a training set, used to fit the model, and a validation set, used to evaluate it.
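The partitioning step can be illustrated with a minimal sketch. The code shuffles a dataset and splits it into a training set and a validation set using only the Python standard library. The function name `train_validation_split` and its `val_fraction` and `seed` parameters are hypothetical choices for this example. They are not part of any library.

```python
import random

def train_validation_split(data, val_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle a copy of `data`, then split it into (train, validation)."""
    rng = random.Random(seed)          # seeded so the split is reproducible
    shuffled = list(data)              # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

train, val = train_validation_split(list(range(10)), val_fraction=0.3)
print(len(train), len(val))  # 7 3
```

A single split like this gives only one, possibly noisy, performance estimate. K-fold CV averages over K such splits so that every training point is used for validation exactly once.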

K-fold CV Steps:

1. Split the dataset into a training set and a test set;

2. Set the test set aside and partition the training set into K folds of (roughly) equal size;

3. For k = 1, 2, ..., K, fit the model on the other K-1 folds and compute the prediction error on the k-th fold, which serves as the validation set; this step is therefore repeated K times;

4. Average the K validation errors and take this average as the estimate of the model's performance;

5. Repeat steps 3 and 4 for each candidate model, select the one with the lowest cross-validation error, retrain it on the whole training set, and evaluate it once on the held-out test set.
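The five steps above can be sketched end to end in plain Python. This is a minimal sketch built on several assumptions that are not in the original:

- The data are synthetic, a noisy 1-D threshold problem.
- The candidate "models" are k-nearest-neighbour classifiers that differ only in `n_neighbors`.
- The error metric is the misclassification rate.
- Helper names such as `k_fold_indices` and `cross_val_error` are hypothetical.

Libraries such as scikit-learn offer ready-made utilities for this (for example `KFold` and `cross_val_score`). The stdlib version keeps every step visible.

```python
import random

def knn_predict(train, x, n_neighbors):
    """Predict a 0/1 label for x by majority vote of its n_neighbors nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def error_rate(train, evaluate, n_neighbors):
    """Fraction of points in `evaluate` misclassified by k-NN 'fitted' on `train`."""
    wrong = sum(knn_predict(train, x, n_neighbors) != y for x, y in evaluate)
    return wrong / len(evaluate)

def k_fold_indices(n, K):
    """Step 2: partition indices 0..n-1 into K contiguous folds whose sizes differ by at most 1."""
    sizes = [n // K + (1 if i < n % K else 0) for i in range(K)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_error(train, n_neighbors, K=5):
    """Steps 3-4: fit on K-1 folds, measure error on the held-out fold, average over all K."""
    errors = []
    for held_out in k_fold_indices(len(train), K):
        held = set(held_out)
        fit_part = [p for i, p in enumerate(train) if i not in held]
        val_part = [train[i] for i in held_out]
        errors.append(error_rate(fit_part, val_part, n_neighbors))
    return sum(errors) / K

# Synthetic data (assumption): label is 1 when x > 0.5, with ~10% of labels flipped as noise.
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))
rng.shuffle(data)  # random order, so contiguous folds are random folds

# Step 1: hold out a test set that is not touched until the very end.
train, test = data[:80], data[80:]

# Steps 2-4 for each candidate model.
candidates = [1, 3, 5, 9]
cv_errors = {nn: cross_val_error(train, nn, K=5) for nn in candidates}

# Step 5: pick the lowest CV error, refit on the whole training set
# (for k-NN, "fitting" just means storing the points), then test once.
best = min(candidates, key=cv_errors.get)
test_error = error_rate(train, test, best)
print(cv_errors, best, test_error)
```

The test set plays no role in choosing `n_neighbors`. If it were used for selection, the final test error would become an optimistic estimate rather than an honest one.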


Interview questions are from DataAppLab (WeChat: Datalaus)

Jun.28th, 2017  Seattle
