Probabilistic Model Selection with AIC, BIC, and MDL

1. Introduction

Model selection is the problem of choosing one from among a set of candidate models. One approach to model selection involves using probabilistic statistical measures that attempt to quantify both the model's performance on the training dataset and the complexity of the model.

Examples include the Akaike Information Criterion, the Bayesian Information Criterion, and the Minimum Description Length. The benefit of these information criteria is that they do not require a hold-out test set, although a limitation is that they do not take the uncertainty of the models into account.

In this tutorial, you will learn:

    - Model selection is the challenge of choosing one among a set of candidate models.

    - The Akaike and Bayesian Information Criteria are two ways of scoring a model based on its log-likelihood and complexity.

    - Minimum Description Length provides another scoring method from information theory that can be shown to be equivalent to BIC.

 

2. The Challenge of Model Selection 

Model selection is the process of fitting multiple models on a given dataset and choosing one over all others.

"Model selection: estimating the performance of different models in order to choose the best one."

This may apply in unsupervised learning, e.g., choosing a clustering model, or supervised learning, e.g., choosing a predictive model for a regression or classification task. There are many common approaches that may be used for model selection. For example, in the case of supervised learning, the three most common approaches are:

    - Train, validate, and test datasets;

    - Resampling methods;

    - Probabilistic statistics.

The third approach, probabilistic statistics, attempts to combine the performance of the model on the training dataset and the complexity of the model into a single score.

 

3. Probabilistic Model Selection

Probabilistic model selection (or information criteria) provides an analytical technique for scoring and choosing among candidate models. Models are scored both on their performance on the training dataset and on the complexity of the model:

    - Model performance. How well the candidate model has performed on the training dataset.

    - Model complexity. How complicated the candidate model is after training.

Model performance may be evaluated using a probabilistic framework, such as log-likelihood under the framework of maximum likelihood estimation. Model complexity may be evaluated as the number of degrees of freedom or parameters in the model.
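
As a concrete sketch (not from the original tutorial), the code below fits a simple linear regression with NumPy on toy data and reports both ingredients: the Gaussian log-likelihood on the training dataset (performance) and the number of fitted parameters (complexity). The toy data and the Gaussian error model are illustrative assumptions.

from math import log, pi

import numpy as np

# toy training data: y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=100)
y = 2.0 * X + 1.0 + rng.normal(scale=0.5, size=100)

# fit slope and intercept by least squares
coeffs = np.polyfit(X, y, deg=1)
residuals = y - np.polyval(coeffs, X)
n = len(y)
mse = float(np.mean(residuals ** 2))

# model performance: log-likelihood of the fit, assuming Gaussian errors
# with variance estimated by the MSE
log_likelihood = -0.5 * n * (log(2.0 * pi * mse) + 1.0)

# model complexity: number of fitted parameters (slope and intercept)
num_params = len(coeffs)

print(f'LL={log_likelihood:.1f}, k={num_params}')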

 

4. Akaike Information Criterion

The Akaike Information Criterion (AIC) is derived from a frequentist framework. For logistic regression, it can be defined as:

                                            AIC = - \frac{2}{N} \times LL + 2 \times \frac{k}{N}

where N is the number of examples in the training dataset, LL is the log-likelihood of the model on the training dataset, and k is the number of parameters in the model. For a linear regression model with Gaussian errors, -2 \times LL can be replaced by N \times log(MSE) up to the 1/N scaling and additive constants, neither of which changes the ranking of models fit on the same dataset; this gives the common implementation below.

from math import log

def calculate_aic(n, mse, num_params):
    # AIC for a linear regression fit: n * log(mse) stands in for
    # -2 * LL (additive constants dropped), and the 1/N scaling is
    # omitted; neither change affects which model scores lowest
    aic = n * log(mse) + 2 * num_params
    return aic
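
A quick usage sketch with hypothetical values for the sample size, MSE, and parameter counts:

# compare two candidate models fit on the same 100-example dataset:
# a 2-parameter model with a slightly worse fit and a 5-parameter
# model with a slightly better fit; the lower AIC is preferred
print(calculate_aic(100, 0.25, 2))  # about -134.6
print(calculate_aic(100, 0.22, 5))  # about -141.4, so AIC prefers this one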

 

5. Bayesian Information Criterion

The Bayesian Information Criterion (BIC) is derived from Bayesian probability and is defined as:

                                          BIC = -2 \times LL + \log(N) \times k

with N, LL, and k defined as for AIC. Unlike AIC, BIC penalizes the model more for its complexity: the penalty grows with log(N) rather than staying constant, so more complex models receive a worse (larger) score.

from math import log

def calculate_bic(n, mse, num_params):
    # BIC for a linear regression fit: as with AIC above, n * log(mse)
    # stands in for -2 * LL, while the complexity penalty scales with
    # log(n) instead of the constant 2
    bic = n * log(mse) + num_params * log(n)
    return bic
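
Repeating the same hypothetical comparison with BIC shows the heavier penalty at work: for n = 100, each extra parameter costs log(100) ≈ 4.6 rather than AIC's flat 2, which in this made-up example flips the choice toward the simpler model.

print(calculate_bic(100, 0.25, 2))  # about -129.4, so BIC prefers this one
print(calculate_bic(100, 0.22, 5))  # about -128.4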

 

6. Minimum Description Length

The Minimum Description Length (MDL) principle comes from information theory: the best model is the one that allows both the model itself and the predictions made with it to be transmitted with the shortest total message. It can be stated as:

                                        MDL = -\log(P(\theta)) - \log(P(y|X, \theta))

where the first term is the length of the message needed to transmit the model parameters \theta, and the second is the length needed to transmit the targets y given the inputs X and the model.
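
As a sketch of the equivalence to BIC mentioned in the introduction: under the common asymptotic approximation that transmitting the k model parameters costs about (k/2) \times \log(N) nats, the MDL score becomes

                                        MDL \approx \frac{k}{2} \log(N) - LL = \frac{1}{2} (-2 \times LL + \log(N) \times k) = \frac{1}{2} BIC

so minimizing the description length selects the same model as minimizing BIC.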
