
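At a group reading a while ago, a senior lecturer raised the question above. I suspect she is not the only one: many people have, or once had, the same doubt. ResearchGate happens to have a Q&A on exactly this, and I think it explains the point very clearly, even though it is brief.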

If I have a dataset, how can I prove that it follows a Gaussian Mixture Model (GMM)?

One reason why GMMs are used without asking whether the data is GMM distributed is that a GMM is a universal function approximator. That is, whatever the original distribution of the data was, the GMM is expected to approach the true distribution when a sufficiently large number of mixture components is allowed.
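For reference, the density being discussed is usually written as below. This is standard textbook notation rather than something quoted from the ResearchGate thread; K denotes the number of mixture components.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

The constraints on the weights \pi_k are what make p(x) integrate to one, and a larger K gives the mixture more freedom to follow the shape of the data.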

As Marco Huber said, GMMs are flexible approximators of density functions (not of general functions; for those you need to look at the very similar concept of Gaussian radial basis function networks, which have no normalization constraint as densities do). The question is not whether the density that you would like to model with a GMM originally stems from a GMM. It is rather a question of how accurately you want to model it with a general modeling tool like a GMM. The more components you use, the more accurate your approximation will usually be. There are also methods that learn the number of components automatically: see e.g. "The Variational Approximation for Bayesian Inference" by Dimitris G. Tzikas et al.
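To make the "more components, better approximation" point concrete, here is a minimal sketch of my own (not from the thread). It assumes 1-D data drawn from an exponential distribution, which is clearly not a GMM, and fits mixtures with a plain EM loop using only NumPy. The helper names `log_joint`, `log_density` and `fit_gmm_1d` are hypothetical, and this is ordinary EM scored on held-out data, not the variational method from the Tzikas et al. paper.

```python
import numpy as np

def log_joint(x, weights, means, variances):
    """Return log(w_k * N(x_i | mu_k, var_k)) as an (n, k) array."""
    diff = x[:, None] - means[None, :]
    return (np.log(weights) - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * diff ** 2 / variances)

def log_density(x, weights, means, variances):
    """Log of the mixture density at each point (stable log-sum-exp over components)."""
    lj = log_joint(x, weights, means, variances)
    m = lj.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True)))[:, 0]

def fit_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1-D GMM with plain EM; returns (weights, means, variances)."""
    # Deterministic initialisation: means spread over the data quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        lj = log_joint(x, weights, means, variances)
        resp = np.exp(lj - log_density(x, weights, means, variances)[:, None])
        # M-step: re-estimate the parameters from the weighted points.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()  # keeps the normalisation constraint
        means = (resp * x[:, None]).sum(axis=0) / nk
        # Small floor on the variance to avoid degenerate components.
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances

# Assumption for illustration: exponential data, i.e. skewed and not a GMM.
rng = np.random.default_rng(0)
data = rng.exponential(1.0, size=4000)
train, held_out = data[:3000], data[3000:]

# Average held-out log-likelihood for a few choices of K.
scores = {}
for k in (1, 2, 5):
    params = fit_gmm_1d(train, k)
    scores[k] = float(log_density(held_out, *params).mean())

for k, s in scores.items():
    print(f"K={k}: average held-out log-likelihood = {s:.3f}")
```

On data like this the held-out score should typically rise as K grows, even though no Gaussian mixture generated the data. Scoring on held-out points rather than the training set is a deliberate choice: it also hints at when adding components stops helping, which is the problem the automatic methods mentioned above address more rigorously.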



