Gaussian: why is it always the Gaussian distribution?

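At a group reading a while ago, a senior lecturer raised exactly this question. I suspect she is not the only one: many people have, or once had, the same thought. As it happens, ResearchGate has a Q&A on this that I think explains it clearly, if briefly: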


If I have a dataset, how can I prove that it follows a Gaussian Mixture Model (GMM)?

One reason why GMMs are used without asking whether the data is GMM-distributed is that a GMM is a universal function approximator. That is, whatever the original distribution of the data was, the GMM is expected to approach the true distribution when a sufficiently large number of mixture components is allowed.


As Marco Huber said, GMMs are flexible approximators of density functions (not of general functions; for those you need to look at the very similar concept of Gaussian radial basis function networks, which have no normalization constraint as densities do). The question is not whether the density you would like to model with a GMM originally stems from a GMM; it is more a question of how accurately you want to model it with a general modeling tool like a GMM. The more components you use, the more accurate your approximation will usually be. There are also methods that learn the number of components automatically: see, e.g., "The Variational Approximation for Bayesian Inference" by Dimitris G. Tzikas et al.
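The claim that more components usually give a more accurate density model can be checked with a small experiment. Below is a minimal sketch of my own (not code from the ResearchGate thread), in standard-library Python: it fits 1-D GMMs with the standard EM algorithm to samples from an exponential distribution, which is clearly not Gaussian, and compares the average log-likelihood for K = 1 and K = 4 components. The function names, the quantile-based initialisation, the iteration count, and the variance floor are all assumptions of this sketch.

```python
import math
import random

def log_normal_pdf(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def log_sum_exp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def fit_gmm_1d(data, k, iters=100, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative sketch).

    Returns (weights, means, variances). Means start at evenly spaced
    quantiles of the data; every component starts with the global variance.
    """
    n = len(data)
    xs = sorted(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    for _ in range(iters):
        # E-step: resp[i][j] = P(component j | x_i), computed in log space.
        resp = []
        for x in data:
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = log_sum_exp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against an empty component
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_floor, var_j)  # floor stops collapse onto a single point
    return weights, means, variances

def avg_log_likelihood(data, weights, means, variances):
    """Mean of log p(x) over the data under the fitted mixture."""
    return sum(log_sum_exp([math.log(w) + log_normal_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)])
               for x in data) / len(data)

rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(400)]  # skewed, x >= 0: not Gaussian
scores = {k: avg_log_likelihood(data, *fit_gmm_1d(data, k)) for k in (1, 4)}
print(scores)  # the K = 4 mixture should score higher than the single Gaussian
```

Note that the best achievable log-likelihood on the training data never decreases as K grows, so the training score alone cannot choose K; in practice one would use held-out data, a criterion such as BIC, or a Bayesian treatment like the variational approach cited above.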


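The basic idea: even if the true distribution that generated the experimental data is not Gaussian, it can still be approximated as closely as desired by a mixture of sufficiently many Gaussian components with different amplitudes (mixture weights).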
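As a concrete and fully deterministic illustration of "as closely as desired", the sketch below takes a target that is clearly not Gaussian, the uniform density on [0, 1], and builds (by hand, not by fitting) a K-component mixture with equal weights 1/K, means at the midpoints of K equal bins, and standard deviation equal to the bin width. It then estimates the L1 distance to the uniform density numerically. The construction and all names are my own hypothetical choices for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def uniform_mixture(k):
    """Hypothetical hand-built k-component mixture mimicking Uniform(0, 1):
    equal weights 1/k, means at bin midpoints, sigma equal to the bin width."""
    h = 1.0 / k
    return [(1.0 / k, (i + 0.5) * h, h) for i in range(k)]

def mixture_pdf(x, components):
    """Weighted sum of the component densities at x."""
    return sum(w * normal_pdf(x, mu, s) for w, mu, s in components)

def l1_error_vs_uniform(components, lo=-3.0, hi=4.0, steps=14000):
    """Midpoint-rule estimate of the integral of |mixture(x) - uniform(x)| dx."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        target = 1.0 if 0.0 <= x <= 1.0 else 0.0
        total += abs(mixture_pdf(x, components) - target) * dx
    return total

errors = {k: l1_error_vs_uniform(uniform_mixture(k)) for k in (2, 4, 8, 16, 32)}
for k, e in errors.items():
    print(f"K={k:2d}  L1 error = {e:.3f}")
```

In this construction almost all of the error comes from the two blurred edges at 0 and 1, whose width shrinks with the bin width, so the error roughly halves each time K doubles. It never reaches zero for any finite K, which is the sense in which the approximation is "arbitrarily close" rather than exact.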

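Think of the principle and applications of the Fourier transform or the Fourier series, where a complicated function is likewise built up as a superposition of simple components.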
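A rough side-by-side of the two expansions, written in standard textbook form (the symbols are my own notation, not the thread's): a K-component Gaussian mixture density next to a truncated Fourier series of a T-periodic function.

```latex
% Gaussian mixture: non-negative weights that sum to 1
p(x) \approx \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Fourier series of a T-periodic function: real coefficients of either sign
f(x) \approx \frac{a_0}{2} + \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n x}{T} + b_n \sin\frac{2\pi n x}{T} \right)
```

The analogy is loose: the Fourier basis functions are fixed and orthogonal, so each coefficient comes from an inner product, whereas Gaussian components are not orthogonal, their means and variances are fitted as well (typically by EM), and their weights are constrained to be non-negative and sum to 1.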

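Note one point made above: if the target is not a probability distribution, use the closely related Gaussian radial basis function (RBF) network to approximate an arbitrary function; no normalization is needed.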
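To make this distinction concrete, here is a minimal sketch (standard-library Python, hypothetical helper names) of a Gaussian RBF model f(x) ≈ Σ_j w_j exp(-(x - c_j)² / (2s²)) fitted to sin(2πx) on [0, 1] by linear least squares. Since sin(2πx) takes negative values it cannot be a density, and the weights w_j are unconstrained: they may be negative and need not sum to 1, which is exactly the missing normalization constraint. The centers (extended slightly past [0, 1] so the interval edges are covered from both sides), the width, and the small ridge term are arbitrary assumptions of this sketch; a full RBF network might also learn the centers and width.

```python
import math

def gaussian_rbf(x, center, width):
    """Unnormalized Gaussian bump exp(-(x - center)^2 / (2 width^2)); peak value 1."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def solve_linear(a, b):
    """Solve a @ x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_rbf_weights(xs, ys, centers, width, ridge=1e-8):
    """Least-squares weights from the (slightly ridged) normal equations."""
    phi = [[gaussian_rbf(x, c, width) for c in centers] for x in xs]
    m = len(centers)
    ata = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    aty = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(m)]
    return solve_linear(ata, aty)

def rbf_predict(x, weights, centers, width):
    """Weighted sum of Gaussian bumps; no normalization is applied."""
    return sum(w * gaussian_rbf(x, c, width) for w, c in zip(weights, centers))

centers = [(j - 3) / 14 for j in range(21)]  # from -3/14 to 17/14
width = 1 / 14
train_x = [i / 199 for i in range(200)]
weights = fit_rbf_weights(train_x, [math.sin(2 * math.pi * x) for x in train_x],
                          centers, width)
test_x = [i / 999 for i in range(1000)]
max_err = max(abs(rbf_predict(x, weights, centers, width) - math.sin(2 * math.pi * x))
              for x in test_x)
print(f"max |error| on [0, 1]: {max_err:.2e}")
print("some weights negative:", min(weights) < 0)
```

Because every Gaussian bump is positive, a fit that goes negative near x = 0.75 must use at least one negative weight, something a GMM's mixture weights are not allowed to do.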
