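Kernel Method Regression

Reference: B. W. Silverman, Density Estimation for Statistics and Data Analysis

Goal: given a dataset, estimate the probability density function it was drawn from.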

 Histograms

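As a rough illustration, the histogram estimate at a point x is usually written as (number of observations in the same bin as x) / (n * h), where h is the bin width and the bins start at some origin. The sketch below assumes that definition; the function name, the `origin` parameter and the sample values are made up for illustration.

```python
import math

def histogram_density(data, x, origin, h):
    """Histogram estimate at x: (count in the bin containing x) / (n * h)."""
    n = len(data)
    # Index m of the bin [origin + m*h, origin + (m+1)*h) that contains x.
    bin_index = math.floor((x - origin) / h)
    count = sum(1 for xi in data if math.floor((xi - origin) / h) == bin_index)
    return count / (n * h)

sample = [1.2, 1.9, 2.1, 2.4, 3.7]  # made-up data for illustration
for x in (0.5, 1.5, 2.0, 3.5):
    print(x, histogram_density(sample, x, origin=0.0, h=1.0))
```

The estimate depends on both the bin width and the choice of origin, which is one reason to look at the estimators that follow.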

The naive estimator

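Like the histogram, the estimate is a step function: a sequence of horizontal segments (parallel to the x-axis) joined together.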

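A minimal sketch of the naive estimator, assuming the usual form: count the observations that fall in the window (x - h, x + h) and divide by 2hn. The function name and the sample are illustrative, not taken from the reference.

```python
def naive_density(data, x, h):
    """Naive estimate at x: (number of points within h of x) / (2 * h * n)."""
    n = len(data)
    count = sum(1 for xi in data if abs(x - xi) < h)
    return count / (2 * h * n)

sample = [1.2, 1.9, 2.1, 2.4, 3.7]  # made-up data for illustration
for x in (1.0, 2.0, 3.0, 4.0):
    print(x, naive_density(sample, x, h=0.5))
```

Because each observation contributes a box of width 2h, the result jumps whenever x passes a point X_i - h or X_i + h, which is why the plot is a step function.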

The kernel estimator

The kernel can be, for example, a Gaussian kernel.


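In other words, the Gaussian kernel estimate places a Gaussian bump with window width h centred at each observation and adds the bumps together, scaled by 1/n. Unlike the histogram, it does not divide the x-axis into intervals.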
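The sum-of-bumps idea can be sketched as follows, assuming the standard kernel estimate (1 / (n * h)) * sum of K((x - X_i) / h) with K the standard normal density. The function names and sample values are illustrative.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kernel_density(data, x, h):
    """Kernel estimate at x: one Gaussian bump per observation, averaged."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)

sample = [1.2, 1.9, 2.1, 2.4, 3.7]  # made-up data for illustration
for x in (1.0, 2.0, 3.0, 4.0):
    print(x, kernel_density(sample, x, h=0.5))
```

Since each bump integrates to one and the sum is divided by n, the estimate is itself a probability density, and it inherits the smoothness of the kernel.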

Drawback: the kernel estimator suffers from a slight drawback when applied to data from long-tailed distributions. Because the window width is fixed across the entire sample, there is a tendency for spurious noise to appear in the tails of the estimates; if the estimates are smoothed sufficiently to deal with this, then essential detail in the main part of the distribution is masked.
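The trade-off can be seen on a small made-up sample with a tight main cluster and two isolated tail points. This is only a sketch: the sample, the two window widths and the evaluation points are assumptions chosen to make the effect visible.

```python
import math

def kernel_density(data, x, h):
    """Gaussian kernel estimate with a single fixed window width h."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return total / (n * h * math.sqrt(2 * math.pi))

# Made-up long-tailed sample: main cluster near 0, isolated points at 4 and 9.
sample = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 4.0, 9.0]

for h in (0.2, 2.0):
    values = [kernel_density(sample, x, h) for x in (0.25, 2.0, 4.0, 6.5, 9.0)]
    print("h =", h, [round(v, 4) for v in values])
```

With the small width the tail shows isolated bumps at 4 and 9 separated by near-zero gaps, while the large width smooths the tail but flattens the peak of the main cluster.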

The nearest neighbour method

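Intuition: suppose you have 100 data points. Instead of fixing the window width, fix a number of points k. At each location $t$, widen the window until it contains the k observations nearest to $t$, and let $d_k(t)$ be the distance from $t$ to the k-th nearest one. The estimate is $\hat{f}(t) = k / (2n\,d_k(t))$, so the window is narrow where the data are dense and wide where they are sparse.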

Principle: While the naive estimator is based on the number of observations falling in a box of fixed width centred at the point of interest, the nearest neighbour estimate is inversely proportional to the size of the box needed to contain a given number of observations. In the tails of the distribution, the distance $d_k(t)$ will be larger than in the main part of the distribution, and so the problem of undersmoothing in the tails should be reduced.
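A minimal sketch of the nearest neighbour estimate, assuming the form k / (2 * n * d_k(t)), where d_k(t) is the distance from t to its k-th nearest observation. The function name and sample are illustrative, and the sketch assumes d_k(t) > 0 (t does not coincide with k or more data points).

```python
def knn_density(data, t, k):
    """Nearest neighbour estimate at t: k / (2 * n * d_k(t))."""
    n = len(data)
    # d_k(t): distance from t to the k-th nearest observation.
    d_k = sorted(abs(t - xi) for xi in data)[k - 1]
    return k / (2 * n * d_k)

sample = [1.2, 1.9, 2.1, 2.4, 3.7]  # made-up data for illustration
for t in (1.0, 2.0, 3.0, 6.0):
    print(t, knn_density(sample, t, k=3))
```

Compared with the naive estimator, the roles are swapped: the count k is fixed and the box half-width d_k(t) varies with t.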

Drawback: In contrast to the kernel estimate, the nearest neighbour estimate will not itself be a probability density, since it will not integrate to unity. For $t$ less than the smallest data point, we will have $d_k(t) = X_{(k)} - t$, and for $t > X_{(n)}$ we will have $d_k(t) = t - X_{(n-k+1)}$. Substituting into (2.3), it follows that $\int_{-\infty}^{\infty} \hat{f}(t)\,dt$ is infinite and that the tails of $\hat{f}$ die away at rate $t^{-1}$, in other words extremely slowly. Thus the nearest neighbour estimate is unlikely to be appropriate if an estimate of the entire density is required. Figure 2.10 gives a nearest neighbour density estimate for the Old Faithful data. The heavy tails and the discontinuities in the derivative are clear.
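The heavy tails can be checked numerically. The sketch below assumes the estimate k / (2 * n * d_k(t)) and k >= 2; `right_tail_mass` is a hypothetical helper that evaluates the closed-form integral of the estimate from the largest observation up to a chosen upper limit, using $d_k(t) = t - X_{(n-k+1)}$ in the right tail.

```python
import math

def knn_density(data, t, k):
    """Nearest neighbour estimate at t: k / (2 * n * d_k(t))."""
    n = len(data)
    d_k = sorted(abs(t - xi) for xi in data)[k - 1]
    return k / (2 * n * d_k)

def right_tail_mass(data, k, upper):
    """Integral of the estimate from X_(n) to `upper` (assumes k >= 2)."""
    xs = sorted(data)
    n = len(xs)
    anchor = xs[n - k]  # X_(n-k+1) in 1-based order-statistic notation
    return k / (2 * n) * math.log((upper - anchor) / (xs[-1] - anchor))

sample = [1.2, 1.9, 2.1, 2.4, 3.7]  # made-up data for illustration
k, n = 2, len(sample)
for t in (1e1, 1e3, 1e6):
    # t * f(t) approaches k / (2n), so f(t) decays like 1/t.
    print(t, t * knn_density(sample, t, k), k / (2 * n), right_tail_mass(sample, k, t))
```

The tail mass grows like a logarithm of the upper limit, so it has no finite bound; for this sample the right tail alone already exceeds 1 well before the upper limit reaches 1000.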
