A few examples of computing the KL divergence

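Principle

Draw samples from two distributions, estimate their combined density with ksdensity and with a histogram, then compute the KL divergence of each estimate from the true distribution.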
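
For reference, the quantity being estimated can be written as follows. This is the standard definition of the KL divergence; the symbols are notation assumed here rather than taken from the code, with $p$ the true density, $q$ an estimate of it, and $\Delta x$ the spacing of an equally spaced evaluation grid $x_i$.

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx
\;\approx\; \sum_{i} p(x_i)\,\log\frac{p(x_i)}{q(x_i)}\,\Delta x
```

The divergence is not symmetric in $p$ and $q$, so calling it a distance is informal. The sum is only an approximation of the integral, and it ignores any probability that lies outside the grid.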


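The true distribution is computed with normpdf, using the sample mean and standard deviation of each class.

[Figure 1: true distribution]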
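
Written out, the reference density used in the code is an equal-weight mixture of two normal densities. The symbols below are notation chosen here to mirror the variables mu_a, sig_a, mu_b and sig_b:

```latex
p_{\mathrm{mix}}(x) = \tfrac{1}{2}\,\mathcal{N}\!\left(x;\,\hat\mu_a,\hat\sigma_a^{2}\right)
                    + \tfrac{1}{2}\,\mathcal{N}\!\left(x;\,\hat\mu_b,\hat\sigma_b^{2}\right)
```

Because the parameters are estimated from the 30 samples in each class, this curve approximates the generating distribution (means 0 and 5, unit standard deviation) rather than matching it exactly.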


Kernel density, default bandwidth

[Figure 2: kernel density estimate, default bandwidth]

Kernel density, half the default bandwidth

[Figure 3: kernel density estimate, half the default bandwidth]
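
As a reminder of what the bandwidth controls, a kernel density estimate with a normal kernel (the ksdensity default) can be sketched as below. The notation is assumed here: $x_i$ are the $n$ samples and $h$ is the bandwidth.

```latex
\hat{p}_h(x) = \frac{1}{n\,h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}
```

Halving $h$ narrows each kernel, so the estimate follows the individual samples more closely and becomes less smooth.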

Histogram

[Figure 4: histogram of the samples, 20 bins]
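
A histogram gives bin counts rather than density values, so a conversion is needed before it can be compared with a density. As a sketch in notation assumed here, with $c_k$ the count in bin $k$, $N$ the number of samples and $h$ the bin width:

```latex
P_k = \frac{c_k}{N},
\qquad
\hat{p}(x) = \frac{c_k}{N\,h} \quad \text{for } x \text{ in bin } k
```

Either compare the bin probabilities $P_k$ with the true probability of each bin (roughly $h\,p(x_k)$ at the bin center $x_k$), or compare densities with densities. Mixing the two shifts every log ratio by $\log h$.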

Code

clearvars
%generate random data
class_a = randn(30,1);
class_b = 5+randn(30,1);
x = [class_a; class_b];

%calculate the params for normpdf
mu_a = mean(class_a);
mu_b = mean(class_b);
sig_a = std(class_a);
sig_b = std(class_b);

testpoints = linspace(min(x), max(x));
%generate mix gaussians
p_mix = normpdf(testpoints,mu_a,sig_a)/2 + normpdf(testpoints,mu_b,sig_b)/2;

%calculate two kernel density estimates: default bandwidth and half of it
[p_ks_default,~,width_default] = ksdensity(x,testpoints);
p_ks_half_default = ksdensity(x,testpoints,'bandwidth',width_default/2);

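%calculate histogram bin probabilities
[c_hist,centers_hist] = hist(x,20);
p_hist = c_hist/numel(x);%fraction of the samples that falls in each bin
p_hist = p_hist + 0.00001;%avoid zero bins, which would make the log infinite

%true distribution over the same 20 bins: density at each bin center times the
%bin width, so that it is a bin probability like p_hist and not a density
bin_width = centers_hist(2) - centers_hist(1);
p_mix_20 = bin_width * (normpdf(centers_hist,mu_a,sig_a)/2 + normpdf(centers_hist,mu_b,sig_b)/2);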


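%normalize each vector to sum to 1 first, so that every sum below is a discrete
%KL divergence and the three values are comparable across grids
normalize_p = @(p) p / sum(p);
kl_div = @(p,q) sum(normalize_p(p) .* log(normalize_p(p) ./ normalize_p(q)));

kld_ks_default = kl_div(p_mix, p_ks_default);
kld_ks_half_default = kl_div(p_mix, p_ks_half_default);
kld_histo = kl_div(p_mix_20, p_hist);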

figure
plot(testpoints,p_mix);
title('True distribution');

figure
plot(testpoints,p_ks_default);
title(['Kernel density (default width), KLD = ' num2str(kld_ks_default)]);
figure
plot(testpoints,p_ks_half_default);
title(['Kernel density (half default width), KLD = ' num2str(kld_ks_half_default)]);

figure
hold on
hist(x,20);
%plot(centers_hist,p_mix_20);
%plot(centers_hist,p_hist);
title(['Histogram (20 bins), KLD = ' num2str(kld_histo)]);

