Pinsker's Inequality and the Kullback-Leibler (KL) Divergence

Table of Contents

  • Pinsker's inequality
  • Kullback-Leibler (KL) divergence
    • Computing the KL divergence in MATLAB
  • Application of the KL divergence in covert-communication probability derivations

Pinsker’s inequality

Pinsker's inequality is an inequality in information theory that is commonly used to quantify the difference between two probability distributions. It was introduced by the Soviet mathematician Mark Pinsker in 1964.

Consider two probability distributions $P$ and $Q$ with probability density functions $p(x)$ and $q(x)$ on the same sample space. Pinsker's inequality can be written as:

$$D_{\text{KL}}(P \parallel Q) \geq \frac{1}{2} \left(\int \left|p(x) - q(x)\right| \, dx\right)^2$$

where:

  • $D_{\text{KL}}(P \parallel Q)$ is the Kullback-Leibler divergence between $P$ and $Q$, which quantifies how much the two distributions differ.
  • $p(x)$ and $q(x)$ are the probability density functions of $P$ and $Q$ at the sample point $x$, respectively.

Pinsker's inequality states that the total variation (L1) distance between two probability distributions is bounded above by the square root of half their KL divergence: if the KL divergence is small, the two distributions must also be close in total variation. The inequality is widely used in information theory and statistics to bound the distance between probability distributions.
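
As a quick numerical sanity check, the following MATLAB sketch (with arbitrarily chosen distributions, purely for illustration) evaluates both sides of the inequality for two small discrete distributions:

% Two arbitrary discrete distributions on three outcomes (illustrative values)
P = [0.3, 0.4, 0.3];
Q = [0.5, 0.2, 0.3];

% KL divergence D(P||Q) in nats (natural logarithm)
kl = sum(P .* log(P ./ Q));

% Right-hand side of Pinsker's inequality: (1/2) * (L1 distance)^2
l1 = sum(abs(P - Q));
rhs = 0.5 * l1^2;

% Pinsker's inequality guarantees kl >= rhs
fprintf('D(P||Q) = %.4f, 0.5*||P-Q||_1^2 = %.4f\n', kl, rhs);

Here $D_{\text{KL}}(P \parallel Q) \approx 0.124$ and $\frac{1}{2}\|P-Q\|_1^2 = 0.08$, so the bound holds.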

Kullback-Leibler (KL) divergence

The KL divergence (Kullback-Leibler divergence), also known as relative entropy, is a measure of the difference between two probability distributions. Given two probability distributions $P$ and $Q$, the KL divergence is defined as:

$$D_{\text{KL}}(P \parallel Q) = \int P(x) \log\left(\frac{P(x)}{Q(x)}\right) \, dx$$

The integral weights each point of the sample space by its probability (density) under $P$ and multiplies this weight by the natural logarithm of the ratio of $P$ to $Q$ at that point.

The KL divergence has several important properties:

  1. Non-negativity: $D_{\text{KL}}(P \parallel Q) \geq 0$, with equality if and only if $P$ and $Q$ agree at every point.
  2. Asymmetry: in general, $D_{\text{KL}}(P \parallel Q) \neq D_{\text{KL}}(Q \parallel P)$. The information loss incurred when approximating $P$ by $Q$ differs from the loss when approximating $Q$ by $P$ (see the numerical example after this list).
  3. No triangle inequality: in general, $D_{\text{KL}}(P \parallel R) \nleq D_{\text{KL}}(P \parallel Q) + D_{\text{KL}}(Q \parallel R)$, so the KL divergence cannot be interpreted as a standard distance metric.
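
For instance, take the two-point distributions $P = (0.5, 0.5)$ and $Q = (0.9, 0.1)$ (values chosen purely for illustration). Then $D_{\text{KL}}(P \parallel Q) = 0.5\ln\frac{0.5}{0.9} + 0.5\ln\frac{0.5}{0.1} \approx 0.511$ nats, whereas $D_{\text{KL}}(Q \parallel P) = 0.9\ln\frac{0.9}{0.5} + 0.1\ln\frac{0.1}{0.5} \approx 0.368$ nats, so the two directions clearly differ.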

The KL divergence is widely used in information theory, statistics, and machine learning, for example in variational inference, maximum likelihood estimation, and generative models.

Computing the KL divergence in MATLAB

The KL (Kullback-Leibler) divergence measures the difference between two probability distributions. In MATLAB you can compute it with a kldiv helper function, commonly obtained from the MATLAB Central File Exchange (it is not a built-in function of base MATLAB), or by implementing the definition directly.

The following simple example shows how such a kldiv function can be used to compute the KL divergence between two discrete probability distributions:

% Define two discrete probability distributions
P = [0.3, 0.4, 0.3]; % first distribution
Q = [0.5, 0.2, 0.3]; % second distribution

% Compute the KL divergence
kl_divergence = kldiv(P, Q);

% Display the result
disp(['KL divergence: ', num2str(kl_divergence)]);

Make sure a kldiv implementation is on your MATLAB path before running this example. If it is not, you can download one from the MathWorks File Exchange, or simply implement the KL divergence formula yourself, as sketched below.
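
A minimal sketch of such a manual computation for discrete distributions (assuming both vectors are valid probability distributions and Q > 0 wherever P > 0):

% Manual KL divergence for discrete distributions (result in nats)
P = [0.3, 0.4, 0.3];
Q = [0.5, 0.2, 0.3];

% Keep only entries where P > 0, since 0*log(0/q) is taken to be 0
idx = P > 0;
kl_manual = sum(P(idx) .* log(P(idx) ./ Q(idx)));

disp(['KL divergence (manual): ', num2str(kl_manual)]);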

Application of the KL divergence in covert-communication probability derivations

Source: Robust Beamfocusing for FDA-Aided Near-Field Covert Communications With Uncertain Location, 2023 IEEE ICC.

Let $(D_{\mathrm{w}}, \theta_{\mathrm{w}})$ denote the location of Willie. We assume Willie is synchronized with Alice and has full knowledge of the carrier frequencies and of the channel vector $\mathbf{h}^{H}(D_{\mathrm{w}}, \theta_{\mathrm{w}})$. This is the worst case for the legitimate nodes and is used to analyze the lower bound of the covert communication performance. The hypothesis test at Willie is given by

$$\left\{\begin{array}{l} \mathcal{H}_{0}: y_{\mathrm{w}}^{(n)}=z_{\mathrm{w}}^{(n)}, \\ \mathcal{H}_{1}: y_{\mathrm{w}}^{(n)}=\mathbf{h}_{\mathrm{w}}^{H} \mathbf{w}\, s^{(n)}+z_{\mathrm{w}}^{(n)}, \end{array}\right.$$

where $\mathbf{h}_{\mathrm{w}}^{H}$ is short for $\mathbf{h}^{H}(D_{\mathrm{w}}, \theta_{\mathrm{w}})$, and $z_{\mathrm{w}}^{(n)} \sim \mathcal{CN}(0, \sigma_{\mathrm{w}}^{2})$ is the AWGN at Willie with noise power $\sigma_{\mathrm{w}}^{2}$. From (5), the probability density functions (PDFs) of $\mathbf{y}_{\mathrm{w}} = \left[y_{\mathrm{w}}^{(1)}, y_{\mathrm{w}}^{(2)}, \ldots, y_{\mathrm{w}}^{(N)}\right]^{T}$ under $\mathcal{H}_{0}$ and $\mathcal{H}_{1}$ can be derived as

$$\mathbb{P}_{0} \triangleq \mathbb{P}\left(\mathbf{y}_{\mathrm{w}} \mid \mathcal{H}_{0}\right)=\frac{1}{\pi^{N} \sigma_{\mathrm{w}}^{2N}}\, e^{-\frac{\mathbf{y}_{\mathrm{w}}^{H} \mathbf{y}_{\mathrm{w}}}{\sigma_{\mathrm{w}}^{2}}} \tag{6}$$

and:

$$\mathbb{P}_{1} \triangleq \mathbb{P}\left(\mathbf{y}_{\mathrm{w}} \mid \mathcal{H}_{1}\right)=\frac{1}{\pi^{N}\left(\left|\mathbf{h}_{\mathrm{w}}^{H} \mathbf{w}\right|^{2}+\sigma_{\mathrm{w}}^{2}\right)^{N}}\, e^{-\frac{\mathbf{y}_{\mathrm{w}}^{H} \mathbf{y}_{\mathrm{w}}}{\left|\mathbf{h}_{\mathrm{w}}^{H} \mathbf{w}\right|^{2}+\sigma_{\mathrm{w}}^{2}}} \tag{7}$$

respectively. Let $\mathcal{D}_{0}$ and $\mathcal{D}_{1}$ denote the decisions in favor of $\mathcal{H}_{0}$ and $\mathcal{H}_{1}$, respectively. The false alarm and missed detection probabilities are defined as $\mathbb{P}_{FA} \triangleq \mathbb{P}(\mathcal{D}_{1} \mid \mathcal{H}_{0})$ and $\mathbb{P}_{MD} \triangleq \mathbb{P}(\mathcal{D}_{0} \mid \mathcal{H}_{1})$, respectively. The detection performance of Willie is characterized by the sum of the detection error probabilities $\xi = \mathbb{P}_{FA} + \mathbb{P}_{MD}$. Under the optimal detector, $\xi$ is minimized, and the minimum is denoted by $\xi^{*}$. The covertness constraint of the system is then expressed as $\xi^{*} \geq 1-\epsilon$, where $\epsilon \in [0,1]$ is an arbitrarily small positive constant indicating the level of covertness; a smaller $\epsilon$ corresponds to a stricter covertness requirement. In particular, when $\epsilon = 0$, we have $\xi^{*} = 1$, which reduces Willie's detection to a blind guess. Moreover, according to Pinsker's inequality [14], [15], we have $\xi^{*} \geq 1-\sqrt{\frac{\mathcal{D}\left(\mathbb{P}_{1} \| \mathbb{P}_{0}\right)}{2}}$, where $\mathcal{D}\left(\mathbb{P}_{1} \| \mathbb{P}_{0}\right)=\int_{\mathbf{y}} \mathbb{P}_{1} \log \frac{\mathbb{P}_{1}}{\mathbb{P}_{0}} \,\mathrm{d}\mathbf{y}$ is the Kullback-Leibler (KL) divergence between $\mathbb{P}_{1}$ and $\mathbb{P}_{0}$. It can easily be verified that the original covertness constraint is satisfied as long as $\mathcal{D}\left(\mathbb{P}_{1} \| \mathbb{P}_{0}\right) \leq 2\epsilon^{2}$. Furthermore, by substituting (6) and (7) into the expression of $\mathcal{D}\left(\mathbb{P}_{1} \| \mathbb{P}_{0}\right)$ and using $\mathbb{E}_{\mathbb{P}_{1}}\left[\mathbf{y}_{\mathrm{w}}^{H}\mathbf{y}_{\mathrm{w}}\right] = N\left(\left|\mathbf{h}_{\mathrm{w}}^{H}\mathbf{w}\right|^{2}+\sigma_{\mathrm{w}}^{2}\right)$, we obtain $\mathcal{D}\left(\mathbb{P}_{1} \| \mathbb{P}_{0}\right)=N \zeta\left(\frac{\left|\mathbf{h}_{\mathrm{w}}^{H} \mathbf{w}\right|^{2}}{\sigma_{\mathrm{w}}^{2}}\right)$, where $\zeta(x)=x-\log(1+x)$ for $x \geq 0$ is a monotonically increasing function of $x$. The original covertness constraint can then be simplified to

$$\frac{\left|\mathbf{h}_{\mathrm{w}}^{H} \mathbf{w}\right|^{2}}{\sigma_{\mathrm{w}}^{2}} \leq \zeta^{-1}\left(\frac{2 \epsilon^{2}}{N}\right) \tag{8}$$

where $\zeta^{-1}(x)$ is the inverse function of $\zeta(x)$.
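
As a rough illustration of how constraint (8) can be evaluated numerically, the MATLAB sketch below inverts $\zeta(x) = x - \log(1+x)$ with fzero; the values of N and epsilon are arbitrary example choices, not taken from the paper.

% Covertness parameters (illustrative values, not from the paper)
N = 100;          % number of channel uses
epsilon = 0.1;    % covertness level

% zeta(x) = x - log(1+x), monotonically increasing for x >= 0
zeta = @(x) x - log(1 + x);

% Invert zeta numerically: find x >= 0 with zeta(x) = 2*epsilon^2/N
target = 2 * epsilon^2 / N;
snr_max = fzero(@(x) zeta(x) - target, [0, 10]);  % bracket assumed to contain the root

% Constraint (8): |h_w^H w|^2 / sigma_w^2 must not exceed snr_max
fprintf('Maximum allowed |h_w^H w|^2 / sigma_w^2: %.6f\n', snr_max);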
