1. Conditional probability: $p(X=x \mid Y=y)$ is the probability that $X=x$ given that $Y=y$ is known.

$$
\begin{equation}
\begin{aligned}
p(x \mid y) &= \frac{p(x,y)}{p(y)} \\
p(x,y) &= p(x \mid y)\,p(y) = p(y \mid x)\,p(x)
\end{aligned}
\end{equation}
$$

If $x$ and $y$ are independent, then:

$$
\begin{equation}
p(x \mid y) = p(x)
\end{equation}
$$

With more than two variables:

$$
\begin{equation}
\begin{aligned}
P(x,y,z) &= P(z \mid y,x)\,P(y,x) = P(z \mid y,x)\,P(y \mid x)\,P(x) \\
P(y,z \mid x) &= \frac{P(x,y,z)}{P(x)} = P(y \mid x)\,P(z \mid x,y)
\end{aligned}
\end{equation}
$$
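The chain rule above can be checked numerically. The following is a minimal sketch, assuming a made-up joint distribution over three binary variables (the weights are arbitrary and purely illustrative); `marginal` and `chain_rule` are hypothetical helper names.

```python
from itertools import product

# Hypothetical joint P(x, y, z) over three binary variables:
# arbitrary positive weights 1..8, normalized to sum to 1.
states = list(product((0, 1), repeat=3))
weights = {s: i + 1 for i, s in enumerate(states)}
total = sum(weights.values())
joint = {s: w / total for s, w in weights.items()}


def marginal(fixed):
    """Sum the joint over all states that agree with `fixed` (index -> value)."""
    return sum(p for s, p in joint.items()
               if all(s[i] == v for i, v in fixed.items()))


def chain_rule(x, y, z):
    """Rebuild P(x, y, z) as P(z | y, x) * P(y | x) * P(x)."""
    p_x = marginal({0: x})
    p_xy = marginal({0: x, 1: y})
    p_y_given_x = p_xy / p_x
    p_z_given_xy = joint[(x, y, z)] / p_xy
    return p_z_given_xy * p_y_given_x * p_x


# Largest deviation between the factorized product and the joint table.
max_error = max(abs(chain_rule(*s) - joint[s]) for s in states)
print(max_error)
```

The printed deviation should be at floating-point noise level, since the factorization is an identity and holds for any joint distribution.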
2. Conditional probability under the Markov assumption:

If the variables form a Markov chain $x \to y \to z$, so that $z$ depends only on $y$ and $P(z \mid y,x) = P(z \mid y)$, then

$$
\begin{equation}
\begin{aligned}
P(x,y,z) &= P(z \mid y,x)\,P(y,x) = P(z \mid y)\,P(y \mid x)\,P(x) \\
P(y,z \mid x) &= P(y \mid x)\,P(z \mid y)
\end{aligned}
\end{equation}
$$
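As a sanity check on the Markov factorization, the sketch below builds a joint distribution from $P(x)$, $P(y \mid x)$ and $P(z \mid y)$ and confirms that conditioning on $x$ as well as $y$ does not change the distribution of $z$. All probability tables are invented for illustration.

```python
# Hypothetical tables for a binary Markov chain x -> y -> z.
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # indexed [x][y]
p_z_given_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}  # indexed [y][z]

# Joint built from the Markov factorization P(z | y) P(y | x) P(x).
joint = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x in (0, 1) for y in (0, 1) for z in (0, 1)
}


def p_z_given_xy(z, x, y):
    """P(z | x, y) computed from the joint table alone."""
    p_xy = sum(joint[(x, y, zz)] for zz in (0, 1))
    return joint[(x, y, z)] / p_xy


# Largest gap between P(z | x, y) and P(z | y) over all configurations.
markov_gap = max(
    abs(p_z_given_xy(z, x, y) - p_z_given_y[y][z])
    for x in (0, 1) for y in (0, 1) for z in (0, 1)
)
print(markov_gap)
```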
3. Law of total probability:

Discrete case:

$$
\begin{equation}
p(x) = \sum_y p(x,y) = \sum_y p(x \mid y)\,p(y)
\end{equation}
$$

Continuous case:

$$
\begin{equation}
p(x) = \int p(x,y)\,dy = \int p(x \mid y)\,p(y)\,dy
\end{equation}
$$
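The continuous form can be illustrated with a small numerical integration. This sketch assumes an example chosen here, not taken from the text: $y \sim \mathcal{N}(0, 1)$ and $x \mid y \sim \mathcal{N}(y, 1)$, for which the marginal is known to be $x \sim \mathcal{N}(0, 2)$. The integral is approximated with the trapezoidal rule on a truncated range.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and std sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def marginal_pdf(x, lo=-10.0, hi=10.0, steps=4000):
    """Approximate p(x) = integral of p(x | y) p(y) dy with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # p(x | y) = N(x; y, 1) and p(y) = N(y; 0, 1) are the assumed example.
        total += weight * gaussian_pdf(x, y, 1.0) * gaussian_pdf(y, 0.0, 1.0)
    return total * h


x = 0.5
numeric = marginal_pdf(x)
exact = gaussian_pdf(x, 0.0, math.sqrt(2.0))  # variance 1 + 1 = 2
print(numeric, exact)
```

The two printed values should agree to many decimal places; the integration range and step count are arbitrary choices that are more than sufficient for this smooth integrand.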
4. Bayes' rule:

Bayes' rule follows from the definition of conditional probability together with the law of total probability:

$$
\begin{aligned}
P(x,y) &= P(x \mid y)\,P(y) = P(y \mid x)\,P(x) \\
P(x \mid y) &= \frac{P(y \mid x)\,P(x)}{P(y)} = \frac{\text{causal knowledge} \cdot \text{prior knowledge}}{\text{prior knowledge}}
\end{aligned}
$$

When several pieces of information are used to estimate and reason about a state:

$$
\begin{equation}
\begin{aligned}
P(x \mid y,z) &= \frac{P(x,y,z)}{P(y,z)} \\
&= \frac{P(y \mid x,z)\,P(x,z)}{P(y \mid z)\,P(z)} \\
&= \frac{P(y \mid x,z)\,P(x \mid z)\,P(z)}{P(y \mid z)\,P(z)} \\
&= \frac{P(y \mid x,z)\,P(x \mid z)}{P(y \mid z)}
\end{aligned}
\end{equation}
$$
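A small worked example may make the two forms of Bayes' rule concrete. The sketch below uses invented numbers for a two-state variable $x$ and applies the update twice, first with $z$ and then with $y$. It additionally assumes that $y$ is conditionally independent of $z$ given $x$, so that $P(y \mid x, z) = P(y \mid x)$; this assumption is not part of the general formula. `bayes_update` is a hypothetical helper name.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over x from a prior P(x) and a likelihood P(obs | x)."""
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    evidence = sum(unnormalized.values())  # P(obs) via the law of total probability
    return {x: u / evidence for x, u in unnormalized.items()}


# Hypothetical numbers, chosen only for illustration.
prior = {"open": 0.5, "closed": 0.5}          # P(x)
likelihood_z = {"open": 0.6, "closed": 0.3}   # P(z | x)
likelihood_y = {"open": 0.5, "closed": 0.6}   # P(y | x, z), assumed equal to P(y | x)

posterior_z = bayes_update(prior, likelihood_z)          # P(x | z)
posterior_yz = bayes_update(posterior_z, likelihood_y)   # P(x | y, z)

print(posterior_z["open"])   # 0.3 / 0.45 = 2/3
print(posterior_yz["open"])  # (1/3) / (1/3 + 1/5) = 0.625
```

In the second call, $P(x \mid z)$ plays the role of the prior and the normalizer is $P(y \mid z)$, which matches the last line of the derivation.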
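To sample from a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, one can first draw $z$ from the standard normal distribution $\mathcal{N}(0, \mathbf{I})$ and then compute $\sigma \cdot z + \mu$. The benefit is that all the randomness is moved into $z$, which does not depend on any parameters, while $\sigma$ and $\mu$ become part of a deterministic affine transformation inside the network.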
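The sampling procedure can be sketched in a few lines of plain Python. This is a scalar illustration only, with arbitrary values for `mu` and `sigma`; `sample_reparameterized` is a hypothetical helper name.

```python
import random


def sample_reparameterized(mu, sigma, rng):
    """Draw one sample from N(mu, sigma^2) as sigma * z + mu with z ~ N(0, 1)."""
    z = rng.gauss(0.0, 1.0)  # all randomness lives in z
    return sigma * z + mu    # deterministic affine transform of z


rng = random.Random(0)  # fixed seed so the run is repeatable
mu, sigma = 2.0, 0.5    # illustrative values
samples = [sample_reparameterized(mu, sigma, rng) for _ in range(100_000)]

# Empirical mean and standard deviation should be close to mu and sigma.
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
sample_std = sample_var ** 0.5
print(sample_mean, sample_std)
```

In an automatic-differentiation framework the same structure applies: `z` is sampled without reference to the parameters, and `mu` and `sigma` enter only through the affine map.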
1. Probability density of the univariate Gaussian distribution:

$$
\begin{equation}
\mathcal{N}\left(x ; \mu, \sigma^{2}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
\end{equation}
$$
2. KL divergence:

$$
\begin{equation}
D_{\text{KL}}(q(x) \,\|\, p(x)) = \mathbb{E}_{q(x)} \log \left[ q(x) / p(x) \right]
\end{equation}
$$

When $q(x)$ and $p(x)$ are both Gaussian, the KL divergence has a closed form:

$$
\begin{equation}
D_{\text{KL}}\left(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)\right) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}
\end{equation}
$$
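The closed form can be cross-checked against the definition $\mathbb{E}_{q}[\log q - \log p]$ by numerical integration. The sketch below uses arbitrary illustrative parameters and a simple Riemann sum on a truncated range; the function names are hypothetical.

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Log-density of a univariate Gaussian with mean mu and std sigma."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)


def kl_closed_form(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) from the closed-form expression."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)


def kl_numeric(mu1, sigma1, mu2, sigma2, lo=-12.0, hi=12.0, steps=24000):
    """Approximate E_q[log q - log p] with a Riemann sum over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        log_q = gaussian_log_pdf(x, mu1, sigma1)
        log_p = gaussian_log_pdf(x, mu2, sigma2)
        total += math.exp(log_q) * (log_q - log_p) * h
    return total


# Illustrative parameters: q = N(0, 1), p = N(1, 4).
closed = kl_closed_form(0.0, 1.0, 1.0, 2.0)
numeric = kl_numeric(0.0, 1.0, 1.0, 2.0)
print(closed, numeric)
```

Working with log-densities avoids dividing by a density that has underflowed to zero in the tails.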
These are the formulas used in the derivation of the Diffusion Model (DDPM).
References:
deep_thoughts bilibili