When Understanding [1] derives the evidence lower bound (ELBO) for variational diffusion models (VDM), Eq. (53) uses a notation that is easy to misread:

$$
\begin{aligned}
\dots &= \mathbb{E}_{q\left(\boldsymbol{x}_{1: T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)}{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}+\log \prod_{t=2}^T \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{\frac{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right) \cancel{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}}{\cancel{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_0\right)}}}\right] & (53) \\
&= \mathbb{E}_{q\left(\boldsymbol{x}_{1: T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)}{\cancel{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}}+\log \frac{\cancel{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}}{q\left(\boldsymbol{x}_T \mid \boldsymbol{x}_0\right)}+\log \prod_{t=2}^T \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right)}\right] & (54)
\end{aligned}
$$
The strike-throughs in the second term of (53) look like a cancellation, but the two struck-out factors $q(x_t \mid x_0)$ and $q(x_{t-1} \mid x_0)$ are clearly not equal. Meanwhile, the second term of (54) is clearly nonzero, yet it is not obvious where it comes from. Only after hearing [2] mention "splitting apart" and reading the Q&A in the comments of [3] did it become clear: the derivation pulls the "struck-out" factors of (53) out into a separate $\log$ term. Inside that term the product telescopes, and what survives is exactly the second term of (54). That is:

$$
\begin{aligned}
\text{(53), second term} &= \log \prod_{t=2}^T \frac{p_{\theta}\left(x_{t-1} \mid x_t\right)}{\frac{q\left(x_{t-1} \mid x_t, x_0\right) q\left(x_t \mid x_0\right)}{q\left(x_{t-1} \mid x_0\right)}} \\
&= \log \prod_{t=2}^T \left[\frac{p_{\theta}\left(x_{t-1} \mid x_t\right)}{q\left(x_{t-1} \mid x_t, x_0\right)} \cdot \frac{q\left(x_{t-1} \mid x_0\right)}{q\left(x_t \mid x_0\right)}\right] \\
&= \underbrace{\log \prod_{t=2}^T \frac{p_{\theta}\left(x_{t-1} \mid x_t\right)}{q\left(x_{t-1} \mid x_t, x_0\right)}}_{\text{(54), third term}} + \log \prod_{t=2}^T \frac{q\left(x_{t-1} \mid x_0\right)}{q\left(x_t \mid x_0\right)} \\
&= \log \left[\frac{q\left(x_1 \mid x_0\right)}{\cancel{q\left(x_2 \mid x_0\right)}} \times \frac{\cancel{q\left(x_2 \mid x_0\right)}}{\cancel{q\left(x_3 \mid x_0\right)}} \times \cdots \times \frac{\cancel{q\left(x_{T-1} \mid x_0\right)}}{q\left(x_T \mid x_0\right)}\right] + \text{(54), third term} \\
&= \underbrace{\log \frac{q\left(x_1 \mid x_0\right)}{q\left(x_T \mid x_0\right)}}_{\text{(54), second term}} + \text{(54), third term}
\end{aligned}
$$
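The telescoping step can also be checked numerically. The sketch below is only an illustration under stated assumptions: scalar data, a hypothetical linear $\beta$ schedule, and the Gaussian marginal $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t}\, x_0, 1 - \bar{\alpha}_t)$ in Understanding / DDPM notation. The names `log_normal_pdf` and `telescoping_check` are made up for this example. It compares $\sum_{t=2}^{T} \left[\log q(x_{t-1} \mid x_0) - \log q(x_t \mid x_0)\right]$ with $\log q(x_1 \mid x_0) - \log q(x_T \mid x_0)$ for an arbitrary trajectory.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def telescoping_check(T=10, x0=0.7, seed=0):
    """Return (lhs, rhs) of the telescoping identity for one random trajectory."""
    rng = random.Random(seed)

    # Hypothetical linear beta schedule (an assumption, not taken from any paper).
    betas = [0.05 + (0.5 - 0.05) * i / (T - 1) for i in range(T)]

    # alpha_bar[t] = prod_{s<=t} (1 - beta_s); index 0 is a placeholder
    # so that list indices line up with timesteps 1..T.
    alpha_bar = [None]
    prod = 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bar.append(prod)

    # Arbitrary values for x_1..x_T; the identity holds for any trajectory.
    xs = [None] + [rng.gauss(0.0, 1.0) for _ in range(T)]

    def log_q(t):
        # log q(x_t | x_0) with the Gaussian marginal assumed above.
        mean = math.sqrt(alpha_bar[t]) * x0
        return log_normal_pdf(xs[t], mean, 1.0 - alpha_bar[t])

    # Left side: the separate log-product pulled out of (53).
    lhs = sum(log_q(t - 1) - log_q(t) for t in range(2, T + 1))
    # Right side: the second term of (54).
    rhs = log_q(1) - log_q(T)
    return lhs, rhs
```

Calling `telescoping_check()` returns two numbers that should agree up to floating-point rounding. The agreement does not depend on the Gaussian form at all: any positive densities would telescope the same way, which is why the step is purely algebraic.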
When DDIM [7] reviews DDPM [5] in its Section 2 (Background), the form of $q(x_t|x_{t-1})$ in its Eq. (3) differs from both Eq. (31) of Understanding [1] and Eq. (2) of DDPM. However, comparing the unnumbered formula directly below its Eq. (3) with Eq. (70) of Understanding and Eq. (4) of DDPM shows that $\alpha_t$ in DDIM actually corresponds to $\bar{\alpha}_t$ in Understanding / DDPM. Appendix C.2 of DDIM also states this explicitly.
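A small numeric sketch of this correspondence follows. It assumes the commonly quoted forms of the transitions: DDPM's $q(x_t|x_{t-1}) = \mathcal{N}(\sqrt{1-\beta_t}\, x_{t-1}, \beta_t I)$ with $\alpha_t = 1 - \beta_t$, and DDIM's $q(x_t|x_{t-1}) = \mathcal{N}\left(\sqrt{\alpha_t / \alpha_{t-1}}\, x_{t-1}, (1 - \alpha_t / \alpha_{t-1}) I\right)$ with the convention $\alpha_0 := 1$. The function names and the example $\beta$ values are hypothetical. If DDIM's $\alpha_t$ is set to DDPM's $\bar{\alpha}_t$, the two transitions should coincide.

```python
import math


def ddpm_alphas(betas):
    """Return (alphas, alpha_bars) in DDPM / Understanding notation."""
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    prod = 1.0
    for a in alphas:
        prod *= a  # cumulative product: alpha_bar_t = prod_{s<=t} alpha_s
        alpha_bars.append(prod)
    return alphas, alpha_bars


def ddim_step_params(ddim_alphas, t):
    """Mean coefficient and variance of DDIM's q(x_t | x_{t-1}).

    `ddim_alphas[t - 1]` holds DDIM's alpha_t for t = 1..T;
    alpha_0 is taken to be 1 (assumed convention).
    """
    prev = 1.0 if t == 1 else ddim_alphas[t - 2]
    ratio = ddim_alphas[t - 1] / prev
    return math.sqrt(ratio), 1.0 - ratio


# Example beta values chosen only for illustration.
betas = [0.1, 0.2, 0.3, 0.4]
alphas, alpha_bars = ddpm_alphas(betas)

for t in range(1, len(betas) + 1):
    # Plug DDPM's alpha_bar in where DDIM writes alpha.
    coef, var = ddim_step_params(alpha_bars, t)
    # Compare against DDPM's sqrt(alpha_t) and beta_t.
    print(t, coef, math.sqrt(alphas[t - 1]), var, betas[t - 1])
```

In each printed row the second and third columns should match, as should the fourth and fifth, up to rounding. That is the sense in which DDIM's $\alpha_t$ is a relabeling of $\bar{\alpha}_t$ and not a different forward process.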