Links: https://theaisummer.com/diffusion-models/
rvs:
( p ( x 0 , x 1 , ⋯ , x T ) , q ( x 1 , ⋯ , x T ∣ x 0 ) ) (p(x_0,x_1,\cdots,x_T),q(x_1,\cdots,x_{T}|x_0)) (p(x0,x1,⋯,xT),q(x1,⋯,xT∣x0))
where x 1 , ⋯ , x T x_1,\cdots,x_T x1,⋯,xT is unobservable, and
E L B O : = ∫ q ( x T ∣ x 0 ) log p ( x T ) q ( x T ∣ x 0 ) d x T + ∑ t = 2 T ∫ q ( x t − 1 , x t ∣ x 0 ) log p ( x t − 1 ∣ x t ) q ( x t − 1 ∣ x t , x 0 ) d x t − 1 x t + ∫ q ( x 1 ∣ x 0 ) log p ( x 1 ∣ x 0 ) d x 1 ELBO:=\int q(x_{T}|x_{0}) \log \frac{p(x_{T})}{q(x_{T}|x_{0})}\mathrm{d}x_{T}\\ +\sum_{t=2}^T \int q(x_{t-1},x_{t}|x_{0})\log \frac{p(x_{t-1}|x_{t})}{q(x_{t-1}|x_{t}, x_{0})}\mathrm{d}x_{t-1}x_{t}\\+\int q(x_{1}|x_{0})\log p(x_{1}|x_{0})\mathrm{d}x_{1} ELBO:=∫q(xT∣x0)logq(xT∣x0)p(xT)dxT+t=2∑T∫q(xt−1,xt∣x0)logq(xt−1∣xt,x0)p(xt−1∣xt)dxt−1xt+∫q(x1∣x0)logp(x1∣x0)dx1
L o s s : = − E L B O = D K L ( q ( x T ∣ x 0 ) ∥ p ( x T ) ) + ∑ t = 2 T ∫ q ( x t ∣ x 0 ) d x t D K L ( q ( x t − 1 ∣ x t , x 0 ) ∥ p ( x t − 1 ∣ x t ) ) − ∫ q ( x 1 ∣ x 0 ) log p ( x 1 ∣ x 0 ) d x 1 Loss:=-ELBO= D_{KL} (q(x_{T}|x_{0})\| p(x_{T}))\\ +\sum_{t=2}^T \int q(x_{t}|x_{0})\mathrm{d}x_{t}D_{KL}(q(x_{t-1}|x_{t}, x_{0})\|p(x_{t-1}|x_{t}))\\-\int q(x_{1}|x_{0})\log p(x_{1}|x_{0})\mathrm{d}x_{1} Loss:=−ELBO=DKL(q(xT∣x0)∥p(xT))+t=2∑T∫q(xt∣x0)dxtDKL(q(xt−1∣xt,x0)∥p(xt−1∣xt))−∫q(x1∣x0)logp(x1∣x0)dx1
basic assumption
Definition(Diffusion Model)
Parameters:
Fact.
Design I: p ( x t − 1 ∣ x t ) ∼ N ( μ ( t ) , Σ ( t ) ) p(x_{t-1}|x_{t})\sim N(\mu(t),\Sigma(t)) p(xt−1∣xt)∼N(μ(t),Σ(t)):
μ ( t ) = α t ( 1 − α ˉ t − 1 ) x t − β t α ˉ t − 1 x ^ ( x t , t ) 1 − α ˉ t Σ ( t ) = σ 2 ( t ) \mu(t)=\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t-\beta_{t}\sqrt{\bar\alpha_{t-1}}\hat{x}(x_t,t)}{1-\bar\alpha_t}\\ \Sigma(t)=\sigma^2(t) μ(t)=1−αˉtαt(1−αˉt−1)xt−βtαˉt−1x^(xt,t)Σ(t)=σ2(t)
Design II: p ( x t − 1 ∣ x t ) ∼ N ( μ ( t ) , Σ ( t ) ) p(x_{t-1}|x_{t})\sim N(\mu(t),\Sigma(t)) p(xt−1∣xt)∼N(μ(t),Σ(t)):
μ ( t ) = 1 α t x t − β t 1 − α ˉ t α t ϵ ^ ( x t , t ) Σ ( t ) = σ 2 ( t ) \mu(t)=\frac{1}{\sqrt{\alpha_t}}x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}\sqrt{\alpha_t}}\hat{\epsilon}(x_t,t)\\ \Sigma(t)=\sigma^2(t) μ(t)=αt1xt−1−αˉtαtβtϵ^(xt,t)Σ(t)=σ2(t)
Fact.
Under the design I:
D K L ( q ( x t − 1 ∣ x t , x 0 ) ∥ p θ ( x t − 1 ∣ x t ) ) = 1 2 σ t 2 ( 1 − α ˉ t − 1 ) β t 2 ( 1 − α ˉ t ) 2 ∥ x ^ ( x t , t ) − x 0 ∥ 2 = 1 2 ( 1 1 − α ˉ t − 1 − 1 1 − α ˉ t ) ∥ x ^ ( x t , t ) − x 0 ∥ 2 D_{KL} (q(x_{t−1}|x_t , x_0) \| p_θ (x_{t−1} |x_t))=\frac{1}{2\sigma_t^2}\frac{(1-\bar{\alpha}_{t-1})\beta_t^2}{(1-\bar{\alpha}_{t})^2}\|\hat{x}(x_t,t)-x_0\|^2\\ =\frac{1}{2}(\frac{1}{1-\bar{\alpha}_{t-1}}-\frac{1}{1-\bar{\alpha}_{t}})\|\hat{x}(x_t,t)-x_0\|^2 DKL(q(xt−1∣xt,x0)∥pθ(xt−1∣xt))=2σt21(1−αˉt)2(1−αˉt−1)βt2∥x^(xt,t)−x0∥2=21(1−αˉt−11−1−αˉt1)∥x^(xt,t)−x0∥2
Under the design II:
D K L ( q ( x t − 1 ∣ x t , x 0 ) ∥ p θ ( x t − 1 ∣ x t ) ) = 1 2 σ t 2 β t 2 ( 1 − α ˉ t ) α t 2 ∥ ϵ ^ ( x t , t ) − ϵ 0 ∥ 2 D_{KL} (q(x_{t−1}|x_t , x_0) \| p_θ (x_{t−1} |x_t))=\frac{1}{2\sigma_t^2}\frac{\beta_t^2}{(1-\bar{\alpha}_{t})\alpha_t^2}\|\hat{\epsilon}(x_t,t)-\epsilon_0\|^2 DKL(q(xt−1∣xt,x0)∥pθ(xt−1∣xt))=2σt21(1−αˉt)αt2βt2∥ϵ^(xt,t)−ϵ0∥2
Loss:
L = ∑ t L t L t ≈ ∑ ϵ ∼ N ( 0 , 1 ) ∥ ϵ − ϵ ^ ( x t , t ) ∥ 2 , ( 0 ≤ t < T ) L=\sum_t L_t\\ L_t\approx \sum_{\epsilon\sim N(0,1)}\|\epsilon-\hat{\epsilon}(x_{t},t)\|^2,(0\leq t
where x t : = α ˉ t x 0 + 1 − α ˉ t ϵ x_{t}:=\sqrt{\bar{\alpha}_t} x_0 + \sqrt{1-\bar{\alpha}_t}\epsilon xt:=αˉtx0+1−αˉtϵ.
train NN ϵ ^ \hat\epsilon ϵ^ by data { ( ϵ ^ ( x t ( x 0 , i , ϵ i l ) , t ) , ϵ i l ) , ϵ i l ∼ N ( 0 , 1 ) , l = 1 , ⋯ , L } \{(\hat{\epsilon}(x_{t}(x_{0,i},\epsilon_{il}),t),\epsilon_{il}),\epsilon_{il}\sim N(0,1),l=1,\cdots, L\} {(ϵ^(xt(x0,i,ϵil),t),ϵil),ϵil∼N(0,1),l=1,⋯,L} with size of N L NL NL for each t t t。
Exercise
References