Zhao J, Kim Y, Zhang K, et al. Adversarially regularized autoencoders[J]. arXiv preprint arXiv:1706.04223, 2017.
GitHub: https://github.com/jakezhaojb/ARAE
adversarially regularized autoencoder (ARAE)
Deep latent variable models (i.e., models such as VAEs and GANs that are seeded with a random variable) are convenient for generating continuous samples. When applied to discrete structures such as text or discretized images, they run into serious difficulties. This paper proposes a flexible method for training deep latent variable models of discrete structures.
The idea is to encode the discrete sequence and then decode it, with a softmax over the vocabulary producing the discrete output:
$L_{rec}(\phi,\psi) = -\log p_{\psi}(x \mid enc_{\phi}(x))$

$\hat{x} = \arg\max_x\, p_{\psi}(x \mid enc_{\phi}(x))$
The encoder and decoder are problem-specific; for text, RNNs are the usual choice for both.
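To make this concrete, here is a minimal PyTorch sketch of such a discrete sequence autoencoder. All module names and dimensions are my own illustration, not the authors' code, and the teacher-forcing input/target shift is omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqAutoencoder(nn.Module):
    """Discrete sequence autoencoder: enc_phi maps token ids to a code z,
    and the decoder defines p_psi(x | z) via a softmax over the vocabulary."""
    def __init__(self, vocab_size=100, emb_dim=64, code_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, code_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, code_dim, batch_first=True)
        self.out = nn.Linear(code_dim, vocab_size)   # logits, softmax-ed by the loss

    def encode(self, x):                   # z = enc_phi(x)
        _, h = self.encoder(self.emb(x))
        return h.squeeze(0)                # (batch, code_dim)

    def decode_logits(self, z, x):         # teacher-forced decoding seeded with z
        h_seq, _ = self.decoder(self.emb(x), z.unsqueeze(0))
        return self.out(h_seq)             # (batch, seq_len, vocab_size)

# L_rec(phi, psi) = -log p_psi(x | enc_phi(x)) as a token-level cross-entropy
model = SeqAutoencoder()
x = torch.randint(0, 100, (8, 10))         # toy batch of token ids
logits = model.decode_logits(model.encode(x), x)
L_rec = F.cross_entropy(logits.reshape(-1, 100), x.reshape(-1))
```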
WGAN:
$\min_{\theta}\max_{w\in\mathcal{W}}\, E_{z\sim P_r}[f_w(z)] - E_{\tilde{z}\sim P_z}[f_w(\tilde{z})]$
with weight clipping $w \in [-\epsilon, \epsilon]$
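A minimal sketch of one critic update under this objective, with weight clipping; the critic architecture, learning rate, and $\epsilon$ here are illustrative choices, not the paper's:

```python
import torch
import torch.nn as nn

# Critic f_w trained to maximize E[f_w(z)] - E[f_w(z_tilde)]; implemented by
# minimizing the negation, then clipping weights to [-eps, eps].
critic = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt_w = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
eps = 0.01

z_real = torch.randn(8, 32)    # stand-in for samples from P_r
z_fake = torch.randn(8, 32)    # stand-in for samples from P_z
loss_w = -(critic(z_real).mean() - critic(z_fake).mean())
opt_w.zero_grad()
loss_w.backward()
opt_w.step()
for p in critic.parameters():  # weight clipping keeps f_w (roughly) 1-Lipschitz
    p.data.clamp_(-eps, eps)
```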
ARAE combines a discrete autoencoder with a GAN-regularized latent representation. As shown in the model figure in the paper, it learns the distribution $P_{\psi}$ over the discrete space. Intuitively, this method provides a smoother discrete code space by using a more flexible prior distribution.
The model consists of a discrete autoencoder regularized with a prior distribution:

$\min_{\phi,\psi} L_{rec}(\phi,\psi) + \lambda^{(1)} W(P_Q, P_z)$
where $W$ is the Wasserstein distance between $P_Q$, the distribution over the discrete code space (i.e., the distribution of $enc_{\phi}(x)$ after encoding $x$), and $P_z$. Training the model amounts to solving the following objectives: (1) minimize the reconstruction error of the encoder/decoder, (2) optimize the critic, and (3) optimize the generator.
Empirically, we found that the prior distribution $P_z$ has a strong influence on the results. The simplest choice is a fixed Gaussian $\mathcal{N}(0, I)$, but such a rigid constraint easily causes mode collapse. Instead of fixing $P_z$, we learn a mapping from a Gaussian $\mathcal{N}(0, I)$ to $P_z$ via a generator.
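A tiny sketch of such a generator $g_{\theta}$ (the noise and code dimensions are illustrative):

```python
import torch
import torch.nn as nn

# Learned prior: sample noise s ~ N(0, I) and map it through g_theta; P_z is
# then whatever distribution g_theta induces, rather than a fixed Gaussian.
noise_dim, code_dim = 100, 32               # illustrative sizes
g_theta = nn.Sequential(
    nn.Linear(noise_dim, 300), nn.ReLU(),
    nn.Linear(300, code_dim),
)
s = torch.randn(8, noise_dim)               # s ~ N(0, I)
z_tilde = g_theta(s)                        # z_tilde ~ P_z (learned prior)
```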
Algorithm 1 ARAE Training
for each training iteration do
- (1) Train the encoder/decoder for reconstruction $(\phi, \psi)$
  - Sample $\{x^{(i)}\}_{i=1}^m \sim P_r$ and compute $z^{(i)} = enc_{\phi}(x^{(i)})$
  - Backprop loss $L_{rec} = -\frac{1}{m}\sum_{i=1}^m \log p_{\psi}(x^{(i)} \mid z^{(i)})$
- (2) Train the critic $(w)$
  - Sample $\{x^{(i)}\}_{i=1}^m \sim P_r$ and $\{s^{(i)}\}_{i=1}^m \sim \mathcal{N}(0, I)$
  - Compute $z^{(i)} = enc_{\phi}(x^{(i)})$ and $\hat{z}^{(i)} = g_{\theta}(s^{(i)})$
  - Backprop loss $-\frac{1}{m}\sum_{i=1}^m f_w(z^{(i)}) + \frac{1}{m}\sum_{i=1}^m f_w(\hat{z}^{(i)})$
  - Clip critic $w$ to $[-\epsilon, \epsilon]$
- (3) Train the encoder/generator adversarially $(\phi, \theta)$
  - Sample $\{x^{(i)}\}_{i=1}^m \sim P_r$ and $\{s^{(i)}\}_{i=1}^m \sim \mathcal{N}(0, I)$
  - Compute $z^{(i)} = enc_{\phi}(x^{(i)})$ and $\hat{z}^{(i)} = g_{\theta}(s^{(i)})$
  - Backprop loss $\frac{1}{m}\sum_{i=1}^m f_w(z^{(i)}) - \frac{1}{m}\sum_{i=1}^m f_w(\hat{z}^{(i)})$
end for
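Putting steps (1)-(3) together, here is a compact, self-contained PyTorch sketch of the training loop. All architectures, dimensions, and learning rates are illustrative stand-ins, not the repo's actual choices:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V, E, C, N = 100, 64, 32, 100               # vocab, embedding, code, noise dims

emb = nn.Embedding(V, E)
enc = nn.GRU(E, C, batch_first=True)         # enc_phi
dec = nn.GRU(E, C, batch_first=True)         # decoder of p_psi
out = nn.Linear(C, V)
g = nn.Sequential(nn.Linear(N, 300), nn.ReLU(), nn.Linear(300, C))  # g_theta
f = nn.Sequential(nn.Linear(C, 64), nn.ReLU(), nn.Linear(64, 1))    # critic f_w

ae_params = [*emb.parameters(), *enc.parameters(),
             *dec.parameters(), *out.parameters()]
opt_ae = torch.optim.Adam(ae_params, lr=1e-3)
opt_w = torch.optim.RMSprop(f.parameters(), lr=5e-5)
opt_g = torch.optim.Adam(g.parameters(), lr=1e-4)
eps = 0.01

def encode(x):                               # z = enc_phi(x)
    _, h = enc(emb(x))
    return h.squeeze(0)

def rec_loss(x):                             # L_rec = -1/m sum log p_psi(x | z)
    h_seq, _ = dec(emb(x), encode(x).unsqueeze(0))
    return F.cross_entropy(out(h_seq).reshape(-1, V), x.reshape(-1))

for step in range(3):                        # "for each training iteration do"
    x = torch.randint(0, V, (8, 10))         # toy batch standing in for P_r
    # (1) train the encoder/decoder for reconstruction (phi, psi)
    opt_ae.zero_grad()
    rec_loss(x).backward()
    opt_ae.step()
    # (2) train the critic (w); detach so only f_w is updated
    z, z_hat = encode(x).detach(), g(torch.randn(8, N)).detach()
    opt_w.zero_grad()
    (-(f(z).mean() - f(z_hat).mean())).backward()
    opt_w.step()
    for p in f.parameters():                 # weight clipping
        p.data.clamp_(-eps, eps)
    # (3) train the encoder/generator adversarially (phi, theta)
    opt_ae.zero_grad(); opt_g.zero_grad()
    loss_adv = f(encode(x)).mean() - f(g(torch.randn(8, N))).mean()
    loss_adv.backward()
    opt_ae.step(); opt_g.step()
```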
For the alignment (transfer) setting, the decoder is given an additional condition and becomes $p_{\psi}(x \mid z, y)$ (I didn't fully understand this part; I'll see whether the code makes it clearer). The optimization then also accounts for a classification loss:
$\min_{\phi,\psi} L_{rec}(\phi,\psi) + \lambda^{(1)} W(P_Q, P_z) - \lambda^{(2)} L_{class}(\phi, u)$
In this paper $\lambda^{(2)} = 1$, and two extra steps are added to training: (2b) train the attribute classifier, and (3b) train the encoder adversarially against the classifier (a sketch follows the algorithm below).
Algorithm 2 ARAE Transfer Extension
Each loop additionally:
- (2b) Train attribute classifier $(u)$
  - Sample $\{x^{(i)}\}_{i=1}^m \sim P_r$, lookup $y^{(i)}$, and compute $z^{(i)} = enc_{\phi}(x^{(i)})$
  - Backprop loss $-\frac{1}{m}\sum_{i=1}^m \log p_u(y^{(i)} \mid z^{(i)})$
- (3b) Train the encoder adversarially $(\phi)$
  - Sample $\{x^{(i)}\}_{i=1}^m \sim P_r$, lookup $y^{(i)}$, and compute $z^{(i)} = enc_{\phi}(x^{(i)})$
  - Backprop loss $-\frac{1}{m}\sum_{i=1}^m \log p_u(1 - y^{(i)} \mid z^{(i)})$
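A sketch of these two extra steps for a binary attribute $y$, using a random tensor as a stand-in for $enc_{\phi}(x)$; the classifier architecture is my own illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C = 32                                        # code dim, matching earlier sketches

# Binary attribute classifier p_u(y | z), one sigmoid logit.
clf = nn.Sequential(nn.Linear(C, 64), nn.ReLU(), nn.Linear(64, 1))
opt_u = torch.optim.Adam(clf.parameters(), lr=1e-3)

z = torch.randn(8, C, requires_grad=True)     # stand-in for enc_phi(x)
y = torch.randint(0, 2, (8, 1)).float()       # looked-up attribute labels

# (2b) train the classifier: minimize -1/m sum log p_u(y | z)
loss_u = F.binary_cross_entropy_with_logits(clf(z.detach()), y)
opt_u.zero_grad()
loss_u.backward()
opt_u.step()

# (3b) train the encoder adversarially: minimize -1/m sum log p_u(1-y | z),
# pushing codes toward the opposite label so z stops carrying attribute info.
loss_adv = F.binary_cross_entropy_with_logits(clf(z), 1.0 - y)
loss_adv.backward()   # in the real model this gradient flows into enc_phi,
                      # whose optimizer is then stepped
```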
In a standard GAN we implicitly minimize the divergence between the real distribution and the model distribution. In the setting here, my understanding is that we implicitly minimize that divergence in the embedding space, and also minimize the divergence between the data distribution $P_r$ and the latent-variable model $p_{\psi}(x) = \int_z p_{\psi}(x \mid z)\, p(z)\, dz$.
Some rather mathematical proofs are omitted here.
Looking at the GitHub repo, the authors have since updated the WGAN method to WGAN-GP (gradient penalty).
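For reference, WGAN-GP replaces weight clipping with a gradient penalty on interpolates between real and fake codes. A sketch (the critic and $\lambda_{gp} = 10$ are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn

# WGAN-GP: instead of clipping, penalize (||grad_z f_w(z)||_2 - 1)^2 at random
# interpolates of real and fake codes (Gulrajani et al., 2017).
f = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
z_real, z_fake = torch.randn(8, 32), torch.randn(8, 32)

alpha = torch.rand(8, 1)                                  # per-sample mix
z_interp = (alpha * z_real + (1 - alpha) * z_fake).requires_grad_(True)
grads, = torch.autograd.grad(f(z_interp).sum(), z_interp, create_graph=True)
gp = ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
loss_w = -(f(z_real).mean() - f(z_fake).mean()) + 10.0 * gp   # lambda_gp = 10
```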