Deep Learning Notes (13): GAN-2

LSGAN

LSGAN (Least Squares GAN) replaces the standard GAN loss with an L2 loss. The optimization objectives for G and D are shown in the figure below:


image
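
For reference (in case the figure does not render), a common formulation of the LSGAN objectives is the following; the 1/2 factors are the ones the hint below says to ignore:

$$\min_D V_{\mathrm{LSGAN}}(D) = \tfrac{1}{2}\,\mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\big[(D(x)-1)^2\big] + \tfrac{1}{2}\,\mathbb{E}_{z\sim p_z(z)}\big[D(G(z))^2\big]$$

$$\min_G V_{\mathrm{LSGAN}}(G) = \tfrac{1}{2}\,\mathbb{E}_{z\sim p_z(z)}\big[(D(G(z))-1)^2\big]$$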

Assignment:

Complete the L2Loss code below to implement the L2 loss that optimizes the objectives above. Then use this loss function to train an LSGAN on the MNIST dataset, and display the generated images and the loss curves.

Hint: ignore the 1/2 factor in the figure above. The L2 loss is simply the MSE (mean squared error) loss. It takes two arguments: input_ is the probability that the discriminator D predicts "real" (size batch_size*1), and target is the label 1 or 0 (size batch_size*1). Implement it using only PyTorch and Python operations (do not call MSELoss directly).

class L2Loss(nn.Module):
    
    def __init__(self):
        super(L2Loss, self).__init__()
    
    def forward(self, input_, target):
        """
        input_: (batch_size*1) 
        target: (batch_size*1) labels, 1 or 0
        """
        return ((input_ - target) ** 2).mean()
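
As a quick sanity check (not part of the assignment), the hand-written L2Loss should agree with PyTorch's built-in MSE loss on random inputs. A minimal sketch, assuming L2Loss is defined as above:

import torch
import torch.nn.functional as F

criterion = L2Loss()
pred = torch.rand(8, 1)       # fake discriminator outputs in [0, 1]
target = torch.ones(8, 1)     # labels for "real"
# the two printed values should match up to floating-point error
print(criterion(pred, target).item(), F.mse_loss(pred, target).item())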

After completing the code above, use the L2Loss you wrote to train a DCGAN on the MNIST dataset.

# hyper params

# z dim
latent_dim = 100

# image size and channel
image_size=32
image_channel=1

# Adam lr and betas
learning_rate = 0.0002
betas = (0.5, 0.999)

# epochs and batch size
n_epochs = 100
batch_size = 32

# device : cpu or cuda:0/1/2/3
device = torch.device('cuda:0')

# mnist dataset and dataloader
train_dataset = load_mnist_data()
trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# use L2Loss as loss function
l2loss = L2Loss().to(device)

# G and D model, use DCGAN
G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
D = DCDiscriminator(image_size=image_size, input_channel=image_channel).to(device)

# G and D optimizer, use Adam or SGD
G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)
d_loss_hist, g_loss_hist = run_gan(trainloader, G, D, G_optimizer, D_optimizer, l2loss, n_epochs, device, 
                                   latent_dim)
loss_plot(d_loss_hist, g_loss_hist)
Epoch 0: Train D loss: 0.0631, G loss: 0.9534
image
Epoch 1: Train D loss: 0.0268, G loss: 0.9953
Epoch 2: Train D loss: 0.0002, G loss: 1.0000
Epoch 3: Train D loss: 0.0001, G loss: 1.0000
Epoch 4: Train D loss: 0.0000, G loss: 1.0000
Epoch 5: Train D loss: 0.0000, G loss: 1.0000
Epoch 6: Train D loss: 0.0000, G loss: 1.0000
Epoch 7: Train D loss: 0.0000, G loss: 1.0000
Epoch 8: Train D loss: 0.0000, G loss: 1.0000
Epoch 9: Train D loss: 0.0000, G loss: 1.0000
image
Epoch 10: Train D loss: 0.0000, G loss: 1.0000
Epoch 11: Train D loss: 0.0000, G loss: 1.0000
Epoch 12: Train D loss: 0.0000, G loss: 1.0000
Epoch 13: Train D loss: 0.0000, G loss: 1.0000
Epoch 14: Train D loss: 0.0000, G loss: 0.9999
Epoch 15: Train D loss: 0.0155, G loss: 0.9995
Epoch 16: Train D loss: 0.0000, G loss: 0.9999
Epoch 17: Train D loss: 0.0855, G loss: 0.9992
Epoch 18: Train D loss: 1.0000, G loss: 1.0000
Epoch 19: Train D loss: 1.0000, G loss: 1.0000
image
Epoch 20: Train D loss: 1.0000, G loss: 1.0000
Epoch 21: Train D loss: 1.0000, G loss: 1.0000
Epoch 22: Train D loss: 1.0000, G loss: 1.0000
Epoch 23: Train D loss: 1.0000, G loss: 1.0000
Epoch 24: Train D loss: 0.9999, G loss: 1.0000
Epoch 25: Train D loss: 0.4592, G loss: 1.0000
Epoch 26: Train D loss: 0.0000, G loss: 1.0000
Epoch 27: Train D loss: 0.0000, G loss: 1.0000
Epoch 28: Train D loss: 0.0000, G loss: 1.0000
Epoch 29: Train D loss: 0.0000, G loss: 1.0000

......

image
Epoch 90: Train D loss: 0.0000, G loss: 1.0000
Epoch 91: Train D loss: 0.0000, G loss: 1.0000
Epoch 92: Train D loss: 0.0000, G loss: 1.0000
Epoch 93: Train D loss: 0.0000, G loss: 1.0000
Epoch 94: Train D loss: 0.0000, G loss: 1.0000
Epoch 95: Train D loss: 0.0000, G loss: 1.0000
Epoch 96: Train D loss: 0.0000, G loss: 1.0000
Epoch 97: Train D loss: 0.0000, G loss: 1.0000
Epoch 98: Train D loss: 0.0000, G loss: 1.0000
Epoch 99: Train D loss: 0.0000, G loss: 1.0000
image
image

WGAN

GANs still suffer from unstable training and mode collapse (which can be understood as the generated images having extremely low diversity; our dataset may not necessarily show this). WGAN (Wasserstein GAN) replaces the JS divergence that the traditional GAN implicitly fits with the Wasserstein distance. WGAN solves the instability and mode-collapse problems of GAN training to some extent.

The optimization objective of the WGAN discriminator becomes: under the Lipschitz continuity constraint (which we can satisfy by restricting the weights w to a bounded range), maximize


image
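
In symbols, the quantity in the figure that D maximizes (under the Lipschitz constraint) is the standard WGAN critic objective, where p_r and p_g denote the real and generated distributions:

$$\mathbb{E}_{x\sim p_r}[D(x)] - \mathbb{E}_{x\sim p_g}[D(x)]$$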

This quantity approximates the Wasserstein distance between the real distribution and the generated distribution. Accordingly, the loss functions of D and G become:


image

image
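
Written out, these are the standard WGAN losses; they correspond to the d_real_loss + d_fake_loss and g_loss terms computed in wgan_train below:

$$L_D = -\mathbb{E}_{x\sim p_r}[D(x)] + \mathbb{E}_{z\sim p_z}[D(G(z))], \qquad L_G = -\mathbb{E}_{z\sim p_z}[D(G(z))]$$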

In terms of implementation, WGAN makes three main changes:

  • The last-layer sigmoid is removed from the discriminator D
  • The losses of the generator G and the discriminator D no longer take the log
  • After each update of the discriminator D, its weights are clipped so that their absolute values do not exceed a fixed constant c

So we mainly rewrite the training function for WGAN. Here the network architecture is the DCGAN with the sigmoid removed (note that when initializing D, set sigmoid=False to remove the last sigmoid layer).

Below is the WGAN implementation. Two parameters are added: n_d is the number of D updates per G update, and weight_clip is the clipping constant.

def wgan_train(trainloader, G, D, G_optimizer, D_optimizer, device, z_dim, n_d=2, weight_clip=0.01):
    
    """
    n_d: the number of iterations of D update per G update iteration
    weight_clip: the clipping parameters
    """
    
    D.train()
    G.train()
    
    D_total_loss = 0
    G_total_loss = 0
    
    for i, (x, _) in enumerate(trainloader):
        
        x = x.to(device)
        
        # update D network
        # D optimizer zero grads
        D_optimizer.zero_grad()
        
        # D real loss from real images
        d_real = D(x)
        d_real_loss = - d_real.mean()
        
        # D fake loss from fake images generated by G
        z = torch.rand(x.size(0), z_dim).to(device)
        g_z = G(z)
        d_fake = D(g_z)
        d_fake_loss = d_fake.mean()
        
        # D backward and step
        d_loss = d_real_loss + d_fake_loss
        d_loss.backward()
        D_optimizer.step()
        
        # D weight clip
        for params in D.parameters():
            params.data.clamp_(-weight_clip, weight_clip)
            
        D_total_loss += d_loss.item()

        # update G network
        if (i + 1) % n_d == 0:
            # G optimizer zero grads
            G_optimizer.zero_grad()

            # G loss
            g_z = G(z)
            d_fake = D(g_z)
            g_loss = - d_fake.mean()

            # G backward and step
            g_loss.backward()
            G_optimizer.step()
            
            G_total_loss += g_loss.item()
    
    return D_total_loss / len(trainloader), G_total_loss * n_d / len(trainloader)
def run_wgan(trainloader, G, D, G_optimizer, D_optimizer, n_epochs, device, latent_dim, n_d, weight_clip):
    d_loss_hist = []
    g_loss_hist = []

    for epoch in range(n_epochs):
        d_loss, g_loss = wgan_train(trainloader, G, D, G_optimizer, D_optimizer, device, 
                               z_dim=latent_dim, n_d=n_d, weight_clip=weight_clip)
        print('Epoch {}: Train D loss: {:.4f}, G loss: {:.4f}'.format(epoch, d_loss, g_loss))

        d_loss_hist.append(d_loss)
        g_loss_hist.append(g_loss)

        if epoch == 0 or (epoch + 1) % 10 == 0:
            visualize_results(G, device, latent_dim) 
    
    return d_loss_hist, g_loss_hist

Next, let's use the run_wgan function we wrote to train on our furniture (chair) dataset and see how it performs.

# hyper params

# z dim
latent_dim = 100

# image size and channel
image_size=32
image_channel=3

# Adam lr and betas
learning_rate = 0.0002
betas = (0.5, 0.999)

# epochs and batch size
n_epochs = 300
batch_size = 32

# n_d: the number of iterations of D update per G update iteration
n_d = 2
weight_clip=0.01

# device : cpu or cuda:0/1/2/3
device = torch.device('cuda:0')

# furniture dataset and dataloader
train_dataset = load_furniture_data()
trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# G and D model, use DCGAN, note that sigmoid is removed in D
G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
D = DCDiscriminator(image_size=image_size, input_channel=image_channel, sigmoid=False).to(device)

# G and D optimizer, use Adam or SGD
G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)

d_loss_hist, g_loss_hist = run_wgan(trainloader, G, D, G_optimizer, D_optimizer, n_epochs, device, 
                                    latent_dim, n_d, weight_clip)
Epoch 0: Train D loss: -0.0106, G loss: -0.0003
image
Epoch 1: Train D loss: -0.0576, G loss: 0.0163
Epoch 2: Train D loss: -0.1321, G loss: 0.0897
Epoch 3: Train D loss: -0.2723, G loss: 0.1958
Epoch 4: Train D loss: -0.4514, G loss: 0.2948
Epoch 5: Train D loss: -0.6250, G loss: 0.3647
Epoch 6: Train D loss: -0.7757, G loss: 0.4329
Epoch 7: Train D loss: -0.7672, G loss: 0.4643
Epoch 8: Train D loss: -0.6148, G loss: 0.4314
Epoch 9: Train D loss: -0.6224, G loss: 0.4193
image
Epoch 10: Train D loss: -0.7804, G loss: 0.4699
Epoch 11: Train D loss: -0.6644, G loss: 0.4546
Epoch 12: Train D loss: -0.6075, G loss: 0.4116
Epoch 13: Train D loss: -0.6073, G loss: 0.4478
Epoch 14: Train D loss: -0.6728, G loss: 0.4871
Epoch 15: Train D loss: -0.6588, G loss: 0.4808
Epoch 16: Train D loss: -0.7344, G loss: 0.4943
Epoch 17: Train D loss: -0.6334, G loss: 0.4702
Epoch 18: Train D loss: -0.6585, G loss: 0.4845
Epoch 19: Train D loss: -0.6050, G loss: 0.4522

......

image
Epoch 280: Train D loss: -0.3420, G loss: 0.2176
Epoch 281: Train D loss: -0.3566, G loss: 0.2435
Epoch 282: Train D loss: -0.3164, G loss: 0.2247
Epoch 283: Train D loss: -0.3413, G loss: 0.2615
Epoch 284: Train D loss: -0.3329, G loss: 0.2564
Epoch 285: Train D loss: -0.3325, G loss: 0.2060
Epoch 286: Train D loss: -0.3658, G loss: 0.2411
Epoch 287: Train D loss: -0.3306, G loss: 0.2545
Epoch 288: Train D loss: -0.3219, G loss: 0.2016
Epoch 289: Train D loss: -0.3500, G loss: 0.2295
image
Epoch 290: Train D loss: -0.3106, G loss: 0.2088
Epoch 291: Train D loss: -0.3219, G loss: 0.1998
Epoch 292: Train D loss: -0.3572, G loss: 0.2716
Epoch 293: Train D loss: -0.3290, G loss: 0.2812
Epoch 294: Train D loss: -0.3273, G loss: 0.2141
Epoch 295: Train D loss: -0.3324, G loss: 0.2854
Epoch 296: Train D loss: -0.3222, G loss: 0.2421
Epoch 297: Train D loss: -0.3475, G loss: 0.2820
Epoch 298: Train D loss: -0.3196, G loss: 0.2251
Epoch 299: Train D loss: -0.3290, G loss: 0.2239
image

From the theory of WGAN, we know that the negative of D_loss approximates the Wasserstein distance between the generated distribution and the real distribution: the smaller its value, the more similar the two distributions and the better the GAN is trained. Its value therefore gives us a metric for monitoring GAN training.

Run the code below to observe the WGAN loss curves. Overall, the negative of D_loss gradually decreases as the number of epochs increases, while the generated data gets closer and closer to the real data, which is consistent with the theory of WGAN.

loss_plot(d_loss_hist, g_loss_hist)
image
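
Since -D_loss is the quantity that tracks the approximate Wasserstein distance, it can also be plotted on its own. A minimal sketch, assuming matplotlib is installed and d_loss_hist is the list returned by run_wgan:

import matplotlib.pyplot as plt

# negate each epoch's D loss to get the Wasserstein-distance estimate
w_estimate = [-d for d in d_loss_hist]
plt.plot(w_estimate)
plt.xlabel('epoch')
plt.ylabel('approx. Wasserstein distance (-D loss)')
plt.show()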

Next, run the code in the following two cells to visualize the distribution of the discriminator's parameters in WGAN.

from utils import show_weights_hist
def show_d_params(D):
    plist = []
    for params in D.parameters():
        plist.extend(params.cpu().data.view(-1).numpy())
    show_weights_hist(plist)
show_d_params(D)
image

As we can see, the parameters are all clipped into [-c, c], and most of them concentrate near -c and c.

Assignment:

Try setting n_d to 5, 3, 1, etc., and train the WGAN again. Which value of n_d gives the best results?

Answer:

With n_d = 5, the details of the generated images are noticeably clearer than with the other two settings. In this case G is updated once for every five D updates, so each G update is guided by a better-trained discriminator. This improves performance significantly without having to update G at every iteration.

n_d = 1

# G and D model, use DCGAN, note that sigmoid is removed in D
G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
D = DCDiscriminator(image_size=image_size, input_channel=image_channel, sigmoid=False).to(device)

# G and D optimizer, use Adam or SGD
G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)

d_loss_hist, g_loss_hist = run_wgan(trainloader, G, D, G_optimizer, D_optimizer, n_epochs, device, 
                                    latent_dim, n_d, weight_clip)

loss_plot(d_loss_hist, g_loss_hist)
Epoch 0: Train D loss: -0.0251, G loss: -0.0089
image
Epoch 1: Train D loss: -0.0200, G loss: -0.0058
Epoch 2: Train D loss: -0.0403, G loss: 0.0151
Epoch 3: Train D loss: -0.0840, G loss: 0.0692
Epoch 4: Train D loss: -0.1110, G loss: 0.1149
Epoch 5: Train D loss: -0.0798, G loss: 0.0653
Epoch 6: Train D loss: -0.0668, G loss: 0.0619
Epoch 7: Train D loss: -0.0763, G loss: 0.0924
Epoch 8: Train D loss: -0.1395, G loss: 0.1376
Epoch 9: Train D loss: -0.1790, G loss: 0.1760
image
Epoch 10: Train D loss: -0.1733, G loss: 0.1778
Epoch 11: Train D loss: -0.1643, G loss: 0.2132
Epoch 12: Train D loss: -0.2438, G loss: 0.2327
Epoch 13: Train D loss: -0.2688, G loss: 0.2631
Epoch 14: Train D loss: -0.2538, G loss: 0.2624
Epoch 15: Train D loss: -0.1750, G loss: 0.1571
Epoch 16: Train D loss: -0.2005, G loss: 0.1801
Epoch 17: Train D loss: -0.2626, G loss: 0.1983
Epoch 18: Train D loss: -0.2573, G loss: 0.2271
Epoch 19: Train D loss: -0.2479, G loss: 0.2566
image
Epoch 20: Train D loss: -0.1754, G loss: 0.2312
Epoch 21: Train D loss: -0.2361, G loss: 0.2213
Epoch 22: Train D loss: -0.4678, G loss: 0.3198
Epoch 23: Train D loss: -0.3996, G loss: 0.3100
Epoch 24: Train D loss: -0.4355, G loss: 0.3225
Epoch 25: Train D loss: -0.4151, G loss: 0.3199
Epoch 26: Train D loss: -0.3595, G loss: 0.3087
Epoch 27: Train D loss: -0.4016, G loss: 0.3302
Epoch 28: Train D loss: -0.3243, G loss: 0.2787
Epoch 29: Train D loss: -0.2890, G loss: 0.2380
image
Epoch 30: Train D loss: -0.1935, G loss: 0.1274
Epoch 31: Train D loss: -0.4133, G loss: 0.3306
Epoch 32: Train D loss: -0.2924, G loss: 0.2732
Epoch 33: Train D loss: -0.3298, G loss: 0.3033
Epoch 34: Train D loss: -0.3138, G loss: 0.2745
Epoch 35: Train D loss: -0.4105, G loss: 0.3589
Epoch 36: Train D loss: -0.2292, G loss: 0.2321
Epoch 37: Train D loss: -0.4472, G loss: 0.3496
Epoch 38: Train D loss: -0.3871, G loss: 0.3079
Epoch 39: Train D loss: -0.3574, G loss: 0.3200
image
Epoch 40: Train D loss: -0.4521, G loss: 0.3567
Epoch 41: Train D loss: -0.3822, G loss: 0.3030
Epoch 42: Train D loss: -0.3556, G loss: 0.3106
Epoch 43: Train D loss: -0.4338, G loss: 0.3545
Epoch 44: Train D loss: -0.4273, G loss: 0.3315
Epoch 45: Train D loss: -0.4402, G loss: 0.3320
Epoch 46: Train D loss: -0.3696, G loss: 0.3154
Epoch 47: Train D loss: -0.4215, G loss: 0.3088
Epoch 48: Train D loss: -0.4023, G loss: 0.3035
Epoch 49: Train D loss: -0.4106, G loss: 0.3108
image
Epoch 50: Train D loss: -0.4090, G loss: 0.3000
Epoch 51: Train D loss: -0.3908, G loss: 0.3033
Epoch 52: Train D loss: -0.3929, G loss: 0.3011
Epoch 53: Train D loss: -0.3975, G loss: 0.2898
Epoch 54: Train D loss: -0.3904, G loss: 0.3115
Epoch 55: Train D loss: -0.3649, G loss: 0.2771
Epoch 56: Train D loss: -0.3763, G loss: 0.2938
Epoch 57: Train D loss: -0.3817, G loss: 0.3170
Epoch 58: Train D loss: -0.3438, G loss: 0.2766
Epoch 59: Train D loss: -0.3707, G loss: 0.3001

......

image
Epoch 280: Train D loss: -0.1990, G loss: 0.1610
Epoch 281: Train D loss: -0.2045, G loss: 0.2129
Epoch 282: Train D loss: -0.1959, G loss: 0.1990
Epoch 283: Train D loss: -0.1795, G loss: 0.1501
Epoch 284: Train D loss: -0.1925, G loss: 0.1886
Epoch 285: Train D loss: -0.1922, G loss: 0.1648
Epoch 286: Train D loss: -0.1990, G loss: 0.1833
Epoch 287: Train D loss: -0.1987, G loss: 0.1909
Epoch 288: Train D loss: -0.2003, G loss: 0.1681
Epoch 289: Train D loss: -0.2046, G loss: 0.1724
image
Epoch 290: Train D loss: -0.2004, G loss: 0.1841
Epoch 291: Train D loss: -0.2178, G loss: 0.1841
Epoch 292: Train D loss: -0.1769, G loss: 0.1601
Epoch 293: Train D loss: -0.1852, G loss: 0.1555
Epoch 294: Train D loss: -0.1895, G loss: 0.1879
Epoch 295: Train D loss: -0.1996, G loss: 0.1534
Epoch 296: Train D loss: -0.1944, G loss: 0.1817
Epoch 297: Train D loss: -0.1926, G loss: 0.1857
Epoch 298: Train D loss: -0.2057, G loss: 0.1622
Epoch 299: Train D loss: -0.2130, G loss: 0.1960
image
image
n_d = 3

# G and D model, use DCGAN, note that sigmoid is removed in D
G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
D = DCDiscriminator(image_size=image_size, input_channel=image_channel, sigmoid=False).to(device)

# G and D optimizer, use Adam or SGD
G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)

d_loss_hist, g_loss_hist = run_wgan(trainloader, G, D, G_optimizer, D_optimizer, n_epochs, device, 
                                    latent_dim, n_d, weight_clip)

loss_plot(d_loss_hist, g_loss_hist)
Epoch 0: Train D loss: 0.0069, G loss: 0.0021
image
Epoch 1: Train D loss: -0.0791, G loss: 0.0306
Epoch 2: Train D loss: -0.1852, G loss: 0.1159
Epoch 3: Train D loss: -0.3618, G loss: 0.2186
Epoch 4: Train D loss: -0.4753, G loss: 0.2786
Epoch 5: Train D loss: -0.6302, G loss: 0.3484
Epoch 6: Train D loss: -0.7498, G loss: 0.3949
Epoch 7: Train D loss: -0.8587, G loss: 0.4415
Epoch 8: Train D loss: -0.9714, G loss: 0.4878
Epoch 9: Train D loss: -1.0270, G loss: 0.5135
image
Epoch 10: Train D loss: -1.0649, G loss: 0.5341
Epoch 11: Train D loss: -0.9526, G loss: 0.5177
Epoch 12: Train D loss: -0.8284, G loss: 0.4603
Epoch 13: Train D loss: -0.9364, G loss: 0.5148
Epoch 14: Train D loss: -1.0217, G loss: 0.5523
Epoch 15: Train D loss: -0.9515, G loss: 0.4988
Epoch 16: Train D loss: -0.9435, G loss: 0.5272
Epoch 17: Train D loss: -0.8170, G loss: 0.4336
Epoch 18: Train D loss: -0.8701, G loss: 0.4690
Epoch 19: Train D loss: -0.9068, G loss: 0.5018
image
Epoch 20: Train D loss: -0.8681, G loss: 0.4756
Epoch 21: Train D loss: -0.8347, G loss: 0.4296
Epoch 22: Train D loss: -0.8639, G loss: 0.4728
Epoch 23: Train D loss: -0.7830, G loss: 0.4581
Epoch 24: Train D loss: -0.7746, G loss: 0.4464
Epoch 25: Train D loss: -0.8700, G loss: 0.4785
Epoch 26: Train D loss: -0.8557, G loss: 0.4636
Epoch 27: Train D loss: -0.7885, G loss: 0.4442
Epoch 28: Train D loss: -0.7860, G loss: 0.4482
Epoch 29: Train D loss: -0.7841, G loss: 0.4317

......

image
Epoch 260: Train D loss: -0.4257, G loss: 0.2434
Epoch 261: Train D loss: -0.3834, G loss: 0.1874
Epoch 262: Train D loss: -0.4639, G loss: 0.3219
Epoch 263: Train D loss: -0.4426, G loss: 0.2938
Epoch 264: Train D loss: -0.4858, G loss: 0.2983
Epoch 265: Train D loss: -0.4438, G loss: 0.3005
Epoch 266: Train D loss: -0.4347, G loss: 0.2685
Epoch 267: Train D loss: -0.4632, G loss: 0.2412
Epoch 268: Train D loss: -0.4347, G loss: 0.3064
Epoch 269: Train D loss: -0.4426, G loss: 0.3141
image
Epoch 270: Train D loss: -0.4450, G loss: 0.2698
Epoch 271: Train D loss: -0.4017, G loss: 0.1301
Epoch 272: Train D loss: -0.4728, G loss: 0.2955
Epoch 273: Train D loss: -0.4224, G loss: 0.1896
Epoch 274: Train D loss: -0.4218, G loss: 0.2128
Epoch 275: Train D loss: -0.4780, G loss: 0.2925
Epoch 276: Train D loss: -0.4397, G loss: 0.2963
Epoch 277: Train D loss: -0.4463, G loss: 0.2299
Epoch 278: Train D loss: -0.4356, G loss: 0.3044
Epoch 279: Train D loss: -0.4483, G loss: 0.2750
image
Epoch 280: Train D loss: -0.4312, G loss: 0.2676
Epoch 281: Train D loss: -0.4409, G loss: 0.2906
Epoch 282: Train D loss: -0.4464, G loss: 0.2933
Epoch 283: Train D loss: -0.4409, G loss: 0.1911
Epoch 284: Train D loss: -0.4241, G loss: 0.1807
Epoch 285: Train D loss: -0.4174, G loss: 0.2371
Epoch 286: Train D loss: -0.4385, G loss: 0.2776
Epoch 287: Train D loss: -0.4441, G loss: 0.3239
Epoch 288: Train D loss: -0.3909, G loss: 0.1265
Epoch 289: Train D loss: -0.4617, G loss: 0.3183
image
Epoch 290: Train D loss: -0.4374, G loss: 0.2967
Epoch 291: Train D loss: -0.4362, G loss: 0.2297
Epoch 292: Train D loss: -0.4295, G loss: 0.2365
Epoch 293: Train D loss: -0.4244, G loss: 0.2824
Epoch 294: Train D loss: -0.4617, G loss: 0.3120
Epoch 295: Train D loss: -0.3845, G loss: 0.1841
Epoch 296: Train D loss: -0.4179, G loss: 0.3275
Epoch 297: Train D loss: -0.3968, G loss: 0.2162
Epoch 298: Train D loss: -0.4360, G loss: 0.2535
Epoch 299: Train D loss: -0.4168, G loss: 0.1963
image
image
n_d = 5

# G and D model, use DCGAN, note that sigmoid is removed in D
G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
D = DCDiscriminator(image_size=image_size, input_channel=image_channel, sigmoid=False).to(device)

# G and D optimizer, use Adam or SGD
G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)

d_loss_hist, g_loss_hist = run_wgan(trainloader, G, D, G_optimizer, D_optimizer, n_epochs, device, 
                                    latent_dim, n_d, weight_clip)

loss_plot(d_loss_hist, g_loss_hist)
Epoch 0: Train D loss: -0.0630, G loss: 0.0124
image
Epoch 1: Train D loss: -0.1226, G loss: 0.0588
Epoch 2: Train D loss: -0.2772, G loss: 0.1625
Epoch 3: Train D loss: -0.4880, G loss: 0.2672
Epoch 4: Train D loss: -0.6543, G loss: 0.3397
Epoch 5: Train D loss: -0.7899, G loss: 0.4041
Epoch 6: Train D loss: -0.8909, G loss: 0.4511
Epoch 7: Train D loss: -0.9759, G loss: 0.4947
Epoch 8: Train D loss: -1.0392, G loss: 0.5194
Epoch 9: Train D loss: -1.1024, G loss: 0.5463
image
Epoch 10: Train D loss: -1.1374, G loss: 0.5677
Epoch 11: Train D loss: -1.1750, G loss: 0.5820
Epoch 12: Train D loss: -1.2188, G loss: 0.5988
Epoch 13: Train D loss: -1.2543, G loss: 0.6115
Epoch 14: Train D loss: -1.2656, G loss: 0.6200
Epoch 15: Train D loss: -1.2664, G loss: 0.6195
Epoch 16: Train D loss: -1.2058, G loss: 0.6176
Epoch 17: Train D loss: -1.2978, G loss: 0.6354
Epoch 18: Train D loss: -1.3151, G loss: 0.6405
Epoch 19: Train D loss: -1.3089, G loss: 0.6427
image
Epoch 20: Train D loss: -1.2956, G loss: 0.6347
Epoch 21: Train D loss: -1.2645, G loss: 0.6462
Epoch 22: Train D loss: -1.1193, G loss: 0.6170
Epoch 23: Train D loss: -1.0726, G loss: 0.5990
Epoch 24: Train D loss: -1.2008, G loss: 0.6434
Epoch 25: Train D loss: -1.2399, G loss: 0.6336
Epoch 26: Train D loss: -1.2748, G loss: 0.6413
Epoch 27: Train D loss: -1.2918, G loss: 0.6473
Epoch 28: Train D loss: -1.3105, G loss: 0.6513
Epoch 29: Train D loss: -1.3160, G loss: 0.6507
image
Epoch 30: Train D loss: -1.2992, G loss: 0.6479
Epoch 31: Train D loss: -1.0788, G loss: 0.6045
Epoch 32: Train D loss: -1.1036, G loss: 0.5824
Epoch 33: Train D loss: -1.1215, G loss: 0.6005
Epoch 34: Train D loss: -0.7472, G loss: 0.5509
Epoch 35: Train D loss: -1.1456, G loss: 0.5953
Epoch 36: Train D loss: -1.1316, G loss: 0.6104
Epoch 37: Train D loss: -1.1104, G loss: 0.6178
Epoch 38: Train D loss: -0.9294, G loss: 0.5449
Epoch 39: Train D loss: -0.8962, G loss: 0.5298
image
Epoch 40: Train D loss: -0.9316, G loss: 0.5615
Epoch 41: Train D loss: -1.0236, G loss: 0.5511
Epoch 42: Train D loss: -1.0571, G loss: 0.5896
Epoch 43: Train D loss: -1.1424, G loss: 0.5962
Epoch 44: Train D loss: -1.1372, G loss: 0.5895
Epoch 45: Train D loss: -1.0107, G loss: 0.5562
Epoch 46: Train D loss: -1.0414, G loss: 0.5619
Epoch 47: Train D loss: -1.0015, G loss: 0.5283
Epoch 48: Train D loss: -1.0139, G loss: 0.5739
Epoch 49: Train D loss: -1.0580, G loss: 0.5779

......

image
Epoch 280: Train D loss: -0.5398, G loss: 0.2564
Epoch 281: Train D loss: -0.5926, G loss: 0.2978
Epoch 282: Train D loss: -0.5837, G loss: 0.3241
Epoch 283: Train D loss: -0.5839, G loss: 0.3225
Epoch 284: Train D loss: -0.5587, G loss: 0.1916
Epoch 285: Train D loss: -0.5656, G loss: 0.3763
Epoch 286: Train D loss: -0.5593, G loss: 0.3103
Epoch 287: Train D loss: -0.5779, G loss: 0.2773
Epoch 288: Train D loss: -0.5813, G loss: 0.3878
Epoch 289: Train D loss: -0.6136, G loss: 0.4114
image
Epoch 290: Train D loss: -0.5437, G loss: 0.3981
Epoch 291: Train D loss: -0.5895, G loss: 0.4018
Epoch 292: Train D loss: -0.5595, G loss: 0.3615
Epoch 293: Train D loss: -0.5514, G loss: 0.2601
Epoch 294: Train D loss: -0.5468, G loss: 0.3513
Epoch 295: Train D loss: -0.6066, G loss: 0.3609
Epoch 296: Train D loss: -0.5875, G loss: 0.3668
Epoch 297: Train D loss: -0.5536, G loss: 0.2995
Epoch 298: Train D loss: -0.5507, G loss: 0.2963
Epoch 299: Train D loss: -0.5845, G loss: 0.2848
image
image

WGAN-GP (Improved WGAN)

WGAN requires weight clipping, and experiments show that deeper WGANs have trouble converging.

The main reasons are roughly as follows:

  1. Experiments show that most of the weights end up at -c or c, which means most weights effectively take only two possible values. This is far too simple and wastes the powerful fitting capacity of a deep neural network.
  2. Clipping easily leads to vanishing or exploding gradients. The discriminator is a multi-layer network: if the clip value is set slightly too small, the gradient shrinks a little at every layer and decays exponentially after many layers; conversely, a slightly too large clip value easily causes exploding gradients.

So WGAN-GP uses a gradient penalty instead of clipping. Since the Lipschitz constraint requires the gradient of the discriminator to be bounded by K, we can enforce this directly with a loss term, and the improved optimization objective of D becomes:


image
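
For reference, the standard WGAN-GP discriminator objective (corresponding to the figure above and to d_loss in wgan_gp_train below) is:

$$L_D = \mathbb{E}_{\tilde{x}\sim p_g}[D(\tilde{x})] - \mathbb{E}_{x\sim p_r}[D(x)] + \lambda\,\mathbb{E}_{\hat{x}\sim p_{\hat{x}}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big]$$

where the penalty points are sampled by interpolating between real and generated samples (the epsilon interpolation in the code below).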

Below is the WGAN-GP implementation. As with WGAN, we only implement its training code and directly reuse the DCGAN models.

import torch.autograd as autograd

def wgan_gp_train(trainloader, G, D, G_optimizer, D_optimizer, device, z_dim, lambda_=10, n_d=2):
    
    D.train()
    G.train()
    
    D_total_loss = 0
    G_total_loss = 0
    
    
    for i, (x, _) in enumerate(trainloader):
        x = x.to(device)

        # update D network
        # D optimizer zero grads
        D_optimizer.zero_grad()
        
        # D real loss from real images
        d_real = D(x)
        d_real_loss = - d_real.mean()
        
        # D fake loss from fake images generated by G
        z = torch.rand(x.size(0), z_dim).to(device)
        g_z = G(z)
        d_fake = D(g_z)
        d_fake_loss = d_fake.mean()
        
        # D gradient penalty
        
        #   sample epsilon ~ U(0, 1) per example and interpolate between real and fake samples
        epsilon = torch.rand(x.size(0), 1, 1, 1).to(device)
        x_hat = epsilon * x + (1 - epsilon) * g_z
        x_hat.requires_grad_(True)

        y_hat = D(x_hat)
        #   compute the gradients of y_hat with respect to x_hat
        gradients = autograd.grad(outputs=y_hat, inputs=x_hat, grad_outputs=torch.ones(y_hat.size()).to(device),
                                  create_graph=True, retain_graph=True, only_inputs=True)[0]
        #   compute the gradient penalty: penalize deviations of the per-sample gradient L2 norm from 1
        gradient_penalty = torch.mean((gradients.view(gradients.size()[0], -1).norm(p=2, dim=1) - 1) ** 2)
        
        # D backward and step
        d_loss = d_real_loss + d_fake_loss + lambda_ * gradient_penalty
        d_loss.backward()
        D_optimizer.step()
        
            
        D_total_loss += d_loss.item()

        # update G network
        # G optimizer zero grads
        if (i + 1) % n_d == 0:
            G_optimizer.zero_grad()

            # G loss
            g_z = G(z)
            d_fake = D(g_z)
            g_loss = - d_fake.mean()

            # G backward and step
            g_loss.backward()
            G_optimizer.step()
            
            G_total_loss += g_loss.item()
    
    return D_total_loss / len(trainloader), G_total_loss * n_d / len(trainloader)
# hyper params

# z dim
latent_dim = 100

# image size and channel
image_size=32
image_channel=3

# Adam lr and betas
learning_rate = 0.0002
betas = (0.5, 0.999)

# epochs and batch size
n_epochs = 300
batch_size = 32

# device : cpu or cuda:0/1/2/3
device = torch.device('cuda:0')

# n_d: the number of iterations of D update per G update iteration
n_d = 2
lambda_ = 10

# furniture dataset and dataloader
train_dataset = load_furniture_data()
trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# G and D model, use DCGAN, note that sigmoid is removed in D
G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
D = DCDiscriminator(image_size=image_size, input_channel=image_channel, sigmoid=False).to(device)

# G and D optimizer, use Adam or SGD
G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)

d_loss_hist = []
g_loss_hist = []

for epoch in range(n_epochs):
    d_loss, g_loss = wgan_gp_train(trainloader, G, D, G_optimizer, D_optimizer, device, 
                           z_dim=latent_dim, lambda_=lambda_, n_d=n_d)
    print('Epoch {}: Train D loss: {:.4f}, G loss: {:.4f}'.format(epoch, d_loss, g_loss))
    
    d_loss_hist.append(d_loss)
    g_loss_hist.append(g_loss)
    
    if epoch == 0 or (epoch + 1) % 10 == 0:
        visualize_results(G, device, latent_dim)
Epoch 0: Train D loss: 1.1936, G loss: 2.7239
image
Epoch 1: Train D loss: -8.1520, G loss: 8.7105
Epoch 2: Train D loss: -14.5335, G loss: 15.9505
Epoch 3: Train D loss: -22.4751, G loss: 25.4797
Epoch 4: Train D loss: -25.5143, G loss: 26.5167
Epoch 5: Train D loss: -20.2827, G loss: 20.9673
Epoch 6: Train D loss: -15.2205, G loss: 17.7352
Epoch 7: Train D loss: -15.0674, G loss: 17.9785
Epoch 8: Train D loss: -14.2372, G loss: 19.3913
Epoch 9: Train D loss: -13.6457, G loss: 19.7493
image
Epoch 10: Train D loss: -12.9571, G loss: 20.5028
Epoch 11: Train D loss: -12.0761, G loss: 20.7169
Epoch 12: Train D loss: -12.5201, G loss: 21.4914
Epoch 13: Train D loss: -12.7979, G loss: 20.8781
Epoch 14: Train D loss: -11.8754, G loss: 21.4311
Epoch 15: Train D loss: -12.0360, G loss: 22.1997
Epoch 16: Train D loss: -12.3443, G loss: 21.8415
Epoch 17: Train D loss: -12.4492, G loss: 22.3451
Epoch 18: Train D loss: -12.4704, G loss: 23.1174
Epoch 19: Train D loss: -12.0635, G loss: 24.3485
image
Epoch 20: Train D loss: -11.5159, G loss: 23.7863
Epoch 21: Train D loss: -10.8694, G loss: 23.1774
Epoch 22: Train D loss: -11.7171, G loss: 23.6735
Epoch 23: Train D loss: -12.1799, G loss: 24.5387
Epoch 24: Train D loss: -11.2967, G loss: 24.4599
Epoch 25: Train D loss: -9.2917, G loss: 25.2789
Epoch 26: Train D loss: -11.7295, G loss: 24.9656
Epoch 27: Train D loss: -11.9890, G loss: 25.1133
Epoch 28: Train D loss: -11.0419, G loss: 26.9544
Epoch 29: Train D loss: -11.4329, G loss: 27.7644

......

image
Epoch 280: Train D loss: -5.3110, G loss: 45.2193
Epoch 281: Train D loss: -5.3459, G loss: 46.8995
Epoch 282: Train D loss: -5.4012, G loss: 45.6606
Epoch 283: Train D loss: -5.6629, G loss: 47.7304
Epoch 284: Train D loss: -6.0067, G loss: 47.8233
Epoch 285: Train D loss: -5.9803, G loss: 45.2547
Epoch 286: Train D loss: -5.6341, G loss: 48.4564
Epoch 287: Train D loss: -6.2482, G loss: 47.1421
Epoch 288: Train D loss: -5.5349, G loss: 46.8103
Epoch 289: Train D loss: -6.0081, G loss: 47.4786
image
Epoch 290: Train D loss: -6.1895, G loss: 49.2255
Epoch 291: Train D loss: -5.8228, G loss: 46.5874
Epoch 292: Train D loss: -6.7193, G loss: 50.4547
Epoch 293: Train D loss: -6.9497, G loss: 49.2031
Epoch 294: Train D loss: -6.4045, G loss: 49.5813
Epoch 295: Train D loss: -6.5181, G loss: 49.3917
Epoch 296: Train D loss: -5.3349, G loss: 49.1568
Epoch 297: Train D loss: -6.2215, G loss: 48.8781
Epoch 298: Train D loss: -6.0418, G loss: 50.5765
Epoch 299: Train D loss: -5.4949, G loss: 49.0278
image

Similarly, observe the loss curves and the distribution of D's parameters.

loss_plot(d_loss_hist, g_loss_hist)
image
show_d_params(D)
image

Assignment:

Compare the images generated by the WGAN and WGAN-GP generators: their quality at the same epoch (or the number of epochs needed to reach comparable quality), their loss curves, and the parameter distributions of D. What differences do you observe?

Answer:

  1. WGAN-GP converges faster at the same epoch;
  2. The loss curve of WGAN shows more stable but slower convergence, while WGAN-GP converges faster but still fluctuates after convergence;
  3. WGAN uses weight clipping to forcibly bound the discriminator's gradients everywhere in the input domain, so the parameters are clipped after every update during training;
  4. WGAN-GP uses a gradient penalty instead, which constrains the L2 norm of the gradient with respect to the input to stay close to 1, alleviating the vanishing and exploding gradient problems during training.
