PyTorch tutorial -- DCGAN

  • About GAN

    1. A GAN is a framework for teaching a deep learning model to capture the training data’s distribution so we can generate new data from that same distribution.
    2. The job of the generator is to spawn ‘fake’ images that look like the training images.
    3. The job of the discriminator is to look at an image and output whether it is a real training image or a fake image from the generator.
    4. The equilibrium of this game is reached when the generator produces perfect fakes that look as if they came directly from the training data, and the discriminator is left to always guess at 50% confidence whether the generator output is real or fake.
    5. D(x) is the discriminator network which outputs the (scalar) probability that x came from training data rather than the generator. D(x) can also be thought of as a traditional binary classifier.
    6. G(z) represents the generator function, which maps the latent vector z to data space. The goal of G is to estimate the distribution that the training data comes from (p_data) so it can generate fake samples from that estimated distribution (p_g).
    7. D and G play a minimax game in which D tries to maximize the probability it correctly classifies reals and fakes (log D(x)), and G tries to minimize the probability that D will predict its outputs are fake (log(1−D(G(z)))).
    8. In theory, the solution to this minimax game is where p_g = p_data, and the discriminator guesses randomly whether the inputs are real or fake. However, the convergence theory of GANs is still being actively researched, and in reality models do not always train to this point.
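    For reference, the game described in points 7 and 8 is usually written as the value function below. The symbol p_z for the prior over the latent vector z is an assumed name; these notes do not give one.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```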
  • define network – generator and discriminator

    1. import torch.nn as nn
      note: choose the layer type, then work out kernel size, stride, and padding
      best practice: annotate input size -> state size -> … -> state size -> output size
      note: nn.ReLU(True) and nn.LeakyReLU(0.2, inplace=True) operate in place; nn.Tanh() and nn.Sigmoid() are the output activations
    2. instantiate the networks and move them to the device, then initialize the weights
      net.apply(weights_init) initializes the weights of specific layer types
      weights_init takes a module as its parameter; apply calls it on every submodule and on the model itself
    3. instantiate the optimizers
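    The kernel size, stride, and padding calculation from step 1 can be checked before building any layers. The sketch below uses the standard output-size formulas for nn.Conv2d and nn.ConvTranspose2d (with dilation 1 and no output padding). The layer settings are assumptions chosen to resemble a DCGAN that maps a 1x1 latent vector to a 64x64 image; they are not taken from these notes.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of nn.Conv2d along one dimension (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding):
    """Output size of nn.ConvTranspose2d (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel


# Assumed layer settings, written as (kernel, stride, padding).
# Generator: 1x1 latent vector -> 64x64 image.
gen_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
# Discriminator: 64x64 image -> 1x1 score.
disc_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

gen_sizes = [1]
for k, s, p in gen_layers:
    gen_sizes.append(conv_transpose_out(gen_sizes[-1], k, s, p))

disc_sizes = [64]
for k, s, p in disc_layers:
    disc_sizes.append(conv_out(disc_sizes[-1], k, s, p))

# Print the "input size -> state size -> ... -> output size" chains.
print("G:", " -> ".join(str(n) for n in gen_sizes))
print("D:", " -> ".join(str(n) for n in disc_sizes))
```

    Writing the chain out this way makes a wrong stride or padding visible before the first forward pass fails with a shape error.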
  • loss function

    1. define: criterion=nn.BCELoss()
    2. call:
  • data processing
    create dataset -> dataloader -> read -> visualize

    1. import torchvision.datasets as dset; dset.ImageFolder
    2. import
    3. next(iter(dataloader))
    4. import matplotlib.pyplot as plt; plt.figure, axis, title, imshow
  • parameters to argparse

    1. First, complete the program logic, with the parameters defined as plain variables at the beginning of the program
    2. Finally, convert those parameters into argparse arguments
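    A minimal sketch of step 2, assuming a handful of typical DCGAN hyperparameters. The argument names and default values here are illustrative assumptions, not the list used by the original program.

```python
import argparse


def build_parser():
    """Collect the former top-of-file variables as command-line options."""
    parser = argparse.ArgumentParser(description="DCGAN training options (illustrative)")
    # All names and defaults below are assumptions for this sketch.
    parser.add_argument("--dataroot", type=str, default="data", help="root folder of the image dataset")
    parser.add_argument("--batch-size", type=int, default=128, help="mini-batch size")
    parser.add_argument("--image-size", type=int, default=64, help="images are resized to this size")
    parser.add_argument("--nz", type=int, default=100, help="length of the latent vector z")
    parser.add_argument("--num-epochs", type=int, default=5, help="number of training epochs")
    parser.add_argument("--lr", type=float, default=0.0002, help="learning rate for both optimizers")
    parser.add_argument("--beta1", type=float, default=0.5, help="beta1 for the Adam optimizers")
    return parser


# Passing an explicit list keeps the example independent of sys.argv;
# a real script would call parse_args() with no arguments.
args = build_parser().parse_args(["--batch-size", "64", "--lr", "0.0001"])
print(args.batch_size, args.lr, args.nz)
```

    Dashes in option names become underscores on the parsed namespace, so --batch-size is read as args.batch_size.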
  • training
    Be mindful that training GANs is somewhat of an art form, as incorrect hyperparameter settings lead to mode collapse with little explanation of what went wrong. Here, we will closely follow Algorithm 1 from Goodfellow’s paper, while abiding by some of the best practices shown in ganhacks.

    • Discriminator:
      “update the discriminator by ascending its stochastic gradient”, using separate mini-batches for real and fake samples
    1. Practically, we want to maximize log(D(x))+log(1−D(G(z))).
    2. First, we will construct a batch of real samples from the training set, forward pass through D, calculate the loss (log(D(x))), then calculate the gradients in a backward pass.
    3. Secondly, we will construct a batch of fake samples with the current generator, forward pass this batch through D, calculate the loss (log(1−D(G(z)))), and accumulate the gradients with a backward pass.
    4. Now, with the gradients accumulated from both the all-real and all-fake batches, we call a step of the Discriminator’s optimizer.
    • Generator:
    1. In the original formulation, the Generator is trained by minimizing log(1−D(G(z))) in an effort to generate better fakes. Goodfellow showed that this does not provide sufficient gradients, especially early in the learning process.
    2. As a fix, we instead wish to maximize log(D(G(z))). In the code we accomplish this by: classifying the Generator output from Part 1 (the Discriminator update) with the Discriminator, computing G’s loss using real labels as ground truth (GT), computing G’s gradients in a backward pass, and finally updating G’s parameters with an optimizer step. It may seem counter-intuitive to use the real labels as GT labels for the loss function, but this allows us to use the log(x) part of the BCELoss (rather than the log(1−x) part), which is exactly what we want.
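    The label trick in step 2 and the gradient problem in step 1 can both be checked with plain arithmetic. The sketch below reimplements binary cross-entropy for a single value rather than calling nn.BCELoss, and it assumes D ends in a sigmoid so that the gradients can be written with respect to D's pre-sigmoid output. The numbers 0.1 and -4.0 are arbitrary illustrative choices.

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for one prediction in (0, 1) and a target of 0 or 1."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


d_of_fake = 0.1  # hypothetical D(G(z)): D is fairly sure the sample is fake

# A real label (1) selects the -log(x) half of BCE, so minimizing this
# loss is the same as maximizing log(D(G(z))).
loss_with_real_label = bce(d_of_fake, 1.0)
# A fake label (0) selects the -log(1 - x) half instead.
loss_with_fake_label = bce(d_of_fake, 0.0)

# Gradients with respect to D's pre-sigmoid output when D confidently
# rejects the fake, which is typical early in training.
logit = -4.0
p = sigmoid(logit)
grad_saturating = -p            # d/d(logit) of log(1 - sigmoid(logit))
grad_non_saturating = -(1 - p)  # d/d(logit) of -log(sigmoid(logit))

print(f"loss with real label: {loss_with_real_label:.4f}")
print(f"loss with fake label: {loss_with_fake_label:.4f}")
print(f"|grad| saturating: {abs(grad_saturating):.4f}, non-saturating: {abs(grad_non_saturating):.4f}")
```

    When D rejects the fake with high confidence, the original objective passes back a gradient close to zero, while the real-label version passes back a gradient with magnitude close to one.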
    • About code
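    1. keep lists to track training progress (e.g. the G and D losses)
    2. use enumerate to read batches from the dataloader
    3. zero the gradients, move the data to CUDA, pass the data through the network, set up the label vector
    4. backward order: errD_real -> errD_fake (using fake.detach()) -> errG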
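    The steps above can be sketched as one short loop. This is a toy version under stated assumptions: small fully connected networks on 8-dimensional random vectors stand in for the convolutional DCGAN networks and the image dataloader, and the learning rate and betas are illustrative values. Only the order of operations is meant to mirror the notes.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-ins (assumptions): MLPs instead of the DCGAN conv networks.
nz, data_dim, batch_size = 4, 8, 16
netG = nn.Sequential(nn.Linear(nz, 16), nn.ReLU(True),
                     nn.Linear(16, data_dim), nn.Tanh()).to(device)
netD = nn.Sequential(nn.Linear(data_dim, 16), nn.LeakyReLU(0.2, inplace=True),
                     nn.Linear(16, 1), nn.Sigmoid()).to(device)

criterion = nn.BCELoss()
optimizerD = optim.Adam(netD.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=0.0002, betas=(0.5, 0.999))
real_label, fake_label = 1.0, 0.0

# Random tensors in [-1, 1] stand in for batches from a real dataloader.
dataloader = [torch.rand(batch_size, data_dim) * 2 - 1 for _ in range(3)]
G_losses, D_losses = [], []  # lists that track training progress

for i, real in enumerate(dataloader):
    # (1) Update D: maximize log(D(x)) + log(1 - D(G(z)))
    netD.zero_grad()
    real = real.to(device)
    label = torch.full((real.size(0),), real_label, device=device)
    errD_real = criterion(netD(real).view(-1), label)
    errD_real.backward()

    noise = torch.randn(real.size(0), nz, device=device)
    fake = netG(noise)
    label.fill_(fake_label)
    # detach() keeps this backward pass from reaching G's parameters.
    errD_fake = criterion(netD(fake.detach()).view(-1), label)
    errD_fake.backward()  # accumulates on top of errD_real's gradients
    optimizerD.step()

    # (2) Update G: maximize log(D(G(z))) by scoring fakes against real labels
    netG.zero_grad()
    label.fill_(real_label)
    errG = criterion(netD(fake).view(-1), label)  # fresh pass through updated D
    errG.backward()
    optimizerG.step()

    D_losses.append((errD_real + errD_fake).item())
    G_losses.append(errG.item())

print(D_losses, G_losses)
```

    The generator step reuses the same fake batch without detach(), so its gradients flow back through D into G, but only optimizerG.step() is called and D's weights stay unchanged in that step.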
  • Visualization

    1. First, we will see how D and G’s losses changed during training.
      plt.figure, title, plot, xlabel, ylabel, legend, show
      in theory D(x) and D(G(z)), rather than the two losses, converge to 0.5 as G gets better; at that point each BCE term equals −log(0.5) ≈ 0.693
    2. Second, we will visualize G’s output on the fixed_noise batch for every epoch.
      import matplotlib.animation as animation; animation.ArtistAnimation
      from IPython.display import HTML; HTML(ani.to_jshtml()) for an inline animation in Jupyter
    3. And third, we will look at a batch of real data next to a batch of fake data from G.
      real_batch = next(iter(dataloader))
      plt.figure(figsize=(15,15)) -> plt.subplot(1,2,1) -> set plt.axis('off') and plt.title('real or fake') -> plt.imshow
      note: plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64],padding=5,normalize=True).cpu(),(1,2,0)))
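    The (1,2,0) argument in the note above is needed because make_grid returns a channels-first tensor (C, H, W), while plt.imshow expects channels last (H, W, C) for an RGB image. A minimal sketch of that axis reordering, using a NumPy array of arbitrary size as a stand-in for the grid tensor:

```python
import numpy as np

# Stand-in for vutils.make_grid(...).cpu(): 3 colour channels,
# height 20, width 30 (sizes chosen only for illustration).
grid_chw = np.zeros((3, 20, 30), dtype=np.float32)
grid_chw[0] = 1.0  # fill the red channel only

# Axes (1, 2, 0) move the channel axis to the end: (C, H, W) -> (H, W, C).
grid_hwc = np.transpose(grid_chw, (1, 2, 0))

print(grid_chw.shape, "->", grid_hwc.shape)
print(grid_hwc[0, 0])  # RGB value of the top-left pixel
```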
  • Where to Go Next?
    We have reached the end of our journey, but there are several places you could go from here. You could:

    1. Train for longer to see how good the results get
    2. Modify this model to take a different dataset and possibly change the size of the images and the model architecture
    3. Check out some other cool GAN projects
    4. Create GANs that generate music
