GAN to Generate Images of Cars
Hello there! This is my story of making a GAN that would generate images of cars, with PyTorch.
First of all, let me tell you what a GAN is, at least as far as I understand it.
A Generative Adversarial Network (GAN) is a network we use to generate something (images, sound, anything). What we challenge here is the ability of the machine to imagine something. (For the paragraphs below, keep in mind that a GAN has two networks: a generator and a discriminator.)
How do we make a machine imagine something?
Let's say we're trying to make the machine imagine some form of data. As usual, we'll start with it (our generator) producing random noise: data that doesn't make any sense at all.
We then feed it to another network that is trained specifically to distinguish between fake and real data (our discriminator). This network tells us how fake the generated data is, and using that signal we update the generation process so that it becomes more and more realistic over training.
Also note that we'll be training our discriminator at the same time (of course, we freeze the generator for a moment), so that it gets better at distinguishing real images from fake ones as our generator improves.
You can imagine it as two people playing a game. One person knows a target picture that the other has to draw; the other just draws pictures. Seeing each drawing, the first person gives the second some feedback on how close it looks to the target, based on which the second makes changes and gets closer and closer to the ideal picture.
In the simplest terms, that's how it works (feel free to correct me, though).
Our Dataset
I looked through Kaggle for images of cars, and the dataset I found most suitable was the Stanford Cars dataset.
However, this dataset isn't ideal if you want to generate highly accurate images of cars (the images are very different from one another, and there are things happening in the background apart from just the car).
I won't say much about processing the data into a PyTorch-usable form, as it isn't really that important. I've just resized all images to 256x256 (try smaller sizes for better images, though!) and converted them into typical 3-channel image tensors.
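For reference, the preprocessing can be as simple as the sketch below. The folder path, batch size, and normalization values here are placeholders, not necessarily what the notebook uses.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder

image_size = 256   # 256x256, as in the post; smaller sizes train faster
batch_size = 64    # placeholder: the post doesn't state a batch size

# Resize every image and turn it into a 3-channel tensor scaled to [-1, 1]
transform = transforms.Compose([
    transforms.Resize((image_size, image_size)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

# "cars/" is a placeholder path to the extracted Stanford Cars images
dataset = ImageFolder("cars/", transform=transform)
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)
```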
Let's jump right into our model architecture!
The Generator
We're working on image generation, so our GAN is going to be a deep convolutional GAN (DCGAN).
For the generator, I've taken an input vector of size 128 and then applied about 7 transposed convolution layers to finally generate an image of size 3x256x256 (an RGB 256x256 image).
Here's the generator for reference:
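A minimal sketch of a DCGAN-style generator along these lines: a 128-dimensional latent vector (shaped 128x1x1) is upsampled through 7 transposed convolution layers into a 3x256x256 image. The channel widths are one reasonable choice, not necessarily the exact ones used in the notebook.

```python
import torch.nn as nn

latent_size = 128  # size of the input noise vector

generator = nn.Sequential(
    # 128 x 1 x 1 -> 1024 x 4 x 4
    nn.ConvTranspose2d(latent_size, 1024, kernel_size=4, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(1024), nn.ReLU(True),
    # 1024 x 4 x 4 -> 512 x 8 x 8
    nn.ConvTranspose2d(1024, 512, 4, 2, 1, bias=False),
    nn.BatchNorm2d(512), nn.ReLU(True),
    # 512 x 8 x 8 -> 256 x 16 x 16
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
    nn.BatchNorm2d(256), nn.ReLU(True),
    # 256 x 16 x 16 -> 128 x 32 x 32
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
    nn.BatchNorm2d(128), nn.ReLU(True),
    # 128 x 32 x 32 -> 64 x 64 x 64
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(True),
    # 64 x 64 x 64 -> 32 x 128 x 128
    nn.ConvTranspose2d(64, 32, 4, 2, 1, bias=False),
    nn.BatchNorm2d(32), nn.ReLU(True),
    # 32 x 128 x 128 -> 3 x 256 x 256, values in [-1, 1]
    nn.ConvTranspose2d(32, 3, 4, 2, 1, bias=False),
    nn.Tanh(),
)
```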
So we use an operation called transposed convolution, which essentially helps us transform this latent vector into a tensor of our image size.
Transposed convolutions simply work the other way around compared to convolutions: instead of decreasing the size of the image, they increase it.
We take a kernel matrix and slide it over each of the input image pixels, multiplying the kernel by each pixel's value and mapping the result to the output image (essentially noting each multiplication on the output). If we encounter an overlap in the output image while sliding this kernel over the input, we simply sum the overlapping values.
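For example, a single transposed convolution with kernel size 4, stride 2 and padding 1 doubles the spatial size (the numbers here are just for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)                        # a 16-channel 32x32 feature map
up = nn.ConvTranspose2d(16, 8, kernel_size=4, stride=2, padding=1)
print(up(x).shape)                                    # torch.Size([1, 8, 64, 64])
```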
If you'd like to learn more about transposed convolutions, you can check this link (this is the blog I used to understand transposed convolutions better).
The Discriminator
The discriminator helps train our generator; all it has to do is predict whether an image is real or fake (a 1 or a 0 in this case). I've used a network consisting only of convolution layers: it takes an input image of size 3x256x256, runs it through 7 convolutional layers, and ends with a 1x1x1 tensor, which we flatten and pass through a sigmoid activation to get values ranging from 0 to 1.
Here's the code for the discriminator:
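A minimal sketch of such a discriminator: 7 convolution layers shrink the 3x256x256 input down to 1x1x1, followed by a flatten and a sigmoid. Again, the channel widths are illustrative rather than the notebook's exact values.

```python
import torch.nn as nn

discriminator = nn.Sequential(
    # 3 x 256 x 256 -> 32 x 128 x 128
    nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1, bias=False),
    nn.LeakyReLU(0.2, inplace=True),
    # 32 x 128 x 128 -> 64 x 64 x 64
    nn.Conv2d(32, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64), nn.LeakyReLU(0.2, inplace=True),
    # 64 x 64 x 64 -> 128 x 32 x 32
    nn.Conv2d(64, 128, 4, 2, 1, bias=False),
    nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
    # 128 x 32 x 32 -> 256 x 16 x 16
    nn.Conv2d(128, 256, 4, 2, 1, bias=False),
    nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
    # 256 x 16 x 16 -> 512 x 8 x 8
    nn.Conv2d(256, 512, 4, 2, 1, bias=False),
    nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
    # 512 x 8 x 8 -> 1024 x 4 x 4
    nn.Conv2d(512, 1024, 4, 2, 1, bias=False),
    nn.BatchNorm2d(1024), nn.LeakyReLU(0.2, inplace=True),
    # 1024 x 4 x 4 -> 1 x 1 x 1, then flatten to a single real/fake score in [0, 1]
    nn.Conv2d(1024, 1, 4, 1, 0, bias=False),
    nn.Flatten(),
    nn.Sigmoid(),
)
```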
Training our GAN
Ideally, our discriminator should predict that all the images from our dataset are real, so it should output a 1 for every image in the training dataset. It should also say that every image from our generator (the generated images) is fake (a 0 prediction). Keeping this in mind, we train the discriminator.
Here's the code for discriminator training:
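A sketch of one discriminator update, assuming binary cross-entropy loss and a separate Adam optimizer `opt_d` for the discriminator (standard DCGAN choices; the notebook's exact setup may differ):

```python
import torch
import torch.nn.functional as F

def train_discriminator(real_images, generator, discriminator, opt_d, latent_size=128):
    """One discriminator step: push real images toward 1 and generated images toward 0."""
    opt_d.zero_grad()
    batch_size = real_images.size(0)

    # Real images should be scored as 1
    real_preds = discriminator(real_images)
    real_loss = F.binary_cross_entropy(real_preds, torch.ones_like(real_preds))

    # Freshly generated (fake) images should be scored as 0;
    # detach() freezes the generator during this step
    noise = torch.randn(batch_size, latent_size, 1, 1, device=real_images.device)
    fake_images = generator(noise).detach()
    fake_preds = discriminator(fake_images)
    fake_loss = F.binary_cross_entropy(fake_preds, torch.zeros_like(fake_preds))

    loss = real_loss + fake_loss
    loss.backward()
    opt_d.step()
    return loss.item()
```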
Coming to our generator: it should make images so well that the discriminator is fooled. So the discriminator should ideally return a 1 for each generated image; this becomes our target, and our score is whatever the discriminator actually returns for our generated images (a value ranging from 0 to 1).
Here's the code for generator training:
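And a sketch of the matching generator update, with the same caveats (`opt_g` is assumed to be the generator's own optimizer):

```python
import torch
import torch.nn.functional as F

def train_generator(generator, discriminator, opt_g, batch_size, device, latent_size=128):
    """One generator step: try to make the discriminator output 1 for generated images."""
    opt_g.zero_grad()

    noise = torch.randn(batch_size, latent_size, 1, 1, device=device)
    fake_images = generator(noise)

    # The target is 1 (i.e. "real") for every generated image
    preds = discriminator(fake_images)
    loss = F.binary_cross_entropy(preds, torch.ones_like(preds))

    loss.backward()
    opt_g.step()
    return loss.item()
```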
We train both our generator and discriminator in tandem to fit our dataset: as the generator gets better at generating images, the discriminator should get better at telling them apart, in turn pushing the generator to capture finer details of the images.
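Putting the sketches above together, the full loop alternates the two updates on every batch. The Adam hyperparameters below are the usual DCGAN defaults, given here as a starting point rather than the exact values from the notebook.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator, discriminator = generator.to(device), discriminator.to(device)

# One optimizer per network; lr=2e-4 and betas=(0.5, 0.999) are common DCGAN defaults
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))

for epoch in range(80):                        # the post trains for roughly 80-90 epochs
    for real_images, _ in loader:              # `loader` is the DataLoader from earlier
        real_images = real_images.to(device)
        d_loss = train_discriminator(real_images, generator, discriminator, opt_d)
        g_loss = train_generator(generator, discriminator, opt_g,
                                 real_images.size(0), device)
```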
Results
So we started off with our first batch of random noise that looked like this:
After about 20 epochs:
Weird, squiggly images that kind of resemble cars.
After about 50 epochs:
They still look far off from real cars.
After about 80 epochs we get this:
To be fair, they don't really look as great as real-life images of cars, but it's getting somewhere.
I could run it for about 8 more epochs, and here's what I ended up with:
We're getting something similar to cars, at least.
The best image of a car that I could isolate from the above images was this one:
And this one:
Not really great, but not bad either.
End Notes
While I didn't end up with really realistic-looking cars, I'm happy that I could get this far. The dataset I've used has very different images of cars and was made for an entirely different purpose, so it's cool to see it get this far.
Citations
The dataset is hosted on Kaggle and was originally made for this paper:
3D Object Representations for Fine-Grained Categorization
Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei
4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013
Thanks to Kaggle and the makers of this dataset for letting me experiment around.
Notebooks saved on jovian.ml.
That's about it for my story. Hope it was informative.
Translated from: https://medium.com/swlh/gan-to-generate-images-of-cars-5f706ca88da