5 Kaggle Data Sets for Training GANs

Generative adversarial networks (GANs) are a class of deep neural network models, introduced by Ian Goodfellow in 2014, that are used to generate synthetic data. GANs have been used for a wide variety of tasks, including improving astronomical images, up-scaling the resolution of old video games and, most famously, ‘deepfakes’, which involve human image synthesis. In this post, I will walk through some interesting data sets that can be used to train GAN models. This catalogue can serve as a starting point for anyone interested in building GAN models.

Let’s get started!

To start, let’s briefly go over the concepts behind GAN models. A GAN is composed of two competing neural networks: a generator and a discriminator. The generator learns to turn random noise into synthetic data; for images it is typically a convolutional network that up-samples the noise with transposed convolutions. The discriminator, typically an ordinary convolutional classifier for image data, learns to distinguish real data from fake data. As training proceeds, the discriminator gets better at telling real and fake data apart. The generator, which is trained to fool the discriminator, gets better at producing realistic data.
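
The competition between the two networks can be made concrete with the standard binary cross-entropy losses. The snippet below is a minimal, framework-free sketch. It assumes the discriminator outputs a probability that its input is real.

The generator side uses the "non-saturating" loss suggested in the original GAN paper: the generator maximizes log D(G(z)) instead of minimizing log(1 - D(G(z))). The function names are hypothetical. In a real GAN these losses would be computed on network outputs inside a framework such as PyTorch or TensorFlow.

```python
import math
from typing import Sequence

EPS = 1e-12  # keeps log() away from 0 for probabilities of exactly 0 or 1


def bce(p: float, target: float) -> float:
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """D is rewarded for outputting 1 on real samples and 0 on generated ones."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating loss: G is rewarded when D labels its samples as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
print(round(discriminator_loss([0.5], [0.5]), 4))  # 1.3863  (= 2 ln 2)
print(round(generator_loss([0.5]), 4))             # 0.6931  (= ln 2)
```

A confident discriminator (say 0.9 on real samples and 0.1 on fakes) lowers its own loss and raises the generator's. Training alternates between a discriminator update and a generator update, so each network keeps adapting to the other.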

Now, let’s get into some interesting data sets.

Abstract Art Data

Source

This data set contains 2782 abstract art images scraped from wikiart.org, including real works by Van Gogh, Dali, Picasso, and others. It can be used to train a GAN to generate synthetic images of abstract art.
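
Images like these are usually rescaled before they are fed to a GAN. The sketch below assumes a DCGAN-style generator whose output layer is a tanh. Under that assumption, 8-bit pixel values in [0, 255] are mapped to [-1, 1] for training and mapped back for viewing. The helper names are hypothetical. Image decoding (for example with Pillow) is left out to keep the sketch dependency-free.

```python
from typing import List


def to_tanh_range(pixels: List[int]) -> List[float]:
    """Map 8-bit pixel values [0, 255] to [-1.0, 1.0], the output range of tanh."""
    return [p / 127.5 - 1.0 for p in pixels]


def to_uint8(values: List[float]) -> List[int]:
    """Map generator outputs in [-1.0, 1.0] back to 8-bit pixels, clipping overshoot."""
    out = []
    for v in values:
        v = min(max(v, -1.0), 1.0)  # clip anything outside the tanh range
        out.append(round((v + 1.0) * 127.5))
    return out


print(to_tanh_range([0, 255]))      # [-1.0, 1.0]
print(to_uint8([-1.0, 0.0, 1.0]))   # [0, 128, 255]
```

Matching the data range to the generator's output activation means real and generated images occupy the same numeric range. This matters because the discriminator compares the two directly.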

High-Content Screening with C. elegans

Source

This data set contains images from screens for novel antibiotics that use the roundworm C. elegans. The images show roundworms infected with the pathogen Enterococcus faecalis. Some of the infected roundworms were left untreated, while others were treated with the antibiotic ampicillin. For those interested in applying GANs to a drug discovery problem, this is a great place to start!

Pulmonary Chest X-ray Abnormalities

Source

This data set contains chest X-ray images that have been clinically labeled by radiologists. There are 336 images showing tuberculosis and 326 images of healthy individuals. It is a great data source for those who want to get their feet wet using GANs for medical image synthesis.

Fakefaces

Source

This data set actually contains synthetic images of human faces generated by GANs. The images were scraped from the website This Person Does Not Exist, which shows a new GAN-generated fake face each time you refresh the page. It is a great data set to start with for generating synthetic images with GANs.

Glasses or No Glasses

Source

This data set contains images of faces with glasses and images of faces without glasses. Although these images were themselves generated with GANs, they can also serve as training data for generating additional synthetic images.

Conclusions

To summarize, in this post we discussed five Kaggle data sets that can be used to train GAN models to generate synthetic images. These data sources should be a good starting point for getting your feet wet with GANs. If you would like some code to get you started, check out this Intro to GANs Kaggle notebook. I hope you found this post useful and interesting. Thank you for reading!

Original article: https://towardsdatascience.com/5-kaggle-data-sets-for-training-gans-33dc2e035161
