Fixing the PyTorch DataLoader
Hacking data science workflows
I came across an interesting problem recently. A teammate and I were working on a series of Deep Learning experiments that involved an image dataset spanning hundreds of gigabytes. Being the indecisive goof I am, I wanted to understand whether the data was suited to a plethora of classification tasks, all spanning different configurations of the dataset.
This led us down a PyTorch DataLoader-shaped rabbit hole for hours, before we nearly gave up in frustration. Thankfully though, the only thing more frustrating than writing scaffolding code is waiting for a virtual machine to finish copying hordes of files across arbitrary directories.
Fortunately, we soon stumbled upon a solution and decided that it was time to give the DataLoader class a facelift. We took our messy scaffolding code, cleaned it up, and added the ability to not only dynamically label training data, but also specify subsets, perform custom pre-processing actions that feed into the DataLoader itself, and much more. A few hours of caffeine-induced code later, BetterLoader was born.
I’d be lying if I said I wasn’t at least kind of proud of the logo I whipped up.

BetterLoader gets rid of the default PyTorch DataLoader structure entirely. You can now store your files in a single, flat directory, and use the power of JSON configuration files to load your data in a ton of different ways. For example, here are a few lines of code that let you load up a subset of a dataset with dynamic labels:
from betterloader import BetterLoader

# Flat image directory plus the JSON index that maps class labels to file names
index_json = './examples/index.json'
basepath = "./examples/sample_dataset/"

# Build the loader and fetch the segmented DataLoaders along with their sizes
loader = BetterLoader(basepath=basepath, index_json_path=index_json)
loaders, sizes = loader.fetch_segmented_dataloaders(batch_size=2, transform=None)
The cool part? Our index.json just contains a list of key-value pairs, where the key is the class label, and the value is a list of associated file names. However, the BetterLoader function that reads the index file can be customised, and I’ve been able to use this library with regex, boolean conditions, and even MongoDB calls.
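To make that concrete, here is a minimal sketch of what such an index could look like and one way you might generate it with nothing but the Python standard library. The directory layout, file names, class labels, and regex below are hypothetical; the only thing taken from the article is the overall shape of index.json, a mapping from class label to a list of file names.

import json
import re
from pathlib import Path

# Hypothetical example: build an index.json for a flat directory whose file
# names happen to encode the class, e.g. "cat_001.jpg" or "dog_042.jpg".
basepath = Path("./examples/sample_dataset/")
pattern = re.compile(r"^(?P<label>[a-z]+)_\d+\.jpg$")

# Map each class label to the list of file names that belong to it
index = {}
for path in basepath.iterdir():
    match = pattern.match(path.name)
    if match:
        index.setdefault(match.group("label"), []).append(path.name)

with open("./examples/index.json", "w") as f:
    json.dump(index, f, indent=2)

Because the index is plain JSON, swapping the regex for a boolean filter or a database query only changes how this dictionary gets built, not how the loader consumes it.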
My latest BetterLoader workflow involves checking if an image needs to be loaded, fetching crop centres from a MongoDB instance, creating a bunch of crops, and then feeding those crops to the loader. Instead of creating different scaffolding code every time though, I just use the BetterLoader.
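As a rough illustration of that kind of pre-processing, here is a hedged sketch of the "fetch crop centres from MongoDB and create crops" step, written against plain pymongo and Pillow rather than any BetterLoader-specific hook (the docs cover how custom pre-processing actually plugs into the loader). The connection string, database and collection names, document fields, and crop size are all invented for illustration.

from pymongo import MongoClient
from PIL import Image

# Hypothetical setup: crop centres for each image are stored as documents like
# {"filename": "cat_001.jpg", "centres": [[120, 80], [300, 210]]}
client = MongoClient("mongodb://localhost:27017")
collection = client["experiments"]["crop_centres"]

CROP_SIZE = 128  # side length of each square crop, chosen arbitrarily here

def crops_for(filename, basepath="./examples/sample_dataset/"):
    """Return a list of PIL crops for one image, or [] if nothing is stored for it."""
    doc = collection.find_one({"filename": filename})
    if doc is None:
        return []  # no crop centres recorded, so this image can be skipped
    image = Image.open(basepath + filename)
    half = CROP_SIZE // 2
    return [
        image.crop((x - half, y - half, x + half, y + half))
        for x, y in doc["centres"]
    ]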
So, yeah. That’s the BetterLoader: the PyTorch DataLoader on steroids. It’s still early days, but we’re super excited at the prospect of making our lives, and hopefully the lives of other Data Scientists, way easier. If you think BetterLoader sounds useful and you’ve somehow avoided all the links to the docs that I’ve sprinkled throughout this article, you can find the source code on GitHub here, and the PyPI page here.
We’re also going to be opening up a ton of tickets to both add support for unsupervised deep learning methodologies, and to fix the plethora of issues that’ll eventually crop up since a caffeine-fueled binge seldom results in a perfectly stable library. We’d love for you to drop us a star, get involved, or just to stay tuned!
Translated from: https://towardsdatascience.com/fixing-the-pytorch-dataloader-990b336b8e5a