torch.utils.data.DataLoader() official documentation (translated)

class torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=None, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, multiprocessing_context=None, generator=None, *, prefetch_factor=2, persistent_workers=False, pin_memory_device='')

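Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset.

DataLoader supports both map-style and iterable-style datasets, with single- or multi-process loading, customizable loading order, and optional automatic batching (collation) and memory pinning.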
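The two dataset styles mentioned above can be illustrated with a minimal sketch. The class names SquaresMapDataset and SquaresIterableDataset are hypothetical and exist only for this illustration; a map-style dataset implements __getitem__ and __len__, while an iterable-style dataset implements __iter__.

```python
import torch
from torch.utils.data import DataLoader, Dataset, IterableDataset


class SquaresMapDataset(Dataset):
    """Map-style (hypothetical example class): indexable, with a known length."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.tensor(idx * idx)


class SquaresIterableDataset(IterableDataset):
    """Iterable-style (hypothetical example class): only yields samples in order."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield torch.tensor(i * i)


# Both loaders run in the main process (num_workers=0) without shuffling.
map_batches = [b.tolist() for b in DataLoader(SquaresMapDataset(6), batch_size=4)]
iter_batches = [b.tolist() for b in DataLoader(SquaresIterableDataset(6), batch_size=4)]
print(map_batches)   # [[0, 1, 4, 9], [16, 25]]
print(iter_batches)  # [[0, 1, 4, 9], [16, 25]]
```

With an iterable-style dataset the loader cannot index samples, so the order comes from the dataset itself rather than from a sampler.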

Parameters:

  • dataset (Dataset) – dataset from which to load the data.

  • batch_size (int, optional) – how many samples per batch to load (default: 1). This is the number of samples in each batch, not the number of batches.

  • shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).

  • sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.

  • batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.

  • num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)

  • collate_fn (Callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.

  • pin_memory (bool, optional) – If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the memory pinning example in the official documentation.

  • drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)

  • timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: 0) In practice, this sets how long the loader waits for a batch from the workers; if no batch arrives within that time, an error is raised.

  • worker_init_fn (Callable, optional) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None) Note: "seeding" here means setting the worker's random seed, so the function runs once in each worker process, after its seed has been set and before it starts loading data.

  • generator (torch.Generator, optional) – If not None, this RNG will be used by RandomSampler to generate random indexes and multiprocessing to generate base_seed for workers. (default: None) In other words, the same generator drives both the shuffle order and the base seed from which the worker seeds are derived.

  • prefetch_factor (int, optional, keyword-only arg) – Number of batches loaded in advance by each worker. 2 means there will be a total of 2 * num_workers batches prefetched across all workers. (default: 2)

  • persistent_workers (bool, optional) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. (default: False)

  • pin_memory_device (str, optional) – the data loader will copy Tensors into device pinned memory before returning them if pin_memory is set to true.
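The sampler and batch_sampler entries above describe options that cannot be combined freely. The following is a small sketch, using a toy TensorDataset invented for this illustration, of how the two differ and what happens when mutually exclusive options are mixed.

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, SequentialSampler, TensorDataset

# Toy dataset assumed for illustration: the integers 0..6.
dataset = TensorDataset(torch.arange(7))

# sampler: yields one index at a time; the loader groups them using batch_size.
by_sampler = DataLoader(dataset, batch_size=3, sampler=SequentialSampler(dataset))

# batch_sampler: yields a whole list of indices per batch, so batch_size,
# shuffle, sampler and drop_last are left at their defaults on the loader.
index_batches = BatchSampler(SequentialSampler(dataset), batch_size=3, drop_last=True)
by_batch_sampler = DataLoader(dataset, batch_sampler=index_batches)

sampler_batches = [x.tolist() for (x,) in by_sampler]
batch_sampler_batches = [x.tolist() for (x,) in by_batch_sampler]
print(sampler_batches)        # [[0, 1, 2], [3, 4, 5], [6]]
print(batch_sampler_batches)  # [[0, 1, 2], [3, 4, 5]]

# Combining a sampler with shuffle=True is rejected when the loader is built.
conflict_error = None
try:
    DataLoader(dataset, shuffle=True, sampler=SequentialSampler(dataset))
except ValueError as err:
    conflict_error = str(err)
print(conflict_error is not None)  # True
```

Here drop_last is given to the BatchSampler rather than to the DataLoader, which is why the second loader yields only the two full batches.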
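As a sketch of what a custom collate_fn can look like, the example below pads variable-length sequences so they can be stacked into one batch. The data and the function name pad_collate are made up for this illustration; padding with zeros is one common choice, not the only one.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Hypothetical toy data: sequences of different lengths, which the default
# collation cannot stack into a single tensor.
sequences = [
    torch.tensor([1, 2, 3]),
    torch.tensor([4]),
    torch.tensor([5, 6]),
    torch.tensor([7, 8, 9, 10]),
]


def pad_collate(samples):
    """Pad samples to the longest one in the batch and also return their lengths."""
    lengths = torch.tensor([len(s) for s in samples])
    padded = pad_sequence(samples, batch_first=True, padding_value=0)
    return padded, lengths


# A plain list works as a map-style dataset because it supports indexing and len().
loader = DataLoader(sequences, batch_size=2, collate_fn=pad_collate)
batches = [(padded.tolist(), lengths.tolist()) for padded, lengths in loader]
for padded, lengths in batches:
    print(padded, lengths)
# [[1, 2, 3], [4, 0, 0]] [3, 1]
# [[5, 6, 0, 0], [7, 8, 9, 10]] [2, 4]
```

Returning the lengths alongside the padded tensor lets later code tell real values from padding.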

Example:

"""
    Mini-batch training: split the data into small batches and train on one batch at a time.
    DataLoader wraps the dataset and yields one batch per iteration.
"""
import torch
import torch.utils.data as Data
 
BATCH_SIZE = 5
 
x = torch.linspace(1, 11, 11)
y = torch.linspace(11, 1, 11)
print(x)
print(y)
# Wrap the two tensors in a dataset
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    # Draw batch_size samples from the dataset on each iteration
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    # num_workers=2,
)
 
 
def show_batch():
    for epoch in range(3):
        for step, (batch_x, batch_y) in enumerate(loader):
            # training
            print("step:{}, batch_x:{}, batch_y:{}".format(step, batch_x, batch_y))


if __name__ == '__main__':
    show_batch()

Output:

tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])
tensor([11., 10.,  9.,  8.,  7.,  6.,  5.,  4.,  3.,  2.,  1.])
step:0, batch_x:tensor([ 6.,  3.,  4., 10., 11.]), batch_y:tensor([6., 9., 8., 2., 1.])
step:1, batch_x:tensor([5., 1., 8., 2., 9.]), batch_y:tensor([ 7., 11.,  4., 10.,  3.])
step:2, batch_x:tensor([7.]), batch_y:tensor([5.])
step:0, batch_x:tensor([10.,  7.,  4., 11.,  2.]), batch_y:tensor([ 2.,  5.,  8.,  1., 10.])
step:1, batch_x:tensor([3., 8., 6., 1., 9.]), batch_y:tensor([ 9.,  4.,  6., 11.,  3.])
step:2, batch_x:tensor([5.]), batch_y:tensor([7.])
step:0, batch_x:tensor([ 3., 10.,  6.,  8., 11.]), batch_y:tensor([9., 2., 6., 4., 1.])
step:1, batch_x:tensor([9., 7., 4., 2., 1.]), batch_y:tensor([ 3.,  5.,  8., 10., 11.])
step:2, batch_x:tensor([5.]), batch_y:tensor([7.])
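The output above shows two things: 11 samples with a batch size of 5 give three batches per epoch, the last holding a single sample, and the shuffle order changes from epoch to epoch. The sketch below, which assumes the same toy tensors, shows how drop_last removes the short batch and how a seeded generator makes the shuffle order repeatable. The helper first_epoch is a hypothetical name introduced for this illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.linspace(1, 11, 11)
y = torch.linspace(11, 1, 11)
dataset = TensorDataset(x, y)

# 11 samples, batch_size=5: ceil(11 / 5) = 3 batches, or 11 // 5 = 2 with drop_last=True.
keep_last = DataLoader(dataset, batch_size=5, shuffle=True)
drop_last = DataLoader(dataset, batch_size=5, shuffle=True, drop_last=True)
print(len(keep_last), len(drop_last))  # 3 2

batch_sizes = [len(batch_x) for batch_x, _ in keep_last]
print(batch_sizes)  # [5, 5, 1]


def first_epoch(seed):
    """Return the x batches of one epoch, shuffled by a generator seeded with `seed`."""
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(dataset, batch_size=5, shuffle=True, generator=g)
    return [batch_x.tolist() for batch_x, _ in loader]


# The same seed reproduces the same shuffle order.
print(first_epoch(0) == first_epoch(0))  # True
```

Dropping the last batch is mainly useful when later code expects every batch to have the same size.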
