PyTorch for Beginners -- Torch.nn API: DataParallel Layers (multi-GPU, distributed) (17)


| Method | Description |
| --- | --- |
| nn.DataParallel | Implements data parallelism at the module level. |
| nn.parallel.DistributedDataParallel | Implements distributed data parallelism, based on torch.distributed, at the module level. |

nn.DataParallel

nn.DataParallel parallelizes a module by splitting the input along the batch dimension across the given devices: the module is replicated on each device, each replica processes its slice of the input, and during the backward pass the gradients from all replicas are summed into the original module.

>>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
>>> output = net(input_var)  # input_var can be on any device, including CPU
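
For a more self-contained sketch of the same pattern (the nn.Linear toy model, batch size, and single-GPU fallback below are illustrative assumptions, not part of the original snippet):

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Linear(10, 5)                      # hypothetical toy model
>>> if torch.cuda.device_count() > 1:
>>>     model = model.cuda()                      # parameters must live on device_ids[0]
>>>     net = nn.DataParallel(model)              # uses all visible GPUs by default
>>> else:
>>>     net = model                               # single-device fallback
>>> input_var = torch.randn(32, 10)               # the batch dimension (0) is what gets split
>>> output = net(input_var)                       # outputs are gathered back onto the default device
>>> print(output.shape)                           # torch.Size([32, 5])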

nn.parallel.DistributedDataParallel

nn.parallel.DistributedDataParallel provides distributed data parallelism based on torch.distributed, typically with one process per GPU. Each process initializes the process group and then wraps its local model replica:

>>> torch.distributed.init_process_group(
>>>     backend='nccl', world_size=N, init_method='...'
>>> )
>>> model = DistributedDataParallel(model, device_ids=[i], output_device=i)
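
The init_method above is intentionally elided. As an alternative sketch, the default environment-variable rendezvous can be used when the script is launched with torchrun; the nn.Linear toy model and the script name in the comment are placeholders:

>>> # Assumes launch via: torchrun --nproc_per_node=<num_gpus> <your_script.py>
>>> import os
>>> import torch
>>> import torch.nn as nn
>>> import torch.distributed as dist
>>> from torch.nn.parallel import DistributedDataParallel as DDP
>>> def setup_ddp_model():
>>>     dist.init_process_group(backend="nccl")   # RANK / WORLD_SIZE are read from torchrun's env
>>>     local_rank = int(os.environ["LOCAL_RANK"])
>>>     torch.cuda.set_device(local_rank)
>>>     model = nn.Linear(10, 5).to(local_rank)   # hypothetical toy model
>>>     return DDP(model, device_ids=[local_rank], output_device=local_rank)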

For training jobs where the processes have uneven numbers of inputs, DistributedDataParallel provides the join() context manager, which keeps ranks that finish early participating in the collective communication so the remaining ranks do not hang:

>>> import torch
>>> import torch.distributed as dist
>>> import os
>>> import torch.multiprocessing as mp
>>> import torch.nn as nn
>>> # On each spawned worker
>>> def worker(rank):
>>>     dist.init_process_group("nccl", rank=rank, world_size=2)
>>>     torch.cuda.set_device(rank)
>>>     model = nn.Linear(1, 1, bias=False).to(rank)
>>>     model = torch.nn.parallel.DistributedDataParallel(
>>>         model, device_ids=[rank], output_device=rank
>>>     )
>>>     # Rank 1 gets one more input than rank 0.
>>>     inputs = [torch.tensor([1]).float() for _ in range(10 + rank)]
>>>     with model.join():
>>>         for _ in range(5):
>>>             for inp in inputs:
>>>                 loss = model(inp).sum()
>>>                 loss.backward()
>>>     # Without the join() API, the below synchronization will hang
>>>     # blocking for rank 1's allreduce to complete.
>>>     torch.cuda.synchronize(device=rank)
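
The worker above is defined but never launched (os and torch.multiprocessing are imported yet unused in the snippet). A minimal launcher sketch, assuming a single machine with at least two GPUs; the rendezvous address and port are placeholder values:

>>> if __name__ == "__main__":
>>>     os.environ["MASTER_ADDR"] = "localhost"   # placeholder rendezvous address
>>>     os.environ["MASTER_PORT"] = "29500"       # placeholder port
>>>     mp.spawn(worker, nprocs=2, args=())       # one process per rank (world_size=2)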
