PyTorch Distributed Model Training - Cluster

distributed_backend="nccl"
init_method="env://"


def init_distributed(backend="nccl",
                     init_method="env://", *,
                     warning=True):
    # A simple initializer for torch.distributed

    from torch import distributed

    if not distributed.is_available():
        raise RuntimeError("`distributed` is not available.")

    if not is_distributed():
        raise RuntimeError(
            f"For distributed training, use `python -m torch.distributed.launch "
            f"--nproc_per_node={device_count()} {args}` ...")

    if distributed.is_initialized():
        if warning:
            logger.w
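
The snippet above calls an `is_distributed()` helper that is not shown. As a rough sketch only, such a check might look like the following, assuming the job is started by a launcher that exports the variables read by the `env://` init method (`MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, `RANK`). Both `missing_env_vars` and this body of `is_distributed` are hypothetical illustrations, not the author's code.

```python
import os
from typing import List, Mapping, Optional

# Environment variables read by the `env://` init method of torch.distributed.
ENV_VARS = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK")


def missing_env_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the `env://` variables that are absent from `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in ENV_VARS if name not in environ]


def is_distributed(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper (an assumption, not the original implementation):
    treat the process as distributed when every `env://` variable is set."""
    return not missing_env_vars(environ)
```

When `is_distributed()` returns True, a call such as `torch.distributed.init_process_group(backend="nccl", init_method="env://")` can read the rank and world size from these variables. Passing a plain dict instead of `os.environ` makes the check easy to exercise without a launcher.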
