[Solved] DDP multi-GPU training hangs at "All distributed processes registered. Starting with 8 processes"

When training with DDP across multiple GPUs, the run hangs at the following output:

----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 8 processes
----------------------------------------------------------------------------------------------------

Solution

Disable NCCL's peer-to-peer (P2P) transport so that inter-GPU traffic falls back to shared-memory/PCIe copies. This kind of hang is commonly caused by broken direct GPU-to-GPU access (for example, PCIe ACS/IOMMU settings, or consumer GPUs without P2P support):

export NCCL_P2P_DISABLE=1
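
If you launch with torchrun (or another launcher that forwards the environment to each worker), the variable can also be set at the top of the training script itself, before NCCL is initialized. Below is a minimal sketch, assuming a standard torch.distributed setup; the model here is a placeholder, not code from the original post:

import os

# Must be set before NCCL is initialized (i.e., before init_process_group
# and the first collective call), otherwise it has no effect.
os.environ["NCCL_P2P_DISABLE"] = "1"

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK / RANK / WORLD_SIZE for each worker process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Placeholder model; replace with your own.
    model = torch.nn.Linear(10, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # ... training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launch as usual, e.g. torchrun --nproc_per_node=8 train.py. Note that disabling P2P can reduce inter-GPU bandwidth, so treat it as a workaround rather than a default setting.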
