Choosing GPUs for multi-GPU parallel training

Running nvidia-smi topo -m gives:

        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity
GPU0     X      SYS     SYS     NV4     0-47            N/A
GPU1    SYS      X      NV4     SYS     0-47            N/A
GPU2    SYS     NV4      X      PHB     0-47            N/A
GPU3    NV4     SYS     PHB      X      0-47            N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Among these connection types, GPU-to-GPU communication speed ranks roughly as follows, from fastest to slowest:
NV# > PIX > PXB > PHB > NODE > SYS
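
The ordering above can be turned into a small lookup that scores every GPU pair in the matrix. The following is only a minimal sketch: the names LINK_RANK, TOPO, link_rank and rank_pairs are made up for illustration, the matrix is pasted in by hand rather than read from nvidia-smi, and all NV# links are treated as equal regardless of how many NVLinks are bonded.

```python
from itertools import combinations

# Lower rank = faster link, following the ordering given in the text.
# Assumption: every NV# link gets the same rank, whatever # is.
LINK_RANK = {"NV": 0, "PIX": 1, "PXB": 2, "PHB": 3, "NODE": 4, "SYS": 5}

# GPU-to-GPU part of the matrix above, copied by hand:
# row label first, then one cell per GPU column.
TOPO = """
GPU0 X SYS SYS NV4
GPU1 SYS X NV4 SYS
GPU2 SYS NV4 X PHB
GPU3 NV4 SYS PHB X
"""

def link_rank(cell):
    """Map a matrix cell such as 'NV4' or 'PHB' to its rank."""
    key = "NV" if cell.startswith("NV") else cell
    return LINK_RANK[key]

def rank_pairs(topo_text):
    """Return (rank, gpu_a, gpu_b, link) tuples, fastest links first."""
    rows = [line.split() for line in topo_text.strip().splitlines()]
    links = [row[1:] for row in rows]  # drop the row label
    pairs = []
    for a, b in combinations(range(len(rows)), 2):
        cell = links[a][b]
        pairs.append((link_rank(cell), a, b, cell))
    return sorted(pairs)

ranked = rank_pairs(TOPO)
for rank, a, b, cell in ranked:
    print(f"GPU{a} <-> GPU{b}: {cell}")
```

For this machine the first two lines printed are the two NV4 pairs, followed by the PHB pair and then the SYS pairs.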

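So when training on two GPUs, it is best to use GPU0 and GPU3, or GPU1 and GPU2, because each of those pairs is connected by NV4 (four bonded NVLinks), while every other pair goes through PHB or SYS.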
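
One common way to restrict a training job to a chosen pair is the CUDA_VISIBLE_DEVICES environment variable. The sketch below is an assumption about how one might wire that up, not part of the original setup: env_for_gpus is a hypothetical helper, and CUDA_DEVICE_ORDER is set to PCI_BUS_ID so that the indices line up with the ordering nvidia-smi reports.

```python
import os

def env_for_gpus(gpu_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    # Number devices by PCI bus ID so indices match nvidia-smi.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env

# Expose only the NVLink-connected pair GPU0 and GPU3.
env = env_for_gpus([0, 3])
print(env["CUDA_VISIBLE_DEVICES"])
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching a training script. Inside that process the two selected GPUs show up as devices 0 and 1.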

ChatGPT's explanation:

NV# - NVLink is a high-speed direct GPU-to-GPU interconnect. Especially when multiple NVLinks ("#") are bonded together, this can provide the fastest communication speed between GPUs.

PIX - This represents communication that traverses at most a single PCIe bridge. It would generally be faster than connections that involve multiple bridges or the host bridge.

PXB - A connection that traverses multiple PCIe bridges but does not traverse the PCIe Host Bridge. This would be slower than a PIX connection because it involves more bridges.

PHB - This type of connection involves the traversal of the PCIe bus and a PCIe Host Bridge (usually the CPU). In terms of GPU-to-GPU communication, this would typically be slower than PIX and PXB due to the traversal of the host bridge.

NODE - This represents a connection traversing PCIe and the interconnect between PCIe Host Bridges within a NUMA node. It would typically be slower than a PHB connection for GPU-to-GPU communication because the traffic crosses more than one host bridge.

SYS - This type of connection traverses the PCIe bus as well as the SMP interconnect between NUMA nodes. This would be the slowest type of connection for GPU-to-GPU communication due to the additional steps required to cross NUMA boundaries.
