【PaddlePaddle on Linux with two GPUs reports the error: NVIDIA-NCCL2 is not installed correctly on your system】

  1. Installing NCCL

In order to download NCCL, ensure you are registered for the NVIDIA Developer Program.

Go to: NVIDIA NCCL home page.
Click Download.
Complete the short survey and click Submit.
Accept the Terms and Conditions. A list of available download versions of NCCL displays.
Select the NCCL version you want to install. A list of available resources displays. Refer to the following sections to choose the correct package depending on the Linux distribution you are using.

3.1. Ubuntu
Installing NCCL on Ubuntu requires you to first add a repository containing the NCCL packages to the APT system, then install the NCCL packages through APT. There are two repositories available: a local repository and a network repository. The network repository is recommended because it makes it easy to retrieve upgrades when newer versions are posted.
In the following commands, replace <architecture> with your CPU architecture (x86_64, ppc64le, or sbsa) and replace <distro> with the Ubuntu version, for example ubuntu1604, ubuntu1804, or ubuntu2004.
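As a rough sketch of how the distribution and architecture parts of the network repository URL could be derived rather than typed by hand, the helper below parses the text of /etc/os-release and fills in the keyring URL. The function names and the ARCH_DIRS mapping are hypothetical and not part of the NVIDIA instructions; in particular, mapping aarch64 to sbsa is an assumption that only holds for ARM server systems.

```python
# Assumed mapping from machine names (as reported by platform.machine())
# to the directory names used in the repository URL.
# aarch64 -> sbsa is an assumption for ARM server platforms.
ARCH_DIRS = {"x86_64": "x86_64", "ppc64le": "ppc64le", "aarch64": "sbsa"}


def ubuntu_distro_tag(os_release_text):
    """Turn the contents of /etc/os-release into a tag such as 'ubuntu2204'."""
    fields = {}
    for line in os_release_text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip().strip('"')
    if fields.get("ID") != "ubuntu":
        raise ValueError("not an Ubuntu system")
    # VERSION_ID looks like 22.04; the repository tag drops the dot.
    return "ubuntu" + fields["VERSION_ID"].replace(".", "")


def keyring_url(distro, arch):
    """Build the cuda-keyring download URL for a distro tag and architecture."""
    return (
        "https://developer.download.nvidia.com/compute/cuda/repos/"
        f"{distro}/{arch}/cuda-keyring_1.0-1_all.deb"
    )
```

On a real machine you might call ubuntu_distro_tag(open("/etc/os-release").read()) and look up ARCH_DIRS[platform.machine()] (after import platform), then pass both results to keyring_url and check the printed URL in a browser before downloading.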

1. Install the repository.
    
    For a local NCCL repository:
    sudo dpkg -i nccl-repo-<version>.deb
    For example: sudo dpkg -i nccl-local-repo-ubuntu2204-2.14.3-cuda11.7_1.0-1_amd64.deb
    If a message like the following appears, run the command it gives:
    The public nccl-local-repo-ubuntu2204-2.14.3-cuda11.7 GPG key does not appear to be installed. To install the key, run this command:
    sudo cp /var/nccl-local-repo-ubuntu2204-2.14.3-cuda11.7/nccl-local-F0C3C384-keyring.gpg /usr/share/keyrings/
    Note: The local repository installation will prompt you to install the local key it embeds, with which the packages are signed. Make sure to follow the instructions to install the local key, or the install phase will fail later.
    
    For the network repository:
    wget https://developer.download.nvidia.com/compute/cuda/repos/<distro>/<architecture>/cuda-keyring_1.0-1_all.deb
    sudo dpkg -i cuda-keyring_1.0-1_all.deb

2. Update the APT database:

sudo apt update

3. Install the libnccl2 package with APT. Additionally, if you need to compile applications with NCCL, you can install the libnccl-dev package as well:
Note: If you are using the network repository, the following command will upgrade CUDA to the latest version.
sudo apt install libnccl2 libnccl-dev
(This is the command I used: an automatic install without pinning a specific version.)
If you prefer to keep an older version of CUDA, specify a specific version, for example:
sudo apt install libnccl2=2.4.8-1+cuda10.0 libnccl-dev=2.4.8-1+cuda10.0

Refer to the download page for exact package versions.
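Before going back to PaddlePaddle, it can help to confirm that the NCCL shared library is actually loadable. The following is a minimal sketch, assuming the libnccl2 package provides a library named libnccl.so.2 on the loader path; it calls NCCL's ncclGetVersion through ctypes and decodes the integer version code. The helper names are hypothetical, and the decoding assumes NCCL's convention of major*1000 + minor*100 + patch for versions up to 2.8 and major*10000 + minor*100 + patch afterwards.

```python
import ctypes


def decode_nccl_version(code):
    """Decode an NCCL version code into a (major, minor, patch) tuple."""
    if code < 10000:
        # Older scheme (assumed for NCCL up to 2.8): major*1000 + minor*100 + patch
        return (code // 1000, (code % 1000) // 100, code % 100)
    # Newer scheme: major*10000 + minor*100 + patch
    return (code // 10000, (code % 10000) // 100, code % 100)


def installed_nccl_version(soname="libnccl.so.2"):
    """Return the NCCL version as a tuple, or None if it cannot be determined."""
    try:
        lib = ctypes.CDLL(soname)
        get_version = lib.ncclGetVersion
    except (OSError, AttributeError):
        # Library not found, not loadable, or symbol missing.
        return None
    version = ctypes.c_int(0)
    status = get_version(ctypes.byref(version))
    if status != 0:  # 0 is ncclSuccess
        return None
    return decode_nccl_version(version.value)


if __name__ == "__main__":
    print("NCCL version:", installed_nccl_version())
```

If this prints None, the library is probably still not visible to the dynamic loader, and PaddlePaddle is likely to keep reporting the same error.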

After that, run the check again and you will see that the problem is solved!

python
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
PaddlePaddle works well on 1 GPU.
W0411 17:06:00.713511 9061 fuse_all_reduce_op_pass.cc:79] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 2.
PaddlePaddle works well on 2 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
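To double-check how many GPUs PaddlePaddle can see without reading through the run_check log, something like the sketch below may be enough. It assumes a PaddlePaddle 2.x build in which paddle.device.is_compiled_with_cuda() and paddle.device.cuda.device_count() are available; the helper name visible_gpu_count is hypothetical.

```python
def visible_gpu_count():
    """Return the number of GPUs PaddlePaddle reports, or None if paddle is missing."""
    try:
        import paddle
    except ImportError:
        return None
    if not paddle.device.is_compiled_with_cuda():
        # CPU-only build: no GPUs can be used regardless of hardware.
        return 0
    return paddle.device.cuda.device_count()


if __name__ == "__main__":
    print("GPUs visible to PaddlePaddle:", visible_gpu_count())
```

On the two-GPU machine described here, the expectation is that this reports 2 once NCCL is installed and both cards are visible.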
