[Software Installation] CUDA error when installing deepspeed

Installing deepspeed with pip failed with the following error:

(torch_game) [sealgo@ocr-gpu-129-48 baidu]$ pip install deepspeed -i https://pypi.tuna.tsinghua.edu.cn/simple 
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting deepspeed
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/73/f2/c6760ca21855ff8a0a787dc9943e0a15c833db0eefb424f9af8703668a64/deepspeed-0.10.2.tar.gz (858 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-mx_jkfk4/deepspeed_16a92dd0211a4b64a0cbf49f1127eab5/setup.py", line 100, in <module>
          cuda_major_ver, cuda_minor_ver = installed_cuda_version()
        File "/tmp/pip-install-mx_jkfk4/deepspeed_16a92dd0211a4b64a0cbf49f1127eab5/op_builder/builder.py", line 41, in installed_cuda_version
          assert cuda_home is not None, "CUDA_HOME does not exist, unable to compile CUDA op(s)"
      AssertionError: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

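This usually happens when CUDA has been reinstalled or is not installed under the default path /usr/local/, so the build cannot locate the toolkit and the CUDA_HOME environment variable is not set.
Find your own CUDA installation path and point CUDA_HOME at it (/user_path is a placeholder for your actual install prefix):
export CUDA_HOME=/user_path/cuda-12.0/
Then run pip install deepspeed again in the same shell.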
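
If you are not sure where the toolkit lives, a small helper can suggest a value for CUDA_HOME. The sketch below is only an illustration: guess_cuda_home is a hypothetical name, not part of deepspeed or PyTorch, and the lookup order (environment variables, then nvcc on PATH, then /usr/local/cuda) is an assumption modeled on how such build tooling typically searches.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def guess_cuda_home(env: Optional[Mapping[str, str]] = None,
                    default: str = "/usr/local/cuda") -> Optional[str]:
    """Return a likely CUDA toolkit root, or None if nothing is found.

    Hypothetical helper; the search order is an assumption.
    """
    env = os.environ if env is None else env

    # 1. An explicitly set environment variable wins.
    for name in ("CUDA_HOME", "CUDA_PATH"):
        value = env.get(name)
        if value:
            return value

    # 2. Otherwise derive the root from nvcc on PATH: <root>/bin/nvcc.
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        return str(Path(nvcc).resolve().parent.parent)

    # 3. Fall back to the conventional install location, if it exists.
    if Path(default).is_dir():
        return default
    return None


if __name__ == "__main__":
    home = guess_cuda_home()
    if home is None:
        print("No CUDA toolkit found; set CUDA_HOME manually.")
    else:
        print(f"export CUDA_HOME={home}")
```

Run it inside the same environment you install into, check that the printed directory really contains bin/nvcc, and then paste the export line into your shell before retrying the install.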
