While installing deepspeed with pip, I ran into the following error:
(torch_game) [sealgo@ocr-gpu-129-48 baidu]$ pip install deepspeed -i https://pypi.tuna.tsinghua.edu.cn/simple
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting deepspeed
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/73/f2/c6760ca21855ff8a0a787dc9943e0a15c833db0eefb424f9af8703668a64/deepspeed-0.10.2.tar.gz (858 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [8 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-mx_jkfk4/deepspeed_16a92dd0211a4b64a0cbf49f1127eab5/setup.py", line 100, in <module>
cuda_major_ver, cuda_minor_ver = installed_cuda_version()
File "/tmp/pip-install-mx_jkfk4/deepspeed_16a92dd0211a4b64a0cbf49f1127eab5/op_builder/builder.py", line 41, in installed_cuda_version
assert cuda_home is not None, "CUDA_HOME does not exist, unable to compile CUDA op(s)"
AssertionError: CUDA_HOME does not exist, unable to compile CUDA op(s)
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
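Solution:
This usually happens when CUDA was reinstalled or is not located under the default path /usr/local/, so the build cannot find a CUDA installation and CUDA_HOME ends up unset.
Find your own CUDA installation path, then point CUDA_HOME at it: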
export CUDA_HOME=/user_path/cuda-12.0/
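Then run pip install deepspeed again; the installation should get past this error. Note that export only affects the current shell session, so add the line to your shell startup file (for example ~/.bashrc) if you want it to persist.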
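If you are not sure where CUDA is installed, a small script can help narrow it down. The following is only a sketch under assumptions: the helper name find_cuda_home and its lookup order (environment variables first, then nvcc on PATH, then /usr/local/cuda) are my own illustration and are not taken from the deepspeed source. Resolving symlinks with realpath is also my choice.

```python
import os
import shutil
from typing import Mapping, Optional


def find_cuda_home(env: Optional[Mapping[str, str]] = None,
                   default_dir: str = "/usr/local/cuda") -> Optional[str]:
    """Return a likely CUDA install root, or None if nothing is found.

    Hypothetical helper for illustration; not part of deepspeed or pip.
    """
    env = os.environ if env is None else env

    # 1. An explicitly set environment variable wins.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = env.get(var)
        if value:
            return value

    # 2. Otherwise derive the root from nvcc on PATH: <root>/bin/nvcc -> <root>.
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))

    # 3. Finally fall back to the conventional default location.
    if os.path.isdir(default_dir):
        return default_dir
    return None


if __name__ == "__main__":
    home = find_cuda_home()
    if home is None:
        print("No CUDA installation found; locate it manually and export CUDA_HOME.")
    else:
        print(f"export CUDA_HOME={home}")
```

Running the script prints an export line you can paste into the shell, or a message if nothing was found. It is worth checking that the reported directory really contains bin/nvcc before relying on it.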