WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
Traceback (most recent call last):
File "finetune.py", line 6, in <module>
import bitsandbytes as bnb
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 7, in <module>
from .autograd._functions import (
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/autograd/__init__.py", line 1, in <module>
from ._functions import undo_layout, get_inverse_transform_indices
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 9, in <module>
import bitsandbytes.functional as F
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/functional.py", line 17, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 13, in <module>
setup.run_cuda_setup()
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 101, in run_cuda_setup
binary_name, cudart_path, cuda, cc, cuda_version_string = evaluate_cuda_setup()
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 382, in evaluate_cuda_setup
cudart_path = determine_cuda_runtime_lib_path()
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 247, in determine_cuda_runtime_lib_path
CUDASetup.get_instance().add_log_entry(f'{candidate_env_vars["CONDA_PREFIX"]} did not contain '
NameError: name 'CUDA_RUNTIME_LIB' is not defined
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 372341) of binary: /home/gaosong/anaconda3/envs/vicuna8/bin/python
Traceback (most recent call last):
File "/home/gaosong/anaconda3/envs/vicuna8/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2023-06-08_15:31:06
host : server
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 372342)
error_file:
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-06-08_15:31:06
host : server
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 372341)
error_file:
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
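This error is raised while bitsandbytes is searching for the CUDA installation directory. The NameError comes from the line that logs that $CONDA_PREFIX "did not contain" the CUDA runtime library, i.e. the branch taken when the library is not found there.
If the library can be found directly, this error does not occur.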
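Whether that branch is reached comes down to whether a libcudart.so* file is present in $CONDA_PREFIX/lib. A minimal Python sketch for checking this before launching torchrun; cudart_candidates is a hypothetical helper, and the libcudart.so* glob is an assumption based on the file names in this post, not a copy of bitsandbytes' own lookup logic. Type hints use typing so it also runs on Python 3.8, as in this environment.

```python
import os
from pathlib import Path
from typing import List, Optional


def cudart_candidates(prefix: Optional[str] = None) -> List[Path]:
    """Return libcudart.so* files found in <prefix>/lib (hypothetical helper).

    Defaults to $CONDA_PREFIX; returns [] if the variable is unset or the
    lib directory does not exist.
    """
    prefix = prefix if prefix is not None else os.environ.get("CONDA_PREFIX")
    if not prefix:
        return []
    lib_dir = Path(prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    # Assumed naming pattern: libcudart.so, libcudart.so.11.0, libcudart.so.12, ...
    return sorted(lib_dir.glob("libcudart.so*"))


if __name__ == "__main__":
    found = cudart_candidates()
    print(found if found else "no libcudart.so* in $CONDA_PREFIX/lib")
```

An empty result corresponds to the situation in the traceback above.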
Run echo $CONDA_PREFIX to see where the environment directory is, then go to its lib subdirectory:
cd $CONDA_PREFIX/lib
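Check whether the library exists:
ls libcudart.so.11.0
It reports that the file does not exist. Don't ask me why it's libcudart.so.11.0: that's the version another environment on my machine has, and it works there.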
sudo find / -name 'libcudart.so.11.0'
Locate the file and copy it into the $CONDA_PREFIX/lib directory.
In my case the path is:
cp /work1/home/gaosong/anaconda3/envs/gpt/lib/libcudart.so.11.0 $CONDA_PREFIX/lib
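The find-then-copy steps above can be sketched in Python as well. A minimal sketch, assuming you want a real file (not a dangling symlink) in the target directory; find_files and copy_into are hypothetical helpers. Unlike sudo find, os.walk silently skips directories the current user cannot read, so running it as a normal user may miss some copies.

```python
import os
import shutil
from pathlib import Path
from typing import Iterator


def find_files(root: str, name: str) -> Iterator[Path]:
    """Yield paths under `root` whose file name equals `name`.

    Rough equivalent of: find <root> -name <name> (files only).
    Unreadable directories are skipped rather than raising.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            yield Path(dirpath) / name


def copy_into(src: Path, dest_dir: str) -> Path:
    """Copy `src` into `dest_dir`, following symlinks so the real file lands there."""
    dest = Path(dest_dir) / src.name
    shutil.copy2(src, dest)  # copy2 follows symlinks by default
    return dest


# Usage (walking "/" can take a long time):
# hits = list(find_files("/", "libcudart.so.11.0"))
# copy_into(hits[0], os.path.join(os.environ["CONDA_PREFIX"], "lib"))
```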
Then it fails with the next error:
CUDA SETUP: CUDA runtime path found: /home/gaosong/anaconda3/envs/vicuna8/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
libcusparse.so.11: cannot open shared object file: No such file or directory
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=117 make cuda11x
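The "libcusparse.so.11: cannot open shared object file" line is the kind of message the dynamic loader reports when a dependency of a shared library cannot be resolved. A minimal sketch for reproducing the load outside bitsandbytes, assuming the library search path (e.g. LD_LIBRARY_PATH) is the same as in the training run; probe_shared_lib is a hypothetical helper built on the standard ctypes module.

```python
import ctypes
from typing import Optional


def probe_shared_lib(path: str) -> Optional[str]:
    """Try to dlopen `path`; return None on success, else the loader's error text.

    If a dependency such as libcusparse.so.11 is missing, it should be named
    in the returned message.
    """
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return str(exc)
    return None


# Example, using the binary path from the log above:
# print(probe_shared_lib("/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/"
#                        "site-packages/bitsandbytes/libbitsandbytes_cuda117.so"))
```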
Try this version (the CUDA 12.1 runtime) instead:
cd $CONDA_PREFIX/lib
rm -rf libcudart.so.11.0
cp /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudart.so.12 ./
mv libcudart.so.12 libcudart.so.12.0
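The rm / cp / mv sequence above amounts to "drop the stale runtime, copy the new one in under a chosen name". A minimal Python sketch of the same steps; swap_cudart is a hypothetical helper, and the post does not explain why the .12.0 suffix is used, so the target name is simply passed in.

```python
import shutil
from pathlib import Path


def swap_cudart(lib_dir: str, stale_name: str, src: str, new_name: str) -> Path:
    """Remove `stale_name` from `lib_dir` (if present) and copy `src` in as `new_name`."""
    lib = Path(lib_dir)
    stale = lib / stale_name
    # is_symlink() also catches a dangling symlink, for which exists() is False
    if stale.exists() or stale.is_symlink():
        stale.unlink()
    dest = lib / new_name
    shutil.copy2(src, dest)  # follows symlinks, so a real file is written
    return dest


# Usage mirroring the commands above:
# swap_cudart(os.environ["CONDA_PREFIX"] + "/lib", "libcudart.so.11.0",
#             "/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudart.so.12",
#             "libcudart.so.12.0")
```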
# After upgrading to 0.38.0 this check raised an error, so it is commented out
# if USE_8bit is True:
# assert bnb.__version__ >= '0.37.2', "Please downgrade bitsandbytes's version, for example: pip install bitsandbytes==0.37.2"
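Separately from the error above (the post does not show what this line raised, so this is not offered as the cause), the commented-out check compares version strings with >=, which is lexicographic: '0.38.0' >= '0.37.2' happens to be True, but '0.100.0' >= '0.37.2' is False. If the check is kept, a numeric comparison is more robust. A minimal stdlib-only sketch; version_tuple and at_least are hypothetical helpers, and packaging.version.Version is the usual choice when the packaging package is installed.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading dotted-numeric part of a version string.

    "0.38.0" -> (0, 38, 0); "0.39.0.post1" -> (0, 39, 0).
    Suffixes such as "rc1" or ".post1" are dropped, so this is coarser than PEP 440.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(version: str, minimum: str) -> bool:
    """Numeric 'version >= minimum' on the parsed tuples."""
    return version_tuple(version) >= version_tuple(minimum)


# Possible replacement for the assertion above:
# assert at_least(bnb.__version__, "0.37.2"), "..."
```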