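Our advisor set us up with a desktop machine, so we set about configuring a PyTorch environment. After installing PyTorch to match the machine's GPU driver (472.12), CUDA, and cuDNN versions, we called torch.cuda.is_available(), which returned True and suggested that the GPU build of PyTorch had been installed successfully.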
import torch
print(torch.__version__)
print(torch.cuda.is_available())
# 1.10.1
# True
However, the installed PyTorch could not actually run computations on the GPU:
a = torch.Tensor(5,3)
print(a)
a.cuda()
# tensor([[1.0194e-38, 9.6429e-39, 9.2755e-39],
# [9.1837e-39, 9.3674e-39, 1.0745e-38],
# [1.0653e-38, 9.5510e-39, 1.0561e-38],
# [1.0194e-38, 1.1112e-38, 1.0561e-38],
# [9.9184e-39, 1.0653e-38, 4.1327e-39]])
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Jupyter Notebook also showed the following warning:
D:\Anaconda3\lib\site-packages\torch\cuda\__init__.py:83: UserWarning:
Found GPU%d %s which is of cuda capability %d.%d.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability supported by this library is %d.%d.
warnings.warn(old_gpu_warn.format(d, name, major, minor, min_arch // 10, min_arch % 10))
PyTorch no longer supports this GPU because it is too old. Our GPU is fairly old (GeForce GT 730, 2 GB of memory, compute capability 3.5), and current PyTorch releases no longer support it.
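The warning can be cross-checked by comparing the GPU's compute capability with the architectures the installed binary was compiled for. The following is a minimal sketch, assuming a PyTorch release recent enough to provide torch.cuda.get_arch_list(); the helper names is_capability_compiled and report_local_gpu are made up for illustration, and requiring an exact sm_XY match is a simplification.

```python
def is_capability_compiled(capability, arch_list):
    """Return True if `arch_list` contains the sm_XY entry for `capability`.

    capability: (major, minor) tuple, e.g. (3, 5) for the GeForce GT 730.
    arch_list:  iterable of strings such as ["sm_37", "sm_50"].
    """
    major, minor = capability
    return f"sm_{major}{minor}" in set(arch_list)


def report_local_gpu():
    """Print the local GPU's capability and whether this build targets it."""
    import torch  # imported lazily so the helper above works without PyTorch

    if not torch.cuda.is_available():
        print("CUDA is not available in this environment")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()  # assumes a recent PyTorch release
    print("device capability:", capability)
    print("compiled architectures:", arch_list)
    print("exact match:", is_capability_compiled(capability, arch_list))


# Illustrative arch lists (not read from a real installation)
print(is_capability_compiled((3, 5), ["sm_37", "sm_50", "sm_60", "sm_70"]))  # False
print(is_capability_compiled((3, 5), ["sm_35", "sm_50"]))  # True
```

On a machine with PyTorch installed, calling report_local_gpu() prints the values for the local device.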
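Following the hint in the error message, we set CUDA_LAUNCH_BLOCKING=1, which makes CUDA kernel launches synchronous so that errors are reported at the failing call. This did not make GPU computation work either: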
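import os
# set this before CUDA is initialized (e.g. before the first CUDA call), otherwise it may have no effect
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'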
a = torch.Tensor(5,3)
print(a)
a.cuda()
# tensor([[1.0194e-38, 9.6429e-39, 9.2755e-39],
# [9.1837e-39, 9.3674e-39, 1.0745e-38],
# [1.0653e-38, 9.5510e-39, 1.0561e-38],
# [1.0194e-38, 1.1112e-38, 1.0561e-38],
# [9.9184e-39, 1.0653e-38, 4.1327e-39]])
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
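Since Jupyter Notebook warned that the current PyTorch no longer supports our GPU, the obvious thing to try was an older PyTorch version.
However, after downgrading PyTorch from 1.10.1 to 1.9.1, 1.9.0, and 1.8.0, Python still reported the same error and the GPU could not be used. Even after adding a conda mirror channel, downloading PyTorch with conda install remained very slow (at least 4 hours, with the installation liable to be interrupted; n days omitted here...), so we switched to installing PyTorch offline.
Offline packages for cudatoolkit, pytorch, torchvision, and torchaudio can be downloaded from the Tsinghua mirror site. Install them with conda install --use-local, run from the directory that contains the downloaded .tar.bz2 files: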
conda install --use-local pytorch-1.7.1-py3.9_cuda110_cudnn8_0.tar.bz2
conda install --use-local cudatoolkit-11.0.221-h74a9793_0.tar.bz2
conda install --use-local torchvision-0.8.2-py39_cu110.tar.bz2
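# Some packages (for example cudatoolkit-8.0 or cudnn) are not in conda's default channels; adding -c anaconda to conda install makes them available
conda install -c anaconda torchaudio==0.7.2
# torchvision's version had changed to 0.2.2, so install the local package again
conda install --use-local torchvision-0.8.2-py39_cu110.tar.bz2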
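When installing offline packages by hand, it is easy to mix builds for different Python or CUDA versions. The sketch below splits the package filenames used above into name, version, and build string, relying on conda's name-version-build.tar.bz2 naming; the helper names and the regular expression for the CUDA tag are assumptions made for illustration, not part of conda.

```python
import re


def parse_conda_filename(filename):
    """Split 'name-version-build.tar.bz2' into (name, version, build)."""
    suffix = ".tar.bz2"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.bz2 conda package: {filename}")
    stem = filename[: -len(suffix)]
    # Split from the right so that hyphens inside the package name are kept
    name, version, build = stem.rsplit("-", 2)
    return name, version, build


def cuda_tag_from_build(build):
    """Return the digits after 'cu' or 'cuda' in a build string, or None."""
    match = re.search(r"cu(?:da)?(\d+)", build)
    return match.group(1) if match else None


packages = [
    "pytorch-1.7.1-py3.9_cuda110_cudnn8_0.tar.bz2",
    "cudatoolkit-11.0.221-h74a9793_0.tar.bz2",
    "torchvision-0.8.2-py39_cu110.tar.bz2",
]
for pkg in packages:
    name, version, build = parse_conda_filename(pkg)
    print(name, version, build, cuda_tag_from_build(build))
```

If the CUDA tags of pytorch and torchvision differ, the packages were probably built for different toolkits and should not be combined.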
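Since some projects do not yet support the latest Python 3.9, we created a new environment based on Python 3.7 (convenient for later use), installed the matching CUDA and cuDNN versions, and kept downgrading PyTorch.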
conda install --use-local pytorch-1.6.0-py3.7_cuda102_cudnn7_0.tar.bz2
conda install --use-local cudatoolkit-10.2.89-h74a9793_1.tar.bz2
conda install --use-local torchaudio-0.6.0-py37.tar.bz2
conda install --use-local torchvision-0.7.0-py37_cu102.tar.bz2
PyTorch 1.5.1 does not need torchaudio:
conda install --use-local pytorch-1.5.1-py3.7_cuda92_cudnn7_0.tar.bz2
conda install --use-local cudatoolkit-9.2-0.tar.bz2
conda install --use-local torchvision-0.6.1-py37_cu92.tar.bz2
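At this point, Jupyter Notebook no longer warned that the GPU was too old for PyTorch. However, Python still reported the same error, and the GPU could not be used for computation.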
a = torch.Tensor(5,3)
print(a)
a.cuda()
# tensor([[1.0194e-38, 9.6429e-39, 9.2755e-39],
# [9.1837e-39, 9.3674e-39, 1.0745e-38],
# [1.0653e-38, 9.5510e-39, 1.0561e-38],
# [1.0194e-38, 1.1112e-38, 1.0561e-38],
# [9.9184e-39, 1.0653e-38, 4.1327e-39]])
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
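According to discussions on several blogs, the error RuntimeError: CUDA error: no kernel image is available for execution on the device may occur when the GPU's compute capability is below 3.5. So we looked up which GPU compute capabilities each PyTorch build supports: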
PyTorch | Python | CUDA | cuDNN | Architectures |
---|---|---|---|---|
pytorch-1.0.0 | py3.7 | cuda10.0.130 | cudnn7.4.1_1 | sm_30, sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.0.0 | py3.7 | cuda8.0.61 | cudnn7.1.2_1 | sm_20, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61 |
pytorch-1.0.0 | py3.7 | cuda9.0.176 | cudnn7.4.1_1 | sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_70 |
pytorch-1.0.1 | py3.7 | cuda10.0.130 | cudnn7.4.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.0.1 | py3.7 | cuda10.0.130 | cudnn7.4.2_2 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.0.1 | py3.7 | cuda8.0.61 | cudnn7.1.2_0 | sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61 |
pytorch-1.0.1 | py3.7 | cuda8.0.61 | cudnn7.1.2_2 | sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61 |
pytorch-1.0.1 | py3.7 | cuda9.0.176 | cudnn7.4.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.0.1 | py3.7 | cuda9.0.176 | cudnn7.4.2_2 | sm_35, sm_50, sm_60, sm_70 |
pytorch-1.1.0 | py3.7 | cuda10.0.130 | cudnn7.5.1_0 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.1.0 | py3.7 | cuda9.0.176 | cudnn7.5.1_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.2.0 | py3.7 | cuda9.2.148 | cudnn7.6.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.2.0 | py3.7 | cuda10.0.130 | cudnn7.6.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.3.0 | py3.7 | cuda10.0.130 | cudnn7.6.3_0 | sm_30, sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.3.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_30, sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.3.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.3.1 | py3.7 | cuda10.0.130 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.3.1 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.3.1 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.4.0 | py3.7 | cuda10.0.130 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.4.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.4.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.5.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.5.0 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.5.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.5.1 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.5.1 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.5.1 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.6.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.6.0 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.6.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.7.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.7.0 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.7.0 | py3.7 | cuda11.0.221 | cudnn8.0.3_0 | sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80 |
pytorch-1.7.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.7.1 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.7.1 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.7.1 | py3.7 | cuda11.0.221 | cudnn8.0.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80 |
pytorch-1.7.1 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.8.0 | py3.7 | cuda10.1 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.8.0 | py3.7 | cuda10.2 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.8.0 | py3.7 | cuda11.1 | cudnn8.0.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80, sm_86 |
pytorch-1.8.1 | py3.7 | cuda10.1 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.8.1 | py3.7 | cuda10.2 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.8.1 | py3.7 | cuda11.1 | cudnn8.0.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80, sm_86 |
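Reference: PyTorch error RuntimeError: CUDA error: no kernel image is available for execution on the device
Our graphics card (GeForce GT 730, 2 GB of memory) has compute capability 3.5, so according to the table it should work with the vast majority of PyTorch versions, yet it could not run GPU computations.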
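To query the table for a given architecture, its rows can be parsed and filtered. The sketch below copies three rows from the table as sample input; the helper names parse_row and builds_listing are hypothetical. As the experiments in this post show, a listed architecture did not guarantee working GPU execution on this machine, so the result is only a starting point.

```python
rows = """\
pytorch-1.2.0 | py3.7 | cuda10.0.130 | cudnn7.6.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.7.0 | py3.7 | cuda11.0.221 | cudnn8.0.3_0 | sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80 |
pytorch-1.7.1 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_37, sm_50, sm_60, sm_61, sm_70 |
"""


def parse_row(line):
    """Turn one pipe-delimited table row into a dict."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    pytorch, python, cuda, cudnn, archs = cells
    return {
        "pytorch": pytorch,
        "python": python,
        "cuda": cuda,
        "cudnn": cudnn,
        "archs": [arch.strip() for arch in archs.split(",")],
    }


def builds_listing(arch, table_text):
    """Return (pytorch, cuda) pairs whose architecture list contains `arch`."""
    parsed = (parse_row(line) for line in table_text.splitlines() if line.strip())
    return [(row["pytorch"], row["cuda"]) for row in parsed if arch in row["archs"]]


print(builds_listing("sm_35", rows))
# [('pytorch-1.2.0', 'cuda10.0.130')]
```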
We therefore had to look for a solution elsewhere.
Searching further, we found that some users abroad had run into the same problem and solved it with pip install --pre:
pip uninstall torch
pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html
Reference: Cuda error: no kernel image is available for execution on the device #31285
Here cu110 corresponds to CUDA 11.0. If your CUDA version is a different 11.x release, replace it with the matching cu11x tag.
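The mapping from a CUDA version to the tag in the URL can be sketched as follows. The URL pattern is taken from the command above; the function names are hypothetical, and the sketch does not check whether an index page actually exists for a given tag.

```python
def cuda_tag(cuda_version):
    """Map a CUDA version string to its wheel tag, e.g. '11.0' -> 'cu110'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def nightly_index_url(cuda_version):
    """Build the nightly index URL following the pattern used above."""
    tag = cuda_tag(cuda_version)
    return f"https://download.pytorch.org/whl/nightly/{tag}/torch_nightly.html"


print(cuda_tag("11.0"))  # cu110
print(nightly_index_url("11.0"))
```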
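This approach might work, but because the download server is far from mainland China, pip install is extremely slow: the command above downloaded at only 2 KB/s (we envy users abroad). We may have to give up on this approach.
The installation log shows that pip install --pre torch torchvision downloads specific versions of torch and torchvision one by one from a website. We can therefore try downloading that version of the PyTorch package from a domestic mirror and installing it: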
pip install C:\Users\Lenovo\Downloads\torch-1.10.1-cp37-cp37m-win_amd64.whl
The PyTorch installed this way turned out to be the CPU version. Out of options, we had to abandon this approach.
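One quick way to tell which variant was installed is torch.version.cuda, which is None for a CPU-only build. The sketch below wraps that check; describe_build and describe_installed_torch are hypothetical helper names, and the version string "11.1" is only sample input.

```python
def describe_build(cuda_version):
    """Classify a PyTorch build from the value of torch.version.cuda."""
    if cuda_version is None:
        return "CPU-only build"
    return f"CUDA build (compiled against CUDA {cuda_version})"


def describe_installed_torch():
    """Describe the PyTorch build installed in the current environment."""
    import torch  # imported lazily so describe_build works without PyTorch

    return describe_build(torch.version.cuda)


print(describe_build(None))    # CPU-only build
print(describe_build("11.1"))  # CUDA build (compiled against CUDA 11.1)
```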
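After reading through a great deal of material, we found a workable approach: building PyTorch from source.
Even with a very old GPU, this makes it possible to run GPU computations with a relatively recent version of PyTorch.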
We will have to:
It will take a few hours to build if you have a relatively strong machine, but otherwise it’s very painless to do.
####################
### source build ###
####################
# one time prep
# see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
# then
docker run --runtime=nvidia --rm nvidia/cuda:11.0-base nvidia-smi
# find the latest container at https://ngc.nvidia.com/catalog/containers/nvidia:pytorch (use Tags tab),
# but also check that the driver version isn't too high in release notes here:
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html
# then pull it:
docker pull nvcr.io/nvidia/pytorch:21.04-py3
#docker run --gpus all --ipc=host --rm -it nvcr.io/nvidia/pytorch:21.04-py3
# to mount some host system dir inside the docker -v src:tgt
docker run --gpus all --ipc=host --rm -it -v ~/github/00pytorch/pytorch/docker:/tmp/output nvcr.io/nvidia/pytorch:21.04-py3
# once docker is running:
conda create -n pytorch-dev python=3.8 -y
bash
conda init bash
conda activate pytorch-dev
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses -y
# adjust cuda113 below to whatever cuda version the image is for (cuda110, etc.)
conda install -c pytorch magma-cuda113 -y
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# if you are updating an existing checkout
#git submodule sync
#git submodule update --init --recursive
#git pull
# to build a wheel
unset PYTORCH_BUILD_VERSION
unset PYTORCH_VERSION
TORCH_CUDA_ARCH_LIST="6.1 8.6" \
CUDA_HOME="/usr/local/cuda" \
CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
USE_SYSTEM_NCCL=1 \
NCCL_INCLUDE_DIR="/usr/include/" \
NCCL_LIB_DIR="/usr/lib/" \
python setup.py bdist_wheel 2>&1 | tee build.log
pip install dist/*whl
# make a copy of the wheel outside the docker
cp dist/*whl /tmp/output
# adjust TORCH_CUDA_ARCH_LIST if needed, the full list is:
# TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX"
# had to install nccl .so objects on the target system
# could also add:
# USE_OPENCV=1 \
# but need to have matching .so objects on the target system
# NEXT: build torchvision - since many packages depend on it
cd ..
git clone https://github.com/pytorch/vision
cd vision
# if you are updating an existing checkout
#git pull
# to build a wheel
TORCH_CUDA_ARCH_LIST="6.1 8.6" \
python setup.py bdist_wheel
pip install dist/*whl
# make a copy of the wheel outside the docker
cp dist/*whl /tmp/output
cd ..
git clone --recursive https://github.com/pytorch/audio
cd audio
# if you are updating an existing checkout
#git submodule sync
#git submodule update --init --recursive
#git pull
# to build a wheel
TORCH_CUDA_ARCH_LIST="6.1 8.6" \
BUILD_SOX=1 python setup.py bdist_wheel
pip install dist/*whl
# make a copy of the wheel outside the docker
cp dist/*whl /tmp/output
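Note that the script above builds for "6.1 8.6"; for a card like the GeForce GT 730 discussed in this post, TORCH_CUDA_ARCH_LIST would need to contain 3.5 instead. The sketch below formats a list of compute capabilities into the value of that variable. The helper name arch_list_value is hypothetical, and whether the CUDA toolkit inside the chosen container can still compile for 3.5 is an assumption that has to be checked separately.

```python
def arch_list_value(capabilities, ptx_for_last=False):
    """Format (major, minor) tuples as a TORCH_CUDA_ARCH_LIST value.

    With ptx_for_last=True the highest entry gets a '+PTX' suffix,
    as in the full list quoted in the script above.
    """
    entries = [f"{major}.{minor}" for major, minor in sorted(set(capabilities))]
    if ptx_for_last and entries:
        entries[-1] += "+PTX"
    return " ".join(entries)


print(arch_list_value([(3, 5)]))          # 3.5
print(arch_list_value([(8, 6), (6, 1)]))  # 6.1 8.6
```

On the target machine, the tuple can be obtained from torch.cuda.get_device_capability() of an existing PyTorch installation.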
References:
- Cuda error: no kernel image is available for execution on the device #31285
- Building PyTorch from source on Windows to work with an old GPU
- How to install pytorch FROM SOURCE (with cuda enabled for a deprecated CUDA cc 3.5 of an old gpu) using anaconda prompt on Windows 10?
- How to Compile the Latest Pytorch from Source in Windows with CUDA Support
- Building Pytorch from source with cuda support on WSL2(Ubuntu 20.04, cuda11.4, Windows11)
- Installing PyTorch by building from source, and its many pitfalls
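This could also take a lot of time, so we tried another approach first.
Looking up installation steps for the same graphics card (GeForce GT 730), we found that some users had installed PyTorch 1.0.0 successfully and used it without problems. But that version is very old, and many newer features are presumably unavailable.
So we could only keep downgrading PyTorch. When we finally got down to version 1.2.0, it worked!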
import torch
print(torch.__version__)
print(torch.cuda.is_available())
# 1.2.0
# True
a = torch.Tensor(5,3)
print(a)
print(a.cuda())
# tensor([[7.5305e+16, 6.3619e-43, 7.5305e+16],
# [6.3619e-43, 7.5296e+16, 6.3619e-43],
# [7.5296e+16, 6.3619e-43, 7.5305e+16],
# [6.3619e-43, 7.5305e+16, 6.3619e-43],
# [7.5291e+16, 6.3619e-43, 7.5291e+16]])
# tensor([[7.5305e+16, 6.3619e-43, 7.5305e+16],
# [6.3619e-43, 7.5296e+16, 6.3619e-43],
# [7.5296e+16, 6.3619e-43, 7.5305e+16],
# [6.3619e-43, 7.5305e+16, 6.3619e-43],
# [7.5291e+16, 6.3619e-43, 7.5291e+16]], device='cuda:0')
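Because torch.cuda.is_available() returned True even for the builds that could not run anything on the GPU, a check that actually executes an operation on the device is more reliable. The following is a minimal sketch of such a smoke test; gpu_smoke_test is a hypothetical helper name, and it simply returns False when PyTorch is missing or CUDA is unavailable.

```python
def gpu_smoke_test():
    """Return True only if a small computation really runs on the GPU."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    try:
        a = torch.ones(5, 3, device="cuda")
        total = (a + a).sum().item()  # .item() waits for the GPU result
    except RuntimeError as err:
        # e.g. "no kernel image is available for execution on the device"
        print("GPU computation failed:", err)
        return False
    return total == 30.0


print("GPU usable:", gpu_smoke_test())
```

Running this after every install or downgrade shows whether that build can really use the GPU.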