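On an RTX 3090, checking whether the installed torch build is actually usable generally takes four steps:
- Run python to enter the Python interpreter.
- Run import torch to load the torch package.
- Run torch.cuda.is_available().
- Run torch.zeros(1).cuda().
Only when the fourth step completes without a warning or error can you conclude that the installed PyTorch build can use CUDA on this machine. A True result from torch.cuda.is_available() alone is not enough.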
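The four steps can also be run as one script. The following is a minimal sketch, not an official PyTorch diagnostic: the function name check_torch_cuda is made up for illustration, and treating any warning raised during allocation as a failure is a deliberately strict assumption.

```python
import warnings


def check_torch_cuda():
    """Run the import / is_available / allocate checks and return (ok, message)."""
    try:
        import torch  # the package must import at all
    except (ImportError, OSError) as exc:
        return False, f"import torch failed: {exc}"

    # A driver and at least one GPU must be visible to PyTorch.
    if not torch.cuda.is_available():
        return False, "torch.cuda.is_available() returned False"

    # Actually place a tensor on the GPU. Warnings are recorded so that an
    # "sm_XX is not compatible" UserWarning is reported as a failure
    # (assumption: any warning at this point counts as a failed check).
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            tensor = torch.zeros(1).cuda()
        except RuntimeError as exc:
            return False, f"torch.zeros(1).cuda() failed: {exc}"
    if caught:
        return False, f"warning during allocation: {caught[0].message}"
    return True, f"ok: {tensor}"


if __name__ == "__main__":
    ok, message = check_torch_cuda()
    print(ok, message)
```

On a machine without torch or without a GPU, the function returns False with the reason instead of raising.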
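Key points:
- When installing PyTorch, first check which CUDA versions the official builds of that PyTorch release were published for.
- Installing with
pip install torch==1.8.1
pulls the default build of that release. The CUDA version it was built against is fixed by the wheel and is not matched to the CUDA version on the host.
- The resulting problem: the CUDA version of the default build may not fit the GPU or the CUDA version on the host, as in the following warning.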
>>> torch.zeros(1).cuda()
/home/respecting/anaconda3/envs/torch1.8.1/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
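This warning means the installed PyTorch build was not compiled for this GPU's compute capability (sm_86): the CUDA version of the PyTorch build does not match what the card and the host's CUDA installation need.
Concretely, a single PyTorch release such as PyTorch 1.8.1 is published in several variants, each built against a different CUDA (or ROCm) version: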
# ROCM 4.0.1 (Linux only)
pip install torch==1.8.1+rocm4.0.1 torchvision==0.9.1+rocm4.0.1 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
# ROCM 3.10 (Linux only)
pip install torch==1.8.1+rocm3.10 torchvision==0.9.1+rocm3.10 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 11.1
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 10.2
pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 10.1
pip install torch==1.8.1+cu101 torchvision==0.9.1+cu101 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
# CPU only
pip install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
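The suffix after the + in these version strings names the backend the wheel was built for. As a rough sketch of how to read it, the helper below splits a version string into release, backend, and backend version. The name describe_variant is hypothetical, and the rule "the last digit of cuNNN is the minor version" is an assumption that holds for the tags listed above.

```python
import re


def describe_variant(version):
    """Split e.g. '1.8.1+cu111' into (release, backend, backend_version)."""
    release, _, local = version.partition("+")
    if not local:
        return release, "default", None
    if local == "cpu":
        return release, "cpu", None
    # Assumption: in 'cuNNN' the last digit is the minor version (cu111 -> 11.1).
    match = re.fullmatch(r"cu(\d+)(\d)", local)
    if match:
        return release, "cuda", f"{match.group(1)}.{match.group(2)}"
    match = re.fullmatch(r"rocm([\d.]+)", local)
    if match:
        return release, "rocm", match.group(1)
    return release, "unknown", local


for tag in ["1.8.1+cu111", "1.8.1+cu102", "1.8.1+rocm4.0.1", "1.8.1+cpu", "1.8.1"]:
    print(tag, "->", describe_variant(tag))
```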
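The CUDA version of the PyTorch build installed in the environment must satisfy two conditions:
- The compute capabilities supported by the PyTorch build must cover the compute capability of the GPU in the machine.
- The CUDA version of the PyTorch build must not be higher than the CUDA version installed on the machine (more precisely, than the CUDA version the installed NVIDIA driver supports).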
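The two conditions can be written down as a small check. This is a sketch under stated assumptions: build_fits and parse_sm are hypothetical helper names, the architecture rule used here (same major version, built minor not newer than the GPU's) is a simplification that ignores PTX JIT compilation, and the CUDA versions in the examples are illustrative values rather than measurements.

```python
def parse_sm(arch):
    """Turn 'sm_86' into (8, 6); return None for entries that are not plain sm_XY."""
    prefix, _, digits = arch.partition("_")
    if prefix != "sm" or not digits.isdigit():
        return None
    return divmod(int(digits), 10)


def build_fits(gpu_capability, build_archs, build_cuda, host_cuda):
    """Check both conditions: architecture coverage and CUDA version ceiling."""
    gpu_major, gpu_minor = gpu_capability
    parsed = [sm for sm in map(parse_sm, build_archs) if sm is not None]
    # Condition 1 (simplified): some compiled arch has the same major version
    # and a minor version not newer than the GPU's.
    arch_ok = any(major == gpu_major and minor <= gpu_minor for major, minor in parsed)
    # Condition 2: the build's CUDA version must not exceed the host's.
    cuda_ok = build_cuda <= host_cuda
    return arch_ok and cuda_ok


# Architecture list copied from the warning above; the CUDA versions are assumed.
print(build_fits((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"], (10, 2), (11, 2)))
# Hypothetical build that includes sm_80 and sm_86.
print(build_fits((8, 6), ["sm_80", "sm_86"], (11, 1), (11, 2)))
```

On a live installation the inputs would come from torch.cuda.get_device_capability(0), torch.cuda.get_arch_list(), and torch.version.cuda, assuming a PyTorch version recent enough to provide them.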
Knowing the cause, the fix follows from this constraint:
CUDA 11.1 <= CUDA version of the PyTorch build <= CUDA version installed on the machine
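The constraint amounts to picking a CUDA variant inside a closed range. A minimal sketch, assuming the CUDA variants listed for PyTorch 1.8.1 above and a host with CUDA 11.2 as on the author's machine (pick_cuda_build is a hypothetical helper name):

```python
def pick_cuda_build(available, gpu_min, host):
    """Return the highest available CUDA version v with gpu_min <= v <= host, or None."""
    candidates = [version for version in available if gpu_min <= version <= host]
    return max(candidates) if candidates else None


# CUDA variants published for PyTorch 1.8.1, as (major, minor) tuples.
available = [(11, 1), (10, 2), (10, 1)]

# RTX 3090 needs CUDA >= 11.1; the host is assumed to have CUDA 11.2.
choice = pick_cuda_build(available, gpu_min=(11, 1), host=(11, 2))
print(choice)  # (11, 1)
```

If no variant falls inside the range, the function returns None, which means either the host CUDA must be upgraded or a different PyTorch release is needed.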
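The author's machine has CUDA 11.2 installed, and the RTX 3090 requires CUDA 11.1 or later,
so the right choice is the CUDA 11.1 (cu111) build of PyTorch 1.8.1. Uninstall the existing torch and install that build: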
pip install -i https://pypi.douban.com/simple torch-1.8.1+cu111-cp37-cp37m-linux_x86_64.whl
Looking in indexes: https://pypi.douban.com/simple
Processing ./torch-1.8.1+cu111-cp37-cp37m-linux_x86_64.whl
Requirement already satisfied: numpy in /home/respecting/anaconda3/envs/torch1.8.1/lib/python3.7/site-packages (from torch==1.8.1+cu111) (1.21.6)
Requirement already satisfied: typing-extensions in /home/respecting/anaconda3/envs/torch1.8.1/lib/python3.7/site-packages (from torch==1.8.1+cu111) (4.2.0)
Installing collected packages: torch
Attempting uninstall: torch
Found existing installation: torch 1.8.1
Uninstalling torch-1.8.1:
Successfully uninstalled torch-1.8.1
Successfully installed torch-1.8.1+cu111
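torchvision install: install the torchvision build that matches this torch build (torchvision==0.9.1+cu111 in the list above) in the same way.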
(torch1.8.1) respecting@respecting-B360M-GAMING-HD:~$ python
Python 3.7.13 (default, Mar 29 2022, 02:18:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.zeros(1).cuda()
tensor([0.], device='cuda:0')
>>>
NVIDIA driver for the RTX 3090 Ti series:
Version: 515.57
Release Date: 2022.6.28
Operating System: Linux 64-bit
Language: English (US)
File Size: 346.53 MB
The relationship between GPU, CUDA, and cuDNN versions, together with the environment setup and installation steps, is documented here:
https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html#fntarg_4
From this it follows that NVIDIA requires CUDA >= 11.8 for the Ada Lovelace based RTX 4090.
As of 2023-06-10, the official PyTorch 2.0.1 release supports CUDA up to 11.8.
Collecting nvidia-cuda-cupti-cu11==11.7.101
Downloading https://pypi.doubanio.com/packages/e6/9d/dd0cdcd800e642e3c82ee3b5987c751afd4f3fb9cc2752517f42c3bc6e49/nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 1.2 MB/s eta 0:00:00
INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of torch to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install torch==2.0.0 and torchvision==0.15.0 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested torch==2.0.0
torchvision 0.15.0 depends on torch==2.0.0+cu117
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts