Installation order and the versions I used:
(pytorch) zq@zkti:~/pytorch$ conda -V
conda 4.5.11
NVIDIA driver version: 455.23.04 WHQL
(pytorch) zq@zkti:~/pytorch$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
(pytorch) zq@zkti:~/pytorch$ cat /usr/local/cuda/include/cudnn_version.h
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
8.0.5
>>> print(torch.__version__)
1.8.0a0+c0723a0
Download link for the open-source (Individual) edition: https://www.anaconda.com/products/individual
sudo bash Anaconda3-5.3.1-Linux-x86_64.sh
...
installation finished.
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/sammy/.bashrc ? [yes|no]
[no] >>>
Enter yes to add conda's path to PATH.
source ~/.bashrc
conda list
Before installing CUDA, check the system requirements first.
Official CUDA installation guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements
Table 1 (Native Linux Distribution Support in CUDA 11.1) calls for GCC 9.x, or 8.2 or above. The default gcc on my machine is 7.5, so I upgraded gcc first.
(base) zq@zkti:~/Downloads/gcc-build-8.2.0/gcc$ gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
(base) zq@zkti:~/Downloads$ sudo apt install software-properties-common
[sudo] password for zq:
Reading package lists... Done
Building dependency tree
Reading state information... Done
···
(base) zq@zkti:~/Downloads$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
Toolchain test builds; see https://wiki.ubuntu.com/ToolChain
More info: https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test
Press [ENTER] to continue or Ctrl-c to cancel adding it.
···
(base) zq@zkti:~/Downloads$ sudo apt install gcc-8 g++-8
Reading package lists... Done
Building dependency tree
Reading state information... Done
(base) zq@zkti:~/Downloads$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80 --slave /usr/bin/g++ g++ /usr/bin/g++-8
update-alternatives: using /usr/bin/gcc-8 to provide /usr/bin/gcc (gcc) in auto mode
(base) zq@zkti:~/Downloads$
(base) zq@zkti:~/Downloads$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 70 --slave /usr/bin/g++ g++ /usr/bin/g++-7
(base) zq@zkti:~/Downloads$ sudo update-alternatives --config gcc
There are 2 choices for the alternative gcc (providing /usr/bin/gcc).
Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/bin/gcc-8 80 auto mode
1 /usr/bin/gcc-7 70 manual mode
2 /usr/bin/gcc-8 80 manual mode
Press <enter> to keep the current choice[*], or type selection number:
(base) zq@zkti:~/Downloads$ gcc --version
gcc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
···
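The requirement check above can also be scripted. This is a minimal sketch, assuming the minimum is GCC 8.2 as quoted from the NVIDIA table; the function names are my own, and the input is the first line printed by `gcc --version`.

```python
import re

# Minimum GCC version, taken from the requirement quoted above (assumption: 8.2).
MIN_GCC = (8, 2)


def parse_gcc_version(first_line):
    """Return (major, minor, patch) from the first line of `gcc --version`."""
    # The compiler version is the last dotted triple on the line, e.g. "... 8.4.0".
    matches = re.findall(r"(\d+)\.(\d+)\.(\d+)", first_line)
    if not matches:
        raise ValueError("no version number found in: %r" % first_line)
    return tuple(int(part) for part in matches[-1])


def gcc_is_new_enough(first_line, minimum=MIN_GCC):
    """True if the reported GCC (major, minor) is at least `minimum`."""
    return parse_gcc_version(first_line)[:2] >= minimum


# The two version lines shown in the transcripts above.
print(gcc_is_new_enough("gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"))  # default compiler
print(gcc_is_new_enough("gcc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0"))  # after the upgrade
```

Comparing tuples rather than strings avoids the usual trap where "10.1" sorts before "8.2".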
With GCC upgraded, install NVIDIA driver 455.23.04.
Official download link: https://www.nvidia.com/Download/driverResults.aspx/163522/en-us
Install:
sudo bash NVIDIA-Linux-x86_64-455.23.04.run
After installation you can check the driver:
zq@zkti:~$ nvidia-smi
Fri Nov 20 11:15:53 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3080 Off | 00000000:01:00.0 Off | N/A |
| 30% 44C P0 59W / 320W | 0MiB / 10016MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
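Attention: the CUDA Version shown here is not the toolkit installed on this machine; it is the CUDA version this driver supports. So the next step is to install CUDA toolkit 11.1.0.
Installation guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#introduction
Download link: https://developer.nvidia.com/zh-cn/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=runfilelocal
Before installing, read the system requirements and pre-installation actions carefully.
Select the correct operating system, architecture, distribution, version, and installer type: Linux, x86_64, Ubuntu, 18.04, runfile.
Here I used the runfile: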
wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run
sudo sh cuda_11.1.0_455.23.05_linux.run
Verify after installation:
zq@zkti:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
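Since `nvidia-smi` reports the CUDA version the driver supports while `nvcc --version` reports the installed toolkit, the two can be compared. Below is a minimal sketch; the function names are my own, and the rule "toolkit version must not exceed the driver-supported version" is a simplification I am assuming for illustration, not NVIDIA's full compatibility policy.

```python
import re


def driver_cuda_version(nvidia_smi_text):
    """CUDA version supported by the driver, read from the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(match.group(1)), int(match.group(2))


def toolkit_cuda_version(nvcc_text):
    """Installed CUDA toolkit release, read from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_text)
    if match is None:
        raise ValueError("no 'release' field found")
    return int(match.group(1)), int(match.group(2))


def toolkit_supported_by_driver(nvidia_smi_text, nvcc_text):
    """Simplified check: the toolkit is not newer than what the driver supports."""
    return toolkit_cuda_version(nvcc_text) <= driver_cuda_version(nvidia_smi_text)


# Lines copied from the two outputs shown above.
smi_line = "| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |"
nvcc_line = "Cuda compilation tools, release 11.1, V11.1.74"
print(driver_cuda_version(smi_line), toolkit_cuda_version(nvcc_line))
print(toolkit_supported_by_driver(smi_line, nvcc_line))
```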
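cuDNN installation guide: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#verify
As above, match the versions before installing: https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html#cudnn-versions-804-805
Choose the cuDNN version that matches your CUDA toolkit and driver.
Here I chose to install 8.0.5, the latest version.
Download the matching version: https://developer.nvidia.com/cudnn
I downloaded the tar package, and I recommend it because it is convenient to use: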
$ tar -xzvf cudnn-x.x-linux-x64-v8.x.x.x.tgz
$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
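After copying the files, the installed cuDNN version can be read back from `cudnn_version.h`. This is a minimal sketch that assumes the header contains the three `#define` lines shown in the version summary at the top of this post; the function name is my own.

```python
import re


def cudnn_version_from_header(header_text):
    """Build 'major.minor.patch' from the CUDNN_* defines in cudnn_version.h."""
    fields = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        pattern = r"^\s*#define\s+CUDNN_%s\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("CUDNN_%s not found in header" % name)
        fields[name] = int(match.group(1))
    return "%d.%d.%d" % (fields["MAJOR"], fields["MINOR"], fields["PATCHLEVEL"])


# The three defines as they appear in the header on my machine.
sample_header = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
"""
print(cudnn_version_from_header(sample_header))
```

To check a real installation, pass in the contents of /usr/local/cuda/include/cudnn_version.h instead of the sample string.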
GitHub repository: https://github.com/pytorch/pytorch#from-source
1. Create a new virtual environment
conda create -n pytorch python=3.7
conda activate pytorch
2. Install dependencies
pip install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses
3. Download the source code (with its submodules)
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
4. Build and install
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install
5. Verify. I got an error here; the message suggests using the develop workflow instead.
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/zq/pytorch/torch/__init__.py", line 218, in <module>
''').strip()) from None
ImportError: Failed to load PyTorch C extensions:
It appears that PyTorch has loaded the `torch/_C` folder
of the PyTorch repository rather than the C extensions which
are expected in the `torch._C` namespace. This can occur when
using the `install` workflow. e.g.
$ python setup.py install && python -c "import torch"
This error can generally be solved using the `develop` workflow
$ python setup.py develop && python -c "import torch" # This should succeed
or by running Python from a different directory.
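The traceback shows `/home/zq/pytorch/torch/__init__.py` being loaded: in an interactive session Python searches the current directory first, so starting it inside the repository picks up the source folder rather than the installed package. Below is a minimal sketch of a check for this situation; the helper name is hypothetical and the temporary directories only stand in for a real checkout.

```python
import os
import tempfile


def source_tree_shadows(package, directory):
    """True if `directory` contains a folder `package` with an __init__.py,
    which an interpreter started in `directory` would import first."""
    candidate = os.path.join(directory, package)
    return os.path.isfile(os.path.join(candidate, "__init__.py"))


# Simulate a checkout that contains a torch/ source folder.
with tempfile.TemporaryDirectory() as repo:
    os.makedirs(os.path.join(repo, "torch"))
    open(os.path.join(repo, "torch", "__init__.py"), "w").close()
    inside_repo = source_tree_shadows("torch", repo)

# Simulate any other working directory.
with tempfile.TemporaryDirectory() as elsewhere:
    outside_repo = source_tree_shadows("torch", elsewhere)

print(inside_repo, outside_repo)
```

In practice, calling `source_tree_shadows("torch", os.getcwd())` before `import torch` tells you whether to change directory first, which is the second remedy the error message mentions.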
6. Install again as the message suggests
zq@zkti:~/pytorch$ python setup.py develop
Building wheel torch-1.8.0a0
-- Building version 1.8.0a0
cmake --build . --target install --config Release -- -j 20
···
7. Verify again
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> print(torch.backends.cudnn.is_acceptable(torch.cuda.FloatTensor(1)))
True
>>> print(torch.backends.cudnn.version())
8005
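The integer 8005 is the packed form of cuDNN 8.0.5. A minimal sketch of the decoding, assuming the cuDNN 8.x encoding major*1000 + minor*100 + patch (other cuDNN release lines may pack the number differently); the function name is my own.

```python
def decode_cudnn_version(number):
    """Split a cuDNN 8.x version integer into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(number, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(8005))
```

With PyTorch installed, the argument would be the value returned by `torch.backends.cudnn.version()`.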
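Reference: https://oldpan.me/archives/pytorch-build-simple-instruction
Summary: during installation, check your environment and software versions carefully against NVIDIA's official documentation and requirements, because each step depends on the one before it. PyTorch did not yet offer a one-step conda install for CUDA 11.1, so building from source was the only option. Take it one step at a time. I ran into many pitfalls along the way and hope this helps.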