Anaconda official release archive:
https://repo.anaconda.com/archive/
cd ~
mkdir download
cd download
Download the Anaconda installer
wget https://repo.anaconda.com/archive/Anaconda3-2023.03-Linux-x86_64.sh
bash Anaconda3-2023.03-Linux-x86_64.sh
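The archive page lists a SHA256 checksum next to each installer. The sketch below checks the download before running it. `verify_sha256` is a hypothetical helper name, and the expected hash has to be copied from the archive page by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE's SHA256 equals EXPECTED.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<hash>  <file>"; keep only the hash field
  actual="$(sha256sum "$file" | awk '{print $1}')"
  # compare case-insensitively in case the hash was pasted in upper case
  [ "${actual,,}" = "${expected,,}" ]
}

# Usage (paste the real hash from https://repo.anaconda.com/archive/):
# verify_sha256 Anaconda3-2023.03-Linux-x86_64.sh '<sha256 from archive page>' \
#   && bash Anaconda3-2023.03-Linux-x86_64.sh
```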
Create a Python virtual environment
conda create -n <name> python=<version>
Set the Anaconda path
$ vim ~/.bashrc
Add the installation path
# Anaconda3
export PATH="/home/XXXX/anaconda3/bin:$PATH"
source activate
Or
echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc
echo 'source activate' >> ~/.bashrc
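Running the `echo ... >> ~/.bashrc` lines twice appends duplicate entries. The sketch below appends a line only if an identical line is not already present. `append_once` is a hypothetical name and is not part of conda.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE unless an identical line exists.
append_once() {
  local line="$1" file="$2"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage:
# append_once 'export PATH="$HOME/anaconda3/bin:$PATH"' ~/.bashrc
# append_once 'source activate' ~/.bashrc
```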
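Reload the configuration
source ~/.bashrc
Getting this wrong means every virtual environment runs with the base environment's Python version. You cannot give each environment its own Python version, which defeats the purpose of virtual environments.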
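One way this failure can show up is the base anaconda3/bin directory sitting ahead of the active environment's bin/ on PATH. This can happen, for example, when ~/.bashrc prepends it again in a nested shell.

The diagnostic sketch below works under that assumption. `env_bin_first` is hypothetical and assumes the default ~/anaconda3 layout, with environments under anaconda3/envs/.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the active env's bin/ come before every other
# conda bin/ directory on PATH? CONDA_PREFIX is set by `conda activate`.
env_bin_first() {
  local prefix="${1:-$CONDA_PREFIX}" path="${2:-$PATH}" entry
  local -a parts
  IFS=':' read -r -a parts <<< "$path"
  for entry in "${parts[@]}"; do
    if [ "$entry" = "$prefix/bin" ]; then
      return 0            # the active env wins the command lookup
    fi
    case "$entry" in
      */anaconda3/bin|*/anaconda3/envs/*/bin)
        return 1 ;;       # another conda bin/ shadows the active env
    esac
  done
  return 1                # the env's bin/ is not on PATH at all
}

# Usage:
# conda activate myenv && env_bin_first && echo "PATH order OK"
```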
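NVIDIA driver download:
https://www.nvidia.com/download/index.aspx?lang=en-us
Check the driver and GPUs:
nvidia-smi
Do not install any Linux display driver inside WSL; WSL uses the driver installed on Windows.
https://docs.nvidia.cn/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl-2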
Production environment: V100 x4
OS: Ubuntu 22.04
In the early morning, watch nvidia-smi was still showing GPU usage normally:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-16GB Off | 00000000:00:08.0 Off | 0 |
| N/A 47C P0 184W / 300W | 6945MiB / 16384MiB | 75% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2-16GB Off | 00000000:00:09.0 Off | 0 |
| N/A 45C P0 249W / 300W | 7863MiB / 16384MiB | 91% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2-16GB Off | 00000000:00:0A.0 Off | 0 |
| N/A 45C P0 194W / 300W | 7983MiB / 16384MiB | 75% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2-16GB Off | 00000000:00:0B.0 Off | 0 |
| N/A 35C P0 41W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1548534 C python 6942MiB |
| 1 N/A N/A 1548535 C python 7860MiB |
| 2 N/A N/A 1548536 C python 7980MiB |
+---------------------------------------------------------------------------------------+
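Instead of reading the full table through watch, nvidia-smi can print selected fields as CSV with `--query-gpu=index,memory.used,utilization.gpu --format=csv,noheader,nounits`. The sketch below filters that output for idle GPUs. The helper name `idle_gpus` and its default thresholds (100 MiB, 5 %) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "index, memory.used, utilization.gpu" CSV lines
# (as printed with --format=csv,noheader,nounits) and print the index of
# every GPU whose memory use (MiB) and utilization (%) are within the limits.
idle_gpus() {
  local max_mem="${1:-100}" max_util="${2:-5}"
  awk -F', *' -v m="$max_mem" -v u="$max_util" \
    '$2 + 0 <= m && $3 + 0 <= u { print $1 }'
}

# Usage on a live machine:
# nvidia-smi --query-gpu=index,memory.used,utilization.gpu \
#   --format=csv,noheader,nounits | idle_gpus
```

With the numbers from the table above, only GPU 3 would be reported.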
By noon it looked like this:
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.104
None of nvtop, nvitop or gpustat work either.
$ lspci | grep -i nvidia
00:08.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:09.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:0a.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:0b.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.86.10 Wed Jul 26 23:20:03 UTC 2023
GCC version: gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04.1)
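The two outputs disagree. The loaded kernel module is 535.86.10, while the user-space NVML library is 535.104.

The sketch below extracts the kernel module version and compares it with the upstream version of the installed nvidia-utils-535 package. Both helper names are hypothetical, and the package name assumes the 535 driver branch used here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version from the "Kernel Module" line
# of /proc/driver/nvidia/version (read from stdin).
kernel_module_version() {
  sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: strip the Debian revision ("-0ubuntu...") from a
# package version such as 535.104.05-0ubuntu0.22.04.4.
upstream_version() {
  printf '%s\n' "${1%%-*}"
}

# Usage:
# kmod="$(kernel_module_version < /proc/driver/nvidia/version)"
# pkg="$(upstream_version "$(dpkg-query -W -f='${Version}' nvidia-utils-535)")"
# [ "$kmod" = "$pkg" ] || echo "mismatch: kernel $kmod vs user space $pkg"
```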
$ dpkg -l | grep nvidia
ii gpustat 0.6.0-1 all pretty nvidia device monitor
iU libnvidia-cfg1-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-535 535.86.10-0ubuntu1 all Shared files used by the NVIDIA libraries
iU libnvidia-compute-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA libcompute package
iU libnvidia-decode-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA Video Decoding runtime libraries
iU libnvidia-encode-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 NVENC Video Encoding runtime library
iU libnvidia-extra-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 Extra libraries for the NVIDIA driver
iU libnvidia-fbc1-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-535:amd64 535.86.10-0ubuntu1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
iU nvidia-compute-utils-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA compute utilities
iU nvidia-dkms-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA DKMS package
iU nvidia-driver-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA driver metapackage
iU nvidia-firmware-535-535.104.05 535.104.05-0ubuntu0.22.04.4 amd64 Firmware files used by the kernel module
ii nvidia-kernel-common-535 535.86.10-0ubuntu1 amd64 Shared files used with the kernel module
iU nvidia-kernel-source-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA kernel source package
ii nvidia-modprobe 535.86.10-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
ii nvidia-prime 0.8.17.1 all Tools to enable NVIDIA's Prime
ii nvidia-settings 535.86.10-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
iU nvidia-utils-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA driver support binaries
ii screen-resolution-extra 0.18.2 all Extension for the nvidia-settings control panel
iU xserver-xorg-video-nvidia-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA binary Xorg driver
$ grep nvidia /var/log/dpkg.log
2023-09-27 06:18:38 upgrade nvidia-driver-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status half-installed nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked nvidia-driver-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 upgrade libnvidia-gl-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status half-installed libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status installed libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 upgrade nvidia-dkms-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status half-installed nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-dkms-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 upgrade nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 status half-configured nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status half-installed nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-kernel-source-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 install nvidia-firmware-535-535.104.05:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 status half-installed nvidia-firmware-535-535.104.05:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status unpacked nvidia-firmware-535-535.104.05:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 upgrade nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status installed nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 upgrade libnvidia-decode-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-decode-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 upgrade libnvidia-compute-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-compute-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-extra-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-extra-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-compute-utils-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-encode-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-encode-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade nvidia-utils-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-utils-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked xserver-xorg-video-nvidia-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-fbc1-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-cfg1-535:amd64 535.104.05-0ubuntu0.22.04.4
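The cause: the driver packages had been upgraded automatically, without notice, from 535.86.10 to 535.104.05. As a result, the loaded NVIDIA kernel module (535.86.10) no longer matched the user-space driver libraries (535.104.05).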
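Each line of /var/log/dpkg.log starts with a date, a time and an action. The sketch below keeps only the "upgrade" actions for NVIDIA packages and prints the old and new versions. `nvidia_upgrades` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper. dpkg.log "upgrade" lines have the fields:
#   date time upgrade pkg:arch old-version new-version
nvidia_upgrades() {
  awk '$3 == "upgrade" && $4 ~ /nvidia/ { print $1, $2, $4, $5, "->", $6 }'
}

# Usage (rotated logs are gzip-compressed; zcat -f also passes plain files through):
# zcat -f /var/log/dpkg.log* | nvidia_upgrades
```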
Hold the driver package so it is not upgraded again:
sudo apt-mark hold nvidia-driver-<version>
$ sudo apt-mark hold nvidia-driver-535
nvidia-driver-535 set on hold.
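The dpkg -l listing above shows that the upgrade touched many packages besides nvidia-driver-535, such as libnvidia-compute-535, nvidia-dkms-535 and nvidia-utils-535. It is not certain that holding only the metapackage keeps all of them pinned.

One cautious option is to hold every installed NVIDIA package of the branch. This is an assumption, not a required step. `nvidia_pkgs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dpkg -l` output and print the names (without
# ":arch") of rows whose status starts with "i" and whose package name
# contains "nvidia" and the given driver branch.
nvidia_pkgs() {
  local branch="${1:-535}"
  awk -v b="$branch" \
    '$1 ~ /^i/ && $2 ~ /nvidia/ && $2 ~ b { sub(/:.*/, "", $2); print $2 }'
}

# Usage:
# dpkg -l | nvidia_pkgs 535 | xargs -r sudo apt-mark hold
# apt-mark showhold     # list packages currently on hold
```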
To keep software and the environment stable in production, turn off automatic package updates:
sudo dpkg-reconfigure unattended-upgrades
$ sudo dpkg-reconfigure unattended-upgrades
Replacing config file /etc/apt/apt.conf.d/20auto-upgrades with new version
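Answering "No" in that dialog is expected to leave `APT::Periodic::Unattended-Upgrade "0";` in /etc/apt/apt.conf.d/20auto-upgrades. The sketch below confirms the setting afterwards.

`auto_upgrades_disabled` is hypothetical, and it only inspects the text it is given. Other files in apt.conf.d could still override the value, which is why the usage also shows `apt-config dump` for the merged configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the APT config text on stdin sets
# APT::Periodic::Unattended-Upgrade to "0".
auto_upgrades_disabled() {
  grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"0"[[:space:]]*;'
}

# Usage:
# auto_upgrades_disabled < /etc/apt/apt.conf.d/20auto-upgrades && echo "disabled"
# apt-config dump | grep Unattended-Upgrade    # effective merged value
```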
CUDA Toolkit download page (Linux x86_64, WSL-Ubuntu):
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local
Previous versions:
https://developer.nvidia.com/cuda-toolkit-archive
CUDA on WSL User Guide
https://docs.nvidia.cn/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl-2
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
W: GPG error: file:/var/cuda-repo-wsl-ubuntu-12-1-local InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY CDD5140FF7B46061
E: The repository 'file:/var/cuda-repo-wsl-ubuntu-12-1-local InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
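Remove the old CUDA repository GPG key
sudo apt-key del 7fa2af80
Install the new GPG key. The missing key ID in the error, CDD5140FF7B46061, ends in F7B46061, which matches the keyring file name. Then update and install again:
sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-F7B46061-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda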
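Instead of reading the key ID by eye, the keyring path can be derived from the apt error message. This is a sketch that assumes the cuda-<last 8 hex digits>-keyring.gpg naming seen in this local repository. `keyring_for_error` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 = apt error text, $2 = local repo directory.
# Prints the keyring path built from the last 8 digits of the NO_PUBKEY ID.
keyring_for_error() {
  local id
  id="$(printf '%s\n' "$1" | grep -o 'NO_PUBKEY [0-9A-F]*' | awk '{print $2}' | head -n 1)"
  [ -n "$id" ] || return 1           # no NO_PUBKEY in the message
  printf '%s/cuda-%s-keyring.gpg\n' "$2" "${id: -8}"
}

# Usage:
# err="$(sudo apt-get update 2>&1)"
# kr="$(keyring_for_error "$err" /var/cuda-repo-wsl-ubuntu-12-1-local)" \
#   && sudo cp "$kr" /usr/share/keyrings/
```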
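Check the installed version (use the full path until /usr/local/cuda/bin is added to PATH):
/usr/local/cuda/bin/nvcc -V
Edit the path configuration
vim ~/.bashrc
Add the system paths
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin
export CUDA_HOME=/usr/local/cuda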
Or
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"' >> ~/.bashrc
echo 'export PATH="$PATH:/usr/local/cuda/bin"' >> ~/.bashrc
echo 'export CUDA_HOME="/usr/local/cuda"' >> ~/.bashrc
Reload the configuration
source ~/.bashrc
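If LD_LIBRARY_PATH is empty, "$LD_LIBRARY_PATH:/usr/local/cuda/lib64" starts with a colon. The glibc dynamic loader treats an empty entry as the current directory. Re-sourcing ~/.bashrc also appends the same directory again.

The sketch below guards against both problems. `path_append` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the colon-separated list $1 with directory $2
# appended, skipping it if already present and avoiding a leading colon.
path_append() {
  local list="$1" dir="$2"
  case ":$list:" in
    *":$dir:"*) printf '%s\n' "$list" ;;            # already present
    *)          printf '%s\n' "${list:+$list:}$dir" ;;
  esac
}

# Usage in ~/.bashrc:
# export LD_LIBRARY_PATH="$(path_append "$LD_LIBRARY_PATH" /usr/local/cuda/lib64)"
# export PATH="$(path_append "$PATH" /usr/local/cuda/bin)"
# export CUDA_HOME=/usr/local/cuda    # a single directory, not a list
```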
- The official CUDA Toolkit from NVIDIA, installed system-wide:
- The per-environment cudatoolkit package installed with conda:
conda install cudatoolkit=10.0 -c pytorch
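- For the official CUDA Toolkit, choose the newest version the GPU driver supports (the CUDA Version shown by nvidia-smi is the highest one the driver supports). Newer drivers remain compatible with applications built against older CUDA versions.
It provides a complete development environment for building high-performance GPU-accelerated applications, including GPU-accelerated libraries, debugging and optimization tools, a C/C++ compiler, and a runtime library for deploying applications.
- The cudatoolkit version installed in a conda environment must not be higher than the official CUDA version on the host.
A cudatoolkit of a different version installed inside a conda environment (to match the other packages in that environment) only provides the runtime and other shared libraries needed to call CUDA. It is not a full toolkit.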
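The version rule above can be checked mechanically. The sketch below compares a requested cudatoolkit version against the host CUDA version, for example the CUDA Version field in the nvidia-smi header (12.2 in the table earlier). `version_le` is hypothetical and relies on GNU sort -V for version ordering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 <= version $2,
# using GNU sort -V so that 12.2 sorts before 12.10.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage:
# host="$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p')"
# version_le 10.0 "$host" && conda install cudatoolkit=10.0 -c pytorch
# conda list cudatoolkit      # show the version in the active environment
```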
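cuDNN archive:
https://developer.nvidia.com/rdp/cudnn-archive
Downloading requires registering and logging in with an NVIDIA developer account.
cuDNN depends on zlib; install it:
sudo apt-get install zlib1g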
(base) fb@VP01:~/download$ conda activate modelscope
(modelscope) fb@VP01:~/download$ sudo apt-get install zlib1g
[sudo] password for fb:
Readi