nvidia-smi
Wed Nov 24 13:44:18 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:21:00.0 Off |                  N/A |
|  0%   35C    P8    21W / 370W |    180MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2341      G   /usr/lib/xorg/Xorg                167MiB |
|    0   N/A  N/A      2445      G   /usr/bin/gnome-shell               11MiB |
+-----------------------------------------------------------------------------+
Inside the Docker container:
nvidia-smi
bash: nvidia-smi: command not found
docker run -itd --gpus all --name rv -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all wq001/rastertorch
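The `docker run` line above passes `NVIDIA_DRIVER_CAPABILITIES=compute,utility`. As far as I understand the NVIDIA container runtime, `utility` is the capability that brings `nvidia-smi` and NVML into the container, so it is worth confirming from inside the container that the variable really arrived. A minimal sketch; `has_capability` is a hypothetical helper of mine, not part of any NVIDIA tool:

```shell
#!/bin/sh
# has_capability LIST NAME
# Succeeds when the comma-separated LIST contains NAME or the catch-all "all".
has_capability() {
  case ",$1," in
    *",all,"*|*",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read the variable the way a process inside the container would see it.
caps="${NVIDIA_DRIVER_CAPABILITIES:-}"
if has_capability "$caps" utility; then
  echo "utility capability requested: nvidia-smi should be injected"
else
  echo "utility capability not requested (NVIDIA_DRIVER_CAPABILITIES='$caps')"
fi
```

If the variable is set correctly and `nvidia-smi` is still missing, the injection hook itself probably failed, which matches the error in the next test.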
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
ERRO[0000] error waiting for container: context canceled
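A possible lead, which nothing in these notes confirms: "cgroup subsystem devices not found" is the kind of message I would expect when the host runs the unified cgroup v2 hierarchy while the installed libnvidia-container only looks for the cgroup v1 `devices` controller. The sketch below checks which hierarchy is mounted. The mapping from filesystem type to version (`cgroup2fs` for v2, `tmpfs` for v1) is the usual convention, and `cgroup_version_from_fstype` is a hypothetical helper name:

```shell
#!/bin/sh
# Map the filesystem type reported for /sys/fs/cgroup to a cgroup version.
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;      # unified hierarchy
    tmpfs)     echo "v1" ;;      # legacy or hybrid layout
    *)         echo "unknown" ;;
  esac
}

# GNU stat prints the filesystem type with -f -c %T; fall back quietly elsewhere.
fstype=$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null || echo "unavailable")
echo "cgroup filesystem: $fstype -> $(cgroup_version_from_fstype "$fstype")"
```

If this prints `v2`, the toolkit version becomes the next thing to check.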
nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I1124 05:37:22.076466 139016 nvc.c:372] initializing library context (version=1.6.0, build=dd2c49d6699e4d8529fbeaa58ee91554977b652e)
I1124 05:37:22.076510 139016 nvc.c:346] using root /
I1124 05:37:22.076516 139016 nvc.c:347] using ldcache /etc/ld.so.cache
I1124 05:37:22.076519 139016 nvc.c:348] using unprivileged user 65534:65534
I1124 05:37:22.076535 139016 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1124 05:37:22.076634 139016 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
I1124 05:37:22.078579 139017 nvc.c:274] loading kernel module nvidia
I1124 05:37:22.078778 139017 nvc.c:278] running mknod for /dev/nvidiactl
I1124 05:37:22.078815 139017 nvc.c:282] running mknod for /dev/nvidia0
I1124 05:37:22.078835 139017 nvc.c:286] running mknod for all nvcaps in /dev/nvidia-caps
I1124 05:37:22.085097 139017 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I1124 05:37:22.085235 139017 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I1124 05:37:22.087446 139017 nvc.c:292] loading kernel module nvidia_uvm
I1124 05:37:22.087503 139017 nvc.c:296] running mknod for /dev/nvidia-uvm
I1124 05:37:22.087578 139017 nvc.c:301] loading kernel module nvidia_modeset
I1124 05:37:22.087630 139017 nvc.c:305] running mknod for /dev/nvidia-modeset
I1124 05:37:22.087983 139018 driver.c:101] starting driver service
I1124 05:37:22.089796 139016 nvc_info.c:758] requesting driver information with ''
I1124 05:37:22.090886 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.495.44
I1124 05:37:22.090931 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.495.44
I1124 05:37:22.090973 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.495.44
I1124 05:37:22.091011 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.495.44
I1124 05:37:22.091058 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.495.44
I1124 05:37:22.091101 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.495.44
I1124 05:37:22.091137 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.495.44
I1124 05:37:22.091167 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.495.44
I1124 05:37:22.091208 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.495.44
I1124 05:37:22.091234 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.495.44
I1124 05:37:22.091261 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.495.44
I1124 05:37:22.091292 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.495.44
I1124 05:37:22.091334 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.495.44
I1124 05:37:22.091373 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.495.44
I1124 05:37:22.091401 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.495.44
I1124 05:37:22.091435 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.495.44
I1124 05:37:22.091476 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.495.44
I1124 05:37:22.091517 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.495.44
I1124 05:37:22.091670 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.495.44
I1124 05:37:22.091761 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.495.44
I1124 05:37:22.091793 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.495.44
I1124 05:37:22.091825 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.495.44
I1124 05:37:22.091855 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.495.44
I1124 05:37:22.091903 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.495.44
I1124 05:37:22.091937 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.495.44
I1124 05:37:22.091982 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.495.44
I1124 05:37:22.092024 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.495.44
I1124 05:37:22.092054 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.495.44
I1124 05:37:22.092095 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.495.44
I1124 05:37:22.092123 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.495.44
I1124 05:37:22.092150 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.495.44
I1124 05:37:22.092182 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.495.44
I1124 05:37:22.092224 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.495.44
I1124 05:37:22.092264 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.495.44
I1124 05:37:22.092293 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.495.44
I1124 05:37:22.092325 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.495.44
I1124 05:37:22.092379 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libcuda.so.495.44
I1124 05:37:22.092429 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.495.44
I1124 05:37:22.092460 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.495.44
I1124 05:37:22.092489 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.495.44
I1124 05:37:22.092519 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.495.44
W1124 05:37:22.092533 139016 nvc_info.c:397] missing library libnvidia-nscq.so
W1124 05:37:22.092539 139016 nvc_info.c:397] missing library libnvidia-fatbinaryloader.so
W1124 05:37:22.092548 139016 nvc_info.c:397] missing library libvdpau_nvidia.so
W1124 05:37:22.092555 139016 nvc_info.c:397] missing library libnvidia-ifr.so
W1124 05:37:22.092561 139016 nvc_info.c:397] missing library libnvidia-cbl.so
W1124 05:37:22.092564 139016 nvc_info.c:401] missing compat32 library libnvidia-cfg.so
W1124 05:37:22.092569 139016 nvc_info.c:401] missing compat32 library libnvidia-nscq.so
W1124 05:37:22.092574 139016 nvc_info.c:401] missing compat32 library libnvidia-fatbinaryloader.so
W1124 05:37:22.092580 139016 nvc_info.c:401] missing compat32 library libnvidia-allocator.so
W1124 05:37:22.092584 139016 nvc_info.c:401] missing compat32 library libnvidia-ngx.so
W1124 05:37:22.092590 139016 nvc_info.c:401] missing compat32 library libvdpau_nvidia.so
W1124 05:37:22.092596 139016 nvc_info.c:401] missing compat32 library libnvidia-ifr.so
W1124 05:37:22.092601 139016 nvc_info.c:401] missing compat32 library libnvidia-rtcore.so
W1124 05:37:22.092605 139016 nvc_info.c:401] missing compat32 library libnvoptix.so
W1124 05:37:22.092610 139016 nvc_info.c:401] missing compat32 library libnvidia-cbl.so
I1124 05:37:22.092792 139016 nvc_info.c:297] selecting /usr/bin/nvidia-smi
I1124 05:37:22.092808 139016 nvc_info.c:297] selecting /usr/bin/nvidia-debugdump
I1124 05:37:22.092823 139016 nvc_info.c:297] selecting /usr/bin/nvidia-persistenced
I1124 05:37:22.092843 139016 nvc_info.c:297] selecting /usr/bin/nvidia-cuda-mps-control
I1124 05:37:22.092858 139016 nvc_info.c:297] selecting /usr/bin/nvidia-cuda-mps-server
W1124 05:37:22.092909 139016 nvc_info.c:423] missing binary nv-fabricmanager
I1124 05:37:22.092931 139016 nvc_info.c:341] listing firmware path /usr/lib/firmware/nvidia/495.44
I1124 05:37:22.092952 139016 nvc_info.c:520] listing device /dev/nvidiactl
I1124 05:37:22.092957 139016 nvc_info.c:520] listing device /dev/nvidia-uvm
I1124 05:37:22.092960 139016 nvc_info.c:520] listing device /dev/nvidia-uvm-tools
I1124 05:37:22.092963 139016 nvc_info.c:520] listing device /dev/nvidia-modeset
I1124 05:37:22.092983 139016 nvc_info.c:341] listing ipc path /run/nvidia-persistenced/socket
W1124 05:37:22.093001 139016 nvc_info.c:347] missing ipc path /var/run/nvidia-fabricmanager/socket
W1124 05:37:22.093014 139016 nvc_info.c:347] missing ipc path /tmp/nvidia-mps
I1124 05:37:22.093020 139016 nvc_info.c:814] requesting device information with ''
I1124 05:37:22.098736 139016 nvc_info.c:705] listing device /dev/nvidia0 (GPU-88c8d8c8-8b2f-1f42-481d-6943f660960a at 00000000:21:00.0)
NVRM version: 495.44
CUDA version: 11.5
Device Index: 0
Device Minor: 0
Model: NVIDIA GeForce RTX 3090
Brand: GeForce
GPU UUID: GPU-88c8d8c8-8b2f-1f42-481d-6943f660960a
Bus Location: 00000000:21:00.0
Architecture: 8.6
I1124 05:37:22.098764 139016 nvc.c:423] shutting down library context
I1124 05:37:22.099300 139018 driver.c:163] terminating driver service
I1124 05:37:22.099651 139016 driver.c:203] driver service terminated successfully
The GPU driver is installed correctly, so keep looking.
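The dump above is long, and its lines appear to use a glog-style prefix where the first letter is the severity (`I` for info, `W` for warning). To skim such a log it can help to keep only the non-info lines. A small sketch, assuming the log has been saved to a file or is piped in; `non_info_lines` is a hypothetical helper, and the two sample lines are copied from the output above:

```shell
#!/bin/sh
# Keep only lines whose severity letter is W, E or F (warning, error, fatal).
# "|| true" keeps the exit status clean when nothing matches.
non_info_lines() {
  grep -E '^[WEF][0-9]{4} ' || true
}

printf '%s\n' \
  'I1124 05:37:22.092519 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.495.44' \
  'W1124 05:37:22.092533 139016 nvc_info.c:397] missing library libnvidia-nscq.so' \
  | non_info_lines
```

In this log every non-info line is a `W` about an optional library, binary, or IPC path, and the device listing at the end still succeeds.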
uname -a
Linux TRX40-AORUS-PRO 5.13.0-21-generic #21-Ubuntu SMP Tue Oct 19 08:59:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
1. distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
2. curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
OK
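The `apt-key` warning is harmless here, but the usual replacement is a dedicated keyring file referenced from the source entry with `signed-by`. The sketch below only shows the rewrite of the list entries. The keyring path `/usr/share/keyrings/nvidia-docker.gpg` is my own choice, and `add_signed_by` is a hypothetical helper:

```shell
#!/bin/sh
# add_signed_by KEYRING
# Reads a sources list on stdin and pins every active "deb https://" entry
# to KEYRING. Commented-out entries ("#deb ...") are left untouched.
add_signed_by() {
  sed "s#^deb https://#deb [signed-by=$1] https://#"
}

# Demonstration on one entry taken from the list shown in these notes.
printf '%s\n' \
  'deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /' \
  | add_signed_by /usr/share/keyrings/nvidia-docker.gpg
```

On a real system the key would first be converted with `gpg --dearmor` and written to that keyring path, and the rewritten list would go through `sudo tee` as in the next step.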
The next command reports an error:
3. curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Unsupported distribution!
# Check https://nvidia.github.io/nvidia-docker
https://nvidia.github.io/nvidia-docker
Set the distribution manually. What am I supposed to do with Ubuntu 21??
distribution="ubuntu20.04"
4. curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
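The manual override can be wrapped in a small fallback so the same commands work on both supported and unsupported releases. This is only a sketch: the list of names treated as supported is illustrative, not the repository's real list, and `pick_distribution` is a hypothetical helper. The only mapping taken from these notes is Ubuntu 21 falling back to `ubuntu20.04`:

```shell
#!/bin/sh
# pick_distribution NAME
# NAME is "$ID$VERSION_ID" from /etc/os-release, e.g. "ubuntu21.10".
# Prints the repository name to use, or fails for releases it does not know.
pick_distribution() {
  case "$1" in
    ubuntu18.04|ubuntu20.04) echo "$1" ;;          # assumed to be supported as-is
    ubuntu21.*)              echo "ubuntu20.04" ;; # fallback used in these notes
    *)                       return 1 ;;
  esac
}

distribution=$(pick_distribution "ubuntu21.10")
echo "$distribution"   # prints ubuntu20.04
```

Note that even the `ubuntu20.04` list resolves to `ubuntu18.04` repository paths, as the output above shows.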
5. sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
Hit:1 http://download.zerotier.com/debian/bionic bionic InRelease
Get:2 http://security.ubuntu.com/ubuntu impish-security InRelease [110 kB]
Hit:3 http://cn.archive.ubuntu.com/ubuntu impish InRelease
Hit:4 https://download.docker.com/linux/ubuntu impish InRelease
Get:5 http://cn.archive.ubuntu.com/ubuntu impish-updates InRelease [110 kB]
Hit:6 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 InRelease
Hit:7 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 InRelease
Hit:8 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 InRelease
Get:9 http://security.ubuntu.com/ubuntu impish-security/main amd64 DEP-11 Metadata [6,436 B]
Get:10 http://cn.archive.ubuntu.com/ubuntu impish-backports InRelease [101 kB]
Get:11 http://security.ubuntu.com/ubuntu impish-security/universe amd64 DEP-11 Metadata [2,312 B]
Get:12 http://cn.archive.ubuntu.com/ubuntu impish-updates/main amd64 DEP-11 Metadata [18.9 kB]
Get:13 http://cn.archive.ubuntu.com/ubuntu impish-updates/universe amd64 DEP-11 Metadata [4,196 B]
Get:14 http://cn.archive.ubuntu.com/ubuntu impish-backports/universe amd64 DEP-11 Metadata [9,284 B]
Fetched 363 kB in 4s (87.4 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
nvidia-container-toolkit is already the newest version (1.6.0-1).
nvidia-container-toolkit set to manually installed.
The following packages were automatically installed and are no longer required:
chromium-codecs-ffmpeg-extra gstreamer1.0-vaapi libgstreamer-plugins-bad1.0-0 libva-wayland2
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 66 not upgraded.
??
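apt considers 1.6.0-1 the newest version this repository offers, so reinstalling changes nothing. If the failure really comes from cgroup v2 (an unverified guess on my part), the installed version would have to be compared against the first release that supports it. I believe that is 1.8.0, but treat the threshold below as an assumption. `version_ge` is a hypothetical helper built on `sort -V`:

```shell
#!/bin/sh
# version_ge A B
# Succeeds when version A is greater than or equal to version B,
# using the version-aware ordering of "sort -V".
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

installed="1.6.0-1"           # value reported by apt in these notes
upstream="${installed%%-*}"   # drop the Debian revision suffix -> 1.6.0
required="1.8.0"              # ASSUMED minimum for cgroup v2 support

if version_ge "$upstream" "$required"; then
  echo "$upstream is at least $required"
else
  echo "$upstream is older than $required"
fi
```

With the values above the script reports that 1.6.0 is older than 1.8.0, which is consistent with the `--gpus` test failing later even though the package is "up to date".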
6. sudo systemctl restart docker
Check whether the --gpus flag is available:
docker run --help | grep -i gpus
--gpus gpu-request GPU devices to add to the container ('all' to pass all GPUs)
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
The GPU still cannot be used.
I will reinstall Ubuntu 20 and update this post afterwards.
docker run -itd --gpus all --name rv -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all quay.io/azavea/raster-vision:pytorch-latest