References:
https://docs.docker.com/engine/install/ubuntu/#uninstall-docker-engine
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-linux
The commands are as follows:
sudo apt-get purge docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-ce-rootless-extras
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
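If the goal is a truly clean slate before reinstalling, a quick sanity check that nothing was left behind can be sketched like this (same paths as in the commands above):

```shell
# Sanity check after the purge: report whether the Docker data
# directories and the docker binary are still present.
for p in /var/lib/docker /var/lib/containerd; do
  if [ -e "$p" ]; then
    echo "$p still present"
  else
    echo "$p removed"
  fi
done
command -v docker >/dev/null 2>&1 && echo "docker binary still on PATH" || echo "docker binary gone"
```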
Reference: https://learn.microsoft.com/zh-cn/windows/wsl/tutorials/gpu-compute
The commands are:
curl https://get.docker.com | sh
sudo service docker start
The log is as follows:
~$ curl https://get.docker.com | sh
# Executing docker install script, commit: a8a6b338bdfedd7ddefb96fe3e7fe7d4036d945a
Warning: the "docker" command appears to already exist on this system.
If you already have Docker installed, this script can cause trouble, which is
why we're displaying this warning and provide the opportunity to cancel the
installation.
If you installed the current Docker package using this script and are using it
again to update Docker, you can safely ignore this message.
You may press Ctrl+C now to abort this script.
+ sleep 20
WSL DETECTED: We recommend using Docker Desktop for Windows.
Please get Docker Desktop from https://www.docker.com/products/docker-desktop
You may press Ctrl+C now to abort this script.
+ sleep 20
+ sudo -E sh -c apt-get update -qq >/dev/null
+ sudo -E sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null
+ sudo -E sh -c mkdir -p /etc/apt/keyrings && chmod -R 0755 /etc/apt/keyrings
+ sudo -E sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" | gpg --dearmor --yes -o /etc/apt/keyrings/docker.gpg
+ sudo -E sh -c chmod a+r /etc/apt/keyrings/docker.gpg
+ sudo -E sh -c echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu jammy stable" > /etc/apt/sources.list.d/docker.list
+ sudo -E sh -c apt-get update -qq >/dev/null
+ sudo -E sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq docker-ce docker-ce-cli containerd.io docker-compose-plugin docker-ce-rootless-extras docker-buildx-plugin >/dev/null
+ sudo -E sh -c docker version
Client: Docker Engine - Community
 Version:           23.0.5
 API version:       1.42
 Go version:        go1.19.8
 Git commit:        bc4487a
 Built:             Wed Apr 26 16:21:07 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.5
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.8
  Git commit:       94d3ad6
  Built:            Wed Apr 26 16:21:07 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.20
  GitCommit:        2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc:
  Version:          1.1.5
  GitCommit:        v1.1.5-0-gf19387a
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
================================================================================
To run Docker as a non-privileged user, consider setting up the
Docker daemon in rootless mode for your user:
dockerd-rootless-setuptool.sh install
Visit https://docs.docker.com/go/rootless/ to learn about rootless mode.
To run the Docker daemon as a fully privileged service, but granting non-root
users access, refer to https://docs.docker.com/go/daemon-access/
WARNING: Access to the remote API on a privileged Docker daemon is equivalent
to root access on the host. Refer to the 'Docker daemon attack surface'
documentation for details: https://docs.docker.com/go/attack-surface/
================================================================================
~$ sudo service docker start
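One thing worth checking in the `docker version` output above is that the client and server report the same version (a mismatch can happen when an earlier install was only partially removed). A minimal sketch, run here against a captured sample of that output rather than a live daemon; with a real daemon you would pipe `sudo docker version` in instead:

```shell
# Parse the "Version:" lines out of a captured `docker version` sample
# and confirm the client and server values agree.
sample='Client: Docker Engine - Community
 Version:           23.0.5
Server: Docker Engine - Community
 Engine:
  Version:          23.0.5'
versions=$(printf '%s\n' "$sample" | awk '$1 == "Version:" {print $2}' | sort -u)
echo "$versions"
# A single unique version line means client == server.
[ "$(printf '%s\n' "$versions" | wc -l)" -eq 1 ] && echo "client/server versions match"
```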
The commands are as follows:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-docker-keyring.gpg
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-docker-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
The log is as follows:
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-docker-keyring.gpg
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-docker-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
deb [signed-by=/usr/share/keyrings/nvidia-docker-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-docker-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
deb [signed-by=/usr/share/keyrings/nvidia-docker-keyring.gpg] https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-docker-keyring.gpg] https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb [signed-by=/usr/share/keyrings/nvidia-docker-keyring.gpg] https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
$ sudo apt-get update
Get:1 file:/var/cuda-repo-wsl-ubuntu-12-1-local InRelease [1575 B]
Get:1 file:/var/cuda-repo-wsl-ubuntu-12-1-local InRelease [1575 B]
Get:2 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 InRelease [1484 B]
Hit:3 https://download.docker.com/linux/ubuntu jammy InRelease
Get:4 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 InRelease [1481 B]
Get:5 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 InRelease [1474 B]
Get:6 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 Packages [25.5 kB]
Hit:7 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:8 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages [7416 B]
Hit:9 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Get:10 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages [4488 B]
Get:11 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [108 kB]
Hit:12 http://security.ubuntu.com/ubuntu jammy-security InRelease
Fetched 150 kB in 2s (59.9 kB/s)
Reading package lists... Done
$ sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
bridge-utils dns-root-data dnsmasq-base ubuntu-fan xsltproc
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base
The following NEW packages will be installed:
libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base nvidia-docker2
0 upgraded, 5 newly installed, 0 to remove and 2 not upgraded.
Need to get 3904 kB of archives.
After this operation, 15.1 MB of additional disk space will be used.
Get:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 libnvidia-container1 1.13.1-1 [931 kB]
Get:2 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 libnvidia-container-tools 1.13.1-1 [24.8 kB]
Get:3 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 nvidia-container-toolkit-base 1.13.1-1 [2122 kB]
Get:4 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 nvidia-container-toolkit 1.13.1-1 [821 kB]
Get:5 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 nvidia-docker2 2.13.0-1 [5548 B]
Fetched 3904 kB in 1s (2991 kB/s)
Selecting previously unselected package libnvidia-container1:amd64.
(Reading database ... 74216 files and directories currently installed.)
Preparing to unpack .../libnvidia-container1_1.13.1-1_amd64.deb ...
Unpacking libnvidia-container1:amd64 (1.13.1-1) ...
Selecting previously unselected package libnvidia-container-tools.
Preparing to unpack .../libnvidia-container-tools_1.13.1-1_amd64.deb ...
Unpacking libnvidia-container-tools (1.13.1-1) ...
Selecting previously unselected package nvidia-container-toolkit-base.
Preparing to unpack .../nvidia-container-toolkit-base_1.13.1-1_amd64.deb ...
Unpacking nvidia-container-toolkit-base (1.13.1-1) ...
Selecting previously unselected package nvidia-container-toolkit.
Preparing to unpack .../nvidia-container-toolkit_1.13.1-1_amd64.deb ...
Unpacking nvidia-container-toolkit (1.13.1-1) ...
Selecting previously unselected package nvidia-docker2.
Preparing to unpack .../nvidia-docker2_2.13.0-1_all.deb ...
Unpacking nvidia-docker2 (2.13.0-1) ...
Setting up nvidia-container-toolkit-base (1.13.1-1) ...
Setting up libnvidia-container1:amd64 (1.13.1-1) ...
Setting up libnvidia-container-tools (1.13.1-1) ...
Setting up nvidia-container-toolkit (1.13.1-1) ...
Setting up nvidia-docker2 (2.13.0-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
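What nvidia-docker2 adds on top of the container-toolkit packages is registering the NVIDIA runtime with the Docker daemon. On a default install it writes an /etc/docker/daemon.json roughly like the following (if you already maintain a customized daemon.json, merge this in rather than letting the package overwrite it):

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```

Restart the daemon afterwards (`sudo service docker restart` under WSL) so the runtime is picked up before starting any `--gpus all` container.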
To run a machine learning framework container and start using the GPU with this NVIDIA NGC TensorFlow container, enter the command:
docker run --gpus all -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tensorflow:20.03-tf2-py3
The log is as follows:
kangpengtao@LAPTOP-3SUHS40U:~$ docker run --gpus all -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tensorflow:20.03-tf2-py3
================
== TensorFlow ==
================
NVIDIA Release 20.03-tf2 (build 11026100)
TensorFlow Version 2.1.0
Container image Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2019 The TensorFlow Authors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use 'nvidia-docker run' to start this container; see
https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .
NOTE: MOFED driver for multi-node communication was not detected.
Multi-node communication performance may be reduced.
root@a92ec134fa67:/workspace# cd nvidia-examples/cnn/
root@a92ec134fa67:/workspace/nvidia-examples/cnn# python resnet.py --batch_size=64
2023-05-02 18:54:25.352825: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2023-05-02 18:54:26.518389: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.7
2023-05-02 18:54:26.520776: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.7
PY 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
TF 2.1.0
Script arguments:
--image_width=224
--image_height=224
--distort_color=False
--momentum=0.9
--loss_scale=128.0
--image_format=channels_last
--data_dir=None
--data_idx_dir=None
--batch_size=64
--num_iter=300
--iter_unit=batch
--log_dir=None
--export_dir=None
--tensorboard_dir=None
--display_every=10
--precision=fp16
--dali_mode=None
--use_xla=False
--predict=False
2023-05-02 18:54:27.810662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2023-05-02 18:54:27.934852: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-05-02 18:54:27.934907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2023-05-02 18:54:27.934931: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2023-05-02 18:54:27.934989: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-05-02 18:54:27.957090: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2023-05-02 18:54:27.961855: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2023-05-02 18:54:28.008659: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2023-05-02 18:54:28.013889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2023-05-02 18:54:28.013952: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2023-05-02 18:54:28.014459: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-05-02 18:54:28.014862: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-05-02 18:54:28.014910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2023-05-02 18:54:28.048429: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2592005000 Hz
2023-05-02 18:54:28.050463: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4022130 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-05-02 18:54:28.050502: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2023-05-02 18:54:28.365085: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-05-02 18:54:28.365354: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x401e740 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-05-02 18:54:28.365391: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 2060, Compute Capability 7.5
2023-05-02 18:54:28.365869: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-05-02 18:54:28.365908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2023-05-02 18:54:28.365929: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2023-05-02 18:54:28.365941: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-05-02 18:54:28.365969: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2023-05-02 18:54:28.365981: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2023-05-02 18:54:28.365992: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2023-05-02 18:54:28.366004: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2023-05-02 18:54:28.366017: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2023-05-02 18:54:28.366446: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-05-02 18:54:28.366958: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-05-02 18:54:28.366991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2023-05-02 18:54:28.367765: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2023-05-02 18:54:30.644568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-05-02 18:54:30.644616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2023-05-02 18:54:30.644626: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2023-05-02 18:54:30.648735: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-05-02 18:54:30.648781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1324] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-05-02 18:54:30.649217: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-05-02 18:54:30.649309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4612 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:Expected a shuffled dataset but input dataset `x` is not shuffled. Please invoke `shuffle()` on input dataset.
2023-05-02 18:54:42.671689: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-05-02 18:54:43.252228: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.829411). Check your callbacks.
global_step: 10 images_per_sec: 34.1
global_step: 20 images_per_sec: 248.9
global_step: 30 images_per_sec: 253.4
global_step: 40 images_per_sec: 255.0
global_step: 50 images_per_sec: 255.8
global_step: 60 images_per_sec: 254.3
global_step: 70 images_per_sec: 251.9
global_step: 80 images_per_sec: 256.0
global_step: 90 images_per_sec: 253.7
global_step: 100 images_per_sec: 251.6
global_step: 110 images_per_sec: 254.6
global_step: 120 images_per_sec: 253.4
global_step: 130 images_per_sec: 251.1
global_step: 140 images_per_sec: 252.9
global_step: 150 images_per_sec: 253.4
global_step: 160 images_per_sec: 253.5
global_step: 170 images_per_sec: 252.0
global_step: 180 images_per_sec: 254.4
global_step: 190 images_per_sec: 252.7
global_step: 200 images_per_sec: 252.3
global_step: 210 images_per_sec: 250.2
global_step: 220 images_per_sec: 250.4
global_step: 230 images_per_sec: 248.9
global_step: 240 images_per_sec: 250.6
global_step: 250 images_per_sec: 249.9
global_step: 260 images_per_sec: 250.9
global_step: 270 images_per_sec: 252.0
global_step: 280 images_per_sec: 252.9
global_step: 290 images_per_sec: 252.6
global_step: 300 images_per_sec: 249.9
epoch: 0 time_taken: 92.3
300/300 - 92s - loss: 9.1162 - top1: 0.7911 - top5: 0.8168
root@a92ec134fa67:/workspace/nvidia-examples/cnn# nvidia
nvidia-smi nvidia_entrypoint.sh
root@a92ec134fa67:/workspace/nvidia-examples/cnn# nvidia-smi
Tue May 2 18:58:24 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.50 Driver Version: 531.79 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2060 On | 00000000:01:00.0 On | N/A |
| N/A 51C P8 11W / N/A| 989MiB / 6144MiB | 9% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 23 G /Xwayland N/A |
+---------------------------------------------------------------------------------------+
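About the resource flags in the `docker run` command above: `--shm-size=1g` enlarges the container's `/dev/shm` (the default 64 MB is too small for data-loader workers), `--ulimit memlock=-1` removes the locked-memory cap, and `--ulimit stack=67108864` raises the stack limit, whose value is given in bytes. A quick check of that last number:

```shell
# --ulimit stack takes a value in bytes; confirm 67108864 bytes is 64 MiB.
stack_bytes=67108864
echo "$((stack_bytes / 1024 / 1024)) MiB"   # prints "64 MiB"
```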