Building tensorflow-gpu in Docker on Ubuntu 20.04

Installing tensorflow-gpu on Ubuntu 20.04

Setup:
OS: Ubuntu 20.04 LTS
GPU: GTX 1060 6G

1 Install the CUDA Toolkit (I chose CUDA Toolkit 12.2)

NVIDIA CUDA Installation Guide for Linux
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#prepare-ubuntu

Follow step 2.7: download the deb package and install it locally.
2.7. Download the NVIDIA CUDA Toolkit

https://developer.nvidia.com/cuda-downloads

On the download page, select:

Linux, x86_64, Ubuntu, 20.04, deb (local)

Then run the following on the command line:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.1/local_installers/cuda-repo-ubuntu2004-12-2-local_12.2.1-535.86.10-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-2-local_12.2.1-535.86.10-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

The CUDA Toolkit installation is now complete.
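A quick way to sanity-check the install is to see whether nvcc and nvidia-smi can be found. The sketch below assumes the toolkit landed in the default /usr/local/cuda-12.2 directory, which the deb package normally uses but which is not confirmed above. check_tool is a hypothetical helper name introduced only for this example.

```shell
# Assumption: the deb install put the toolkit under /usr/local/cuda-12.2.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-12.2}"
export PATH="${CUDA_HOME}/bin${PATH:+:${PATH}}"

# Report whether a command can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_tool nvcc        # CUDA compiler, shipped with the toolkit
check_tool nvidia-smi  # driver utility; may only work after a reboot
```

If nvcc is reported as missing, the install path is probably different from the assumed one. If only nvidia-smi is missing, rebooting so the new driver loads is the first thing to try.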

2 Install cuDNN (I chose cuDNN v8.9.3, the version that matches CUDA Toolkit 12.2)

You need to register and log in with an NVIDIA account,
then download cuDNN from this address:
https://developer.nvidia.com/rdp/cudnn-download

I chose the following; pick the one that matches your CUDA Toolkit version:
Local Installer for Ubuntu20.04 x86_64 (Deb):

Download cuDNN v8.9.3 (July 11th, 2023), for CUDA 12.x

Then download the file and install it locally:

sudo chmod 777 cudnn-local-repo-ubuntu2004-8.9.3.28_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2004-8.9.3.28_1.0-1_amd64.deb

Done.
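One caveat: the local-repo deb usually only registers an apt repository, so the cuDNN libraries themselves may still need to be installed from it. The sketch below shows how that could look. The repository path and the package names libcudnn8 and libcudnn8-dev are assumptions based on the usual cuDNN 8.x deb layout, and dpkg prints the exact keyring path at the end of its output. The header location /usr/include/cudnn_version.h is also assumed. Both function names are hypothetical.

```shell
# Assumption: dpkg unpacked the local repo to this directory.
install_cudnn_packages() {
    sudo cp /var/cudnn-local-repo-ubuntu2004-8.9.3.28/cudnn-local-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    # Runtime library plus headers (assumed package names).
    sudo apt-get install -y libcudnn8 libcudnn8-dev
}

# Print the version macros from a cuDNN header, or a notice if it is absent.
cudnn_version() {
    header="${1:-/usr/include/cudnn_version.h}"
    if [ -f "$header" ]; then
        grep -E '^#define CUDNN_(MAJOR|MINOR|PATCHLEVEL) ' "$header"
    else
        echo "cuDNN header not found: $header"
    fi
}
```

Calling install_cudnn_packages and then cudnn_version should print the three version macros if the headers ended up in the assumed location.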

3 Pull the Docker image tensorflow/tensorflow:devel-gpu

Reference: the "Docker Linux builds" section of https://tensorflow.google.cn/install/source?hl=zh-cn

Run this from a working directory of your choice (in my case $PWD is /home/wmx/software/tensorDocker):

sudo docker run --gpus all -it -w /tensorflow -v $PWD:/mnt -e HOST_PERMS="$(id -u):$(id -g)" tensorflow/tensorflow:devel-gpu bash

This failed with the following error:

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled 

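Fix: Docker has no NVIDIA GPU runtime available, so install nvidia-container-toolkit and restart Docker: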

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

After that, the container starts successfully.
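To confirm that the GPU is actually visible inside a container, one option is to run nvidia-smi in a throwaway container. This is only a sketch and not part of the original steps. gpu_container_check is a hypothetical helper, and it assumes the NVIDIA runtime makes nvidia-smi available inside the container, which is the usual behaviour with --gpus all.

```shell
IMAGE="tensorflow/tensorflow:devel-gpu"

# Run nvidia-smi in a temporary container; bail out early if docker is missing.
gpu_container_check() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found"
        return 1
    fi
    # --rm removes the container again once nvidia-smi exits.
    sudo docker run --rm --gpus all "$IMAGE" nvidia-smi
}
```

Calling gpu_container_check should list the GTX 1060 if the toolkit is set up correctly. If the "could not select device driver" error shows up again, the container toolkit is not active yet.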
