Providing GPU Support for TensorFlow 2.x

Installing tensorflow-gpu

Install Docker and nvidia-docker

Installing Docker on Ubuntu with GPU support

Install TensorFlow

Install the GPU driver on the host

  • Find a suitable NVIDIA driver version and install it
sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall

Use python:3.8 as the base image

  • Pull the image
sudo docker pull python:3.8
  • Write docker-compose.yml
version: '3'
services:
  tensorflow_gpu:
    container_name: tensorflow_gpu
    image: python:3.8
    user: "0"
    working_dir: /home
    volumes:
      - ./src:/home
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    stdin_open: true
    tty: true
    command: /bin/bash -c "chown -R 1002:1002 . && /bin/bash"
  • Create the container
sudo docker-compose up -d

Install CUDA inside Docker

  • Check the highest CUDA version the driver supports
nvidia-smi
  • Pick the desired CUDA version and choose runfile (local) as the installer type
wget https://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run

CUDA Toolkit Archive | NVIDIA Developer

  • Install gcc-7 and g++-7 (CUDA 10.1 does not build with the image's default gcc-10)
cp /etc/apt/sources.list /etc/apt/sources.list.bak
echo "deb https://mirrors.ustc.edu.cn/ubuntu/ focal main restricted universe multiverse" > /etc/apt/sources.list
echo "deb https://mirrors.ustc.edu.cn/ubuntu/ focal-security main restricted universe multiverse" >> /etc/apt/sources.list
echo "deb https://mirrors.ustc.edu.cn/ubuntu/ focal-updates main restricted universe multiverse" >> /etc/apt/sources.list
echo "deb https://mirrors.ustc.edu.cn/ubuntu/ focal-backports main restricted universe multiverse" >> /etc/apt/sources.list
apt update 2>&1 | grep NO_PUBKEY
# $key is the key id printed after NO_PUBKEY
gpg --keyserver keyserver.ubuntu.com --recv-keys $key
gpg --export --armor $key | apt-key add -
apt update
apt install gcc-7 -y
apt install g++-7 -y
apt upgrade -y
apt autoremove
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 90
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 90
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 50
update-alternatives  --install /usr/bin/g++ g++ /usr/bin/g++-10 50
update-alternatives --config gcc

Fixing NO_PUBKEY errors from apt update

  • Install CUDA, selecting only the CUDA toolkit in the installer
sh cuda_10.1.243_418.87.00_linux.run
echo "export CUDA_HOME=/usr/local/cuda" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64" >> ~/.bashrc
echo 'export PATH=$PATH:/usr/local/cuda/bin' >> ~/.bashrc
source ~/.bashrc

Installing CUDA on Ubuntu

Install TensorFlow inside Docker

  • Install the matching TensorFlow version with pip
pip install tensorflow==2.3.0

Install GPU support (TensorFlow guide)

  • Test TensorFlow
python
import tensorflow as tf
tf.test.is_gpu_available()  # deprecated; prefer tf.config.list_physical_devices('GPU')

Install cuDNN inside Docker

  • Download and extract the matching cuDNN version, then copy it into the CUDA installation
# $path is the directory the cuDNN archive was extracted into
cp -r -d $path/lib64/* /usr/local/cuda/lib64/

cuDNN Archive
Installing cuDNN

  • Re-test TensorFlow
import tensorflow as tf
tf.config.list_physical_devices('GPU')
  • Force CPU-only execution
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" 
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

These environment variables must be set before any TensorFlow initialization runs.

TensorFlow Example

Function fitting

import os

import tensorflow as tf

 
class Linear(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(
            units=1,
            activation=None,
            kernel_initializer=tf.zeros_initializer(),
            bias_initializer=tf.zeros_initializer()
        )
 
    def call(self, inputs):
        output = self.dense(inputs)
        return output


def demo_func():
    X = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    y = tf.constant([[7.0], [8.0]])

    model = Linear()
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
    for i in range(1000):
        with tf.GradientTape() as tape:
            y_pred = model(X)      
            loss = tf.reduce_mean(tf.square(y_pred - y))
        # use model.variables to directly obtain all of the model's variables
        grads = tape.gradient(loss, model.variables)    
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))
        if i % 100 == 0:
            print(i, loss.numpy())
    print(model.variables)

if __name__ == "__main__":
    print('TensorFlow version: {}'.format(tf.__version__))
    
    use_gpu = True
    if use_gpu:
        print('Default to GPU')
        print('GPU Info:{}'.format(tf.config.list_physical_devices('GPU')))
    else:
        print('Set to use CPU')
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" 
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
        print('GPU Info:{}'.format(tf.config.list_physical_devices('GPU')))

    demo_func() 
