Things to do before GPU training

flyfish

Check the CUDA version

cat /usr/local/cuda/version.txt

Result

CUDA Version 10.1.168

Check the cuDNN version

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

Result
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"

The nvcc command

nvcc --version

Result

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168

The nvidia-smi command

nvidia-smi
 
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:08.0 Off |                    0 |
| N/A   26C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Another driver/CUDA pairing, for reference:

NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     

List the available drivers:
ubuntu-drivers devices

Result

== /sys/devices/pci0000:00/0000:00:08.0 ==
modalias : pci:v000010DEd000015F8sv000010DEsd0000118Fbc03sc02i00
vendor   : NVIDIA Corporation
model    : GP100GL [Tesla P100 PCIe 16GB]
driver   : nvidia-driver-418 - third-party free recommended
driver   : nvidia-driver-410 - third-party free
driver   : xserver-xorg-video-nouveau - distro free builtin

CUDA environment variables

export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
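
These are typically appended to ~/.bashrc (and reloaded with source ~/.bashrc) so they persist across sessions.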

Installing cuDNN

Installation amounts to downloading the archive, extracting it, copying the files into the CUDA directory, and fixing the permissions:

tar xzvf cudnn-*.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Multiple CUDA versions can coexist; switch between them by re-pointing the symlink:

 sudo rm -rf cuda
 sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda

Create a symlink: ln -s <target_dir> <link_path>
Remove a symlink: rm -rf <link_path>
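
ls -l /usr/local/cuda (or readlink /usr/local/cuda) shows which version the link currently points to.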

The .run installer

If you downloaded a .run file instead, execute:
sudo sh cuda_*_linux.run

Environment variables for object_detection

# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
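
With the path set, a common sanity check (assuming the standard TF1-era tensorflow/models layout) is to run python object_detection/builders/model_builder_test.py from the research directory; it should end with OK if object_detection and slim are importable.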

Version compatibility

tensorflow_gpu-1.13.1
Python: 2.7, 3.3-3.6
GCC: 4.8
Bazel: 0.19.2
cuDNN: 7.4
CUDA: 10.0
Driver: 410

Driver version required by each CUDA version:
NVIDIA graphics driver R418 or newer for CUDA 10.1
NVIDIA graphics driver R410 or newer for CUDA 10.0

Allocate GPU memory on demand

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # grow GPU memory usage as needed instead of grabbing it all up front
session = tf.Session(config=config)
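
If you would rather cap GPU memory up front than grow it, the same gpu_options struct also takes a fixed fraction; a minimal sketch (the 0.4 is an arbitrary example value):

import tensorflow as tf

config = tf.ConfigProto()
# reserve at most ~40% of each visible GPU's memory for this process
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)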

See which device each operation and Tensor is placed on

with tf.Session(graph=detection_graph,config=tf.ConfigProto(log_device_placement=True)) as sess:

Refresh the GPU status every second
watch -n 1 nvidia-smi

Enable GPU persistence mode (usually requires root)
nvidia-smi -pm 1

Check disk space
df -h

Mount a disk

List the disks
fdisk -l

Suppose the disk to mount is /dev/vdb.

Create a mount point, then mount it:
mkdir hdd
mount /dev/vdb hdd
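
A mount done this way only lasts until reboot; add an entry for /dev/vdb to /etc/fstab to remount it automatically (and a brand-new disk needs a filesystem first, e.g. mkfs.ext4 /dev/vdb).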

Time how long a line of code takes
%f is the microsecond field, range [0, 999999]

import datetime
print(datetime.datetime.now().strftime('%H:%M:%S.%f'))
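
To measure a statement's duration rather than just print a timestamp, subtract two such timestamps; a minimal sketch (time.sleep stands in for the statement being timed):

import datetime
import time

start = datetime.datetime.now()
time.sleep(0.5)                              # placeholder for the statement being timed
elapsed = datetime.datetime.now() - start    # a datetime.timedelta
print('elapsed: %.6f seconds' % elapsed.total_seconds())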

Code to test whether the GPU is actually used

import tensorflow as tf
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
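
If the GPU is picked up, the device-placement log should show the MatMul and both constants assigned to something like /job:localhost/replica:0/task:0/device:GPU:0, and the result [[22. 28.] [49. 64.]] is printed.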

File operation commands

scp: upload and download files

Upload everything in a local folder:

scp -r /media/santiago/folder <user>@<host>:/root/hdd/

Upload a single file:

scp /media/santiago/f.txt <user>@<host>:/root/hdd/

Download from the server:

scp <user>@<host>:/root/hdd/f.txt /media/santiago/
scp -r <user>@<host>:/root/hdd/ /media/santiago/

Compress the directory test in the current directory into test.tar.gz:

tar -zcvf test.tar.gz test

Extract test.tar.gz in the current directory:

tar -zxvf test.tar.gz

If this error appears:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

the archive is not actually gzip-compressed; extract it without -z:

tar -xvf test.tar.gz
