Things to do before GPU training
flyfish
Check the CUDA version
cat /usr/local/cuda/version.txt
Result:
CUDA Version 10.1.168
Check the cuDNN version
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Result:
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
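As the `CUDNN_VERSION` macro shows, the three header values combine into a single integer; a quick sanity check in Python, using the numbers from the output above:

```python
# Compute the cuDNN version number the same way cudnn.h does.
# The three values below come from the header output shown above.
CUDNN_MAJOR = 7
CUDNN_MINOR = 6
CUDNN_PATCHLEVEL = 2

CUDNN_VERSION = CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL
print(CUDNN_VERSION)  # 7602, i.e. cuDNN 7.6.2
```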
The nvcc command
nvcc --version
Result:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
The nvidia-smi command
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:00:08.0 Off | 0 |
| N/A 26C P0 26W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Another driver/CUDA pairing:
NVIDIA-SMI 410.104 Driver Version: 410.104 CUDA Version: 10.0
Run:
ubuntu-drivers devices
Result:
== /sys/devices/pci0000:00/0000:00:08.0 ==
modalias : pci:v000010DEd000015F8sv000010DEsd0000118Fbc03sc02i00
vendor : NVIDIA Corporation
model : GP100GL [Tesla P100 PCIe 16GB]
driver : nvidia-driver-418 - third-party free recommended
driver : nvidia-driver-410 - third-party free
driver : xserver-xorg-video-nouveau - distro free builtin
CUDA environment variables
export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
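The `${PATH:+:${PATH}}` form appends the old value only when it is non-empty, which avoids leaving a stray trailing colon. A small Python sketch of that expansion rule (the `prepend` helper is illustrative, not a real shell feature):

```python
def prepend(new_entry, old_value):
    """Mimic the shell's ${VAR:+:${VAR}} expansion: add the
    separating colon only when the old value is non-empty."""
    return new_entry + (":" + old_value if old_value else "")

print(prepend("/usr/local/cuda/bin", "/usr/bin:/bin"))  # /usr/local/cuda/bin:/usr/bin:/bin
print(prepend("/usr/local/cuda/bin", ""))               # /usr/local/cuda/bin (no trailing colon)
```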
Installing cuDNN
Installation is just four steps: download the archive, extract it, copy the files, and fix the permissions.
tar xzvf cudnn-*.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Multiple CUDA versions can coexist; switch between them by repointing the symlink
sudo rm -rf cuda
sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda
Create a symlink: ln -s <original-dir> <link-name>
Remove a symlink: rm <link-name> (no trailing slash, so only the link itself is removed, not the target's contents)
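The switch above can be sketched with Python's standard library; the scratch directories here stand in for /usr/local/cuda-10.0, /usr/local/cuda-10.1, and the /usr/local/cuda link:

```python
import os
import tempfile

# Scratch directories standing in for the real /usr/local layout.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "cuda-10.0"))
os.mkdir(os.path.join(root, "cuda-10.1"))
link = os.path.join(root, "cuda")

os.symlink(os.path.join(root, "cuda-10.0"), link)  # point "cuda" at 10.0
os.remove(link)                                    # remove the old link only
os.symlink(os.path.join(root, "cuda-10.1"), link)  # switch to 10.1
print(os.readlink(link))                           # ends with cuda-10.1
```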
Runfile installer
If you downloaded a .run file instead, execute:
sudo sh cuda_*_linux.run
Environment variables for object_detection
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
Version correspondence
tensorflow_gpu-1.13.1
Python: 2.7, 3.3-3.6
GCC: 4.8
Bazel: 0.19.2
cuDNN: 7.4
CUDA: 10.0
Driver: 410
Correspondence between driver version and CUDA version
NVIDIA graphics driver R418 or newer for CUDA 10.1
NVIDIA graphics driver R410 or newer for CUDA 10.0
Use GPU memory on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory as needed instead of grabbing it all up front
session = tf.Session(config=config)
See which device each operation and Tensor is placed on
with tf.Session(graph=detection_graph,config=tf.ConfigProto(log_device_placement=True)) as sess:
Check the GPU status once per second
watch -n 1 nvidia-smi
Enable GPU persistence mode
nvidia-smi -pm 1
Check disk space
df -h
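df -h reports this per filesystem; the same numbers for a single path are also available from Python's standard library, which can be handy inside a training script:

```python
import shutil

# Total/used/free space (in bytes) for the filesystem holding "/".
usage = shutil.disk_usage("/")
gib = 1024 ** 3
print(f"total {usage.total / gib:.1f} GiB, "
      f"used {usage.used / gib:.1f} GiB, "
      f"free {usage.free / gib:.1f} GiB")
```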
Mounting a disk
List the disks:
fdisk -l
Suppose the disk to mount is Disk /dev/vdb.
Create a mount point, then run the mount command:
mkdir hdd
mount /dev/vdb hdd
Check how long a statement takes to run
%f is the microseconds field, range [0, 999999]
import datetime
print(datetime.datetime.now().strftime('%H:%M:%S.%f'))
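Printing a timestamp before and after a statement gives its wall-clock duration; for example (time.sleep stands in for the statement being timed):

```python
import datetime
import time

start = datetime.datetime.now()
time.sleep(0.1)                       # the statement being timed
end = datetime.datetime.now()

print(start.strftime('%H:%M:%S.%f'))  # e.g. 10:32:01.482913
print(end.strftime('%H:%M:%S.%f'))
print((end - start).total_seconds())  # elapsed seconds, here ~0.1
```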
Code to test whether the GPU is being used
import tensorflow as tf
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# log_device_placement=True logs which device (CPU or GPU) each op runs on
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
File-transfer commands
scp: upload and download files
Upload every file in a local folder:
scp -r /media/santiago/folder [email protected]:/root/hdd/
Upload a single file:
scp /media/santiago/f.txt [email protected]:/root/hdd/
Download examples:
scp [email protected]:/root/hdd/f.txt /media/santiago/
scp -r [email protected]:/root/hdd/ /media/santiago/
Compress the directory test in the current directory (with -z the archive is gzip-compressed, so the conventional name is test.tar.gz):
tar -zcvf test.tar.gz test
Extract test.tar.gz in the current directory:
tar -zxvf test.tar.gz
If you see the error:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
the file is not actually gzip-compressed despite its name; extract it without -z:
tar -xvf test.tar.gz
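The "not in gzip format" error means the file is a plain tar despite its .tar.gz name. The first two bytes tell the formats apart: gzip streams start with the magic bytes 0x1f 0x8b. A quick check in Python, which also uses tarfile's "r:*" mode to auto-detect the compression (the temp files here are just for the demonstration):

```python
import os
import tarfile
import tempfile

def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes 1f 8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"

# Build a small plain tar that is misleadingly named .tar.gz.
workdir = tempfile.mkdtemp()
member = os.path.join(workdir, "f.txt")
with open(member, "w") as f:
    f.write("hello")
archive = os.path.join(workdir, "test.tar.gz")
with tarfile.open(archive, "w") as tar:    # "w" = plain tar, no gzip
    tar.add(member, arcname="f.txt")

print(is_gzip(archive))                    # False: tar -z would fail on this file
with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects the compression
    print(tar.getnames())                  # ['f.txt']
```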