CUDA / cuDNN / GPU / driver queries

I. Check the GPU model

Reference: Checking the GPU model on Linux (Ubuntu)

Method 1 (recommended)

  1. Check the GPU model
lspci | grep -i vga

# The output may show only a hexadecimal device ID (here 1f82) rather than a model name
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1)
  2. Look up the hexadecimal device ID
# Lookup site
http://pci-ids.ucw.cz/mods/PC/10de?action=help?help=pci

# Lookup result
Name: TU117 [GeForce GTX 1650]
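The lookup site is keyed by vendor ID and device ID. As a minimal sketch, `lspci -nn` prints both IDs in square brackets, and a small helper can pull them out of a line. The function name `extract_pci_id` and the sample line are illustrative assumptions, not part of the original notes.

```shell
# Extract the [vendor:device] PCI ID from one line of `lspci -nn` output.
# extract_pci_id is a hypothetical helper name used only for this sketch.
extract_pci_id() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# Illustrative line in the format `lspci -nn` uses (not captured from a real machine)
sample='01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1f82] (rev a1)'
extract_pci_id "$sample"   # prints 10de:1f82

# On a real machine; the '3d' pattern also catches GPUs listed as "3D controller"
if command -v lspci >/dev/null 2>&1; then
  lspci -nn | grep -i -E 'vga|3d' | while IFS= read -r line; do
    extract_pci_id "$line"
  done
fi
```

Here 10de is NVIDIA's vendor ID (it also appears in the lookup URL) and 1f82 is the device ID to search for.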

Method 2

  1. Check the GPU model
nvidia-smi
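The full `nvidia-smi` table is awkward to use in scripts. A minimal sketch, assuming a driver recent enough to support `--query-gpu`, asks for selected fields as CSV instead. The helper names `gpu_summary` and `csv_field` are assumptions made for this example.

```shell
# Query selected fields for every GPU as headerless CSV.
# Prints a notice and returns non-zero when nvidia-smi is unavailable.
gpu_summary() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
    return 1
  fi
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}

# Print one comma-separated field (1-based index) from each CSV line on stdin.
csv_field() {
  awk -F', *' -v n="$1" '{ print $n }'
}

# Print only the GPU name(s); harmless on machines without a GPU
gpu_summary | csv_field 1 || true
```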

Check the driver version

cat /proc/driver/nvidia/version
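If only the version number is needed, it can be cut out of the "NVRM version" line of that file. This is a sketch that assumes the version is the first dotted number on that line; `driver_version` is a hypothetical helper name.

```shell
# Print the first dotted version number found on the "NVRM version" line (stdin).
driver_version() {
  grep 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Only attempt the read when the driver exposes the file
if [ -r /proc/driver/nvidia/version ]; then
  driver_version < /proc/driver/nvidia/version
fi
```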

Check the GPU compute capability

GPU compute capability table

II. Check the CUDA version

  1. Method 1
nvcc -V
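  2. Method 2
# version.txt is present in older toolkits such as 10.2; newer releases ship version.json instead
cat /usr/local/cuda/version.txt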
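To use the toolkit version in a script, the "release X.Y" field can be extracted from the `nvcc -V` output. The sketch below uses a hypothetical helper named `cuda_release`. Note that `nvcc` reports the installed toolkit version, which can differ from the driver's CUDA version, as the 11.4 / 10.2 pair in the deviceQuery output further down shows.

```shell
# Extract the "release X.Y" number from `nvcc -V` output read on stdin.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# The version line nvcc prints for a 10.2 toolkit
printf '%s\n' 'Cuda compilation tools, release 10.2, V10.2.89' | cuda_release   # prints 10.2

# On a machine with the toolkit on PATH
if command -v nvcc >/dev/null 2>&1; then
  nvcc -V | cuda_release
fi
```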

Check the CUDA compute capability / number of CUDA cores

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

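# Remove files from any previous build
sudo make clean

# Rebuild; -j8 runs up to 8 parallel jobs to speed up compilation
sudo make -j8

./deviceQuery
# If the last line reads "Result = PASS", CUDA is installed correctly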
yichao@yichao:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 1650"
  CUDA Driver Version / Runtime Version          11.4 / 10.2
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 3904 MBytes (4093444096 bytes)
  (14) Multiprocessors, ( 64) CUDA Cores/MP:     896 CUDA Cores
  GPU Max Clock rate:                            1680 MHz (1.68 GHz)
  Memory Clock rate:                             4001 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Key lines from the output:
CUDA Capability Major/Minor version number:    7.5
Total amount of global memory:                 3904 MBytes (4093444096 bytes)
(14) Multiprocessors, ( 64) CUDA Cores/MP:     896 CUDA Cores
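The core count on that line is the number of multiprocessors times the cores per multiprocessor (14 × 64 = 896). As a small sketch, the same arithmetic can be applied directly to deviceQuery output; `cuda_cores` is a hypothetical helper name.

```shell
# Multiply the SM count by the cores per SM, both taken from the
# parenthesised numbers on deviceQuery's "Multiprocessors" line (stdin).
cuda_cores() {
  awk -F'[()]' '/Multiprocessors/ { print $2 * $4 }'
}

printf '%s\n' '  (14) Multiprocessors, ( 64) CUDA Cores/MP:     896 CUDA Cores' | cuda_cores   # prints 896

# Usage from the deviceQuery directory:
#   ./deviceQuery | cuda_cores
```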

III. Check the cuDNN version

Reference: Checking the cuDNN version on Ubuntu 18.4

  1. Method 1
grep -A 2 CUDNN_MAJOR /usr/local/cuda/include/cudnn.h
  2. Method 2
# If method 1 prints nothing, use method 2
# Step 1: inspect cudnn.h
grep cudnn /usr/local/cuda/include/cudnn.h

Output
/*   cudnn : Neural Networks Library
#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"
#include "cudnn_backend.h"

# Step 2: this cudnn.h does not define the cuDNN version; locate cudnn_version.h instead
# Discard error messages such as "Permission denied" (权限不够 in a Chinese locale)
find / -name cudnn_version.h 2>/dev/null

Output
/home/yichao/Downloads/cuda/include/cudnn_version.h

# Step 3: print the cuDNN version information
cat /home/yichao/Downloads/cuda/include/cudnn_version.h

Output
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
The cuDNN version is 8.0.5.
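Reading the three defines by eye works, but the version string can also be assembled automatically. The following is a sketch; `cudnn_version` is a hypothetical helper, and the combined integer follows the CUDNN_VERSION formula shown in the header above.

```shell
# Print MAJOR.MINOR.PATCH plus the combined CUDNN_VERSION integer
# for the header file given as the first argument.
cudnn_version() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END {
      printf "%d.%d.%d (CUDNN_VERSION = %d)\n", major, minor, patch,
             major * 1000 + minor * 100 + patch
    }
  ' "$1"
}

# Path taken from the find output above; adjust it to wherever your header lives
header=/home/yichao/Downloads/cuda/include/cudnn_version.h
if [ -r "$header" ]; then
  cudnn_version "$header"
fi
```

For the header shown above this yields 8.0.5 and 8 × 1000 + 0 × 100 + 5 = 8005.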

IV. Test whether CUDA is installed correctly

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

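# Remove files from any previous build
sudo make clean

# Rebuild; -j8 runs up to 8 parallel jobs to speed up compilation
sudo make -j8

./deviceQuery
# If the last line reads "Result = PASS", CUDA is installed correctly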
# First run (CUDA Runtime Version 10.2)
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          11.4 / 10.2
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12051 MBytes (12636061696 bytes)
MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
  (28) Multiprocessors, ( 64) CUDA Cores/MP:     1792 CUDA Cores
  GPU Max Clock rate:                            1777 MHz (1.78 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 10 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

# Second run (CUDA Runtime Version 11.1)
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          11.4 / 11.1
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12051 MBytes (12636061696 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1777 MHz (1.78 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 10 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.1, NumDevs = 1
Result = PASS

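Important notes:

  1. The RTX 3060 uses the Ampere architecture (compute capability 8.6). CUDA 11.1 and later support it; older versions cannot use the card's full performance. With the 10.2 runtime, deviceQuery does not recognize SM 8.6 ("MapSMtoCores for SM 8.6 is undefined") and falls back to 64 cores per SM, so it reports half the core count that the 11.1 runtime reports.
First run:
(28) Multiprocessors, ( 64) CUDA Cores/MP:     1792 CUDA Cores

Second run: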
(28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
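A script can catch this mismatch early by comparing the runtime version in deviceQuery's summary line with the 11.1 minimum stated in the note. This is a sketch under assumptions: `version_ge` and `runtime_version` are hypothetical helpers, and the comparison relies on GNU `sort -V`.

```shell
# Succeed if version $1 >= version $2 (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Pull "CUDA Runtime Version = X.Y" out of deviceQuery's summary line (stdin).
runtime_version() {
  sed -n 's/.*CUDA Runtime Version = \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Summary line from the first run
summary='deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1'
rt=$(printf '%s\n' "$summary" | runtime_version)

if version_ge "$rt" 11.1; then
  echo "runtime $rt: new enough for SM 8.6"
else
  echo "runtime $rt: older than 11.1, SM 8.6 is not fully supported"
fi
```

For the first run this takes the else branch, since 10.2 is older than 11.1.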

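V. Test whether cuDNN is installed correctly (deb install)

When cuDNN is installed from the deb packages, sample programs are placed in /usr/src/cudnn_samples_v7. Build and run the mnistCUDNN sample to verify the installation.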

# Copy the cuDNN samples to the home directory
cp -r /usr/src/cudnn_samples_v7 "$HOME"

# Enter the mnistCUDNN sample directory
cd "$HOME/cudnn_samples_v7/mnistCUDNN/"

# Build mnistCUDNN
sudo make clean
sudo make

# Run mnistCUDNN
# "Test passed!" in the output means cuDNN is installed correctly
sudo ./mnistCUDNN
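To check the result without reading the whole log, the output can be searched for the success marker. This is a minimal sketch; `cudnn_test_passed` is a hypothetical helper name.

```shell
# Succeed only if the text on stdin contains mnistCUDNN's success marker.
cudnn_test_passed() {
  grep -q 'Test passed!'
}

# Usage from the mnistCUDNN directory after building:
#   ./mnistCUDNN | cudnn_test_passed && echo "cuDNN OK"

# Demonstration on a literal string
printf 'Test passed!\n' | cudnn_test_passed && echo "cuDNN OK"
```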

VI. Matching CUDA and cuDNN versions

cuDNN Support Matrix
cuDNN-Support-Matrix

VII. GPU quantization (reduced-precision) support

Supported hardware
compute-capabilities

Table 4. Supported hardware

CUDA Compute Capability  Example Device         TF32  FP32  FP16  INT8  FP16 Tensor Cores  INT8 Tensor Cores  DLA
8.6                      NVIDIA A10             Yes   Yes   Yes   Yes   Yes                Yes                No
8.0                      NVIDIA A100/GA100 GPU  Yes   Yes   Yes   Yes   Yes                Yes                No
7.5                      Tesla T4               No    Yes   Yes   Yes   Yes                Yes                No
7.2                      Jetson AGX Xavier      No    Yes   Yes   Yes   Yes                Yes                Yes
7.0                      Tesla V100             No    Yes   Yes   Yes   Yes                No                 No
6.2                      Jetson TX2             No    Yes   Yes   No    No                 No                 No
6.1                      Tesla P4               No    Yes   No    Yes   No                 No                 No
6.0                      Tesla P100             No    Yes   Yes   No    No                 No                 No
5.3                      Jetson TX1             No    Yes   Yes   No    No                 No                 No
5.2                      Tesla M4               No    Yes   No    No    No                 No                 No
5.0                      Quadro K2200           No    Yes   No    No    No                 No                 No
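Combined with the compute capability reported by deviceQuery, the table can be queried from a script. The sketch below embeds the table rows verbatim (without the example device column); the helper names `precision_table` and `supports`, and the short column names FP16_TC and INT8_TC, are assumptions made for this example.

```shell
# Rows copied from Table 4.
# Columns: capability TF32 FP32 FP16 INT8 FP16_TC INT8_TC DLA
precision_table() {
  cat <<'EOF'
8.6 Yes Yes Yes Yes Yes Yes No
8.0 Yes Yes Yes Yes Yes Yes No
7.5 No Yes Yes Yes Yes Yes No
7.2 No Yes Yes Yes Yes Yes Yes
7.0 No Yes Yes Yes Yes No No
6.2 No Yes Yes No No No No
6.1 No Yes No Yes No No No
6.0 No Yes Yes No No No No
5.3 No Yes Yes No No No No
5.2 No Yes No No No No No
5.0 No Yes No No No No No
EOF
}

# Print Yes/No for a compute capability ($1) and a column name ($2),
# or "unknown" if the capability or the column is not in the table.
supports() {
  precision_table | awk -v cc="$1" -v col="$2" '
    BEGIN {
      split("TF32 FP32 FP16 INT8 FP16_TC INT8_TC DLA", names, " ")
      for (i = 1; i <= 7; i++) idx[names[i]] = i + 1
    }
    $1 == cc && (col in idx) { print $(idx[col]); found = 1 }
    END { if (!found) print "unknown" }'
}

supports 7.5 INT8_TC   # GTX 1650 / Tesla T4 class: prints Yes
supports 7.5 TF32      # prints No
```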
