Checking the GPU model on Linux (Ubuntu)
lspci | grep -i vga
# The device is reported as a hexadecimal device ID
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1)
# Look the device ID up on this site
http://pci-ids.ucw.cz/mods/PC/10de?action=help?help=pci
# Lookup result
Name: TU117 [GeForce GTX 1650]
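Alternatively, lspci can resolve the name and show the numeric [vendor:device] ID in one step, so the website lookup is optional (a minimal sketch; it assumes the local pciutils database is recent enough to know the device):
# -nn prints both the resolved name and the numeric [vendor:device] ID, e.g. [10de:1f82]
lspci -nn | grep -i vga
# If the name still shows up as "Device 1f82", refresh the local PCI ID database first
sudo update-pciids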
nvidia-smi
cat /proc/driver/nvidia/version
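For scripts, nvidia-smi can print just the GPU name and driver version as CSV (a small sketch using standard nvidia-smi query options):
# Query the GPU name and driver version without the full status table
nvidia-smi --query-gpu=name,driver_version --format=csv,noheader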
GPU compute capability table
nvcc -V
cat /usr/local/cuda/version.txt
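Note that CUDA 11.1 and later no longer ship version.txt; the information moved to version.json, so if the command above fails try the following (this assumes the default /usr/local/cuda symlink):
# CUDA 11.1+ provides version.json instead of version.txt
cat /usr/local/cuda/version.json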
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
# Remove previously built files
sudo make clean
# Rebuild; -j8 runs 8 parallel make jobs to speed up the build
sudo make -j8
./deviceQuery
# If the last line shows Result = PASS, CUDA is installed correctly
yichao@yichao:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce GTX 1650"
CUDA Driver Version / Runtime Version 11.4 / 10.2
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 3904 MBytes (4093444096 bytes)
(14) Multiprocessors, ( 64) CUDA Cores/MP: 896 CUDA Cores
GPU Max Clock rate: 1680 MHz (1.68 GHz)
Memory Clock rate: 4001 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
The key fields to note in this output:
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 3904 MBytes (4093444096 bytes)
(14) Multiprocessors, ( 64) CUDA Cores/MP: 896 CUDA Cores
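For an automated check, the PASS line can be tested directly (a minimal sketch, run from the deviceQuery directory built above):
# Prints "CUDA OK" only if deviceQuery reports Result = PASS
./deviceQuery | grep -q "Result = PASS" && echo "CUDA OK" || echo "CUDA check failed"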
Reference: checking the cuDNN version on Ubuntu 18.04
# Method 1
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
# If Method 1 does not work, use Method 2
# 1. Inspect cudnn.h
cat /usr/local/cuda/include/cudnn.h | grep cudnn
Output:
/* cudnn : Neural Networks Library
#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"
#include "cudnn_backend.h"
# 2. cudnn.h itself no longer defines the cuDNN version; locate cudnn_version.h instead
# Merge stderr into stdout and filter out the "权限不够" (Permission denied) messages
find / -name cudnn_version.h 2>&1 | grep -v "权限不够"
Output:
/home/yichao/Downloads/cuda/include/cudnn_version.h
# 3. Read the cuDNN version information
cat /home/yichao/Downloads/cuda/include/cudnn_version.h
Output:
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
So the cuDNN version is 8.0.5 (8 * 1000 + 0 * 100 + 5 = 8005).
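The three defines can also be extracted and joined in one go (a small sketch; it searches the whole filesystem like the find command above, so adjust the search root if you already know where cuDNN is installed):
# Locate cudnn_version.h and print the version as MAJOR.MINOR.PATCHLEVEL
CUDNN_H=$(find / -name cudnn_version.h 2>/dev/null | head -n 1)
grep -E "^#define CUDNN_(MAJOR|MINOR|PATCHLEVEL) " "$CUDNN_H" | awk '{print $3}' | paste -sd. -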
Running the same deviceQuery check again (this time the device is an RTX 3060):
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
# Remove previously built files
sudo make clean
# Rebuild; -j8 runs 8 parallel make jobs to speed up the build
sudo make -j8
./deviceQuery
# If the last line shows Result = PASS, CUDA is installed correctly
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 3060"
CUDA Driver Version / Runtime Version 11.4 / 10.2
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 12051 MBytes (12636061696 bytes)
MapSMtoCores for SM 8.6 is undefined. Default to use 64 Cores/SM
MapSMtoCores for SM 8.6 is undefined. Default to use 64 Cores/SM
(28) Multiprocessors, ( 64) CUDA Cores/MP: 1792 CUDA Cores
GPU Max Clock rate: 1777 MHz (1.78 GHz)
Memory Clock rate: 7501 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 2359296 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 10 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Running deviceQuery once more, this time against CUDA Runtime Version 11.1:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 3060"
CUDA Driver Version / Runtime Version 11.4 / 11.1
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 12051 MBytes (12636061696 bytes)
(28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1777 MHz (1.78 GHz)
Memory Clock rate: 7501 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 2359296 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 102400 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 10 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.1, NumDevs = 1
Result = PASS
Important note:
First run:
(28) Multiprocessors, ( 64) CUDA Cores/MP: 1792 CUDA Cores
Second run:
(28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
The first run was built from the CUDA 10.2 samples, whose helper_cuda.h has no cores-per-SM entry for SM 8.6 and falls back to 64 (hence the "MapSMtoCores for SM 8.6 is undefined" warning); the CUDA 11.1 build knows SM 8.6 and reports the correct 128 cores/SM, i.e. 3584 CUDA cores for the RTX 3060.
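To get the correct count on an SM 8.6 GPU, rebuild deviceQuery from a CUDA 11.x samples tree; a sketch, assuming the toolkit is installed under /usr/local/cuda-11.1:
# Rebuild the sample against a toolkit whose helper_cuda.h knows SM 8.6
cd /usr/local/cuda-11.1/samples/1_Utilities/deviceQuery
sudo make clean && sudo make -j8
./deviceQuery | grep "CUDA Cores"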
When cuDNN is installed from the .deb packages, the samples are placed under /usr/src/cudnn_samples_v7; build and run the mnistCUDNN sample to verify the installation.
# Copy the cuDNN samples to the home directory
cp -r /usr/src/cudnn_samples_v7 $HOME
# Enter the mnistCUDNN sample directory
cd $HOME/cudnn_samples_v7/mnistCUDNN/
# Build mnistCUDNN
sudo make clean
sudo make
# Run mnistCUDNN
# If the output ends with Test passed!, cuDNN is installed correctly
sudo ./mnistCUDNN
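With cuDNN 8.x the deb package that provides the samples is libcudnn8-samples and the directory is versioned accordingly; a sketch of the same check for that layout (building mnistCUDNN needs the FreeImage development headers):
# cuDNN 8.x layout (assumes the libcudnn8-samples deb is installed)
sudo apt install libfreeimage-dev
cp -r /usr/src/cudnn_samples_v8 $HOME
cd $HOME/cudnn_samples_v8/mnistCUDNN/
make clean && make
./mnistCUDNN    # look for "Test passed!" at the end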
cuDNN Support Matrix: supported hardware and compute capabilities
CUDA Compute Capability | Example Device | TF32 | FP32 | FP16 | INT8 | FP16 Tensor Cores | INT8 Tensor Cores | DLA |
---|---|---|---|---|---|---|---|---|
8.6 | NVIDIA A10 | Yes | Yes | Yes | Yes | Yes | Yes | No |
8.0 | NVIDIA A100/GA100 GPU | Yes | Yes | Yes | Yes | Yes | Yes | No |
7.5 | Tesla T4 | No | Yes | Yes | Yes | Yes | Yes | No |
7.2 | Jetson AGX Xavier | No | Yes | Yes | Yes | Yes | Yes | Yes |
7.0 | Tesla V100 | No | Yes | Yes | Yes | Yes | No | No |
6.2 | Jetson TX2 | No | Yes | Yes | No | No | No | No |
6.1 | Tesla P4 | No | Yes | No | Yes | No | No | No |
6.0 | Tesla P100 | No | Yes | Yes | No | No | No | No |
5.3 | Jetson TX1 | No | Yes | Yes | No | No | No | No |
5.2 | Tesla M4 | No | Yes | No | No | No | No | No |
5.0 | Quadro K2200 | No | Yes | No | No | No | No | No |
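To match your own GPU against this table without building the samples, newer drivers let nvidia-smi report the compute capability directly (the compute_cap query field is an assumption about your driver being recent enough, roughly R510 or later):
# Print the GPU name and its CUDA compute capability
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader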