gpu

# Find the integrated and discrete graphics cards
lspci | grep VGA     # integrated graphics
lspci | grep NVIDIA  # NVIDIA discrete graphics
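# A related check (a sketch): show which kernel driver (nouveau or nvidia) is bound to each display device;
# note that some discrete GPUs appear as "3D controller" rather than "VGA"
lspci -nnk | grep -i -A 3 vga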

# Check whether the nouveau driver is loaded
lsmod | grep nouveau

# Show the graphics card model of the current machine
lshw -numeric -C display

# Monitor GPU usage, refreshing every second
watch -n 1 nvidia-smi
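# Alternative (a sketch, assumes a reasonably recent driver): print one line of utilization stats per second instead of redrawing the screen
nvidia-smi dmon -s u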

# Check the CUDA version
cat /usr/local/cuda/version.txt
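# Note: on newer toolkits (roughly CUDA 11.1+) version.txt is replaced by version.json; the compiler reports its version too
nvcc --version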

# How to set up CUDA (install the kernel development packages the driver build needs)
yum install kernel-devel
yum install kernel-headers
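# The kernel-devel package should match the running kernel; one way to pin it (assuming that version is still in the repos)
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)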

# How to choose nvcc code-generation options, see https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation
--gpu-architecture arch (-arch) specifies the virtual architecture
--gpu-code code,... (-code) specifies the real architecture(s)
# Generate only PTX
-arch compute_35 (-code defaults to the -arch value)
-arch compute_35 -code compute_35
-gencode arch=compute_35,code=compute_35
# Generate only SASS
-arch compute_35 -code sm_35
-gencode arch=compute_35,code=sm_35
# Generate both PTX and SASS
-arch sm_35 (-code defaults to the -arch value plus the closest virtual architecture)
-arch=compute_35 -code=sm_35,compute_35
-gencode arch=compute_35,code=\"sm_35,compute_35\"

# !!!WRONG!!! specifications (examples of incorrect usage)
# NVCC complains that you need to specify -arch with a virtual code architecture like compute_35
-code compute_35
-code sm_35
-arch sm_35 -code compute_35
-arch sm_35 -code sm_35
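
# Putting the correct forms together, a sketch of a full compile line (the source file vector_add.cu and the architectures are illustrative)
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_75,code=sm_75 \
     -gencode arch=compute_75,code=compute_75 \
     vector_add.cu -o vector_add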

# Choose the GPU with the most free memory
export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=memory.free,index --format=csv,nounits,noheader | sort -nr | head -1 | awk '{ print $NF }')
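# If you want to pick by the lowest instantaneous utilization instead (a sketch; utilization.gpu is a point-in-time sample)
export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=utilization.gpu,index --format=csv,nounits,noheader | sort -n | head -1 | awk '{ print $NF }')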

Glossary

Global Memory: the GPU's device memory (VRAM)
SM (Streaming Multiprocessor): the compute unit of a GPU. Each SM has its own control unit, registers, cache, and execution pipelines. Different GPU generations have different instruction sets and instruction encodings. (Note from practice: for compute_30 and above, a GPU with a higher compute capability can run programs built for a lower generation, but not the other way around; e.g. a GPU with compute capability 6.1 can run a program built with compute_30,sm_30. See the query command after this list.)
SP (Streaming Processor): a CUDA Core
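
To check the compute capability of the installed GPUs (a sketch; the compute_cap query field needs a fairly recent driver, older setups can use the deviceQuery sample instead):
nvidia-smi --query-gpu=name,compute_cap --format=csv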


[Figures, images not included: the Pascal-architecture GP100 GPU, an SM block diagram, and a CUDA Core]

Reference: Nvidia GPU架构 - Cuda Core,SM,SP等等傻傻分不清?

Official documentation

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html nvcc, the CUDA compiler driver

https://developer.nvidia.com/cuda-90-download-archive CUDA 9
https://developer.nvidia.com/cuda-10.1-download-archive-base CUDA 10.1
https://docs.nvidia.com/cuda/archive/10.1/ CUDA 10.1 documentation
https://developer.nvidia.com/cuda-toolkit-archive CUDA Toolkit archive (all historical versions)

https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf CUDA C++ Programming Guide
https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html CUDA C Best Practices Guide

https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf Volta architecture whitepaper
https://docs.nvidia.com/cuda/turing-compatibility-guide/index.html Turing Compatibility Guide, e.g. T4
https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html Ampere Compatibility Guide, e.g. RTX 3080

Note that compute_XX refers to a PTX version and sm_XX refers to a cubin version. The arch= clause of the -gencode= command-line option to nvcc specifies the front-end compilation target and must always be a PTX version. The code= clause specifies the back-end compilation target and can either be cubin or PTX or both. Only the back-end target version(s) specified by the code= clause will be retained in the resulting binary; at least one must be PTX to provide Turing compatibility.
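
One way to check which PTX and cubin targets actually ended up in a compiled binary is cuobjdump (a sketch; my_app is a placeholder binary name):
cuobjdump --list-elf my_app   # lists embedded cubin (SASS) targets
cuobjdump --list-ptx my_app   # lists embedded PTX targets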

https://docs.nvidia.com/cuda/cublas/index.html cuBLAS, the basic linear algebra library, optimized at the assembly level
https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html cuDNN, the deep learning primitives library

Appendix:

显卡,显卡驱动,nvcc, cuda driver,cudatoolkit,cudnn到底是什么?
Matching CUDA arch and CUDA gencode for various NVIDIA architectures - Arnon Shimoni
CUDA架构及对应编译参数
CUDA:nvcc编译参数示例
How to specify architecture to compile CUDA code
Cuda Tips:nvcc的code、arch、gencode选项
