CUDA Toolkit 11.0 Update1 (Aug 2020), Versioned Online Documentation
CUDA Toolkit 11.0 (May 2020), Versioned Online Documentation Download
cuDNN v8.0.2 (July 24th, 2020), for CUDA 11.0
These releases turned out to be too new: the tar-based install kept failing.
The versions I finally installed are:
CUDA Toolkit 10.2 (Nov 2019), Versioned Online Documentation
Download cuDNN v7.6.5 (November 18th, 2019), for CUDA 10.2
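Version compatibility table for CUDA, NVIDIA Driver, Linux, and GCC
You can find the exact graphics card model in your computer's system information. On a dual-boot machine, Device Manager in Windows also shows the card's details.
You can also run this command in an Ubuntu terminal: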
lspci | grep -i nvidia
This prints your NVIDIA GPU model, though without much detail.
Mine shows a GeForce GTX 970:
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
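To pull just the model name out of that listing, a small filter can help. This is a minimal sketch: the function name gpu_models is my own, and it assumes the model appears in square brackets on the VGA (or 3D controller) line, as in the output above.

```shell
# Print the bracketed model name from lspci lines describing an NVIDIA display device.
# Assumption: the model is the text inside [...] on the "VGA" or "3D controller" line.
gpu_models() {
    grep -i 'nvidia' | grep -iE 'vga|3d controller' | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Usage on a real machine:
#   lspci | gpu_models
```

The audio function of the card is filtered out because it is not a display device.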
To check the system architecture and distribution, run:
uname -m && cat /etc/*release
Output:
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
......
To check the gcc version, run in a terminal:
gcc -v #or $ gcc --version
Output:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
......
If gcc is not installed, install it with:
sudo apt-get install build-essential
To check the running kernel version, run in a terminal:
uname -r
Output:
4.15.0-112-generic
If the matching kernel headers are missing, run:
sudo apt-get install linux-headers-$(uname -r)
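This installs the kernel headers and development packages that match the running kernel.
If all of the checks above pass, you can move on to the actual installation. If any of them fails, see the official CUDA documentation, which has a detailed fix for each problem.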
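The gcc and kernel-header checks above can be bundled into one helper. This is only a sketch: the function names are hypothetical, and it assumes that on Ubuntu the linux-headers package provides /lib/modules/<release>/build.

```shell
# Return 0 if the named command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Return 0 if kernel headers for release $1 are present.
# $2 optionally overrides the modules root (default /lib/modules).
# Assumption: the headers package provides <root>/<release>/build.
have_headers() { [ -e "${2:-/lib/modules}/$1/build" ]; }

# Print an install hint for each missing prerequisite.
preflight() {
    local release
    release="$(uname -r)"
    have_cmd gcc || echo "gcc missing: sudo apt-get install build-essential"
    have_headers "$release" || echo "headers missing: sudo apt-get install linux-headers-$release"
}

# Usage: preflight    (no output means both checks passed)
```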
NVIDIA-Linux-x86_64-450.57.run
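NVIDIA China official website
For laptops, see "Ubuntu笔记本Nvidia显卡使用状况" (NVIDIA GPU usage on Ubuntu laptops); on my laptop the driver can be installed by following that link. The steps below apply to installing the driver on a desktop.
Following "ubuntu16.04安装NVIDIA显卡驱动" (installing the NVIDIA driver on Ubuntu 16.04), I installed the driver manually with success for the first time.
System state before installation: a fresh Ubuntu 16.04 install still running on the integrated graphics driver, before any switch to the NVIDIA driver that ships with the system.
I did not need the next step because no driver was installed beforehand; the commands are copied from another blog and are easy enough to follow.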
#for case1: original driver installed by apt-get:
sudo apt-get remove --purge 'nvidia*'
#for case2: original driver installed by runfile:
sudo chmod +x *.run
sudo ./NVIDIA-Linux-x86_64-384.59.run --uninstall
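If the old driver was installed with apt-get, uninstall it with the first method.
If the old driver was installed from a runfile, uninstall it with the --uninstall option. In fact, the runfile installer also removes any previous driver, so skipping the manual uninstall is fine.
Disable the nouveau driver:
lsmod | grep nouveau # if this prints anything, nouveau is loaded and must be disabled
sudo gedit /etc/modprobe.d/blacklist.conf # add the default drivers to the blacklist
Append the following lines to the end of blacklist.conf: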
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist nvidiafb
Update the initramfs: sudo update-initramfs -u # this step may take a little while, which is fine
Reboot, then check: lsmod | grep nouveau # no output at all means nouveau is disabled
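The blacklist edit and the lsmod check can also be scripted instead of done in an editor. The sketch below uses helper names of my own; it appends each line only once, so running it again does not duplicate entries.

```shell
# Append "blacklist <module>" to a config file unless that exact line is already there.
blacklist_module() {
    local module="$1" conf="$2"
    grep -qxF "blacklist $module" "$conf" 2>/dev/null || echo "blacklist $module" >> "$conf"
}

# Succeed if nouveau appears as a loaded module in lsmod-style input on stdin.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage, from a root shell:
#   for m in vga16fb nouveau rivafb rivatv nvidiafb; do
#       blacklist_module "$m" /etc/modprobe.d/blacklist.conf
#   done
#   update-initramfs -u
# After rebooting:
#   lsmod | nouveau_loaded && echo "nouveau is still loaded"
```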
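Switch to a tty to install:
Ctrl+Alt+F1 enters text mode and Ctrl+Alt+F7 returns to the graphical session (work in text mode keeps its state, so you can switch back to it later).
# log in with your username and password
sudo su # enter your password to get a root shell
cd ~/ # ~/ <=> /home/yourname/
sudo service lightdm stop # stop the graphical session
# if the installation fails, restart the graphical session with sudo service lightdm restart, remove the drivers just added to the blacklist, and reboot to return to the original state
sudo init 3 # described on the official site: switch to runlevel 3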
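sudo sh NVIDIA-Linux-x86_64-450.57.run --no-opengl-files --no-x-check --no-nouveau-check
# --no-opengl-files: install only the driver files, not the OpenGL files. This is the most important option; installing without OpenGL is what avoids the login loop.
# --no-x-check: do not check for a running X server during installation
# --no-nouveau-check: do not check for nouveau during installation
# the last two options can be omitted.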
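If the following message appears during installation, choose (important! this cost me a long time):
I had read the error message and searched the official forum for a fix; the advice there was that the Ubuntu kernel might be unsupported and needed upgrading. I spent a long time on that, and it turned out to be unnecessary.
At the end you will see the success message.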
Installation of NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version:450.57) is now complete. Please update your xorg.conf file as appropriate; see the file /usr/share/doc/NVIDIA_GLX-1.0/README.txt for details.
OK
sudo service lightdm restart # restart the graphical session
nvidia-smi # check that the driver installed correctly
In a terminal, run:
cat /proc/driver/nvidia/version # prints the NVIDIA driver version
lspci | grep -i nvidia # shows the GPU model
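If only the version number is needed, for example to compare it against a CUDA requirement, it can be cut out of the /proc file. This is a sketch with a function name of my own; it assumes the "NVRM version: ... Kernel Module <version>" line format that this driver family prints, which other driver builds may not follow.

```shell
# Print the driver version from the contents of /proc/driver/nvidia/version on stdin.
# Assumption: the first line looks like
#   NVRM version: NVIDIA UNIX x86_64 Kernel Module  <version>  <build date>
driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}

# Usage on a machine with the driver loaded:
#   driver_version < /proc/driver/nvidia/version
```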
cuda-repo-ubuntu1604-11-0-local_11.0.2-450.51.05-1_amd64.deb
CUDA Toolkit 11.0 (May 2020), Versioned Online Documentation
CUDA Toolkit Archive
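Official information: CUDA Toolkit Documentation v10.2.89
Official installation guide: NVIDIA CUDA Installation Guide for Linux.
Nvidia驱动和cuda对照表 (NVIDIA driver and CUDA version table)
The environment was configured following "Ubuntu16.04+CUDA9.0 安装(全网最简便快速安装,测试成功)".
Ubuntu 16.04 上安装 CUDA 9.0 详细教程 (detailed tutorial for installing CUDA 9.0 on Ubuntu 16.04)
CUDA was installed following "Ubuntu18.04安装Cuda10.1".
Take care not to install the driver here; deselect it as in the screen below.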
┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer │
│ - [ ] Driver │
│ [ ] 450.51.05 │
│ + [X] CUDA Toolkit 11.0 │
│ [X] CUDA Samples 11.0 │
│ [X] CUDA Demo Suite 11.0 │
│ [X] CUDA Documentation 11.0 │
│ Options │
│ Install │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
On my laptop, this screen appeared:
┌──────────────────────────────────────────────────────────────────────────────┐
│ Existing package manager installation of the driver found. It is strongly │
│ recommended that you remove this before continuing. │
│ Abort │
│ Continue │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ Up/Down: Move | 'Enter': Select │
Choose Continue.
After installation the following summary is printed; the environment still needs to be configured.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-10.2/
Samples: Installed in /home/mooc/
Please make sure that
- PATH includes /usr/local/cuda-10.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.2/lib64, or, add /usr/local/cuda-10.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.2/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.2/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 440.00 is required for CUDA 10.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-10.1/
Samples: Installed in /home/lzy/
Please make sure that
- PATH includes /usr/local/cuda-10.1/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
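Both summaries state a minimum driver version (440.00 for CUDA 10.2, 418.00 for CUDA 10.1). A rough way to compare an installed driver against those numbers is sketched below; the function names are hypothetical, and the comparison relies on sort -V from GNU coreutils.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum driver versions quoted in the two installer summaries above.
min_driver_for() {
    case "$1" in
        10.2) echo 440.00 ;;
        10.1) echo 418.00 ;;
        *) return 1 ;;   # not stated in this post
    esac
}

# Usage:
#   version_ge 450.57 "$(min_driver_for 10.2)" && echo "driver is new enough"
```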
gedit ~/.bashrc
Append the lines for your installed version to the end of the file (the first pair is for CUDA 11.0, the second for CUDA 10.1):
export PATH=/usr/local/cuda-11.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=/usr/local/cuda-10.1/bin:/usr/local/cuda-10.1/NsightCompute-2019.1${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Finally, apply the changes:
source ~/.bashrc
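The ${PATH:+:${PATH}} form expands to a colon plus the old value only when the variable is already set and non-empty, so no stray colon is left behind. Because ~/.bashrc is sourced repeatedly, the same directory can still pile up in PATH; the sketch below (helper name is my own) prepends a directory only if it is not already listed.

```shell
# Print $2 (a colon-separated list) with directory $1 prepended,
# unless $1 is already one of its entries.
prepend_unique() {
    local dir="$1" current="$2"
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;
        *) printf '%s\n' "$dir${current:+:$current}" ;;
    esac
}

# Usage in ~/.bashrc (adjust the version directory to the one you installed):
#   export PATH="$(prepend_unique /usr/local/cuda-10.2/bin "$PATH")"
#   export LD_LIBRARY_PATH="$(prepend_unique /usr/local/cuda-10.2/lib64 "$LD_LIBRARY_PATH")"
```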
a. The verification below applies to a runfile install. With the deb install, I found that the NVIDIA_CUDA-10.2_Samples folder does not exist.
cd /usr/local/cuda-10.2/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
The result looks like this:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 970"
CUDA Driver Version / Runtime Version 11.0 / 10.2
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 4040 MBytes (4236115968 bytes)
(13) Multiprocessors, (128) CUDA Cores/MP: 1664 CUDA Cores
GPU Max Clock rate: 1253 MHz (1.25 GHz)
Memory Clock rate: 3505 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 1835008 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.0, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Result = PASS means the installation works.
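When the check is run from a script, the pass marker and the runtime version can be read from the captured output rather than by eye. This is a sketch with function names of my own; it assumes the one-line summary format shown in the deviceQuery output above.

```shell
# Succeed if sample output on stdin contains the "Result = PASS" marker.
sample_passed() { grep -q 'Result = PASS'; }

# Print the CUDA runtime version from deviceQuery's one-line summary on stdin.
runtime_version() {
    sed -n 's/.*CUDA Runtime Version = \([0-9.]*\).*/\1/p'
}

# Usage:
#   ./deviceQuery > dq.log
#   sample_passed < dq.log && runtime_version < dq.log
```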
cat /proc/driver/nvidia/version
The output looks similar to:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.81 Sat Sep 2 02:43:11 PDT 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
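nvcc -V # prints the CUDA version
Open a terminal and run:
cd NVIDIA_CUDA-10.2_Samples/
Then run: $ make
The build starts automatically and takes roughly ten to twenty minutes, so be patient. If an error occurs, the build stops immediately and reports it.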
The build did fail for me with the error below. Running make -k keeps going past it.
fatal error: nvscibuf.h
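The first run may also fail with an error saying that gcc is not installed.
The fix is to reinstall gcc: $ sudo apt-get install gcc
After installing gcc, run make again and the build goes through.
If the build succeeds, the last line reads Finished building CUDA samples, as shown below.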
make[1]: Leaving directory '/home/mooc/NVIDIA_CUDA-11.0_Samples/7_CUDALibraries/simpleCUBLASXT'
Finished building CUDA samples
./bandwidthTest
Output similar to the following means success:
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 970
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 11.8
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 11.4
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 145.7
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Package Manager Installation
Just follow the official installation guide, NVIDIA CUDA Installation Guide for Linux.
Test the same way as above. This install method does not create a NVIDIA_CUDA-10.2_Samples folder, but the bundled samples work the same way: copy /usr/local/cuda-11.0/samples into your home directory.
cd ~/samples
make
cd ~/samples/1_Utilities/deviceQuery
./deviceQuery
cd ~/samples/1_Utilities/bandwidthTest
./bandwidthTest
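See the official guide.
sudo /usr/local/cuda-11.0/bin/uninstall_cuda_11.0.pl
Files still remain under /usr/local/cuda-11.0 afterwards. These are the cuDNN files, so the cuda-11.0 directory has to be removed completely as well: sudo rm -rf /usr/local/cuda-11.0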
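The uninstaller's name differs between releases: the 10.2 installer summary quoted earlier names cuda-uninstaller in the toolkit's bin directory, while the command above uses a versioned uninstall_cuda_*.pl script. The sketch below (function name is hypothetical) looks for either one under a given prefix instead of guessing.

```shell
# Print the path of the toolkit uninstaller under CUDA prefix $1, if one exists.
# Checks bin/cuda-uninstaller first, then any bin/uninstall_cuda_*.pl script.
find_uninstaller() {
    local prefix="$1" candidate
    for candidate in "$prefix/bin/cuda-uninstaller" "$prefix"/bin/uninstall_cuda_*.pl; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage:
#   u="$(find_uninstaller /usr/local/cuda-11.0)" && sudo "$u"
```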
Removing CUDA Toolkit and Driver
To remove CUDA Toolkit:
$ sudo apt-get --purge remove "*cublas*" "cuda*"
To remove NVIDIA Drivers:
$ sudo apt-get --purge remove "*nvidia*"
sudo rm -rf cuda
sudo rm -r cuda-9.0
Download cuDNN v8.0.2 (July 24th, 2020), for CUDA 11.0
cuDNN
cudnn-archive
Downloading cuDNN For Linux
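Official installation reference: This Archives document provides access to previously released cuDNN documentation versions.
See "Ubuntu 16.04 配置安装 Tensorflow Gpu版本" (setting up the GPU version of TensorFlow on Ubuntu 16.04).
Choose the "cuDNN Library for Linux" download.
See "检测CUDNN是否成功安装" (checking whether cuDNN installed successfully):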
https://www.jianshu.com/p/8e9090a62342
https://www.cnblogs.com/liuwenhua/p/11521668.html
https://blog.csdn.net/wanzhen4330/article/details/81699769#cudnn%E7%9A%84%E5%AE%89%E8%A3%85
1. First, extract the downloaded cuDNN archive:
tar -xzvf cudnn-11.0-linux-x64-v8.0.2.39.tgz
This extracts:
cuda/include/cudnn.h
cuda/include/cudnn_adv_infer.h
cuda/include/cudnn_adv_train.h
cuda/include/cudnn_backend.h
cuda/include/cudnn_cnn_infer.h
cuda/include/cudnn_cnn_train.h
cuda/include/cudnn_ops_infer.h
cuda/include/cudnn_ops_train.h
cuda/include/cudnn_version.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.8
cuda/lib64/libcudnn.so.8.0.2
cuda/lib64/libcudnn_adv_infer.so
cuda/lib64/libcudnn_adv_infer.so.8
cuda/lib64/libcudnn_adv_infer.so.8.0.2
cuda/lib64/libcudnn_adv_infer_static.a
cuda/lib64/libcudnn_adv_train.so
cuda/lib64/libcudnn_adv_train.so.8
cuda/lib64/libcudnn_adv_train.so.8.0.2
cuda/lib64/libcudnn_adv_train_static.a
cuda/lib64/libcudnn_cnn_infer.so
cuda/lib64/libcudnn_cnn_infer.so.8
cuda/lib64/libcudnn_cnn_infer.so.8.0.2
cuda/lib64/libcudnn_cnn_infer_static.a
cuda/lib64/libcudnn_cnn_train.so
cuda/lib64/libcudnn_cnn_train.so.8
cuda/lib64/libcudnn_cnn_train.so.8.0.2
cuda/lib64/libcudnn_cnn_train_static.a
cuda/lib64/libcudnn_ops_infer.so
cuda/lib64/libcudnn_ops_infer.so.8
cuda/lib64/libcudnn_ops_infer.so.8.0.2
cuda/lib64/libcudnn_ops_infer_static.a
cuda/lib64/libcudnn_ops_train.so
cuda/lib64/libcudnn_ops_train.so.8
cuda/lib64/libcudnn_ops_train.so.8.0.2
cuda/lib64/libcudnn_ops_train_static.a
cuda/lib64/libcudnn_static.a
Procedure
Navigate to your directory containing the cuDNN Tar file.
Unzip the cuDNN package.
$ tar -xzvf cudnn-x.x-linux-x64-v8.x.x.x.tgz
Copy the following files into the CUDA Toolkit directory, and change the file permissions.
$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
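This failed, reportedly because of a symlink problem. (When I tried CUDA 11 with cuDNN 8, the tar-based install always had symlink problems.)
Recreate the symlinks: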
cd /usr/local/cuda/lib64/
sudo rm -rf libcudnn.so libcudnn.so.5
sudo ln -s libcudnn.so.6.0.21 libcudnn.so.6
sudo ln -s libcudnn.so.6 libcudnn.so
sudo chmod +r libcudnn.so.7.0.5
sudo ln -sf libcudnn.so.7.0.5 libcudnn.so.7
sudo ln -sf libcudnn.so.7 libcudnn.so
sudo ldconfig
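The commands above mix several cuDNN versions because they come from different posts. The pattern is the same each time: libcudnn.so points to libcudnn.so.<major>, which points to the fully versioned file. A hedged sketch of that chain for one version follows; the function name is my own. One possible cause of the problem, which is my assumption and was not verified here, is that plain cp copies symlinks as regular files.

```shell
# Recreate libcudnn.so -> libcudnn.so.<major> -> libcudnn.so.<version>
# inside library directory $1 for full version $2 (for example 7.6.5).
# Fails without changing anything if the versioned file is missing.
link_cudnn() {
    local dir="$1" version="$2" major
    major="${version%%.*}"
    [ -f "$dir/libcudnn.so.$version" ] || return 1
    ( cd "$dir" &&
      ln -sf "libcudnn.so.$version" "libcudnn.so.$major" &&
      ln -sf "libcudnn.so.$major" libcudnn.so )
}

# Usage, from a root shell:
#   link_cudnn /usr/local/cuda/lib64 7.6.5 && ldconfig
```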
If there is no error, the output is:
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
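With cuDNN 8 the version macros live in cudnn_version.h (the file appears in the archive listing above), so grepping cudnn.h alone finds nothing for that release. The sketch below, with a function name of my own, reads whichever header is present and prints the version as major.minor.patch.

```shell
# Print the cuDNN version found in include directory $1
# (default /usr/local/cuda/include) as MAJOR.MINOR.PATCH.
# Prefers cudnn_version.h and falls back to cudnn.h.
cudnn_version() {
    local inc="${1:-/usr/local/cuda/include}" header
    header="$inc/cudnn_version.h"
    [ -f "$header" ] || header="$inc/cudnn.h"
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major == "") exit 1; print major "." minor "." patch }' "$header"
}

# Usage:
#   cudnn_version
#   cudnn_version /usr/include
```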
Official installation reference
Download three files:
cuDNN Runtime Library for Ubuntu16.04 x86_64 (Deb)
cuDNN Developer Library for Ubuntu16.04 x86_64 (Deb)
cuDNN Code Samples and User Guide for Ubuntu16.04 x86_64 (Deb)
Then install them:
sudo dpkg -i libcudnn8_8.0.2.39-1+cuda11.0_amd64.deb
sudo dpkg -i libcudnn8-dev_8.0.2.39-1+cuda11.0_amd64.deb
sudo dpkg -i libcudnn8-doc_8.0.2.39-1+cuda11.0_amd64.deb
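Each package name carries the CUDA release it was built for (the +cuda11.0 part). Before installing, it can be worth confirming that all three files target the toolkit you actually installed. The helper below is a sketch with a name of my own, based only on the file-name pattern shown above.

```shell
# Print the CUDA release encoded in a cuDNN .deb file name,
# e.g. libcudnn8_8.0.2.39-1+cuda11.0_amd64.deb gives 11.0.
deb_cuda_version() {
    printf '%s\n' "$1" | sed -n 's/.*+cuda\([0-9.]*\)_.*/\1/p'
}

# Usage:
#   for f in libcudnn8*.deb; do echo "$f targets CUDA $(deb_cuda_version "$f")"; done
```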
cp -r /usr/src/cudnn_samples_v8/ /home/mooc/
cd /home/mooc/cudnn_samples_v8/mnistCUDNN
make clean && make
./mnistCUDNN
The result:
Executing: mnistCUDNN
cudnnGetVersion() : 8002 , CUDNN_VERSION from cudnn.h : 8002 (8.0.2)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 13 Capabilities 5.2, SmClock 1253.0 Mhz, MemSize (Mb) 4039, MemClock 3505.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.037568 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.039200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.068480 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.071264 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.574752 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 3.837248 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 2000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.088384 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.149120 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.164736 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.203712 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.497536 time requiring 2000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 1.100000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.018144 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.026784 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.051264 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.057472 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.086016 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.143008 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 2000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.081024 time requiring 2000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.090912 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.108960 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.148992 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.206688 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.261760 time requiring 4656640 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.043584 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.045472 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.081376 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.101952 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.147296 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.433760 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 2000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.089216 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.089856 time requiring 2000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.093024 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.156448 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.163584 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.202016 time requiring 4656640 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.021248 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.021568 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.026880 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.056800 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.076128 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.143872 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 2000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.090656 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.092576 time requiring 2000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.110464 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.155904 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.185184 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.239072 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
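The two cuDNN installation methods do not seem to share a verification method. I am going with the second one for now and will revisit this if problems come up.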