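First, make sure the Visual Studio IDE is installed on your computer; I used VS2008.
Go to the NVIDIA website (http://developer.nvidia.com/cuda-downloads) and download the driver, the SDK, and the Toolkit.
Note: pick the desktop / notebook and 64-bit / 32-bit packages that match your machine, and make sure the driver, SDK, and Toolkit versions are consistent with each other.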
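To make the version requirement concrete: the CUDA runtime reports a version as a single integer, 1000 * major + 10 * minor, so 4.2 becomes 4020. The sketch below encodes the usual rule of thumb that the installed driver should report a version at least as high as the runtime a program was built against. Both function names are hypothetical helpers, not part of the CUDA API; in a real program the two integers would come from cudaDriverGetVersion and cudaRuntimeGetVersion.

```cpp
// Hypothetical helper: pack a CUDA version the way the runtime reports it
// (1000 * major + 10 * minor), e.g. 4.2 -> 4020.
inline int encodeCudaVersion(int majorNum, int minorNum)
{
    return 1000 * majorNum + 10 * minorNum;
}

// Hypothetical helper: assumes a program can run only if the driver reports
// a version at least as high as the runtime version it was built against.
inline bool driverSupportsRuntime(int driverVersion, int runtimeVersion)
{
    return driverVersion >= runtimeVersion;
}
```

For example, driverSupportsRuntime(encodeCudaVersion(4, 0), encodeCudaVersion(4, 2)) is false, which corresponds to installing the 4.2 Toolkit on top of a 4.0-era driver.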
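When installing the driver, choose the Custom (Advanced) option and click Next.
I recommend checking "Perform a clean installation".
Install the Toolkit: choose custom installation and change the install path to D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\
Install the SDK: change the install path to D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2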
Open VS2008 and go to [Tools] -> [Options] -> [Projects and Solutions] -> [VC++ Directories].
Note: adjust the paths below to match your own CUDA installation directories.
Under [Executable files], add:
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\common\bin
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\bin\win32\Release
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\bin\win32\Debug
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\shared\bin\win32\Release
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\shared\bin\win32\Debug
Under [Include files], add:
D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\CUDALibraries\common\inc
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\common\inc
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\shared\inc
Under [Library files], add:
D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\lib\Win32
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\common\lib
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\common\lib\Win32
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\shared\lib\Win32
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\CUDALibraries\common\lib
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\CUDALibraries\common\lib\Win32
Under [Source files], add:
D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\src
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\common\src
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\shared\src
D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\CUDALibraries\common\src
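Select [VC++ Project Settings]; add *.cu to [C/C++ File Extensions] and .cuh to [Extensions to Include].
Select [Text Editor] -> [File Extension], type cu into the edit box, choose Microsoft Visual C++ from the [Editor] drop-down menu, and click Add.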
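At this point you can run D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\SDK Browser\browser.exe and launch the bundled sample programs.
For example, select Device Query; if it runs, the configuration from the steps above is complete.
If it does not, copy the four .rules files from the CUDA Toolkit installation directory (D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\extras\visual_studio_integration\rules) into D:\Program Files\Microsoft Visual Studio 9.0\VC\VCProjectDefaults.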
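The cutil library is needed by the SDK samples and by any program that includes cutil.h, but CUDA v4.2 does not ship a prebuilt copy, so you have to build it yourself. Go to D:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\common, open cutil_vs2008.vcproj, set the platform to Win32, and build both the Debug and the Release configurations.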
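Create a Win32 Console Application, check Empty project under [Additional options], and finish the wizard.
Right-click the project name, choose [Custom Build Rules], and check [CUDA Runtime API Build Rule (v4.2)].
Right-click the project name and choose [Properties]. Under [Configuration Properties] -> [Linker] -> [General], add to [Additional Library Directories] the directories that contain cudart.lib, cutil32D.lib, and the other dependencies: $(CUDA_PATH)\lib\$(PlatformName);..\..\common\lib\$(PlatformName).
Under [Input] -> [Additional Dependencies], add cudart.lib cutil32D.lib cuda.lib and any other libraries you need. Otherwise the build fails with errors such as "error LNK2019: unresolved external symbol".
Right-click the [Source Files] folder, choose [Add] -> [New Item], pick the C++ template, and give the file a name of the form ***.cu, i.e. with the cu extension.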
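The cutil32D.lib entry in the linker step is the Debug build of cutil for Win32. As far as I know, the SDK's cutil project names its outputs cutil32.lib, cutil32D.lib, cutil64.lib, and cutil64D.lib, so a Release or x64 configuration needs a different entry in [Additional Dependencies]. The sketch below only spells out that naming pattern; cutilLibName is a hypothetical helper written for illustration, not something the SDK provides.

```cpp
#include <string>

// Hypothetical helper: builds the cutil import library name for a given
// build configuration, assuming the pattern cutil<bits>[D].lib.
inline std::string cutilLibName(bool is64Bit, bool isDebug)
{
    std::string name = "cutil";
    name += is64Bit ? "64" : "32";  // platform: x64 or Win32
    if (isDebug)
        name += "D";                // Debug builds carry a trailing D
    return name + ".lib";
}
```

With the Win32 Debug configuration used in this article, cutilLibName(false, true) gives cutil32D.lib; for Win32 Release it gives cutil32.lib.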
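With the project set up as above, you can write, compile, and run programs. The test program below can be used to check that the build environment is configured correctly.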
#include <stdio.h>
#include <cuda_runtime.h>  // CUDA Runtime API
#include <cutil.h>         // CUT_EXIT, from the SDK's cutil library

// Program main
int main(int argc, char** argv)
{
    printf("CUDA Device Query (Runtime API) version (CUDART static linking)\n");

    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    // deviceCount stays 0 if there are no CUDA-capable devices.
    if (deviceCount == 0)
        printf("There is no device supporting CUDA\n");

    int dev;
    for (dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp deviceProp;
        cudaGetDeviceProperties(&deviceProp, dev);

        if (dev == 0) {
            // major and minor are both 9999 if no CUDA-capable device is present.
            if (deviceProp.major == 9999 && deviceProp.minor == 9999)
                printf("There is no device supporting CUDA.\n");
            else if (deviceCount == 1)
                printf("There is 1 device supporting CUDA\n");
            else
                printf("There are %d devices supporting CUDA\n", deviceCount);
        }
        printf("\nDevice %d: \"%s\"\n", dev, deviceProp.name);
#if CUDART_VERSION >= 2020
        // Versions are encoded as 1000 * major + 10 * minor (4.2 -> 4020).
        int driverVersion = 0, runtimeVersion = 0;
        cudaDriverGetVersion(&driverVersion);
        printf("  CUDA Driver Version: %d.%d\n", driverVersion / 1000, (driverVersion % 100) / 10);
        cudaRuntimeGetVersion(&runtimeVersion);
        printf("  CUDA Runtime Version: %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 100) / 10);
#endif
        printf("  CUDA Capability Major revision number: %d\n", deviceProp.major);
        printf("  CUDA Capability Minor revision number: %d\n", deviceProp.minor);
        printf("  Total amount of global memory: %u bytes\n", deviceProp.totalGlobalMem);
#if CUDART_VERSION >= 2000
        printf("  Number of multiprocessors: %d\n", deviceProp.multiProcessorCount);
        // 8 cores per multiprocessor holds only for compute capability 1.x devices.
        printf("  Number of cores: %d\n", 8 * deviceProp.multiProcessorCount);
#endif
        printf("  Total amount of constant memory: %u bytes\n", deviceProp.totalConstMem);
        printf("  Total amount of shared memory per block: %u bytes\n", deviceProp.sharedMemPerBlock);
        printf("  Total number of registers available per block: %d\n", deviceProp.regsPerBlock);
        printf("  Warp size: %d\n", deviceProp.warpSize);
        printf("  Maximum number of threads per block: %d\n", deviceProp.maxThreadsPerBlock);
        printf("  Maximum sizes of each dimension of a block: %d x %d x %d\n",
               deviceProp.maxThreadsDim[0],
               deviceProp.maxThreadsDim[1],
               deviceProp.maxThreadsDim[2]);
        printf("  Maximum sizes of each dimension of a grid: %d x %d x %d\n",
               deviceProp.maxGridSize[0],
               deviceProp.maxGridSize[1],
               deviceProp.maxGridSize[2]);
        printf("  Maximum memory pitch: %u bytes\n", deviceProp.memPitch);
        printf("  Texture alignment: %u bytes\n", deviceProp.textureAlignment);
        // clockRate is reported in kHz.
        printf("  Clock rate: %.2f GHz\n", deviceProp.clockRate * 1e-6f);
#if CUDART_VERSION >= 2000
        printf("  Concurrent copy and execution: %s\n", deviceProp.deviceOverlap ? "Yes" : "No");
#endif
#if CUDART_VERSION >= 2020
        printf("  Run time limit on kernels: %s\n", deviceProp.kernelExecTimeoutEnabled ? "Yes" : "No");
        printf("  Integrated: %s\n", deviceProp.integrated ? "Yes" : "No");
        printf("  Support host page-locked memory mapping: %s\n", deviceProp.canMapHostMemory ? "Yes" : "No");
        printf("  Compute mode: %s\n", deviceProp.computeMode == cudaComputeModeDefault ?
               "Default (multiple host threads can use this device simultaneously)" :
               deviceProp.computeMode == cudaComputeModeExclusive ?
               "Exclusive (only one host thread at a time can use this device)" :
               deviceProp.computeMode == cudaComputeModeProhibited ?
               "Prohibited (no host thread can use this device)" :
               "Unknown");
#endif
    }
    printf("\nTest PASSED\n");
    CUT_EXIT(argc, argv);
}
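Two details of the device query are easy to get wrong, so here is a small host-only sketch of them. First, version integers are encoded as 1000 * major + 10 * minor, so the minor number is (v % 100) / 10 rather than v % 100. Second, the number of cores per multiprocessor depends on the compute capability: 8 is right for 1.x devices, while 2.0, 2.1, and 3.x devices have 32, 48, and 192 respectively. The struct and both functions are hypothetical helpers of my own, not part of the CUDA API, and the table only covers the architectures I am sure about.

```cpp
// Hypothetical helper type: a decoded CUDA version number.
struct CudaVersion {
    int majorNum;
    int minorNum;
};

// Decode the integer returned by cudaDriverGetVersion / cudaRuntimeGetVersion
// (1000 * major + 10 * minor), e.g. 4020 -> 4.2.
inline CudaVersion decodeCudaVersion(int version)
{
    CudaVersion v;
    v.majorNum = version / 1000;
    v.minorNum = (version % 100) / 10;
    return v;
}

// Cores per multiprocessor for a given compute capability.
// Returns -1 for architectures this sketch does not know about.
inline int coresPerMultiprocessor(int ccMajor, int ccMinor)
{
    if (ccMajor == 1) return 8;                   // Tesla generation (1.x)
    if (ccMajor == 2) return ccMinor == 0 ? 32    // Fermi 2.0
                                          : 48;   // Fermi 2.1
    if (ccMajor == 3) return 192;                 // Kepler (3.x)
    return -1;
}
```

In the test program, the core count could then be computed as coresPerMultiprocessor(deviceProp.major, deviceProp.minor) * deviceProp.multiProcessorCount, after checking that the helper did not return -1.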
Running the test program produces the output shown in the figure below.