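A single stream corresponds to the notion of concurrency: it is a sequence of operations (possibly issued by several host threads) that execute in the order they were issued;
multiple streams correspond to the notion of parallelism, because the relative order of operations in different streams is not defined and they may overlap.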
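The in-order guarantee inside one stream can be sketched as follows. This is only a minimal illustration: the kernels `fill` and `addOne` and the helper `runInOneStream` are hypothetical names that do not come from the original text. Because both kernels and the copy are issued to the same stream, the copy should observe the result of both kernels; if the two kernels were issued to different streams without extra synchronization, their order would not be guaranteed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (illustration only): writes `value` into every element.
__global__ void fill(int* data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical kernel (illustration only): increments every element by one.
__global__ void addOne(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Issues fill -> addOne -> copy into ONE stream and returns element 0,
// or -1 if any CUDA call reports an error.
int runInOneStream(int n) {
    cudaStream_t s;
    if (cudaStreamCreate(&s) != cudaSuccess) return -1;

    int* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) {
        cudaStreamDestroy(s);
        return -1;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    int host = -1;

    // Same stream: these three operations run in issue order.
    fill<<<blocks, threads, 0, s>>>(dev, n, 41);
    addOne<<<blocks, threads, 0, s>>>(dev, n);
    cudaError_t launchErr = cudaGetLastError();
    cudaMemcpyAsync(&host, dev, sizeof(int), cudaMemcpyDeviceToHost, s);

    // Block the host until everything queued in the stream has finished.
    cudaError_t syncErr = cudaStreamSynchronize(s);

    cudaFree(dev);
    cudaStreamDestroy(s);
    return (launchErr == cudaSuccess && syncErr == cudaSuccess) ? host : -1;
}

int main() {
    printf("first element: %d\n", runInOneStream(1024));
    return 0;
}
```
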
// Basic functions
cudaDeviceGetStreamPriorityRange()
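As a hedged sketch of how this function is typically used: it reports the range of stream priorities the current device supports, and those values can then be passed to `cudaStreamCreateWithPriority`. In the CUDA runtime, numerically lower values mean higher priority, so the "greatest" priority is the smaller number. The helper name `queryPriorityRange` is an assumption introduced for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (illustration only): queries the priority range of the
// current device. Returns true on success.
bool queryPriorityRange(int* least, int* greatest) {
    return cudaDeviceGetStreamPriorityRange(least, greatest) == cudaSuccess;
}

int main() {
    int leastPriority = 0, greatestPriority = 0;
    if (!queryPriorityRange(&leastPriority, &greatestPriority)) {
        printf("could not query stream priority range\n");
        return 1;
    }
    printf("least = %d, greatest = %d\n", leastPriority, greatestPriority);

    // Create one stream at each end of the supported range.
    cudaStream_t high, low;
    if (cudaStreamCreateWithPriority(&high, cudaStreamNonBlocking,
                                     greatestPriority) != cudaSuccess) {
        return 1;
    }
    if (cudaStreamCreateWithPriority(&low, cudaStreamNonBlocking,
                                     leastPriority) != cudaSuccess) {
        cudaStreamDestroy(high);
        return 1;
    }

    // Read back the priority actually assigned to the high-priority stream.
    int assigned = 0;
    if (cudaStreamGetPriority(high, &assigned) == cudaSuccess) {
        printf("high-priority stream got priority %d\n", assigned);
    }

    cudaStreamDestroy(high);
    cudaStreamDestroy(low);
    return 0;
}
```

On devices that do not support stream priorities, both returned values are expected to be 0, in which case the two streams behave the same.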
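// Create two streams
cudaStream_t stream[2];
for (int i = 0; i < 2; ++i)
    cudaStreamCreate(&stream[i]);

// Page-locked host buffer shared by both streams.
// Assumption: size is the number of floats handled per stream,
// so byte counts are size * sizeof(float).
float* hostPtr;
cudaMallocHost(&hostPtr, 2 * size * sizeof(float));
...
// Two streams, three commands per stream
for (int i = 0; i < 2; ++i) {
    // Copy input data from host memory to device memory
    cudaMemcpyAsync(inputDevPtr + i * size, hostPtr + i * size,
                    size * sizeof(float), cudaMemcpyHostToDevice, stream[i]);
    // Run the kernel on the data in device memory
    MyKernel<<<100, 512, 0, stream[i]>>>(outputDevPtr + i * size,
                                         inputDevPtr + i * size, size);
    // Copy the result from device memory back to host memory
    cudaMemcpyAsync(hostPtr + i * size, outputDevPtr + i * size,
                    size * sizeof(float), cudaMemcpyDeviceToHost, stream[i]);
}
...
// Destroy the streams
for (int i = 0; i < 2; ++i)
    cudaStreamDestroy(stream[i]);

Notes: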
[1].http://docs.nvidia.com/cuda/cuda-c-programming-guide/#streams