A CUDA program for computing the dot product of two vectors (from CUDA by Example)

__syncthreads() acts as a barrier at which all threads in the block must wait before any is allowed to proceed.
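To make the role of the barrier concrete, here is a minimal sketch of a block-level dot-product reduction. It is not the original listing: the names `dot_partial`, `dot_gpu`, and `kThreadsPerBlock`, the block size of 256, and the cap of 32 blocks are all assumptions chosen for illustration, and CUDA error checking is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Assumed block size; it must be a power of two for the halving loop below.
constexpr int kThreadsPerBlock = 256;

// Hypothetical kernel: each block writes one partial sum to partial[blockIdx.x].
__global__ void dot_partial(const float *a, const float *b, float *partial, int n) {
    __shared__ float cache[kThreadsPerBlock];

    // Grid-stride loop: each thread accumulates its own running sum.
    float temp = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        temp += a[i] * b[i];
    }
    cache[threadIdx.x] = temp;

    // Barrier: every thread must finish writing cache before any thread reads it.
    __syncthreads();

    // Tree reduction in shared memory, halving the active range each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        }
        // Kept outside the if so that every thread in the block reaches it.
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        partial[blockIdx.x] = cache[0];
    }
}

// Hypothetical host helper: launches the kernel and sums the per-block results.
float dot_gpu(const std::vector<float> &a, const std::vector<float> &b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    if (n == 0) return 0.0f;

    int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
    if (blocks > 32) blocks = 32;  // assumed cap; the grid-stride loop covers the rest

    float *dev_a = nullptr, *dev_b = nullptr, *dev_partial = nullptr;
    cudaMalloc((void **)&dev_a, n * sizeof(float));
    cudaMalloc((void **)&dev_b, n * sizeof(float));
    cudaMalloc((void **)&dev_partial, blocks * sizeof(float));

    cudaMemcpy(dev_a, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    dot_partial<<<blocks, kThreadsPerBlock>>>(dev_a, dev_b, dev_partial, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dev_partial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);

    // Finish the reduction on the CPU: one value per block.
    float c = 0.0f;
    for (int i = 0; i < blocks; i++) {
        c += partial[i];
    }

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_partial);
    return c;
}
```

In this sketch the first `__syncthreads()` separates the writes to `cache` from the reads in the reduction, and the one inside the loop separates each halving step from the next. Placing the barrier inside the `if` would be a mistake, because the threads that skip the branch would never reach it.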
// Program to compute the dot product of two vectors

#include
#define imin(a,b)    (a<b?a:b)

	...

	...>>>(dev_a,dev_b,dev_partial_c);

	cudaMemcpy(partial_c,dev_partial_c,blocksPerGrid*sizeof(float),cudaMemcpyDeviceToHost);

	c=0;
	for(int i=0;i<blocksPerGrid;i++)
		c+=partial_c[i];

 
