CUDA: fixing the "misaligned address" error returned by cudaError_t cudaerr = cudaDeviceSynchronize()

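I was recently using CUDA to accelerate a computation and found that, once the amount of data became fairly large, it failed with a "misaligned address" error, as shown below: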

[Screenshot: the "misaligned address" error reported by cudaDeviceSynchronize()]

This can happen when a pointer is not aligned to the boundary the processor requires.

From the CUDA Programming Guide, section 5.3.2:

Global memory instructions support reading or writing words of size equal to 1, 2, 4, 8, or 16 bytes. Any access (via a variable or a pointer) to data residing in global memory compiles to a single global memory instruction if and only if the size of the data type is 1, 2, 4, 8, or 16 bytes and the data is naturally aligned (i.e., its address is a multiple of that size).

This is what the debugger is trying to tell you: you shouldn't dereference a pointer to a 32-bit value at an address that is not aligned to a 32-bit (4-byte) boundary.
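
The alignment rule quoted above can be checked before a pointer is handed to a kernel. The following is a minimal host-side sketch in plain C++, not code from the original post: both `is_naturally_aligned` and `load_u32_any_alignment` are hypothetical helper names introduced here for illustration, not part of the CUDA API. The first tests whether an address is a multiple of the access size; the second shows one way to read a 32-bit value from an address that may be misaligned, by copying bytes rather than dereferencing a cast pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a CUDA API): true when `ptr` is a multiple of
// `size` bytes, i.e. naturally aligned for an access of that size.
inline bool is_naturally_aligned(const void* ptr, std::size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);  // size must be a power of two
    return reinterpret_cast<std::uintptr_t>(ptr) % size == 0;
}

// Hypothetical helper: reads a 32-bit value from an address that may not be
// 4-byte aligned by copying its bytes, instead of dereferencing a cast pointer.
inline std::uint32_t load_u32_any_alignment(const unsigned char* bytes) {
    std::uint32_t value = 0;
    std::memcpy(&value, bytes, sizeof(value));
    return value;
}
```

For a buffer that starts on a 16-byte boundary, `buffer + 4` passes the 4-byte check but fails the 16-byte check. An offset like that is harmless for `float` accesses but can break a 16-byte access such as one through a `float4*`. A check of this kind on each pointer passed to a kernel is one way to narrow down which argument triggers the error.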
