Incrementally parallelizing the existing code

Assess, Parallelize, Optimize, Deploy




Step 1: Assess (profile the code to identify the hot spots)

Strong scaling (Amdahl's Law) is a measure of how, for a fixed total problem size, performance changes as more processors are added to the system.

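Weak scaling (Gustafson's Law) is a measure of how the performance per unit of work changes as more processors are added, with the problem size per processor held fixed so that the total problem size grows with the processor count.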
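The two laws can be written compactly. As a sketch, let P be the fraction of the runtime that can be parallelized and N the number of processors (both symbols are introduced here for illustration and do not appear in the notes above):

```latex
% Amdahl's Law (strong scaling): total problem size is fixed
S_{\text{strong}} = \frac{1}{(1 - P) + \dfrac{P}{N}}

% Gustafson's Law (weak scaling): problem size grows with N
S_{\text{weak}} = N + (1 - P)(1 - N)
```

For example, with P = 0.9 and N = 8, Amdahl's Law gives S = 1 / (0.1 + 0.1125) ≈ 4.7, and can never exceed 1 / (1 - P) = 10 however many processors are added. Gustafson's Law gives S = 8 + 0.1 × (1 - 8) = 7.3 for the same values, because the parallel part of the work grows with N.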


Step 2: Parallelize the code

Use GPU-accelerated libraries (https://developer.nvidia.com/gpu-accelerated-libraries)

Use CUDA C/C++
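To make the CUDA C/C++ route concrete, here is a minimal sketch that moves a hot-spot loop (`y[i] = a * x[i] + y[i]`) into a kernel. The kernel name `saxpy`, the problem size, and the use of managed memory are assumptions made for this example; a library routine may be the better choice when one already covers the hot spot.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: y[i] = a * x[i] + y[i], one element per thread.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                      // guard threads past the end of the array
        y[i] = a * x[i] + y[i];
    }
}

int main()
{
    const int n = 1 << 20;            // assumed problem size
    float *x = nullptr, *y = nullptr;

    // Managed memory keeps the sketch short; explicit cudaMemcpy works as well.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;   // round up to cover n
    saxpy<<<blocks, threads>>>(n, 3.0f, x, y);

    // Wait for the kernel before touching y on the host.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        fprintf(stderr, "kernel execution failed\n");
        return 1;
    }

    // Every element should now equal 3 * 1 + 2 = 5.
    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i) maxErr = fmaxf(maxErr, fabsf(y[i] - 5.0f));
    printf("max error: %f\n", maxErr);

    cudaFree(x);
    cudaFree(y);
    return maxErr == 0.0f ? 0 : 1;
}
```

Assuming the file is saved as `saxpy.cu`, it can be built with `nvcc saxpy.cu -o saxpy`. The rest of the application stays unchanged, which is what makes the approach incremental.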


Step 3: Optimize

High-level optimizations: algorithm choice and data movement (for example, overlapping data movement with computation)
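As a rough sketch of overlapping data movement with computation, the example below splits an array into chunks and issues each chunk's copy-in, kernel, and copy-out on its own CUDA stream. The kernel `scale`, the chunk count, and the sizes are assumptions chosen for illustration; whether copies and kernels actually overlap depends on the device's copy engines.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: multiply each element by a factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int kStreams = 4;                 // assumed number of chunks/streams
    const int chunk = 1 << 18;              // elements per chunk
    const int n = kStreams * chunk;
    const size_t chunkBytes = chunk * sizeof(float);

    float *h = nullptr, *d = nullptr;
    // Pinned host memory is required for copies to be truly asynchronous.
    cudaError_t err = cudaMallocHost(&h, n * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&d, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "allocation failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Work within one stream runs in order; different streams may overlap,
    // so one chunk's copy can proceed while another chunk's kernel runs.
    // For brevity only allocation and the final synchronize are checked.
    for (int s = 0; s < kStreams; ++s) {
        const int offset = s * chunk;
        cudaMemcpyAsync(d + offset, h + offset, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + offset, chunk, 2.0f);
        cudaMemcpyAsync(h + offset, d + offset, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    err = cudaDeviceSynchronize();          // wait for all streams to finish
    if (err != cudaSuccess) {
        fprintf(stderr, "execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (h[i] == 2.0f);
    printf("%s\n", ok ? "all elements scaled" : "mismatch");

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h);
    cudaFree(d);
    return ok ? 0 : 1;
}
```

A profiler timeline is the usual way to confirm that the transfers and kernels really do overlap on a given device.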

Low-level optimizations: explicitly caching data in shared memory or tuning floating-point sequences
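Explicit caching in shared memory can be illustrated with a block-level sum, sketched below. Each block loads its slice of the input into a `__shared__` tile once and then reduces it on-chip instead of re-reading global memory. The kernel name `blockSum`, the block size, and the input data are assumptions for this example, and the kernel expects to be launched with exactly `kBlockSize` threads per block (a power of two).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kBlockSize = 256;   // assumed block size; must be a power of two

// Each block sums kBlockSize input elements and writes one partial sum.
__global__ void blockSum(const float *in, float *partial, int n)
{
    __shared__ float tile[kBlockSize];   // on-chip cache shared by the block

    const int tid = threadIdx.x;
    const int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;  // one global-memory read per thread
    __syncthreads();

    // Tree reduction carried out entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();                 // reached by every thread in the block
    }
    if (tid == 0) partial[blockIdx.x] = tile[0];
}

int main()
{
    const int n = 1 << 20;
    const int blocks = (n + kBlockSize - 1) / kBlockSize;

    float *in = nullptr, *partial = nullptr;
    if (cudaMallocManaged(&in, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&partial, blocks * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockSum<<<blocks, kBlockSize>>>(in, partial, n);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        fprintf(stderr, "kernel execution failed\n");
        return 1;
    }

    // Finish the reduction on the host; the total should equal n.
    double total = 0.0;
    for (int b = 0; b < blocks; ++b) total += partial[b];
    printf("sum = %.0f, expected %d\n", total, n);

    cudaFree(in);
    cudaFree(partial);
    return total == static_cast<double>(n) ? 0 : 1;
}
```

This is deliberately the simplest form of the pattern. Tuning beyond it is only worth the effort once profiling shows that this kernel is a hot spot.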


Step 4: Deploy

Some key points to look out for when productizing your GPU-accelerated code:

Make sure you check the return value from API calls

Consider how you will distribute the CUDA runtime and libraries
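Both deployment points can be sketched in a few lines. The macro below wraps each runtime call and aborts with a readable message on failure, and the program starts by printing the runtime and driver versions it finds on the target machine. `CUDA_CHECK` and the kernel `fill` are hypothetical names introduced for this example and are not part of the CUDA toolkit.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: evaluate a CUDA runtime call and abort on failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d: %s\n",           \
                    cudaGetErrorName(err_), __FILE__, __LINE__,       \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Illustrative kernel: write the same value to every element.
__global__ void fill(int *out, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

int main()
{
    // Versions are encoded as 1000 * major + 10 * minor.
    int runtimeVersion = 0, driverVersion = 0;
    CUDA_CHECK(cudaRuntimeGetVersion(&runtimeVersion));
    CUDA_CHECK(cudaDriverGetVersion(&driverVersion));
    printf("runtime %d, driver %d\n", runtimeVersion, driverVersion);

    const int n = 1024;
    int *d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, n * sizeof(int)));

    fill<<<(n + 255) / 256, 256>>>(d, n, 7);
    CUDA_CHECK(cudaGetLastError());        // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());   // reports errors during execution

    int h[n];
    CUDA_CHECK(cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));

    return (h[0] == 7 && h[n - 1] == 7) ? 0 : 1;
}
```

Kernel launches return no status themselves, which is why the sketch calls `cudaGetLastError()` right after the launch. On the distribution side, `nvcc` links the CUDA runtime statically by default (`--cudart static`). With `--cudart shared`, the runtime library has to ship with the application. In both cases the installed driver must be recent enough for the runtime version in use.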


Reference:

https://developer.nvidia.com/content/assess-parallelize-optimize-deploy
