Study Note: Schedule Optimisation and math_intrinsic in CUDA Programming

Let us first introduce a new term, occupancy [1].


[Figure 1: definition of occupancy, from [1] (image not preserved)]


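Occupancy is the ratio of the number of active warps on a streaming multiprocessor (SM) to the maximum number of warps the SM can hold (32 here; this maximum depends on the GPU architecture).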
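As a worked example, assume (hypothetically) an SM that holds at most 32 warps and a kernel launched with 256 threads per block, i.e. 8 warps per block, of which 3 blocks fit on the SM at the same time:

```latex
\text{occupancy} = \frac{\text{active warps}}{\text{maximum warps}}
                 = \frac{3 \times 256 / 32}{32}
                 = \frac{24}{32} = 75\%
```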


Occupancy depends on three parameters:

1) threads per block (set in the <<<>>> launch configuration)

2) registers per thread (reported by ptxas when compiling with nvcc --ptxas-options=-v; the .ptx file only contains virtual registers, so it does not show the final count)

3) shared memory per block (statically declared shared arrays appear in the .ptx file and are also reported by --ptxas-options=-v). However, if the shared memory array is declared extern, it is allocated dynamically and its size is only known at run time: it is supplied as the third argument of the <<<>>> launch configuration.


We can use the following charts to see how to improve occupancy: each chart varies one parameter while keeping the other two fixed [1].


[Figure 2: occupancy charts from [1] (image not preserved)]


Also, at the expense of some accuracy, we can compile with -use_fast_math or replace individual math functions in the code with the corresponding CUDA math intrinsics (for example __sinf instead of sinf, or __expf instead of expf) [1].


[Figure 3: CUDA math intrinsic functions, from [1] (image not preserved)]


Reference: 

[1] J. Chong and I. Lane, 18-645: How to Write Fast Code, Carnegie Mellon University.





 
