OpenCL Notes 2: Optimization

1. Someone once said that parallel programming is easy if you don't care much about performance.

2. Many published results give the impression that moving a program to the GPU makes it hundreds of times faster. In most of these comparisons, however, the CPU baseline is completely unoptimized and possibly poorly designed for CPU computation.

3. A GPU typically needs thousands of threads for full utilization. This matters not only for achieving a high compute rate but, even more, for hiding memory latency: while some threads wait on memory, others can execute.
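To get a feel for why the thread count ends up in the thousands, here is a rough back-of-the-envelope model, not a vendor formula. It assumes each thread group (warp or wavefront) does some arithmetic and then stalls on one memory access, so enough other groups must be resident to cover the stall. The function names, the 400-cycle latency, the 20 busy cycles, the 32 lanes per group, and the 16 compute units are all illustrative assumptions, not measurements.

```python
import math

def groups_needed(latency_cycles, busy_cycles_per_access):
    """Resident thread groups needed on one compute unit to cover a stall.

    Simplified model (an assumption): one group runs for
    busy_cycles_per_access, then waits latency_cycles for memory.
    Other groups must fill that wait, plus the stalled group itself.
    """
    return 1 + math.ceil(latency_cycles / busy_cycles_per_access)

def threads_needed(latency_cycles, busy_cycles_per_access,
                   lanes_per_group=32, compute_units=1):
    """Total threads across the device under the same simplified model."""
    groups = groups_needed(latency_cycles, busy_cycles_per_access)
    return groups * lanes_per_group * compute_units

# Hypothetical device: 400-cycle memory latency, 20 busy cycles per access,
# 32 lanes per group, 16 compute units.
print(groups_needed(400, 20))           # groups per compute unit
print(threads_needed(400, 20, 32, 16))  # threads across the whole device
```

Under these assumed numbers the model asks for 21 groups per compute unit, which is already more than ten thousand threads across the device. The less arithmetic a kernel does per memory access, the more threads it needs in flight.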

4. Global memory bandwidth is the limiting factor in many cases, so coalescing (arranging accesses so that neighboring work-items touch neighboring addresses) can dramatically influence performance when global memory is used heavily.
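The effect of coalescing can be illustrated with a small model that counts how many memory segments one group of work-items touches for a given access stride. This is a sketch under stated assumptions: the 128-byte segment size, the group of 32 work-items, and the helper name `segments_touched` are hypothetical, and real hardware rules differ between devices.

```python
def segments_touched(num_work_items, stride_elems,
                     elem_bytes=4, segment_bytes=128):
    """Count distinct memory segments touched by one group of work-items.

    Work-item `lane` reads the element at index lane * stride_elems.
    Fewer segments means fewer memory transactions (assumed model).
    """
    segments = set()
    for lane in range(num_work_items):
        byte_addr = lane * stride_elems * elem_bytes
        segments.add(byte_addr // segment_bytes)
    return len(segments)

# Neighboring work-items read neighboring floats: one segment serves all 32.
print(segments_touched(32, 1))
# Each work-item reads 32 floats apart: every read lands in its own segment.
print(segments_touched(32, 32))
```

In this model the unit-stride pattern needs a single segment while the stride-32 pattern needs 32, so the same amount of useful data costs 32 times the memory traffic. In a kernel, this is the difference between indexing a buffer with the global ID directly and indexing it with the global ID multiplied by a large stride.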
