Parallel Programming - Performance Checklist



  1. Identify where the parallelism is: decide which loop variable the parallel for iterates over.
  2. Balance the load so that every thread gets a similar amount of work.
  3. Prefer atomic operations over mutexes and signals whenever possible.
  4. Try to organize the data with map-reduce or a parallel sort.
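
As a rough illustration of items 1 to 3, here is a minimal C++ sketch. The function name `parallel_sum` and the `chunk` parameter are hypothetical, chosen only for this example. The loop index over the array is the parallel variable, workers claim chunks from a shared atomic counter so that fast threads simply take more chunks (load balance), and each worker publishes its partial result with a single atomic add instead of locking a mutex.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using `num_threads` worker threads.
std::uint64_t parallel_sum(const std::vector<std::uint32_t>& data,
                           unsigned num_threads,
                           std::size_t chunk = 1024) {
    assert(chunk > 0);
    if (num_threads == 0) num_threads = 1;

    std::atomic<std::size_t> next{0};     // next unclaimed index (the parallel variable)
    std::atomic<std::uint64_t> total{0};  // shared result, updated without a mutex

    auto worker = [&]() {
        std::uint64_t local = 0;  // thread-private partial sum
        for (;;) {
            // Claim the next chunk; threads that finish early just claim more.
            const std::size_t begin =
                next.fetch_add(chunk, std::memory_order_relaxed);
            if (begin >= data.size()) break;
            const std::size_t end = std::min(begin + chunk, data.size());
            for (std::size_t i = begin; i < end; ++i) local += data[i];
        }
        // One atomic add per thread, not one per element.
        total.fetch_add(local, std::memory_order_relaxed);
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();  // join makes the final value visible
    return total.load();
}
```

Calling `parallel_sum(v, 4)` on a vector `v` sums it with four threads. Accumulating into a thread-private `local` first is a small map-reduce: each thread maps over its chunks, and the atomic add is the reduce step.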

 

 

For GPU

  1. Check how much shared memory each thread uses to see whether the GPU's SMs (streaming multiprocessors) can be fully utilized.
  2. Check the number of registers used along with the shared memory usage; both limit how many threads can be resident on an SM.
  3. Optimize memory layout: pack your data structures; use block (contiguous) storage for large uniform data structures (1D to nD matrices); if two variables are frequently read together, place them as close together in memory as possible.
  4. Use a memory pool to reduce the cost of memory allocation.
  5. Bit operations are important: use them to pack data into fewer bytes.
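
Items 3 and 5 can be sketched in plain C++ (the same layout ideas carry over to structs used in GPU kernels). All names below (`Loose`, `Packed`, `pack_xy`, `flat_index`) are hypothetical and exist only for this illustration; the exact struct sizes depend on the compiler and ABI, so the sketch makes no claim about specific byte counts.

```cpp
#include <cstddef>
#include <cstdint>

// Poorly ordered fields: the compiler typically inserts padding
// after each small member to align the larger one that follows.
struct Loose {
    std::uint8_t  flag;
    double        value;
    std::uint8_t  kind;
    std::uint32_t id;
};

// Same fields ordered from largest to smallest: less padding on
// common ABIs, and `value` and `id`, if read together, sit side by side.
struct Packed {
    double        value;
    std::uint32_t id;
    std::uint8_t  flag;
    std::uint8_t  kind;
};

// Bit operations: store two 16-bit coordinates in one 32-bit word.
constexpr std::uint32_t pack_xy(std::uint16_t x, std::uint16_t y) {
    return (static_cast<std::uint32_t>(y) << 16) | x;
}
constexpr std::uint16_t unpack_x(std::uint32_t p) {
    return static_cast<std::uint16_t>(p & 0xFFFFu);
}
constexpr std::uint16_t unpack_y(std::uint32_t p) {
    return static_cast<std::uint16_t>(p >> 16);
}

// Block storage: a 2D matrix kept in one contiguous 1D buffer,
// addressed in row-major order.
constexpr std::size_t flat_index(std::size_t row, std::size_t col,
                                 std::size_t width) {
    return row * width + col;
}
```

Comparing `sizeof(Loose)` with `sizeof(Packed)` on your own target shows how much padding the reordering saves. With `flat_index`, element `(r, c)` of a matrix with `width` columns lives at `buffer[flat_index(r, c, width)]`, so neighbouring columns of the same row are adjacent in memory.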
