Original: http://software.intel.com/en-us/blogs/2008/12/16/compare-windows-threads-openmp-intel-threading-building-blocks-for-parallel-programming/
This is an interesting topic when we plan to implement parallel programs on a multi-core system and want to make the best use of the processors. The goal is to divide a big (serial) task into small tasks and let them run simultaneously.
The next question is which method to use. There are three options: 1) traditional Windows* threads, 2) OpenMP*, and 3) Intel® Threading Building Blocks (called TBB below). It is hard to say which is best and which is worst; it depends on the developer's situation. For example, a developer who has no parallel programming experience and does not want to learn Windows* threads can use OpenMP* or TBB. The advantage of OpenMP* is that the code stays clean and is easier to maintain. TBB is helpful in that the developer does not need to understand how threads work: you submit your tasks to TBB and trust it to run them with good performance. For developers who want to control the threads themselves, Windows* threads are an option.
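As a rough illustration of the "clean code" point about OpenMP*, the sketch below parallelizes an element-wise loop with a single directive. The function name `add_vectors` is hypothetical and not from the original post. If the compiler is run without OpenMP support enabled, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the original post): returns a[i] + b[i].
std::vector<double> add_vectors(const std::vector<double>& a,
                                const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> out(a.size());
    // A signed loop index keeps the loop valid for older OpenMP versions.
    const long n = static_cast<long>(a.size());

    // The only change from the serial version: ask OpenMP to split the
    // iterations across threads. Each iteration writes a distinct out[i],
    // so no extra synchronization is needed.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

The loop body is unchanged from what a serial version would contain, which is the maintenance benefit described above.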
Here I list the major factors for the three options, for your consideration:
| Challenges for parallel programming | Windows* threads | OpenMP* | Intel® Threading Building Blocks |
|---|---|---|---|
| Task level | | x | x |
| Cross-platform support | | x | x |
| Scalable runtime libraries | | | x |
| Thread control | x | | |
| Pre-tested and validated | | x | x |
| C development support | x | x | |
| Intel® Threading Tools support | x | x | x |
| Maintenance for tomorrow | | x | x |
| Scalable memory allocator | | | x |
| “Light” mutex | | | x |
| Processor affinity | x | | Thread affinity |
You are probably in one of the situations below. Handle each one differently to save development cost.
Case-1
You already have a working multithreaded program and hope to find its performance bottleneck and then improve it.
You don’t need to rewrite the code. Use Intel® VTune™ Performance Analyzer and Intel® Thread Profiler to find the essential performance problems in the code. You then have the opportunity to use OpenMP* or TBB to improve the code in “deep” loops, or to change the mechanism used for synchronization objects.
Case-2
You have serial code but don’t know how to turn it into multithreaded code.
Use Intel® VTune™ Performance Analyzer to find the hotspot functions in your code. You don’t need to make the whole program parallel; just parallelize the critical code.
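To make Case-2 concrete, here is a minimal sketch that assumes the profiler has pointed at a single accumulation loop as the hotspot. The function `dot_product` is a hypothetical stand-in for such a hotspot, not code from the original post. Only this loop is changed and the rest of the program stays serial.

```cpp
#include <cassert>
#include <vector>

// Hypothetical hotspot: assume profiling showed most time is spent here.
double dot_product(const std::vector<double>& x,
                   const std::vector<double>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    double sum = 0.0;

    // reduction(+:sum) gives every thread a private partial sum and adds
    // the partial sums together at the end, which avoids a data race on sum.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Because the reduction changes the order of the additions, floating-point results can differ slightly from the serial loop for general inputs.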
Case-3
You have a new project to develop. Think of your algorithm as parallel work and divide it into small tasks of proper granularity. If you are not good at multithreaded programming, just use TBB to submit the small tasks, or use OpenMP* to deal with structured streams.
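The granularity point in Case-3 can be sketched with OpenMP* as follows. This shows only the OpenMP* route; a TBB version would use a different API and is not shown. The prime-counting workload, the function names, and the chunk size of 1024 are all assumptions chosen for illustration, and the chunk size in particular is a starting value to tune, not a recommendation.

```cpp
#include <cassert>

// Hypothetical unit of work: trial-division primality test.
// Its cost grows with n, so iterations are not equally expensive.
bool is_prime(long n) {
    if (n < 2) return false;
    for (long d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Counts the primes in [0, limit).
long count_primes_below(long limit) {
    assert(limit >= 0);
    long count = 0;

    // schedule(dynamic, 1024) hands out chunks of 1024 iterations to
    // whichever thread is idle. The chunk size is the task granularity:
    // too small adds scheduling overhead, too large hurts load balance.
    #pragma omp parallel for schedule(dynamic, 1024) reduction(+:count)
    for (long n = 0; n < limit; ++n) {
        if (is_prime(n)) {
            ++count;
        }
    }
    return count;
}
```

Dynamic scheduling is used here because the work per iteration is uneven; for uniform work the default static schedule is usually the simpler choice.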