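Before reading this section, first make sure you understand what omp_get_num_threads() and omp_get_max_threads() mean and how they differ; see: http://blog.csdn.net/gengshenghong/article/details/7003110
(1) The OMP_DYNAMIC environment variable:
It takes the value TRUE or FALSE and determines whether the number of threads executing a parallel region may be adjusted dynamically. The default is FALSE in Visual C++; the OpenMP specification itself leaves the default implementation-defined.
According to the MSDN documentation: http://msdn.microsoft.com/zh-cn/subscriptions/y2s46ze1.aspx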
If set to TRUE, the number of threads that are used for executing parallel regions may be adjusted by the runtime environment to best utilize system resources. If set to FALSE, dynamic adjustment is disabled. The default condition is implementation-defined.
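Put simply: if it is FALSE, the actual number of threads is determined by the rules described in the earlier post (http://blog.csdn.net/gengshenghong/article/details/7003110) on how the thread count of a parallel region is chosen. If it is TRUE, the runtime may adjust the number according to system resources and other factors. In most cases, to "best utilize system resources" means creating as many threads as there are CPUs. Testing shows that the number of threads created still depends on the num_threads clause: when num_threads asks for fewer threads than there are CPU cores, the number created under dynamic adjustment is still the number given in the clause. The example below illustrates this (read the next part on the omp_set_dynamic function first, then come back to this point):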
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_dynamic(10);  // any nonzero value enables dynamic adjustment
        #pragma omp parallel num_threads(10)
        {
            printf("ID: %d, Max threads: %d, Num threads: %d \n", omp_get_thread_num(), omp_get_max_threads(), omp_get_num_threads());
        }
        return 0;
    }

In this case printf runs four times (on a quad-core CPU). If the clause is changed to num_threads(2), it runs only twice.
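The observation above can be checked with a small helper. This is a minimal sketch, not part of the original post: `team_size` is a hypothetical name, and the `#ifdef _OPENMP` fallback stubs are an assumption added so that the file still builds when OpenMP is not enabled (every team then has size 1). The actual numbers depend on the runtime and the machine.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch builds without OpenMP. */
static void omp_set_dynamic(int flag) { (void)flag; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: returns the team size actually used by a parallel
   region that requests `requested` threads, with dynamic adjustment
   switched on (nonzero) or off (0). */
static int team_size(int requested, int dynamic)
{
    int size = 0;
    assert(requested > 0);
    omp_set_dynamic(dynamic);
    #pragma omp parallel num_threads(requested)
    {
        /* One thread records the size; it is the same for the whole team. */
        #pragma omp single
        size = omp_get_num_threads();
    }
    return size;
}
```

Calling `team_size(10, 1)` and `team_size(2, 1)` from main on the quad-core machine described above should reproduce the four and two threads reported there, although another runtime is free to choose differently. The only thing the sketch relies on is that the result never exceeds the requested count.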
(2) omp_set_dynamic()
MSDN: http://msdn.microsoft.com/zh-cn/subscriptions/chy4kf8y.aspx
If dynamic_threads evaluates to a nonzero value, the number of threads that are used for executing subsequent parallel regions may be adjusted automatically by the run-time environment to best utilize system resources. As a consequence, the number of threads specified by the user is the maximum thread count. The number of threads in the team executing a parallel region remains fixed for the duration of that parallel region and is reported by the omp_get_num_threads function.
If dynamic_threads evaluates to 0, dynamic adjustment is disabled.
A call to omp_set_dynamic has precedence over the OMP_DYNAMIC environment variable.
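Because the user-specified count is only an upper bound while dynamic adjustment is on, code that wants the requested count to be honored usually turns the setting off around the region and restores it afterwards. The following is a minimal sketch of that pattern under stated assumptions: `team_with_dynamic_disabled` is a hypothetical name, and the `#ifdef _OPENMP` fallback stubs are added only so the file builds without OpenMP. Even with dynamic adjustment off, a runtime may still deliver fewer threads if it hits a resource limit.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch builds without OpenMP. */
static int dyn_flag = 0;
static void omp_set_dynamic(int flag) { dyn_flag = (flag != 0); }
static int omp_get_dynamic(void) { return dyn_flag; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: runs one parallel region with dynamic adjustment
   disabled, restores the previous setting, and returns the team size. */
static int team_with_dynamic_disabled(int requested)
{
    int saved = omp_get_dynamic();   /* remember the caller's setting */
    int size = 0;
    assert(requested > 0);
    omp_set_dynamic(0);              /* overrides OMP_DYNAMIC for later regions */
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single
        size = omp_get_num_threads();
    }
    omp_set_dynamic(saved);          /* put the setting back */
    return size;
}
```

Saving and restoring the flag keeps the helper from silently changing the behavior of parallel regions elsewhere in the program.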
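The omp_get_dynamic function reports whether dynamic adjustment is enabled. Its return type is int, but in practice it only ever returns 0 or 1.
(PS: I don't know why OpenMP doesn't declare the parameter of omp_set_dynamic as bool. Perhaps because C has no bool type?)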
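The int-in, 0-or-1-out behavior can be illustrated with a short round trip. This is a minimal sketch with assumptions: `dynamic_flag_after` is a hypothetical name, and the `#ifdef _OPENMP` fallback stubs exist only so the file builds without OpenMP. An implementation that does not support dynamic adjustment at all may report 0 even after a nonzero value is set, so the sketch does not assume the result is always 1.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch builds without OpenMP. */
static int dyn_flag = 0;
static void omp_set_dynamic(int flag) { dyn_flag = (flag != 0); }
static int omp_get_dynamic(void) { return dyn_flag; }
#endif

/* Hypothetical helper: sets the dynamic flag to an arbitrary int and
   returns what omp_get_dynamic() reports afterwards. */
static int dynamic_flag_after(int requested_flag)
{
    omp_set_dynamic(requested_flag);
    return omp_get_dynamic();
}
```

Passing 0 disables adjustment and should read back as 0. Passing any nonzero value, such as 1000, requests adjustment and reads back as 1 on runtimes that support it.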
(3) Example:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_dynamic(0);  // can be omitted!
        printf("Is Dynamic: %d\n", omp_get_dynamic());
        omp_set_num_threads(10);
        #pragma omp parallel num_threads(5)
        {
            printf("ID: %d, Max threads: %d, Num threads: %d \n", omp_get_thread_num(), omp_get_max_threads(), omp_get_num_threads());
        }

        omp_set_dynamic(1000);  // any nonzero value is OK!
        printf("Is Dynamic: %d\n", omp_get_dynamic());
        omp_set_num_threads(10);
        #pragma omp parallel num_threads(5)
        {
            printf("ID: %d, Max threads: %d, Num threads: %d \n", omp_get_thread_num(), omp_get_max_threads(), omp_get_num_threads());
        }
        return 0;
    }

Analyze the program above to understand the difference between enabling and disabling dynamic adjustment. Running it on a 4-core CPU gives the following result: