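Project background:
To make the project run faster, I looked into multithreading. OpenMP has been getting a lot of attention lately, so I decided to join in. It is also a way to make full use of the machine's hardware.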
Hardware:
Intel Core 2 Duo, 2.2 GHz
3 GB RAM
Software:
Visual Studio 2010 Ultimate
Windows 7 Ultimate, 32-bit
Difficulty:
Several threads operate on the same file, so thread conflicts are quite likely.
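OpenMP:
1. Required header: <omp.h>. Strictly speaking, the header is needed only for the OpenMP runtime functions such as omp_get_num_threads(). The pragmas take effect only when OpenMP is enabled in the compiler. In Visual Studio 2010 this is Configuration Properties > C/C++ > Language > OpenMP Support (/openmp). In GCC it is -fopenmp.
2. The #pragma omp preprocessor directive marks code that OpenMP should handle. For example, #pragma omp parallel for makes the for loop immediately below it run on multiple threads. By default, the OpenMP runtime typically creates one thread per available processor, although the exact default is implementation-defined. On a dual-core system, the parallel region is therefore normally executed by two threads.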
Example code:

```cpp
#include <iostream>
#include <omp.h> // OpenMP header (needed for the omp_* runtime functions)

int main()
{
    // The iterations are divided among the threads. The numbers therefore
    // appear in no fixed order, and output from different threads may interleave.
    #pragma omp parallel for
    for (int i = 0; i < 100; ++i)
    {
        std::cout << i << std::endl;
    }

    return 0;
}
```
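3. Common OpenMP library functions

| Function prototype | Purpose |
|---|---|
| int omp_get_num_procs(void) | Returns the number of processors available to the program |
| int omp_get_num_threads(void) | Returns the number of threads in the current team (the active parallel region); returns 1 if called outside a parallel region |
| int omp_get_thread_num(void) | Returns the calling thread's number within its team, from 0 to omp_get_num_threads() - 1 (I think a name like omp_get_thread_ID would have been clearer) |
| void omp_set_num_threads(int num_threads) | Sets the number of threads to create for subsequent parallel regions |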
3.1 Parallel region

```cpp
#pragma omp parallel // the block in braces is the parallel region
{
    // Put parallel code here; every thread in the team executes this block.
}
```
3.2 Library function example

```cpp
#include <iostream>
#include <omp.h>

int main()
{
    std::cout << "Number of processors: " << omp_get_num_procs() << std::endl;

    std::cout << "Parallel area 1" << std::endl;
    #pragma omp parallel
    {
        // critical lets one thread print at a time, so each line stays intact
        #pragma omp critical
        std::cout << "Number of threads: " << omp_get_num_threads()
                  << "; this thread ID is " << omp_get_thread_num() << std::endl;
    }

    std::cout << "Parallel area 2" << std::endl;
    #pragma omp parallel
    {
        #pragma omp critical
        std::cout << "Number of threads: " << omp_get_num_threads()
                  << "; this thread ID is " << omp_get_thread_num() << std::endl;
    }

    return 0;
}
```
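3.3 Basic usage of for-loop parallelization
3.3.1 Data independence
To parallelize a for loop with OpenMP, its iterations must be independent of each other's data.
When a loop is parallelized, several threads execute its iterations at the same time, and the order of the iterations is not determined. If the iterations have no data dependences on each other, the basic #pragma omp parallel for directive can be used.
If statement S2 depends on statement S1, then one of the following two cases must hold:
1. S1 accesses memory location L in one iteration, and S2 accesses the same location in a later iteration. This is called a loop-carried dependence.
2. S1 and S2 access the same location L in the same iteration, with S1 executing before S2. This is called a loop-independent dependence.
Only loop-carried dependences rule out a plain parallel for. A dependence confined to a single iteration is harmless, because each iteration runs entirely on one thread.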
3.3.2 Ways to declare a parallel for loop

```cpp
#include <iostream>
#include <omp.h>

int main()
{
    // Form one: a parallel region containing an omp for
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 10; ++i)
        {
            std::cout << i << std::endl;
        }
    }

    // Form two: the combined parallel for directive
    #pragma omp parallel for
    for (int i = 0; i < 10; ++i)
    {
        std::cout << i << std::endl;
    }

    return 0;
}
```
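The two forms in the code above are equivalent, and the second is clearly more concise. The first form has one advantage, however: other code can be placed inside the parallel region but outside the for loop. That code is not divided among the threads. Each thread in the team executes it once.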
```cpp
#include <iostream>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        // Inside the region but outside the for: printed once by each thread
        std::cout << "OK." << std::endl;

        // The iterations of this loop are divided among the threads
        #pragma omp for
        for (int i = 0; i < 10; ++i)
        {
            std::cout << i << std::endl;
        }
    }

    // The combined form leaves no place for such per-thread code
    #pragma omp parallel for
    for (int i = 0; i < 10; ++i)
    {
        std::cout << i << std::endl;
    }

    return 0;
}
```
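3.3.3 Constraints on for-loop parallelization
OpenMP makes it easy to parallelize for loops, but not every for loop can be parallelized. The following rules must be met:
1. The loop variable must be a signed integer. For example, for (unsigned int i = 0; i < 10; ++i) {...} does not compile in Visual C++ 2010, which implements OpenMP 2.0. OpenMP 3.0 and later also accept unsigned integer loop variables.
2. The comparison operator must be <, <=, > or >=. For example, for (int i = 0; i != 10; i++) does not compile. OpenMP 5.0 added != to the allowed operators.
3. The increment must add or subtract an integer amount that stays the same throughout the loop. According to the OpenMP specification, the accepted forms are ++i, i++, --i, i--, i += step, i -= step, i = i + step, i = step + i and i = i - step, where step does not change inside the loop. An update such as i *= 2 is not allowed.
4. If the comparison operator is < or <=, the loop variable must increase on every iteration. If it is > or >=, the variable must decrease. For example, for (int i = 0; i < 10; --i) is not a valid OpenMP loop.
5. The loop must have a single entry and a single exit. The body may not contain statements that jump out of the loop, except for a call to exit. A continue statement stays inside the loop and is allowed. Exceptions must be handled inside the loop body and must not propagate out of it. For example, a break or goto that leaves the loop body causes a compile error.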
3.3.4 Basic for-loop parallelization example

```cpp
#include <iostream>
#include <omp.h>

int main()
{
    // "int a[10] = {1};" would set only a[0] to 1 and the rest to 0,
    // so the arrays are filled explicitly.
    int a[10], b[10], c[10];
    for (int i = 0; i < 10; ++i)
    {
        a[i] = 1;
        b[i] = 2;
    }

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 10; ++i)
        {
            // c[i] depends only on a[i] and b[i], so the iterations are independent
            c[i] = a[i] + b[i];
        }
    }

    for (int i = 0; i < 10; ++i)
        std::cout << c[i] << ' ';
    std::cout << std::endl;

    return 0;
}
```
3.3.5 Nested for loops

```cpp
#include <iostream>
#include <omp.h>

int main()
{
    int a[10][10], b[10][10], c[10][10];
    for (int i = 0; i < 10; ++i)
        for (int j = 0; j < 10; ++j)
        {
            a[i][j] = i;
            b[i][j] = j;
        }

    #pragma omp parallel
    {
        // Only the outer i loop is divided among the threads; each thread
        // runs the complete inner j loop for every i it receives.
        #pragma omp for
        for (int i = 0; i < 10; ++i)
        {
            for (int j = 0; j < 10; ++j)
            {
                c[i][j] = a[i][j] + b[i][j];
            }
        }
    }

    std::cout << "c[9][9] = " << c[9][9] << std::endl;
    return 0;
}
```
Only the outer loop is divided. With two threads and the default schedule, the split is typically as follows. The default schedule is implementation-defined, but it is typically static, which gives each thread one contiguous block of iterations. The first thread executes

```cpp
for (int i = 0; i < 5; ++i)
{
    for (int j = 0; j < 10; ++j)
    {
        c[i][j] = a[i][j] + b[i][j];
    }
}
```

and the second thread executes

```cpp
for (int i = 5; i < 10; ++i)
{
    for (int j = 0; j < 10; ++j)
    {
        c[i][j] = a[i][j] + b[i][j];
    }
}
```

The inner j loop is not split. Each thread runs all ten values of j for every i it receives.