Blaise Barney, Lawrence Livermore National Laboratory
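Abstract
In shared-memory multiprocessor architectures, such as symmetric multiprocessing (SMP) systems, threads can be used to implement parallelism. Historically, hardware vendors implemented their own proprietary thread libraries, which forced software developers to worry about portability. For UNIX systems, the IEEE POSIX 1003.1 standard defines a C language threads programming interface. Implementations that adhere to this standard are referred to as POSIX threads, or Pthreads.
This tutorial introduces the concepts, motivations, and design considerations behind Pthreads. It covers the three major classes of routines in the Pthreads API: thread management, mutex variables, and condition variables. Example programs are provided for programmers who are new to Pthreads.
Level/Prerequisites: Intended for those who are new to parallel programming with threads. A basic understanding of parallel programming in C is assumed. Those unfamiliar with parallel programming can refer to EC3500: Introduction To Parallel Computing.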
Pthreads Overview
What Is a Thread?
[Figures: UNIX process; threads within a UNIX process]
Pthreads Overview
What Are Pthreads?
Pthreads Overview
Why Pthreads?
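For example, the following table compares the time taken by fork() and pthread_create(). The timings reflect 50,000 process/thread creations, were measured with the time utility, and are given in seconds. No optimization flags were used.
Note: Do not expect the user and system times to add up to the real time, because these SMP systems have multiple CPUs working at the same time. All figures are approximate.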
Platform                                   fork()                     pthread_create()
                                           real     user    sys       real   user   sys
AMD 2.4 GHz Opteron (8 cpus/node)          41.07    60.08   9.01      0.66   0.19   0.43
IBM 1.9 GHz POWER5 p5-575 (8 cpus/node)    64.24    30.78   27.68     1.75   0.69   1.10
IBM 1.5 GHz POWER4 (8 cpus/node)           104.05   48.64   47.21     2.01   1.00   1.52
INTEL 2.4 GHz Xeon (2 cpus/node)           54.95    1.54    20.78     1.64   0.67   0.90
INTEL 1.4 GHz Itanium2 (4 cpus/node)       54.54    1.07    22.22     2.03   1.26   0.67
fork_vs_thread.txt
Platform                     MPI Shared Memory Bandwidth   Pthreads Worst Case
AMD 2.4 GHz Opteron          1.2                           5.3
IBM 1.9 GHz POWER5 p5-575    4.1                           16
IBM 1.5 GHz POWER4           2.1                           4
Intel 1.4 GHz Xeon           0.3                           4.3
Intel 1.4 GHz Itanium 2      1.8                           6.4
Pthreads Overview
Designing Threaded Programs
Parallel Programming:
Shared Memory Model:
Thread-safeness:
Pthreads API
Routine Prefix         Functional Group
pthread_               Threads themselves and miscellaneous subroutines
pthread_attr_          Thread attributes objects
pthread_mutex_         Mutexes
pthread_mutexattr_     Mutex attributes objects
pthread_cond_          Condition variables
pthread_condattr_      Condition attributes objects
pthread_key_           Thread-specific data keys
Compiling Threaded Programs
Compiler / Platform    Compiler Command     Description
IBM                    xlc_r / cc_r         C (ANSI / non-ANSI)
                       xlC_r                C++
                       xlf_r -qnosave       Fortran - using IBM's Pthreads API (non-portable)
INTEL                  icc -pthread         C
                       icpc -pthread        C++
PathScale              pathcc -pthread      C
                       pathCC -pthread      C++
PGI                    pgcc -lpthread       C
                       pgCC -lpthread       C++
GNU                    gcc -pthread         GNU C
                       g++ -pthread         GNU C++
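Thread Management
Creating and Terminating Threads
Routines:
    pthread_create (thread,attr,start_routine,arg)
    pthread_exit (status)
    pthread_attr_init (attr)
    pthread_attr_destroy (attr)
Creating Threads:
Q: After a thread has been created, how do you know when the operating system will schedule it to run?
A: Unless you use the Pthreads scheduling mechanism, when and where a thread executes is up to the operating system implementation. Robust programs should not depend on threads executing in a particular order.
Thread Attributes:
Terminating Threads:
Example: Pthread Creation and Termination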
Example Code - Pthread Creation and Termination

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define NUM_THREADS 5

    /* Thread start routine: print the thread number passed in by main. */
    void *PrintHello(void *threadid)
    {
        long tid = (long)threadid;   /* the number travels by value inside the pointer */
        printf("Hello World! It's me, thread #%ld!\n", tid);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        pthread_t threads[NUM_THREADS];
        int rc;
        long t;
        for (t = 0; t < NUM_THREADS; t++) {
            printf("In main: creating thread %ld\n", t);
            rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }
        /* Terminate only the main thread so the other threads can finish. */
        pthread_exit(NULL);
    }
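Thread Management
Passing Arguments to Threads
Q: How can you safely pass data to newly created threads?
A: Make sure that all passed data is thread safe, meaning that it cannot be changed by other threads. The three examples that follow demonstrate what to do and what not to do.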
Example 1 - Thread Argument Passing
This code fragment demonstrates how to pass a simple integer to each thread. The main thread uses a unique data structure for each thread, ensuring that each thread's argument stays intact.

    int *taskids[NUM_THREADS];

    for (t = 0; t < NUM_THREADS; t++) {
        taskids[t] = (int *) malloc(sizeof(int));
        *taskids[t] = t;
        printf("Creating thread %d\n", t);
        rc = pthread_create(&threads[t], NULL, PrintHello, (void *) taskids[t]);
        ...
    }
Example 2 - Thread Argument Passing
This example shows how to set up and pass multiple arguments through a structure. Each thread receives a unique instance of the structure.

    struct thread_data {
        int   thread_id;
        int   sum;
        char *message;
    };

    struct thread_data thread_data_array[NUM_THREADS];

    void *PrintHello(void *threadarg)
    {
        struct thread_data *my_data;
        ...
        my_data = (struct thread_data *) threadarg;
        taskid = my_data->thread_id;
        sum = my_data->sum;
        hello_msg = my_data->message;
        ...
    }

    int main(int argc, char *argv[])
    {
        ...
        thread_data_array[t].thread_id = t;
        thread_data_array[t].sum = sum;
        thread_data_array[t].message = messages[t];
        rc = pthread_create(&threads[t], NULL, PrintHello,
                            (void *) &thread_data_array[t]);
        ...
    }
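Example 3 - Thread Argument Passing (Incorrect)
This example passes the argument incorrectly. It passes the address of the loop variable t, which is shared by all threads. The loop can change the contents of that address before a newly created thread reads it.

    int rc, t;

    for (t = 0; t < NUM_THREADS; t++) {
        printf("Creating thread %d\n", t);
        rc = pthread_create(&threads[t], NULL, PrintHello, (void *) &t);
        ...
    }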
Thread Management
Joining and Detaching Threads
Routines:
    pthread_join (threadid,status)
    pthread_detach (threadid)
    pthread_attr_setdetachstate (attr,detachstate)
    pthread_attr_getdetachstate (attr,detachstate)
Joining:
Joinable or Not?
Detaching:
Recommendations:
Example: Pthread Joining
Example Code - Pthread Joining
This example demonstrates how to wait for thread completion with the Pthread join routine. Because some implementations may not create threads in a joinable state by default, the example creates them explicitly as joinable.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define NUM_THREADS 3

    void *BusyWork(void *null)
    {
        int i;
        double result = 0.0;
        for (i = 0; i < 1000000; i++) {
            result = result + (double)random();
        }
        printf("result = %e\n", result);
        pthread_exit((void *) 0);
    }

    int main(int argc, char *argv[])
    {
        pthread_t thread[NUM_THREADS];
        pthread_attr_t attr;
        int rc, t;
        void *status;

        /* Initialize and set thread detached attribute */
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

        for (t = 0; t < NUM_THREADS; t++) {
            printf("Creating thread %d\n", t);
            rc = pthread_create(&thread[t], &attr, BusyWork, NULL);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }

        /* Free attribute and wait for the other threads */
        pthread_attr_destroy(&attr);
        for (t = 0; t < NUM_THREADS; t++) {
            rc = pthread_join(thread[t], &status);
            if (rc) {
                printf("ERROR; return code from pthread_join() is %d\n", rc);
                exit(-1);
            }
            printf("Completed join with thread %d status= %ld\n", t, (long)status);
        }

        pthread_exit(NULL);
    }
Thread Management
Stack Management
Routines:
    pthread_attr_getstacksize (attr, stacksize)
    pthread_attr_setstacksize (attr, stacksize)
    pthread_attr_getstackaddr (attr, stackaddr)
    pthread_attr_setstackaddr (attr, stackaddr)
Preventing Stack Problems:
Some Practical Examples at LC:
Node           #CPUs   Memory (GB)   Default Size (bytes)
AMD Opteron    8       16            2,097,152
Intel IA64     4       8             33,554,432
Intel IA32     2       4             2,097,152
IBM Power5     8       32            196,608
IBM Power4     8       16            196,608
IBM Power3     16      16            98,304
Example: Stack Management
Example Code - Stack Management
This example demonstrates how to query and set a thread's stack size.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define NTHREADS 4
    #define N 1000
    #define MEGEXTRA 1000000

    pthread_attr_t attr;

    void *dowork(void *threadid)
    {
        double A[N][N];              /* large automatic array, lives on the thread stack */
        int i, j;
        long tid;
        size_t mystacksize;

        tid = (long)threadid;
        pthread_attr_getstacksize(&attr, &mystacksize);
        printf("Thread %ld: stack size = %zu bytes\n", tid, mystacksize);
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                A[i][j] = ((i*j)/3.452) + (N-i);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        pthread_t threads[NTHREADS];
        size_t stacksize;
        int rc;
        long t;

        pthread_attr_init(&attr);
        pthread_attr_getstacksize(&attr, &stacksize);
        printf("Default stack size = %zu\n", stacksize);
        stacksize = sizeof(double)*N*N+MEGEXTRA;
        printf("Amount of stack needed per thread = %zu\n", stacksize);
        pthread_attr_setstacksize(&attr, stacksize);
        printf("Creating threads with stack size = %zu bytes\n", stacksize);
        for (t = 0; t < NTHREADS; t++) {
            rc = pthread_create(&threads[t], &attr, dowork, (void *)t);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }
        printf("Created %ld threads.\n", t);
        pthread_exit(NULL);
    }
Thread Management
Miscellaneous Routines:
    pthread_self ()
    pthread_equal (thread1,thread2)
    pthread_once (once_control, init_routine)

    pthread_once_t once_control = PTHREAD_ONCE_INIT;
Mutex Variables
Overview
Thread 1                     Thread 2                     Balance
Read balance: $1000                                       $1000
                             Read balance: $1000          $1000
                             Deposit $200                 $1000
Deposit $200                                              $1000
Update balance $1000+$200                                 $1200
                             Update balance $1000+$200    $1200
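Mutex Variables
Creating and Destroying Mutexes
Routines:
    pthread_mutex_init (mutex,attr)
    pthread_mutex_destroy (mutex)
    pthread_mutexattr_init (attr)
    pthread_mutexattr_destroy (attr)
Usage:
A mutex is unlocked after it has been initialized.
Note that not all implementations may provide the three optional mutex attributes.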
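Mutex Variables
Locking and Unlocking Mutexes
Routines:
    pthread_mutex_lock (mutex)
    pthread_mutex_trylock (mutex)
    pthread_mutex_unlock (mutex)
Usage:
    Thread 1     Thread 2     Thread 3
    Lock         Lock
    A = 2        A = A+1      A = A*B
    Unlock       Unlock
Q: When more than one thread is waiting for a locked mutex, which thread will be the first to lock it after it is released?
A: Unless thread priority scheduling is used, the choice is left to the system scheduler, and which thread locks the mutex first is effectively random.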
Example: Using Mutexes
Example Code - Using Mutexes
This example program illustrates the use of mutex variables in a threads program that computes a dot product. The main data is made available to all threads through a globally accessible structure. Each thread works on a different part of the data. The main thread waits for all the threads to complete their computations and then prints the resulting sum.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* The following structure contains the necessary information to allow
       the function "dotprod" to access its input data and place its output
       into the structure. */
    typedef struct {
        double *a;
        double *b;
        double  sum;
        int     veclen;
    } DOTDATA;

    /* Define globally accessible variables and a mutex */
    #define NUMTHRDS 4
    #define VECLEN 100
    DOTDATA dotstr;
    pthread_t callThd[NUMTHRDS];
    pthread_mutex_t mutexsum;

    /* The function dotprod is activated when the thread is created. All
       input to this routine is obtained from a structure of type DOTDATA
       and all output from this function is written into this structure.
       The benefit of this approach is apparent for the multi-threaded
       program: when a thread is created we pass a single argument to the
       activated function - typically this argument is a thread number. All
       the other information required by the function is accessed from the
       globally accessible structure. */
    void *dotprod(void *arg)
    {
        /* Define and use local variables for convenience */
        int i, start, end, len;
        long offset;
        double mysum, *x, *y;

        offset = (long)arg;
        len = dotstr.veclen;
        start = (int)offset * len;
        end = start + len;
        x = dotstr.a;
        y = dotstr.b;

        /* Perform the dot product and assign result to the appropriate
           variable in the structure. */
        mysum = 0;
        for (i = start; i < end; i++) {
            mysum += (x[i] * y[i]);
        }

        /* Lock a mutex prior to updating the value in the shared
           structure, and unlock it upon updating. */
        pthread_mutex_lock(&mutexsum);
        dotstr.sum += mysum;
        pthread_mutex_unlock(&mutexsum);

        pthread_exit((void *) 0);
    }

    /* The main program creates threads which do all the work and then
       print out result upon completion. Before creating the threads, the
       input data is created. Since all threads update a shared structure,
       we need a mutex for mutual exclusion. The main thread needs to wait
       for all threads to complete, it waits for each one of the threads.
       We specify a thread attribute value that allow the main thread to
       join with the threads it creates. Note also that we free up handles
       when they are no longer needed. */
    int main(int argc, char *argv[])
    {
        long i;
        double *a, *b;
        void *status;
        pthread_attr_t attr;

        /* Assign storage and initialize values */
        a = (double *) malloc(NUMTHRDS*VECLEN*sizeof(double));
        b = (double *) malloc(NUMTHRDS*VECLEN*sizeof(double));

        for (i = 0; i < VECLEN*NUMTHRDS; i++) {
            a[i] = 1.0;
            b[i] = a[i];
        }

        dotstr.veclen = VECLEN;
        dotstr.a = a;
        dotstr.b = b;
        dotstr.sum = 0;

        pthread_mutex_init(&mutexsum, NULL);

        /* Create threads to perform the dotproduct */
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

        for (i = 0; i < NUMTHRDS; i++) {
            /* Each thread works on a different set of data. The offset is
               specified by 'i'. The size of the data for each thread is
               indicated by VECLEN. */
            pthread_create(&callThd[i], &attr, dotprod, (void *)i);
        }

        pthread_attr_destroy(&attr);

        /* Wait on the other threads */
        for (i = 0; i < NUMTHRDS; i++) {
            pthread_join(callThd[i], &status);
        }

        /* After joining, print out the results and cleanup */
        printf("Sum = %f \n", dotstr.sum);
        free(a);
        free(b);
        pthread_mutex_destroy(&mutexsum);
        pthread_exit(NULL);
    }
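Condition Variables
Overview
Main Thread
    o Declare and initialize the global data/variables that require synchronization (such as "count")
    o Declare and initialize a condition variable object
    o Declare and initialize an associated mutex
    o Create threads A and B to do the work
Thread A
    o Do work up to the point where a certain condition must occur (such as "count" reaching a specified value)
    o Lock the associated mutex and check the value of the global variable
    o Call pthread_cond_wait() to perform a blocking wait for a signal from Thread B. Note that pthread_cond_wait() automatically and atomically unlocks the associated mutex so that it can be used by Thread B.
    o When signalled, wake up. The mutex is automatically and atomically locked.
    o Explicitly unlock the mutex
    o Continue
Thread B
    o Do work
    o Lock the associated mutex
    o Change the value of the global variable that Thread A is waiting on
    o Check the value of the global variable. If it fulfills the desired condition, signal Thread A.
    o Unlock the mutex
    o Continue
Main Thread
    Join / Continue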
Condition Variables
Creating and Destroying Condition Variables
Routines:
    pthread_cond_init (condition,attr)
    pthread_cond_destroy (condition)
    pthread_condattr_init (attr)
    pthread_condattr_destroy (attr)
Usage:
Note that not all implementations may provide the process-shared attribute.
Condition Variables
Waiting and Signaling on Condition Variables
Routines:
    pthread_cond_wait (condition,mutex)
    pthread_cond_signal (condition)
    pthread_cond_broadcast (condition)
Usage:
Proper locking and unlocking of the associated mutex is essential when using these routines.
Example: Using Condition Variables
Example Code - Using Condition Variables
This example demonstrates the use of several Pthread condition variable routines. The main routine creates three threads. Two of the threads perform work and update a "count" variable. The third thread waits until the count variable reaches a specified value.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_THREADS 3
    #define TCOUNT 10
    #define COUNT_LIMIT 12

    int count = 0;
    int thread_ids[3] = {0,1,2};
    pthread_mutex_t count_mutex;
    pthread_cond_t count_threshold_cv;

    void *inc_count(void *idp)
    {
        int j, i;
        double result = 0.0;
        int *my_id = idp;

        for (i = 0; i < TCOUNT; i++) {
            pthread_mutex_lock(&count_mutex);
            count++;

            /* Check the value of count and signal waiting thread when
               condition is reached. Note that this occurs while mutex is
               locked. */
            if (count == COUNT_LIMIT) {
                pthread_cond_signal(&count_threshold_cv);
                printf("inc_count(): thread %d, count = %d Threshold reached.\n",
                       *my_id, count);
            }
            printf("inc_count(): thread %d, count = %d, unlocking mutex\n",
                   *my_id, count);
            pthread_mutex_unlock(&count_mutex);

            /* Do some work so threads can alternate on mutex lock */
            for (j = 0; j < 1000; j++)
                result = result + (double)random();
        }
        pthread_exit(NULL);
    }

    void *watch_count(void *idp)
    {
        int *my_id = idp;

        printf("Starting watch_count(): thread %d\n", *my_id);

        /* Lock mutex and wait for signal. Note that the pthread_cond_wait
           routine will automatically and atomically unlock mutex while it
           waits. Also, note that if COUNT_LIMIT is reached before this
           routine is run by the waiting thread, the loop will be skipped
           to prevent pthread_cond_wait from never returning. The test is
           a while loop so that the condition is checked again after every
           wakeup. */
        pthread_mutex_lock(&count_mutex);
        while (count < COUNT_LIMIT) {
            pthread_cond_wait(&count_threshold_cv, &count_mutex);
            printf("watch_count(): thread %d Condition signal received.\n", *my_id);
        }
        pthread_mutex_unlock(&count_mutex);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        int i, rc;
        pthread_t threads[3];
        pthread_attr_t attr;

        /* Initialize mutex and condition variable objects */
        pthread_mutex_init(&count_mutex, NULL);
        pthread_cond_init(&count_threshold_cv, NULL);

        /* For portability, explicitly create threads in a joinable state */
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
        pthread_create(&threads[0], &attr, inc_count, (void *)&thread_ids[0]);
        pthread_create(&threads[1], &attr, inc_count, (void *)&thread_ids[1]);
        pthread_create(&threads[2], &attr, watch_count, (void *)&thread_ids[2]);

        /* Wait for all threads to complete */
        for (i = 0; i < NUM_THREADS; i++) {
            pthread_join(threads[i], NULL);
        }
        printf("Main(): Waited on %d threads. Done.\n", NUM_THREADS);

        /* Clean up and exit */
        pthread_attr_destroy(&attr);
        pthread_mutex_destroy(&count_mutex);
        pthread_cond_destroy(&count_threshold_cv);
        pthread_exit(NULL);
    }
Topics Not Covered
Several features of the Pthreads API are not covered in this tutorial. They are listed below:
Pthread Library API Reference
Pthread Functions
    Thread Management
        pthread_create
        pthread_exit
        pthread_join
        pthread_once
        pthread_kill
        pthread_self
        pthread_equal
        pthread_yield
        pthread_detach
    Thread-Specific Data
        pthread_key_create
        pthread_key_delete
        pthread_getspecific
        pthread_setspecific
    Thread Cancellation
        pthread_cancel
        pthread_cleanup_pop
        pthread_cleanup_push
        pthread_setcancelstate
        pthread_getcancelstate
        pthread_testcancel
    Thread Scheduling
        pthread_getschedparam
        pthread_setschedparam
    Signals
        pthread_sigmask
Pthread Attribute Functions
    Basic Management
        pthread_attr_init
        pthread_attr_destroy
    Detachable or Joinable
        pthread_attr_setdetachstate
        pthread_attr_getdetachstate
    Specifying Stack Information
        pthread_attr_getstackaddr
        pthread_attr_getstacksize
        pthread_attr_setstackaddr
        pthread_attr_setstacksize
    Thread Scheduling Attributes
        pthread_attr_getschedparam
        pthread_attr_setschedparam
        pthread_attr_getschedpolicy
        pthread_attr_setschedpolicy
        pthread_attr_setinheritsched
        pthread_attr_getinheritsched
        pthread_attr_setscope
        pthread_attr_getscope
Mutex Functions
    Mutex Management
        pthread_mutex_init
        pthread_mutex_destroy
        pthread_mutex_lock
        pthread_mutex_unlock
        pthread_mutex_trylock
    Priority Management
        pthread_mutex_setprioceiling
        pthread_mutex_getprioceiling
Mutex Attribute Functions
    Basic Management
        pthread_mutexattr_init
        pthread_mutexattr_destroy
    Sharing
        pthread_mutexattr_getpshared
        pthread_mutexattr_setpshared
    Protocol Attributes
        pthread_mutexattr_getprotocol
        pthread_mutexattr_setprotocol
    Priority Management
        pthread_mutexattr_setprioceiling
        pthread_mutexattr_getprioceiling
Condition Variable Functions
    Basic Management
        pthread_cond_init
        pthread_cond_destroy
        pthread_cond_signal
        pthread_cond_broadcast
        pthread_cond_wait
        pthread_cond_timedwait
Condition Variable Attribute Functions
    Basic Management
        pthread_condattr_init
        pthread_condattr_destroy
    Sharing
        pthread_condattr_getpshared
        pthread_condattr_setpshared
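References
Translator's Note
It took three days, with time squeezed in at work, to finish translating the previous article, POSIX Threads Programming. My ability is limited and the quality of the translation is only passable. If you find anything unreasonable or incorrect, please point it out, with my sincere thanks. If in doubt, consult the original text. The books on the Pthreads library mentioned in the references section, together with the original article and this translation, can be downloaded from the link below: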
author: david([email protected])
code page:http://code.google.com/p/heavenhell/