Blaise Barney, Lawrence Livermore National Laboratory
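In shared-memory multiprocessor architectures, such as symmetric multiprocessing (SMP) systems, threads can be used to implement parallelism. Historically, hardware vendors implemented their own proprietary thread libraries, which made portability a concern for software developers. For UNIX systems, the IEEE POSIX 1003.1c standard specifies a standardized C language threads programming interface. Implementations that adhere to this standard are referred to as POSIX threads, or Pthreads.
This tutorial introduces the concepts, motivations, and design considerations behind Pthreads. It covers the three major classes of routines in the Pthreads API: Thread Management, Mutex Variables, and Condition Variables. Example programs are provided for programmers who are new to Pthreads.
Level/Prerequisites: intended for those who are new to parallel programming with threads. A basic understanding of parallel programming is assumed; readers unfamiliar with it may want to consult EC3500: Introduction To Parallel Computing.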
Pthreads |
What is a Thread?
[Figure: UNIX process]
[Figure: Threads within a UNIX process]
What are Pthreads?
Why Pthreads?
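For example, the following table compares timing results for the fork() subroutine and the pthread_create() subroutine. Timings reflect 50,000 process/thread creations, were measured with the time utility, and are given in seconds; no optimization flags were used.
Note: do not expect the system and user times to add up to the real time, because these systems have multiple CPUs working at the same time. At best, these are approximations.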
Platform | fork() / pthread_create() timings (seconds) |
AMD 2.4 GHz Opteron (8 cpus/node) | 41.07 | 60.08 |
IBM 1.9 GHz POWER5 p5-575 (8 cpus/node) | 64.24 | 30.78 | 27.68 |
IBM 1.5 GHz POWER4 (8 cpus/node) | 104.05 | 48.64 | 47.21 |
INTEL 2.4 GHz Xeon (2 cpus/node) | 54.95 | 20.78 |
INTEL 1.4 GHz Itanium2 (4 cpus/node) | 54.54 | 22.22 |
fork_vs_thread.txt
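As a rough illustration of how thread-creation cost can be measured, here is a minimal sketch. It is not the benchmark behind the table: the helper name `time_thread_creation` is hypothetical, and it times in-process with `clock_gettime` instead of using the time utility, so its numbers are not comparable to the ones above.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Thread body that returns immediately, so mostly creation cost is measured. */
static void *do_nothing(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Hypothetical helper (not part of the tutorial): create and join n threads,
 * one at a time. Returns how many were created and joined successfully and
 * stores the elapsed wall-clock time, in seconds, in *elapsed.
 */
int time_thread_creation(int n, double *elapsed)
{
    struct timespec start, end;
    int created = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < n; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, do_nothing, NULL) != 0)
            break;
        if (pthread_join(tid, NULL) != 0)
            break;
        created++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    *elapsed = (double)(end.tv_sec - start.tv_sec)
             + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return created;
}
```

Calling `time_thread_creation(50000, &seconds)` from `main` would mirror the creation count used in the table; the result depends heavily on the machine and on system load.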
Platform | MPI Shared Memory Bandwidth | Pthreads Worst Case |
AMD 2.4 GHz Opteron | | |
IBM 1.9 GHz POWER5 p5-575 | | |
IBM 1.5 GHz POWER4 | | |
Intel 1.4 GHz Xeon | | |
Intel 1.4 GHz Itanium 2 | | |
Designing Threaded Programs
Shared Memory Model
Thread-safeness
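Pthreads API |
Routine Prefix | Functional Group |
pthread_ | Threads themselves and miscellaneous subroutines |
pthread_attr_ | Thread attributes objects |
pthread_mutex_ | Mutexes |
pthread_mutexattr_ | Mutex attributes objects |
pthread_cond_ | Condition variables |
pthread_condattr_ | Condition variable attributes objects |
pthread_key_ | Thread-specific data keys |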
Compiling Threaded Programs |
Compiler / Platform | Compiler Command | Description |
IBM | xlc_r / cc_r | C (ANSI / non-ANSI) |
IBM | xlC_r | C++ |
IBM | xlf_r -qnosave | Fortran - using IBM's Pthreads API (non-portable) |
INTEL | icc -pthread | C |
INTEL | icpc -pthread | C++ |
PathScale | pathcc -pthread | C |
PathScale | pathCC -pthread | C++ |
PGI | pgcc -lpthread | C |
PGI | pgCC -lpthread | C++ |
GNU | gcc -pthread | GNU C |
GNU | g++ -pthread | GNU C++ |
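Thread Management |
Creating and Terminating Threads
Question: After a thread has been created, how do you know when the operating system will schedule it to run? Answer: Unless you use the Pthreads scheduling mechanism, when and where a thread executes is up to the implementation and the operating system. Robust programs should not depend on threads executing in a specific order.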
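One way to avoid depending on execution order is to give each thread its own output slot and read the slots only after joining. The sketch below illustrates that idea; the names `sum_of_squares`, `square_index`, and `N_WORKERS` are invented for this example and do not come from the tutorial.

```c
#include <pthread.h>

#define N_WORKERS 4              /* illustrative value */

static int results[N_WORKERS];   /* one slot per thread, so no lock is needed */

static void *square_index(void *arg)
{
    int *slot = arg;                 /* points at this thread's own slot */
    int i = (int)(slot - results);   /* recover the index from the address */
    *slot = i * i;
    return NULL;
}

/*
 * Returns 0*0 + 1*1 + ... + (N_WORKERS-1)^2 no matter in which order the
 * threads actually ran, or -1 if a thread could not be created.
 */
int sum_of_squares(void)
{
    pthread_t threads[N_WORKERS];
    int created, sum = 0;

    for (created = 0; created < N_WORKERS; created++) {
        if (pthread_create(&threads[created], NULL, square_index,
                           &results[created]) != 0)
            break;
    }

    /* pthread_join is the only ordering guarantee this code relies on. */
    for (int i = 0; i < created; i++) {
        pthread_join(threads[i], NULL);
        sum += results[i];
    }
    return created == N_WORKERS ? sum : -1;
}
```

The threads may run in any order, yet the sum is the same on every run because nothing is read before the corresponding join.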
Example: Pthread Creation and Termination
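Example Code - Pthread Creation and Termination

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define NUM_THREADS 5   /* value missing in the source text; 5 is assumed */

    void *PrintHello(void *threadid)
    {
        /* long rather than int: it has the width of a pointer on common 64-bit platforms */
        long tid = (long)threadid;
        printf("Hello World! It's me, thread #%ld!\n", tid);
        pthread_exit(NULL);
    }

    int main (int argc, char *argv[])
    {
        pthread_t threads[NUM_THREADS];
        int rc;
        long t;
        for (t = 0; t < NUM_THREADS; t++) {
            printf("In main: creating thread %ld\n", t);
            rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }
        /* Exit only the main thread; the created threads keep running */
        pthread_exit(NULL);
    }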
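Passing Arguments to Threads
Question: How can you safely pass data to a newly created thread? Answer: Make sure that all passed data is thread safe, meaning it cannot be changed by other threads. The three examples that follow show what to do and what not to do.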
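Example 1 - Thread Argument Passing
This code fragment demonstrates how to pass a simple integer to each thread. The calling thread uses a unique data structure for each thread, ensuring that each thread's argument remains intact.

    int *taskids[NUM_THREADS];

    for (t = 0; t < NUM_THREADS; t++) {
        /* Each thread gets its own heap-allocated copy of t */
        taskids[t] = (int *) malloc(sizeof(int));
        *taskids[t] = t;
        printf("Creating thread %d\n", t);
        rc = pthread_create(&threads[t], NULL, PrintHello, (void *) taskids[t]);
    }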
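Example 2 - Thread Argument Passing
This example shows how to set up and pass multiple arguments through a structure. Each thread receives a unique instance of the structure.

    struct thread_data {
        int   thread_id;
        int   sum;
        char *message;
    };

    struct thread_data thread_data_array[NUM_THREADS];

    void *PrintHello(void *threadarg)
    {
        struct thread_data *my_data;
        int taskid, sum;
        char *hello_msg;

        my_data = (struct thread_data *) threadarg;
        taskid = my_data->thread_id;
        sum = my_data->sum;
        hello_msg = my_data->message;
        /* use taskid, sum and hello_msg here */
        pthread_exit(NULL);
    }

    int main (int argc, char *argv[])
    {
        /* Fragment from the thread-creation loop; t, sum, rc, threads[] and
           messages[] are declared elsewhere in main */
        thread_data_array[t].thread_id = t;
        thread_data_array[t].sum = sum;
        thread_data_array[t].message = messages[t];
        rc = pthread_create(&threads[t], NULL, PrintHello,
                            (void *) &thread_data_array[t]);
    }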
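Example 3 - Thread Argument Passing (Incorrect)
This example passes the argument incorrectly. It passes the address of the loop variable t, which is shared by all threads, so the loop can change the value stored at that address before a newly created thread reads it.

    int rc, t;

    for (t = 0; t < NUM_THREADS; t++) {
        printf("Creating thread %d\n", t);
        /* WRONG: every thread receives the same address, &t */
        rc = pthread_create(&threads[t], NULL, PrintHello, (void *) &t);
    }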
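For contrast with Example 3, the following sketch passes the loop value itself inside the pointer argument, so no thread ever reads the loop variable. The names `count_unique_ids`, `record_id`, and `NUM_WORKERS` are made up for this illustration, and the integer-to-pointer round trip through `intptr_t` is a common idiom rather than something the tutorial prescribes.

```c
#include <pthread.h>
#include <stdint.h>

#define NUM_WORKERS 4            /* illustrative value */

static int seen[NUM_WORKERS];    /* seen[i] counts the threads that received id i */

static void *record_id(void *arg)
{
    intptr_t id = (intptr_t)arg;   /* the argument carries the value itself */
    seen[id]++;                    /* distinct ids touch distinct slots */
    return NULL;
}

/* Returns how many ids were received by exactly one thread. */
int count_unique_ids(void)
{
    pthread_t threads[NUM_WORKERS];
    int created, unique = 0;

    for (int i = 0; i < NUM_WORKERS; i++)
        seen[i] = 0;

    for (created = 0; created < NUM_WORKERS; created++) {
        /* Pass the loop value by value, not the address of the loop variable. */
        if (pthread_create(&threads[created], NULL, record_id,
                           (void *)(intptr_t)created) != 0)
            break;
    }
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    for (int i = 0; i < NUM_WORKERS; i++)
        if (seen[i] == 1)
            unique++;
    return unique;
}
```

If every thread receives its own id, each slot ends up at exactly 1. With the `&t` version from Example 3, some ids could be seen twice and others never.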
Joining and Detaching Threads
Joinable or Not?
Detaching
Example: Pthread Joining
Example Code - Pthread Joining
This example demonstrates how to wait for thread completion with the Pthread join routine. Because some implementations do not create threads in a joinable state by default, the threads here are explicitly created as joinable.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define NUM_THREADS 3   /* value missing in the source text; 3 is assumed */

    void *BusyWork(void *null)
    {
        int i;
        double result = 0.0;
        for (i = 0; i < 1000000; i++)
            result = result + (double)random();
        printf("result = %e\n", result);
        pthread_exit((void *) 0);
    }

    int main (int argc, char *argv[])
    {
        pthread_t thread[NUM_THREADS];
        pthread_attr_t attr;
        int rc, t;
        void *status;

        /* Initialize and set thread detached attribute */
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

        for (t = 0; t < NUM_THREADS; t++) {
            printf("Creating thread %d\n", t);
            rc = pthread_create(&thread[t], &attr, BusyWork, NULL);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }

        /* Free attribute and wait for the other threads */
        pthread_attr_destroy(&attr);
        for (t = 0; t < NUM_THREADS; t++) {
            rc = pthread_join(thread[t], &status);
            if (rc) {
                printf("ERROR; return code from pthread_join() is %d\n", rc);
                exit(-1);
            }
            printf("Completed join with thread %d status= %ld\n", t, (long)status);
        }
        pthread_exit(NULL);
    }
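The status pointer that pthread_join() fills in can also carry a real result back to the joining thread. The sketch below shows one possible arrangement: the worker returns a heap-allocated value and the joining thread frees it. `doubled_in_thread` and `compute_double` are hypothetical names used only here.

```c
#include <pthread.h>
#include <stdlib.h>

/* Worker: reads its input, returns a malloc'ed result through the exit status. */
static void *compute_double(void *arg)
{
    int input = *(int *)arg;
    int *out = malloc(sizeof *out);
    if (out != NULL)
        *out = input * 2;
    return out;                 /* equivalent to pthread_exit(out) */
}

/* Returns 2 * input computed in another thread, or -1 on any failure. */
int doubled_in_thread(int input)
{
    pthread_t tid;
    void *status = NULL;
    int result = -1;

    /* &input stays valid because this function joins before returning. */
    if (pthread_create(&tid, NULL, compute_double, &input) != 0)
        return -1;
    if (pthread_join(tid, &status) != 0)
        return -1;

    if (status != NULL) {
        result = *(int *)status;
        free(status);           /* the joiner owns the returned memory */
    }
    return result;
}
```

Returning a pointer to the worker's own stack variable would not work, since that storage disappears when the thread exits; heap or caller-owned storage avoids the problem.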
Preventing Stack Problems
Some practical examples:
Node | #CPUs | Memory (GB) | Default Size (bytes) |
AMD Opteron | | | 2,097,152 |
Intel IA64 | | | 33,554,432 |
Intel IA32 | | | 2,097,152 |
IBM Power5 | | | 196,608 |
IBM Power4 | | | 196,608 |
IBM Power3 | | | 98,304 |
Example Code - Stack Management
This example demonstrates how to query and set a thread's stack size.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define NTHREADS 4
    #define N 1000
    #define MEGEXTRA 1000000

    pthread_attr_t attr;

    void *dowork(void *threadid)
    {
        double A[N][N];          /* about 8 MB on the thread's stack */
        int i, j;
        long tid;
        size_t mystacksize;

        tid = (long)threadid;
        pthread_attr_getstacksize (&attr, &mystacksize);
        printf("Thread %ld: stack size = %zu bytes \n", tid, mystacksize);
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                A[i][j] = ((i*j)/3.452) + (N-i);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        pthread_t threads[NTHREADS];
        size_t stacksize;
        int rc;
        long t;

        pthread_attr_init(&attr);
        pthread_attr_getstacksize (&attr, &stacksize);
        printf("Default stack size = %zu\n", stacksize);
        stacksize = sizeof(double)*N*N+MEGEXTRA;
        printf("Amount of stack needed per thread = %zu\n", stacksize);
        pthread_attr_setstacksize (&attr, stacksize);
        printf("Creating threads with stack size = %zu bytes\n", stacksize);
        for (t = 0; t < NTHREADS; t++) {
            /* the listing tested rc without ever assigning it; assign it here */
            rc = pthread_create(&threads[t], &attr, dowork, (void *)t);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }
        printf("Created %ld threads.\n", t);
        pthread_exit(NULL);
    }
Miscellaneous Routines:
pthread_once (once_control, init_routine) |
pthread_once_t once_control = PTHREAD_ONCE_INIT;
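A small sketch of how pthread_once() might be used: several threads all call it with the same control variable, and the initialization routine runs exactly once. The helper `run_once_demo`, the counter `init_calls`, and the thread count are assumptions made for this illustration.

```c
#include <pthread.h>

#define N_CALLERS 4    /* illustrative value */

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int init_calls = 0;      /* how many times init_routine actually ran */

static void init_routine(void)
{
    init_calls++;               /* safe: pthread_once runs this a single time */
}

static void *caller(void *arg)
{
    (void)arg;
    /* Every thread asks for initialization; only the first request performs it. */
    pthread_once(&once_control, init_routine);
    return NULL;
}

/* Starts N_CALLERS threads and returns how many times init_routine ran. */
int run_once_demo(void)
{
    pthread_t threads[N_CALLERS];
    int created;

    for (created = 0; created < N_CALLERS; created++) {
        if (pthread_create(&threads[created], NULL, caller, NULL) != 0)
            break;
    }
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    return init_calls;
}
```

Because `once_control` is static and never reset, calling `run_once_demo()` a second time in the same process still reports a single initialization.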
Mutex Variables |
Thread 1 | Thread 2 | Balance |
Read balance: $1000 | | $1000 |
| Read balance: $1000 | $1000 |
| Deposit $200 | $1000 |
Deposit $200 | | $1000 |
Update balance $1000+$200 | | $1200 |
| Update balance $1000+$200 | $1200 |
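The lost update in the table above can be prevented by making the read and the update one critical section. Here is a minimal sketch under that assumption; the names `deposit_twice`, `deposit`, and `balance_lock` are invented for the example.

```c
#include <pthread.h>

static long balance = 1000;
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

/* Read-modify-write of the balance, done entirely while holding the lock. */
static void *deposit(void *arg)
{
    long amount = *(long *)arg;

    pthread_mutex_lock(&balance_lock);
    long current = balance;          /* read balance */
    balance = current + amount;      /* update balance */
    pthread_mutex_unlock(&balance_lock);
    return NULL;
}

/* Two threads each deposit `amount` into a balance that starts at 1000. */
long deposit_twice(long amount)
{
    pthread_t t1, t2;

    balance = 1000;
    if (pthread_create(&t1, NULL, deposit, &amount) != 0)
        return -1;
    if (pthread_create(&t2, NULL, deposit, &amount) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return balance;
}
```

With the lock in place, two deposits of $200 always leave $1400, whereas the unprotected interleaving in the table ends at $1200.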
Creating and Destroying Mutexes
A mutex is initially unlocked.
Note that not all implementations provide these three optional mutex attributes.
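A mutex can be set up either statically or dynamically. The sketch below shows both, using default attributes only; the function name `init_use_destroy` is hypothetical, and error handling is reduced to returning the first non-zero code.

```c
#include <pthread.h>

/* Static initialization: no pthread_mutex_init() call is needed. */
static pthread_mutex_t static_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 if every call succeeded, otherwise the first error code seen. */
int init_use_destroy(void)
{
    pthread_mutex_t dynamic_lock;
    pthread_mutexattr_t attr;
    int rc, destroy_rc;

    /* Dynamic initialization with an attributes object left at its defaults. */
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutex_init(&dynamic_lock, &attr);
    pthread_mutexattr_destroy(&attr);   /* no longer needed once the mutex exists */
    if (rc != 0)
        return rc;

    /* Both mutexes start out unlocked, so these locks succeed at once. */
    rc = pthread_mutex_lock(&static_lock);
    if (rc == 0)
        rc = pthread_mutex_unlock(&static_lock);
    if (rc == 0)
        rc = pthread_mutex_lock(&dynamic_lock);
    if (rc == 0)
        rc = pthread_mutex_unlock(&dynamic_lock);

    /* A dynamically initialized mutex should be destroyed when unused. */
    destroy_rc = pthread_mutex_destroy(&dynamic_lock);
    return rc != 0 ? rc : destroy_rc;
}
```

Passing NULL instead of `&attr` to pthread_mutex_init() also selects the default attributes.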
Locking and Unlocking Mutexes
Thread 1 Thread 2 Thread 3
A = A+1 A = A*B
Unlock Unlock
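Question: When more than one thread is waiting for a locked mutex, which thread will be granted the lock first after it is released? Answer: Unless thread priority scheduling is used, the assignment is left to the system scheduler and may appear to be more or less random.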
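Besides the blocking pthread_mutex_lock(), the API offers pthread_mutex_trylock(), which returns immediately with EBUSY when the mutex is already held. A brief sketch follows; `trylock_reports_busy` and `try_it` are illustrative names, not tutorial code.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Attempts a non-blocking lock and reports the return code to the caller. */
static void *try_it(void *arg)
{
    int *rc_out = arg;

    *rc_out = pthread_mutex_trylock(&lock);
    if (*rc_out == 0)
        pthread_mutex_unlock(&lock);   /* only unlock what we actually locked */
    return NULL;
}

/*
 * Holds the mutex while a second thread calls pthread_mutex_trylock().
 * Returns 1 if that thread saw EBUSY, 0 otherwise, -1 if no thread started.
 */
int trylock_reports_busy(void)
{
    pthread_t tid;
    int rc = -1;

    pthread_mutex_lock(&lock);
    if (pthread_create(&tid, NULL, try_it, &rc) != 0) {
        pthread_mutex_unlock(&lock);
        return -1;
    }
    pthread_join(tid, NULL);           /* the mutex is still held here */
    pthread_mutex_unlock(&lock);

    return rc == EBUSY;
}
```

A thread that gets EBUSY can go do other work and retry later, which is one way to avoid the deadlocks that can arise when locks are taken in inconsistent order.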
Example: Using Mutexes
Example Code - Using Mutexes
This example program illustrates the use of mutex variables in a threaded program that computes a dot product. The main data is made available to all threads through a globally accessible structure. Each thread works on a different part of the data. The main thread waits for all the threads to finish their computations and then prints the result.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    /*
    The following structure contains the necessary information
    to allow the function "dotprod" to access its input data and
    place its output into the structure.
    */
    typedef struct
    {
        double *a;
        double *b;
        double  sum;
        int     veclen;
    } DOTDATA;

    /* Define globally accessible variables and a mutex */
    #define NUMTHRDS 4
    #define VECLEN 100
    DOTDATA dotstr;
    pthread_t callThd[NUMTHRDS];
    pthread_mutex_t mutexsum;

    /*
    The function dotprod is activated when the thread is created.
    All input to this routine is obtained from a structure
    of type DOTDATA and all output from this function is written into
    this structure. The benefit of this approach is apparent for the
    multi-threaded program: when a thread is created we pass a single
    argument to the activated function - typically this argument
    is a thread number. All the other information required by the
    function is accessed from the globally accessible structure.
    */
    void *dotprod(void *arg)
    {
        /* Define and use local variables for convenience */
        int i, start, end, len;
        long offset;
        double mysum, *x, *y;

        offset = (long)arg;
        len = dotstr.veclen;
        start = offset*len;
        end   = start + len;
        x = dotstr.a;
        y = dotstr.b;

        /*
        Perform the dot product and assign result
        to the appropriate variable in the structure.
        */
        mysum = 0;
        for (i = start; i < end; i++)
            mysum += (x[i] * y[i]);

        /*
        Lock a mutex prior to updating the value in the shared
        structure, and unlock it upon updating.
        */
        pthread_mutex_lock (&mutexsum);
        dotstr.sum += mysum;
        pthread_mutex_unlock (&mutexsum);

        pthread_exit((void*) 0);
    }

    /*
    The main program creates threads which do all the work and then
    print out result upon completion. Before creating the threads,
    the input data is created. Since all threads update a shared structure,
    we need a mutex for mutual exclusion. The main thread needs to wait for
    all threads to complete, it waits for each one of the threads. We specify
    a thread attribute value that allow the main thread to join with the
    threads it creates. Note also that we free up handles when they are
    no longer needed.
    */
    int main (int argc, char *argv[])
    {
        long i;
        double *a, *b;
        void *status;
        pthread_attr_t attr;

        /* Assign storage and initialize values */
        a = (double*) malloc (NUMTHRDS*VECLEN*sizeof(double));
        b = (double*) malloc (NUMTHRDS*VECLEN*sizeof(double));

        for (i = 0; i < VECLEN*NUMTHRDS; i++) {
            a[i] = 1.0;
            b[i] = a[i];
        }

        dotstr.veclen = VECLEN;
        dotstr.a = a;
        dotstr.b = b;
        dotstr.sum = 0;

        pthread_mutex_init(&mutexsum, NULL);

        /* Create threads to perform the dotproduct */
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

        for (i = 0; i < NUMTHRDS; i++) {
            /*
            Each thread works on a different set of data.
            The offset is specified by 'i'. The size of
            the data for each thread is indicated by VECLEN.
            */
            pthread_create(&callThd[i], &attr, dotprod, (void *)i);
        }

        pthread_attr_destroy(&attr);

        /* Wait on the other threads */
        for (i = 0; i < NUMTHRDS; i++)
            pthread_join(callThd[i], &status);

        /* After joining, print out the results and cleanup */
        printf ("Sum = %f \n", dotstr.sum);
        free (a);
        free (b);
        pthread_mutex_destroy(&mutexsum);
        pthread_exit(NULL);
    }
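Condition Variables |
Main Thread
- Declare and initialize the global data/variables that require synchronization (such as "count")
- Declare and initialize a condition variable object
- Declare and initialize an associated mutex
- Create the worker threads
Thread A
- Do work up to the point where a certain condition must occur (such as "count" reaching a specified value)
- Lock the associated mutex and check the value of the global variable
- Call pthread_cond_wait() to block while waiting for a signal from Thread-B. Note that pthread_cond_wait() automatically and atomically unlocks the associated mutex so that Thread-B can use it.
- When signalled, wake up. The mutex is automatically and atomically locked again.
- Explicitly unlock the mutex
Thread B
- Lock the associated mutex
- Change the value of the global variable that Thread-A is waiting on
- Check the value of the global variable; if it fulfills the desired condition, signal Thread-A
- Unlock the mutex
Main Thread
- Join / Continue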
Creating and Destroying Condition Variables
Routines:
Usage:
Note that not all implementations provide the process-shared attribute.
Waiting and Signaling on Condition Variables
Proper locking and unlocking of the associated mutex is essential when using these routines.
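The locking discipline around waiting and signaling might look like the sketch below: the waiter holds the mutex while it tests a predicate in a loop, and the signaler changes the predicate and signals while holding the same mutex. All names here (`wait_for_payload`, `producer`, `ready`, `payload`) are invented for the illustration.

```c
#include <pthread.h>

static pthread_mutex_t ready_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cv    = PTHREAD_COND_INITIALIZER;
static int ready   = 0;     /* the predicate the waiter checks */
static int payload = 0;     /* data published together with the predicate */

static void *producer(void *arg)
{
    int value = *(int *)arg;

    pthread_mutex_lock(&ready_mutex);      /* lock before touching shared state */
    payload = value;
    ready = 1;
    pthread_cond_signal(&ready_cv);
    pthread_mutex_unlock(&ready_mutex);    /* the waiter can only resume after this */
    return NULL;
}

/* Starts a producer thread and blocks until it has published `value`. */
int wait_for_payload(int value)
{
    pthread_t tid;
    int seen;

    pthread_mutex_lock(&ready_mutex);
    ready = 0;
    pthread_mutex_unlock(&ready_mutex);

    if (pthread_create(&tid, NULL, producer, &value) != 0)
        return -1;

    pthread_mutex_lock(&ready_mutex);      /* must hold the mutex before waiting */
    while (!ready)                         /* loop guards against spurious wakeups */
        pthread_cond_wait(&ready_cv, &ready_mutex);
    seen = payload;
    pthread_mutex_unlock(&ready_mutex);

    pthread_join(tid, NULL);
    return seen;
}
```

If the producer runs first, `ready` is already set when the waiter takes the mutex, so the loop body is skipped and no signal is missed.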
Example: Using Condition Variables
Example Code - Using Condition Variables
This example demonstrates the use of several Pthread condition variable routines. The main routine creates three threads. Two of the threads perform work and update a "count" variable. The third thread waits until the count variable reaches a specified value.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_THREADS 3   /* value missing in the source text; 3 matches the three threads created in main */
    #define TCOUNT 10
    #define COUNT_LIMIT 12

    int count = 0;
    int thread_ids[3] = {0,1,2};
    pthread_mutex_t count_mutex;
    pthread_cond_t count_threshold_cv;

    void *inc_count(void *idp)
    {
        int j, i;
        double result = 0.0;
        int *my_id = idp;

        for (i = 0; i < TCOUNT; i++) {
            pthread_mutex_lock(&count_mutex);
            count++;

            /*
            Check the value of count and signal waiting thread when condition is
            reached. Note that this occurs while mutex is locked.
            */
            if (count == COUNT_LIMIT) {
                pthread_cond_signal(&count_threshold_cv);
                printf("inc_count(): thread %d, count = %d Threshold reached.\n",
                       *my_id, count);
            }
            printf("inc_count(): thread %d, count = %d, unlocking mutex\n",
                   *my_id, count);
            pthread_mutex_unlock(&count_mutex);

            /* Do some work so threads can alternate on mutex lock */
            for (j = 0; j < 1000; j++)
                result = result + (double)random();
        }
        pthread_exit(NULL);
    }

    void *watch_count(void *idp)
    {
        int *my_id = idp;

        printf("Starting watch_count(): thread %d\n", *my_id);

        /*
        Lock mutex and wait for signal. Note that the pthread_cond_wait
        routine will automatically and atomically unlock mutex while it waits.
        Also, note that if COUNT_LIMIT is reached before this routine is run by
        the waiting thread, the loop will be skipped to prevent pthread_cond_wait
        from never returning. A while loop is used instead of a single if so
        that a spurious wakeup re-checks the condition.
        */
        pthread_mutex_lock(&count_mutex);
        while (count < COUNT_LIMIT) {
            pthread_cond_wait(&count_threshold_cv, &count_mutex);
            printf("watch_count(): thread %d Condition signal received.\n", *my_id);
        }
        pthread_mutex_unlock(&count_mutex);
        pthread_exit(NULL);
    }

    int main (int argc, char *argv[])
    {
        int i;
        pthread_t threads[3];
        pthread_attr_t attr;

        /* Initialize mutex and condition variable objects */
        pthread_mutex_init(&count_mutex, NULL);
        pthread_cond_init (&count_threshold_cv, NULL);

        /* For portability, explicitly create threads in a joinable state */
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
        pthread_create(&threads[0], &attr, inc_count, (void *)&thread_ids[0]);
        pthread_create(&threads[1], &attr, inc_count, (void *)&thread_ids[1]);
        pthread_create(&threads[2], &attr, watch_count, (void *)&thread_ids[2]);

        /* Wait for all threads to complete */
        for (i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);
        printf ("Main(): Waited on %d threads. Done.\n", NUM_THREADS);

        /* Clean up and exit */
        pthread_attr_destroy(&attr);
        pthread_mutex_destroy(&count_mutex);
        pthread_cond_destroy(&count_threshold_cv);
        pthread_exit(NULL);
    }
Topics Not Covered |
Several features of the Pthreads API are not covered in this tutorial.
Pthread |
Pthread Functions |
Thread Management | pthread_create, pthread_exit, pthread_join, pthread_once, pthread_kill, pthread_self, pthread_equal, pthread_yield, pthread_detach |
Thread-Specific Data | pthread_key_create, pthread_key_delete, pthread_getspecific, pthread_setspecific |
Thread Cancellation | pthread_cancel, pthread_cleanup_pop, pthread_cleanup_push, pthread_setcancelstate, pthread_getcancelstate, pthread_testcancel |
Thread Scheduling | pthread_getschedparam, pthread_setschedparam |
Signals | pthread_sigmask |
Pthread Attribute Functions |
Basic Management | pthread_attr_init, pthread_attr_destroy |
Detachable or Joinable | pthread_attr_setdetachstate, pthread_attr_getdetachstate |
Specifying Stack Information | pthread_attr_getstackaddr, pthread_attr_getstacksize, pthread_attr_setstackaddr, pthread_attr_setstacksize |
Thread Scheduling Attributes | pthread_attr_getschedparam, pthread_attr_setschedparam, pthread_attr_getschedpolicy, pthread_attr_setschedpolicy, pthread_attr_setinheritsched, pthread_attr_getinheritsched, pthread_attr_setscope, pthread_attr_getscope |
Mutex Functions |
Mutex Management | pthread_mutex_init, pthread_mutex_destroy, pthread_mutex_lock, pthread_mutex_unlock, pthread_mutex_trylock |
Priority Management | pthread_mutex_setprioceiling, pthread_mutex_getprioceiling |
Mutex Attribute Functions |
Basic Management | pthread_mutexattr_init, pthread_mutexattr_destroy |
Sharing | pthread_mutexattr_getpshared, pthread_mutexattr_setpshared |
Protocol Attributes | pthread_mutexattr_getprotocol, pthread_mutexattr_setprotocol |
Priority Management | pthread_mutexattr_setprioceiling, pthread_mutexattr_getprioceiling |
Condition Variable Functions |
Basic Management | pthread_cond_init, pthread_cond_destroy, pthread_cond_signal, pthread_cond_broadcast, pthread_cond_wait, pthread_cond_timedwait |
Condition Variable Attribute Functions |
Basic Management | pthread_condattr_init, pthread_condattr_destroy |
Sharing | pthread_condattr_getpshared, pthread_condattr_setpshared |
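It took three days, squeezing in time at work, to finish translating the previous article, POSIX Threads Programming. My skills are limited and the translation is only passable, so if you find anything unreasonable or incorrect, please point it out. Many thanks! If anything is unclear, consult the original article. The books on the Pthreads library mentioned in the references section, along with the original article and this translation, can be downloaded from the link below: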
author: david([email protected])
code page:http://code.google.com/p/heavenhell/