1. Multithreading, or multithreaded programming (MT), is a technique that allows one program to do multiple tasks concurrently.
2. MT provides exactly the right programming paradigm to make maximal use of symmetric multiprocessors (SMP).
Multithreading makes it possible to obtain much greater performance than was previously possible by taking advantage of multiprocessor machines.
In many instances uniprocessors will also experience a significant performance improvement.
3. Multithreading is not a magic bullet for all your ills, and it does introduce a new set of programming issues which must be mastered, but it goes a long way toward making your work easier and your programs more efficient.
4. A single-threaded program can do only one thing at a time.
5. A thread is an abstract concept that comprises everything a computer does in executing a traditional program. It is the program state that gets scheduled on a CPU, it is the “thing” that does the work. If a process comprises data, code, kernel state, and a set of CPU registers, then a thread is embodied in the contents of those registers—the program counter, the general registers, the stack pointer, etc., and the stack. A thread, viewed at an instant of time, is the state of the computation.
A process is a heavy-weight, kernel-level entity and includes such things as a virtual memory map, file descriptors, user ID, etc., and each process has its own collection of these. The only way for your program to access data in the process structure, to query or change its state, is via a system call.
A thread is a light-weight entity, comprising the registers, stack, and some other data. The rest of the process structure is shared by all threads: the address space, file descriptors, etc. Much (and sometimes all) of the thread structure is in user-space, allowing for very fast access.
6. The actual code (functions, routines, signal handlers, etc.) is global and can be executed on any thread. All threads in a process share the state of that process. They reside in the exact same memory space, see the same functions, see the same data. When one thread alters a process variable (say, the working directory), all the others will see the change when they next access it. If one thread opens a file to read it, all the other threads can also read from it.
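A minimal sketch of this sharing, assuming a POSIX threads environment: a worker thread writes a global variable and the creating thread reads the new value after joining. The names shared_value, writer, and run_shared_demo are hypothetical, and assert() stands in for real error handling.

```c
#include <assert.h>
#include <pthread.h>

/* Global data belongs to the process, so every thread sees the same copy. */
static int shared_value = 0;

/* Hypothetical worker: modifies the process-wide variable. */
static void *writer(void *arg)
{
    (void)arg;
    shared_value = 42;
    return NULL;
}

/* Returns the value the creating thread sees after the worker has exited. */
int run_shared_demo(void)
{
    pthread_t tid;
    int rc;

    rc = pthread_create(&tid, NULL, writer, NULL);
    assert(rc == 0);
    /* Joining also orders the worker's write before the read below. */
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    return shared_value;
}
```

Calling run_shared_demo() from main() should return the value written by the worker, since both threads address the same memory.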
7. Concurrency vs. Parallelism
Concurrency means that two or more threads (or LWPs, or traditional processes) can be in the middle of executing code at the same time; it could be the same code, it could be different code. They may or may not be actually executing at the same time, but they are in the middle of it (i.e., one started executing, it was interrupted, and the other one started). Every multitasking operating system has always had numerous concurrent processes, even though only one could be on the CPU at any given time.
Parallelism means that two or more threads actually run at the same time on different CPUs. On a multiprocessor machine, many different threads can run in parallel. They are, of course, also running concurrently.
If your program is correctly written on a uniprocessor, it will run correctly on a multiprocessor. The probability of running into a race condition is the same on both a UP and an MP. If it deadlocks on one, it will deadlock on the other.
8. Synchronization is the method of ensuring that multiple threads coordinate their activities so that one thread doesn’t accidentally change data that another thread is working on. This is done by providing function calls that can limit the number of threads that can access some data concurrently. In the simplest case (a mutual exclusion lock, or mutex), only one thread at a time can execute a given piece of code.
9. The main benefits of writing multithreaded programs are:
• Performance gains from multiprocessing hardware (parallelism)
• Increased application throughput
• Increased application responsiveness
• Replacing process-to-process communications
• Efficient use of system resources
• Simplified realtime processing
• Simplified signal handling
• The ability to make use of the inherent concurrency of distributed objects
• There is one binary that runs well on both uniprocessors and multiprocessors
• The ability to create well-structured programs
• There can be a single source for multiple platforms
10. Anything that you can do with threads, you can also do with processes sharing memory. You can do everything with shared memory. It just won’t be as easy or run as fast.
11. A lightweight process can be thought of as a virtual CPU that is available for executing code. Each LWP is separately scheduled by the kernel. It can perform independent system calls and incur independent page faults, and multiple LWPs in the same process can run in parallel on multiple processors. LWPs are scheduled onto the available CPU resources according to their scheduling class and priority. Because scheduling is done on a per-LWP basis, each LWP collects its own kernel statistics—user time, system time, page faults, etc. This also implies that a process with two LWPs will generally get twice as much CPU time as a process with only one LWP. (This is a wild generalization, but you get the idea—the kernel is scheduling LWPs, not processes.)
12. What actually makes up a thread is: its own stack and stack pointer; a program counter; some thread information, such as scheduling priority and signal mask, stored in the thread structure; and the CPU registers (the stack pointer and program counter are actually just registers). Everything else comes either from the process or (in a few cases) the LWP. The stack is just memory drawn from the program’s heap. A thread could look into and even alter the contents of another thread’s stack if it so desired. (Although you, being a good programmer, would never do this, your bugs might.)
This definition of threads residing in a single address space means that the entire address space is seen identically by all threads. A change in shared data by one thread can be seen by all the other threads in the process. If one thread is writing a data structure while another thread is reading it, there will be problems (see Race Conditions).
As threads share the same process structure, they also share most of the operating system state. Each thread sees the same open files, the same user ID, the same working directory, each uses the same file descriptors, including the file position pointer. If one thread opens a file, another thread can read it. If one thread does an lseek() while another thread is doing a series of reads on the same file descriptor, the results may be, uh..., surprising.
13. A detached thread will clean up after itself upon exit, returning its thread structure, TSD array, and stack to the heap for reuse. A nondetached thread will clean up after itself only after it has been joined.
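The difference can be sketched as follows, assuming POSIX threads and unnamed POSIX semaphores are available. The detached thread cannot be joined, so it announces completion through a semaphore, while the joinable thread hands back a value through pthread_join(). All function and variable names here are hypothetical, and assert() stands in for real error handling.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t detached_done;   /* posted by the detached thread when it finishes */
static int payload = 7;       /* value handed back by the joinable thread */

static void *detached_worker(void *arg)
{
    (void)arg;
    sem_post(&detached_done); /* nobody can join us, so report completion explicitly */
    return NULL;              /* thread resources are reclaimed automatically */
}

static void *joinable_worker(void *arg)
{
    return arg;               /* kept by the library until someone joins */
}

int run_detach_demo(void)
{
    pthread_attr_t attr;
    pthread_t d, j;
    void *result = NULL;
    int rc;

    rc = sem_init(&detached_done, 0, 0);
    assert(rc == 0);

    /* Detached thread: the detach state is requested through the attribute object. */
    rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    assert(rc == 0);
    rc = pthread_create(&d, &attr, detached_worker, NULL);
    assert(rc == 0);
    pthread_attr_destroy(&attr);

    /* Joinable thread (the default): its resources are released by the join. */
    rc = pthread_create(&j, NULL, joinable_worker, &payload);
    assert(rc == 0);
    rc = pthread_join(j, &result);
    assert(rc == 0);

    /* Wait for the detached thread, retrying if a signal interrupts the wait. */
    while (sem_wait(&detached_done) != 0 && errno == EINTR) {}
    (void)rc;
    return *(int *)result;
}
```

The semaphore has static storage and is deliberately not destroyed, which sidesteps any question of the detached thread still being inside sem_post() when the caller returns.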
14. Specifying Scope, Policy, Priority, and Inheritance
There are four aspects of scheduling attributes which you can set when creating a new thread. You can set the scheduling:
Scope: pthread_attr_setscope() allows you to select either PTHREAD_SCOPE_PROCESS (local scheduling, unbound threads) or PTHREAD_SCOPE_SYSTEM (global scheduling, bound threads).
Policy: pthread_attr_setschedpolicy() allows you to select SCHED_RR, SCHED_FIFO, or SCHED_OTHER, or other implementation-defined policies.
Priority: pthread_attr_setschedparam() allows you to set the priority level of a thread by setting the sched_param struct element param.sched_priority. You can also change the parameters of a running thread via pthread_setschedparam(). POSIX gives no advice on how to use the priority levels provided. All you know is that for any given policy, the priority level must be between sched_get_priority_max(policy) and sched_get_priority_min(policy).
Inheritance: pthread_attr_setinheritsched() allows you to specify if the scheduling policy and parameters will be inherited from the creating thread (PTHREAD_INHERIT_SCHED), or will be set directly by the other functions (PTHREAD_EXPLICIT_SCHED).
Unless you are doing realtime work, only scope is of interest, and that will almost always be set to PTHREAD_SCOPE_SYSTEM. POSIX does not specify default values for these attributes, so you should really set all four.
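A sketch of setting all four attributes before creating a thread. It assumes the platform supports PTHREAD_SCOPE_SYSTEM and lets an unprivileged process use SCHED_OTHER explicitly; on other systems some of these calls may return ENOTSUP or EPERM. The function names noop and create_with_sched_attrs are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

static void *noop(void *arg) { return arg; }

/* Returns 0 on success, otherwise the first nonzero pthread error code. */
int create_with_sched_attrs(void)
{
    pthread_attr_t attr;
    struct sched_param param = {0};
    pthread_t tid;
    int rc;

    if ((rc = pthread_attr_init(&attr)) != 0)
        return rc;

    /* Scope: system contention scope (a bound thread). */
    if ((rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM)) != 0)
        goto out;
    /* Policy: the default time-sharing policy. */
    if ((rc = pthread_attr_setschedpolicy(&attr, SCHED_OTHER)) != 0)
        goto out;
    /* Priority: must lie within the range allowed for the chosen policy. */
    param.sched_priority = sched_get_priority_min(SCHED_OTHER);
    assert(param.sched_priority <= sched_get_priority_max(SCHED_OTHER));
    if ((rc = pthread_attr_setschedparam(&attr, &param)) != 0)
        goto out;
    /* Inheritance: use the attributes above, not those of the creating thread. */
    if ((rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED)) != 0)
        goto out;

    if ((rc = pthread_create(&tid, &attr, noop, NULL)) == 0)
        rc = pthread_join(tid, NULL);
out:
    pthread_attr_destroy(&attr);
    return rc;
}
```

Realtime policies such as SCHED_FIFO or SCHED_RR would be set the same way, but they usually require elevated privileges.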
15. Different Models of Kernel Scheduling
Many Threads on One LWP: Programming on such a model will give you a superior programming paradigm, but running your program on an MP machine will not give you any speedup, and when you make a blocking system call, the whole process will block. However, the thread creation, scheduling, and synchronization is all done 100% in user space, so it’s fast and cheap and uses no kernel resources.
One Thread per LWP: The “One-to-One” model allocates one LWP for each thread. This model allows many threads to run simultaneously on different CPUs. It also allows one or more threads to issue blocking system calls while the other threads continue to run—even on a uniprocessor.
Many Threads on Many LWPs (Strict): The third model is the strict “Many-to-Many” model. Any number of threads are multiplexed onto some (smaller or equal) number of LWPs. Thread creation is done completely in user space, as is scheduling and synchronization (well, almost). The number of LWPs may be tuned for the particular application and machine. Numerous threads can run in parallel on different CPUs, and a blocking system call need not block the whole process. As in the Many-to-One model, the only limit on the number of threads is the size of virtual memory. No one actually uses this strict version.
16. Thread Scheduling
There are two basic ways of scheduling threads:
process local scheduling (known as Process Contention Scope, or Unbound Threads; the Many-to-Many model) and system global scheduling (known as System Contention Scope, or Bound Threads; the One-to-One model). These scheduling classes are known as the scheduling contention scope, and are defined only in POSIX.
Process contention scope scheduling means that all of the scheduling mechanism for the thread is local to the process—the threads library has full control over which thread will be scheduled on an LWP. This also implies the use of either the Many-to-One or Many-to-Many model.
System contention scope scheduling means that the scheduling is done by the kernel. Globally scheduled threads also have a policy and a priority associated with them which further refines the scheduling details at the kernel level. These policies are part of the optional portion of the POSIX specification, and currently none of the vendors implement every possible option.
Use PCS threads only when you are going to have very large numbers of threads and use SCS threads normally.
17. Atomic Actions and Atomic Instructions
Implementation of synchronization requires the existence of an atomic test and set instruction in hardware. This is true for uniprocessor, as well as multiprocessor, machines. Because threads can be preempted at any time, between any two instructions, you must have such an instruction. Sure, there might be only a 10 nanosecond window for disaster to strike, but you still want to avoid it. A test and set instruction tests (or just loads into a register) a word from memory and sets it to some value (typically 1), all in one instruction with no possibility of anything happening in between the two halves (e.g., an interrupt or a write by a different CPU). If the value of the target word was 0, then it gets set to 1 and you are considered to have ownership of the lock. If it already was 1, then it gets set to 1 (i.e., no change) and you don’t have ownership. All synchronization is based upon the existence of this instruction.
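The ownership rule can be illustrated with a small spin lock. This sketch uses the C11 atomic_flag type from <stdatomic.h> as a portable stand-in for the hardware test and set instruction; that substitution, and the names spin_lock, spin_unlock, adder, and run_spin_demo, are assumptions made for illustration rather than anything the text above prescribes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag lock_word = ATOMIC_FLAG_INIT;  /* clear means nobody owns the lock */
static long spin_counter = 0;                     /* data protected by lock_word */

static void spin_lock(void)
{
    /* test_and_set returns the previous value: false means the flag was clear,
       we set it, and we now own the lock; true means it was already owned. */
    while (atomic_flag_test_and_set_explicit(&lock_word, memory_order_acquire)) {
        /* owned by another thread: try again */
    }
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock_word, memory_order_release);
}

static void *adder(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock();
        spin_counter++;
        spin_unlock();
    }
    return NULL;
}

long run_spin_demo(void)
{
    pthread_t a, b;
    int rc;

    rc = pthread_create(&a, NULL, adder, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, adder, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    (void)rc;
    return spin_counter;
}
```

Spinning burns CPU time while waiting, so a lock like this only suits very short critical sections.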
18. Critical Sections
A critical section is a section of code that must be allowed to complete atomically with no interruption that affects its completion. We create critical sections by locking a lock (as above), manipulating the data, then releasing the lock afterwards. Such things as incrementing a counter or updating a record in a database need to be critical sections. Other things may go on at the same time, and the thread that is executing in the critical section may even lose its processor, but no other thread may enter the critical section. Should another thread want to execute that same critical section, it will be forced to wait until the first thread finishes.
19. Lock Your Shared Data!
All shared data must be protected by locks. Failure to do so will result in truly ugly bugs. Keep in mind that all means all. All data structures that can be accessed by multiple threads are included. Static variables are included.
20. There are two basic things you want to do. Thing one is that you want to protect shared data. This is what locks do. Thing two is that you want to prevent threads from running when there’s nothing for them to do. You don’t want them spinning, wasting time. This is what semaphores, condition variables, join, barriers, etc. are for.
21. Mutexes
The mutual exclusion lock is the simplest and most primitive synchronization variable. It provides a single, absolute owner for the section of code (thus a critical section) that it brackets between the calls to pthread_mutex_lock() and pthread_mutex_unlock(). The first thread that locks the mutex gets ownership, and any subsequent attempts to lock it will fail, causing the calling thread to go to sleep. When the owner unlocks it, one of the sleepers will be awakened, made runnable, and given the chance to obtain ownership. It is possible that some other thread will call pthread_mutex_lock() and get ownership before the newly awakened thread does. This is perfectly correct behavior and must not affect the correctness of your program.
Because mutexes protect sections of code, it is not legal for one thread to lock a mutex and for another thread to unlock it.
It is important to realize that although locks are used to protect data, what they really do is protect that section of code that they bracket. There’s nothing that forces another programmer (who writes another function that uses the same data) to lock his code. Nothing but good programming practice.
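A minimal sketch of a mutex-protected critical section, assuming POSIX threads. Every function that touches the shared variable takes the same lock; the names balance, deposit, depositor, and run_mutex_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;
static long balance = 0;                 /* shared data, protected by balance_lock */

void deposit(long amount)
{
    pthread_mutex_lock(&balance_lock);
    balance += amount;                   /* the critical section */
    pthread_mutex_unlock(&balance_lock); /* unlocked by the same thread that locked */
}

static void *depositor(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        deposit(1);
    return NULL;
}

long run_mutex_demo(void)
{
    pthread_t t[4];
    long result;
    int rc;

    for (int i = 0; i < 4; i++) {
        rc = pthread_create(&t[i], NULL, depositor, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);

    /* Readers must take the lock too, even for a single load. */
    pthread_mutex_lock(&balance_lock);
    result = balance;
    pthread_mutex_unlock(&balance_lock);
    (void)rc;
    return result;
}
```

Without the lock, the four threads would race on the read-modify-write of the shared variable and the final total would usually come out short.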
22. Semaphores
A counting semaphore is a variable that you can increment arbitrarily high, but decrement only to zero. A sem_post() operation increments the semaphore, while a sem_wait() attempts to decrement it. If the semaphore is greater than zero, the operation succeeds; if not, then the calling thread must go to sleep until a different thread increments it.
Although there is a function sem_getvalue() which will return the current value of a semaphore, it is virtually impossible to use correctly because what it returns is what the value of the semaphore was. By the time you use the value it returned, it may well have changed. If you find yourself using sem_getvalue(), look twice, there’s probably a better way to do what you want.
POSIX semaphores are unique among synchronization variables in one particular fashion: They are async safe, meaning that it is legal to call sem_post() from a signal handler. No other synchronization variable is async safe. So, if you want to write a signal handler that causes some other thread to wake up, this is the way to do it.
If you look at the definition of semaphores, you will also notice that sem_wait() may return -1 with errno set to EINTR. This means that the wait was interrupted by a signal and the semaphore was not decremented. Your program must not continue from this point as if it had been. Correct usage of a semaphore therefore requires that the wait be executed in a loop. If you block out all signals, then, and only then, can you use semaphores without a loop. Of course you would need to be completely certain that no one who maintains your code ever allows signals. In all of our code, we simply use a helper function:
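Code Example: Semaphore Ignoring EINTR
/* Retry while a signal interrupts the wait; needs <semaphore.h> and <errno.h>. */
void SEM_WAIT(sem_t *sem)
{ while (sem_wait(sem) != 0 && errno == EINTR) {} }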
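A sketch of a counting semaphore in use, assuming unnamed POSIX semaphores (sem_init) are supported. A producer thread posts once per item, and the consumer waits once per item using its own EINTR-retrying helper. The names items_ready, wait_uninterrupted, producer, and run_sem_demo are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define ITEMS 5

static sem_t items_ready;     /* counts items produced but not yet consumed */
static int buffer[ITEMS];

/* Retry the wait only when it was interrupted by a signal. */
static void wait_uninterrupted(sem_t *sem)
{
    while (sem_wait(sem) != 0 && errno == EINTR) {}
}

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITEMS; i++) {
        buffer[i] = i * i;        /* fill the slot first ... */
        sem_post(&items_ready);   /* ... then announce it */
    }
    return NULL;
}

/* Consumes every item and returns their sum. */
int run_sem_demo(void)
{
    pthread_t tid;
    int sum = 0;
    int rc;

    rc = sem_init(&items_ready, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, producer, NULL);
    assert(rc == 0);

    for (int i = 0; i < ITEMS; i++) {
        wait_uninterrupted(&items_ready);  /* sleeps until slot i has been posted */
        sum += buffer[i];
    }

    pthread_join(tid, NULL);
    sem_destroy(&items_ready);
    (void)rc;
    return sum;
}
```

The consumer never looks at the semaphore's value; it relies only on each successful wait matching one earlier post.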
23. Condition Variables
A condition variable is a generalization of a semaphore. Here the mutex is visible to the programmer and the condition is arbitrary. The programmer is responsible for locking and unlocking the mutex, testing and changing the condition, and waking up sleepers. Otherwise, it is exactly like a semaphore.
It works like this: A thread obtains a mutex (condition variables always have an associated mutex) and tests the condition under the mutex’s protection. No other thread should alter any aspect of the condition without holding the mutex. If the condition is true, your thread completes its task, releasing the mutex when appropriate. If the condition isn’t true, the mutex is released for you, and your thread goes to sleep on the condition variable. When some other thread changes some aspect of the condition (e.g., it reserves a plane ticket for granny), it calls pthread_cond_signal(), waking up one sleeping thread. Your thread then reacquires the mutex, reevaluates the condition, and either succeeds or goes back to sleep, depending upon the outcome.
You must reevaluate the condition! First, the other thread may not have tested the complete condition before sending the wakeup. Second, even if the condition was true when the signal was sent, it could have changed before your thread got to run. Third, condition variables allow for spurious wakeups. They are allowed to wake up for no discernible reason whatsoever!
Depending upon your program, you may wish to wake up all the threads that are waiting on a condition. Perhaps they were all waiting for the right time of day to begin background work, or were waiting for a certain network device to become active. A pthread_cond_broadcast() is used exactly like pthread_cond_signal(). It is called after some aspect of the condition has changed. It then wakes all of the sleeping threads (in an undefined order), which then must all hurry off to reevaluate the condition. This may cause some contention for the mutex, but that’s OK.
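The wait-and-retest pattern can be sketched as follows, assuming POSIX threads. The condition is a plain boolean guarded by the mutex; the names ticket_reserved, waiter, and run_cond_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t cond_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond_var  = PTHREAD_COND_INITIALIZER;
static bool ticket_reserved = false;   /* the condition, only touched under cond_lock */

static void *waiter(void *arg)
{
    int *woke = arg;

    pthread_mutex_lock(&cond_lock);
    /* A loop, not an if: the condition is re-tested after every wakeup,
       which also covers spurious wakeups. */
    while (!ticket_reserved)
        pthread_cond_wait(&cond_var, &cond_lock); /* releases the mutex while asleep,
                                                     reacquires it before returning */
    *woke = 1;
    pthread_mutex_unlock(&cond_lock);
    return NULL;
}

int run_cond_demo(void)
{
    pthread_t tid;
    int woke = 0;
    int rc;

    rc = pthread_create(&tid, NULL, waiter, &woke);
    assert(rc == 0);

    /* Change the condition under the mutex, then wake one sleeper. */
    pthread_mutex_lock(&cond_lock);
    ticket_reserved = true;
    pthread_mutex_unlock(&cond_lock);
    pthread_cond_signal(&cond_var);

    pthread_join(tid, NULL);
    (void)rc;
    return woke;
}
```

If the signal is sent before the waiter ever reaches pthread_cond_wait(), nothing is lost: the waiter tests the condition first and finds it already true. With several waiters, pthread_cond_broadcast() could replace pthread_cond_signal() with no other changes.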
24. Readers/Writer Locks
Sometimes you will find yourself with a shared data structure that gets read often, but written only seldom. The reading of that structure may require a significant amount of time (perhaps it’s a long list through which you do searches). It would seem a real waste to put a mutex around it and require all the threads to go through it one at a time when they’re not changing anything. Hence, readers/writer locks.
As a rule of thumb, a simple global variable will always be locked with a mutex, while searching down a 1000-element linked list will often be locked with an RWlock.
The operation of RWlocks is as follows: The first reader that requests the lock will get it. Subsequent readers also get the lock, and all of them are allowed to read the data concurrently. When a writer requests the lock, it is put on a sleep queue until all the readers exit. A second writer will also be put on the writer’s sleep queue in priority order. Should a new reader show up at this point, it will be put on the reader’s sleep queue until all the writers have completed. Further writers will also be placed on the same writer’s sleep queue as the others (hence, in front of the waiting reader), meaning that writers are always favored over readers. (Writer priority is simply a choice we made in our implementation, you may make a different choice.) The writers will obtain the lock one at a time, each waiting for the previous writer to complete. When all writers have completed, the entire set of sleeping readers are awakened and can then attempt to acquire the lock. Readers’ priorities are not used.
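A sketch using the POSIX pthread_rwlock_t interface, which is assumed here as a stand-in; its reader/writer preference is implementation-defined and may differ from the writer-priority behavior described above. Readers sum a table under a read lock while a writer refills it under a write lock. The names table, table_sum, fill_table, and run_rwlock_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define TABLE_SIZE 1000

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static int table[TABLE_SIZE];

/* Reader: many threads may hold the read lock at the same time. */
static int table_sum(void)
{
    int sum = 0;
    pthread_rwlock_rdlock(&table_lock);
    for (int i = 0; i < TABLE_SIZE; i++)
        sum += table[i];
    pthread_rwlock_unlock(&table_lock);
    return sum;
}

/* Writer: holds the lock exclusively while every element changes. */
static void fill_table(int value)
{
    pthread_rwlock_wrlock(&table_lock);
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = value;
    pthread_rwlock_unlock(&table_lock);
}

static void *reader(void *arg)
{
    int *inconsistent = arg;
    for (int n = 0; n < 1000; n++) {
        /* All elements are equal under the lock, so the sum must be a
           multiple of TABLE_SIZE; anything else is a half-updated read. */
        if (table_sum() % TABLE_SIZE != 0)
            *inconsistent = 1;
    }
    return NULL;
}

static void *writer(void *arg)
{
    (void)arg;
    for (int v = 1; v <= 100; v++)
        fill_table(v);
    return NULL;
}

/* Returns the number of readers that observed a half-updated table. */
int run_rwlock_demo(void)
{
    pthread_t r[2], w;
    int bad[2] = {0, 0};
    int rc;

    for (int i = 0; i < 2; i++) {
        rc = pthread_create(&r[i], NULL, reader, &bad[i]);
        assert(rc == 0);
    }
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);

    for (int i = 0; i < 2; i++)
        pthread_join(r[i], NULL);
    pthread_join(w, NULL);
    (void)rc;
    return bad[0] + bad[1];
}
```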
25. Thread Cancellation
You must ensure that any thread that you are going to cancel is able to release any locks it might hold, free any memory it may have allocated for its own use, and that it leaves the world in a consistent state.
Asynchronous cancellation: if a thread holds locks, owns resources, or has malloc'd memory when another thread cancels it, there is trouble.
POSIX has a more elaborate version of cancellation. It defines a cancellation state for each thread which will enable or disable cancellation for that thread. Thus you can disable cancellation during critical sections and reenable it afterwards. Cancellation state makes it feasible to use asynchronous cancellation safely, although there are still significant problems to be dealt with. For example, if your thread has malloc’d some storage and is then cancelled, how do you free that storage?
The other type of cancellation, defined only in POSIX, is known as deferred cancellation. In this type of cancellation, a thread only exits when it polls the library to find out if it should exit, or when it is blocked in a library call which is a cancellation point. This polling is done by calling the function pthread_testcancel(), which in turn just checks to see if a bit has been set. If a request is pending, then pthread_testcancel() will not return at all, and the thread will simply die. Otherwise, it will return, and the thread will continue. You may call pthread_testcancel() in your own code. In deferred cancellation, a thread may run for an arbitrary amount of time after a cancellation has been issued, thus allowing critical sections to execute without having to disable/enable cancellation.
All pthreads start life with deferred cancellation enabled.
Avoid cancellation if at all possible.
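When cancellation cannot be avoided, the malloc problem is usually handled with a cleanup handler. The sketch below assumes deferred cancellation (the default) and uses pthread_cleanup_push() to free a buffer when the thread is cancelled at pthread_testcancel(). The names release_buffer, cancellable_worker, and run_cancel_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static int cleaned_up = 0;    /* set by the cleanup handler */

/* Runs automatically when the thread is cancelled (or calls pthread_exit). */
static void release_buffer(void *p)
{
    free(p);
    cleaned_up = 1;
}

static void *cancellable_worker(void *arg)
{
    char *buf = malloc(64);
    (void)arg;

    pthread_cleanup_push(release_buffer, buf);
    for (;;) {
        /* ... one unit of work would go here ... */
        pthread_testcancel();  /* does not return once a cancel request is pending */
    }
    pthread_cleanup_pop(1);    /* never reached, but push and pop must be paired */
    return NULL;
}

/* Returns 1 if the thread was cancelled and its cleanup handler ran. */
int run_cancel_demo(void)
{
    pthread_t tid;
    void *status = NULL;
    int rc;

    rc = pthread_create(&tid, NULL, cancellable_worker, NULL);
    assert(rc == 0);
    rc = pthread_cancel(tid);          /* only a request; the thread exits at its next poll */
    assert(rc == 0);
    rc = pthread_join(tid, &status);   /* a cancelled thread yields PTHREAD_CANCELED */
    assert(rc == 0);
    (void)rc;
    return (status == PTHREAD_CANCELED && cleaned_up) ? 1 : 0;
}
```

Locks held at a cancellation point can be released the same way, by pushing a handler that unlocks the mutex.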
26. Signals
It is always best to avoid using signals in conjunction with threads. At the same time, it is often not possible or practical to keep them separate. When signals and threads meet, beware. If at all possible, use only pthread_sigmask to mask signals in the main thread, and sigwait to handle signals synchronously within a single thread dedicated to that purpose. If you must use sigaction (or equivalent) to handle synchronous signals (such as SIGSEGV) within threads, be especially cautious. Do as little work as possible within the signal-catching function.
All signal actions are process-wide. A program must coordinate any use of sigaction between threads.
While modifying the process signal action for a signal number is itself thread-safe, there is no protection against some other thread setting a new signal action immediately afterward.
Signals that are not "tied" to a specific hardware execution context are delivered to one arbitrary thread within the process. That means a SIGCHLD raised by a child process termination, for example, may not be delivered to the thread that created the child.
The synchronous "hardware context" signals, including SIGFPE, SIGSEGV, and SIGTRAP, are delivered to the thread that caused the hardware condition, never to another thread.
You cannot kill a thread by sending it a SIGKILL or stop a thread by sending it a SIGSTOP.
Signal actions must always be under the control of a single component, and assigning that responsibility to the main program makes the most sense in nearly all situations.
Each thread has its own private signal mask, which is modified by calling pthread_sigmask.
A thread can block or unblock signals without affecting the ability of other threads to handle the signal.
When a thread is created, it inherits the signal mask of the thread that created it--if you want a signal to be masked everywhere, mask it first thing in main.
Within a process, one thread can send a signal to a specific thread (including itself) by calling pthread_kill. When calling pthread_kill, you specify not only the signal number to be delivered, but also the pthread_t identifier for the thread to which you want the signal sent. You cannot use pthread_kill to send a signal to a thread in another process, however, because a thread identifier (pthread_t) is meaningful only within the process that created it.
The signal sent by pthread_kill is handled like any other signal. If the "target" thread has the signal masked, it will be marked pending against that thread. If the thread is waiting for the signal in sigwait , the thread will receive the signal. If the thread does not have the signal masked, and is not blocked in sigwait, the current signal action will be taken.
Remember that, aside from signal-catching functions, signal actions affect the process. Sending the SIGKILL signal to a specific thread using pthread_kill will kill the process, not just the specified thread. Use pthread_cancel to get rid of a particular thread. Sending SIGSTOP to a thread will stop all threads in the process until a SIGCONT is sent by some other process.
Pthreads specifies that raise(SIGABRT) is the same as pthread_kill(pthread_self(), SIGABRT).
Remember, you cannot use a mutex within a signal-catching function.
Always use sigwait to work with asynchronous signals within threaded code. The signals for which you sigwait must be masked in the sigwaiting thread, and should usually be masked in all threads. Signals are delivered only once. If two threads are blocked in sigwait, only one of them will receive a signal that's sent to the process.
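A sketch of the dedicated signal-thread pattern, assuming POSIX threads and signals. SIGUSR1 is masked before any thread is created so that every thread inherits the mask, and one thread then collects the signal synchronously with sigwait. The choice of SIGUSR1 and the names signal_thread and run_sigwait_demo are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Dedicated thread: waits synchronously for any signal in the set. */
static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    static int received;     /* static so the pointer stays valid after the thread exits */

    if (sigwait(set, &received) != 0)
        received = -1;
    return &received;
}

/* Returns the signal number collected by the dedicated thread. */
int run_sigwait_demo(void)
{
    sigset_t set;
    pthread_t tid;
    void *result = NULL;
    int rc;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    /* Mask first, so the new thread inherits the mask and no thread ever
       takes the default action for SIGUSR1. */
    rc = pthread_sigmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);

    rc = pthread_create(&tid, NULL, signal_thread, &set);
    assert(rc == 0);

    /* The signal stays pending against the target thread until it calls sigwait. */
    rc = pthread_kill(tid, SIGUSR1);
    assert(rc == 0);

    rc = pthread_join(tid, &result);
    assert(rc == 0);
    (void)rc;
    return *(int *)result;
}
```

Because the signal is handled in ordinary thread context rather than in a signal-catching function, the dedicated thread is free to use mutexes and condition variables when it reacts.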