Table of Contents
1. Overview
2. Multicore Programming
2.1 Concurrency vs. Parallelism
2.2 Programming challenges
3. Multithreading Models
3.1 Threading support
3.2 User-level threads (ULT)
3.3 Kernel-level threads (KLT)
3.4 Multithreading Models
4. Thread Libraries
5. Implicit threading
5.1 Managing threads (explicit threading)
5.2 Implicit threading
6. Threading issues / Designing multithreaded programs
6.1 Threading issues
6.2 fork() system call
6.3 exec() system call
6.4 Semantics of fork() and exec()
6.5 Signal Handling
6.6 Thread Cancellation
OS view: A thread is an independent stream of instructions that can be scheduled to run by the OS.
Software developer view: A thread can be considered a “procedure” that runs independently of the main program.
Benefits of threads
Examples
A user typing in Word: opening a file and entering text (one thread), automatically formatting the text (another thread), automatically detecting spelling errors (another thread), and automatically saving the file to disk (another thread).
Editing the file is a process; formatting, spell checking, and saving are threads within that process.
Threads are scheduled on a processor, and each thread can execute a set of instructions independently of other processes and threads.
The Thread Control Block (TCB) stores the information about a thread.
A thread shares its code section, data section, and other operating-system resources, such as open files and signals, with the other threads belonging to the same process.
A thread is a basic unit of CPU utilization; it comprises a thread ID, a program counter, a register set, and a stack.
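The split between shared and per-thread state can be sketched in Python. This is only an illustration, assuming CPython's threading module; the names shared_log, worker, and run_workers are hypothetical and not part of the notes. The module-level list plays the role of the shared data section, while each worker's local variable lives on that thread's own stack.

```python
import threading

shared_log = []                 # module-level data: visible to every thread
log_lock = threading.Lock()     # protects shared_log against concurrent appends


def worker(name, count):
    local_total = 0             # local variable: private to this thread's stack
    for i in range(count):
        local_total += i
    with log_lock:
        shared_log.append((name, local_total))


def run_workers():
    shared_log.clear()
    threads = [
        threading.Thread(target=worker, args=(f"t{n}", n + 1)) for n in range(3)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Sort because the order in which threads append is not deterministic.
    return sorted(shared_log)
```

Every thread writes into the same list, but no thread can see another thread's local_total.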
Thread states:
Whether the cores appear across CPU chips or within CPU chips, we call these systems multicore or multiprocessor systems. Multithreaded programming provides a mechanism for more efficient use of these multiple computing cores and improved concurrency.
Concurrent execution on a single-core system: Concurrency means multiple tasks start, run, and complete in overlapping time periods, in no specific order.
Parallelism on a multi-core system: A system is parallel if it can perform more than one task simultaneously.
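The difference between concurrency and parallelism:
Concurrency means a single processor makes progress on multiple tasks during the same time period by interleaving them.
Parallelism means multiple processors, or a multicore processor, execute multiple different tasks at the same instant.
Concurrency is simultaneity in the logical sense, whereas parallelism is simultaneity in the physical sense.
An analogy: concurrency is one person eating three steamed buns at once, taking bites from each in turn, while parallelism is three people eating three steamed buns at the same time.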
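The idea of tasks whose lifetimes overlap can be sketched with a few Python threads. This is a minimal illustration with hypothetical names (run_overlapping, task), assuming CPython's threading module. A barrier forces every task to start before any task finishes, so the tasks are concurrent by construction. Whether they also run in parallel depends on the number of cores and on the runtime; in standard CPython builds the global interpreter lock prevents Python bytecode from running on several cores at once.

```python
import threading


def run_overlapping(n_tasks=3):
    events = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_tasks)

    def task(name):
        with lock:
            events.append(("start", name))
        barrier.wait()          # nobody continues until every task has started
        with lock:
            events.append(("end", name))

    threads = [
        threading.Thread(target=task, args=(f"task{i}",)) for i in range(n_tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

In the returned log, every "start" entry precedes every "end" entry, which is exactly the overlapping-lifetime property; the order of names within each group is left to the scheduler.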
In general, five areas present challenges in programming for multicore systems: identifying tasks, balance, data splitting, data dependency, and testing and debugging.
User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the operating system. Virtually all contemporary operating systems, including Windows, Linux, Mac OS X, and Solaris, support kernel threads.
Multithreading can be supported by
Suppose a user process wants to create one or more threads. The kernel can create one or more threads for the process. Even if a kernel does not support threading, it can create one thread per process (i.e., it can create a process that is a single thread of execution).
User thread: A user thread is a unit of execution that is implemented in user space; the kernel is not aware of the existence of these threads.
User-level threads are much faster to create and switch between than kernel-level threads, because all thread management is done by the application through a thread library, without entering the kernel.
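To make "managed by a library, invisible to the kernel" concrete, here is a toy user-level thread library built on Python generators. It is only a sketch under simplifying assumptions: the names user_thread, run_user_threads, and demo are hypothetical, scheduling is cooperative round-robin, and everything runs inside a single kernel thread.

```python
from collections import deque


def user_thread(name, steps, trace):
    """A user-level thread: a generator that yields to give up the CPU."""
    for i in range(steps):
        trace.append(f"{name}:{i}")
        yield                   # voluntarily hand control back to the scheduler


def run_user_threads(threads):
    """Round-robin scheduler living entirely in user space."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)       # run the thread until its next yield
            ready.append(current)
        except StopIteration:
            pass                # thread finished; drop it from the ready queue


def demo():
    trace = []
    run_user_threads([user_thread("A", 2, trace), user_thread("B", 3, trace)])
    return trace                # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Switching between A and B is just a function return and a call, which is why it is cheap. The same design also shows the main weakness of pure user-level threads: if one of them made a blocking system call, the whole scheduler loop, and therefore every other user thread, would stall.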
Advantages and disadvantages of ULT
Advantages
Disadvantages
Kernel thread: A kernel thread is a unit of execution that is scheduled by the kernel to execute on the CPU. Kernel threads are handled directly by the operating system, and thread management is done by the kernel.
Advantages and disadvantages of KLT
Advantages
Disadvantages
Examples: Solaris
A process includes the user’s address space, stack, and process control block.
User-level threads (threads library): 1) they are invisible to the OS; 2) they are the interface for application parallelism.
Kernel threads: the unit that can be dispatched on a processor.
Lightweight processes (LWP): a layer between kernel threads and user threads. Each LWP supports one or more ULTs and maps to exactly one KLT.
Task 2 is equivalent to a pure ULT approach. Tasks 1 and 3 map one or more ULTs onto a fixed number of LWPs (and KLTs). Note how task 3 maps a single ULT to a single LWP bound to a CPU.
Finally, a relationship must exist between user threads and kernel threads: user-level threads have to be mapped onto kernel-level threads. In a combined system, multiple threads within the same application can run in parallel on multiple processors.
There are three types of multithreading models:
1. Many-to-One Model
Many user-level threads are mapped to a single kernel thread. The process can only run one user-level thread at a time because there is only one kernel-level thread associated with the process. Thread management is done in user space by a thread library. However, very few systems continue to use this model because of its inability to take advantage of multiple processing cores.
Examples: 1) Solaris Green Threads 2) GNU Portable Threads
2. One-to-One Model
Each user thread is mapped to one kernel thread. The kernel implements threading itself and can manage and schedule threads, so it is aware of every thread. This model provides more concurrency: when one thread blocks, another can run.
The only drawback to this model is that creating a user thread requires creating the corresponding kernel thread. Because the overhead of creating kernel threads can burden the performance of an application, most implementations of this model restrict the number of threads supported by the system. Linux, along with the family of Windows operating systems, implements the one-to-one model. Most mainstream operating systems use this model.
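One way to observe the one-to-one model from user code is to ask each thread for the identifier the kernel assigned to it. The sketch below assumes CPython 3.8 or later, where threading.get_native_id() is available on the major platforms and each threading.Thread is backed by an operating-system thread; collect_native_ids is a hypothetical helper name.

```python
import threading


def collect_native_ids(n_threads=4):
    ids = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def record():
        with lock:
            ids.append(threading.get_native_id())   # kernel-assigned thread ID
        barrier.wait()   # stay alive until every thread has recorded its ID

    threads = [threading.Thread(target=record) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids
```

The barrier keeps all threads alive at the same moment, so the kernel cannot reuse an ID; each user-visible thread therefore reports a distinct kernel thread ID.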
3. Many-to-Many Model
This model allows many user-level threads to be mapped to many kernel threads, and it allows the operating system to create a sufficient number of kernel threads. The number of kernel threads may be specific to either a particular application or a particular machine. The user can create any number of threads, and the corresponding kernel-level threads can run in parallel on a multiprocessor.
One variation on the many-to-many model still multiplexes many user level threads to a smaller or equal number of kernel threads but also allows a user-level thread to be bound to a kernel thread. This variation is sometimes referred to as the two-level model.
No matter which kind of thread is implemented, threads can be created, used, and terminated via a set of functions that are part of a thread API (a thread library).
A thread library provides the programmer with an API (Application Programming Interface) for creating and managing threads.
Three primary thread libraries: POSIX threads, Java threads, Win32 threads.
Thread creation
Synchronous: the parent thread waits for its child threads to finish; this usually involves significant data sharing.
Asynchronous: the parent thread creates a child thread and the two then run independently; little or no data sharing is needed.
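The two creation strategies can be sketched with Python's threading module. The function names below (square_into, synchronous_sum_of_squares, start_background) are hypothetical and chosen only for this illustration.

```python
import threading


def square_into(results, index, value):
    results[index] = value * value


def synchronous_sum_of_squares(values):
    """Synchronous threading: the parent waits for every child, then combines results."""
    results = [None] * len(values)
    children = [
        threading.Thread(target=square_into, args=(results, i, v))
        for i, v in enumerate(values)
    ]
    for child in children:
        child.start()
    for child in children:
        child.join()            # parent blocks here until the child terminates
    return sum(results)


def start_background(sink, message):
    """Asynchronous threading: the parent starts the child and returns at once."""
    child = threading.Thread(target=sink.append, args=(message,), daemon=True)
    child.start()
    return child                # no join: parent and child proceed independently
```

In the synchronous version the children share the results list with the parent, which is why the parent must wait before reading it. In the asynchronous version the parent never reads anything back, so it has no reason to wait.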
Two approaches for implementing thread library:
1) To provide a library entirely in user space with no kernel support
2) To implement a kernel-level library supported directly by the operating system.
1) Explicit threading - the programmer creates and manages threads.
2) Implicit threading - compilers and run-time libraries create and manage threads.
Three alternative approaches for designing multithreaded programs: thread pools, OpenMP, and Grand Central Dispatch (GCD).
Other commercial approaches include parallel and concurrent libraries, such as Intel’s Threading Building Blocks (TBB) and several products from Microsoft. The Java language and API have seen significant movement toward supporting concurrent programming as well. A notable example is the java.util.concurrent package, which supports implicit thread creation and management.
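As an illustration of implicit threading in another language, Python's concurrent.futures module lets the program describe the work while the library creates, reuses, and joins the threads. The sketch below is not taken from the notes; word_lengths is a hypothetical function name and the pool size of 4 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def word_lengths(words, max_workers=4):
    # The pool decides which worker thread runs each call; the caller never
    # creates, starts, or joins a thread explicitly.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(len, words))   # results come back in input order
```

For example, word_lengths(["thread", "pool", "api"]) returns [6, 4, 3]. Leaving the with block shuts the pool down and waits for its worker threads.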
There are a variety of issues to consider with multithreaded programming
The fork() system call is used to create a separate, duplicate process. The semantics of the fork() and exec() system calls change in a multithreaded program.
Creating a new process is done with the fork() system call. The newly created process is called the child process, and the process that called fork() is called the parent process.
If a thread invokes the exec() system call, the program specified in the parameter to exec() will replace the entire process, including all threads. The exec() family of system calls replaces the program image of the currently running process with a new program.
The original process identifier remains the same, but the internal details, such as the stack, data, and instructions, are replaced by those of the new program.
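The usual fork-then-exec pattern, and the fact that the process ID survives exec(), can be checked with a short Python sketch. It assumes a Unix-like system; fork_then_exec is a hypothetical helper, and the child simply re-executes the Python interpreter with a one-line program that prints its own PID.

```python
import os
import sys


def fork_then_exec():
    read_fd, write_fd = os.pipe()
    pid = os.fork()                     # parent gets the child's PID, child gets 0
    if pid == 0:
        try:
            os.dup2(write_fd, 1)        # child's stdout now feeds the pipe
            os.execv(
                sys.executable,
                [sys.executable, "-c", "import os; print(os.getpid())"],
            )
        finally:
            os._exit(127)               # reached only if exec itself failed
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        reported_pid = int(pipe.read().strip())
    _, status = os.waitpid(pid, 0)      # reap the child
    return pid, reported_pid, status
```

The PID that fork() returned to the parent equals the PID printed by the new program, even though the child's code, data, and stack were all replaced by exec().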
Does fork() duplicate only the calling thread or all threads? How should we implement fork?
The logical thing to do is:
1) If exec() will be called after fork(), there is no need to duplicate the threads. They will be replaced anyway.
2) If exec() will not be called, then it is logical to duplicate the threads as well; so that the child will have as many threads as the parent has.
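Which of the two versions of fork() to use depends on the application. If exec() is called immediately after forking, duplicating all threads is unnecessary, because the program specified in the parameters to exec() will replace the process. In this case, duplicating only the calling thread is appropriate. If, however, the separate process does not call exec() after forking, the separate process should duplicate all threads.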
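Which version a given system provides can be observed directly. On POSIX systems, fork() called from a multithreaded process creates a child that contains only a copy of the calling thread. The sketch below assumes a Unix-like system and CPython; threads_in_child_after_fork is a hypothetical helper. It silences the DeprecationWarning that newer CPython releases emit when fork() is used while other threads exist.

```python
import os
import threading
import warnings


def threads_in_child_after_fork():
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait)   # second thread, parked until stop
    helper.start()
    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        try:
            # Child: report how many threads exist in this new process.
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_count = int(pipe.read())
    os.waitpid(pid, 0)
    parent_count = threading.active_count()
    stop.set()
    helper.join()
    return parent_count, child_count
```

The parent reports at least two threads, while the child reports one: the helper thread was not duplicated. This is the "duplicate only the calling thread" behavior, which is sufficient when exec() follows immediately.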
A signal is used in UNIX systems to notify a process that a particular event has occurred. A signal may be received either synchronously or asynchronously, depending on the source of and the reason for the event being signaled. All signals, whether synchronous or asynchronous, follow the same pattern: a signal is generated by the occurrence of a particular event, the signal is delivered to a process, and once delivered, the signal must be handled.
There are several signals available in the Unix system. The signal is handled by a signal handler (all signals are handled exactly once).
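Installing a handler and delivering a signal can be sketched with Python's signal module. The example assumes a Unix-like system (SIGUSR1 does not exist on Windows) and that it runs in the main thread, which is where CPython requires handlers to be installed; deliver_sigusr1 is a hypothetical helper name.

```python
import signal


def deliver_sigusr1():
    received = []

    def handler(signum, frame):
        received.append(signum)         # runs once for the delivered signal

    previous = signal.signal(signal.SIGUSR1, handler)   # install the handler
    try:
        signal.raise_signal(signal.SIGUSR1)             # generate the signal
    finally:
        # Restore whatever was installed before; fall back to the default action.
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGUSR1, restore)
    return received
```

The process sends the signal to itself, so this is a synchronous delivery; a signal sent by another process, for example with the kill command, would arrive asynchronously and be handled by the same handler.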
Thread cancellation means terminating a thread before it has finished. This is needed in various cases, for example when the user clicks cancel while a page is loading.
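Two general approaches: asynchronous cancellation, in which one thread immediately terminates the target thread, and deferred cancellation, in which the target thread periodically checks whether it should terminate.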
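Deferred cancellation, where the target thread checks at safe points whether it has been asked to stop, can be sketched with a threading.Event. Python's threading module offers no call that forcibly kills a thread, so this cooperative style is the usual choice there. The names cancellable_worker and run_and_cancel are hypothetical.

```python
import threading


def cancellable_worker(cancel_requested, progress):
    while not cancel_requested.is_set():     # cancellation point
        progress.append(len(progress))       # one unit of work
        cancel_requested.wait(0.01)          # pause, but wake early if cancelled
    progress.append("cleaned up")            # release resources before exiting


def run_and_cancel():
    cancel_requested = threading.Event()
    progress = []
    worker = threading.Thread(
        target=cancellable_worker, args=(cancel_requested, progress)
    )
    worker.start()
    cancel_requested.set()                   # request cancellation
    worker.join(timeout=5)
    return worker.is_alive(), progress
```

Because the worker only stops at its own check, it always gets the chance to run its cleanup step, which is the main advantage of the deferred approach over terminating the thread immediately.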
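These are class notes that took some effort to put together; if anything here is inaccurate or one-sided, corrections are very welcome.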