Operating Systems: Threads

Contents

1. Overview

2. Multicore Programming

2.1 Concurrency vs. Parallelism

2.2 Programming challenges

3. Multithreading Models

3.1 Threading support

3.2 User-level threads (ULT)

3.3 Kernel-level threads (KLT)

3.4 Multithreading Models

4. Thread Libraries

5. Implicit threading

5.1 Managing threads

5.2 Implicit threading

6. Threading issues / Designing multithreaded programs

6.1 Threading issues

6.1 fork() system call

6.2 exec() system call

6.3 Semantics of fork() and exec()

6.4 Signal Handling

6.5 Thread Cancellation


1. Overview 

OS view: A thread is an independent stream of instructions that can be scheduled to run by the OS.

Software developer view: A thread can be considered as a "procedure" that runs independently from the main program.

  • Sequential program: a single stream of instructions in a program.
  • Multi-threaded program: a program with multiple streams. Multiple threads are needed to make use of multiple cores/CPUs.


 Benefits of threads

  1. Takes less time to create a new thread than a process.
  2. Less time to terminate a thread than a process.
  3. Switching between two threads takes less time than switching between processes.
  4. Threads make communication more efficient: threads of the same process share memory and files, so they can communicate without invoking the kernel.

Examples

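A user typing in Word: opening a file and entering text is one thread, automatic formatting of the text is another thread, automatic spell checking is another thread, and automatically saving the file to disk is yet another thread.

The word processor editing the file is the process; formatting, spell checking, and saving are threads within it.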


Threads are scheduled on a processor, and each thread can execute a set of instructions independently of other processes and threads.

The Thread Control Block (TCB) stores the information about a thread.


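A thread shares its code section, data section, and other operating-system resources, such as open files and signals, with the other threads belonging to the same process.

A thread is a basic unit of CPU utilization; it comprises a thread ID, a program counter, a register set, and a stack.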
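The split between shared and private state can be sketched in Python. This is a minimal illustration, not taken from the course material: the names `shared_log`, `worker`, and `run_workers` are made up for the example. The module-level list plays the role of the shared data section, while each thread's local variable `total` lives on that thread's own stack.

```python
import threading

shared_log = []                  # shared data: one copy, visible to every thread
log_lock = threading.Lock()      # protects shared_log against concurrent updates


def worker(name, count):
    # 'total' is a local variable: it lives on this thread's own stack.
    total = 0
    for i in range(count):
        total += i
    with log_lock:
        shared_log.append((name, total))


def run_workers():
    shared_log.clear()
    threads = [
        threading.Thread(target=worker, args=(f"t{i}", 10 * (i + 1)))
        for i in range(3)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log)


if __name__ == "__main__":
    print(run_workers())
```

Every thread appends to the same list, but no thread can see another thread's `total`.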


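Thread states:

  1. Undefined: a thread that has not been created yet is in the undefined state.
  2. Ready: a thread that is ready to run is in the ready state.
  3. Running: on a given core, only one thread can be in the running state at any moment.
  4. Suspended: when a running thread is suspended, it enters the suspended state; calling resume() later returns it to the ready state.
  5. Terminated & Destroyed: a thread in the terminated state has its resources released and then enters the destroyed state.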

2. Multicore Programming

Whether the cores appear across CPU chips or within CPU chips, we call these systems multicore or multiprocessor systems. Multithreaded programming provides a mechanism for more efficient use of these multiple computing cores and improved concurrency.

2.1 Concurrency vs. Parallelism

Concurrent execution on a single-core system: Concurrency means multiple tasks which start, run, and complete in overlapping time periods, in no specific order.


Parallelism on a multi-core system: A system is parallel if it can perform more than one task simultaneously.


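The difference between concurrency and parallelism:

Concurrency means a single processor handles multiple tasks over the same period by interleaving them.
Parallelism means multiple processors, or a multicore processor, execute multiple different tasks at the same instant.
Concurrency is logically simultaneous, whereas parallelism is physically simultaneous.
An analogy: concurrency is one person eating three steamed buns by taking bites of each in turn; parallelism is three people each eating a bun at the same time.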

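2.2 Programming challenges

In general, five areas present challenges in programming for multicore systems:

  1. Identifying tasks. This involves examining the application to find areas that can be divided into separate, concurrent tasks. Ideally, tasks are independent of one another and can therefore run in parallel on individual cores.
  2. Balance. While identifying tasks that can run in parallel, programmers must also ensure that the tasks perform work of equal value.
  3. Data splitting. Just as the application is divided into separate tasks, the data accessed and manipulated by the tasks must be divided to run on separate cores.
  4. Data dependency. The data accessed by the tasks must be examined for dependencies between two or more tasks. When one task depends on data from another, programmers must ensure that the execution of the tasks is synchronized to accommodate the data dependency.
  5. Testing and debugging. When a program runs in parallel on multiple cores, many different execution paths are possible. Testing and debugging such concurrent programs is inherently more difficult than testing and debugging single-threaded applications.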
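As a rough illustration of task identification, balance, data splitting, and data dependency, the sketch below sums a list by cutting it into near-equal slices, one per thread. The helper names `split_evenly` and `parallel_sum` are invented for this example. Note the assumption: in CPython the global interpreter lock keeps these threads from executing Python bytecode truly in parallel, so the code shows the structure of the decomposition rather than a speed-up.

```python
import threading


def split_evenly(data, parts):
    """Data splitting + balance: cut data into `parts` slices of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, parts=4):
    chunks = split_evenly(data, parts)
    partial = [0] * parts          # one slot per task, so tasks never share a slot

    def task(index):
        partial[index] = sum(chunks[index])

    threads = [threading.Thread(target=task, args=(i,)) for i in range(parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                   # data dependency: the total needs every partial sum
    return sum(partial)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))
```

The only dependency between tasks is the final combination step, which is why the parent joins every thread before adding up the partial results.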

3. Multithreading Models

3.1 Threading support

User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the operating system. Virtually all contemporary operating systems, including Windows, Linux, Mac OS X, and Solaris, support kernel threads.

Multithreading can be supported by

  • User level libraries (without Kernel being aware of it): Library creates and manages threads (user level implementation)
  • Kernel level - Kernel itself: Kernel creates and manages threads (kernel space implementation)

Suppose a user process wants to create one or more threads. The kernel can create one (or more) thread(s) for the process. Even if a kernel does not support threading, it can still create one thread per process (i.e., it can create a process that is a single thread of execution).

3.2 User-level threads (ULT)

User thread: a unit of execution implemented in user space; the kernel is not aware of the existence of these threads.

Creating and switching user-level threads is generally much faster than for kernel-level threads, because all thread management is done by the application through a thread library.

Advantages and disadvantages of ULT 

Advantages

  • Thread switching does not involve the kernel: there is no mode switch, so it is fast.
  • Scheduling can be application specific: choose the best algorithm for the situation.
  • Can run on any OS. We only need a thread library.

Disadvantages

  • Most system calls block the calling process. So when one thread makes a blocking call, all threads within that process are implicitly blocked.
  • The kernel can only assign processors to processes. Two threads within the same process cannot run simultaneously on two processors.
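To make the idea of a purely user-space thread library concrete, here is a toy sketch that uses Python generators as user-level threads and a round-robin loop as the scheduler. It is an analogy under stated assumptions, not how any real ULT library is implemented; `ult`, `run_round_robin`, and `demo` are hypothetical names.

```python
from collections import deque


def ult(name, steps, trace):
    """A user-level 'thread': each yield is a voluntary switch point."""
    for i in range(steps):
        trace.append(f"{name}:{i}")
        yield                      # hand control back to the user-space scheduler


def run_round_robin(threads):
    """User-space scheduler: the kernel sees only the one thread running this loop."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # the 'context switch' is an ordinary function call
            ready.append(current)  # still has work: back to the ready queue
        except StopIteration:
            pass                   # this user-level thread has finished


def demo():
    trace = []
    run_round_robin([ult("A", 2, trace), ult("B", 3, trace)])
    return trace


if __name__ == "__main__":
    print(demo())
```

Switching needs no kernel involvement, and the scheduling policy is whatever the application chooses. Both disadvantages show up too: if any `ult` made a blocking call, the whole loop would stall, and the two "threads" can never run on two processors at once.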

3.3 Kernel-level threads (KLT)

Kernel thread: a unit of execution that is scheduled by the kernel to execute on the CPU. Kernel threads are handled directly by the operating system, and thread management is done by the kernel.

Advantages and disadvantages of KLT

Advantages 

  • The kernel can schedule multiple threads of the same process on multiple processors.
  • Blocking at thread level, not process level (If a thread blocks, the CPU can be assigned to another thread in the same process).
  • Even the kernel routines can be multithreaded.

Disadvantages

  • Thread switching always involves the kernel. This means 2 mode switches per thread switch.
  • So, it is slower compared to User Level Threads (But faster than a full process switch).

Examples: Solaris

A process includes the user's address space, stack, and process control block.

User-level threads (threads library): 1) are invisible to the OS; 2) are the interface for application parallelism.

Kernel threads: the unit that can be dispatched on a processor.

Lightweight processes (LWP): a layer between kernel threads and user threads. Each LWP supports one or more ULTs and maps to exactly one KLT.

Task 2 is equivalent to a pure ULT approach. Tasks 1 and 3 map one or more ULTs onto a fixed number of LWPs (and KLTs). Note how task 3 maps a single ULT to a single LWP bound to a CPU.

3.4 Multithreading Models

Finally, a relationship must exist between user threads and kernel threads: user-level threads have to be mapped to kernel-level threads. In a combined system, multiple threads within the same application can run in parallel on multiple processors.

There are three types of multithreading models:

1. Many-to-One Model 

Many user-level threads are mapped to a single kernel thread. The process can only run one user-level thread at a time because there is only one kernel-level thread associated with the process. Thread management is done in user space by a thread library. However, very few systems continue to use this model because of its inability to take advantage of multiple processing cores.


Examples: 1) Solaris Green Threads 2) GNU Portable Threads

2. One-to-One Model

Each user thread is mapped to one kernel thread. The kernel implements threading and can manage and schedule threads, so the kernel is aware of them. This model provides more concurrency: when one thread blocks, another can run.


 

The only drawback to this model is that creating a user thread requires creating the corresponding kernel thread. Because the overhead of creating kernel threads can burden the performance of an application, most implementations of this model restrict the number of threads supported by the system. Linux, along with the family of Windows operating systems, implements the one-to-one model, as do most mainstream operating systems.

3. Many-to-Many Model

This model allows many user-level threads to be mapped to many kernel threads, and allows the operating system to create a sufficient number of kernel threads. The number of kernel threads may be specific to either a particular application or a particular machine. The user can create any number of threads, and the corresponding kernel-level threads can run in parallel on a multiprocessor.


 

One variation on the many-to-many model still multiplexes many user level threads to a smaller or equal number of kernel threads but also allows a user-level thread to be bound to a kernel thread. This variation is sometimes referred to as the two-level model.



4. Thread Libraries

No matter which threading model is implemented, threads can be created, used, and terminated via a set of functions that are part of a thread API (a thread library).

A thread library provides the programmer with an API (Application Programming Interface) for creating and managing threads.

  • The programmer only has to know the thread library interface (API).
  • Threads may be implemented in user space or kernel space.
  • The library may be entirely in user space or may get kernel support for threading.

Three primary thread libraries: POSIX threads, Java threads, Win32 threads.

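Thread creation

Synchronous: the parent thread waits for its child threads to finish before it continues; this usually involves substantial data sharing.

Asynchronous: the parent thread creates a child thread and the two then run independently of each other; there is usually little data sharing.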
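The two creation strategies can be sketched with Python's `threading` module. This is an illustrative example only; the function names `square_into`, `synchronous_sum_of_squares`, and `asynchronous_start` are made up here.

```python
import threading


def square_into(results, index, value):
    results[index] = value * value


def synchronous_sum_of_squares(values):
    """Synchronous style: the parent waits for every child before continuing."""
    results = [None] * len(values)
    children = [
        threading.Thread(target=square_into, args=(results, i, v))
        for i, v in enumerate(values)
    ]
    for child in children:
        child.start()
    for child in children:
        child.join()               # parent blocks here until the child terminates
    return sum(results)            # safe: every child has written its result


def asynchronous_start(task):
    """Asynchronous style: the parent starts the child and carries on immediately."""
    child = threading.Thread(target=task, daemon=True)
    child.start()
    return child                   # no join: parent and child run independently


if __name__ == "__main__":
    print(synchronous_sum_of_squares([1, 2, 3]))
```

In the synchronous version the parent combines the children's results, which is where the data sharing comes from; in the asynchronous version the parent never looks at what the child produces.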

Two approaches for implementing thread library:

1) To provide a library entirely in user space with no kernel support

  • all code and data structures for the library exist in user space.
  • invoking a function in the library results in a local function call in user space and not a system call.

2) To implement a kernel-level library supported directly by the operating system.

  • code and data structures for the library exist in kernel space
  • invoking a function in the API for the library typically results in a system call to the kernel.

5. Implicit threading 

5.1 Managing threads

1) Explicit threading - the programmer creates and manages threads.

2) Implicit threading - the compilers and run-time libraries create and manage threads.

5.2 Implicit threading

Three alternative approaches for designing multithreaded programs:

  • Thread pool - create a number of threads at process startup and place them into a pool, where they sit and wait for work.
  • OpenMP - a set of compiler directives available for C, C++, and Fortran programs that instruct the compiler to automatically generate parallel code where appropriate.
  • Grand Central Dispatch (GCD) - an extension to C and C++ available on Apple's Mac OS X and iOS operating systems to support parallelism.

Other commercial approaches include parallel and concurrent libraries, such as Intel's Threading Building Blocks (TBB) and several products from Microsoft. The Java language and API have seen significant movement toward supporting concurrent programming as well. A notable example is the java.util.concurrent package, which supports implicit thread creation and management.
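A thread pool can be tried out with Python's standard `concurrent.futures.ThreadPoolExecutor`. The sketch below is only an illustration; `word_count` and `count_words` are hypothetical helpers. One difference from the description above: this executor starts its worker threads on demand as work arrives, up to `max_workers`, rather than creating them all at startup.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(line):
    return len(line.split())


def count_words(lines, workers=4):
    # The pool creates and reuses up to `workers` threads; we only submit work.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(word_count, lines))   # results keep the input order


if __name__ == "__main__":
    print(count_words(["a b c", "", "hello world"]))
```

The caller never creates, starts, or joins a thread; that is what makes the threading implicit.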


6. Threading issues / Designing multithreaded programs

6.1 Threading issues

There are a variety of issues to consider with multithreaded programming:

  • Semantics of fork() and exec()
  • Signal handling
  • Thread cancellation

6.1 fork() system call 

The fork() system call is used to create a separate, duplicate process. The semantics of the fork() and exec() system calls change in a multithreaded program.

Note that fork() creates a new process, not a new thread inside the existing process. The newly created process is called the child, and the process that called fork() is called the parent.


 6.2 exec() system call

If a thread invokes the exec() system call, the program specified in the parameter to exec() will replace the entire process, including all threads. The exec() family of system calls replaces the program image of the calling process with a new program.

The process identifier remains the same, but the internal details, such as stack, data, and instructions, are replaced by those of the new program.


6.3 Semantics of fork() and exec()

Does fork() duplicate only the calling thread or all threads? How should we implement fork?

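The logical thing to do is:

1) If exec() will be called after fork(), there is no need to duplicate the threads. They will be replaced anyway.

2) If exec() will not be called, then it is logical to duplicate the threads as well, so that the child will have as many threads as the parent has.

Some UNIX systems provide both versions of fork(), and which one to use depends on the application. If exec() is called immediately after forking, duplicating all threads is unnecessary, because the program specified in the parameters to exec() will replace the process. In this case, duplicating only the calling thread is appropriate. If, however, the separate process does not call exec() after forking, the separate process should duplicate all threads.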
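The fork-then-exec pattern can be sketched with Python's `os` module on a POSIX system (this will not run on Windows). The function name `spawn_and_wait` is invented for the example. The POSIX fork() used here duplicates only the calling thread, which is the appropriate variant when exec() follows immediately.

```python
import os
import sys


def spawn_and_wait(argv):
    """fork() a child, exec() a new program in it, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists here.
        try:
            os.execvp(argv[0], argv)   # replaces the whole child process image
        finally:
            os._exit(127)              # reached only if exec failed
    # Parent: wait for the child and decode its termination status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exited with", code)
```

Because exec() replaces everything in the child, any threads copied by fork() would have been discarded at once, so copying them would be wasted work.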

6.4 Signal Handling

A signal is used in UNIX systems to notify a process that a particular event has occurred. A signal may be received either synchronously or asynchronously, depending on the source of and the reason for the event being signaled. All signals, whether synchronous or asynchronous, follow the same pattern:

  1. A signal is generated by the occurrence of a particular event.
  2. The signal is delivered to a process
  3. Once delivered, the signal must be handled

There are several signals available in UNIX systems. Each signal is handled by a signal handler (every signal is handled exactly once).

  • An asynchronous signal is generated by an event external to the process that receives it, for example the user pressing Ctrl+C or a timer expiring.
  • A synchronous signal is delivered to the same process that performed the operation that caused the signal, for example an illegal memory access or a division by zero.
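The generate, deliver, handle pattern can be observed with Python's `signal` module on a POSIX system. This is a small sketch with assumptions: it must run in the main thread (Python only allows handlers to be installed there), SIGUSR1 is chosen arbitrarily, and the process sends the signal to itself with `os.kill` to stand in for an outside sender. The names `received`, `handler`, and `send_signal_to_self` are made up.

```python
import os
import signal
import time

received = []                      # signal numbers seen by the handler


def handler(signum, frame):
    received.append(signum)        # step 3: the delivered signal is handled


def send_signal_to_self():
    received.clear()
    previous = signal.signal(signal.SIGUSR1, handler)   # install the handler
    try:
        os.kill(os.getpid(), signal.SIGUSR1)            # step 1: generate the signal
        deadline = time.monotonic() + 2.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)       # give the interpreter a chance to run the handler
        return list(received)
    finally:
        # Restore the earlier disposition (fall back to the default if unknown).
        signal.signal(
            signal.SIGUSR1,
            previous if previous is not None else signal.SIG_DFL,
        )


if __name__ == "__main__":
    print(send_signal_to_self())
```

In a multithreaded program the open question is which thread should receive the signal; CPython sidesteps it by always running Python-level handlers in the main thread.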

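6.5 Thread Cancellation

Thread cancellation means terminating a thread before it has finished. It is needed in various cases, for example when the user clicks Cancel while a page is still loading. The thread that is to be cancelled, i.e. the one that receives the cancellation request, is called the target thread.

Two general approaches:

  • Asynchronous cancellation terminates the target thread immediately.
  • Deferred cancellation allows the target thread to periodically check whether it should be cancelled, so that it can terminate itself in an orderly fashion.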
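Deferred cancellation is easy to sketch in Python with a `threading.Event` used as the cancellation flag. The names `worker` and `run_and_cancel` are hypothetical. Python's `threading` module offers no way to kill a thread from outside, so only the deferred approach is shown.

```python
import threading
import time


def worker(cancel_requested, progress):
    """Target thread: does work in small steps and polls for cancellation."""
    while not cancel_requested.is_set():   # cancellation point
        progress.append(len(progress))
        time.sleep(0.01)                   # stand-in for one unit of real work
    progress.append("cleaned up")          # orderly shutdown before exiting


def run_and_cancel(delay=0.05):
    cancel_requested = threading.Event()
    progress = []
    target = threading.Thread(target=worker, args=(cancel_requested, progress))
    target.start()
    time.sleep(delay)
    cancel_requested.set()                 # send the cancellation request
    target.join(timeout=5)
    return target.is_alive(), progress


if __name__ == "__main__":
    print(run_and_cancel())
```

Because the target thread only stops at a point it chose itself, it can release resources first. That is the main advantage over asynchronous cancellation, which may interrupt a thread in the middle of updating shared data.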

These are class notes that took some effort to compile; corrections are welcome if anything here is inaccurate.
