Processes, Threads, and Coroutines


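In an Alibaba interview on 22/3/1, the interviewer dug deep into how threads and coroutines work in my project (a server framework built on an N-thread, M-coroutine model). Unfortunately I only knew the basic concepts: in the project I had simply called a library, and all I really knew about coroutines was that they are lighter-weight than threads. This post collects some Stack Overflow discussions that cover processes, threads, and coroutines together.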

Differences Between Processes and Threads

From Stack Overflow

Answer 1 (covers the resources that processes and threads each own)

Process
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.

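In short: a process is the container that owns everything a program needs to run (virtual address space, executable code, open handles, security context, process ID, environment variables, priority class, working-set limits), and it always contains at least one thread. It starts with a single primary thread, and any of its threads can create more.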

Thread
A thread is an entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. The thread context includes the thread’s set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread’s process. Threads can also have their own security context, which can be used for impersonating clients.

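In short: a thread is the schedulable entity inside a process. All threads of a process share its virtual address space and system resources, while each thread keeps its own exception handlers, scheduling priority, thread-local storage, thread ID, and a saved context (machine registers, kernel stack, thread environment block, and a user stack inside the process's address space). A thread can also carry its own security context, which is used for impersonating clients.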
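The split between what threads share and what each thread owns can be sketched in Python. This is only an illustration under stated assumptions: the names `shared`, `tls`, `seen`, and `worker` are made up for the example, and Python's `threading.local` stands in for the thread-local storage mentioned in the quote.

```python
import os
import threading

shared = []                      # ordinary object in the process address space
tls = threading.local()          # thread-local storage: one private slot per thread
seen = {}                        # thread name -> (pid, thread id, thread-local value)
lock = threading.Lock()          # guards the two shared containers
barrier = threading.Barrier(3)   # keeps all three threads alive at the same moment

def worker(name):
    tls.value = name             # visible only to the current thread
    barrier.wait()               # all threads exist concurrently, so their ids differ
    with lock:
        shared.append(name)      # visible to every thread in the process
        seen[name] = (os.getpid(), threading.get_ident(), tls.value)

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))                       # every thread wrote to the same list
print({pid for pid, _, _ in seen.values()}) # a single process id
print(len({tid for _, tid, _ in seen.values()}))  # three distinct thread ids
print(hasattr(tls, "value"))                # False: the main thread never set its slot
```

All three workers report the same process ID and write into the same list, but each has its own thread ID and its own `tls.value`.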

Answer 2 (covers concurrency and parallelism)

An executing instance of a program is called a process.
Some operating systems use the term ‘task‘ to refer to a program that is being executed.
A process is always stored in the main memory also termed as the primary memory or random access memory.
Therefore, a process is termed as an active entity. It disappears if the machine is rebooted.
Several processes may be associated with the same program.
On a multiprocessor system, multiple processes can be executed in parallel.
On a uni-processor system, though true parallelism is not achieved, a process scheduling algorithm is applied and the processor is scheduled to execute each process one at a time yielding an illusion of concurrency.
Example: Executing multiple instances of the ‘Calculator’ program. Each of the instances are termed as a process.

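Process, in short:

  • A process is a running instance of a program (some operating systems call it a "task"). One program can back several processes, for example several running instances of "Calculator".
  • A process is an active entity that lives in main memory (RAM), so it is gone after a reboot.
  • On a multiprocessor system, multiple processes can run truly in parallel.
  • On a uniprocessor system, the scheduler runs the processes one at a time. That gives concurrency, but only the appearance of simultaneous execution, not true parallelism.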
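The "Calculator" example can be imitated with a few lines of Python. This is a rough sketch, not part of the quoted answer: `PROGRAM` and `launch_instances` are hypothetical names, and a one-line script stands in for the calculator.

```python
import os
import subprocess
import sys

# The same tiny "program", launched several times. Each launch is its own process.
PROGRAM = "import os; print(os.getpid())"

def launch_instances(count):
    """Start `count` instances of PROGRAM and return the PID each one reports."""
    procs = [
        subprocess.Popen([sys.executable, "-c", PROGRAM],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(count)
    ]
    # communicate() waits for the child to exit and collects its stdout.
    return [int(p.communicate()[0]) for p in procs]

pids = launch_instances(3)
print(pids)                 # three different process ids
print(os.getpid() in pids)  # False: the launcher is yet another process
```

One program text, three processes: each instance gets its own process ID, and none of them is the launching process.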

Thread:
A thread is a subset of the process.
It is termed as a ‘lightweight process’, since it is similar to a real process but executes within the context of a process and shares the same resources allotted to the process by the kernel.
Usually, a process has only one thread of control – one set of machine instructions executing at a time.
A process may also be made up of multiple threads of execution that execute instructions concurrently.
Multiple threads of control can exploit the true parallelism possible on multiprocessor systems.
On a uni-processor system, a thread scheduling algorithm is applied and the processor is scheduled to run each thread one at a time.
All the threads running within a process share the same address space, file descriptors, stack and other process related attributes.
Since the threads of a process share the same memory, synchronizing the access to the shared data within the process gains unprecedented importance.

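Thread, in short:

  • A thread is a subset of a process. It is called a "lightweight process" because it resembles a real process but runs inside one and shares the resources the kernel allotted to that process.
  • A process usually has a single thread of control, but it can also consist of several threads that execute concurrently.
  • Multiple threads can exploit the true parallelism of a multiprocessor system (this presumably refers to kernel-level threads). On a uniprocessor system, the thread scheduler runs them one at a time.
  • All threads in a process share its address space, file descriptors, and other process-related attributes. (The "stack" in the quote presumably means the stack region of the process's virtual memory; each thread actually has its own registers and its own stack.)
  • Because the threads of a process share memory, synchronizing access to shared data becomes critically important.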
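The last point, synchronizing access to shared data, can be sketched with a lock-protected counter. This is a minimal illustration; `Counter` and `run` are names invented for the example.

```python
import threading

class Counter:
    """A counter that several threads can increment safely."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `value += 1` is a read-modify-write sequence. The lock makes sure
        # only one thread performs it at a time, so no update is lost.
        with self._lock:
            self.value += 1

def run(n_threads, n_increments):
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

total = run(4, 10_000)
print(total)  # 40000
```

Without the lock, two threads may read the same old value and both write back the same new one, so the final count can come out lower than expected depending on the interpreter and timing.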

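The two answers above make the difference between threads and processes clear, and it matches the textbook definition: "a process is the basic unit of resource allocation; a thread is the basic unit of scheduling."

The defining trait of processes is that their resources are independent of one another. That gives isolation, and it also means processes must rely on some inter-process communication mechanism to talk to each other. A thread, by contrast, is a scheduling unit inside a process. Threads share the process's memory, so they can communicate directly through it. That convenience is exactly what creates the synchronization problem: reads and writes to shared resources must go through synchronization or mutual-exclusion mechanisms such as locks and semaphores.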
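The contrast between the two communication styles can be sketched as follows. The helper names `ask_child` and `ask_thread` are hypothetical, and a pipe is just one of several possible IPC mechanisms.

```python
import subprocess
import sys
import threading

# The child process: read everything from stdin, write the upper-cased text to stdout.
CHILD = "import sys; print(sys.stdin.read().upper(), end='')"

def ask_child(message):
    """Talk to a separate process. Data has to cross the boundary through pipes."""
    done = subprocess.run([sys.executable, "-c", CHILD],
                          input=message, capture_output=True, text=True, check=True)
    return done.stdout

def ask_thread(message):
    """Talk to a thread. It can write straight into an object we both see."""
    box = {}
    t = threading.Thread(target=lambda: box.update(reply=message.upper()))
    t.start()
    t.join()   # join() is the synchronization point before reading box
    return box["reply"]

print(ask_child("hello from the parent"))   # HELLO FROM THE PARENT
print(ask_thread("hello from the parent"))  # HELLO FROM THE PARENT
```

Both calls produce the same reply, but the process version serializes the data through a pipe, while the thread version only needs a shared dictionary plus a point of synchronization.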

Differences Between Threads and Coroutines (also called Fibers)

From Stack Overflow

Answer 1

In the most simple terms, threads are generally considered to be preemptive (although this may not always be true, depending on the operating system) while fibers are considered to be light-weight, cooperative threads. Both are separate execution paths for your application.
With threads: the current execution path may be interrupted or preempted at any time (note: this statement is a generalization and may not always hold true depending on OS/threading package/etc.). This means that for threads, data integrity is a big issue because one thread may be stopped in the middle of updating a chunk of data, leaving the integrity of the data in a bad or incomplete state. This also means that the operating system can take advantage of multiple CPUs and CPU cores by running more than one thread at the same time and leaving it up to the developer to guard data access.
With fibers: the current execution path is only interrupted when the fiber yields execution (same note as above). This means that fibers always start and stop in well-defined places, so data integrity is much less of an issue. Also, because fibers are often managed in the user space, expensive context switches and CPU state changes need not be made, making changing from one fiber to the next extremely efficient. On the other hand, since no two fibers can run at exactly the same time, just using fibers alone will not take advantage of multiple CPUs or multiple CPU cores.

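In short, both are separate execution paths within an application, but they are scheduled differently:

  • Threads are generally preemptive (depending on the operating system and threading package). The current execution path can be interrupted at almost any point, so a thread may be stopped halfway through an update and leave the data in an inconsistent state. In return, the operating system can run several threads at once on multiple CPUs or cores, and guarding data access is left to the developer.
  • Coroutines (fibers) are lightweight, cooperative threads. The current execution path is interrupted only when the coroutine yields, so coroutines start and stop at well-defined places and data integrity is much less of an issue (though not automatically a non-issue). Because they are usually managed in user space, switching between coroutines avoids expensive context switches and is extremely cheap. On the other hand, no two coroutines run at exactly the same time, so coroutines alone cannot take advantage of multiple CPUs or cores.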
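Cooperative scheduling can be sketched with Python generators and a tiny round-robin loop. This is a toy model rather than a real fiber library; `fiber` and `run_round_robin` are names made up for the example.

```python
from collections import deque

def fiber(name, steps, log):
    """A cooperative task: it runs until it reaches `yield`, and only there."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # the only point where this task can be switched out

def run_round_robin(fibers):
    """A minimal user-space scheduler: resume each task in turn until all finish."""
    ready = deque(fibers)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # task finished; drop it
        ready.append(current)      # otherwise put it back in the queue

log = []
run_round_robin([fiber("A", 2, log), fiber("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

The interleaving is fully determined by where the tasks yield, which is why data integrity is easier to reason about. Everything runs on a single thread, so the sketch also shows why this model alone cannot use a second core.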

Answer 2

First I would recommend reading this explanation of the difference between processes and threads as background material.
Once you’ve read that it’s pretty straightforward. Threads can be implemented either in the kernel, in user space, or the two can be mixed. Fibers are basically threads implemented in user space.
What is typically called a thread is a thread of execution implemented in the kernel: what’s known as a kernel thread. The scheduling of a kernel thread is handled exclusively by the kernel, although a kernel thread can voluntarily release the CPU by sleeping if it wants. A kernel thread has the advantage that it can use blocking I/O and let the kernel worry about scheduling. Its main disadvantage is that thread switching is relatively slow since it requires trapping into the kernel.
Fibers are user space threads whose scheduling is handled in user space by one or more kernel threads under a single process. This makes fiber switching very fast. If you group all the fibers accessing a particular set of shared data under the context of a single kernel thread and have their scheduling handled by a single kernel thread, then you can eliminate synchronization issues since the fibers will effectively run in serial and you have complete control over their scheduling. Grouping related fibers under a single kernel thread is important, since the kernel thread they are running in can be pre-empted by the kernel. This point is not made clear in many of the other answers. Also, if you use blocking I/O in a fiber, the entire kernel thread it is a part of blocks including all the fibers that are part of that kernel thread.
In section 11.4 “Processes and Threads in Windows Vista” in Modern Operating Systems, Tanenbaum comments:
Although fibers are cooperatively scheduled, if there are multiple threads scheduling the fibers, a lot of careful synchronization is required to make sure fibers do not interfere with each other. To simplify the interaction between threads and fibers, it is often useful to create only as many threads as there are processors to run them, and affinitize the threads to each run only on a distinct set of available processors, or even just one processor. Each thread can then run a particular subset of the fibers, establishing a one-to-many relationship between threads and fibers which simplifies synchronization. Even so there are still many difficulties with fibers. Most Win32 libraries are completely unaware of fibers, and applications that attempt to use fibers as if they were threads will encounter various failures. The kernel has no knowledge of fibers, and when a fiber enters the kernel, the thread it is executing on may block and the kernel will schedule an arbitrary thread on the processor, making it unavailable to run other fibers. For these reasons fibers are rarely used except when porting code from other systems that explicitly need the functionality provided by fibers.

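In short: the answer first recommends reading up on the difference between processes and threads as background.

Threads can be implemented in the kernel, in user space, or as a mix of the two. Coroutines (fibers) are essentially threads implemented in user space.

What people usually call a thread is a kernel thread. Its scheduling is handled entirely by the kernel, although it can give up the CPU voluntarily by sleeping. The advantage is that it can use blocking I/O and leave the scheduling to the kernel. The main disadvantage is that switching threads is relatively slow, because it requires trapping into the kernel.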

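Coroutines are user-space threads whose scheduling is done in user space by one or more kernel threads of a single process, which is why switching between them is so fast. If all coroutines that access a given set of shared data are grouped under one kernel thread and scheduled by it, they effectively run serially and the synchronization problem disappears (this presumably describes the 1-thread/N-coroutine case). The grouping matters because the kernel thread underneath can still be preempted by the kernel, a point many other answers do not make clear. Also, a coroutine that performs blocking I/O blocks its whole kernel thread, and with it every other coroutine on that thread.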
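The blocking-I/O pitfall can be sketched with Python's `asyncio`, where all coroutines of one event loop share a single thread. This is an illustration under assumptions: `task` and `run_pair` are hypothetical names, and `time.sleep` stands in for a blocking I/O call.

```python
import asyncio
import time

async def task(name, delay, log, blocking):
    log.append(f"{name} start")
    if blocking:
        time.sleep(delay)           # blocks the whole thread: no other coroutine can run
    else:
        await asyncio.sleep(delay)  # yields to the event loop while waiting
    log.append(f"{name} end")

async def run_pair(blocking):
    log = []
    # A waits much longer than B, so B should finish first if A really yields.
    await asyncio.gather(
        task("A", 0.2, log, blocking),
        task("B", 0.01, log, blocking),
    )
    return log

blocking_log = asyncio.run(run_pair(blocking=True))
cooperative_log = asyncio.run(run_pair(blocking=False))

print(blocking_log)     # ['A start', 'A end', 'B start', 'B end']
print(cooperative_log)  # ['A start', 'B start', 'B end', 'A end']
```

With the blocking call, B cannot even start until A has finished, because the thread that would run B is stuck inside A. With the cooperative wait, B starts and finishes while A is still waiting.
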
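
The answer then quotes Tanenbaum's comments from section 11.4, "Processes and Threads in Windows Vista", of Modern Operating Systems. In short: coroutines are scheduled cooperatively, but once several threads schedule them, a lot of careful synchronization is needed again to keep them from interfering with each other (which shows that an N-thread, M-coroutine model still requires synchronization). A common simplification is to create only as many threads as there are processors, bind each thread to its own set of processors (or to just one), and let each thread run a fixed subset of the coroutines. That gives a one-to-many relationship between threads and coroutines and simplifies synchronization. Difficulties remain, though. Most Win32 libraries are completely unaware of fibers, so applications that try to use fibers as if they were threads run into various failures. The kernel knows nothing about fibers either: when a fiber enters the kernel, the thread it runs on may block, and the kernel will then schedule an arbitrary thread on that processor, leaving it unavailable for the other fibers. For these reasons fibers are rarely used, except when porting code from other systems that explicitly need the functionality they provide.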
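The one-to-many arrangement (each thread runs its own fixed subset of coroutines) can be sketched with one `asyncio` event loop per thread. This is a simplified illustration: `fiber`, `run_subset`, and `run_partitioned` are hypothetical names, and the sketch does not pin threads to processors.

```python
import asyncio
import threading

async def fiber(name, placement):
    await asyncio.sleep(0)   # give the other coroutines of this thread a turn
    placement[name] = threading.current_thread().name

def run_subset(index, names, results):
    """Run one subset of coroutines on this thread's own private event loop."""
    placement = {}           # touched only by coroutines of this thread: no lock needed

    async def main():
        await asyncio.gather(*(fiber(n, placement) for n in names))

    asyncio.run(main())
    results[index] = placement   # each thread writes only its own slot

def run_partitioned(subsets):
    results = [None] * len(subsets)
    threads = [
        threading.Thread(target=run_subset, args=(i, names, results), name=f"worker-{i}")
        for i, names in enumerate(subsets)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    merged = {}
    for placement in results:
        merged.update(placement)
    return merged

placement = run_partitioned([["f0", "f1"], ["f2", "f3"]])
print(placement)
# {'f0': 'worker-0', 'f1': 'worker-0', 'f2': 'worker-1', 'f3': 'worker-1'}
```

Coroutines inside one subset run serially on their thread, so data private to the subset needs no lock. Anything shared across subsets would still need thread-level synchronization.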

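Key takeaway: multithreading can exploit multi-core CPUs and improve overall responsiveness across many events, but thread scheduling is decided by the operating system, so reads and writes to the same data must be synchronized. Coroutines generally run in user mode and avoid the expensive context switch: a thread switch has to enter the kernel, which adds a user/kernel mode transition on top of switching the thread's private stack and registers, whereas a coroutine switch only swaps the CPU context [Note 1] and happens entirely in user mode. However, coroutines can only run on top of threads, and a single blocking coroutine blocks its whole thread. Running M coroutines on just one thread therefore cannot use all the cores of a multi-core CPU, which is what motivates the N-thread, M-coroutine model (also called N:M threading or N:M scheduling). So outside of specific scenarios, whatever a multi-coroutine model can do, plain multithreading can do as well, and whatever N threads with M coroutines can do, a Reactor model can do too.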

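[Note 1] Coroutines come in two flavors, stackful and stackless. Switching a stackful coroutine swaps its private call stack in addition to the CPU registers, which means a stackful coroutine can yield and be swapped back in from any depth of nested function calls. A stackless coroutine has no private call stack. That does not mean it runs without a stack: it runs on the runtime/system stack of whoever resumes it and saves only its own state across a suspension, so it can suspend only from the coroutine body itself, not from inside an arbitrary nested call.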
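Python generators behave like stackless coroutines, so they can be used to sketch the limitation described in the note. The function names below are invented for the illustration.

```python
def plain_helper(log):
    # An ordinary function: it has no way to suspend the generator that called it.
    log.append("helper ran to completion")

def sub_generator(log):
    log.append("sub: before yield")
    yield "suspended inside sub_generator"
    log.append("sub: after yield")

def stackless(log):
    plain_helper(log)               # a nested plain call cannot yield on our behalf
    yield "suspended in the body"   # suspension happens in the coroutine body itself
    yield from sub_generator(log)   # to suspend deeper, every level must be a generator
    log.append("done")

log = []
suspensions = list(stackless(log))
print(suspensions)
# ['suspended in the body', 'suspended inside sub_generator']
print(log)
# ['helper ran to completion', 'sub: before yield', 'sub: after yield', 'done']
```

A stackful coroutine would be able to suspend from inside `plain_helper` itself, because its whole call stack is saved and restored on a switch. Here the only way to suspend at a deeper level is to turn that level into a generator too and chain it with `yield from`.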
