Operating Systems: Processes

Contents

1. Introduction: What Is an Operating System?

Operating System (OS) Operations

2. The notion of a process

Process State

Process Control Block

3. Process Scheduling

Schedulers

 Context switch

4. Operations on Processes

5. Inter-Process Communication (IPC)

Shared-Memory Systems

Message-Passing Systems


1. Introduction: What Is an Operating System?

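First, a computer system can be divided roughly into four components (see Fig 1):

1) the hardware: the CPU (central processing unit), the memory, and the input/output (I/O) devices

2) the application programs: spreadsheets, compilers, and Web browsers (application programs define how the hardware resources are used to solve the computing problems of the users)

3) the operating system: controls and coordinates the use of the hardware among the various application programs for the various users

4) the users

The operating system is like air (or, more precisely, a medium): it performs no useful function by itself. Just as air merely provides an environment in which people can live, the operating system merely provides an environment in which application programs can run.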

Fig 1

Operating System (OS) Operations

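Modern operating systems are interrupt driven. What does that mean? Instead of making the CPU sit and wait for an event, the system frees the CPU to do other work and signals it with an interrupt when the event occurs. For each type of interrupt, a separate segment of code in the operating system determines what action should be taken, and an interrupt service routine is provided to handle the interrupt.

Because the operating system and the users share the hardware and software resources of the computer, an error in one program could adversely affect many processes. A properly designed operating system must therefore ensure that an incorrect program can affect only its own execution and not that of other programs.

Back to the main topic. The most common operations performed by an OS are:

1) Process Management (the main subject of this article; more to follow in later posts)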

  • Scheduling processes and threads on the CPUs 
  • Creating and deleting both user and system processes
  • Suspending and resuming processes
  • Providing mechanisms for process synchronization
  • Providing mechanisms for process communication

2) Memory Management

3) Storage Management

  • File-System Management
  • Mass-Storage Management
  • Caching
  • I/O Systems

4) Protection and Security


2. The notion of a process

To understand what a process is, first consider what a program is.

  • program: passive entity, such as a file containing a list of instructions stored on disk (often called an executable file).
  • process: active entity, with a program counter specifying the next instruction to execute and a set of associated resources.

Thus, a program becomes a process when an executable file is loaded into memory. In a nutshell, a process is a program in execution.

An operating system executes a variety of programs:

  1. A batch system executes jobs
  2. A time-shared system has user programs or tasks

Batch System: Computerized batch processing is a method of running software programs, called jobs, in batches automatically. Jobs are grouped (batched) by their characteristics and submitted to the computer system as a group; the computer completes them automatically and then outputs the results, which reduces the time wasted in setting up and tearing down each job. Users must submit the jobs, but no further user interaction is required to process the batch. Batches may be run automatically at scheduled times or when computer resources become available.

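Time-shared System: In computing, time-sharing is the sharing of a computing resource among many users at the same time by means of multiprogramming and multitasking. Early computer systems used time-sharing to handle commands from multiple users: the computer divides its running time into slices and distributes these slices evenly among the tasks submitted by the users. Each task runs in turn for a fixed amount of time, and the cycle repeats until all tasks are complete.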
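The rotation described above (each task runs for one time slice, then goes to the back of the line) can be sketched as a small simulation. This is only an illustration: the text does not name a specific algorithm, so a simple round-robin rotation is assumed here, and the `round_robin` function, the task names, and the time units are all hypothetical.

```python
from collections import deque
from typing import Dict, List


def round_robin(burst: Dict[str, int], quantum: int) -> List[str]:
    """Return the order in which tasks receive time slices.

    burst maps a task name to the total time it still needs;
    quantum is the length of one time slice (same hypothetical unit).
    """
    pending = deque(burst.items())
    timeline: List[str] = []
    while pending:
        name, remaining = pending.popleft()
        timeline.append(name)              # the task runs for one slice
        remaining -= quantum
        if remaining > 0:                  # not finished: back of the line
            pending.append((name, remaining))
    return timeline


order = round_robin({"A": 3, "B": 1, "C": 2}, quantum=1)
```

With these made-up numbers, task A needs three slices, B one, and C two, so the tasks take turns until each has received the time it needs.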

The structure of a process in memory (see Fig 2):

  • Stack: Stack contains temporary data, such as function parameters, return addresses, and local variables.
  • Heap: Heap is memory that is dynamically allocated during process run time.
  • Data: A data section contains global variables.
  • Text: The text section contains the program code. (A process also includes its current activity, as represented by the value of the program counter and the contents of the processor’s registers.)
Fig 2

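Note: A process itself can be an execution environment for other code. The Java programming environment is an example. In most circumstances, an executable Java program runs within the Java virtual machine (JVM). The JVM executes as a process that interprets the loaded Java code and takes actions (via native machine instructions) on behalf of that code. The java command runs the JVM as an ordinary process, which in turn executes the Java program in the virtual machine.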

Process State

As a process executes, it changes state. The state of a process is defined in part by the current activity of that process. The possible states are listed below.

  • New: The process is being created.
  • Running: Instructions are being executed.
  • Waiting: The process is waiting for some event to occur (such as an I/O completion or reception of a signal).
  • Ready: The process is waiting to be assigned to a processor.
  • Terminated: The process has finished execution.

It is important to realize that only one process can be running on any processor at any instant. Many processes may be ready and waiting, however.
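The five states and the legal moves between them can be sketched as a small state machine. This is an illustrative sketch only: the `State` enum, the `TRANSITIONS` table, and the `move` helper are names invented here, and the transition set follows the usual textbook state diagram rather than any particular kernel.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"


# Allowed transitions, following the usual textbook process-state diagram.
TRANSITIONS = {
    State.NEW: {State.READY},            # admitted
    State.READY: {State.RUNNING},        # scheduler dispatch
    State.RUNNING: {
        State.READY,                     # interrupt
        State.WAITING,                   # I/O or event wait
        State.TERMINATED,                # exit
    },
    State.WAITING: {State.READY},        # I/O or event completion
    State.TERMINATED: set(),
}


def move(current: State, target: State) -> State:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Note that a waiting process cannot go straight back to running: it first becomes ready and must be dispatched again by the scheduler.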

Fig 3: Diagram of process state

Process Control Block

  • The process control block (PCB) is a data structure used for storing the information about a process.
  • Each process is represented in the operating system by a PCB.
  • The PCB of each process resides in main memory.
  • The PCBs of all the processes are kept in a linked list.
  • The PCB is important in a multiprogramming environment because it captures the information needed to keep track of the many processes running concurrently.
Fig 4: PCB

The PCB contains many pieces of information associated with a specific process, including these:

  • Process state: One of the states described above.
  • Program counter: The counter indicates the address of the next instruction to be executed for this process.
  • CPU registers: The registers vary in number and type, depending on the computer architecture. They include accumulators, index registers, stack pointers, and general-purpose registers, plus any condition-code information. Along with the program counter, this state information must be saved when an interrupt occurs, so that the process can be continued correctly afterward (Fig 5).
  • CPU-scheduling information: This information includes the process priority, pointers to scheduling queues, and any other scheduling parameters.
  • Memory-management information: This information may include such items as the value of the base and limit registers and the page tables, or the segment tables, depending on the memory system used by the operating system.
  • Accounting information: This information includes the amount of CPU and real time used, time limits, account numbers, job or process numbers, and so on.
  • I/O status information: This information includes the list of I/O devices allocated to the process, a list of open files, and so on.

Fig 5: Diagram showing CPU switch from process to process.

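The PCB simply serves as the repository for information that varies from process to process. Its role is to turn a program (with its data), which cannot run on its own in a multiprogramming environment, into a basic unit that can run independently, that is, a process that can execute concurrently with other processes. In other words, the OS controls and manages concurrently executing processes on the basis of their PCBs. A PCB is usually a contiguous area within the memory reserved for the system, and it holds all the information the operating system needs to describe the process and control its execution.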
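As a rough picture of what such a record might look like, the fields listed above can be collected into a single data structure. This is a sketch only: the `PCB` class and every field name are hypothetical choices made for illustration, and a real kernel structure holds far more than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PCB:
    """Illustrative process control block; all field names are hypothetical."""
    pid: int
    state: str = "new"                                      # process state
    program_counter: int = 0                                # next instruction address
    registers: Dict[str, int] = field(default_factory=dict) # saved CPU registers
    priority: int = 0                                       # CPU-scheduling information
    base: int = 0                                           # memory-management information
    limit: int = 0
    cpu_time_used: float = 0.0                              # accounting information
    open_files: List[str] = field(default_factory=list)     # I/O status information
    next: Optional["PCB"] = None                            # link to the next PCB in a queue


editor = PCB(pid=1, priority=5)
compiler = PCB(pid=2)
editor.open_files.append("notes.txt")
editor.next = compiler          # PCBs chained together, as in a linked list
```

Each process gets its own copy of every field, which is why `default_factory` is used for the mutable ones.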

Threads: The process model discussed so far has implied that a process is a program that performs a single thread of execution. Most modern operating systems have extended the process concept to allow a process to have multiple threads of execution and thus to perform more than one task at a time. On a system that supports threads, the PCB is expanded to include information for each thread.


3. Process Scheduling

The process scheduler selects an available process (possibly from a set of several available processes) for program execution on the CPU.

Scheduling Queues

  • Job queue:  As processes enter the system, they are put into a job queue, which consists of all processes in the system. 
  • Ready queue: The processes that are residing in main memory and are ready and waiting to execute are kept on a list called the ready queue. This queue is generally stored as a linked list.
  • Device queue: The list of processes waiting for a particular I/O device is called a device queue. Each device has its own device queue.

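As processes enter the system, they are all placed in the job queue. Processes that reside in main memory and are ready and waiting to execute are kept on a list called the ready queue, which is generally implemented as a linked list. The ready-queue header contains pointers to the first and last PCBs in the list, and each PCB includes a pointer field that points to the next PCB in the ready queue. Now suppose a process makes an I/O request to a shared device, such as a disk. Because there are many processes in the system, the disk may be busy with the I/O request of some other process, so the process has to wait for the disk. The list of processes waiting for the disk is its device queue, and each device has its own device queue (see Fig 6). (An analogy: a doctor sees many patients, so the doctor is the device and the line of patients is the device queue; every doctor’s office has its own line outside. The analogy is imperfect, but it conveys the idea.)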

Fig 6: The ready queue and various I/O device queues
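The ready-queue layout described above (a header with pointers to the first and last PCBs, and a next pointer inside each PCB) can be sketched as a singly linked FIFO queue. The `PCB` and `ReadyQueue` classes below are hypothetical stand-ins written for illustration; the PCB here carries only a pid and the link field.

```python
from typing import Optional


class PCB:
    """Minimal stand-in for a process control block."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.next: Optional["PCB"] = None    # pointer to the next PCB in the queue


class ReadyQueue:
    """FIFO queue of PCBs; the header keeps head and tail pointers."""

    def __init__(self) -> None:
        self.head: Optional[PCB] = None
        self.tail: Optional[PCB] = None

    def enqueue(self, pcb: PCB) -> None:
        pcb.next = None
        if self.tail is None:                # queue was empty
            self.head = pcb
        else:
            self.tail.next = pcb
        self.tail = pcb

    def dequeue(self) -> Optional[PCB]:
        pcb = self.head
        if pcb is None:
            return None                      # nothing is ready
        self.head = pcb.next
        if self.head is None:                # queue became empty
            self.tail = None
        pcb.next = None
        return pcb
```

Keeping a tail pointer makes adding a newly ready process a constant-time operation, and a device queue could reuse the same structure with one instance per device.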

A common representation of process scheduling is a queueing diagram (see Fig 7). A new process is initially put in the ready queue. It waits there until it is selected for execution, or dispatched. Once the process is allocated the CPU and is executing, one of several events could occur:

  • The process could issue an I/O request and then be placed in an I/O queue.
  • The process could create a new child process and wait for the child’s termination.

  • The process could be removed forcibly from the CPU, as a result of an interrupt, and be put back in the ready queue.

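In the first two cases, the process eventually switches from the waiting state to the ready state and is then put back in the ready queue. A process continues this cycle until it terminates, at which time it is removed from all queues and has its PCB and resources deallocated.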

Fig 7: Queueing-diagram representation of process scheduling

Schedulers

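A process migrates among the various scheduling queues throughout its lifetime. For scheduling purposes, the operating system must select processes from these queues in some fashion. The selection is carried out by the appropriate scheduler.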

  1. Long-term scheduler / job scheduler: Often, in a batch system, more processes are submitted than can be executed immediately. These processes are spooled to a mass-storage device (typically a disk), where they are kept for later execution. The long-term scheduler, or job scheduler, selects processes from this pool and loads them into memory for execution.
  2. Short-term scheduler / CPU scheduler: The short-term scheduler, or CPU scheduler, selects from among the processes that are ready to execute and allocates the CPU to one of them.
  3. Medium-term scheduler: The key idea behind a medium-term scheduler is that sometimes it can be advantageous to remove a process from memory and thus reduce the degree of multiprogramming. Later, the process can be reintroduced into memory, and its execution can be continued where it left off (see Fig 8).

The primary distinction between the long-term and short-term schedulers lies in frequency of execution. Often, the short-term scheduler executes at least once every 100 milliseconds. The long-term scheduler may need to be invoked only when a process leaves the system. Because of the longer interval between executions, the long-term scheduler can afford to take more time to decide which process should be selected for execution.

In general, most processes can be described as either I/O-bound or CPU-bound. It is important that the long-term scheduler select a good process mix of I/O-bound and CPU-bound processes.

  1. I/O-bound process: An I/O-bound process is one that spends more of its time doing I/O than it spends doing computations.
  2. CPU-bound process: A CPU-bound process, in contrast, generates I/O requests infrequently, using more of its time doing computations.

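For example, time-sharing systems such as UNIX and Microsoft Windows often have no long-term scheduler; they simply put every new process in memory for the short-term scheduler. The stability of these systems depends either on a physical limitation (such as the number of available terminals) or on the self-adjusting nature of human users: if performance declines to unacceptable levels on a multiuser system, some users will simply quit.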

Fig 8: Addition of medium-term scheduling to the queueing diagram

 Context switch

1) Switching the CPU to another process requires performing a state save of the current process and a state restore of a different process. This task is known as a context switch.

2) The context of a process is represented in its PCB. It includes the values of the CPU registers, the process state, and memory-management information.
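The state save and state restore can be pictured with a toy model in which the "CPU" is just a dictionary of register values. Everything here is a simplification made for illustration: the `PCB` and `CPU` classes, the register names, and the `context_switch` function are hypothetical, and a real context switch is performed by kernel code close to the hardware.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PCB:
    pid: int
    state: str = "ready"
    registers: Dict[str, int] = field(default_factory=dict)   # saved context


class CPU:
    """Toy CPU: a register file plus a reference to the running process."""

    def __init__(self) -> None:
        self.registers: Dict[str, int] = {"pc": 0, "sp": 0, "acc": 0}
        self.current: Optional[PCB] = None


def context_switch(cpu: CPU, new: PCB) -> None:
    old = cpu.current
    if old is not None:
        old.registers = dict(cpu.registers)   # state save into the old PCB
        old.state = "ready"
    cpu.registers = dict(new.registers)       # state restore from the new PCB
    new.state = "running"
    cpu.current = new


p1 = PCB(1, registers={"pc": 100, "sp": 8, "acc": 1})
p2 = PCB(2, registers={"pc": 200, "sp": 16, "acc": 2})
cpu = CPU()
context_switch(cpu, p1)
cpu.registers["pc"] = 104       # p1 executes for a while
context_switch(cpu, p2)         # p1's progress is saved in its PCB
context_switch(cpu, p1)         # p1 resumes where it left off
```

Because the registers are copied into the PCB on the way out and copied back on the way in, p1 continues from the same program counter it had when it was switched out.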


4. Operations on Processes

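The processes in most systems can execute concurrently, and they may be created and deleted dynamically. Thus, these systems must provide a mechanism for process creation and termination.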

A parent process creates child processes, which in turn may create other processes, forming a tree of processes.

Most operating systems (including UNIX, Linux, and Windows) identify processes according to a unique process identifier (or pid), which is typically an integer number. The pid provides a unique value for each process in the system, and it can be used as an index to access various attributes of a process within the kernel.

Fig 9: A tree of processes on a typical Linux system

 Resource sharing options

  • Parent and children share all resources
  • Children share a subset of the parent’s resources
  • Parent and child share no resources

When a process creates a new process, two possibilities for execution exist:

  • The parent continues to execute concurrently with its children.
  • The parent waits until some or all of its children have terminated.

There are also two address-space possibilities for the new process:

  • The child process is a duplicate of the parent process (it has the same program and data as the parent).
  • The child process has a new program loaded into it.

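In UNIX, each process is identified by its unique pid. A new process is created by the fork() system call, and it consists of a copy of the address space of the original process. This mechanism allows the parent process to communicate easily with its child process. Both processes continue execution at the instruction after the fork(), with one difference: the return code for fork() is zero for the new (child) process, whereas the (nonzero) pid of the child is returned to the parent.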

A process terminates when it finishes executing its final statement and asks the operating system to delete it by using the exit() system call.

  • The process may return status data to its parent (which collects it via wait())
  • The resources of the process are deallocated by the operating system

The parent waits for the child process to complete with the wait() system call. When the child process completes (by either implicitly or explicitly invoking exit()), the parent process resumes from the call to wait(), where it completes using the exit() system call (see Fig 10).
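The fork(), exit(), and wait() sequence can be tried from Python, whose `os` module exposes thin wrappers around these calls on UNIX-like systems (it will not run on Windows). The `run_child` helper and the exit code 7 are hypothetical choices for this sketch.

```python
import os
from typing import Tuple


def run_child(exit_code: int) -> Tuple[int, int]:
    """Fork a child that exits with exit_code; return (child pid, exit status)."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Terminate at once with the given status,
        # skipping the interpreter cleanup that belongs to the parent.
        os._exit(exit_code)
    # Parent: fork() returned the (nonzero) pid of the child.
    waited_pid, status = os.waitpid(pid, 0)      # block until the child exits
    return waited_pid, os.WEXITSTATUS(status)


child_pid, child_status = run_child(7)
```

The single call to `os.fork()` returns twice, once in each process, and the return value is the only way the code can tell which of the two it is running in.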

Fig 10: Process creation using the fork() system call

Note that a parent needs to know the identities of its children if it is to terminate them. Thus, when one process creates a new process, the identity of the newly created process is passed to the parent.

A parent may terminate the execution of one of its children for a variety of reasons, such as these:

  • The child has exceeded its usage of some of the resources that it has been allocated. (To determine whether this has occurred, the parent must have a mechanism to inspect the state of its children.)
  • The task assigned to the child is no longer required.
  • The parent is exiting, and the operating system does not allow a child to continue if its parent terminates.

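Some systems do not allow a child to exist if its parent has terminated. In such systems, if a process terminates (either normally or abnormally), then all its children must also be terminated. This phenomenon, referred to as cascading termination, is normally initiated by the operating system.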
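Cascading termination amounts to walking the process tree below the exiting process. The sketch below models the tree as a plain dictionary from a pid to the pids of its children; the `cascade_terminate` function, the example pids, and the choice to terminate descendants before their parent are all assumptions made for illustration.

```python
from typing import Dict, List


def cascade_terminate(children: Dict[int, List[int]], pid: int) -> List[int]:
    """Return the pids terminated when `pid` exits, descendants first."""
    terminated: List[int] = []
    for child in children.get(pid, []):
        terminated.extend(cascade_terminate(children, child))
    terminated.append(pid)
    return terminated


# Hypothetical tree: process 1 created 2 and 3, and process 2 created 4.
tree = {1: [2, 3], 2: [4], 3: []}
order = cascade_terminate(tree, 1)
```

Terminating process 2 alone would take process 4 with it but leave processes 1 and 3 untouched.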


5. Inter-Process Communication (IPC)

Independent processes:  A process is independent if it cannot affect or be affected by the other processes executing in the system. Any process that does not share data with any other process is independent. 

Cooperating processes:  A process is cooperating if it can affect or be affected by the other processes executing in the system. Clearly, any process that shares data with other processes is a cooperating process.

There are several reasons for providing an environment that allows process cooperation:

  • Information sharing. Since several users may be interested in the same piece of information (for instance, a shared file), we must provide an environment to allow concurrent access to such information.
  • Computation speedup. If we want a particular task to run faster, we must break it into subtasks, each of which will be executing in parallel with the others. Notice that such a speedup can be achieved only if the computer has multiple processing cores.
  • Modularity. We may want to construct the system in a modular fashion, dividing the system functions into separate processes or threads.
  • Convenience. Even an individual user may work on many tasks at the same time. For instance, a user may be editing, listening to music, and compiling in parallel.

Cooperating processes require an interprocess communication (IPC) mechanism that will allow them to exchange data and information. There are two fundamental models of inter-process communication: shared memory and message passing.

In the shared-memory model, a region of memory that is shared by cooperating processes is established. Processes can then exchange information by reading and writing data to the shared region.

In the message-passing model, communication takes place by means of messages exchanged between the cooperating processes. (Both models are shown in Fig 11.)

Fig 11: (a) Message passing. (b) Shared memory.

Shared-Memory Systems

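Interprocess communication using shared memory requires communicating processes to establish a region of shared memory. Typically, a shared-memory region resides in the address space of the process creating the shared-memory segment.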

Producer–consumer problem: To illustrate the concept of cooperating processes, we consider the producer–consumer problem. A producer process produces information that is consumed by a consumer process. For example, a compiler may produce assembly code that is consumed by an assembler. The assembler, in turn, may produce object modules that are consumed by the loader. The producer–consumer problem also provides a useful metaphor for the client–server paradigm. We generally think of a server as a producer and a client as a consumer.

One solution to the producer–consumer problem uses shared memory. To allow producer and consumer processes to run concurrently, we must have available a buffer of items that can be filled by the producer and emptied by the consumer. This buffer will reside in a region of memory that is shared by the producer and consumer processes.

The producer and consumer must be synchronized, so that the consumer does not try to consume an item that has not yet been produced.

Two types of buffers can be used:

  1. The unbounded buffer: The unbounded buffer places no practical limit on the size of the buffer. The consumer may have to wait for new items, but the producer can always produce new items.
  2. The bounded buffer: The bounded buffer assumes a fixed buffer size. In this case, the consumer must wait if the buffer is empty, and the producer must wait if the buffer is full.
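The bounded-buffer case can be sketched with a circular array guarded by a condition variable. One loud assumption: two threads inside a single Python process stand in for the producer and consumer processes, and the memory the threads already share stands in for the shared-memory region. The `BoundedBuffer` class and `run_demo` function are hypothetical names.

```python
import threading
from typing import List


class BoundedBuffer:
    """Fixed-size circular buffer shared by one producer and one consumer."""

    def __init__(self, size: int) -> None:
        self.slots: List[object] = [None] * size
        self.size = size
        self.count = 0          # number of full slots
        self.head = 0           # index of the next item to consume
        self.tail = 0           # index of the next free slot
        self.cond = threading.Condition()

    def produce(self, item: object) -> None:
        with self.cond:
            while self.count == self.size:      # buffer full: producer waits
                self.cond.wait()
            self.slots[self.tail] = item
            self.tail = (self.tail + 1) % self.size
            self.count += 1
            self.cond.notify_all()

    def consume(self) -> object:
        with self.cond:
            while self.count == 0:              # buffer empty: consumer waits
                self.cond.wait()
            item = self.slots[self.head]
            self.head = (self.head + 1) % self.size
            self.count -= 1
            self.cond.notify_all()
            return item


def run_demo(n_items: int = 10, size: int = 4) -> List[object]:
    """Move n_items through the buffer and return them in arrival order."""
    buf = BoundedBuffer(size)
    received: List[object] = []

    def producer() -> None:
        for i in range(n_items):
            buf.produce(i)

    def consumer() -> None:
        for _ in range(n_items):
            received.append(buf.consume())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

The two `while` loops are exactly the two waiting rules stated above: the producer waits while the buffer is full, and the consumer waits while it is empty.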

Message-Passing Systems

Operating system provides the means for cooperating processes to communicate with each other via a message-passing facility. Message passing provides a mechanism to allow processes to communicate and to synchronize their actions without sharing the same address space.

A message-passing facility provides at least two operations:

  • send(message)
  • receive(message)

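Messages sent by a process can be either fixed or variable in size. If only fixed-sized messages can be sent, the system-level implementation is straightforward, but the task of programming becomes more difficult. Conversely, variable-sized messages require a more complex system-level implementation, but the programming task becomes simpler. This is a common kind of tradeoff in operating-system design.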

If processes P and Q want to communicate, they must send messages to and receive messages from each other: a communication link must exist between them. This link can be implemented in a variety of ways. We are concerned here not with the link’s physical implementation but rather with its logical implementation. 

Here are several methods for logically implementing a link and the send()/receive() operations:

  • Direct or indirect communication
  • Synchronous or asynchronous communication
  • Automatic or explicit buffering

Direct communication: Each process that wants to communicate must explicitly name the recipient or sender of the communication.

The send() and receive() primitives are defined as:

  • send(P, message)—Send a message to process P.
  • receive(Q, message)—Receive a message from process Q.

A communication link in this scheme has the following properties:

  • A link is established automatically between every pair of processes that want to communicate. The processes need to know only each other’s identity to communicate.
  • A link is associated with exactly two processes.

  • Between each pair of processes, there exists exactly one link.

Direct communication requires the processes to use specific process identifiers, which makes it hard to handle cases where the sender cannot be identified ahead of time.

Indirect communication: The messages are sent to and received from mailboxes, or ports. A mailbox can be viewed abstractly as an object into which messages can be placed by processes and from which messages can be removed. Each mailbox has a unique identification.

The send() and receive() primitives are defined as follows:

  • send(A, message)—Send a message to mailbox A.
  • receive(A, message)—Receive a message from mailbox A.

In this scheme, a communication link has the following properties:

  • A link is established between a pair of processes only if both members of the pair have a shared mailbox.
  • A link may be associated with more than two processes.
  • Between each pair of communicating processes, a number of different links may exist, with each link corresponding to one mailbox.

A mailbox may be owned either by a process or by the operating system. If the mailbox is owned by a process (that is, the mailbox is part of the address space of the process), then we distinguish between the owner (which can only receive messages through this mailbox) and the user (which can only send messages to the mailbox). When a process that owns a mailbox terminates, the mailbox disappears. Any process that subsequently sends a message to this mailbox must be notified that the mailbox no longer exists.

In contrast, a mailbox that is owned by the operating system has an existence of its own.  The operating system then must provide a mechanism that allows a process to do the following:

  1. Create a new mailbox.
  2. Send and receive messages through the mailbox.
  3. Delete a mailbox.
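The three operations listed above can be sketched as a tiny in-memory mailbox table. This is an illustration only: the `MailboxSystem` class and its method names are hypothetical, everything lives in one Python process, and `receive` is written as a nonblocking call that returns None when the mailbox is empty.

```python
from collections import deque
from typing import Deque, Dict, Optional


class MailboxSystem:
    """Toy stand-in for mailboxes owned by the operating system."""

    def __init__(self) -> None:
        self._boxes: Dict[str, Deque[object]] = {}

    def create(self, name: str) -> None:
        if name in self._boxes:
            raise ValueError(f"mailbox {name!r} already exists")
        self._boxes[name] = deque()

    def send(self, name: str, message: object) -> None:
        # Raises KeyError if the mailbox does not exist (or was deleted).
        self._boxes[name].append(message)

    def receive(self, name: str) -> Optional[object]:
        box = self._boxes[name]
        return box.popleft() if box else None    # None means "no message"

    def delete(self, name: str) -> None:
        del self._boxes[name]


system = MailboxSystem()
system.create("A")
system.send("A", "hello")
system.send("A", "world")
first = system.receive("A")
second = system.receive("A")
third = system.receive("A")
system.delete("A")
```

Any number of senders and receivers can name mailbox A, which is how a single link comes to be associated with more than two processes.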

Synchronization

Message passing may be either blocking or nonblocking, also known as synchronous and asynchronous.

  • Blocking send. The sending process is blocked until the message is received by the receiving process or by the mailbox.
  • Nonblocking send. The sending process sends the message and resumes operation.
  • Blocking receive. The receiver blocks until a message is available.
  • Nonblocking receive. The receiver retrieves either a valid message or a null.

Blocking is considered synchronous; nonblocking is considered asynchronous.
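The difference between a blocking and a nonblocking receive can be seen with the `queue.Queue` class from the Python standard library, used here as a stand-in for a communication link between two threads. The helper names `blocking_receive`, `nonblocking_receive`, and `demo`, as well as the 0.05 second delay, are hypothetical choices for this sketch.

```python
import queue
import threading
import time


def nonblocking_receive(link):
    """Return a message if one is available, otherwise None."""
    try:
        return link.get(block=False)
    except queue.Empty:
        return None


def blocking_receive(link):
    """Block until a message is available, then return it."""
    return link.get()               # block=True is the default


def demo():
    link = queue.Queue()
    first = nonblocking_receive(link)       # nothing has been sent yet

    def sender():
        time.sleep(0.05)                    # make the receiver wait a little
        link.put("ping")                    # returns at once on an unbounded queue

    t = threading.Thread(target=sender)
    t.start()
    second = blocking_receive(link)         # waits here until "ping" arrives
    t.join()
    return first, second


result = demo()
```

The nonblocking call returns immediately with a null value, while the blocking call does not return until the sender has delivered its message.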

Buffering

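Whether communication is direct or indirect, messages exchanged by communicating processes reside in a temporary queue. Basically, such queues can be implemented in three ways: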

  1. Zero capacity. The queue has a maximum length of zero; thus, the link cannot have any messages waiting in it. In this case, the sender must block until the recipient receives the message.
  2. Bounded capacity. The queue has finite length n; thus, at most n messages can reside in it. If the queue is not full when a new message is sent, the message is placed in the queue (either the message is copied or a pointer to the message is kept), and the sender can continue execution without waiting. The link’s capacity is finite, however. If the link is full, the sender must block until space is available in the queue.
  3. Unbounded capacity. The queue’s length is potentially infinite; thus, any number of messages can wait in it. The sender never blocks.

The zero-capacity case is sometimes referred to as a message system with no buffering. The other cases are referred to as systems with automatic buffering.
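The bounded and unbounded cases can be tried with the `queue.Queue` class from the Python standard library, where `maxsize` sets the capacity and a `maxsize` of 0 (the default) means no limit. The `try_send` helper is a hypothetical name, and it reports a full link by returning False rather than blocking. The zero-capacity case is left out because `queue.Queue` has no rendezvous mode.

```python
import queue


def try_send(link, message) -> bool:
    """Nonblocking send: return False instead of waiting when the link is full."""
    try:
        link.put_nowait(message)
        return True
    except queue.Full:
        return False


bounded = queue.Queue(maxsize=2)    # bounded capacity, n = 2
unbounded = queue.Queue()           # no limit on the number of waiting messages

results = [try_send(bounded, i) for i in range(3)]            # third send finds the link full
all_sent = all(try_send(unbounded, i) for i in range(1000))   # never full
```

With a blocking `put()` in place of `put_nowait()`, the third sender would wait until a receiver removed a message, which is the behavior described for bounded capacity.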

These are class notes that took some effort to write up; corrections to anything inaccurate are welcome.
