Fuchsia concepts

A summary of the basic Fuchsia concepts from the official documentation.

Fuchsia Concepts

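1. Component

  1. Component framework: the part that supports component communication, libraries, creation, and so on.
  2. Component manager:

    1. Boots the system (it is the first component to start and the last to shut down) and starts other essential components, such as the filesystem.
    2. Acts as an intermediary, for example by performing capability routing.
    3. Supports component interaction with the environment and supports extension.
  3. Component manifest: a description/configuration file for a specific component.
  4. Component lifecycle: determined by the component framework or the component runner.

    Bind: when A uses a capability of B, we say A binds to B.

    • eager binding: if b is a child of B and is marked eager, then when A binds to B, b is bound as well.
    • reboot: a component restarts after it exits (including after a successful exit).
  5. Topology, identifiers, and realms were all covered earlier in the getting-started material.
  6. Environment: lets developers configure the behavior of a component realm.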
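
The eager-binding rule above can be sketched in a few lines of Python. This is only an illustrative model, not the component framework's API: the `Component` class, its `eager` flag, and the `bind` helper are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    eager: bool = False  # hypothetical flag: start together with the parent
    children: list["Component"] = field(default_factory=list)
    running: bool = False


def bind(target: Component) -> list[str]:
    """Start `target`, then recursively start its eager children."""
    started = []
    if not target.running:
        target.running = True
        started.append(target.name)
    for child in target.children:
        if child.eager:
            started.extend(bind(child))
    return started


b_eager = Component("b", eager=True)
b_lazy = Component("c")  # not eager: stays stopped until something binds to it
B = Component("B", children=[b_eager, b_lazy])
started = bind(B)
```

Binding to `B` starts `B` and its eager child `b`, while the non-eager child `c` is left alone.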

Driver

  1. In Fuchsia, drivers can bind to matching "parent" devices and publish "children" of their own.

    This hierarchy extends as required: one driver might publish a child, only to have another driver consider that child their parent, with the second driver publishing its own children, and so on.

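Driver startup steps:

  • At system start the root node is created, and the root requests a driver to bind.
  • The system looks for a suitable driver and binds it to the root.
  • The driver runs and may create new nodes that in turn request new drivers.

    • For example, when the PCI driver discovers a new peripheral, it creates a new parent node, which then requests a new driver to bind; this step repeats each time a new peripheral is discovered.
  • After binding, the driver is initialized, including initializing its interfaces.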
$ dm dump
[root]
    pid=1509
      [null] pid=1509 /boot/driver/builtin.so
      [zero] pid=1509 /boot/driver/builtin.so
   [misc]
       pid=1645
         [console] pid=1645 /boot/driver/console.so
         [dmctl] pid=1645 /boot/driver/dmctl.so
         [ptmx] pid=1645 /boot/driver/pty.so
         [i8042-keyboard] pid=1645 /boot/driver/pc-ps2.so
            [hid-device-001] pid=1645 /boot/driver/hid.so
         [i8042-mouse] pid=1645 /boot/driver/pc-ps2.so
            [hid-device-002] pid=1645 /boot/driver/hid.so
   [sys]
       pid=1416 /boot/driver/bus-acpi.so
         [acpi] pid=1416 /boot/driver/bus-acpi.so
         [pci] pid=1416 /boot/driver/bus-acpi.so
            [00:00:00] pid=1416 /boot/driver/bus-pci.so
            [00:01:00] pid=1416 /boot/driver/bus-pci.so
               <00:01:00> pid=2015 /boot/driver/bus-pci.proxy.so
                  [bochs_vbe] pid=2015 /boot/driver/bochs-vbe.so
                     [framebuffer] pid=2015 /boot/driver/framebuffer.so
            [00:02:00] pid=1416 /boot/driver/bus-pci.so
               <00:02:00> pid=2052 /boot/driver/bus-pci.proxy.so
                  [e1000] pid=4628 /boot/driver/e1000.so
                     [ethernet] pid=2052 /boot/driver/ethernet.so
            [00:1f:00] pid=1416 /boot/driver/bus-pci.so
            [00:1f:02] pid=1416 /boot/driver/bus-pci.so
               <00:1f:02> pid=2156 /boot/driver/bus-pci.proxy.so
                  [ahci] pid=2156 /boot/driver/ahci.so
            [00:1f:03] pid=1416 /boot/driver/bus-pci.so

The sys device has a driver host (pid 1416), into which the [acpi] device and its driver bus-acpi.so are loaded.

Next, ACPI enumeration finds a PCI bus, so a parent device (carrying some protocols) is created: [pci] pid=1416 /boot/driver/bus-acpi.so. The driver host then binds the bus-pci.so driver to it.

During its binding, this driver scans all devices on the PCI bus.

One of them, PCI device 00:02:00, is an Intel Ethernet interface, and the system finds that the e1000.so driver is a suitable match (its protocol matches).

At this point the PCI driver creates a parent device (carrying some protocols), and a new driver host (pid 2052) is created.

Then a proxy is created, <00:02:00> pid=2052 /boot/driver/bus-pci.proxy.so; this proxy serves as the interface between driver host 2052 and the PCI driver.

Next, the DSO (e1000.so) is bound in that driver host.

The DSO then publishes a ZX_PROTOCOL_ETHERNET_IMPL, which binds to a matching child (the ethernet.so DSO in the dump above; it's considered a match because it has a ZX_PROTOCOL_ETHERNET_IMPL protocol).

The resulting ethernet device in the device filesystem is:

/dev/sys/platform/pci/00:02:00/e1000

The ethernet driver (ethernet.so) publishes a ZX_PROTOCOL_ETHERNET for clients to use.
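
The walkthrough above boils down to "a driver binds to a device whose protocol it matches, then publishes children that other drivers may bind to". A minimal Python sketch of that chain follows. The `Device` and `Driver` classes and the short protocol strings are hypothetical stand-ins for the real protocol IDs, and real binding rules also consider other properties, which are ignored here.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    protocol: str  # simplified stand-in for a ZX_PROTOCOL_* id
    children: list["Device"] = field(default_factory=list)


@dataclass
class Driver:
    lib: str
    binds_to: str                     # protocol the driver matches on
    publishes: list[tuple[str, str]]  # (child name, child protocol)


def bind_all(device: Device, drivers: list[Driver]) -> None:
    """Bind the first matching driver, then recurse into the children it publishes."""
    for drv in drivers:
        if drv.binds_to == device.protocol:
            for name, proto in drv.publishes:
                child = Device(name, proto)
                device.children.append(child)
                bind_all(child, drivers)
            break


def paths(device: Device, prefix: str = "") -> list[str]:
    """List the device-tree paths below (and including) `device`."""
    here = f"{prefix}/{device.name}"
    out = [here]
    for child in device.children:
        out.extend(paths(child, here))
    return out


drivers = [
    Driver("e1000.so", "PCI", [("e1000", "ETHERNET_IMPL")]),
    Driver("ethernet.so", "ETHERNET_IMPL", [("ethernet", "ETHERNET")]),
]
nic = Device("00:02:00", "PCI")
bind_all(nic, drivers)
tree = paths(nic)
```

Here `e1000.so` binds to the PCI device and publishes a child, which in turn is matched by `ethernet.so`; nothing binds to the final `ETHERNET` device because clients use it directly.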

  1. Driver binding

    Binding must follow certain rules.

  2. Driver ops

    [Figure 2: driver ops]

These hooks (the boxes in the figure) are called by other drivers at runtime.

  1. Driver lifecycle
  2. Device driver lifecycle:

    1. The binding program is compiled by the binding compiler, which produces the ZIRCON_DRIVER macro; this places the binding program in an ELF NOTE section, so the Device Coordinator can inspect it without loading the entire driver.
    2. init(): used by drivers that require special initialization, or that do not want to visibly publish their device(s) until that succeeds.
    3. bind(): offers the driver a device to bind to; the driver needs to create a child device.
    4. create()
    5. release(): this method is no longer used.
  3. device lifecycle:

    1. device_add(): adds a child device; the parent device is either the device passed in to bind() or another device which has been created by the same device driver.
    2. device_async_remove(): removes a device. The removal of a device consists of four parts: running the device's unbind() hook, removal of the device from the Device Filesystem, dropping the reference acquired by device_add() and running the device's release() hook.
    3. unbind(): optional; it guarantees that no incoming messages are accepted while it runs.
    4. The parent device ensures that a child device returns errors for related requests while it is shutting down.
    5. release() and unbind(): devices are torn down recursively, as in the diagram below:

                  +------------+
                  | USB Device | .unbind()
                  +------------+ .release()
                        |
                  +------------+
                  |  WLAN PHY  | .unbind()
                  +------------+ .release()
                    |        |
          +------------+  +------------+
          | WLAN MAC 0 |  | WLAN MAC 1 | .unbind()
          +------------+  +------------+ .release()

.unbind() runs from the USB device downward; once it reaches the bottom, the two MAC devices are released first, and .release() then proceeds back up the tree.
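
The ordering described above (unbind top-down, release bottom-up) can be illustrated with two tree traversals. This is a simplified sketch under the assumption that all unbind hooks finish before any release hook runs; the `Device` class and `removal_order` helper are hypothetical, and the real driver runtime may interleave the two phases differently.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    children: list["Device"] = field(default_factory=list)


def removal_order(root: Device) -> list[str]:
    """Return the hook calls in order: pre-order unbind, then post-order release."""
    log: list[str] = []

    def unbind(dev: Device) -> None:
        log.append(f"{dev.name}.unbind()")  # parent first
        for child in dev.children:
            unbind(child)

    def release(dev: Device) -> None:
        for child in dev.children:
            release(child)
        log.append(f"{dev.name}.release()")  # children first

    unbind(root)
    release(root)
    return log


usb = Device("USB Device", [Device("WLAN PHY", [Device("WLAN MAC 0"), Device("WLAN MAC 1")])])
order = removal_order(usb)
```

For the tree in the diagram, the USB device is the first to be unbound and the last to be released.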

  1. device power management
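  2. device protocol: I did not fully follow this part; it covers the whole flow of how processes, devices, and device protocols call each other, and I only understood it roughly.

    • Roughly, a protocol is a contract: any driver that follows it must provide a set of functions.
    • Platform dependent vs platform independent: the driver stack is split so that a platform-independent layer sits between the client and the platform-dependent driver (for example, for buffer handling), which reduces code duplication.
    • Process: Fuchsia drivers run in driver hosts.

      • Driver host: a process that contains a protocol stack; the driver host loads drivers dynamically.

    For details, see the driver startup steps above.

  3. platform bus

These are low-level drivers that provide interfaces and support to higher-level drivers; they are loaded early during system boot.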

Filesystem

  1. File lifecycle

    1. Establishing a Connection: the client sends RPC requests to filesystem servers using FIDL.
    2. Namespace: lives entirely on the client side; it is a table of "absolute path" -> "handle" mappings. All paths accessed from within a process are opened by directing requests through this namespace mapping.
    3. Passing data: also uses RPC messages over the FIDL protocol.
    4. mmap: the client is returned virtual memory objects (VMOs); this applies only to read-only files.
    5. Other operations acting on paths: for example, rename(old, new) needs two paths; Fuchsia filesystems use this ability to refer to one Vnode while acting on the other.
    6. Vnode: represents a path, a file, and so on.
  2. Filesystem Lifecycle

  1. Filesystem Management: restricted to administrators.
  2. Mounting: the filesystem is initialized first and then connected to the parent (mounting) filesystem; which mountpoints exist elsewhere depends on the situation, and not every location is reachable.
  3. FVM: keeps a virtual mapping from (virtual partitions, blocks) to (slice, physical block).

          +---------------------------------+ <- Physical block 0
          |           metadata              |
          | +-----------------------------+ |
          | |       metadata copy 1       | |
          | |  +------------------------+ | |
          | |  |    superblock          | | |
          | |  +------------------------+ | |
          | |  |    partition table     | | |
          | |  +------------------------+ | |
          | |  | slice allocation table | | |
          | |  +------------------------+ | |
          | +-----------------------------+ | <- Size of metadata is described by
          | |       metadata copy 2       | |    superblock
          | +-----------------------------+ |
          +---------------------------------+ <- Superblock describes start of
          |                                 |    slices
          |             Slice 1             |
          +---------------------------------+
          |                                 |
          |             Slice 2             |
          +---------------------------------+
          |                                 |
          |             Slice 3             |
          +---------------------------------+
          |                                 |

Partition table: name, partition ID, and the number of slices already allocated to the partition.

Slice allocation table: made up of slice entries.

Each slice entry contains:
  • allocation status
  • if it is allocated, what partition it belongs to and what logical slice within the partition the slice maps to
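
To make the slice allocation table concrete, here is a small Python model of the mapping from (partition, virtual block) to (physical slice, physical block). It is a sketch under simplifying assumptions: the `Fvm` and `SliceEntry` classes are hypothetical, slices are allocated first-fit, and physical block numbers are counted from the start of the slice area rather than from block 0 of the device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SliceEntry:
    allocated: bool = False
    partition: Optional[int] = None  # owning partition, if allocated
    vslice: Optional[int] = None     # logical slice index within that partition


class Fvm:
    def __init__(self, slice_count: int, blocks_per_slice: int):
        self.table = [SliceEntry() for _ in range(slice_count)]
        self.blocks_per_slice = blocks_per_slice

    def allocate(self, partition: int, vslice: int) -> int:
        """Claim the first free physical slice for (partition, vslice)."""
        for pslice, entry in enumerate(self.table):
            if not entry.allocated:
                self.table[pslice] = SliceEntry(True, partition, vslice)
                return pslice
        raise RuntimeError("no free slices")

    def translate(self, partition: int, vblock: int) -> tuple[int, int]:
        """Map a partition-relative block to (physical slice, physical block)."""
        vslice, offset = divmod(vblock, self.blocks_per_slice)
        for pslice, entry in enumerate(self.table):
            if entry.allocated and entry.partition == partition and entry.vslice == vslice:
                return pslice, pslice * self.blocks_per_slice + offset
        raise KeyError((partition, vblock))


fvm = Fvm(slice_count=4, blocks_per_slice=8)
fvm.allocate(partition=1, vslice=0)  # lands in physical slice 0
fvm.allocate(partition=2, vslice=0)  # lands in physical slice 1
fvm.allocate(partition=1, vslice=1)  # lands in physical slice 2
location = fvm.translate(partition=1, vblock=9)
```

Partition 1 sees a contiguous range of virtual blocks even though its two slices are not adjacent on disk.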
  1. MinFs: MinFS is a simple, unix-like filesystem built for Zircon.
  2. BlobFs: BlobFS is a content-addressable filesystem optimized for write-once, read-often data; it is used mainly for packages.

On-disk structure of BlobFS:

  • The Superblock storing filesystem-wide metadata,
  • The Block Map, a bitmap used to keep track of free and allocated data blocks,
  • The Node Map, a flat array of Inodes (reference to where a blob's data starts on disk) or ExtentContainers (reference to several extents containing some of a blob's data).

    • There are two kinds of node: Inodes and ExtentContainers.
    • Properties of the node linked-list: there are rules that guarantee extents are ordered; violations are treated as errors.
  • The Journal, a log of filesystem operations that ensures filesystem integrity, even if the device reboots or loses power during an operation, and
  • The Data Blocks, where blob contents and their verification metadata are stored in a series of extents.

    • Currently BlobFS does not perform defragmentation.
  1. Random access compression in BlobFS

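    1. The default algorithm is zstd.
    2. To support demand paging, a file is split into frames that are compressed and decompressed independently (chunked compression).
  2. Block devices: as with filesystems, the program acts as a client and sends requests to the devhost over RPC.

    Fast block I/O: the client registers a "transaction buffer" and passes, for example, the write location plus the starting address of the data, avoiding the heavy overhead of copying.

  3. zxcrypt
  4. Life of an 'Open': in Fuchsia, open is not a system call; the client connects to the filesystem over a channel. After a process is initialized, it is given a namespace.

    1. The standard library defines the open function.
    2. Fdio: provides a unified interface for files, sockets, services, and more.
    3. FIDL: the protocols that govern the interaction between client and server.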
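
Since open is resolved on the client side through the namespace table, the first step of an open can be pictured as a longest-prefix lookup. The sketch below is a hypothetical model, not fdio's actual implementation: handles are plain strings, and `resolve` is an invented helper that returns the handle to send the request to plus the remaining relative path.

```python
def resolve(namespace: dict[str, str], path: str) -> tuple[str, str]:
    """Pick the longest namespace prefix that covers `path`."""
    best = None
    for prefix in namespace:
        covers = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if covers and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        raise FileNotFoundError(path)
    remainder = path[len(best):].lstrip("/")
    return namespace[best], remainder


# Hypothetical namespace: absolute path prefix -> handle (modelled as a string).
namespace = {
    "/svc": "handle:svc",
    "/pkg": "handle:pkg",
    "/pkg/data": "handle:pkg-data",
}
target = resolve(namespace, "/pkg/data/config.json")
```

The remaining path would then be sent over the channel behind the chosen handle, which is where the filesystem server takes over. A path with no matching prefix simply cannot be opened by this process.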

Process

  1. core library

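    1. FBL: carries over some C++ constructs and adds some of its own.
    2. FXL: a platform-independent library containing basic C++ building blocks.
  2. Namespace

    1. Namespaces are defined per component; each component has its own root.
  3. Object: the items within a namespace are called objects; for example, a namespace entry points to an object, which may be a file or a directory.

    1. Access: via FIDL; you can create new objects and access child objects.
    2. Object names: different names can refer to the same object; a name is determined by the enclosing container (similar to a directory).
  4. Object Relative Path Expressions: path names such as a/b/c; accessing anything outside the container (for example, with ..) is not supported.
  5. Client Interpreted Path Expressions: the client can define its own root location.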

SandBox

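  1. When a process is created it has no privileges; it is usually then granted some handles and the like.
  2. A process's namespace is important: it determines which objects the process can reach.
  3. Component capabilities: a component that runs as a process gets a /svc directory in its namespace.
  4. Legacy components: the services offered in /svc are a subset of the services in the environment.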

JOB

In Fuchsia, jobs are a means of organizing, controlling, and regulating processes

  1. A job can have child jobs; exceptions propagate upward (parent <- child), while policy propagates downward (parent -> child).
  2. Starting from the root job, jobs form a job tree.
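
The two propagation directions can be shown with a toy job tree. Everything here is hypothetical (the `Job` class, the policy names, and the `handles_exceptions` flag); it only illustrates that policy accumulates from the root downward while an exception climbs toward the root until some job handles it.

```python
from typing import Optional


class Job:
    def __init__(self, name: str, parent: Optional["Job"] = None,
                 policy: Optional[set] = None, handles_exceptions: bool = False):
        self.name = name
        self.parent = parent
        self.own_policy = set(policy or ())
        self.handles_exceptions = handles_exceptions

    def effective_policy(self) -> set:
        """Policy flows parent -> child: inherit everything from the ancestors."""
        inherited = self.parent.effective_policy() if self.parent else set()
        return inherited | self.own_policy

    def route_exception(self) -> Optional[str]:
        """Exceptions flow child -> parent: the nearest handling ancestor wins."""
        job: Optional[Job] = self
        while job is not None:
            if job.handles_exceptions:
                return job.name
            job = job.parent
        return None


root = Job("root", handles_exceptions=True)
child = Job("child", root, policy={"deny_new_process"})      # made-up policy name
grandchild = Job("grandchild", child, policy={"deny_new_vmo"})  # made-up policy name
policy = grandchild.effective_policy()
handler = grandchild.route_exception()
```

The grandchild ends up restricted by both its own policy and its parent's, and its unhandled exception is delivered to the root job.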

Booting

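Boot steps
  1. After the kernel starts, userspace boots, beginning with userboot.
  2. userboot is designed to be fast: the kernel gives userboot a handle to the ZBI; userboot finds the bootfs image in the ZBI, decompresses it, and locates the libraries it needs.
  3. It then starts the first process: component manager.
  4. Component manager starts the following components: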


  1. driver manager: starts the driver host processes; driver hosts run drivers.
  2. fshost: starts the filesystems; it finds block devices, locates and loads fvm and zxcrypt, and then starts the minfs and blobfs filesystems.
  3. appmgr: component manager uses the /pkgfs handle from fshost to load appmgr, which is used to share capabilities.

Startup sequence


appmgr creates the app realm, the app realm creates sysmgr, and sysmgr creates the sys realm.

The sys realm holds a large number of FIDL services; it starts many services and manages components, lazily starting some of them.

At this point, boot is complete.

FIDL

library fidl.examples.echo;

@discoverable
protocol Echo {
    EchoString(struct {
        value string:optional;
    }) -> (struct {
        response string:optional;
    });
};

This defines a protocol (comparable to a class) named Echo that clients can discover. It has one method, EchoString, which takes a string parameter value and returns a string response.

IPC models in FIDL
library fidl.examples.echo;

@discoverable
protocol Echo {
    EchoString(struct {
        value string:optional;
    }) -> (struct {
        response string:optional;
    });

    SendString(struct { value string:optional; });

    ->ReceiveString(struct { response string:optional; });
};

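SendString is a fire-and-forget method: after sending, the client continues running without waiting for a reply.

ReceiveString is an event: the client does not request data; its handler only runs when the server sends data.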
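
The three interaction models in the Echo protocol (two-way method, one-way method, event) can be mimicked in plain Python to show who waits for whom. This is not what the FIDL compiler generates; `EchoServer`, its method names, and the callback list are hypothetical and there is no real channel involved.

```python
from typing import Callable, Optional


class EchoServer:
    def __init__(self) -> None:
        self.received: list[Optional[str]] = []
        # Client-side handlers for the ReceiveString event (hypothetical wiring).
        self.listeners: list[Callable[[Optional[str]], None]] = []

    def echo_string(self, value: Optional[str]) -> Optional[str]:
        """Two-way: the caller gets a response back."""
        return value

    def send_string(self, value: Optional[str]) -> None:
        """One-way: the caller gets nothing back and just carries on."""
        self.received.append(value)

    def emit_receive_string(self, response: Optional[str]) -> None:
        """Event: the server pushes data that no client asked for."""
        for handler in self.listeners:
            handler(response)


server = EchoServer()
events: list[Optional[str]] = []
server.listeners.append(events.append)  # client registers an event handler

reply = server.echo_string("hello")
ack = server.send_string("fire and forget")
server.emit_receive_string("pushed by server")
```

Only `echo_string` produces a value for the caller; `send_string` returns nothing, and the event reaches the client through the handler it registered earlier.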

Workflow
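  1. The author writes *.fidl files, which are organized into FIDL libraries; libraries can import one another.
  2. publisher: FIDL libraries are published in the SDK or a public repository.
  3. consumer: uses the FIDL compiler to generate code in the consumer's own language.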

Life of a handle

Mainly explains how FIDL transfers handles and their rights.

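Kernel

System calls: most operate through handles.

Handles and Rights: handles can be transferred and duplicated (rights can be reduced when duplicating).

Kernel Object IDs: every object in the kernel has a "kernel object id" or "koid", which identifies it and is used to manage its lifecycle, among other things.

Running Code: Jobs, Processes, and Threads: a job contains processes, and a process contains threads.

    Without a Job Handle, it is not possible for a Thread within a Process to create another Process or another Job.

Message Passing: Sockets and Channels: sockets are stream-oriented; a channel buffers whole messages.

Objects and Signals: each object has up to 32 signals; a signal indicates, for example, whether the object is currently readable.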

Waiting: Wait One, Wait Many, and Ports

Events, Event Pairs: an event is the simplest object; an event pair is a pair of events that can signal each other.

Shared Memory: Virtual Memory Objects (VMOs): represent a set of physical pages of memory.

Virtual Memory Address Regions (VMARs) :provide an abstraction for managing a process's address space.

LK

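Zircon is built on LK.

Kernel objects

handle

A handle is bound to either a process or the kernel; when a handle is bound to the kernel we say it's 'in-transit'.

A handle links a process to a specific kernel object. It is created with some initial rights, which can be dropped when the handle is duplicated.

Reclamation: a kernel object is destroyed (or handed off for reclamation) once nothing refers to it; the kernel object behind any handle is always guaranteed to be valid.

Signal

A single bit of information used to communicate state, for example whether a channel has unread messages.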
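
Duplicating a handle with fewer rights can be modelled with a bitmask. The sketch below is a toy model only: the `Rights` values are illustrative and do not match the real ZX_RIGHT_* constants, and `Handle.duplicate` is a hypothetical stand-in for the duplicate syscall.

```python
from enum import IntFlag


class Rights(IntFlag):
    # Illustrative bit values, not the kernel's actual constants.
    READ = 1
    WRITE = 2
    DUPLICATE = 4
    TRANSFER = 8


class Handle:
    def __init__(self, obj: object, rights: Rights):
        self.obj = obj        # the kernel object this handle refers to
        self.rights = rights

    def duplicate(self, rights: Rights) -> "Handle":
        """Return a new handle to the same object with equal or fewer rights."""
        if Rights.DUPLICATE not in self.rights:
            raise PermissionError("handle lacks the DUPLICATE right")
        if rights & ~self.rights:
            raise PermissionError("a duplicate cannot gain rights")
        return Handle(self.obj, rights)


vmo = object()
full = Handle(vmo, Rights.READ | Rights.WRITE | Rights.DUPLICATE)
read_only = full.duplicate(Rights.READ)
```

Both handles refer to the same object, but the read-only copy can neither write nor be duplicated again, because the DUPLICATE right was dropped along the way.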

system call

Scheduling

design

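Each logical CPU has its own scheduler; schedulers communicate with each other via IPIs.

Each CPU has its own set of FIFO queues, one per priority level (32 levels in total). In each queue is an ordered list of runnable threads awaiting execution.

For these queues:

  1. The CPU picks the highest-priority non-empty queue and pops the thread at its front.
  2. If the thread uses up its timeslice, it is placed at the tail of the appropriate queue.
  3. If it still has timeslice remaining, it is placed at the head of the queue, but next time it may run only for the remaining timeslice.
  4. If it blocks waiting on a shared resource, it is placed in a wait queue; once unblocked it is re-queued by the same rules (tail if its timeslice is used up, head with only the remaining timeslice otherwise).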
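
The queue rules above can be sketched with one deque per priority level. This is a simplified model, not the kernel's code: `CpuScheduler` is a hypothetical class, threads are plain strings, and it assumes that 31 is the highest priority.

```python
from collections import deque
from typing import Optional

NUM_PRIORITIES = 32  # levels 0..31; 31 is assumed to be the highest


class CpuScheduler:
    def __init__(self) -> None:
        self.queues: list[deque] = [deque() for _ in range(NUM_PRIORITIES)]

    def enqueue(self, thread: str, priority: int, remaining_timeslice: int) -> None:
        """Head of the queue if timeslice is left over, tail if it was used up."""
        if remaining_timeslice > 0:
            self.queues[priority].appendleft(thread)
        else:
            self.queues[priority].append(thread)

    def pick_next(self) -> Optional[str]:
        """Pop the front of the highest-priority non-empty queue."""
        for priority in range(NUM_PRIORITIES - 1, -1, -1):
            if self.queues[priority]:
                return self.queues[priority].popleft()
        return None


sched = CpuScheduler()
sched.enqueue("a", priority=10, remaining_timeslice=0)
sched.enqueue("b", priority=10, remaining_timeslice=0)
sched.enqueue("c", priority=10, remaining_timeslice=5)  # jumps ahead of a and b
sched.enqueue("d", priority=20, remaining_timeslice=0)
run_order = [sched.pick_next() for _ in range(5)]
```

Thread `d` runs first because of its higher priority; within priority 10, `c` resumes before `a` and `b` because it still had timeslice left.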
Priority management
  1. There are 32 priority levels, 0 through 31.
  2. A thread's priority boost stays within [-MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ] and changes as follows:
  • When a thread is unblocked, after waiting on a shared resource or sleeping, it is given a one point boost.
  • When a thread yields (volunteers to give up control), or volunteers to reschedule, its boost is decremented by one but is capped at 0 (won’t go negative).
  • When a thread is preempted and has used up its entire timeslice, its boost is decremented by one but is able to go negative.
  3. If a thread holds a resource that is blocking a higher-priority thread, it is given a temporary boost (priority inheritance).
CPU assignment and migration

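Each thread has a CPU affinity mask with one bit per CPU. For example, a thread that prefers the first and third CPUs (CPU 0 and CPU 2 when counting from zero) has the mask 0b101; the positions of the two 1 bits identify the CPUs.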

When selecting a CPU for a thread the scheduler will choose, in order:

  1. The CPU doing the selection, if it is idle and in the affinity mask.
  2. The CPU the thread last ran on, if it is idle and in the affinity mask.
  3. Any idle CPU in the affinity mask.
  4. The CPU the thread last ran on, if it is active and in the affinity mask.
  5. The CPU doing the selection, if it is the only one in the affinity mask or all cpus in the mask are not active.
  6. Any active CPU in the affinity mask
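
The six-step selection order can be written down almost literally. The sketch below is an assumption-laden model: `select_cpu` and its parameters are hypothetical, CPUs are numbered from zero, bit n of the mask stands for CPU n, and "idle" and "active" are passed in as plain sets.

```python
def in_mask(mask: int, cpu: int) -> bool:
    """True if bit `cpu` of the affinity mask is set."""
    return bool(mask >> cpu & 1)


def select_cpu(mask: int, current: int, last: int, idle: set, active: set) -> int:
    allowed = [cpu for cpu in range(mask.bit_length()) if in_mask(mask, cpu)]
    # 1. The CPU doing the selection, if idle and allowed.
    if current in idle and in_mask(mask, current):
        return current
    # 2. The CPU the thread last ran on, if idle and allowed.
    if last in idle and in_mask(mask, last):
        return last
    # 3. Any idle CPU in the mask.
    for cpu in allowed:
        if cpu in idle:
            return cpu
    # 4. The last CPU, if active and allowed.
    if last in active and in_mask(mask, last):
        return last
    # 5. The selecting CPU, if it is the only allowed one or no allowed CPU is active.
    if allowed == [current] or not any(cpu in active for cpu in allowed):
        return current
    # 6. Any active CPU in the mask.
    for cpu in allowed:
        if cpu in active:
            return cpu
    return current


MASK = 0b101  # CPUs 0 and 2
choice_idle_last = select_cpu(MASK, current=1, last=2, idle={2}, active={0, 1, 2})
choice_busy = select_cpu(MASK, current=1, last=0, idle=set(), active={0, 1, 2})
choice_any_idle = select_cpu(MASK, current=1, last=1, idle={0, 1}, active={0, 1, 2})
```

In the first call the thread returns to CPU 2, where it last ran and which is idle; in the second nothing is idle, so it falls through to the busy last CPU; in the third the selecting CPU is idle but outside the mask, so the first idle CPU in the mask is used.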

Zircon Fair Scheduler

Briefly, these properties are:

  • Intuitive bandwidth allocation mechanism: A thread with twice the weight of another thread will receive approximately twice the CPU time, relative to the other thread over time. Whereas, a thread with the same weight as another will receive approximately the same CPU time, relative to the other thread over time.
  • Starvation free for all threads: Proportional bandwidth division ensures that all competing threads receive CPU time in a timely manner, regardless of how low the thread weight is relative to other threads. Notably, this property prevents unbounded priority inversion.
  • Fair response to system overload: When the system is overloaded, all threads share proportionally in the slowdown. Solving overload conditions is often simpler than managing complex priority interactions required in other scheduling disciplines.
  • Stability under evolving demands: Adapts well to a wide range of workloads with minimal intervention compared to other scheduling disciplines.

Zircon's fair scheduler is based on Worst-Case Fair Weighted Fair Queuing (WF2Q).
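
The "intuitive bandwidth allocation" property is just proportional division by weight. The helper below only illustrates that arithmetic; it is not an implementation of WF2Q, and the `cpu_shares` function and thread names are made up for the example.

```python
def cpu_shares(weights: dict[str, float]) -> dict[str, float]:
    """Long-run CPU fraction per thread: its weight over the total weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


shares = cpu_shares({"a": 2, "b": 1, "c": 1})
```

With weights 2, 1, and 1, thread `a` gets half of the CPU and twice as much as `b` or `c`, and no competing thread's share ever drops to zero, which is the starvation-free property.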

Security

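Each thread has two stacks instead of the usual one: a "safe stack" and an "unsafe stack".

The unsafe stack holds data whose address may be taken, such as local buffers; the safe stack holds data such as return addresses, which protects them from stack buffer overflows.

The shadow call stack pointer supports code built with shadow-call-stack.

Cryptographically Secure Pseudo Random Number Generator: generates random numbers.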

Errors

Errors are grouped into categories: the first error code in each category is the generic code and is used when no more specific code applies.

Not very different from traditional error codes.

Zircon Kernel IPC Limits

If a kernel buffer is read more slowly than it is written, the system may run out of kernel buffers.

waiting

Timer Slack:

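Slack defines how the system may alter the timer's deadline. A timer is used, for example, when an object waits for a set time or until the timer is canceled.

Slack lets timers be coalesced, which may shift the time actually waited; the amount is the allowed deviation from the deadline.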

Tracing

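Used to inspect the state of kernel and userspace processes.

Trace providers write into buffers, and the trace manager forwards the data to the trace client over a socket.

The trace client controls whether trace providers run through the trace manager: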

A trace client contacts the trace manager to request that tracing should either start or stop. A trace client can also request to save collected trace data.


packages

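A set of files that provides one or more programs.

  1. Packages are downloaded from a Fuchsia package server as BLOBs; BLOBs with identical content have the same name.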
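
"Same content, same name" is what content addressing means. The sketch below uses a plain SHA-256 digest of the whole content as the name, which is a deliberate simplification: Fuchsia names blobs by a Merkle root rather than a flat hash, and `BlobStore` and `blob_name` are hypothetical helpers.

```python
import hashlib


def blob_name(content: bytes) -> str:
    """Derive the name purely from the bytes (flat SHA-256 as a stand-in)."""
    return hashlib.sha256(content).hexdigest()


class BlobStore:
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content under its hash; identical content is stored only once."""
        name = blob_name(content)
        self.blobs.setdefault(name, content)
        return name


store = BlobStore()
first = store.put(b"libfoo contents")
second = store.put(b"libfoo contents")  # same bytes shipped by another package
other = store.put(b"libbar contents")
```

Two packages that ship the same file end up referring to one stored blob, and a downloaded blob can be checked by recomputing its name.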

Base packages:These are the packages that are part of the foundation of the Fuchsia operating system

cached packages:These are packages on the device which are not part of base. These packages exist when the device is flashed or paved, so these packages are usable if the device boots without a network connection

Universe packages: packages that live on the Fuchsia package server.

Package structure
  • meta.far

    • meta/package:a JSON file that contains the name and version of the package.
    • meta/contents: the contents file, generated by the pm tool.
  • BLOBs outside of meta/

    • most files of a package exist outside of the meta/directory and each are a BLOB.
Package URL
fuchsia-pkg:///?hash=#
Developing with Fuchsia packages
  1. The development host serves packages over HTTP; the target device connects to it by IP address on TCP port 8083.
  2. Build with the fx build command.
  3. Triggering package updates

Security

Verified execution (VX):

Fuchsia has taken verification into the runtime of the system

VX considers two security models: the running software model and the verified boot model.

The Verified Boot Security Model:The goal of a defender in this security model is to recover by eliminating untrustworthy states (code and data) in which attackers could persist control across reboots.

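  1. If untrustworthy state is present, reboot directly and then remove the untrustworthy state.
  2. Rollback is refused, to prevent an attacker from reintroducing fixed vulnerabilities by rolling back to an older version.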

The Running Software Model

In this security model, the aim of a defender is to solve or mitigate possible vulnerabilities by hardening code against malicious input.

Phases of Verification:

Phase Zero: Hardware to First Bootloader:the hardware is assumed to be trusted.

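Phase One: First Bootloader to Main Bootloader: once the first bootloader has been verified, it gains the authority to verify and execute software; it is likewise treated as part of the hardware.

Phase Two: Main Bootloader to Preauthorized Code: the main bootloader verifies the preauthorized code, which runs directly on the hardware and includes, for example, the kernel, drivers, and the package manager.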

Phase Three: Non-Preauthorized Code

Non-preauthorized code may not be allowed to run on some devices.

Implementation of the above

Main Bootloader

The main bootloader implementation relies on Android Verified Boot for verification and kernel rollback protection.

BlobFS

BlobFS is a cryptographic, content-addressed filesystem purpose-built to support verified execution.

Package Management System

Component Framework

Source code

vendoring = third party code

Session

Stores the properties and configuration needed for a particular user session.

Elements

A component that the UI adds to the session is an element.
