01-runC: Introduction and Commands

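1. What is runC?

Official definition
  • runC is a CLI tool for creating and running containers according to the OCI (Open Container Initiative) specification
About runC
  • The OCI (Open Container Initiative) is the industry standards body for containers. It grew out of an effort by the major vendors to keep the container ecosystem from being coupled too tightly to Docker, and it was also a concession on Docker's part
  • runC is a lightweight, portable container runtime. It abstracts away the details of the underlying host (for portability) without requiring applications to be completely rewritten (for ubiquity) and without introducing excessive performance overhead (for scale)
runC features include
  • Full support for Linux namespaces, including user namespaces;
  • Native support for all the security features available in Linux: SELinux, AppArmor, seccomp, cgroups, capabilities, pivot_root, uid/gid dropping, and so on. If Linux can do it, runC can do it;
  • Native support for live migration, with the help of the CRIU team at Parallels;
  • Native support for Windows 10 containers, contributed directly by Microsoft engineers;
  • Planned native support for Arm, Power, and Sparc, with participation from Arm, Intel, Qualcomm, IBM, and the wider hardware manufacturer ecosystem;
  • Portable performance profiles, contributed by Google engineers based on their experience deploying containers in production;
  • A formally specified configuration format, governed by the Open Container Project under the auspices of the Linux Foundation. In other words: it is a real standard (the OCP has since been renamed OCI)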

2. runC commands

An overview of the runc commands. Later versions change a few options, but the differences are minor:
$ runc -h

VERSION:
   1.0.0-rc4+dev
   commit: 3f2f8b84a77f73d38244dd690525642a72156c64
   spec: 1.0.0

COMMANDS:
     checkpoint  checkpoint a running container
     create      create a container
     delete      delete any resources held by the container often used with detached container
     events      display container events such as OOM notifications, cpu, memory, and IO usage statistics
     exec        execute new process inside the container
     init        initialize the namespaces and launch the process (do not call it outside of runc)
     kill        kill sends the specified signal (default: SIGTERM) to the container's init process
     list        lists containers started by runc with the given root
     pause       pause suspends all processes inside the container
     ps          ps displays the processes running inside a container
     restore     restore a container from a previous checkpoint
     resume      resumes all processes that have been previously paused
     run         create and run a container
     spec        create a new specification file
     start       executes the user defined process in a created container
     state       output the state of a container
     update      update container resource constraints
     help, h     Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug             enable debug output for logging
   --log value         set the log file path where internal debug information is written (default: "/dev/null")
   --log-format value  set the format used by logs ('text' (default), or 'json') (default: "text")
   --root value        root directory for storage of container state (this should be located in tmpfs) (default: "/run/runc")
   --criu value        path to the criu binary used for checkpoint and restore (default: "criu")
   --systemd-cgroup    enable systemd cgroup support, expects cgroupsPath to be of form "slice:prefix:name" for e.g. "system.slice:runc:434234"
   --help, -h          show help
   --version, -v       print the version
Source entry point
  • runc/main.go: every subcommand is registered here, so this file is the entry point

...

func main() {
	app := cli.NewApp()
	app.Name = "runc"
	app.Usage = usage

	var v []string
	if version != "" {
		v = append(v, version)
	}
	if gitCommit != "" {
		v = append(v, "commit: "+gitCommit)
	}
	v = append(v, "spec: "+specs.Version)
	v = append(v, "go: "+runtime.Version())
	if seccomp.IsEnabled() {
		major, minor, micro := seccomp.Version()
		v = append(v, fmt.Sprintf("libseccomp: %d.%d.%d", major, minor, micro))
	}
	app.Version = strings.Join(v, "\n")

	xdgRuntimeDir := ""
	root := "/run/runc"
	if shouldHonorXDGRuntimeDir() {
		if runtimeDir := os.Getenv("XDG_RUNTIME_DIR"); runtimeDir != "" {
			root = runtimeDir + "/runc"
			xdgRuntimeDir = root
		}
	}

	app.Flags = []cli.Flag{
		cli.BoolFlag{
			Name:  "debug",
			Usage: "enable debug output for logging",
		},
		cli.StringFlag{
			Name:  "log",
			Value: "",
			Usage: "set the log file path where internal debug information is written",
		},
		cli.StringFlag{
			Name:  "log-format",
			Value: "text",
			Usage: "set the format used by logs ('text' (default), or 'json')",
		},
		cli.StringFlag{
			Name:  "root",
			Value: root,
			Usage: "root directory for storage of container state (this should be located in tmpfs)",
		},
		cli.StringFlag{
			Name:  "criu",
			Value: "criu",
			Usage: "path to the criu binary used for checkpoint and restore",
		},
		cli.BoolFlag{
			Name:  "systemd-cgroup",
			Usage: "enable systemd cgroup support, expects cgroupsPath to be of form \"slice:prefix:name\" for e.g. \"system.slice:runc:434234\"",
		},
		cli.StringFlag{
			Name:  "rootless",
			Value: "auto",
			Usage: "ignore cgroup permission errors ('true', 'false', or 'auto')",
		},
	}

	// Entry point for all subcommands
	app.Commands = []cli.Command{
		checkpointCommand,
		createCommand,
		deleteCommand,
		eventsCommand,
		execCommand,
		initCommand,
		killCommand,
		listCommand,
		pauseCommand,
		psCommand,
		restoreCommand,
		resumeCommand,
		runCommand,
		specCommand,
		startCommand,
		stateCommand,
		updateCommand,
	}

	app.Before = func(context *cli.Context) error {
		if !context.IsSet("root") && xdgRuntimeDir != "" {
			// According to the XDG specification, we need to set anything in
			// XDG_RUNTIME_DIR to have a sticky bit if we don't want it to get
			// auto-pruned.
			if err := os.MkdirAll(root, 0700); err != nil {
				fmt.Fprintln(os.Stderr, "the path in $XDG_RUNTIME_DIR must be writable by the user")
				fatal(err)
			}
			if err := os.Chmod(root, 0700|os.ModeSticky); err != nil {
				fmt.Fprintln(os.Stderr, "you should check permission of the path in $XDG_RUNTIME_DIR")
				fatal(err)
			}
		}
		if err := reviseRootDir(context); err != nil {
			return err
		}
		return logs.ConfigureLogging(createLogConfig(context))
	}

	// If the command returns an error, cli takes upon itself to print
	// the error on cli.ErrWriter and exit.
	// Use our own writer here to ensure the log gets sent to the right location.
	cli.ErrWriter = &FatalWriter{cli.ErrWriter}
	if err := app.Run(os.Args); err != nil {
		fatal(err)
	}
}

...
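  • The subcommand files all sit in the same directory as the main file; later articles go through each command in detail
2.1 Preparing to use the runC commands
  • Get an image with docker pull and export its filesystem

$ docker pull busybox
$ mkdir -p /tmp/mycontainer/rootfs
$ cd /tmp/mycontainer
$ docker export $(docker create busybox) | tar -C rootfs -xvf -

  • The rootfs directory now holds the filesystem of the busybox image. Next, generate the config.json file with the runc command:

# This command generates the configuration file according to the OCI specification; its source is analyzed in a later article, so it is not expanded on here
$ runc spec
$ ls
config.json rootfs

  • Using the generated config.json as is would make the rest of the demo awkward, so for simplicity we modify it slightly: change "terminal": true to false, and change "args": ["sh"] to "args": ["sleep", "3600"]
  • Why change terminal from true to false? When terminal is enabled, runc create must be given the path of a socket that receives the file descriptor of the console pseudo-terminal master (the --console-socket option); without it the container fails to start;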

{
 "ociVersion": "1.0.0",
 "process": {
  "terminal": false,
  "user": {
   "uid": 0,
   "gid": 0
  },
  "args": [
   "/bin/sleep", "3600"
  ],
...
}
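  • Container states
  • creating: the container is being created by the create command.
  • created: the container has been created but is not running the user process yet. This means the image files and configuration are valid and the container can run on the current platform.
  • running: the process inside the container is running and carrying out the task the user defined.
  • stopped: the container has finished, failed, or been killed, and is now stopped. In this state much of the container's information is still kept on the platform; it has not been completely removed.
  • paused: all processes in the container are suspended; the resume command resumes their execution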

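3. runC command demo

Note: some of the output shown here differs slightly from what the commands as written would produce, but this does not affect the walkthrough.

3.0 Change the working directory

$ cd /tmp/mycontainer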

3.1 runc create

  • Create a container

$ sudo runc create demo1
$ runc list

(screenshot: output of runc list)

  • Use the state command to check the container's status; it is now created

$ sudo runc state demo1
{
  "ociVersion": "1.0.0",
  "id": "demo1",
  "pid": 29314,
  "status": "created",
  "bundle": "/tmp/mycontainer/",
  "rootfs": "/tmp/mycontainer/rootfs",
  "created": "2020-11-29T06:42:30.366937499Z",
  "owner": ""
}
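
A tool that drives runc usually reads this JSON rather than eyeballing it. The sketch below decodes the output shown above into a struct; the type name containerState and the function parseState are illustrative names, and the field list is simply what appears in the sample output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// containerState mirrors the fields printed by `runc state` in the article.
// (Illustrative type; it is not imported from runc.)
type containerState struct {
	OCIVersion string    `json:"ociVersion"`
	ID         string    `json:"id"`
	Pid        int       `json:"pid"`
	Status     string    `json:"status"`
	Bundle     string    `json:"bundle"`
	Rootfs     string    `json:"rootfs"`
	Created    time.Time `json:"created"`
	Owner      string    `json:"owner"`
}

// sample is the state output shown in the article.
const sample = `{
  "ociVersion": "1.0.0",
  "id": "demo1",
  "pid": 29314,
  "status": "created",
  "bundle": "/tmp/mycontainer/",
  "rootfs": "/tmp/mycontainer/rootfs",
  "created": "2020-11-29T06:42:30.366937499Z",
  "owner": ""
}`

// parseState decodes the JSON document printed by `runc state <id>`.
func parseState(data []byte) (containerState, error) {
	var s containerState
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	s, err := parseState([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Prints: demo1 is created (init pid 29314)
	fmt.Printf("%s is %s (init pid %d)\n", s.ID, s.Status, s.Pid)
}
```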
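  • Use runc ps to see which processes the container is currently running

$ sudo runc ps demo1

(screenshot: output of runc ps demo1)

As the output shows, the container is currently running a process named init; the sleep command we configured has not been executed yet. The init process sets up the container's entire runtime environment and is covered in detail in the later source analysis.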

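3.2 runc start

  • Start the container; this time the command we defined is executed

$ sudo runc start demo1
$ sudo runc list

(screenshot: output of runc list)

  • Check which process the container is running now; it is the sleep process we configured

$ sudo runc ps demo1

(screenshot: output of runc ps demo1)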

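3.3 runc exec

  • This command runs a command inside the container. It works differently from ps, which only inspects the container's processes from the outside, much like ps aux | grep. exec is implemented with C code that calls setns (runc's nsenter), similar to the Linux nsenter tool, to enter the namespaces of the container's process;
  • More details are left to the later source analysis
  • Example 1

$ sudo runc exec demo1 ps

(screenshot: output of runc exec demo1 ps)

  • Example 2: allocate a tty and start sh

$ sudo runc exec -t demo1 /bin/sh

# Now inside demo1's namespaces
/ $ ls /
bin dev etc home proc root sys tmp usr var
/ $ ls ~
/ $
/ $ ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 /bin/sleep 100000000
    9 root      0:00 /bin/sh
   15 root      0:00 ps aux
/ $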

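3.4 runc pause/resume

  • Pauses the container. The core mechanism is the cgroup freezer subsystem, which suspends processes. The freezer is useful for batch job management systems: it can start and stop sets of tasks in batches in order to schedule machine resources.
  • How the freezer is used is left to the later source analysis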


$ sudo runc state demo1
{
  "ociVersion": "1.0.0",
  "id": "demo1",
  "pid": 29314,
# the current status is running
  "status": "running",
  "bundle": "/data/docker_lab/bundle",
  "rootfs": "/data/docker_lab/bundle/rootfs",
  "created": "2020-11-29T06:42:30.366937499Z",
  "owner": ""
}

$ sudo runc pause demo1
$ sudo runc state demo1
{
  "ociVersion": "1.0.0",
  "id": "demo1",
  "pid": 29314,
# the current status is paused
  "status": "paused",
  "bundle": "/data/docker_lab/bundle",
  "rootfs": "/data/docker_lab/bundle/rootfs",
  "created": "2020-11-29T06:42:30.366937499Z",
  "owner": ""
}

$ sudo runc resume demo1
$ sudo runc state demo1
{
  "ociVersion": "1.0.0",
  "id": "demo1",
  "pid": 29314,
  "status": "running",
  "bundle": "/data/docker_lab/bundle",
  "rootfs": "/data/docker_lab/bundle/rootfs",
  "created": "2020-11-29T06:42:30.366937499Z",
  "owner": ""
}
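
To make the freezer mechanism concrete: under cgroup v1 a group is frozen by writing FROZEN to its freezer.state file and released by writing THAWED (cgroup v2 uses a cgroup.freeze file with 1 and 0 instead). The sketch below illustrates only that idea and is not runc's implementation; the helper names are hypothetical, and a temporary directory stands in for the container's real freezer cgroup directory so that it runs without root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// freezerValue maps a pause/resume action to the value written to the
// cgroup v1 freezer.state file. (Hypothetical helper for illustration.)
func freezerValue(action string) (string, error) {
	switch action {
	case "pause":
		return "FROZEN", nil
	case "resume":
		return "THAWED", nil
	}
	return "", fmt.Errorf("unsupported action %q", action)
}

// setFreezer writes the freezer state for the cgroup at cgroupDir.
func setFreezer(cgroupDir, action string) error {
	value, err := freezerValue(action)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(cgroupDir, "freezer.state"), []byte(value), 0o644)
}

func main() {
	// Assumption: a temp directory replaces the real freezer cgroup path,
	// which on a cgroup v1 host lives somewhere under /sys/fs/cgroup/freezer.
	dir, err := os.MkdirTemp("", "freezer-demo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(dir)

	for _, action := range []string{"pause", "resume"} {
		if err := setFreezer(dir, action); err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		data, err := os.ReadFile(filepath.Join(dir, "freezer.state"))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		fmt.Printf("%s -> %s\n", action, data)
	}
}
```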

3.5 runc kill/delete

  • Stops the container's process. kill is simple: it looks up the process ID of the container's init process and sends it a signal
  • More details are left to the later source analysis

# 15 is SIGTERM, which is also the default signal
$ sudo runc kill demo1 15
$ sudo runc state demo1
{
  "ociVersion": "1.0.0",
  "id": "demo1",
  "pid": 0,
  "status": "stopped",
  "bundle": "/data/docker_lab/bundle",
  "rootfs": "/data/docker_lab/bundle/rootfs",
  "created": "2020-11-29T06:42:30.366937499Z",
  "owner": ""
}

# Delete the container
$ sudo runc delete demo1
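
The signal argument in the kill example above is a number, and the help text gives SIGTERM as the default. A minimal sketch of turning such an argument into a signal value might look like this; parseSignal and the small name table are assumptions made for illustration, not runc's own code, and sending the signal would then be a call such as syscall.Kill(pid, sig).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"syscall"
)

// signalNames is a deliberately short table for the sketch; a real tool
// would cover every signal the platform defines.
var signalNames = map[string]syscall.Signal{
	"HUP":  syscall.SIGHUP,
	"INT":  syscall.SIGINT,
	"QUIT": syscall.SIGQUIT,
	"KILL": syscall.SIGKILL,
	"USR1": syscall.SIGUSR1,
	"USR2": syscall.SIGUSR2,
	"TERM": syscall.SIGTERM,
}

// parseSignal accepts a number ("15"), a name ("TERM"), or a prefixed name
// ("SIGTERM", any case). An empty argument falls back to SIGTERM.
// (Hypothetical helper for illustration.)
func parseSignal(raw string) (syscall.Signal, error) {
	if raw == "" {
		return syscall.SIGTERM, nil
	}
	if n, err := strconv.Atoi(raw); err == nil {
		return syscall.Signal(n), nil
	}
	name := strings.TrimPrefix(strings.ToUpper(raw), "SIG")
	if sig, ok := signalNames[name]; ok {
		return sig, nil
	}
	return 0, fmt.Errorf("unknown signal %q", raw)
}

func main() {
	for _, arg := range []string{"15", "KILL", "sigterm", ""} {
		sig, err := parseSignal(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> signal %d\n", arg, int(sig))
	}
}
```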

3.6 runc events

  • The events command reports container events and statistics on the container's resource usage
  • More details are left to the later source analysis

$ sudo runc events demo1

{"type":"stats","id":"demo1","data":{"cpu":{"usage":{"total":10140103,"percpu":[2003042,8137061],"kernel":0,"user":0},"throttling":{}},"memory":{"usage":{"limit":9223372036854771712,"usage":208896,"max":671744,"failcnt":0},"swap":{"limit":9223372036854771712,"usage":208896,"max":671744,"failcnt":0},"kernel":{"limit":9223372036854771712,"usage":172032,"max":176128,"failcnt":0},"kernelTCP":{"limit":9223372036854771712,"failcnt":0},"raw":{"active_anon":0,"active_file":0,"cache":0,"dirty":0,"hierarchical_memory_limit":9223372036854771712,"hierarchical_memsw_limit":9223372036854771712,"inactive_anon":0,"inactive_file":0,"mapped_file":0,"pgfault":66,"pgmajfault":0,"pgpgin":66,"pgpgout":43,"rss":98304,"rss_huge":0,"shmem":0,"swap":0,"total_active_anon":0,"total_active_file":0,"total_cache":0,"total_dirty":0,"total_inactive_anon":0,"total_inactive_file":0,"total_mapped_file":0,"total_pgfault":66,"total_pgmajfault":0,"total_pgpgin":66,"total_pgpgout":43,"total_rss":98304,"total_rss_huge":0,"total_shmem":0,"total_swap":0,"total_unevictable":0,"total_writeback":0,"unevictable":0,"writeback":0}},"pids":{"current":1},"blkio":{},"hugetlb":{"2MB":{"failcnt":0}}}}

3.7 runc update

  • The update command adjusts the container's cgroup parameters. First look at which subsystems can be tuned; they are mainly cpu, memory, and kernel memory

$ sudo runc update -h
Note: if data is to be read from a file or the standard input, all
other options are ignored.

   --blkio-weight value        Specifies per cgroup weight, range is from 10 to 1000 (default: 0)
   --cpu-period value          CPU CFS period to be used for hardcapping (in usecs). 0 to use system default
   --cpu-quota value           CPU CFS hardcap limit (in usecs). Allowed cpu time in a given period
   --cpu-share value           CPU shares (relative weight vs. other containers)
   --cpu-rt-period value       CPU realtime period to be used for hardcapping (in usecs). 0 to use system default
   --cpu-rt-runtime value      CPU realtime hardcap limit (in usecs). Allowed cpu time in a given period
   --cpuset-cpus value         CPU(s) to use
   --cpuset-mems value         Memory node(s) to use
   --kernel-memory value       Kernel memory limit (in bytes)
   --kernel-memory-tcp value   Kernel memory limit (in bytes) for tcp buffer
   --memory value              Memory limit (in bytes)
   --memory-reservation value  Memory reservation or soft_limit (in bytes)
   --memory-swap value         Total memory usage (memory + swap); set '-1' to enable unlimited swap
   --pids-limit value          Maximum number of pids allowed in the container (default: 0)

  • Limit demo1's memory to 100 MB (104857600 bytes)

$ sudo runc update --memory 104857600 demo1

# Check whether the limit was applied to demo1's cgroup
$ cat /sys/fs/cgroup/memory/user.slice/demo1/memory.limit_in_bytes
104857600
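
The --memory option takes bytes, so the 100 MB limit has to be converted first: 100 × 1024 × 1024 = 104857600. A small sketch of that arithmetic, which also builds the command line used above, follows; memoryUpdateArgs is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bytesPerMiB is the number of bytes in one mebibyte (1024 * 1024).
const bytesPerMiB = 1 << 20

// memoryUpdateArgs builds the argument list for limiting a container's
// memory to limitMiB mebibytes. (Hypothetical helper for illustration.)
func memoryUpdateArgs(id string, limitMiB int64) []string {
	limit := strconv.FormatInt(limitMiB*bytesPerMiB, 10)
	return []string{"runc", "update", "--memory", limit, id}
}

func main() {
	// Prints: runc update --memory 104857600 demo1
	fmt.Println(strings.Join(memoryUpdateArgs("demo1", 100), " "))
}
```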

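3.8 runc checkpoint/restore

Live migration of containers in brief
  • The criu tool freezes a running program and checkpoints it to a set of files. Those files can then be used on any host to restore the program to the point at which it was frozen (in plain terms: backup and restore of a running program). criu is therefore commonly used for live migration of programs or containers, snapshots, remote debugging, and so on;
How it works
  • CRIU's functionality splits into two phases, checkpoint and restore. During checkpoint, criu uses the ptrace mechanism to inject a special piece of code into the dumpee process (the process being backed up) and run it. This code collects all of the dumpee's context, which criu then stores, grouped by function, as a series of image files. During restore, criu parses the image files produced by the checkpoint and uses them to recover the state the program had at backup time, so that the program continues running from that state.
Notes/references
  • My knowledge here is limited, so I cannot go into more detail on this part. runc's support for it also has problems, which come up in the demo below. Even when using this feature through Docker, take particular care that the Docker version, Linux kernel, and CRIU version are consistent; otherwise the generated files differ and the restore fails.
  • Reference: https://blog.csdn.net/weixin_...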
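Demo
  • For a container started the normal way with create / start, running checkpoint fails:

$ runc checkpoint demo1
criu failed: type NOTIFY errno 0
log file: /run/runc/demo1/criu.work/dump.log

# Inspect the dump log. It appears to say that the standard input file cannot be looked up; I do not know the exact cause, though a quick search turns up quite a few issues reporting the same problem
$ cat /run/runc/demo1/criu.work/dump.log
...
(00.021115) Error (criu/files-reg.c:1294): Can't lookup mount=26 for fd=0 path=/dev/pts/0
...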
  • So instead I use the run command directly and keep the container running in the foreground

$ cd /tmp/mycontainer
$ runc run demo1

  • Open another terminal; the container shows as running

$ runc list

(screenshot: output of runc list)

  • Now run checkpoint

# Saves the current running state of the process, like a snapshot, and exits the process
$ runc checkpoint demo1

# The program has been stopped
$ runc list

# The current directory now contains a checkpoint directory
$ ls ./
checkpoint config.json rootfs

# It holds many .img files that store the program's runtime state and are used for the restore
$ ls ./checkpoint
cgroup.img files.img inventory.img mm-1.img pagemap-1.img route6-9.img tmpfs-dev-63.tar.gz.img
core-1.img fs-1.img ip6tables-9.img mountpoints-12.img pages-1.img route-9.img tmpfs-dev-65.tar.gz.img
descriptors.json ids-1.img ipcns-var-10.img netdev-9.img pipes-data.img seccomp.img tmpfs-dev-66.tar.gz.img
fdinfo-2.img ifaddr-9.img iptables-9.img netns-9.img pstree.img tmpfs-dev-61.tar.gz.img utsns-11.img

  • Now restore

# Must be run from this directory
$ cd /tmp/mycontainer

# The restored container also runs in the foreground
$ runc restore demo1

# In another terminal, check whether the container is running
$ runc list

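3.9 runc init

  • The init command initializes the container's runtime environment. As mentioned above, a container in the created state runs a /proc/self/exe init process; /proc/self/exe refers to the currently running binary, runc, so this is simply runc init;
  • init is an internal command. Its process carries out the main setup, including cgroups, namespaces, environment variables, kernel parameter settings, and network interface creation; a later article analyzes the source in more detail;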
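
The /proc/self/exe trick is easy to try in isolation. The following Linux-only sketch re-executes its own binary with an init argument, which is the general pattern described above; it sets up no namespaces or cgroups, and the function name initCommand here is chosen for the example rather than taken from runc.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// initCommand builds the command a parent uses to start its own binary
// again in "init" mode. /proc/self/exe is a Linux symlink to the
// executable of the calling process.
func initCommand() *exec.Cmd {
	cmd := exec.Command("/proc/self/exe", "init")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	if len(os.Args) > 1 && os.Args[1] == "init" {
		// Child: an init-style process would prepare the environment here
		// before running the user's command.
		fmt.Printf("init: pid %d, parent pid %d\n", os.Getpid(), os.Getppid())
		return
	}

	// Parent: run the same binary again with the "init" argument.
	fmt.Printf("parent: pid %d re-executing itself\n", os.Getpid())
	if err := initCommand().Run(); err != nil {
		fmt.Fprintln(os.Stderr, "re-exec failed:", err)
		os.Exit(1)
	}
}
```

Because the child is the same executable, both halves of the setup logic ship in a single binary, which is why ps shows /proc/self/exe init for a created container.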

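4. Closing

  • I plan to publish at least one source analysis of a runC submodule every week from here on [consider this a public commitment]
  • Next week: [02-runC source analysis: create & start]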
