CRIU, the Docker Live-Migration Tool, Internals Series: Comparing CRIU, DMTCP, BLCR, and OpenVZ


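  Live migration (also called hot, dynamic, or real-time migration) means saving and restoring a program or a virtual machine: the complete running state is saved and can then be restored quickly on the original hardware platform, or even on a different one, without the user noticing any change in the program or VM. It works by implementing Checkpoint/Restore (C/R) for processes, either in user space or in the kernel. Many tools can live-migrate programs or Docker containers: CRIU, DMTCP, and BLCR do it from user space, while OpenVZ does it at the kernel level. If you asked me to recommend one, I would certainly pick CRIU. That is not only because CRIU has the most complete feature set and is the tool Docker officially uses, but also because I am a member of the CRIU community and the maintainer of its MIPS architecture support.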

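  This article briefly introduces these commonly used live-migration tools and compares their features. I expect that after reading it you will choose CRIU when you need a user-space tool.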

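1. CRIU

  CRIU (Checkpoint/Restore In Userspace) is a software tool for Linux that implements checkpoint/restore in user space. With it you can freeze a running application and checkpoint it to a set of files, then use those files to restore the application, on the same host or a different one, to the point at which it was frozen. In plain terms, it backs up and restores a program that is already running. CRIU is therefore typically used for live migration of programs or containers, snapshots, remote debugging, and similar tasks. CRIU started as a Virtuozzo project and, with help from the open-source community, is now integrated into OpenVZ (the open-source edition of Virtuozzo), LXC/LXD, Docker, Podman, and other projects.
Source code: https://github.com/checkpoint-restore/criu

Worth reading: https://lwn.net/Articles/525675/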

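2. BLCR

  BLCR (Berkeley Lab Checkpoint/Restart) provides a user-space library, libcr, and a kernel module that together perform Checkpoint/Restart. To make a process checkpointable, BLCR offers two options:

  • Start the process with cr_run, e.g. cr_run ./test
  • Link the program against libcr at build time, e.g. gcc -o test test.c -lcr

  Both options have the same effect: a BLCR shared library (libcr.so or libcr_run.so) is loaded before the program's own code runs. The main job of these libraries is to register a signal handler in advance. To trigger a checkpoint, the cr_module kernel module sends that signal to the target process, notifying it to checkpoint itself. The actual checkpoint work is done by the kernel. The C/R cycle uses the following two commands: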

$ ## Checkpoint
$ cr_checkpoint <PID>

$ ## Restore
$ cr_restart context.<PID>
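The notification step described above (a handler registered before the program runs, then a signal delivered to request a checkpoint) can be sketched in plain C. This is only an illustration, not BLCR's code: the function names are hypothetical, SIGUSR1 stands in for whatever signal BLCR actually reserves, and raise() stands in for the kernel module sending the signal.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Assumption: SIGUSR1 is a stand-in; BLCR chooses its own signal. */
#define CKPT_SIGNAL SIGUSR1

/* Set by the handler when a checkpoint request arrives. */
static volatile sig_atomic_t checkpoint_requested = 0;

/* Handler: only flips a flag, which is async-signal-safe. */
static void on_checkpoint_signal(int signo)
{
    (void)signo;
    checkpoint_requested = 1;
}

/* Register the handler ahead of time, as a preloaded library would do
 * before main() runs. Returns 0 on success, -1 on failure. */
int register_checkpoint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_checkpoint_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(CKPT_SIGNAL, &sa, NULL);
}

/* Simulate the notification. In BLCR the signal would come from the
 * kernel module, not from the process itself. */
int request_checkpoint(void)
{
    return raise(CKPT_SIGNAL);
}

/* Returns 1 if a checkpoint was requested since the last call and
 * clears the request; returns 0 otherwise. */
int consume_checkpoint_request(void)
{
    if (!checkpoint_requested)
        return 0;
    checkpoint_requested = 0;
    return 1;
}
```

A program would call register_checkpoint_handler() once at startup and poll consume_checkpoint_request() at a safe point to prepare for the checkpoint. This also shows why a process that never loaded the library cannot be checkpointed this way: it has no handler to receive the notification.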

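For a more detailed introduction, see:
https://blog.csdn.net/u012569600/article/details/24132479
https://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/

Open question: judging by the News section of the second site, BLCR no longer seems to be maintained or updated. I do not know why so many people online still use it to speed up Android application startup.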

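3. DMTCP

  DMTCP (Distributed MultiThreaded CheckPointing) implements checkpoint/restore as a library. This means that a program you want to migrate with DMTCP must have been started through DMTCP in the first place, so that the DMTCP shared library is loaded into it. When a program is started this way, the DMTCP library intercepts a number of the application's library calls (acting as a proxy), builds a shadow database of information about the process's internal state, and then forwards the requests to glibc and the kernel. The collected information is used to produce the application's image files. With this approach, only applications that run successfully under the DMTCP library can be checkpointed, and DMTCP does not provide proxies for every kernel API (inotify, for example, is known to be unsupported). Another consequence is a potential performance cost from proxying the requests. Below is a DMTCP demo: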

# vi test/demo.c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int count = 1;

    /* Print an increasing counter every 2 seconds, forever. */
    while (1) {
        printf(" %2d ", count++);
        fflush(stdout);
        sleep(2);
    }
    return 0;
}
# test/demo
1   2   3 ^C
# bin/dmtcp_launch --interval 5 test/demo
1   2   3   4   5   6   7 ^C

# ls ckpt_demo*
ckpt_demo_66e1c8437adb789-40000-5745d372.dmtcp

# bin/dmtcp_restart ckpt_demo*
7   8   9  10 ^C
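The interception idea described earlier in this section (a wrapper that forwards a call and records what a later restart would need) can be sketched as follows. This is a minimal illustration under stated assumptions, not DMTCP's implementation: the names tracked_open, tracked_close, shadow_lookup, and shadow_count are hypothetical, the table is a fixed-size array, and only open()/close() are covered. A real preloaded library would interpose on the libc symbols themselves.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAX_TRACKED  64
#define MAX_PATH_LEN 256

/* One shadow record per open descriptor: the facts a restart would
 * need in order to reopen the file. */
struct fd_record {
    int  in_use;
    int  fd;
    int  flags;
    char path[MAX_PATH_LEN];
};

static struct fd_record shadow_table[MAX_TRACKED];

/* Wrapper: forward to the real open(), then record the result.
 * If the table is full the descriptor is simply left untracked. */
int tracked_open(const char *path, int flags)
{
    int fd = open(path, flags);

    if (fd < 0)
        return fd;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (!shadow_table[i].in_use) {
            shadow_table[i].in_use = 1;
            shadow_table[i].fd = fd;
            shadow_table[i].flags = flags;
            strncpy(shadow_table[i].path, path, MAX_PATH_LEN - 1);
            shadow_table[i].path[MAX_PATH_LEN - 1] = '\0';
            break;
        }
    }
    return fd;
}

/* Wrapper: drop the shadow record, then forward to the real close(). */
int tracked_close(int fd)
{
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (shadow_table[i].in_use && shadow_table[i].fd == fd)
            shadow_table[i].in_use = 0;
    }
    return close(fd);
}

/* Path recorded for fd, or NULL if the descriptor is not tracked. */
const char *shadow_lookup(int fd)
{
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (shadow_table[i].in_use && shadow_table[i].fd == fd)
            return shadow_table[i].path;
    }
    return NULL;
}

/* Number of descriptors currently tracked. */
int shadow_count(void)
{
    int n = 0;

    for (int i = 0; i < MAX_TRACKED; i++)
        n += shadow_table[i].in_use;
    return n;
}
```

Every call pays for the extra bookkeeping, and any descriptor created through an API without a wrapper never reaches the table. These are the two limitations of the proxy approach noted above.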

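  Restoring a process is also tricky for DMTCP. During checkpoint, getpid() can be used to retrieve the process's PID, but there is no corresponding API to set a process's PID during restore (the fork() system call does not let the caller choose the child's PID). To work around this, DMTCP intercepts the getpid() library call and returns a fake PID to the application. This is dangerous: if the application uses the fake PID to access files under /proc, it may see the wrong files. To solve the same problem for CRIU, the developers added a kernel API (exposed as /proc/sys/kernel/ns_last_pid) that controls which PID the next fork() call will pick.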
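The fake-PID approach and its /proc hazard can be illustrated with a small translation table. This is only a sketch with hypothetical names (pid_map_set, virt_to_real, real_to_virt, proc_status_path) and made-up PID values, not DMTCP's actual data structure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define MAX_PIDS 32

/* Maps the PID the application believes it has (virt) to the PID the
 * kernel actually assigned (real). */
struct pid_map_entry {
    pid_t virt;
    pid_t real;
};

static struct pid_map_entry pid_map[MAX_PIDS];
static int pid_map_len = 0;

/* Record, or update after a restart, the real PID behind a virtual PID.
 * Returns 0 on success, -1 if the table is full. */
int pid_map_set(pid_t virt, pid_t real)
{
    for (int i = 0; i < pid_map_len; i++) {
        if (pid_map[i].virt == virt) {
            pid_map[i].real = real;
            return 0;
        }
    }
    if (pid_map_len == MAX_PIDS)
        return -1;
    pid_map[pid_map_len].virt = virt;
    pid_map[pid_map_len].real = real;
    pid_map_len++;
    return 0;
}

/* Translate for calls going to the kernel, e.g. kill(). -1 if unknown. */
pid_t virt_to_real(pid_t virt)
{
    for (int i = 0; i < pid_map_len; i++) {
        if (pid_map[i].virt == virt)
            return pid_map[i].real;
    }
    return -1;
}

/* Translate for values returned to the application, e.g. getpid().
 * -1 if unknown. */
pid_t real_to_virt(pid_t real)
{
    for (int i = 0; i < pid_map_len; i++) {
        if (pid_map[i].real == real)
            return pid_map[i].virt;
    }
    return -1;
}

/* Path the application would build from the PID it was told. After a
 * restart this may name a different process, or none at all. */
int proc_status_path(pid_t pid, char *buf, size_t len)
{
    return snprintf(buf, len, "/proc/%ld/status", (long)pid);
}
```

Suppose the process was checkpointed as PID 1234 and the kernel assigns 5678 on restart. After pid_map_set(1234, 5678), a wrapped getpid() still returns 1234, so the path built by proc_status_path() is /proc/1234/status, which no longer belongs to this process. Restoring the original PID, as CRIU does, avoids the translation layer altogether.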

DMTCP source code: https://github.com/dmtcp/dmtcp

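4. OpenVZ

  OpenVZ implements C/R entirely inside the kernel. I am not yet familiar with its internals or usage.
CRIU currently provides nearly all of the functionality of OpenVZ's kernel-based checkpoint/restore mechanism.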

5. Feature comparison of the C/R tools

| Feature | CRIU | DMTCP | BLCR | OpenVZ |
|---|---|---|---|---|
| Arch | x86_64, ARM, AArch64, PPC64le, mips64el | x86, x86_64, ARM | x86, x86_64, PPC/PPC64, ARM | x86, x86_64 |
| Block devices | No | Looks like yes | No | No |
| Can be used as non-root user? | Yes, but user can only manipulate tasks belonging to him | Yes | Yes | No |
| Can be used without preloading special libraries before app start? | Yes | No | No | Yes |
| Can run unmodified programs? | Yes | Yes | No. Statically linked and/or threaded apps are unsupported. | Yes |
| Can run unprepared tasks? | Yes | No. It preloads the DMTCP library. That library runs before the routine main(). It creates a second thread. The checkpoint thread then creates a socket to the DMTCP coordinator and registers itself. The checkpoint thread also creates a signal handler. | No. CR shall notify processes when a checkpoint is to occur (before the kernel takes a checkpoint) to allow the processes to prepare themselves accordingly. | Yes |
| Capture the contents of open files | Yes, if file is unlinked | Looks like no | Not yet | Yes |
| Character devices | Yes, only /dev/null, /dev/zero, etc. are supported | Yes, looks like null and zero are supported | Yes, /dev/null and /dev/zero | Yes |
| Containers | Yes, LXC and OpenVZ containers | No. It doesn't support namespaces, so it probably can't dump containers | Looks like no | Yes |
| Established TCP connection | Yes | No, but you can write a simple DMTCP plugin that tells DMTCP how you want to reconnect on restart | No | Yes |
| Infiniband | No | Not yet, development is halfway done | No | No |
| Live migration | Yes, even if kernel, libs, etc. are newer. Can use memory changes tracking to decrease freeze time | Yes, if both kernels are recent | Yes, but only if all components are the same. Even if only the prelinked addresses are different, it will not restore, but it can save all used libs and localization files to restore the program on a different machine | Yes |
| Memory mappings | Yes, all kinds | Yes | Partial | Yes |
| Multiprocess | Yes | Yes | Yes | Yes |
| Multithread support | Yes | Yes | Yes | Yes |
| Namespaces | Yes | No | No | Yes |
| Non-POSIX files (inotify, signalfd, eventfd, etc.) | Yes, inotify, fanotify, epoll, signalfd, eventfd | Yes, epoll, eventfd, signalfd are already supported and inotify will be supported in future | Looks like no | Yes |
| Parallel/distributed computation libraries | No (planned) | Yes. OpenMPI, MPICH2, OpenMP, Cilk are already supported and Infiniband is in progress | Yes. Cray MPI, Intel MPI, LAM/MPI, MPICH-V, MPICH2, MVAPICH, Open MPI, SGI MPT | Yes |
| Pipes | Yes | Yes | Not yet | Yes |
| Possible to C/R gdb with debugged app? | No, because they are using the same interface | Yes | No | Yes |
| Process groups and sessions | Yes | Yes | Not yet | Yes |
| Ptraced programs | No | Yes | No | Yes |
| Retains behavior of the C/R-ed programs? | Yes (but see What can change after C/R) | No, because of wrappers on system calls | No, because of wrappers on system calls | Yes |
| Shared resources (files, mm, etc.) | Yes. SysVIPC, files, fd table and memory | Yes. System V shared memory (shmget, etc.), mmap-based shared memory, shared sockets, pipes, file descriptors | No, but it is planned to support shared mmap regions | Yes |
| Solutions for invocation in the custom software | Yes, RPC and C API | Yes, plugins and API | Not yet | Yes, via ioctl calls |
| System V IPC | Yes | Yes | No | Yes |
| TCP sockets | Yes | Yes | Not yet | Yes |
| Terminals | Yes, but only Unix98 PTYs | Yes | Yes | Yes |
| Timers | Yes | No. Any counter or timer active since the beginning of a process will consider the restarted process to be a new process. | Yes | Yes |
| UDP sockets | Yes, both IPv4 and IPv6 | Not yet. Developers of DMTCP had no request for this | Not yet | Yes |
| Unix sockets | Yes | Yes | No | Yes |
| Uses standard kernel? | Yes, provided it's 3.11 or later | Yes | Yes, just needs to load module | No. OpenVZ kernel is required |
| X Window apps (KDE, GNOME, etc.) | Yes, via VNC | Yes, via VNC | Looks like no | Yes, via VNC |
| Zombies | Yes | No | No | Yes |

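The table shows that, among the tools that implement C/R from user space, CRIU has the most complete feature set, and it also supports C/R of Docker containers, currently the most popular technology in cloud computing. The CRIU community is very active as well: issues and PRs usually get a response and help the same day or the next. I hope more people will take an interest in CRIU.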
