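CRIU (Checkpoint/Restore In Userspace) is a software tool for Linux that implements checkpoint/restore in user space. With it you can freeze a running program and checkpoint it to a set of files, then use those files to restore the program, on the same host or a different one, to the point at which it was frozen (in plain terms, it backs up and restores a running program). CRIU is therefore commonly used for live migration of programs or containers, snapshots, remote debugging, and similar tasks. CRIU began as a Virtuozzo project; with the help of the open-source community it has since been integrated into OpenVZ (the open-source version of Virtuozzo), LXC/LXD, Docker, Podman, and other projects.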
Source code: https://github.com/checkpoint-restore/criu
An article worth reading: https://lwn.net/Articles/525675/
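As a rough illustration of the freeze-to-files and restore-from-files workflow described above, the sketch below only builds the `criu dump` and `criu restore` command lines; it does not run them, since CRIU normally needs elevated privileges. The PID `1234` and the directory `/tmp/ckpt` are made-up values, and the helper names `criu_dump_cmd` and `criu_restore_cmd` are hypothetical. `--shell-job` is assumed here because the target is taken to be a program started from an interactive shell.

```python
import shlex


def criu_dump_cmd(pid, images_dir, leave_running=False):
    """Build the command that checkpoints the process tree rooted at `pid`."""
    cmd = [
        "criu", "dump",
        "--tree", str(pid),          # root of the process tree to freeze
        "--images-dir", images_dir,  # directory that receives the image files
        "--shell-job",               # assumption: the task was started from a shell
    ]
    if leave_running:
        # Snapshot use case: keep the original process alive after the dump.
        cmd.append("--leave-running")
    return cmd


def criu_restore_cmd(images_dir):
    """Build the command that restores a process tree from `images_dir`."""
    return ["criu", "restore", "--images-dir", images_dir, "--shell-job"]


if __name__ == "__main__":
    # Hypothetical PID and directory, printed rather than executed.
    print(shlex.join(criu_dump_cmd(1234, "/tmp/ckpt")))
    print(shlex.join(criu_restore_cmd("/tmp/ckpt")))
```

For a migration, the image directory would be copied to the destination host between the two commands; that copy step is left out of the sketch.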
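BLCR (Berkeley Lab Checkpoint/Restart) provides a user-space library, libcr, and a kernel module to do the checkpoint/restart work. To checkpoint a process, the BLCR tools can be used in one of two ways:
Both approaches have the same effect: they make the program load a shared library (libcr.so or libcr-run.so) before it starts running. The main job of these libraries is to register a signal handler with the kernel in advance. BLCR's kernel module, cr_module, sends that signal to the target process to tell it to checkpoint itself, and the actual checkpoint work is done by the kernel. The C/R process consists of the following two commands: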
$ ## Checkpoint
$ cr_checkpoint <PID>
$ ## Restore
$ cr_restart context.<PID>
For a more detailed introduction, see: https://blog.csdn.net/u012569600/article/details/24132479
https://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/
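The signal-driven flow described for BLCR (a handler registered before the program runs, then a signal that tells the process to checkpoint itself) can be imitated in a few lines of Python. This is only an analogy under stated assumptions, not BLCR's implementation: in BLCR the signal comes from the kernel module and the kernel writes the image, whereas here the process saves a small dictionary itself. The use of `SIGUSR1`, the JSON format, and all function names are hypothetical; only the `context.<PID>` file naming is borrowed from the commands above.

```python
import json
import os
import signal
import tempfile

# Hypothetical application state that should survive a restart.
STATE = {"count": 0}


def context_path(pid):
    """Mirror the `context.<PID>` naming, but inside the temp directory."""
    return os.path.join(tempfile.gettempdir(), f"context.{pid}")


def _save_state(signum, frame):
    # Runs inside the target process when the checkpoint signal arrives.
    with open(context_path(os.getpid()), "w") as fh:
        json.dump(STATE, fh)


def register_checkpoint_handler(sig=signal.SIGUSR1):
    """Stand-in for what a preloaded library would do before main() runs."""
    signal.signal(sig, _save_state)


def request_checkpoint(pid, sig=signal.SIGUSR1):
    """Stand-in for cr_checkpoint: ask the process to checkpoint itself."""
    os.kill(pid, sig)


def restore_state(pid):
    """Stand-in for cr_restart: load the saved state back."""
    with open(context_path(pid)) as fh:
        return json.load(fh)


def demo():
    register_checkpoint_handler()
    STATE["count"] = 7
    request_checkpoint(os.getpid())  # the handler runs in this same process
    restored = restore_state(os.getpid())
    os.remove(context_path(os.getpid()))
    return restored
```

The sketch also shows why a process that never loaded the library cannot be checkpointed this way: without the registered handler, the signal would have nothing to trigger.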
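Open question: judging from the News section of this site, BLCR appears to be no longer maintained or updated, so I am not sure why many people online still use it to speed up Android app startup.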
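DMTCP (Distributed MultiThreaded CheckPointing) implements checkpoint/restore of a process as a library. This means that to live-migrate a program with DMTCP, the program must have been started under DMTCP in the first place, so that the DMTCP library is loaded into it. When started this way, the DMTCP library intercepts a number of the application's library calls (acting as a proxy), builds a shadow database of information about the process's internals, and then forwards the requests to glibc and the kernel. The collected information is used to generate the application's image files. With this approach, only applications that run successfully with the DMTCP library can be checkpointed, and DMTCP does not proxy every kernel API (inotify, for example, is known to be unsupported). Another consequence is a potential performance cost from proxying requests. Here is a DMTCP demo: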
# vi test/demo.c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int count = 1;
    /* Print an increasing counter every two seconds. */
    while (1) {
        printf(" %2d ", count++);
        fflush(stdout);
        sleep(2);
    }
    return 0;
}
# test/demo
1 2 3 ^C
# bin/dmtcp_launch --interval 5 test/demo
1 2 3 4 5 6 7 ^C
# ls ckpt_demo*
ckpt_demo_66e1c8437adb789-40000-5745d372.dmtcp
# bin/dmtcp_restart ckpt_demo*
7 8 9 10 ^C
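The interception scheme described before the demo (wrap a call, record it in a shadow database, forward it to the real implementation) can be sketched in Python. This is a minimal illustration of the idea, not DMTCP's code: the `ShadowTable` class, the `restore` helper, and the choice to track only file descriptors and their offsets are all assumptions made for the example.

```python
import os
import tempfile


class ShadowTable:
    """Wraps os.open/os.close and remembers what the process has open."""

    def __init__(self):
        self.open_files = {}  # fd -> (path, flags)

    def open(self, path, flags, mode=0o600):
        fd = os.open(path, flags, mode)      # forward to the real call
        self.open_files[fd] = (path, flags)  # record it in the shadow database
        return fd

    def close(self, fd):
        os.close(fd)
        self.open_files.pop(fd, None)

    def snapshot(self):
        """Produce the 'image': enough data to reopen every tracked file."""
        return [
            {"path": path, "flags": flags, "offset": os.lseek(fd, 0, os.SEEK_CUR)}
            for fd, (path, flags) in self.open_files.items()
        ]


def restore(image):
    """Reopen every file from the image and seek back to the saved offset."""
    fds = []
    for entry in image:
        fd = os.open(entry["path"], entry["flags"])
        os.lseek(fd, entry["offset"], os.SEEK_SET)
        fds.append(fd)
    return fds


def demo():
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("hello world")
    shadow = ShadowTable()
    fd = shadow.open(tmp.name, os.O_RDONLY)
    first = os.read(fd, 5)       # consume part of the file before the checkpoint
    image = shadow.snapshot()
    shadow.close(fd)             # simulate the process going away
    (new_fd,) = restore(image)
    rest = os.read(new_fd, 100)  # reading resumes where the old fd stopped
    os.close(new_fd)
    os.unlink(tmp.name)
    return first, rest
```

A file opened with plain `os.open`, bypassing the wrapper, would be invisible to `snapshot()`, which is the same limitation the text describes for kernel APIs that DMTCP does not proxy.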
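Restoring a process is also tricky for DMTCP. For example, during checkpoint getpid() can be used to retrieve the process's PID, but at restore time there is no corresponding API to set it (the fork() system call does not let the caller specify the child's PID). To work around this, DMTCP intercepts the getpid() library call and gives the application a fake PID. This is dangerous: if the application uses that fake PID to access files under /proc, it may see the wrong files. To solve the same problem, the CRIU developers instead added a kernel interface (the /proc/sys/kernel/ns_last_pid sysctl) that controls the PID chosen by the next fork() call.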
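The hazard with a fake PID can be made concrete with a small sketch. It assumes a translation table from the PID the application believes it has (the virtual PID) to the PID the kernel actually assigned after restart (the real PID). The `PidVirtualizer` class, the `naive_proc_path` helper, and the numbers 40000 and 51234 are hypothetical and chosen only for illustration.

```python
class PidVirtualizer:
    """Maps the PID an application sees to the PID the kernel assigned."""

    def __init__(self):
        self._virtual_to_real = {}

    def bind(self, virtual_pid, real_pid):
        # Done at restart: the process got a new real PID but keeps its old one.
        self._virtual_to_real[virtual_pid] = real_pid

    def getpid(self, real_pid):
        """What a wrapped getpid() would hand back to the application."""
        for virtual, real in self._virtual_to_real.items():
            if real == real_pid:
                return virtual
        return real_pid  # no mapping: the process was never restarted

    def proc_path(self, virtual_pid, name):
        """Translate back to the real PID before touching /proc."""
        real = self._virtual_to_real.get(virtual_pid, virtual_pid)
        return f"/proc/{real}/{name}"


def naive_proc_path(pid, name):
    """What an application does if it trusts the value from getpid()."""
    return f"/proc/{pid}/{name}"


def demo():
    pids = PidVirtualizer()
    pids.bind(virtual_pid=40000, real_pid=51234)  # state after a restart
    seen = pids.getpid(real_pid=51234)            # application still sees 40000
    return seen, naive_proc_path(seen, "status"), pids.proc_path(seen, "status")
```

In the demo the application builds `/proc/40000/status`, which belongs to a different process or to none at all, while the entry it actually wants is `/proc/51234/status`. Restoring the process under its original PID, as the kernel interface mentioned above allows, removes the need for this translation.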
DMTCP source code: https://github.com/dmtcp/dmtcp
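The OpenVZ implementation does C/R entirely in the kernel. I am not yet familiar with its technical details or usage.
CRIU currently implements nearly all of the functionality of OpenVZ's kernel-based checkpoint/restore mechanism.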
Feature | CRIU | DMTCP | BLCR | OpenVZ |
---|---|---|---|---|
Arch | x86_64, ARM, AArch64, PPC64le, mips64el | x86, x86_64, ARM | x86, x86_64, PPC/PPC64, ARM | x86, x86_64 |
Block devices | No | Looks like yes | No | No |
Can be used as non-root user? | Yes, but the user can only manipulate tasks that they own | Yes | Yes | No |
Can be used without preloading special libraries before app start? | Yes | No | No | Yes |
Can run unmodified programs? | Yes | Yes | No. Statically linked and/or threaded apps are unsupported. | Yes |
Can run unprepared tasks? | Yes | No. It preloads the DMTCP library. That library runs before the routine main(). It creates a second thread. The checkpoint thread then creates a socket to the DMTCP coordinator and registers itself. The checkpoint thread also creates a signal handler. | No. CR shall notify processes when a checkpoint is to occur (before the kernel takes a checkpoint) to allow the processes to prepare themselves accordingly. | Yes |
Capture the contents of open files | Yes, if file is unlinked | Looks like no | Not yet | Yes |
Character devices | Yes, only /dev/null, /dev/zero, etc. are supported | Yes, looks like null and zero are supported | Yes, /dev/null and /dev/zero | Yes |
Containers | Yes, LXC and OpenVZ containers | No. It doesn’t support namespaces, so it probably can’t dump containers | Looks like no | Yes |
Established TCP connection | Yes | No, but you can write a simple DMTCP plugin that tells DMTCP how you want to reconnect on restart | No | Yes |
Infiniband | No | Not yet, development is under way | No | No |
Live migration | Yes, even if kernel, libs, etc are newer. Can use memory changes tracking to decrease freeze time | Yes, if both kernels are recent | Yes, but only if all components are the same. It will not restore if even the prelinked addresses differ, but it can save all the libraries and locale files in use so that the program can be restored on a different machine | Yes |
Memory mappings | Yes, all kinds | Yes | Partial | Yes |
Multiprocess | Yes | Yes | Yes | Yes |
Multithread support | Yes | Yes | Yes | Yes |
Namespaces | Yes | No | No | Yes |
Non-POSIX files (inotify, signalfd, eventfd, etc) | Yes, inotify, fanotify, epoll, signalfd, eventfd | Yes, epoll, eventfd, signalfd are already supported and inotify will be supported in future | Looks like no | Yes |
Parallel/distributed computations libraries | No (planned) | Yes. OpenMPI, MPICH2, OpenMP, Cilk are already supported and Infiniband is in progress | Yes. Cray MPI, Intel MPI, LAM/MPI, MPICH-V, MPICH2, MVAPICH, Open MPI, SGI MPT | Yes |
Pipes | Yes | Yes | Not yet | Yes |
Possible to C/R of gdb with debugged app? | No, because they are using the same interface | Yes | No | Yes |
Process groups and sessions | Yes | Yes | Not yet | Yes |
Ptraced programs | No | Yes | No | Yes |
Retains behavior of the c/r-ed programs? | Yes (but see What can change after C/R) | No, because of wrappers on system calls | No, because of wrappers on system calls | Yes |
Shared resources (files, mm, etc.) | Yes. SysVIPC, files, fd table and memory | Yes. System V shared memory(shmget, etc.), mmap-based shared memory, shared sockets, pipes, file descriptors | No, but it is planned to support shared mmap regions | Yes |
Solutions for invocation in the custom software | Yes, RPC and C API | Yes, plugins and API | Not yet | Yes, via ioctl calls |
System V IPC | Yes | Yes | No | Yes |
TCP sockets | Yes | Yes | Not yet | Yes |
Terminals | Yes, but only Unix98 PTYs | Yes | Yes | Yes |
Timers | Yes | No. Any counter or timer active since the beginning of a process will consider the restarted process to be a new process. | Yes | Yes |
UDP sockets | Yes, both ipv4 and ipv6 | Not yet. Developers of dmtcp had no request for this | Not yet | Yes |
Unix sockets | Yes | Yes | No | Yes |
Uses standard kernel? | Yes, provided it’s 3.11 or later | Yes | Yes, just needs to load module | No. OpenVZ kernel is required |
X Window apps (KDE, GNOME, etc) | Yes, via VNC | Yes, via VNC | Looks like no | Yes, via VNC |
Zombies | Yes | No | No | Yes |
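
The "Uses standard kernel?" row says CRIU needs kernel 3.11 or later. A quick way to compare a kernel release string against that threshold is sketched below. The helper names are hypothetical, and the check reflects only the version number from the table: it says nothing about whether the required kernel options are enabled, or whether a distribution has backported features to an older kernel. CRIU's own `criu check` command is the more reliable test.

```python
import platform
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_criu_minimum(release, minimum=(3, 11)):
    """True if the release is at least the version given in the table."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # On Linux, platform.release() returns the running kernel's release string.
    current = platform.release()
    try:
        print(current, meets_criu_minimum(current))
    except ValueError:
        print(current, "cannot be parsed as a kernel version")
```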
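The table above shows that, among the tools that implement C/R in user space, CRIU is the most complete, and it also supports C/R of Docker containers, currently the most popular container technology in cloud computing. The CRIU community is also very active: issues and PRs usually get a response and help the same day or the next. I hope more people will take an interest in CRIU.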