From the man page:
SYNOPSIS
#include <sys/mount.h>
int mount(const char *source, const char *target,
const char *filesystemtype, unsigned long mountflags,
const void *data);
When we fork a new process, the child process uses the parent's filesystem.
But what if we want to change the child's / filesystem to /opt/busybox?
This is where pivot_root comes in:
int pivot_root(const char *new_root, const char *put_old);
It makes new_root the new / of the calling process and moves the original / to the put_old directory.
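glibc does not provide a wrapper for pivot_root, so it is normally invoked through syscall(2). A minimal sketch of such a wrapper follows; the helper name pivotRoot is hypothetical and not taken from the project described here.

```cpp
#include <sys/syscall.h>  // SYS_pivot_root
#include <unistd.h>       // syscall

// Hypothetical helper: thin wrapper around the raw pivot_root system call.
// Returns 0 on success, or -1 with errno set (for example EINVAL or EPERM).
int pivotRoot(const char *newRoot, const char *putOld) {
    return static_cast<int>(syscall(SYS_pivot_root, newRoot, putOld));
}
```

The call needs CAP_SYS_ADMIN, and put_old must be at or underneath new_root, so a typical invocation would look like pivotRoot("/opt/busybox", "/opt/busybox/old_root") with the old_root directory created beforehand.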
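I want to use pivot_root to change the root filesystem of the container process.
But every call to pivot_root fails with an "Invalid argument" (EINVAL) error.
The cause is described here: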
Notwithstanding the fact that the default propagation type for new
mount points is in many cases MS_PRIVATE, MS_SHARED is typically more
useful. For this reason, systemd(1) automatically remounts all mount
points as MS_SHARED on system startup. Thus, on most modern systems,
the default propagation type is in practice MS_SHARED.
In other words, systemd remounts every filesystem as shared.
Looking at the pivot_root documentation, the ERRORS section lists only the following causes of EINVAL:
EINVAL new_root is not a mount point.
EINVAL put_old is not underneath new_root.
EINVAL The current root is on the rootfs (initial ramfs) filesystem.
EINVAL Either the mount point at new_root, or the parent mount of
that mount point, has propagation type MS_SHARED.
EINVAL put_old is a mount point and has the propagation type
MS_SHARED.
The fourth case is the relevant one: pivot_root does not allow the mount at new_root, or its parent mount, to be shared.
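Whether a mount is shared can be read from /proc/self/mountinfo, where a shared mount carries a "shared:N" tag in its optional fields. The following is a minimal sketch of that check for a single line; the helper name isSharedMountinfoLine is hypothetical, and the field layout assumed here is the one documented in proc(5).

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if one line of /proc/self/mountinfo
// describes a mount whose propagation type is shared.
bool isSharedMountinfoLine(const std::string &line) {
    std::istringstream in(line);
    std::string field;
    // Skip the six fixed fields: mount ID, parent ID, major:minor,
    // root, mount point, and mount options.
    for (int i = 0; i < 6; ++i) {
        if (!(in >> field)) return false;
    }
    // Zero or more optional fields follow, terminated by a single "-".
    while (in >> field && field != "-") {
        if (field.compare(0, 7, "shared:") == 0) return true;
    }
    return false;
}
```

Feeding it each line of /proc/self/mountinfo and looking at the entry for new_root and for its parent mount would show whether the fourth EINVAL case applies.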
The propagation type can be changed to MS_PRIVATE (mount --make-rprivate /).
This is what Docker's runC does, according to a comment in its source:
func rootfsParentMountPrivate(rootfs string) error {
    ...
    if sharedMount {
        return unix.Mount("", parentMount, "", unix.MS_PRIVATE, "")
    }
    ...
}
// Make parent mount PRIVATE if it was shared. It is needed for two
// reasons. First of all pivot_root() will fail if parent mount is
// shared. Secondly when we bind mount rootfs it will propagate to
// parent namespace and we don't want that to happen.
Alternatively, it can be changed to MS_SLAVE.
Again, according to a comment in runC:
// Make oldroot rslave to make sure our unmounts don't propagate to
// the host (and thus bork the machine). We don't use rprivate because
// this is known to cause issues due to races where we still have a
// reference to a mount while a process in the host namespace are trying
// to operate on something they think has no mounts (devicemapper in
// particular).
func pivotRoot(rootfs string) error {
    ...
    if err := unix.Mount("", ".", "", unix.MS_SLAVE|unix.MS_REC, ""); err != nil {
        return err
    }
    ...
}
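Both options come down to passing a propagation flag to mount(2), optionally combined with MS_REC. As a rough illustration, the sketch below maps the mount(8) option names (the part after --make-) to the corresponding flags; the helper name propagationFlags is hypothetical.

```cpp
#include <sys/mount.h>  // MS_SHARED, MS_PRIVATE, MS_SLAVE, MS_UNBINDABLE, MS_REC
#include <string>

// Hypothetical helper: translate a mount(8) "--make-*" option name such as
// "rprivate" or "slave" into mount(2) flags. Returns 0 for an unknown name.
unsigned long propagationFlags(std::string name) {
    unsigned long flags = 0;
    // A leading 'r' selects the recursive variant (none of the base names
    // starts with 'r', so stripping it is unambiguous).
    if (!name.empty() && name[0] == 'r') {
        flags |= MS_REC;
        name.erase(0, 1);
    }
    if (name == "shared")     return flags | MS_SHARED;
    if (name == "private")    return flags | MS_PRIVATE;
    if (name == "slave")      return flags | MS_SLAVE;
    if (name == "unbindable") return flags | MS_UNBINDABLE;
    return 0;
}
```

With this, mount(nullptr, "/", nullptr, propagationFlags("rprivate"), nullptr) would be the system call counterpart of mount --make-rprivate /. When only the propagation type is changed, mount(2) ignores the source, filesystemtype, and data arguments.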
Back to my own project:
when creating the container, before mounting the image, I change the propagation type to MS_SLAVE:
int doContainer(void *param) {
    auto *runParam = (RunParam *)param;
    // Make the root mount MS_SLAVE recursively (mount --make-rslave /)
    if (mount("", "/", NULL, MS_REC | MS_SLAVE, NULL) == -1) {
        dprintf(STDERR_FILENO, "[%d] mount --make-rslave: %s\n", __LINE__,
                strerror(errno));
        exit(-1);
    }
    // Mount a fresh proc so the container cannot see other processes
    // CLONE_NEWUSER TODO BUG??
    if (mount("proc", "/proc", "proc", MS_NOEXEC | MS_NOSUID | MS_NODEV, NULL)) {
        cerr << "mount proc error: " << getErr() << endl;
        exit(-1);
    }
    ...
}
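The remaining steps are elided above. As an illustration only, and not the author's code, the sketch below shows one way the pieces could fit together, following the pivot_root(".", ".") pattern described in the pivot_root(2) man page. The helper name enterRootfs is hypothetical, and the sketch assumes it is acceptable to call unshare(CLONE_NEWNS) even if the process was already cloned into its own mount namespace.

```cpp
#include <sched.h>        // unshare, CLONE_NEWNS
#include <sys/mount.h>    // mount, umount2, MS_*, MNT_DETACH
#include <sys/syscall.h>  // SYS_pivot_root
#include <unistd.h>       // chdir, syscall

// Hypothetical helper: make newRoot the root filesystem of the caller.
// Returns 0 on success, or -1 with errno set by the step that failed.
int enterRootfs(const char *newRoot) {
    // Use a separate mount namespace so the host's mounts are untouched.
    if (unshare(CLONE_NEWNS) == -1) return -1;
    // Make everything rslave: no mount is MS_SHARED any more, and our
    // mounts and unmounts do not propagate back to the host.
    if (mount(nullptr, "/", nullptr, MS_REC | MS_SLAVE, nullptr) == -1) return -1;
    // pivot_root requires new_root to be a mount point, so bind-mount
    // the directory onto itself.
    if (mount(newRoot, newRoot, nullptr, MS_BIND | MS_REC, nullptr) == -1) return -1;
    if (chdir(newRoot) == -1) return -1;
    // With new_root and put_old both ".", the old root ends up stacked
    // on top of the new root at the same location.
    if (syscall(SYS_pivot_root, ".", ".") == -1) return -1;
    // Lazily detach the old root, leaving only the new root mounted.
    if (umount2(".", MNT_DETACH) == -1) return -1;
    return chdir("/");
}
```

Using "." for both arguments avoids creating a temporary put_old directory inside the image. The function needs CAP_SYS_ADMIN, so an unprivileged caller gets -1 with errno set to EPERM at the first step.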