fsnotify and too many open files

fsnotify

fsnotify is an open-source Go library for watching files and directories for changes.

While using it on Linux, I ran into a too many open files error.

First attempt

Usually, two settings that are too small can trigger the too many open files error:

  • fs.file-max in /etc/sysctl.conf
  • hard nofile in /etc/security/limits.conf

So I raised both values:

[root@qa5 ~]# ulimit -a | grep open
open files                      (-n) 1024000

The too many open files error still occurred.
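The shell's ulimit output does not necessarily match what a service process sees, so it can help to print the limit from inside a Go process. The following is a minimal sketch for Linux using only the standard library; the helper name nofileLimit is made up for illustration.

```go
package main

import (
	"fmt"
	"syscall"
)

// nofileLimit returns the soft and hard RLIMIT_NOFILE values of the
// current process (hypothetical helper, not part of fsnotify).
// Note: recent Go versions may raise the soft limit toward the hard limit
// at startup, so soft can differ from what `ulimit -n` prints in the shell.
func nofileLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return lim.Cur, lim.Max, nil
}

func main() {
	soft, hard, err := nofileLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", soft, hard)
}
```

If this number is already large and the error persists, as it did here, the per-process file limit is probably not the limit being hit.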

Second attempt

Looking through the official documentation, I found that the README addresses this:

How many files can be watched at once?
There are OS-specific limits as to how many watches can be created:
Linux: /proc/sys/fs/inotify/max_user_watches contains the limit, reaching this limit results in a “no space left on device” error.
BSD / OSX: sysctl variables “kern.maxfiles” and “kern.maxfilesperproc”, reaching these limits results in a “too many open files” error.

So I suspected that the inotify settings related to max_user_watches were the problem:

[root@qa5 ~]# sysctl fs.inotify
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 8192
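The same three values can also be read from a Go program through procfs, for example to log them when a service starts. This is a minimal sketch; parseLimit and readInotifyLimit are illustrative names, and the /proc/sys/fs/inotify path assumes a Linux host.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseLimit converts the content of a /proc/sys file such as "128\n" to an int.
func parseLimit(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// readInotifyLimit reads one limit, e.g. "max_user_instances", from procfs.
func readInotifyLimit(name string) (int, error) {
	data, err := os.ReadFile(filepath.Join("/proc/sys/fs/inotify", name))
	if err != nil {
		return 0, err
	}
	return parseLimit(string(data))
}

func main() {
	names := []string{"max_queued_events", "max_user_instances", "max_user_watches"}
	for _, name := range names {
		v, err := readInotifyLimit(name)
		if err != nil {
			fmt.Printf("fs.inotify.%s: %v\n", name, err)
			continue
		}
		fmt.Printf("fs.inotify.%s = %d\n", name, v)
	}
}
```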

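fs.inotify.max_user_instances is only 128, but this setup runs many service processes, and each of them watches files. Each fsnotify watcher creates its own inotify instance, and the limit is counted per user.

The number of processes far exceeds 128. Once the instance limit is reached, creating another inotify instance fails with EMFILE, which is reported as too many open files.

After raising fs.inotify.max_user_instances, the too many open files problem went away.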
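The failure can be reproduced without fsnotify by creating inotify instances directly. The sketch below is an illustration with the standard syscall package on Linux, not fsnotify's own code; openInstances and the cap of 1024 are arbitrary choices. On a machine where max_user_instances is 128 and the file descriptor limit is higher, the loop should stop at roughly that count with EMFILE.

```go
package main

import (
	"fmt"
	"syscall"
)

// openInstances creates up to limit inotify instances and returns how many
// succeeded together with the first error. Every descriptor it opened is
// closed again before it returns.
func openInstances(limit int) (int, error) {
	fds := make([]int, 0, limit)
	defer func() {
		for _, fd := range fds {
			syscall.Close(fd)
		}
	}()
	for i := 0; i < limit; i++ {
		fd, err := syscall.InotifyInit()
		if err != nil {
			return len(fds), err
		}
		fds = append(fds, fd)
	}
	return len(fds), nil
}

func main() {
	n, err := openInstances(1024)
	fmt.Printf("created %d inotify instances\n", n)
	switch {
	case err == syscall.EMFILE:
		// Same errno as a plain file descriptor shortage.
		fmt.Println("stopped with EMFILE:", err)
	case err != nil:
		fmt.Println("stopped with:", err)
	}
}
```

Because the errno is the same as for an exhausted file descriptor limit, the error message alone does not show which of the two limits was hit.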

A tool for checking inotify usage

Download a shell script written by someone else:

wget https://raw.githubusercontent.com/fatso83/dotfiles/master/utils/scripts/inotify-consumers
chmod +x inotify-consumers

Run it:

[root@qa5 ~]# ./inotify-consumers

   INOTIFY
   WATCHES
    COUNT     PID USER     COMMAND
--------------------------------------
       9      1 root     /usr/lib/systemd/systemd --switched-root --system --deserialize 22
       9   1186 root     /usr/sbin/NetworkManager --no-daemon
       8    869 root     /usr/lib/systemd/systemd-udevd
       7   1195 polkitd  /usr/lib/polkit-1/polkitd --no-debug
       3   1209 root     /usr/sbin/crond -n
       2   1620 root     /usr/sbin/rsyslogd -n
       2   1145 dbus     /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

      40  WATCHES TOTAL COUNT

40 WATCHES TOTAL COUNT means that 40 inotify watches are currently registered in total across the listed processes.
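Since the limit that mattered here was the number of inotify instances rather than watches, it can be useful to count both. The following Go sketch walks /proc on Linux and is independent of the script above (it is not that script's implementation). It relies on inotify descriptors appearing as symlinks to anon_inode:inotify and on each watch being one line starting with "inotify " in the matching fdinfo file. Run it as root to see other users' processes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// countWatches counts the watch entries in the text of one
// /proc/<pid>/fdinfo/<fd> file; each watch is a line starting with "inotify ".
func countWatches(fdinfo string) int {
	n := 0
	for _, line := range strings.Split(fdinfo, "\n") {
		if strings.HasPrefix(line, "inotify ") {
			n++
		}
	}
	return n
}

func main() {
	// Unreadable directories are skipped silently by Glob.
	links, _ := filepath.Glob("/proc/[0-9]*/fd/*")
	instances, watches := 0, 0
	for _, link := range links {
		target, err := os.Readlink(link)
		if err != nil || target != "anon_inode:inotify" {
			continue // not an inotify descriptor, or not readable
		}
		instances++
		// Map /proc/<pid>/fd/<n> to /proc/<pid>/fdinfo/<n>.
		pidDir := filepath.Dir(filepath.Dir(link))
		info := filepath.Join(pidDir, "fdinfo", filepath.Base(link))
		if data, err := os.ReadFile(info); err == nil {
			watches += countWatches(string(data))
		}
	}
	fmt.Printf("%d inotify instances, %d watches\n", instances, watches)
}
```

The instance count is the figure to compare against fs.inotify.max_user_instances, keeping in mind that the kernel applies that limit per user.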

References

  • https://blog.csdn.net/Man_In_The_Night/article/details/105379137
  • https://blog.csdn.net/weiguang1017/article/details/54381439
  • https://www.ibm.com/docs/en/ahte/4.0?topic=wf-configuring-linux-many-watch-folders

Other issues

While raising the open files limit, I also hit a problem where docker failed to start:

Sep 22 13:16:57 qa5.haidao systemd[1]: Stopped containerd container runtime.
-- Subject: Unit containerd.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit containerd.service has finished shutting down.
Sep 22 13:16:57 qa5.haidao systemd[1]: Starting containerd container runtime...
-- Subject: Unit containerd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit containerd.service has begun starting up.
Sep 22 13:16:57 qa5.haidao systemd[8690]: Failed at step LIMITS spawning /sbin/modprobe: Operation not permitted
-- Subject: Process /sbin/modprobe could not be executed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The process /sbin/modprobe could not be executed and failed.
--
-- The error number returned by this process is 1.
Sep 22 13:16:57 qa5.haidao systemd[8699]: Failed at step LIMITS spawning /usr/bin/containerd: Operation not permitted
-- Subject: Process /usr/bin/containerd could not be executed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The process /usr/bin/containerd could not be executed and failed.
--
-- The error number returned by this process is 1.
Sep 22 13:16:57 qa5.haidao systemd[1]: containerd.service: main process exited, code=exited, status=205/LIMITS
Sep 22 13:16:57 qa5.haidao systemd[1]: Failed to start containerd container runtime.
-- Subject: Unit containerd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit containerd.service has failed.
--
-- The result is failed.
Sep 22 13:16:57 qa5.haidao systemd[1]: Dependency failed for Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has failed.
--
-- The result is dependency.
Sep 22 13:16:57 qa5.haidao systemd[1]: Job docker.service/start failed with result 'dependency'.
Sep 22 13:16:57 qa5.haidao systemd[1]: Unit containerd.service entered failed state.
Sep 22 13:16:57 qa5.haidao systemd[1]: containerd.service failed.
Sep 22 13:16:57 qa5.haidao polkitd[1163]: Unregistered Authentication Agent for unix-process:8667:522314 (system bus name :1.393, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)

The cause: the docker service unit /usr/lib/systemd/system/docker.service sets LimitNOFILE=infinity, which conflicted with fs.nr_open=10240000 configured in /etc/sysctl.conf.
Remove fs.nr_open=10240000 and run:

sysctl -p --system

Then restart docker.
