一、引言:
单个salt-master下的minion数已经达到2101个了,所以在master日志有如下的提示:
2016-09-09 11:36:22,221 [salt.utils.verify][CRITICAL][10919] The number of accepted minion keys(2101) should be lower than 1/4 of the max open files soft setting(4096). Please consider raising this value.
如果不能解决这个问题将无数加入新节点。从日志中可以看出max open files的值是4096,很奇怪!
通过ulimit -a看到open files是65535,从这里联想到是不是salt得问题?
二、解决问题:
在度娘和G哥上一顿搜索,该github上saltstack有一个issues:
salt-master not recognizing max files increase #5323
在/usr/lib/python2.7/site-packages/salt/utils/verify.py脚本check_max_open_files的函数,具体如下:
def check_max_open_files(opts): ''' Check the number of max allowed open files and adjust if needed ''' mof_c = opts.get('max_open_files', 100000) if sys.platform.startswith('win'): # Check the Windows API for more detail on this # http://msdn.microsoft.com/en-us/library/xt874334(v=vs.71).aspx # and the python binding http://timgolden.me.uk/pywin32-docs/win32file.html mof_s = mof_h = win32file._getmaxstdio() else: mof_s, mof_h = resource.getrlimit(resource.RLIMIT_NOFILE) accepted_keys_dir = os.path.join(opts.get('pki_dir'), 'minions') accepted_count = len(os.listdir(accepted_keys_dir)) log.debug( 'This salt-master instance has accepted {0} minion keys.'.format( accepted_count ) ) level = logging.INFO if (accepted_count * 4) <= mof_s: # We check for the soft value of max open files here because that's the # value the user chose to raise to. # # The number of accepted keys multiplied by four(4) is lower than the # soft value, everything should be OK return msg = ( 'The number of accepted minion keys({0}) should be lower than 1/4 ' 'of the max open files soft setting({1}). '.format( accepted_count, mof_s ) ) with open("/tmp/openfile.txt","a") as f: f.write("mof_s-->%s\n"%mof_s) f.write("accepted_count-->%s\n"%accepted_count) if accepted_count >= mof_s: # This should never occur, it might have already crashed msg += 'salt-master will crash pretty soon! ' level = logging.CRITICAL elif (accepted_count * 2) >= mof_s: # This is way too low, CRITICAL level = logging.CRITICAL elif (accepted_count * 3) >= mof_s: level = logging.WARNING # The accepted count is more than 3 time, WARN elif (accepted_count * 4) >= mof_s: level = logging.INFO if mof_c < mof_h: msg += ('According to the system\'s hard limit, there\'s still a ' 'margin of {0} to raise the salt\'s max_open_files ' 'setting. ').format(mof_h - mof_c) msg += 'Please consider raising this value.' log.log(level=level, msg=msg)
通过resource.getrlimit(resource.RLIMIT_NOFILE)得到软和硬的两种打开最大文件数,单独执行该方法:
>>> import resource >>> resource.getrlimit(resource.RLIMIT_NOFILE) (65535, 65535)
很奇怪,为什么单独执行是65535,而salt执行出来的是4096。
线上有多个salt-master,正好操作系统的版本是不一样的,经过检查发现只有centos7 以上的才会出现这种情况,那就是系统的问题了。
在centos5/6等版本中,资源限制的配置可以在/etc/security/limits.conf设置,针对root/user等各个用户或者*代表所有用户来设置。当然,/etc/security/limits.d/中可以配置,系统是先加载limits.conf然后按照英文字母顺序加载limits.d目录下的配置文件,后加载配置覆盖之前的配置。
不过在centos7/rhel7的系统中,使用Systemd替代了之前的SysV,因此/etc/security/limits.conf文件的配置作用域缩小了一些。limits.conf这里的配置,只适用于通过PAM认证登录用户的资源限制,它对systemd的service的资源限制不生效。登录用户的限制,与上面讲的一样,通过/etc/security/limits.conf和limits.d来配置即可。
对于systemd services的资源限制,如何配置呢?
全局的配置,放在文件/etc/systemd/system.conf和/etc/systemd/user.conf。同时,也会加载两个对应的目录中的所有.conf文件/etc/systemd/system.conf.d/*.conf和/etc/systemd/user.conf.d/*.conf。其中,system.conf是系统实例使用的,user.conf是用户实例使用的。一般的service使用system.conf中的配置即可。system.conf.d/*.conf中的配置会覆盖system.conf。
但是如果修改/etc/systemd/system.conf的话需要重启系统才会生效。
针对单个service,可以直接设置它自己的:
然后运行如下命令,才能生效:
sudo systemctl daemon-reload
sudo systemctl restart salt-master.service