Saltstack之salt-master的打开文件数问题

一、引言:

单个salt-master下的minion数已经达到2101个了,所以在master日志有如下的提示:

2016-09-09 11:36:22,221 [salt.utils.verify][CRITICAL][10919] The number of accepted minion keys(2101) should be lower than 1/4 of the max open files soft setting(4096). Please consider raising this value.

如果不能解决这个问题将无数加入新节点。从日志中可以看出max open files的值是4096,很奇怪!

Saltstack之salt-master的打开文件数问题_第1张图片

通过ulimit -a看到open files是65535,从这里联想到是不是salt得问题?

二、解决问题:

在度娘和G哥上一顿搜索,该github上saltstack有一个issues:

salt-master not recognizing max files increase #5323

在/usr/lib/python2.7/site-packages/salt/utils/verify.py脚本check_max_open_files的函数,具体如下:

def check_max_open_files(opts):
    '''
    Check the number of max allowed open files and adjust if needed
    '''
    mof_c = opts.get('max_open_files', 100000)
    if sys.platform.startswith('win'):
        # Check the Windows API for more detail on this
        # http://msdn.microsoft.com/en-us/library/xt874334(v=vs.71).aspx
        # and the python binding http://timgolden.me.uk/pywin32-docs/win32file.html
        mof_s = mof_h = win32file._getmaxstdio()
    else:
        mof_s, mof_h = resource.getrlimit(resource.RLIMIT_NOFILE)

    accepted_keys_dir = os.path.join(opts.get('pki_dir'), 'minions')
    accepted_count = len(os.listdir(accepted_keys_dir))

    log.debug(
        'This salt-master instance has accepted {0} minion keys.'.format(
            accepted_count
        )
    )

    level = logging.INFO

    if (accepted_count * 4) <= mof_s:
        # We check for the soft value of max open files here because that's the
        # value the user chose to raise to.
        #
        # The number of accepted keys multiplied by four(4) is lower than the
        # soft value, everything should be OK
        return

    msg = (
        'The number of accepted minion keys({0}) should be lower than 1/4 '
        'of the max open files soft setting({1}). '.format(
            accepted_count, mof_s
        )
    )
    with open("/tmp/openfile.txt","a") as f:
        f.write("mof_s-->%s\n"%mof_s)
        f.write("accepted_count-->%s\n"%accepted_count)

    if accepted_count >= mof_s:
        # This should never occur, it might have already crashed
        msg += 'salt-master will crash pretty soon! '
        level = logging.CRITICAL
    elif (accepted_count * 2) >= mof_s:
        # This is way too low, CRITICAL
        level = logging.CRITICAL
    elif (accepted_count * 3) >= mof_s:
        level = logging.WARNING
        # The accepted count is more than 3 time, WARN
    elif (accepted_count * 4) >= mof_s:
        level = logging.INFO

    if mof_c < mof_h:
        msg += ('According to the system\'s hard limit, there\'s still a '
                'margin of {0} to raise the salt\'s max_open_files '
                'setting. ').format(mof_h - mof_c)

    msg += 'Please consider raising this value.'
    log.log(level=level, msg=msg)

通过resource.getrlimit(resource.RLIMIT_NOFILE)得到软和硬的两种打开最大文件数,单独执行该方法:

>>> import resource
>>> resource.getrlimit(resource.RLIMIT_NOFILE)
(65535, 65535)

很奇怪,为什么单独执行是65535,而salt执行出来的是4096。

线上有多个salt-master,正好操作系统的版本是不一样的,经过检查发现只有centos7 以上的才会出现这种情况,那就是系统的问题了。

在centos5/6等版本中,资源限制的配置可以在/etc/security/limits.conf设置,针对root/user等各个用户或者*代表所有用户来设置。当然,/etc/security/limits.d/中可以配置,系统是先加载limits.conf然后按照英文字母顺序加载limits.d目录下的配置文件,后加载配置覆盖之前的配置。

不过在centos7/rhel7的系统中,使用Systemd替代了之前的SysV,因此/etc/security/limits.conf文件的配置作用域缩小了一些。limits.conf这里的配置,只适用于通过PAM认证登录用户的资源限制,它对systemd的service的资源限制不生效。登录用户的限制,与上面讲的一样,通过/etc/security/limits.conf和limits.d来配置即可。

对于systemd services的资源限制,如何配置呢?

全局的配置,放在文件/etc/systemd/system.conf和/etc/systemd/user.conf。同时,也会加载两个对应的目录中的所有.conf文件/etc/systemd/system.conf.d/*.conf和/etc/systemd/user.conf.d/*.conf。其中,system.conf是系统实例使用的,user.conf是用户实例使用的。一般的service使用system.conf中的配置即可。system.conf.d/*.conf中的配置会覆盖system.conf。

但是如果修改/etc/systemd/system.conf的话需要重启系统才会生效。

针对单个service,可以直接设置它自己的:

 然后运行如下命令,才能生效:

sudo systemctl daemon-reload
sudo systemctl restart salt-master.service

 

你可能感兴趣的:(Saltstack之salt-master的打开文件数问题)