MySQL:open_files_limit中的坑


学习笔记仅供参考。我们先从一个超过open_files_limit导致的问题出发,看看有哪些坑,这是5.7.11的一个问题。

一、问题重现

版本5.7.11,我们首先设置参数:

  • max_connections = 20
  • table_open_cache = 20
  • table_definition_cache=10
  • open_files_limit=300
  • innodb_open_files=300
  • OS中设置 hard nofile 300 soft nofile 300

这样重启后我们的数据库中open_files_limit设置为300如下:

mysql> show variables like '%open_files_limit%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| open_files_limit | 300   |
+------------------+-------+
1 row in set (0.00 sec)

接下来我们建立3个大约80分区的表,注意建立期间可能要重启一次数据库,释放一次open table cache/open table share,接下来我们连续打开3个表当打开最后一个表的时候如下:

mysql> select * from testpar1;
ERROR 2013 (HY000): Lost connection to MySQL server during query

查看日志如下:

2020-10-16T22:36:41.736524-05:00 2 [ERROR] InnoDB: Operating system error number 24 in a file operation.
2020-10-16T22:36:41.736527-05:00 2 [ERROR] InnoDB: Error number 24 means 'Too many open files'
2020-10-16T22:36:41.736529-05:00 2 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-10-16T22:36:41.736531-05:00 2 [Warning] InnoDB: Cannot open './test/testpar1#P#p20190528.ibd'. Have you deleted .ibd files under a running mysqld server?
2020-10-16T22:36:41.736534-05:00 2 [ERROR] InnoDB: Trying to do I/O to a tablespace which exists without .ibd data file. I/O type: read, page: [page id: space=74, page number=3], I/O length: 16384 bytes
2020-10-16T22:36:41.736537-05:00 2 [ERROR] InnoDB: trying to read page [page id: space=74, page number=3] in nonexisting or being-dropped tablespace
2020-10-16T22:36:41.736541-05:00 2 [ERROR] [FATAL] InnoDB: Unable to read page [page id: space=74, page number=3] into the buffer pool after 100 attempts. The most probable cause of this error may be that the table has been corrupted. Or, the table was compressed with with an algorithm that is not supported by this instance. If it is not a decompress failure, you can try to fix this problem by using innodb_force_recovery. Please see http://dev.mysql.com/doc/refman/5.7/en/ for more details. Aborting...
2020-10-16 22:36:41 0x7f8b751ab700  InnoDB: Assertion failure in thread 140236941866752 in file ut0ut.cc line 920
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
02:36:41 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=2147483648
read_buffer_size=131072
max_used_connections=1
max_threads=20
thread_count=1
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2120458 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f8b38000ae0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f8b751aaea8 thread_stack 0x40000
/opt/mysql/mysql3320/bin/mysqld(my_print_stacktrace+0x35)[0xf45595]
/opt/mysql/mysql3320/bin/mysqld(handle_fatal_signal+0x4a4)[0x77fd34]
/lib64/libpthread.so.0(+0xf630)[0x7f8c7bbea630]
/lib64/libc.so.6(gsignal+0x37)[0x7f8c7a76a387]
/lib64/libc.so.6(abort+0x148)[0x7f8c7a76ba78]
/opt/mysql/mysql3320/bin/mysqld[0x10dd465]
/opt/mysql/mysql3320/bin/mysqld(_ZN2ib5fatalD1Ev+0xb3)[0x10e2b33]
/opt/mysql/mysql3320/bin/mysqld(_Z16buf_page_get_genRK9page_id_tRK11page_size_tmP11buf_block_tmPKcmP5mtr_tb+0xf56)[0x11267c6]
/opt/mysql/mysql3320/bin/mysqld(_Z31btr_cur_open_at_index_side_funcbP12dict_index_tmP9btr_cur_tmPKcmP5mtr_t+0x3a5)[0x1103405]
/opt/mysql/mysql3320/bin/mysqld[0x1062d7b]
/opt/mysql/mysql3320/bin/mysqld(_Z15row_search_mvccPh15page_cur_mode_tP14row_prebuilt_tmm+0x1f62)[0x106d4f2]
/opt/mysql/mysql3320/bin/mysqld(_ZN11ha_innobase10index_readEPhPKhj16ha_rkey_function+0x247)[0xf7c687]
/opt/mysql/mysql3320/bin/mysqld(_ZN11ha_innobase11index_firstEPh+0x39)[0xf6ccf9]
/opt/mysql/mysql3320/bin/mysqld(_ZN11ha_innopart19index_first_in_partEjPh+0x2d)[0xf8fa7d]
/opt/mysql/mysql3320/bin/mysqld(_ZN16Partition_helper36handle_unordered_scan_next_partitionEPh+0x182)[0xc6d6e2]
/opt/mysql/mysql3320/bin/mysqld(_ZN7handler14ha_index_firstEPh+0x127)[0x7c8787]
/opt/mysql/mysql3320/bin/mysqld(_Z15join_read_firstP7QEP_TAB+0x6f)[0xce6ddf]
/opt/mysql/mysql3320/bin/mysqld(_Z10sub_selectP4JOINP7QEP_TABb+0x2ba)[0xce7cca]
/opt/mysql/mysql3320/bin/mysqld(_ZN4JOIN4execEv+0x28a)[0xce622a]
/opt/mysql/mysql3320/bin/mysqld(_Z12handle_queryP3THDP3LEXP12Query_resultyy+0x250)[0xd51d00]
/opt/mysql/mysql3320/bin/mysqld[0xd12973]
/opt/mysql/mysql3320/bin/mysqld(_Z21mysql_execute_commandP3THDb+0x3325)[0xd16605]
/opt/mysql/mysql3320/bin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3ad)[0xd1851d]
/opt/mysql/mysql3320/bin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x117d)[0xd196fd]
/opt/mysql/mysql3320/bin/mysqld(_Z10do_commandP3THD+0x194)[0xd1a1f4]
/opt/mysql/mysql3320/bin/mysqld(handle_connection+0x29c)[0xde605c]
/opt/mysql/mysql3320/bin/mysqld(pfs_spawn_thread+0x174)[0x1213184]
/lib64/libpthread.so.0(+0x7ea5)[0x7f8c7bbe2ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f8c7a8328dd]

我们注意日志中的error number 24 ,这是就是Linux errno

  • EMFILE 24 /* Too many open files */

二、5.7中open_files_limit 设置规则

在MySQL的官方文档中有如下说明:

The effective open_files_limit value is based on the value specified at system startup (if any) and
the values of max_connections and table_open_cache, using these formulas:
• 10 + max_connections + (table_open_cache * 2)
• max_connections * 5
• Operating system limit if that limit is positive but not Infinity
• If operating system limit is Infinity: open_files_limit value if specified at startup, 5000 if not

The server attempts to obtain the number of file descriptors using the maximum of those values. If that many descriptors cannot be obtained, the server attempts to obtain as many as the system will permit.The effective value is 0 on systems where MySQL cannot change the number of open files.
On Unix, the value cannot be set greater than ulimit -n.

也就是说参数的设置规则来自4个方面:

  • 10 + max_connections + (table_open_cache * 2)
  • max_connections * 5
  • 操作系统设置的open files
  • 参数open_files_limit的设置,默认值为5000

对于如上4个值会取其中的最大值,获取操作系统open files设置的时候使用的是getrlimit函数获取,设置则使用的是setrlimit函数,这个比较和设置函数如下:

static uint set_max_open_files(uint max_file_limit) my_set_max_open_files
{
  struct rlimit rlimit;
  uint old_cur;
  DBUG_ENTER("set_max_open_files");
  DBUG_PRINT("enter",("files: %u", max_file_limit));

  if (!getrlimit(RLIMIT_NOFILE,&rlimit)) //获取OS设置的 nofile值
  {
    old_cur= (uint) rlimit.rlim_cur;
    DBUG_PRINT("info", ("rlim_cur: %u  rlim_max: %u",
            (uint) rlimit.rlim_cur,
            (uint) rlimit.rlim_max));
    if (rlimit.rlim_cur == (rlim_t) RLIM_INFINITY)
      rlimit.rlim_cur = max_file_limit;
    if (rlimit.rlim_cur >= max_file_limit) //如果OS设置大于 调整的参数 那么以os设置为准
      DBUG_RETURN(rlimit.rlim_cur);     /* purecov: inspected */ 
    rlimit.rlim_cur= rlimit.rlim_max= max_file_limit;
    if (setrlimit(RLIMIT_NOFILE, &rlimit))  //如果OS值小于 则进行调整设置为 调整的值
      max_file_limit= old_cur;          /* Use original value */
    else//成功
    {
      rlimit.rlim_cur= 0;           /* Safety if next call fails */
      (void) getrlimit(RLIMIT_NOFILE,&rlimit);
      DBUG_PRINT("info", ("rlim_cur: %u", (uint) rlimit.rlim_cur));
      if (rlimit.rlim_cur)          /* If call didn't fail */
    max_file_limit= (uint) rlimit.rlim_cur;
    }
  }
  DBUG_PRINT("exit",("max_file_limit: %u", max_file_limit));
  DBUG_RETURN(max_file_limit);
}

对于innodb_open_files来说不能大于open_files_limit的设置,这段代码如下:

    if (innobase_open_files < 10) { //不能小于10
        innobase_open_files = 300;
        if (srv_file_per_table && table_cache_size > 300) {
            innobase_open_files = table_cache_size;
        }
    }

    if (innobase_open_files > (long) open_files_limit) { //不能超过open_files_limit的大小
        ib::warn() << "innodb_open_files should not be greater"
                       " than the open_files_limit.\n";
        if (innobase_open_files > (long) table_cache_size) {
            innobase_open_files = table_cache_size;
        }
    }

但是值得注意的是如果我们使用mysqld_safe来启动数据库,那么mysqld_safe也会通过my_print_defaults进行参数解析获取其中的open_files_limit,同时调用ulimit -n 进行设置如下:

+ arg=--open-files-limit=300
+ case "$arg" in
+ open_files=300
...
+ ulimit -n 300

因为mysqld_safe和mysqld属于父子进程,那么mysqld复制了其中的PCB那么资源使用上限和mysqld_safe一致,比如这里mysqld的file limit也是300。
并且我们需要注意的另外一点,如果mysqld_safe不重启而修改了参数文件中open_files_limit,光是重启了mysqld比如kill 发起杀掉mysqld,起来后open_file_limits也不会更改,因为默认mysqld_safe解析了参数文件的open_files_limit会带入启动项目(--open-files-limit=65535),但是mysqld_safe并没有重启,值还是老的,并且会覆盖mysqld重新读取的参数文件设置。
我们可以简单测试一下如下:

  • mytest简单的获取当前进程的file limit
#include
#include
#include 
#include 
int main(void){
 struct rlimit rlimit;
 getrlimit(RLIMIT_NOFILE,&rlimit);
 printf("limit files soft limit is %ld ,hard limit is %ld\n",(uint) rlimit.rlim_cur,(uint) rlimit.rlim_max);
 sleep(1000);
}
  • mytest.sh 设置ulimit -n 调用mytest,模拟mysqld_safe调用mysqld
#!/bin/bash
ulimit -n 300
/opt/mysql/mysql3306/install/support-files/mytest

测试如下

  • 当前OS设置open file limit是30000,直接跑mytest:
[root@mgr1 support-files]# ulimit -a |grep file
core file size          (blocks, -c) unlimited
file size               (blocks, -f) unlimited
open files                      (-n) 30000
file locks                      (-x) unlimited
[root@mgr1 support-files]# ./mytest
limit files soft limit is 30000 ,hard limit is 30000

  • 当前OS设置open file limit是30000,mytest.sh调用mytest:
[root@mgr1 support-files]# ulimit -a |grep file
core file size          (blocks, -c) unlimited
file size               (blocks, -f) unlimited
open files                      (-n) 30000
file locks                      (-x) unlimited
[root@mgr1 support-files]# ./mytest.sh 
limit files soft limit is 300 ,hard limit is 300

这是一个父子进程
root      25782  25751  0 03:43 pts/2    00:00:00 /bin/bash ./mytest.sh
root      25783  25782  0 03:43 pts/2    00:00:00 /opt/mysql/mysql3306/install/support-files/mytest

因此我们如果参数文件设置了open_files_limit,那么使用mysqld_safe启动的时候将会使用open_files_limit设置的值,这一点5.7版本是要注意的。总结一下:

  • 如果使用mysqld直接启动数据库那么将遵守官方文档规则。
  • 如果使用mysqld_safe拉取mysqld,如果参数文件设置了open_files_limit那么mysqld_safe会获取open_files_limit的设置并且使用ulimit -n 进行修改,然后拉起mysqld,不依赖OS的设置,官方文档规则去掉第三条。
  • 如果使用mysqld_safe拉取mysqld,如果参数文件没有设置open_files_limit则遵守官方文档规则。

三、错误流程

1、报错流程
  • os_file_get_last_error_low:
        ib::error()
            << "Operating system error number "
            << err
            << " in a file operation.";
        ...
        } else if (!os_diagnose_all_o_direct_einval(err)) {
            if (strerror(err) != NULL) {

                ib::error()
                    << "Error number " << err << " means '"
                    << strerror(err) << "'";
            }

            ib::info() << OPERATING_SYSTEM_ERROR_MSG;
        }

  • fil_node_open_file :
ib::warn() << "Cannot open '" << node->name << "'."
                " Have you deleted .ibd files under a"
                " running mysqld server?";
  • _fil_io :
if (!req_type.ignore_missing()) {
                ib::error()
                    << "Trying to do I/O to a tablespace"
                    " which exists without .ibd data file."
                    " I/O type: "
                    << (req_type.is_read()
                        ? "read" : "write")
                    << ", page: "
                    << page_id_t(page_id.space(),
                             cur_page_no)
                    << ", I/O length: " << len << " bytes";
            }

  • buf_read_page

    if (err == DB_TABLESPACE_DELETED) {
        ib::error() << "trying to read page " << page_id
            << " in nonexisting or being-dropped tablespace";
    }
2、信号处理发出

当上面的报错联系100次后将会触发crash,其中BUF_PAGE_READ_MAX_RETRIES为100

        if (buf_read_page(page_id, page_size, trx)) { //进行物理文件读取
            buf_read_ahead_random(page_id, page_size,
                          ibuf_inside(mtr), trx);

            retries = 0;
        } else if (retries < BUF_PAGE_READ_MAX_RETRIES) {
            ++retries;
            DBUG_EXECUTE_IF(
                "innodb_page_corruption_retries",
                retries = BUF_PAGE_READ_MAX_RETRIES;
            );
        } else {
            ib::fatal() << "Unable to read page " << page_id
                << " into the buffer pool after "
                << BUF_PAGE_READ_MAX_RETRIES << " attempts."
                " The most probable cause of this error may"
                " be that the table has been corrupted. Or,"
                " the table was compressed with with an"
                " algorithm that is not supported by this"
                " instance. If it is not a decompress failure,"
                " you can try to fix this problem by using"
                " innodb_force_recovery."
                " Please see " REFMAN " for more"
                " details. Aborting...";
        }

fatal的析构函数如下:

fatal::~fatal()
{
    sql_print_error("[FATAL] InnoDB: %s", m_oss.str().c_str());
    ut_error;
}

ut_error会调用assert(0),触发abort(),发出signal 6信号,导致crash。值得注意的是5.7.29的报错已经不一样了,这段代码有所变化,并且不会触发crash,报错如下:

2020-10-15T23:24:06.546126-05:00 2 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-10-15T23:24:06.546130-05:00 2 [Note] InnoDB: fil_sys open file LRU len 276
2020-10-15T23:24:06.547016-05:00 2 [ERROR] InnoDB: Operating system error number 24 in a file operation.
2020-10-15T23:24:06.547030-05:00 2 [ERROR] InnoDB: Error number 24 means 'Too many open files'
2020-10-15T23:24:06.547035-05:00 2 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-10-15T23:24:06.547039-05:00 2 [Note] InnoDB: fil_sys open file LRU len 276

四、关于assert断言

我们可以简单写一个assert触发一下abort函数看看捕获的信号是什么如下:

#include
#include
#include 
#include 
#include 

#define ut_error assert(0)

void handsig6(int sig)
{
  if(sig ==  SIGABRT){
    printf("%s\n%s\n","I catch SIGABRT singal is  6","do other things exp dump crash stack!!")        
  exit(0);
  }
}

int fatal()
{
        printf("get fatal error\n");
        ut_error;
}

int main(void)
{
  signal(SIGABRT,handsig6);
  fatal();
}

执行如下:

get fatal error
a.out: test10.c:23: fatal: Assertion `0' failed.
I catch SIGABRT singal is  6
do other things exp dump crash stack!!

可以看到触发的正是SIGABRT信号即信号6

五、对于普通用户可能修改进程属性失败

这么一个场景 使用mysql用户,使用mysqld_safe拉mysqld进程,但是mysqld用户的open file limit 没有改为1024,但是open_files_limit参数为了10240。这个时候会出现如下情况:

image.png

可以看到 open-files-limit在mysqld启动命令行已经有参数--open-files-limit=10240,说明参数已经读取到了。但是mysql用户使用mysqld_safe拉取mysqld的时候mysqld_safe进程的打开文件数默认为1024,mysqld继承了mysqld_safe的进程属性也是1024,因为子进程的资源限制部分是来自于主进程的。根据参数设置,mysqld试图修改自己的limit的时候发现为mysql用户拉取的mysqld因此修改不能进行(hard限制),为什么不能修改参考后面的部分,并且报错如下:

2021-08-26T10:34:54.470469Z 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 25000)
2021-08-26T10:34:54.470625Z 0 [Warning] Changed limits: max_connections: 214 (requested 5000)
2021-08-26T10:34:54.470630Z 0 [Warning] Changed limits: table_open_cache: 400 (requested 2048)

当然我们也可以测试一下如下,代码如下:

#include
#include
#include 
#include 
int main(void){
 struct rlimit rlimit;
 getrlimit(RLIMIT_NOFILE,&rlimit);
 printf("current limit files soft limit is %ld ,hard limit is %ld\n",(uint) rlimit.rlim_cur,(uint) rlimit.rlim_max);
 
     rlimit.rlim_cur=65535; //软限制
     rlimit.rlim_max=65535; //硬限制
     if (setrlimit(RLIMIT_NOFILE,&rlimit)<0)
     {
         printf("setrlimit error to 65535\n");
     }
     
 getrlimit(RLIMIT_NOFILE,&rlimit);
 printf("current limit files soft limit is %ld ,hard limit is %ld\n",(uint) rlimit.rlim_cur,(uint) rlimit.rlim_max);
 
 sleep(1000);
}

mysql用户输出:

[mysql@mgr5 ~]$ id
uid=1000(mysql) gid=1000(mysql) groups=1000(mysql) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[mysql@mgr5 ~]$ ./a.out 
current limit files soft limit is 1024 ,hard limit is 4096
setrlimit error to 65535
current limit files soft limit is 1024 ,hard limit is 4096

root用户输出如下:

[root@mgr5 mysql]# ./a.out 
current limit files soft limit is 1024 ,hard limit is 4096
current limit files soft limit is 65535 ,hard limit is 65535

因此对于mysql用户启动的MYSQLD情况下,应该保证OS的limit已经修改,否则程序自己调用setrlimit,可能不能修改。

六、systemctl 拉取mysqld

这种情况由于systemctl 有自己的配置文件,以systemctl的配置为准,也是一个坑,官方文档如下:

To set the number of file descriptors available to the MySQL server, use LimitNOFILE in
override.conf rather than the open_files_limit system variable for mysqld or --open-
files-limit option for mysqld_safe.

七、soft limit和hard limit

  • soft limit: 程序自生可以修改到hard limit的限制,但是不能突破hard limit的限制
  • hard limit:对于非特权用户启动的进程自生无法改大hard limit,但是对于特权用户启动的进程自生可以修改hard limit

特权用户指的是拥有CAP_SYSTEM_RESOURCE能力的用户。

修改则是setrlimit(RLIMIT_NOFILE, &rlimit),其中RLIMIT_NOFILE为打开文件数,而 rlimit包含soft和hard属性

  • rlimit.rlim_cur 为soft limit
  • rlimit.rlim_max 为hard limit

查看限制

  • ulimit -aS:查看当前软链接
  • ulimit -aH:查看当前硬限制

MySQL 代码中,可以根据open_files_limit进行修改

rlimit.rlim_cur= rlimit.rlim_max= max_file_limit; //修改打开文件数的软硬限制
    if (setrlimit(RLIMIT_NOFILE, &rlimit))

但是我们应该更加注意soft 限制,因为mysqld通常以mysql用户启动,不能修改hard限制,即便代码写了也会修改失败。官方文档也有如下描述:

--open-files-limit and ulimit can increase the number of file descriptors, but only up to the limit imposed by the operating system. There is also a “hard” limit that can be overridden only if you start mysqld_safe or mysqld as root

你可能感兴趣的:(MySQL:open_files_limit中的坑)