
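Over the past couple of days I ran into a MySQL crash that left the data inconsistent. While investigating, I found one parameter that deserves particular attention: innodb_flush_log_at_trx_commit. Its default value is 1, which strictly guarantees consistency: as soon as a transaction commits, the contents of the log buffer are written to the log file and a file-system flush is invoked.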

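Looking closely at how MySQL interacts with the file system, there are two main operations: write and flush. MySQL manages its own log buffer, and the file system manages the log file. When a transaction commits, MySQL calls write to move the data from the log buffer into the log file, which is the step meant to persist it. Anyone who has studied operating systems knows, though, that the file system caches file data in memory to make I/O more efficient, so a write alone does not guarantee that the data has reached the disk. Getting it there takes an additional flush, which either the operating system performs on its own schedule or MySQL forces through a system call.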
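The write/flush distinction can be sketched outside MySQL with plain system calls. This is a minimal illustration, not InnoDB's code: `append_record`, `demo`, and the file name `toy_redo.log` are hypothetical names chosen for the example, and only `os.write` and `os.fsync` correspond to the two operations described above.

```python
import os
import tempfile


def append_record(fd: int, record: bytes, flush: bool) -> int:
    """Append one record to an open log file descriptor.

    os.write() only hands the bytes to the operating system, which may keep
    them in its file cache. os.fsync() asks the OS to push the cached data
    for this file to the storage device before returning.
    """
    written = 0
    while written < len(record):  # os.write may write fewer bytes than asked
        written += os.write(fd, record[written:])
    if flush:
        os.fsync(fd)
    return written


def demo() -> bytes:
    """Write one flushed and one unflushed record, then read the file back."""
    path = os.path.join(tempfile.mkdtemp(), "toy_redo.log")  # hypothetical name
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        # write + flush: expected to be on the device once fsync returns
        append_record(fd, b"commit-1\n", flush=True)
        # write only: may still live in the OS cache for a while
        append_record(fd, b"commit-2\n", flush=False)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()
```

Reading the file back returns both records, because reads are served through the same OS cache. That is why the difference stays invisible in normal operation and only shows up when the operating system crashes or the machine loses power.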


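At the time of the failure, however, innodb_flush_log_at_trx_commit was set to 2. According to the documentation quoted below, with a value of 2 MySQL writes to the log file immediately after each commit but does not flush right away; a background task flushes roughly once per second instead. This opens a window for data loss: if the operating system crashes before the file-system cache has been flushed, whatever was written only to the cache is lost. If only the mysqld process crashes, the log buffer contents have already been handed to the file system, so as long as the operating system stays up the data is still there.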

The default value of 1 is required for full ACID compliance. With this value, the contents of the InnoDB log buffer are written out to the log file at each transaction commit and the log file is flushed to disk.

With a value of 0, the contents of the InnoDB log buffer are written to the log file approximately once per second and the log file is flushed to disk. No writes from the log buffer to the log file are performed at transaction commit. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions with any mysqld process crash.

With a value of 2, the contents of the InnoDB log buffer are written to the log file after each transaction commit and the log file is flushed to disk approximately once per second. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions in an operating system crash or a power outage.
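To make the three settings easier to compare, here is a toy model of the path log buffer -> OS file cache -> disk. It is a simplified sketch under stated assumptions, not how InnoDB is implemented: the class `ToyRedoLog`, the helper `lost_after`, and the transaction labels are all hypothetical, and the background task is reduced to a single `background_tick` call.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRedoLog:
    """Toy model of the three layers a redo record passes through."""
    setting: int                                    # 0, 1 or 2
    log_buffer: list = field(default_factory=list)  # inside the mysqld process
    os_cache: list = field(default_factory=list)    # written but not flushed
    disk: list = field(default_factory=list)        # flushed

    def _write(self) -> None:
        self.os_cache.extend(self.log_buffer)
        self.log_buffer.clear()

    def _flush(self) -> None:
        self.disk.extend(self.os_cache)
        self.os_cache.clear()

    def commit(self, txn: str) -> None:
        self.log_buffer.append(txn)
        if self.setting in (1, 2):   # 1 and 2 write at every commit
            self._write()
        if self.setting == 1:        # only 1 also flushes at every commit
            self._flush()

    def background_tick(self) -> None:
        """Stand-in for the roughly once-per-second write and flush."""
        self._write()
        self._flush()

    def survivors(self, crash: str) -> list:
        """Records still recoverable after a 'mysqld' or an 'os' crash."""
        if crash == "mysqld":        # process memory is lost, OS cache is not
            return self.disk + self.os_cache
        if crash == "os":            # only flushed data remains
            return list(self.disk)
        raise ValueError("crash must be 'mysqld' or 'os'")


def lost_after(setting: int, crash: str) -> list:
    """Commit t1, let the background task run, commit t2, then crash."""
    log = ToyRedoLog(setting)
    log.commit("t1")
    log.background_tick()
    log.commit("t2")
    return [t for t in ("t1", "t2") if t not in log.survivors(crash)]
```

In this model, `lost_after(2, "mysqld")` returns `[]` while `lost_after(2, "os")` returns `["t2"]`, which matches the failure described above: with a value of 2 a mysqld crash loses nothing, but an operating system crash can lose the commits made since the last flush. With a value of 0 the same transaction is lost in either kind of crash, and with a value of 1 nothing is lost.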


The InnoDB source handles the same three cases:

/**
If required, flushes the log to disk based on the value of
innodb_flush_log_at_trx_commit. */
static
void
trx_flush_log_if_needed_low(
    lsn_t   lsn)    /*!< in: lsn up to which logs are to be
            flushed. */
{
    switch (srv_flush_log_at_trx_commit) {
    case 0:
        /* Do nothing */
        break;
    case 1:
        /* Write the log and optionally flush it to disk */
        log_write_up_to(lsn, LOG_WAIT_ONE_GROUP,
                srv_unix_file_flush_method != SRV_UNIX_NOSYNC);
        break;
    case 2:
        /* Write the log but do not flush it to disk */
        log_write_up_to(lsn, LOG_WAIT_ONE_GROUP, FALSE);
        break;
    default:
        ut_error;
    }
}


