Analyzing the binlog flush process through a case study: how committing one large transaction blocked every transaction on the instance.




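Case description:
    A production mishap. I was archiving data from a large table into a history table; about 16 million rows, roughly 15 GB of data, were involved. Being lazy, I simply ran insert into history select * from table where xxxx; The statement ran for about 70 minutes, and nothing much went wrong while it was executing. At the moment of commit, however, the server load spiked, the number of MySQL threads jumped to more than 1,500 (normally a little over 300), and a large number of transactions failed to commit. I checked the error log and found no errors at all. Checking the binlog showed that this operation had produced a single binlog file of 13 GB (the binlog size limit is set to 1 GB). All of that is secondary. The key point is that every application connected to this MySQL instance reported errors at that moment, and all of their transactions timed out.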
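
    One detail in the description, a 13 GB binlog file despite a 1 GB limit, follows from how rotation works: MySQL writes a transaction to the binary log in one piece and never splits it across files, so the size limit (the max_binlog_size variable) can only take effect between transactions. The sketch below is a simplified Python model of that rule, not MySQL code; the function name binlog_file_sizes and the sample sizes are hypothetical.

```python
def binlog_file_sizes(txn_sizes, max_binlog_size):
    """Model binlog rotation: the size limit is checked only after a
    whole transaction has been written to the current file."""
    files = [0]
    for size in txn_sizes:
        files[-1] += size            # a transaction is never split across files
        if files[-1] >= max_binlog_size:
            files.append(0)          # rotate; later transactions go to a new file
    if len(files) > 1 and files[-1] == 0:
        files.pop()                  # drop the empty file opened by the last rotation
    return files


if __name__ == "__main__":
    # Sizes in MB (illustrative): three small transactions, one huge one,
    # then another small one, with a 1000 MB limit.
    print(binlog_file_sizes([400, 400, 400, 13000, 100], 1000))
```

    In this model the huge transaction ends up alone in one oversized file, which matches what was observed in the incident.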


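Case analysis:
    That is roughly what happened. Now let's analyze why.
    My first guess was that the insert touched 16 million rows, about 15 GB of data, that all of it was flushed to disk in one go, and that every other transaction was stuck waiting behind it. Which step actually did the blocking was not clear to me, so the rest of this post tries to pin it down;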


Enable debug mode and look at what the trace shows at the moment of commit:



...
...
...
T@2: | | | | >trans_commit
T@2: | | | | | >trans_check
T@2: | | | | | <trans_check 88
T@2: | | | | | info: clearing SERVER_STATUS_IN_TRANS
T@2: | | | | | >commit_owned_gtids(...)
T@2: | | | | | <commit_owned_gtids(...) 1622
T@2: | | | | | >ha_commit_trans
T@2: | | | | | | info: all=1 thd->in_sub_stmt=0 ha_info=0x7fffc8002570 is_real_trans=1
T@2: | | | | | | debug: Acquire MDL commit lock
T@2: | | | | | | >check_readonly
T@2: | | | | | | <check_readonly 522
T@2: | | | | | | >MYSQL_BIN_LOG::prepare
T@2: | | | | | | | >ha_prepare_low
T@2: | | | | | | | | >binlog_prepare
T@2: | | | | | | | | <binlog_prepare 1587
T@2: | | | | | | | | >innobase_trx_init
T@2: | | | | | | | | <innobase_trx_init 2765
T@2: | | | | | | | <ha_prepare_low 2358
T@2: | | | | | | <MYSQL_BIN_LOG::prepare 8153
T@2: | | | | | | >MYSQL_BIN_LOG::commit
T@2: | | | | | | | info: query='commit'
T@2: | | | | | | | enter: thd: 0x7fffc8000df0, all: yes, xid: 25, cache_mngr: 0x7fffc8060570
T@2: | | | | | | | debug: in_transaction: yes, no_2pc: no, rw_ha_count: 2
T@2: | | | | | | | debug: trx_cache - pending: 0x0, bytes: 723876
T@2: | | | | | | | debug: all.cannot_safely_rollback(): no, trx_cache_empty: no
T@2: | | | | | | | debug: stmt_cache - pending: 0x0, bytes: 0
T@2: | | | | | | | debug: stmt.cannot_safely_rollback(): no, stmt_cache_empty: yes
T@2: | | | | | | | debug: stmt_cache - pending: 0x0, bytes: 0
T@2: | | | | | | | debug: trx_cache - pending: 0x0, bytes: 723876
T@2: | | | | | | | >binlog_cache_data::finalize
T@2: | | | | | | | | debug: trx_cache - pending: 0x0, bytes: 723876
T@2: | | | | | | | | >binlog_cache_data::write_event
T@2: | | | | | | | | | >Log_event::write_header
T@2: | | | | | | | | | | >Log_event::need_checksum
T@2: | | | | | | | | | | <Log_event::need_checksum 969
T@2: | | | | | | | | | | >Log_event::need_checksum
T@2: | | | | | | | | | | <Log_event::need_checksum 969
T@2: | | | | | | | | | <Log_event::write_header 1118
T@2: | | | | | | | | | >Log_event::need_checksum
T@2: | | | | | | | | | <Log_event::need_checksum 969
T@2: | | | | | | | | | >Log_event::need_checksum
T@2: | | | | | | | | | <Log_event::need_checksum 969
T@2: | | | | | | | | <binlog_cache_data::write_event 1142
T@2: | | | | | | | | debug: flags.finalized: yes
T@2: | | | | | | | <binlog_cache_data::finalize 1351
T@2: | | | | | | | debug: is_empty: 1
T@2: | | | | | | | >Rpl_transaction_ctx::is_transaction_rollback
T@2: | | | | | | | <Rpl_transaction_ctx::is_transaction_rollback 62
T@2: | | | | | | | debug: Acquiring binlog protection lock
T@2: | | | | | | | >Global_backup_lock::acquire_protection
T@2: | | | | | | | | >Global_backup_lock::init_protection_request
T@2: | | | | | | | | <Global_backup_lock::init_protection_request 1384
T@2: | | | | | | | <Global_backup_lock::acquire_protection 1365
T@2: | | | | | | | >MYSQL_BIN_LOG::ordered_commit
T@2: | | | | | | | | enter: flags.pending: yes, commit_error: 0, thread_id: 2
T@2: | | | | | | | | >MYSQL_BIN_LOG::change_stage
T@2: | | | | | | | | | enter: thd: 0x7fffc8000df0, stage: FLUSH, queue: 0x7fffc8000df0
T@2: | | | | | | | | | debug: Enqueue 0x7fffc8000df0 to queue for stage 0
T@2: | | | | | | | | | >Stage_manager::Mutex_queue::append
T@2: | | | | | | | | | | enter: first: 0x7fffc8000df0
T@2: | | | | | | | | | | info: m_first: 0x0, &m_first: 0x2dfe480, m_last: 0x2dfe480
T@2: | | | | | | | | | | info: m_first: 0x7fffc8000df0, &m_first: 0x2dfe480, m_last: 0x2dfe480
T@2: | | | | | | | | | | info: m_first: 0x7fffc8000df0, &m_first: 0x2dfe480, m_last: 0x7fffc80030c0
T@2: | | | | | | | | | | return: empty: yes
T@2: | | | | | | | | | <Stage_manager::Mutex_queue::append 1904
T@2: | | | | | | | | <MYSQL_BIN_LOG::change_stage 8755
T@2: | | | | | | | | >MYSQL_BIN_LOG::process_flush_stage_queue
T@2: | | | | | | | | | debug: Fetching queue for stage 0
T@2: | | | | | | | | | >Stage_manager::Mutex_queue::fetch_and_empty
T@2: | | | | | | | | | | enter: m_first: 0x7fffc8000df0, &m_first: 0x2dfe480, m_last: 0x7fffc80030c0
T@2: | | | | | | | | | | info: m_first: 0x0, &m_first: 0x2dfe480, m_last: 0x2dfe480
T@2: | | | | | | | | | | return: result: 0x7fffc8000df0
T@2: | | | | | | | | | <Stage_manager::Mutex_queue::fetch_and_empty 2012
T@2: | | | | | | | | | >plugin_foreach_with_mask
T@2: | | | | | | | | | | >innobase_flush_logs
T@2: | | | | | | | | | | | ib_log: write 207335827 to 207336069
T@2: | | | | | | | | | | | ib_log: write 207335424 to 8329216: group 0 len 1024 blocks 404953..404954
T@2: | | | | | | | | | | <innobase_flush_logs 4409
T@2: | | | | | | | | | <plugin_foreach_with_mask 2322
T@2: | | | | | | | | | >binlog_cache_data::flush
T@2: | | | | | | | | | | debug: flags.finalized: no
T@2: | | | | | | | | | <binlog_cache_data::flush 1471
T@2: | | | | | | | | | >binlog_cache_data::flush
T@2: | | | | | | | | | | debug: flags.finalized: yes
T@2: | | | | | | | | | | debug: bytes_in_cache: 723903
T@2: | | | | | | | | | | >MYSQL_BIN_LOG::write_gtid
T@2: | | | | | | | | | | | >Rpl_transaction_ctx::get_gno
T@2: | | | | | | | | | | | <Rpl_transaction_ctx::get_gno 80
T@2: | | | | | | | | | | | >Rpl_transaction_ctx::get_sidno
T@2: | | | | | | | | | | | <Rpl_transaction_ctx::get_sidno 74
T@2: | | | | | | | | | | | >Gtid_state::generate_automatic_gtid
T@2: | | | | | | | | | | | | info: 0x2fbaf40.rdlock()
T@2: | | | | | | | | | | | | >Gtid_state::acquire_anonymous_ownership
T@2: | | | | | | | | | | | | | info: anonymous_gtid_count increased to 1
T@2: | | | | | | | | | | | | <Gtid_state::acquire_anonymous_ownership 2476
T@2: | | | | | | | | | | | | >Gtid::to_string
T@2: | | | | | | | | | | | | <Gtid::to_string 195
T@2: | | | | | | | | | | | | info: set owned_gtid (anonymous) in generate_automatic_gtid: -2:0
T@2: | | | | | | | | | | | | info: 0x2fbaf40.unlock()
T@2: | | | | | | | | | | | | >gtid_set_performance_schema_values
T@2: | | | | | | | | | | | | <gtid_set_performance_schema_values 629
T@2: | | | | | | | | | | | <Gtid_state::generate_automatic_gtid 652
T@2: | | | | | | | | | | | >Gtid_log_event::Gtid_log_event(THD *)
T@2: | | | | | | | | | | | | >Gtid_specification::to_string(char*)
T@2: | | | | | | | | | | | | <Gtid_specification::to_string(char*) 83
T@2: | | | | | | | | | | | | info: SET @@SESSION.GTID_NEXT= 'ANONYMOUS'
T@2: | | | | | | | | | | | <Gtid_log_event::Gtid_log_event(THD *) 13079
T@2: | | | | | | | | | | | >Gtid_log_event::write_data_header_to_memory
T@2: | | | | | | | | | | | | info: sid=00000000-0000-0000-0000-000000000000 sidno=0 gno=0
T@2: | | | | | | | | | | | <Gtid_log_event::write_data_header_to_memory 13208
T@2: | | | | | | | | | | | >Binlog_event_writer::write_event_part
T@2: | | | | | | | | | | | <Binlog_event_writer::write_event_part 1033
T@2: | | | | | | | | | | <MYSQL_BIN_LOG::write_gtid 1233
T@2: | | | | | | | | | | >MYSQL_BIN_LOG::write_cache(THD *, binlog_cache_data *, bool)
T@2: | | | | | | | | | | | >MYSQL_BIN_LOG::do_write_cache
T@2: | | | | | | | | | | | | >reinit_io_cache
T@2: | | | | | | | | | | | | | enter: cache: 0x7fffc8060700 type: 1 seek_offset: 0 clear_cache: 0
T@2: | | | | | | | | | | | | | >my_b_flush_io_cache
T@2: | | | | | | | | | | | | | | enter: cache: 0x7fffc8060700
T@2: | | | | | | | | | | | | | | >my_write
T@2: | | | | | | | | | | | | | | | my: fd: 61 Buffer: 0x7fffc8073f30 Count: 3007 MyFlags: 20
T@2: | | | | | | | | | | | | | | <my_write 115
T@2: | | | | | | | | | | | | | <my_b_flush_io_cache 1583
T@2: | | | | | | | | | | | | <reinit_io_cache 387
T@2: | | | | | | | | | | | | >my_seek
T@2: | | | | | | | | | | | | | my: fd: 61 Pos: 0 Whence: 0 MyFlags: 0
T@2: | | | | | | | | | | | | <my_seek 80
T@2: | | | | | | | | | | | | >my_read
T@2: | | | | | | | | | | | | | my: fd: 61 Buffer: 0x7fffc8073f30 Count: 32768 MyFlags: 16
T@2: | | | | | | | | | | | | <my_read 106
T@2: | | | | | | | | | | | | >Binlog_event_writer::write_event_part
T@2: | | | | | | | | | | | | <Binlog_event_writer::write_event_part 1033
T@2: | | | | | | | | | | | | >Binlog_event_writer::write_event_part
T@2: | | | | | | | | | | | | <Binlog_event_writer::write_event_part 1033
T@2: | | | | | | | | | | | | >Binlog_event_writer::write_event_part
T@2: | | | | | | | | | | | | <Binlog_event_writer::write_event_part 1033
T@2: | | | | | | | | | | | | >Binlog_event_writer::write_event_part
T@2: | | | | | | | | | | | | <Binlog_event_writer::write_event_part 1033
T@2: | | | | | | | | | | | | >Binlog_event_writer::write_event_part
T@2: | | | | | | | | | | | | <Binlog_event_writer::write_event_part 1033
T@2: | | | | | | | | | | | | >Binlog_event_writer::write_event_part
T@2: | | | | | | | | | | | | | >my_b_flush_io_cache





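The trace is too long to explain line by line, so here are the key points (in top-to-bottom order):
First, trans_commit: the transaction commit (commit) begins;
debug: Acquire MDL commit lock: before committing, an MDL (metadata lock) has to be acquired;
MYSQL_BIN_LOG::prepare: once the MDL is held, the prepare phase begins;
>MYSQL_BIN_LOG::commit: the commit phase begins.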
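
To make a trace like this easier to follow, the nesting can be turned into call paths. The following is a minimal Python sketch, assuming only the line format visible in the trace (T@<thread>:, one | per nesting level, > for function entry, < for function exit followed by a source line number). The helper names parse_trace and call_paths are hypothetical and not part of any MySQL tool.

```python
import re

# Matches function enter/exit lines such as "T@2: | | | | >trans_commit".
TRACE_LINE = re.compile(r"T@(\d+):\s*((?:\|\s*)*)([<>])(.+)$")


def parse_trace(lines):
    """Yield (thread_id, depth, kind, name) for function enter/exit lines."""
    for raw in lines:
        match = TRACE_LINE.search(raw.rstrip())
        if not match:
            continue  # info:/debug:/enter: lines and elisions are skipped
        thread_id, bars, marker, rest = match.groups()
        kind = "enter" if marker == ">" else "exit"
        name = rest.strip()
        if kind == "exit":
            # Exit lines end with a source line number, e.g. "<trans_check 88".
            name = name.rsplit(" ", 1)[0]
        yield int(thread_id), bars.count("|"), kind, name


def call_paths(lines):
    """Return the call path of every function entered in the trace."""
    stack, paths = [], []
    for _tid, depth, kind, name in parse_trace(lines):
        # Frames at this depth or deeper have already returned.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if kind == "enter":
            stack.append((depth, name))
            paths.append(" -> ".join(frame for _depth, frame in stack))
    return paths


SAMPLE = [
    "T@2: | | | | >trans_commit",
    "T@2: | | | | | >trans_check",
    "T@2: | | | | | <trans_check 88",
    "T@2: | | | | | info: clearing SERVER_STATUS_IN_TRANS",
    "T@2: | | | | | >ha_commit_trans",
    "T@2: | | | | | | >MYSQL_BIN_LOG::prepare",
]

if __name__ == "__main__":
    for path in call_paths(SAMPLE):
        print(path)
```

Run on the full trace, this makes it easy to see that the binlog cache write happens inside MYSQL_BIN_LOG::ordered_commit, under process_flush_stage_queue.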






T@2: | | | | | | | >binlog_cache_data::finalize
T@2: | | | | | | | | debug: trx_cache - pending: 0x0, bytes: 723876
T@2: | | | | | | | | >binlog_cache_data::write_event


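These three lines are where the trace starts to matter for this case; more on that below. They show binlog_cache_data::finalize appending the closing event to the transaction's binlog cache (binlog_cache), after which the trace reports flags.finalized: yes. In other words, the binlog cache is being made ready to leave memory and be written to disk;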
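
As a small cross-check of what finalize adds, the two byte counts in the trace can be compared: the transaction cache holds 723876 bytes before finalize and binlog_cache_data::flush later reports bytes_in_cache: 723903. The Python snippet below just does that subtraction. The event layout it compares against (a 19-byte common event header plus an 8-byte XID payload, with no checksum at this point) is an assumption that does not appear in the trace.

```python
# Byte counts copied from the trace.
BYTES_BEFORE_FINALIZE = 723876  # "trx_cache - pending: 0x0, bytes: 723876"
BYTES_AT_FLUSH = 723903         # "bytes_in_cache: 723903"

# Assumed layout of the closing event (not shown in the trace).
COMMON_HEADER_LEN = 19
XID_PAYLOAD_LEN = 8


def finalize_growth(before, after):
    """Number of bytes added to the cache between the two trace points."""
    return after - before


if __name__ == "__main__":
    growth = finalize_growth(BYTES_BEFORE_FINALIZE, BYTES_AT_FLUSH)
    print(growth, growth == COMMON_HEADER_LEN + XID_PAYLOAD_LEN)
```

The difference is consistent with a single small closing event being appended, which suggests that finalize itself is cheap and the expensive part is the later copy of the whole cache into the binlog file.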


T@2: | | | | | | | debug: Acquiring binlog protection lock
T@2: | | | | | | | >Global_backup_lock::acquire_protection
T@2: | | | | | | | | >Global_backup_lock::init_protection_request
T@2: | | | | | | | |  


 


 volatile int32 Global_read_lock::m_active_requests;


/****************************************************************************
  Handling of global read locks


  Global read lock is implemented using metadata lock infrastructure.


  Taking the global read lock is TWO steps (2nd step is optional; without
  it, COMMIT of existing transactions will be allowed):
  lock_global_read_lock() THEN make_global_read_lock_block_commit().


  How blocking of threads by global read lock is achieved: that's
  semi-automatic. We assume that any statement which should be blocked
  by global read lock will either open and acquires write-lock on tables
  or acquires metadata locks on objects it is going to modify. For any
  such statement global IX metadata lock is automatically acquired for
  its duration (in case of LOCK TABLES until end of LOCK TABLES mode).
  And lock_global_read_lock() simply acquires global S metadata lock
  and thus prohibits execution of statements which modify data (unless
  they modify only temporary tables). If deadlock happens it is detected
  by MDL subsystem and resolved in the standard fashion (by backing-off
  metadata locks acquired so far and restarting open tables process
  if possible).


  Why does FLUSH TABLES WITH READ LOCK need to block COMMIT: because it's used
  to read a non-moving SHOW MASTER STATUS, and a COMMIT writes to the binary
  log.


  Why getting the global read lock is two steps and not one. Because FLUSH
  TABLES WITH READ LOCK needs to insert one other step between the two:
  flushing tables. So the order is
  1) lock_global_read_lock() (prevents any new table write locks, i.e. stalls
  all new updates)
  2) close_cached_tables() (the FLUSH TABLES), which will wait for tables
  currently opened and being updated to close (so it's possible that there is
  a moment where all new updates of server are stalled *and* FLUSH TABLES WITH
  READ LOCK is, too).
  3) make_global_read_lock_block_commit().
  If we have merged 1) and 3) into 1), we would have had this deadlock:
  imagine thread 1 and 2, in non-autocommit mode, thread 3, and an InnoDB
  table t.
  thd1: SELECT * FROM t FOR UPDATE;
  thd2: UPDATE t SET a=1; # blocked by row-level locks of thd1
  thd3: FLUSH TABLES WITH READ LOCK; # blocked in close_cached_tables() by the
  table instance of thd2
  thd1: COMMIT; # blocked by thd3.
  thd1 blocks thd2 which blocks thd3 which blocks thd1: deadlock.


  Note that we need to support that one thread does
  FLUSH TABLES WITH READ LOCK; and then COMMIT;
  (that's what innobackup does, for some good reason).
  So in this exceptional case the COMMIT should not be blocked by the FLUSH
  TABLES WITH READ LOCK.


****************************************************************************/


/**
  Take global read lock, wait if there is protection against lock.


  If the global read lock is already taken by this thread, then nothing is done.


  See also "Handling of global read locks" above.


  @param thd     Reference to thread.


  @retval False  Success, global read lock set, commits are NOT blocked.
  @retval True   Failure, thread was killed.
*/






 


/**
  Flush and commit the transaction.


  This will execute an ordered flush and commit of all outstanding
  transactions and is the main function for the binary log group
  commit logic. The function performs the ordered commit in two
  phases.


  The first phase flushes the caches to the binary log and under
  LOCK_log and marks all threads that were flushed as not pending.


  The second phase executes under LOCK_commit and commits all
  transactions in order.


  The procedure is:


  1. Queue ourselves for flushing.
  2. Grab the log lock, which might result is blocking if the mutex is
     already held by another thread.
  3. If we were not committed while waiting for the lock
     1. Fetch the queue
     2. For each thread in the queue:
        a. Attach to it
        b. Flush the caches, saving any error code
     3. Flush and sync (depending on the value of sync_binlog).
     4. Signal that the binary log was updated
  4. Release the log lock
  5. Grab the commit lock
     1. For each thread in the queue:
        a. If there were no error when flushing and the transaction shall be committed:
           - Commit the transaction, saving the result of executing the commit.
  6. Release the commit lock
  7. Call purge, if any of the committed thread requested a purge.
  8. Return with the saved error code


  @todo The use of @c skip_commit is a hack that we use since the @c
  TC_LOG Interface does not contain functions to handle
  savepoints. Once the binary log is eliminated as a handlerton and
  the @c TC_LOG interface is extended with savepoint handling, this
  parameter can be removed.


  @param thd Session to commit transaction for
  @param all   This is @c true if this is a real transaction commit, and
               @c false otherwise.
  @param skip_commit
               This is @c true if the call to @c ha_commit_low should
               be skipped (it is handled by the caller somehow) and @c
               false otherwise (the normal case).
 */
int MYSQL_BIN_LOG::ordered_commit(THD *thd, bool all, bool skip_commit)
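
The effect of the flush phase described in this comment can be illustrated with a toy model. The Python sketch below is a deliberately simplified simulation, not MySQL's implementation: it models only the part where a leader holds the log lock while the queued caches are written, it ignores the sync and commit phases, and the names Txn and simulate_flush_stage as well as the throughput figure are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    arrival: float    # second at which COMMIT is issued
    cache_bytes: int  # size of the transaction's binlog cache


def simulate_flush_stage(txns, bytes_per_second):
    """Return {name: seconds from COMMIT until its flush group is written}."""
    pending = sorted(txns, key=lambda t: t.arrival)
    lock_free_at = 0.0
    latency = {}
    i = 0
    while i < len(pending):
        # The first waiter becomes the leader once the log lock is free.
        start = max(lock_free_at, pending[i].arrival)
        # Everything queued by then is flushed together as one group.
        group = [t for t in pending[i:] if t.arrival <= start]
        flush_seconds = sum(t.cache_bytes for t in group) / bytes_per_second
        lock_free_at = start + flush_seconds
        for t in group:
            latency[t.name] = lock_free_at - t.arrival
        i += len(group)
    return latency


if __name__ == "__main__":
    GB, MB, KB = 1024 ** 3, 1024 ** 2, 1024
    workload = [
        Txn("archive", 0.0, 13 * GB),  # the large transaction from this case
        Txn("order-1", 1.0, 4 * KB),
        Txn("order-2", 2.0, 4 * KB),
    ]
    # 200 MB/s is an assumed write throughput, not a measured value.
    for name, seconds in simulate_flush_stage(workload, 200 * MB).items():
        print(f"{name}: {seconds:.1f}s")
```

In this model the small transactions that arrive while the large cache is being written wait for almost the entire write, even though their own caches are tiny.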










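After reading these source comments I honestly did not feel like reading any further. The process is roughly this:
When a transaction commits, it first acquires a global metadata lock (MDL), and then the transaction logs held in the buffers are flushed to disk in queue order. Redo is flushed first and the binlog second (in the trace, innobase_flush_logs runs before binlog_cache_data::flush). This flush runs under a lock (LOCK_log in the ordered_commit comment above), so other transactions cannot write their logs while it is in progress. After that a commit lock (LOCK_commit) is taken and the transactions are committed in order, which ensures that both redo and binlog have reached disk (many people have already written illustrated descriptions of this process, so I will not repeat them);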


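It looks simple, but implementing it is painful. First a global MDL is taken, then a log commit lock on top of it... (these locks are what the source functions suggest and live in the MySQL server layer; the details still need more study. Ahem, sorry, my skills are limited);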




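Seen against this case, everything suddenly becomes clear.
Because the transaction was so large (its binlog alone was 13 GB), flushing the binlog buffer (bin_log_buffer) to disk took a long time. The buffer is locked for the duration of that flush, so new transactions could not write to it (logs and data are both written to a buffer first and then flushed to disk according to the configured rules). As a result, every other transaction either failed or hung (where a retry mechanism existed).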


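One thing I still have not figured out: exactly how do transactions write into the buffer? There must be some mechanism or rule behind it. I cannot see it yet, so I am leaving this as an open question. As I dig deeper it should become clear.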




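Summary: a transaction that touches a large amount of data is best split into several smaller transactions. If it really cannot be split, consider disabling the binlog for the session (set sql_log_bin=0); note that changes made this way are not written to the binlog and therefore do not reach replicas. Redo and undo are still written, but that cost is comparatively small, because redo and undo are written through a different mechanism than the binlog. I will cover that separately when I have time.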
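
One way to do the splitting is to slice the work by primary-key range and commit each slice separately. The Python sketch below only generates the statements. It assumes the source table has an integer primary key named id (the original statement does not say so), keeps the elided condition xxxx as a placeholder, and uses the hypothetical helper name batch_statements.

```python
def batch_statements(min_id, max_id, batch_size, condition="xxxx"):
    """Yield one insert ... select per id range, each meant to be
    executed and committed as its own transaction."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        # `id` is an assumed integer primary key; `condition` stands in for
        # the original WHERE clause, which the post elides as "xxxx".
        yield (
            "insert into history select * from table "
            f"where ({condition}) and id between {start} and {end};"
        )
        start = end + 1


if __name__ == "__main__":
    statements = list(batch_statements(1, 16_000_000, 50_000))
    print(len(statements))
    print(statements[0])
    print(statements[-1])
```

Each statement then produces a small binlog cache, so no single commit holds up the flush for long. The batch size is a tuning choice; 50,000 here is only an example.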



Finally, thanks to the expert at my side, 八怪!!! All of the source-code research here came from his help. Thank you very much!




From the ITPUB blog, link: http://blog.itpub.net/20892230/viewspace-2132645/. If you reproduce this article, please credit the source; otherwise legal action may be taken.
