When executing a replication workload with semisynchronous replication turned on and
periodic STOP SLAVE IO_THREAD / START SLAVE IO_THREAD cycles (in order to simulate network
failures), the slave stopped with the following information in SHOW SLAVE STATUS:
mysql> show slave status\G
*************************** 1. row ***************************
Master_User: root
Master_Port: 19306
Connect_Retry: 1
Master_Log_File: master-bin.000001
Read_Master_Log_Pos: 1900343
Relay_Log_File: slave-relay-bin.000003
Relay_Log_Pos: 1283333
Relay_Master_Log_File: master-bin.000001
Slave_IO_Running: No
Slave_SQL_Running: Yes
Last_Errno: 0
Skip_Counter: 0
Exec_Master_Log_Pos: 1900343
Relay_Log_Space: 1900793
Until_Condition: None
Until_Log_Pos: 0
Master_SSL_Allowed: No
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook
Last_SQL_Errno: 0
Master_Server_Id: 1
1 row in set (0.00 sec)
The slave error log contains:
090630 13:58:12 [ERROR] Set 'rpl_semi_sync_master_reply_log_file_pos' on master failed
090630 13:58:12 [ERROR] Slave I/O: Fatal error: Failed to run 'after_queue_event' hook,
Error_code: 1593
090630 13:58:12 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001',
position 1900343
which implies that the master is at fault. However, the master error log does not contain
any information about what happened.
Also, the error is reported as "Fatal", but START SLAVE IO_THREAD is able to reestablish
replication and Rpl_semi_sync_master_status converts back to ON, so how "Fatal" the error
was remains an open question.
How to repeat:
With the Random Query Generator from the mysql-test-extra-6.0 tree, mysql-test/gentest,
$ perl \
--basedir=/build/bzr/azalea-bugfixing \
--grammar=conf/replication_simple.yy \
--rpl_mode=statement \
--mem \
--mysqld=--plugin-dir=/build/bzr/azalea-bugfixing/plugin/semisync/.libs \
--mysqld=--rpl_semi_sync_master_enabled=1 \
--mysqld=--rpl_semi_sync_slave_enabled=1 \
--duration=1800 \
--queries=100000000 \
--threads=10 \
--validator=ReplicationSlaveStatus \
The ReplicationSemiSync reporter causes the slave to stop for random periods of time and
then attempt to reestablish semisynchronous replication. The ReplicationSlaveStatus
validator monitors SHOW SLAVE STATUS and terminates the test as soon as the error occurs,
usually within a minute (a few STOP/START SLAVE IO_THREAD cycles).
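The check described above can be sketched in a few lines of Python. This is not the
Random Query Generator's own code (that tool is written in Perl); the function names
parse_vertical_status and io_thread_failed are hypothetical, and the rule "IO thread
stopped plus a non-zero Last_IO_Errno" is an assumption about how to tell an error stop
apart from a deliberate STOP SLAVE IO_THREAD.

```python
def parse_vertical_status(text):
    """Parse vertical (backslash-G) SHOW SLAVE STATUS output into a dict."""
    status = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip the prompt, the row separator, the footer, and blank lines.
        if not line or line.startswith(("mysql>", "*")) or ":" not in line:
            continue
        # Split on the first colon only: values such as Last_IO_Error
        # contain further colons.
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def io_thread_failed(status):
    """True when the IO thread is stopped AND an IO error was recorded.

    A deliberate STOP SLAVE IO_THREAD also yields Slave_IO_Running: No,
    so the error number is used to separate the two cases (assumed rule).
    """
    stopped = status.get("Slave_IO_Running") == "No"
    errored = status.get("Last_IO_Errno", "0") != "0"
    return stopped and errored


# Fields copied from the SHOW SLAVE STATUS output in this report.
SAMPLE = """
*************************** 1. row ***************************
Slave_IO_Running: No
Slave_SQL_Running: Yes
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook
1 row in set (0.00 sec)
"""

if __name__ == "__main__":
    status = parse_vertical_status(SAMPLE)
    print(status["Last_IO_Error"], io_thread_failed(status))
```

With the fields from this report the check fires, whereas a status with
Slave_IO_Running: No and Last_IO_Errno: 0 is treated as an intentional stop.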
Suggested fix:
Semisynchronous replication should survive IO thread issues without throwing an error.
The actual cause of each failure must be reported, if not in SHOW SLAVE STATUS, then at
least in the error log. If the slave error log talks about a failure on the master, the
error log on the master needs to contain an entry and debug information for that failure.
[30 Jun 2009 13:30] Zhenxing He
The error is on the slave, not on the master. The error message 'Set
'rpl_semi_sync_master_reply_log_file_pos' on master failed' may be misleading; it would be
better to say 'Send semi-sync reply to master failed'. This error is mostly caused by connection
Fatal here means that the slave IO thread has to stop because of the error, but it is
possible to restart the slave IO thread.
[30 Jun 2009 13:51] Philip Stoev
The test uses only STOP SLAVE IO_THREAD and it does not attempt to disrupt the connection
between the master and the slave in any other way. Can we have extra debugging
information in case this error happens so that this can be debugged further?
[5 Sep 2009 8:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:
2802 He Zhenxing 2009-09-05
Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event'
Errors when send reply to master should never cause the IO thread
to stop, because master can fall back to async replication if it
does not get reply from slave.
The problem is fixed by deliberately ignore the return value of
@ plugin/semisync/
Deliberately ignore the return value of slaveReply so that errors
while sending slave reply will not cause the IO thread to stop.
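
The control-flow change described in this commit message can be modelled as follows.
This is a Python sketch, not the plugin's C++ source: slave_reply stands in for the
plugin's slaveReply, the two after_queue_event_* functions and broken_connection are
hypothetical names, and the convention "0 means success, non-zero stops the IO thread"
is an assumption made for illustration.

```python
def slave_reply(send):
    """Stand-in for slaveReply: return 0 on success, 1 if the send fails."""
    try:
        send()
        return 0
    except OSError:
        return 1


def after_queue_event_before_fix(send):
    # The reply status is propagated, so a failed send makes the hook
    # fail and the IO thread stops with a fatal error.
    return slave_reply(send)


def after_queue_event_after_fix(send):
    # The return value is deliberately ignored: the master can fall back
    # to asynchronous replication if it does not get the reply.
    slave_reply(send)
    return 0


def broken_connection():
    """Simulate a reply that cannot be sent to the master."""
    raise OSError("connection to master lost")


if __name__ == "__main__":
    print("before fix:", after_queue_event_before_fix(broken_connection))
    print("after fix: ", after_queue_event_after_fix(broken_connection))
```

In this model the same send failure makes the hook fail before the change and succeed
after it, which matches the intent stated above that reply errors should never stop the
IO thread.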
[8 Sep 2009 4:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:
2802 He Zhenxing 2009-09-08
Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event'
Errors when send reply to master should never cause the IO thread
to stop, because master can fall back to async replication if it
does not get reply from slave.
The problem is fixed by deliberately ignore the return value of
@ plugin/semisync/
Change the error message when send reply failed
@ plugin/semisync/
Deliberately ignore the return value of slaveReply so that errors
while sending slave reply will not cause the IO thread to stop.
[8 Sep 2009 6:15] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:
2802 He Zhenxing 2009-09-08
Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event'
Errors when send reply to master should never cause the IO thread
to stop, because master can fall back to async replication if it
does not get reply from slave.
The problem is fixed by deliberately ignoring the return value of
@ plugin/semisync/
Change the error message when send reply failed
@ plugin/semisync/
Deliberately ignore the return value of slaveReply so that errors
while sending slave reply will not cause the IO thread to stop.
[26 Sep 2009 10:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:
2837 He Zhenxing 2009-09-26 [merge]
Merge BUG#45852 to 6.0-codebase-bugfixing
[30 Sep 2009 3:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:
3627 He Zhenxing 2009-09-30
Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event'
Errors when send reply to master should never cause the IO thread
to stop, because master can fall back to async replication if it
does not get reply from slave.
The problem is fixed by deliberately ignoring the return value of
@ plugin/semisync/
Deliberately ignore the return value of slaveReply so that errors
while sending slave reply will not cause the IO thread to stop.
[9 Oct 2009 8:47] Bugs System
Pushed into 6.0.14-alpha (revid:alik@ibmvm-20091009083208-0o0f0i9w1sq3c1kn)
(version source revid:[email protected]) (merge vers: 6.0.14-alpha) (pib:12)
[9 Oct 2009 10:43] Jon Stephens
Documented bugfix in the 6.0.14 changelog as follows:
When using semi-synchronous replication, an error raised on the
slave while sending a response to the master caused the slave IO
thread to stop.
Set status to NDI, waiting for push to 5.4.
[12 Oct 2009 12:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:
3115 He Zhenxing 2009-10-12
Backport Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run
'after_queue_event' hook
Errors when send reply to master should never cause the IO thread
to stop, because master can fall back to async replication if it
does not get reply from slave.
The problem is fixed by deliberately ignoring the return value of
[27 Oct 2009 9:49] Bugs System
Pushed into 6.0.14-alpha (revid:[email protected])
(version source revid:[email protected]) (merge vers: 6.0.14-alpha) (pib:13)
[28 Oct 2009 6:37] Jon Stephens
Already documented for 6.0.14; re-closing.
[12 Nov 2009 8:18] Bugs System
Pushed into 5.5.0-beta (revid:[email protected])
(version source revid:[email protected]) (merge vers: 5.5.0-beta)
[12 Nov 2009 14:27] Jon Stephens
Also documented in the 5.5.0 changelog; closed.
[18 Dec 2009 15:41] Paul DuBois
Removed 5.5.0 changelog entry. In 5.5, semisync replication first appears in 5.5.0, so
this bug affects no 5.5.x releases.