Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook

Description:
When executing a replication workload with semisynch replication turned off and periodic
STOP SLAVE IO_THREAD thread, START SLAVE IO_THREAD (in order to simulate network
failures), the slave stopped with the following information in SHOW SLAVE STATUS:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 19306
                Connect_Retry: 1
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 1900343
               Relay_Log_File: slave-relay-bin.000003
                Relay_Log_Pos: 1283333
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1900343
              Relay_Log_Space: 1900793
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1593
                Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)

The slave error log contains:

090630 13:58:12 [ERROR] Set 'rpl_semi_sync_master_reply_log_file_pos' on master failed
090630 13:58:12 [ERROR] Slave I/O: Fatal error: Failed to run 'after_queue_event' hook,
Error_code: 1593
090630 13:58:12 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001',
position 1900343

which implies that the master is at fault, however the master error log does not contain
any information about what happened.

Also, the error is reported as "Fatal", but START SLAVE IO_THREAD is able to reestablish
replication and Rpl_semi_sync_master_status converts back to ON, so how "Fatal" the error
was remains an open question.

How to repeat:
With the Random Query Generator from the mysql-test-extra-6.0 tree, mysql-test/gentest,
run:

$ perl runall.pl \
  --basedir=/build/bzr/azalea-bugfixing \
  --grammar=conf/replication_simple.yy \
  --rpl_mode=statement \
  --mem \
  --mysqld=--plugin-dir=/build/bzr/azalea-bugfixing/plugin/semisync/.libs \

--mysqld=--plugin-load=rpl_semi_sync_master=libsemisync_master.so:rpl_semi_sync_slave=libsemisync_slave.so
\
  --mysqld=--rpl_semi_sync_master_enabled=1 \
  --mysqld=--rpl_semi_sync_slave_enabled=1 \
  --duration=1800 \
  --queries=100000000 \
  --threads=10 \
  --validator=ReplicationSlaveStatus \
  --reporter=ReplicationSemiSync

The ReplicationSemiSync reporter causes the slave to stop for random periods of time and
then attempt to reestablish semisynchronous replication. The ReplicationSlaveStatus
monitors SHOW SLAVE STATUS and will terminate the test as soon as the error occurs,
usually within a minute (a few START|STOP SLAVE IO THREAD cycles).

Suggested fix:
Semisynchronous replication should survive IO thread issues without throwing error
messages.

The actual cause of each failure must be present if not in SHOW SLAVE STATUS, then at
least in the error log. If the slave error log talks about a failure on the master, the
error log on the master needs to contain an entry and debug information for that failure.

[30 Jun 2009 13:30] Zhenxing He

The error is on slave, not on master, maybe the error message 'Set
'rpl_semi_sync_master_reply_log_file_pos' on master failed' is misleading, better say
'Send semi-sync reply to master failed'. This error is mostly caused by connection
problem.

Fatal here means that the slave IO thread has to stop because of the error, but it's
possible to restart slave IO thread.

[30 Jun 2009 13:51] Philip Stoev

The test uses only STOP SLAVE IO_THREAD and it does not attempt to disrupt the connection
between the master and the slave in any other way. Can we have extra debugging
information in case this error happens so that this can be debugged further?

[5 Sep 2009 8:23] Bugs System

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82521

2802 He Zhenxing 2009-09-05
      Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event'
hook
     
      Errors when send reply to master should never cause the IO thread
      to stop, because master can fall back to async replication if it
      does not get reply from slave.
     
      The problem is fixed by deliberately ignore the return value of
      slaveReply.
     @ plugin/semisync/semisync_slave_plugin.cc
        Deliberately ignore the return value of slaveReply so that errors
        while sending slave reply will not cause the IO thread to stop.

[8 Sep 2009 4:41] Bugs System

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82622

2802 He Zhenxing 2009-09-08
      Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event'
hook
     
      Errors when send reply to master should never cause the IO thread
      to stop, because master can fall back to async replication if it
      does not get reply from slave.
     
      The problem is fixed by deliberately ignore the return value of
      slaveReply.
     @ plugin/semisync/semisync_slave.cc
        Change the error message when send reply failed
     @ plugin/semisync/semisync_slave_plugin.cc
        Deliberately ignore the return value of slaveReply so that errors
        while sending slave reply will not cause the IO thread to stop.

[8 Sep 2009 6:15] Bugs System

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82625

2802 He Zhenxing 2009-09-08
      Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event'
hook
     
      Errors when send reply to master should never cause the IO thread
      to stop, because master can fall back to async replication if it
      does not get reply from slave.
     
      The problem is fixed by deliberately ignoring the return value of
      slaveReply.
     @ plugin/semisync/semisync_slave.cc
        Change the error message when send reply failed
     @ plugin/semisync/semisync_slave_plugin.cc
        Deliberately ignore the return value of slaveReply so that errors
        while sending slave reply will not cause the IO thread to stop.

[26 Sep 2009 10:32] Bugs System

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84715

2837 He Zhenxing 2009-09-26 [merge]
      Merge BUG#45852 to 6.0-codebase-bugfixing

[30 Sep 2009 3:58] Bugs System

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85120

3627 He Zhenxing 2009-09-30
      Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event'
hook
     
      Errors when send reply to master should never cause the IO thread
      to stop, because master can fall back to async replication if it
      does not get reply from slave.
     
      The problem is fixed by deliberately ignoring the return value of
      slaveReply.
     @ plugin/semisync/semisync_slave_plugin.cc
        Deliberately ignore the return value of slaveReply so that errors
        while sending slave reply will not cause the IO thread to stop.

[9 Oct 2009 8:47] Bugs System

Pushed into 6.0.14-alpha (revid:alik@ibmvm-20091009083208-0o0f0i9w1sq3c1kn) (version
source revid:[email protected]) (merge vers:
6.0.14-alpha) (pib:12)

[9 Oct 2009 10:43] Jon Stephens

Documented bugfix in the 6.0.14 changelog as follows:

        When using semi-synchronous replication, an error raised on the
        slave while sending a response to the master caused the slave IO
        thread to stop.

Set status to NDI, waiting for push to 5.4.

[12 Oct 2009 12:57] Bugs System

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/86574

3115 He Zhenxing 2009-10-12
      Backport Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run
'after_queue_event' hook
     
      Errors when send reply to master should never cause the IO thread
      to stop, because master can fall back to async replication if it
      does not get reply from slave.
     
      The problem is fixed by deliberately ignoring the return value of
      slaveReply.

[27 Oct 2009 9:49] Bugs System

Pushed into 6.0.14-alpha (revid:[email protected]) (version
source revid:[email protected]) (merge vers:
6.0.14-alpha) (pib:13)

[28 Oct 2009 6:37] Jon Stephens

Already documented for 6.0.14; re-closing.

[12 Nov 2009 8:18] Bugs System

Pushed into 5.5.0-beta (revid:[email protected]) (version
source revid:[email protected]) (merge vers: 5.5.0-beta)
(pib:13)

[12 Nov 2009 14:27] Jon Stephens

Also documented in the 5.5.0 changelog; closed.

[18 Dec 2009 15:41] Paul DuBois

Removed 5.5.0 changelog entry. In 5.5, semisync replication first appears in 5.5.0, so
this bug affects no 5.5.x releases.

你可能感兴趣的:(event)