MySQL Source Code: The Binlog Replication Protocol

Binlog Replication Architecture

Once the binlog is enabled on the Master, every write operation is recorded in the binlog, and the Slave synchronizes the binlog by sending a dump command.

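The Slave runs two threads that handle the binlog. The IO Thread pulls the binlog from the Master and saves it locally in the relay log; the SQL Thread reads the binlog events from the relay log and applies them on the Slave, which completes the replication of the write operations.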
[Figure: slaverep.png]
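To make the division of labour concrete, here is a minimal single-threaded C++ model of the two Slave threads. ReplicaModel and its members are hypothetical names, not MySQL classes; the relay log is reduced to an in-memory queue of event descriptions, whereas the real relay log is a set of files on disk and the two threads run concurrently.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model of the two replica threads (not MySQL's real classes).
struct ReplicaModel {
  std::deque<std::string> relay_log;  // events fetched but not yet applied
  std::vector<std::string> applied;   // events already applied by the SQL thread

  // IO thread: append an event received from the Master to the relay log.
  void io_thread_receive(const std::string& event) { relay_log.push_back(event); }

  // SQL thread: apply the oldest relay-log event, if any.
  // Returns false when there is nothing to apply.
  bool sql_thread_apply_one() {
    if (relay_log.empty()) return false;
    applied.push_back(relay_log.front());
    relay_log.pop_front();
    return true;
  }
};
```

Decoupling the two steps through the relay log is what lets the IO thread keep fetching even when applying events is slow.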

Binlog File Structure

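A binary log file records every write operation as an event. The description below covers the log format of MySQL 5.0.0 and later, i.e. binlog version 4, which introduced the FORMAT_DESCRIPTION_EVENT.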
[Figure: binlogfile.png]

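Every binlog file starts with the fixed 4-byte magic number [ fe 'bin' ];
The first event is FORMAT_DESCRIPTION_EVENT, which describes how the other events are laid out; the Slave uses it when parsing them.
The last event of a rotated file is ROTATE_EVENT, which records the name and starting position of the next binlog file.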
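The fixed file header and the common event header can be checked with a few lines of C++. This sketch assumes the v4 layout: after the 4-byte magic, every event starts with a 19-byte little-endian header (timestamp, type code, server id, event size, next-event position, flags). has_binlog_magic, EventHeader and parse_event_header are illustrative names, not MySQL APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// The 4-byte magic number at the start of every binlog file: 0xfe 'b' 'i' 'n'.
bool has_binlog_magic(const std::vector<uint8_t>& file) {
  static const uint8_t kMagic[4] = {0xfe, 'b', 'i', 'n'};
  if (file.size() < 4) return false;
  for (std::size_t i = 0; i < 4; ++i)
    if (file[i] != kMagic[i]) return false;
  return true;
}

// Common 19-byte header of a v4 event (all integers little-endian).
struct EventHeader {
  uint32_t timestamp;
  uint8_t type_code;    // e.g. 15 = FORMAT_DESCRIPTION_EVENT, 4 = ROTATE_EVENT, 19 = TABLE_MAP_EVENT
  uint32_t server_id;
  uint32_t event_size;  // whole event, header included
  uint32_t log_pos;     // position of the next event in the file
  uint16_t flags;
};

static uint32_t read_le32(const uint8_t* p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
}

// Decodes the header of the event starting at byte offset `off`.
std::optional<EventHeader> parse_event_header(const std::vector<uint8_t>& buf, std::size_t off) {
  if (off + 19 > buf.size()) return std::nullopt;
  const uint8_t* p = buf.data() + off;
  EventHeader h;
  h.timestamp = read_le32(p);
  h.type_code = p[4];
  h.server_id = read_le32(p + 5);
  h.event_size = read_le32(p + 9);
  h.log_pos = read_le32(p + 13);
  h.flags = uint16_t(p[17] | p[18] << 8);
  return h;
}
```

A reader would decode the FORMAT_DESCRIPTION_EVENT header at offset 4 and then follow log_pos from event to event.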

Event Types

The events used by row-based replication include:

  1. TABLE_MAP_EVENT
  2. ROWS_EVENT
    • DELETE_ROWS_EVENTv2
    • UPDATE_ROWS_EVENTv2
    • WRITE_ROWS_EVENTv2

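Every insert, update and delete operation is preceded by a TABLE_MAP_EVENT that describes the table it operates on; the two consecutive events are linked by table_id.

table_id does not map one-to-one to a table name; its only purpose in row-based replication is to link a TABLE_MAP_EVENT to the ROWS_EVENT that follows it.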
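A row-event decoder therefore only needs a table_id lookup that is refreshed from each TABLE_MAP_EVENT as it arrives. The following is a sketch under that assumption; TableMapCache is a hypothetical helper, and when to clear the mapping is a design choice of this sketch rather than a statement about MySQL's applier.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical decoder state: the latest TABLE_MAP_EVENT seen for each table_id.
class TableMapCache {
 public:
  // On TABLE_MAP_EVENT: (re)bind table_id to "db.table".
  void on_table_map(uint64_t table_id, const std::string& db, const std::string& table) {
    tables_[table_id] = db + "." + table;
  }
  // On ROWS_EVENT: find the table the rows belong to.
  std::optional<std::string> resolve(uint64_t table_id) const {
    auto it = tables_.find(table_id);
    if (it == tables_.end()) return std::nullopt;
    return it->second;
  }
  // Drop all bindings, e.g. once a statement's last rows event is handled.
  void clear() { tables_.clear(); }

 private:
  std::map<uint64_t, std::string> tables_;
};
```

Because the binding is overwritten on every TABLE_MAP_EVENT, the decoder never relies on a table_id meaning the same table for longer than one event pair.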

Handshake Protocol

MySQL offers two ways to replicate the binlog, by file name and position or by GTID, corresponding to the COM_BINLOG_DUMP and COM_BINLOG_DUMP_GTID commands respectively.


[Figure: handshake.png]

Before the Slave can send a DUMP command, the Master must authenticate it.

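  1. When the Slave connects, the Master sends a handshake packet to start authentication;
  2. On receiving the handshake packet, the Slave replies with an authentication packet carrying the username and an auth response computed from the password and the scramble in the handshake packet (with mysql_native_password, for example, the password itself is never sent);
  3. The Master verifies the credentials and replies with an OK_Packet if authentication succeeds, or an ERR_Packet otherwise.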
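On the Slave side, step 3 comes down to inspecting the first payload byte of the Master's reply. This sketch assumes the standard packet framing (3-byte little-endian payload length plus a 1-byte sequence id) and the usual header bytes: 0x00 for OK_Packet, 0xFF for ERR_Packet, and 0xFE for an auth-switch request during authentication. AuthResult and classify_auth_reply are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class AuthResult { Ok, Err, AuthSwitch, Other, Malformed };

// `packet` is one complete packet: 3-byte payload length, 1-byte sequence id,
// then the payload, whose first byte tells what kind of reply it is.
AuthResult classify_auth_reply(const std::vector<uint8_t>& packet) {
  if (packet.size() < 5) return AuthResult::Malformed;
  std::size_t payload_len = packet[0] | packet[1] << 8 | packet[2] << 16;
  if (payload_len + 4 != packet.size()) return AuthResult::Malformed;
  switch (packet[4]) {
    case 0x00: return AuthResult::Ok;
    case 0xFF: return AuthResult::Err;
    case 0xFE: return AuthResult::AuthSwitch;
    default:   return AuthResult::Other;  // e.g. extra auth data from the plugin
  }
}
```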

The Handshake packet is laid out as follows:

  Packet format:

    Bytes       Content
    -----       ----
    1           protocol version (always 10)
    n           server version string, \0-terminated
    4           thread id
    8           first 8 bytes of the plugin provided data (scramble)
    1           \0 byte, terminating the first part of a scramble
    2           server capabilities (two lower bytes)
    1           server character set
    2           server status
    2           server capabilities (two upper bytes)
    1           length of the scramble
    10          reserved, always 0
    n           rest of the plugin provided data (at least 12 bytes)
    1           \0 byte, terminating the second part of a scramble
    n           auth plugin name, \0-terminated

static bool send_server_handshake_packet(MPVIO_EXT *mpvio,
                                         const char *data, uint data_len)
{
  Protocol_classic *protocol= mpvio->protocol;

  char *buff= (char *) my_alloca(1 + SERVER_VERSION_LENGTH + data_len + 64);
  char scramble_buf[SCRAMBLE_LENGTH];
  char *end= buff;

  DBUG_ENTER("send_server_handshake_packet");
  *end++= protocol_version;

  protocol->set_client_capabilities(CLIENT_BASIC_FLAGS);

  if (data_len)
  {
    mpvio->cached_server_packet.pkt= (char*) memdup_root(mpvio->mem_root, 
                                                         data, data_len);
    mpvio->cached_server_packet.pkt_len= data_len;
  }

  if (data_len < SCRAMBLE_LENGTH)
  {
    if (data_len)
    {
      /*
        the first packet *must* have at least 20 bytes of a scramble.
        if a plugin provided less, we pad it to 20 with zeros
      */
      memcpy(scramble_buf, data, data_len);
      memset(scramble_buf + data_len, 0, SCRAMBLE_LENGTH - data_len);
      data= scramble_buf;
    }
    else
    {
      generate_user_salt(mpvio->scramble, SCRAMBLE_LENGTH + 1);
      data= mpvio->scramble;
    }
    data_len= SCRAMBLE_LENGTH;
  }

  end= my_stpnmov(end, server_version, SERVER_VERSION_LENGTH) + 1;

  DBUG_ASSERT(sizeof(my_thread_id) == 4);
  int4store((uchar*) end, mpvio->thread_id);
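  end+= 4;

  /*
    Scramble part 1 (8 bytes) and its \0 terminator, then the lower two
    bytes of the capability flags, as in the packet format above.
  */
  end= (char*) memcpy(end, data, AUTH_PLUGIN_DATA_PART_1_LENGTH);
  end+= AUTH_PLUGIN_DATA_PART_1_LENGTH;
  *end++= 0;

  int2store(end, protocol->get_client_capabilities());
  /* write server characteristics: up to 16 bytes allowed */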
  end[2]= (char) default_charset_info->number;
  int2store(end + 3, mpvio->server_status[0]);
  int2store(end + 5, protocol->get_client_capabilities() >> 16);
  end[7]= data_len;
  DBUG_EXECUTE_IF("poison_srv_handshake_scramble_len", end[7]= -100;);
  memset(end + 8, 0, 10);
  end+= 18;
  /* write scramble tail */
  end= (char*) memcpy(end, data + AUTH_PLUGIN_DATA_PART_1_LENGTH,
                      data_len - AUTH_PLUGIN_DATA_PART_1_LENGTH);
  end+= data_len - AUTH_PLUGIN_DATA_PART_1_LENGTH;
  end= strmake(end, plugin_name(mpvio->plugin)->str,
                    plugin_name(mpvio->plugin)->length);

  int res= protocol->write((uchar*) buff, (size_t) (end - buff + 1)) ||
           protocol->flush_net();
  DBUG_RETURN(res);
}
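On the Slave side, the same packet can be decoded by walking the fields in the order of the format table. The sketch below is illustrative (HandshakeV10 and parse_handshake_v10 are hypothetical names). It takes the payload without the 4-byte packet header and assumes, as the client/server protocol documentation describes, that scramble part 2 occupies max(13, scramble length - 8) bytes including its trailing \0 and is followed by the \0-terminated auth plugin name written by strmake() above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct HandshakeV10 {
  uint8_t protocol_version;
  std::string server_version;
  uint32_t thread_id;
  std::string scramble;      // part 1 (8 bytes) + part 2, without the \0s
  uint32_t capabilities;     // lower and upper 16 bits combined
  uint8_t charset;
  uint16_t status;
  std::string auth_plugin;   // empty if absent
};

std::optional<HandshakeV10> parse_handshake_v10(const std::vector<uint8_t>& p) {
  std::size_t i = 0;
  auto need = [&](std::size_t n) { return i + n <= p.size(); };
  auto read_cstr = [&](std::string& out) {  // reads up to and past a \0
    while (i < p.size() && p[i] != 0) out.push_back(char(p[i++]));
    if (i == p.size()) return false;        // no terminator found
    ++i;
    return true;
  };
  HandshakeV10 h{};
  if (!need(1)) return std::nullopt;
  h.protocol_version = p[i++];
  if (h.protocol_version != 10 || !read_cstr(h.server_version)) return std::nullopt;
  if (!need(4 + 8 + 1 + 2 + 1 + 2 + 2 + 1 + 10)) return std::nullopt;
  h.thread_id = p[i] | p[i + 1] << 8 | p[i + 2] << 16 | uint32_t(p[i + 3]) << 24;
  i += 4;
  h.scramble.assign(p.begin() + i, p.begin() + i + 8);
  i += 8 + 1;                               // skip the \0 after scramble part 1
  uint32_t cap_low = p[i] | p[i + 1] << 8;
  i += 2;
  h.charset = p[i++];
  h.status = uint16_t(p[i] | p[i + 1] << 8);
  i += 2;
  uint32_t cap_high = p[i] | p[i + 1] << 8;
  i += 2;
  h.capabilities = cap_low | cap_high << 16;
  uint8_t scramble_len = p[i++];
  i += 10;                                  // reserved bytes
  std::size_t part2 = scramble_len > 21 ? scramble_len - 8 : 13;
  if (!need(part2)) return std::nullopt;
  h.scramble.append(p.begin() + i, p.begin() + i + part2 - 1);  // drop trailing \0
  i += part2;
  read_cstr(h.auth_plugin);  // optional; keeps the name even if the \0 is missing
  return h;
}
```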

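After receiving the authentication packet, the Master extracts the username and the auth response (named passwd in the code) and verifies them.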

static size_t parse_client_handshake_packet(MPVIO_EXT *mpvio,
                                            uchar **buff, size_t pkt_len)
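{
  /*
    Excerpt: `end` points at the username (after the fixed capability,
    max-packet-size, charset and filler fields) and
    bytes_remaining_in_packet counts the unread bytes.
  */
  size_t user_len;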
  char *user= get_string(&end, &bytes_remaining_in_packet, &user_len);

  size_t passwd_len= 0;
  char *passwd= NULL;

  /* Auth response, length-encoded under CLIENT_PLUGIN_AUTH_LENENC_CLIENT_DATA */
  passwd= get_length_encoded_string(&end, &bytes_remaining_in_packet,
                                    &passwd_len);
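  if (passwd_len)
    mpvio->auth_info.password_used= PASSWORD_USED_YES;
  /* ... database, client plugin name and connection attributes parsed here ... */
  *buff= (uchar *) passwd;  // the auth plugin reads the response from here
  return passwd_len;
}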
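get_length_encoded_string() reads a length-encoded integer followed by that many bytes. Below is a standalone sketch of that encoding, not MySQL's implementation (the function names are hypothetical): a first byte below 0xfb is the value itself, while 0xfc, 0xfd and 0xfe announce a 2-, 3- or 8-byte little-endian integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Reads a length-encoded integer at p[pos]; advances pos only on success.
std::optional<uint64_t> read_lenenc_int(const std::vector<uint8_t>& p, std::size_t& pos) {
  if (pos >= p.size()) return std::nullopt;
  uint8_t first = p[pos];
  if (first < 0xfb) { ++pos; return first; }
  std::size_t width = 0;
  switch (first) {
    case 0xfc: width = 2; break;
    case 0xfd: width = 3; break;
    case 0xfe: width = 8; break;
    default: return std::nullopt;  // 0xfb (NULL marker) and 0xff are not integers
  }
  if (pos + 1 + width > p.size()) return std::nullopt;
  uint64_t v = 0;
  for (std::size_t k = 0; k < width; ++k) v |= uint64_t(p[pos + 1 + k]) << (8 * k);
  pos += 1 + width;
  return v;
}

// Reads a length-encoded string; on failure pos is left unchanged.
std::optional<std::string> read_lenenc_string(const std::vector<uint8_t>& p, std::size_t& pos) {
  std::size_t save = pos;
  auto len = read_lenenc_int(p, pos);
  if (!len || *len > p.size() - pos) { pos = save; return std::nullopt; }
  std::string s(p.begin() + pos, p.begin() + pos + *len);
  pos += *len;
  return s;
}
```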

Parsing the Dump Command

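Before sending COM_BINLOG_DUMP, the Slave should first register itself with COM_REGISTER_SLAVE.
For COM_BINLOG_DUMP_GTID, the Master uses the GTID set carried in the command to locate the position from which to start sending the log.
After receiving the command, the Master checks whether BINLOG_DUMP_NON_BLOCK is set in the command's flags field. If it is set, the Master returns an EOF_Packet once it has sent all existing binlog events; if it is not set, the Master keeps blocking and waits for the next event.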
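For reference, here is a sketch of a COM_BINLOG_DUMP payload and of the flag check. It assumes the documented layout (command byte 0x12, 4-byte binlog position, 2-byte flags, 4-byte server id, then the file name filling the rest of the packet) and BINLOG_DUMP_NON_BLOCK = 0x01; the helper names are hypothetical and the preceding COM_REGISTER_SLAVE is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr uint8_t kComBinlogDump = 0x12;
constexpr uint16_t kBinlogDumpNonBlock = 0x01;

// Appends `bytes` little-endian bytes of v.
static void put_le(std::vector<uint8_t>& out, uint64_t v, int bytes) {
  for (int k = 0; k < bytes; ++k) out.push_back(uint8_t(v >> (8 * k)));
}

// Builds a COM_BINLOG_DUMP payload (the 4-byte packet header is not included).
std::vector<uint8_t> build_com_binlog_dump(uint32_t pos, uint16_t flags, uint32_t server_id,
                                           const std::string& file) {
  std::vector<uint8_t> out{kComBinlogDump};
  put_le(out, pos, 4);
  put_le(out, flags, 2);
  put_le(out, server_id, 4);
  out.insert(out.end(), file.begin(), file.end());  // rest of packet, no terminator
  return out;
}

// What the Master does once everything currently in the binlog has been sent.
const char* at_end_of_binlog(uint16_t flags) {
  return (flags & kBinlogDumpNonBlock) ? "send EOF_Packet" : "block until new events arrive";
}
```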

bool com_binlog_dump_gtid(THD *thd, char *packet, size_t packet_length)
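{
  DBUG_ENTER("com_binlog_dump_gtid");
  ushort flags= 0;
  uint32 data_size= 0;
  uint64 pos= 0;
  char name[FN_REFLEN + 1];
  uint32 name_size= 0;
  char* gtid_string= NULL;
  const uchar* packet_position= (uchar *) packet;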
  size_t packet_bytes_todo= packet_length;
  Sid_map sid_map(NULL/*no sid_lock because this is a completely local object*/);
  Gtid_set slave_gtid_executed(&sid_map);

  thd->status_var.com_other++;
  thd->enable_slow_log= opt_log_slow_admin_statements;
  if (check_global_access(thd, REPL_SLAVE_ACL))
    DBUG_RETURN(false);

  // Parse COM_BINLOG_DUMP_GTID, see https://dev.mysql.com/doc/internals/en/com-binlog-dump-gtid.html
  READ_INT(flags,2);
  READ_INT(thd->server_id, 4);
  READ_INT(name_size, 4);
  READ_STRING(name, name_size, sizeof(name));
  READ_INT(pos, 8);
  DBUG_PRINT("info", ("pos=%llu flags=%d server_id=%d", pos, flags, thd->server_id));
  READ_INT(data_size, 4);
  CHECK_PACKET_SIZE(data_size);
  // Decode the packet's GTID data into the intervals of slave_gtid_executed
  if (slave_gtid_executed.add_gtid_encoding(packet_position, data_size) !=
      RETURN_STATUS_OK)
    DBUG_RETURN(true);
  slave_gtid_executed.to_string(&gtid_string);  // render the set as text in gtid_string
  // Sample trace: T@2: | | | info: Slave 1828716545 requested to read  at position 4 gtid set '075ca916-e025-11e9-bde7-bd71fea5404f:1'.
  DBUG_PRINT("info", ("Slave %d requested to read %s at position %llu gtid set "
                      "'%s'.", thd->server_id, name, pos, gtid_string));

  kill_zombie_dump_threads(thd);
  query_logger.general_log_print(thd, thd->get_command(),
                                 "Log: '%s' Pos: %llu GTIDs: '%s'",
                                 name, pos, gtid_string);
  my_free(gtid_string);
  mysql_binlog_send(thd, name, (my_off_t) pos, &slave_gtid_executed, flags);

  unregister_slave(thd, true, true/*need_lock_slave_list=true*/);
  /*  fake COM_QUIT -- if we get here, the thread needs to terminate */
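  DBUG_RETURN(true);

error_malformed_packet:  // jump target of READ_INT/READ_STRING/CHECK_PACKET_SIZE
  my_error(ER_MALFORMED_PACKET, MYF(0));
  DBUG_RETURN(true);
}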
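The Slave side of this exchange can be sketched by writing the fields in exactly the order the READ_INT/READ_STRING calls above consume them, preceded by the command byte 0x1e. This sketch assumes the GTID data uses the encoding that add_gtid_encoding() decodes: an 8-byte number of UUIDs, then for each UUID its 16 raw bytes, an 8-byte interval count and 8-byte [start, end) pairs, all little-endian. Function names are hypothetical and the encoder handles a single UUID only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

static void put_le(std::vector<uint8_t>& out, uint64_t v, int bytes) {
  for (int k = 0; k < bytes; ++k) out.push_back(uint8_t(v >> (8 * k)));
}

// "075ca916-e025-11e9-bde7-bd71fea5404f" -> 16 raw bytes (minimal hex parsing).
static std::vector<uint8_t> uuid_bytes(const std::string& uuid) {
  std::vector<uint8_t> out;
  int hi = -1;
  for (char c : uuid) {
    if (c == '-') continue;
    int d = (c >= '0' && c <= '9') ? c - '0' : (c | 0x20) - 'a' + 10;
    if (hi < 0) hi = d; else { out.push_back(uint8_t(hi << 4 | d)); hi = -1; }
  }
  return out;
}

// Encodes a GTID set with one source UUID; intervals are [start, end),
// so "uuid:1" is {1, 2} and "uuid:1-5" is {1, 6}.
std::vector<uint8_t> encode_gtid_set(const std::string& uuid,
                                     const std::vector<std::pair<int64_t, int64_t>>& intervals) {
  std::vector<uint8_t> out;
  put_le(out, 1, 8);                                  // number of UUIDs
  std::vector<uint8_t> u = uuid_bytes(uuid);
  out.insert(out.end(), u.begin(), u.end());
  put_le(out, intervals.size(), 8);
  for (const auto& iv : intervals) {
    put_le(out, uint64_t(iv.first), 8);
    put_le(out, uint64_t(iv.second), 8);
  }
  return out;
}

// COM_BINLOG_DUMP_GTID payload in the order com_binlog_dump_gtid() reads it.
std::vector<uint8_t> build_com_binlog_dump_gtid(uint16_t flags, uint32_t server_id,
                                                const std::string& file, uint64_t pos,
                                                const std::vector<uint8_t>& gtid_data) {
  std::vector<uint8_t> out{0x1e};
  put_le(out, flags, 2);
  put_le(out, server_id, 4);
  put_le(out, file.size(), 4);
  out.insert(out.end(), file.begin(), file.end());
  put_le(out, pos, 8);
  put_le(out, gtid_data.size(), 4);
  out.insert(out.end(), gtid_data.begin(), gtid_data.end());
  return out;
}
```

With the values from the debug trace above (server id 1828716545, empty file name, position 4, GTID set 075ca916-e025-11e9-bde7-bd71fea5404f:1), the GTID data is 48 bytes long.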

In the DUMP_GTID command, slave_gtid_executed is the set of transactions the Slave has already executed; mysql_binlog_send uses this set to decide where to start sending the binlog.
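The rule behind slave_gtid_executed can be sketched as plain set membership: a transaction is sent only if the Slave's executed set does not contain its GTID. The types below are simplified, hypothetical stand-ins for Gtid_set; the real sender also uses the set to choose the first binlog file, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified GTID set: source UUID -> [start, end) intervals.
using GtidSet = std::map<std::string, std::vector<std::pair<int64_t, int64_t>>>;

struct Gtid {
  std::string uuid;
  int64_t gno;
};

bool contains(const GtidSet& set, const Gtid& g) {
  auto it = set.find(g.uuid);
  if (it == set.end()) return false;
  for (const auto& iv : it->second)
    if (g.gno >= iv.first && g.gno < iv.second) return true;
  return false;
}

// Transactions (in binlog order) that the Slave has not executed yet.
std::vector<Gtid> transactions_to_send(const std::vector<Gtid>& binlog, const GtidSet& executed) {
  std::vector<Gtid> out;
  for (const auto& g : binlog)
    if (!contains(executed, g)) out.push_back(g);
  return out;
}
```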

Sending the Log

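The sending logic lives in the Binlog_sender class, whose run() method executes in the dump thread that received the command. It proceeds as follows:

  1. Check that slave_gtid_executed is valid and locate the first file to send;
  2. Send a fake ROTATE_EVENT and open the first file;
  3. Send each file in turn.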
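void Binlog_sender::run()
{
  File file= -1;
  IO_CACHE log_cache;
  my_off_t start_pos= m_start_pos;
  const char *log_file= m_linfo.log_file_name;

  init();  // checks slave_gtid_executed and locates the first file to send
  while (!has_error() && !m_thd->killed)
  {
    if (unlikely(fake_rotate_event(log_file, start_pos)))
      break;

    file= open_binlog_file(&log_cache, log_file, &m_errmsg);  // open the file by name
    if (unlikely(file < 0))
    {
      set_fatal_error(m_errmsg);
      break;
    }
    // Send one file; returns 0 once it has been read to the end (log_pos == end_pos)
    if (send_binlog(&log_cache, start_pos))
      break;
    /* Will go to next file, need to copy log file name */
    set_last_file(log_file);
    int error= mysql_bin_log.find_next_log(&m_linfo, 0);  // locate the next file
    if (unlikely(error))
    {
      set_fatal_error("could not find next log");
      break;
    }
    start_pos= BIN_LOG_HEADER_SIZE;  // later files start right after the magic
    end_io_cache(&log_cache);
    mysql_file_close(file, MYF(MY_WME));
  }
}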
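The control flow of run() can be mimicked with in-memory files. In this hypothetical sketch each binlog file is a list of event names, start_event stands in for start_pos (an event index rather than a byte offset), and the loop simply ends after the last file, whereas the real sender keeps waiting on the active file unless BINLOG_DUMP_NON_BLOCK is set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory binlog file: its name and the events it contains.
struct BinlogFile {
  std::string name;
  std::vector<std::string> events;
};

// For each file from first_file on: emit a fake ROTATE_EVENT naming the file,
// then its events. Only the first file honours start_event.
std::vector<std::string> send_all(const std::vector<BinlogFile>& index,
                                  std::size_t first_file, std::size_t start_event) {
  std::vector<std::string> sent;
  for (std::size_t f = first_file; f < index.size(); ++f) {
    sent.push_back("fake ROTATE_EVENT -> " + index[f].name);
    for (std::size_t e = start_event; e < index[f].events.size(); ++e)
      sent.push_back(index[f].events[e]);
    start_event = 0;  // later files are read from their beginning
  }
  return sent;
}
```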
