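Binlog Replication Architecture
Once binlog is enabled on the Master, every write operation is recorded in the binlog, and a Slave synchronizes the binlog by sending a dump command to the Master.
Binlog File Structure
A binlog file records one event for every write operation. The description below is based on the log format of MySQL 5.0.0 and later, which corresponds to binlog version 4; this version introduced the FORMAT_DESCRIPTION_EVENT. Every binlog file starts with a fixed 4-byte header: [ fe 'bin' ].
The first event is a FORMAT_DESCRIPTION_EVENT, which describes how the other events are laid out; the Slave uses it when parsing the events that follow.
The last event is a ROTATE_EVENT, which records information about the next binlog file.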
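The file layout above can be sketched from the reader's side. The snippet below is a minimal illustration, not MySQL's own code: it checks the 4-byte magic header and decodes the 19-byte common event header used by binlog version 4 (timestamp, type code, server id, event size, next position, flags, all little-endian). The names EventHeader, has_binlog_magic and parse_event_header are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative names only; these are not MySQL's own types.
const unsigned char kBinlogMagic[4] = {0xfe, 'b', 'i', 'n'};
const std::size_t kEventHeaderLen = 19;  // common event header size in binlog v4

struct EventHeader {
  std::uint32_t timestamp;   // seconds since the epoch
  std::uint8_t type_code;    // 15 = FORMAT_DESCRIPTION_EVENT, 4 = ROTATE_EVENT
  std::uint32_t server_id;
  std::uint32_t event_size;  // header plus body, in bytes
  std::uint32_t next_pos;    // file offset of the next event
  std::uint16_t flags;
};

// Read an n-byte little-endian unsigned integer (n <= 8).
inline std::uint64_t read_le(const unsigned char* p, std::size_t n) {
  assert(n <= 8);
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i) {
    v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
  }
  return v;
}

// True if the buffer starts with the fixed header [ fe 'bin' ].
inline bool has_binlog_magic(const std::vector<unsigned char>& file) {
  return file.size() >= 4 && std::memcmp(file.data(), kBinlogMagic, 4) == 0;
}

// Decode the event header starting at `offset`; false if the buffer is too short.
inline bool parse_event_header(const std::vector<unsigned char>& file,
                               std::size_t offset, EventHeader* out) {
  if (offset > file.size() || file.size() - offset < kEventHeaderLen) {
    return false;
  }
  const unsigned char* p = file.data() + offset;
  out->timestamp = static_cast<std::uint32_t>(read_le(p, 4));
  out->type_code = static_cast<std::uint8_t>(read_le(p + 4, 1));
  out->server_id = static_cast<std::uint32_t>(read_le(p + 5, 4));
  out->event_size = static_cast<std::uint32_t>(read_le(p + 9, 4));
  out->next_pos = static_cast<std::uint32_t>(read_le(p + 13, 4));
  out->flags = static_cast<std::uint16_t>(read_le(p + 17, 2));
  return true;
}
```

Calling parse_event_header at offset 4 of a real binlog file should yield the FORMAT_DESCRIPTION_EVENT header; following next_pos from one header to the next walks the file event by event.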
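Event Types
The events used by row-based replication include:
- TABLE_MAP_EVENT
- ROWS_EVENT
- DELETE_ROWS_EVENTv2
- UPDATE_ROWS_EVENTv2
- WRITE_ROWS_EVENTv2
Every insert, update and delete is preceded by a TABLE_MAP_EVENT that describes the table the operation applies to; the two consecutive events are linked through table_id.
table_id and table name are not in one-to-one correspondence. The only purpose of table_id is to link a TABLE_MAP_EVENT to its ROWS_EVENT in the row-based replication protocol.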
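A consumer of the event stream can model this linkage with a small lookup table. The sketch below is only an illustration under that assumption; TableInfo and TableMapCache are hypothetical names, not part of MySQL. Each TABLE_MAP_EVENT stores the mapping for its table_id, and the ROWS_EVENT that follows looks the table up by the same id.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical types for illustration.
struct TableInfo {
  std::string schema;
  std::string table;
};

class TableMapCache {
 public:
  // Called for every TABLE_MAP_EVENT: remember (or overwrite) the mapping.
  void on_table_map(std::uint64_t table_id, TableInfo info) {
    tables_[table_id] = std::move(info);
  }

  // Called for every ROWS_EVENT: resolve the table the rows belong to.
  // Returns nullptr if no TABLE_MAP_EVENT with this id has been seen.
  const TableInfo* find(std::uint64_t table_id) const {
    std::unordered_map<std::uint64_t, TableInfo>::const_iterator it =
        tables_.find(table_id);
    return it == tables_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::uint64_t, TableInfo> tables_;
};
```

Because the id is only a link between two adjacent events, the same table may show up under different ids over time, and the cache simply keeps whatever mapping the most recent TABLE_MAP_EVENT announced.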
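Handshake Protocol
MySQL provides two ways to replicate the binlog: by log file name and position, and by GTID. They correspond to the COM_BINLOG_DUMP and COM_BINLOG_DUMP_GTID commands respectively.
Before the dump command is sent, the Master has to authenticate the Slave.
- When the Slave connects to the Master, the Master sends a handshake packet to start authentication;
- After receiving the handshake packet, the Slave sends back an authentication packet carrying the user name and password as credentials;
- The Master verifies the user name and password. If authentication succeeds it sends an OK_Packet, otherwise an ERR_Packet.
The handshake packet is built as follows: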
Packet format:
Bytes  Content
-----  ----
1      protocol version (always 10)
n      server version string, \0-terminated
4      thread id
8      first 8 bytes of the plugin provided data (scramble)
1      \0 byte, terminating the first part of a scramble
2      server capabilities (two lower bytes)
1      server character set
2      server status
2      server capabilities (two upper bytes)
1      length of the scramble
10     reserved, always 0
n      rest of the plugin provided data (at least 12 bytes)
1      \0 byte, terminating the second part of a scramble
static bool send_server_handshake_packet(MPVIO_EXT *mpvio,
                                         const char *data, uint data_len)
{
  Protocol_classic *protocol= mpvio->protocol;
  char *buff= (char *) my_alloca(1 + SERVER_VERSION_LENGTH + data_len + 64);
  char scramble_buf[SCRAMBLE_LENGTH];
  char *end= buff;
  DBUG_ENTER("send_server_handshake_packet");

  /* 1 byte: protocol version */
  *end++= protocol_version;
  protocol->set_client_capabilities(CLIENT_BASIC_FLAGS);

  /* keep a copy of the plugin data so it can be reused later */
  if (data_len)
  {
    mpvio->cached_server_packet.pkt= (char*) memdup_root(mpvio->mem_root,
                                                         data, data_len);
    mpvio->cached_server_packet.pkt_len= data_len;
  }

  if (data_len < SCRAMBLE_LENGTH)
  {
    if (data_len)
    {
      /*
        the first packet *must* have at least 20 bytes of a scramble.
        if a plugin provided less, we pad it to 20 with zeros
      */
      memcpy(scramble_buf, data, data_len);
      memset(scramble_buf + data_len, 0, SCRAMBLE_LENGTH - data_len);
      data= scramble_buf;
    }
    else
    {
      /* the plugin provided no data: generate a random scramble */
      generate_user_salt(mpvio->scramble, SCRAMBLE_LENGTH + 1);
      data= mpvio->scramble;
    }
    data_len= SCRAMBLE_LENGTH;
  }

  /* server version string, \0-terminated */
  end= my_stpnmov(end, server_version, SERVER_VERSION_LENGTH) + 1;

  /* 4 bytes: thread id */
  DBUG_ASSERT(sizeof(my_thread_id) == 4);
  int4store((uchar*) end, mpvio->thread_id);
  end+= 4;

  /*
    (abridged in this excerpt: the first 8 scramble bytes, their \0
    terminator and the two lower capability bytes are written at this
    point, so end[0..1] below hold the lower capability bytes)
  */

  /* write server characteristics: up to 16 bytes allowed */
  end[2]= (char) default_charset_info->number;
  int2store(end + 3, mpvio->server_status[0]);
  int2store(end + 5, protocol->get_client_capabilities() >> 16);
  end[7]= data_len;
  DBUG_EXECUTE_IF("poison_srv_handshake_scramble_len", end[7]= -100;);
  memset(end + 8, 0, 10);
  end+= 18;

  /* write scramble tail */
  end= (char*) memcpy(end, data + AUTH_PLUGIN_DATA_PART_1_LENGTH,
                      data_len - AUTH_PLUGIN_DATA_PART_1_LENGTH);
  end+= data_len - AUTH_PLUGIN_DATA_PART_1_LENGTH;

  /* name of the authentication plugin, \0-terminated */
  end= strmake(end, plugin_name(mpvio->plugin)->str,
               plugin_name(mpvio->plugin)->length);

  int res= protocol->write((uchar*) buff, (size_t) (end - buff + 1)) ||
           protocol->flush_net();
  my_afree(buff);
  DBUG_RETURN(res);
}
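To see the packet from the Slave's side, the following sketch decodes a handshake payload according to the layout listed above. It is a simplified illustration with several assumptions: the names Handshake, Reader and parse_handshake are invented here, the 4-byte MySQL packet header (length and sequence number) is assumed to be stripped already, the server is assumed to announce an authentication plugin name at the end, and the second scramble part is read as max(13, scramble length - 8) bytes, which is how the client/server protocol documentation describes it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type for illustration.
struct Handshake {
  std::uint8_t protocol_version;
  std::string server_version;
  std::uint32_t thread_id;
  std::uint32_t capabilities;  // lower and upper halves combined
  std::uint8_t charset;
  std::uint16_t status;
  std::string scramble;        // both scramble parts, without the trailing \0
  std::string auth_plugin_name;
};

// Bounds-checked cursor over the payload; any short read clears ok().
class Reader {
 public:
  explicit Reader(const std::vector<unsigned char>& buf)
      : buf_(buf), pos_(0), ok_(true) {}

  std::uint64_t le(std::size_t n) {  // n-byte little-endian integer
    assert(n <= 8);
    if (!need(n)) return 0;
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i)
      v |= static_cast<std::uint64_t>(buf_[pos_ + i]) << (8 * i);
    pos_ += n;
    return v;
  }
  std::string bytes(std::size_t n) {
    if (!need(n)) return std::string();
    std::string s(buf_.begin() + pos_, buf_.begin() + pos_ + n);
    pos_ += n;
    return s;
  }
  // Read up to the next \0 and consume it. With allow_eof, a missing
  // terminator is tolerated and the rest of the buffer is returned.
  std::string cstr(bool allow_eof) {
    if (!ok_) return std::string();
    std::size_t stop = pos_;
    while (stop < buf_.size() && buf_[stop] != 0) ++stop;
    if (stop == buf_.size() && !allow_eof) { ok_ = false; return std::string(); }
    std::string s(buf_.begin() + pos_, buf_.begin() + stop);
    pos_ = (stop < buf_.size()) ? stop + 1 : stop;
    return s;
  }
  void skip(std::size_t n) { if (need(n)) pos_ += n; }
  bool ok() const { return ok_; }

 private:
  bool need(std::size_t n) {
    if (!ok_ || buf_.size() - pos_ < n) { ok_ = false; return false; }
    return true;
  }
  const std::vector<unsigned char>& buf_;
  std::size_t pos_;
  bool ok_;
};

inline bool parse_handshake(const std::vector<unsigned char>& pkt, Handshake* out) {
  Reader r(pkt);
  out->protocol_version = static_cast<std::uint8_t>(r.le(1));
  out->server_version = r.cstr(false);
  out->thread_id = static_cast<std::uint32_t>(r.le(4));
  std::string scramble = r.bytes(8);  // first scramble part
  r.skip(1);                          // \0 after the first part
  std::uint32_t caps = static_cast<std::uint32_t>(r.le(2));
  out->charset = static_cast<std::uint8_t>(r.le(1));
  out->status = static_cast<std::uint16_t>(r.le(2));
  caps |= static_cast<std::uint32_t>(r.le(2)) << 16;
  out->capabilities = caps;
  std::size_t scramble_len = static_cast<std::size_t>(r.le(1));
  r.skip(10);                         // reserved, always 0
  std::size_t tail =
      std::max<std::size_t>(13, scramble_len > 8 ? scramble_len - 8 : 0);
  scramble += r.bytes(tail);          // second scramble part plus its \0
  if (!scramble.empty() && scramble[scramble.size() - 1] == '\0')
    scramble.erase(scramble.size() - 1);
  out->scramble = scramble;
  out->auth_plugin_name = r.cstr(true);
  return r.ok() && out->protocol_version == 10;
}
```

The Slave then combines the 20-byte scramble with its password to build the authentication packet it sends back.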
After receiving the authentication packet, the Master parses the user name and password out of it and verifies them.
static size_t parse_client_handshake_packet(MPVIO_EXT *mpvio,
                                            uchar **buff, size_t pkt_len)
{
  /*
    (excerpt: end and bytes_remaining_in_packet are set up earlier in the
    full function)
  */

  /* user name */
  size_t user_len;
  char *user= get_string(&end, &bytes_remaining_in_packet, &user_len);

  /* password (authentication response), read as a length-prefixed string */
  size_t passwd_len= 0;
  char *passwd= NULL;
  passwd= get_length_encoded_string(&end, &bytes_remaining_in_packet,
                                    &passwd_len);
  if (passwd_len)
    mpvio->auth_info.password_used= PASSWORD_USED_YES;
}
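The two string readers used above can be approximated in a few lines. The sketch below is not the server's implementation; Cursor, read_cstring, read_lenenc_int and read_lenenc_string are hypothetical names. It assumes the user name is a \0-terminated string and that the client sent the authentication response as a length-encoded string, where the length uses MySQL's length-encoded integer format (a first byte below 0xfb is the value itself; 0xfc, 0xfd and 0xfe announce a 2-, 3- or 8-byte little-endian value).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical cursor type: current read pointer plus bytes left.
struct Cursor {
  const unsigned char* p;
  std::size_t remaining;
};

// Read a \0-terminated string and consume the terminator.
inline bool read_cstring(Cursor* c, std::string* out) {
  assert(c != nullptr && out != nullptr);
  std::size_t n = 0;
  while (n < c->remaining && c->p[n] != 0) ++n;
  if (n == c->remaining) return false;  // terminator missing
  out->assign(reinterpret_cast<const char*>(c->p), n);
  c->p += n + 1;
  c->remaining -= n + 1;
  return true;
}

// Read a length-encoded integer.
inline bool read_lenenc_int(Cursor* c, std::uint64_t* out) {
  assert(c != nullptr && out != nullptr);
  if (c->remaining < 1) return false;
  const unsigned char first = c->p[0];
  if (first < 0xfb) {  // one-byte value
    *out = first;
    c->p += 1;
    c->remaining -= 1;
    return true;
  }
  std::size_t width = 0;
  if (first == 0xfc) width = 2;
  else if (first == 0xfd) width = 3;
  else if (first == 0xfe) width = 8;
  else return false;  // 0xfb (NULL) and 0xff are not lengths
  if (c->remaining - 1 < width) return false;
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < width; ++i)
    v |= static_cast<std::uint64_t>(c->p[1 + i]) << (8 * i);
  *out = v;
  c->p += 1 + width;
  c->remaining -= 1 + width;
  return true;
}

// Read a length-encoded string; the cursor is left untouched on failure.
inline bool read_lenenc_string(Cursor* c, std::string* out) {
  assert(c != nullptr && out != nullptr);
  const Cursor saved = *c;
  std::uint64_t len = 0;
  if (!read_lenenc_int(c, &len) || len > c->remaining) {
    *c = saved;
    return false;
  }
  const std::size_t n = static_cast<std::size_t>(len);
  out->assign(reinterpret_cast<const char*>(c->p), n);
  c->p += n;
  c->remaining -= n;
  return true;
}
```

Reading the user name with read_cstring and then the password field with read_lenenc_string mirrors the order of the two calls in the excerpt.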
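Dump Command Parsing
A COM_BINLOG_DUMP command must be preceded by a COM_REGISTER_SLAVE command that registers the Slave.
For a COM_BINLOG_DUMP_GTID command, the Master uses the gtidset field of the command to locate the position from which to start sending the log.
After receiving the command, the Master checks whether BINLOG_DUMP_NON_BLOCK is set in the command's flags field. If the flag is set, the Master returns an EOF_Packet once all existing binlog events have been sent; if it is not set, the connection keeps blocking and waits for the next event.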
bool com_binlog_dump_gtid(THD *thd, char *packet, size_t packet_length)
{
  const uchar* packet_position= (uchar *) packet;
  size_t packet_bytes_todo= packet_length;
  Sid_map sid_map(NULL/*no sid_lock because this is a completely local object*/);
  Gtid_set slave_gtid_executed(&sid_map);

  thd->status_var.com_other++;
  thd->enable_slow_log= opt_log_slow_admin_statements;
  if (check_global_access(thd, REPL_SLAVE_ACL))
    DBUG_RETURN(false);

  // Parse COM_BINLOG_DUMP_GTID, see
  // https://dev.mysql.com/doc/internals/en/com-binlog-dump-gtid.html
  READ_INT(flags,2);
  READ_INT(thd->server_id, 4);
  READ_INT(name_size, 4);
  READ_STRING(name, name_size, sizeof(name));
  READ_INT(pos, 8);
  DBUG_PRINT("info", ("pos=%llu flags=%d server_id=%d", pos, flags, thd->server_id));
  READ_INT(data_size, 4);
  CHECK_PACKET_SIZE(data_size);
  // Decode the GTID data in the packet into the intervals of slave_gtid_executed
  if (slave_gtid_executed.add_gtid_encoding(packet_position, data_size) !=
      RETURN_STATUS_OK)
    DBUG_RETURN(true);
  slave_gtid_executed.to_string(&gtid_string); // render the set as gtid_string
  // Sample trace output:
  // T@2: | | | info: Slave 1828716545 requested to read at position 4 gtid set '075ca916-e025-11e9-bde7-bd71fea5404f:1'.
  DBUG_PRINT("info", ("Slave %d requested to read %s at position %llu gtid set "
                      "'%s'.", thd->server_id, name, pos, gtid_string));

  kill_zombie_dump_threads(thd);
  query_logger.general_log_print(thd, thd->get_command(),
                                 "Log: '%s' Pos: %llu GTIDs: '%s'",
                                 name, pos, gtid_string);
  my_free(gtid_string);
  mysql_binlog_send(thd, name, (my_off_t) pos, &slave_gtid_executed, flags);

  unregister_slave(thd, true, true/*need_lock_slave_list=true*/);
  /* fake COM_QUIT -- if we get here, the thread needs to terminate */
  DBUG_RETURN(true);
}
In the DUMP_GTID command, slave_gtid_executed is the set of GTIDs (events) the Slave has already executed; mysql_binlog_send uses this set to determine where to start sending the binlog.
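From the Slave's side, the payload parsed by the READ_INT / READ_STRING sequence above could be assembled as in the following sketch. It is an illustration under stated assumptions rather than a reference client: build_dump_gtid, encode_gtid_set and put_le are hypothetical helpers, the 4-byte MySQL packet header is left out, and the GTID data is encoded as described in the COM_BINLOG_DUMP_GTID documentation linked in the code (number of SIDs, then per SID the 16-byte UUID, the number of intervals, and for each interval a start and an exclusive end, all 8-byte little-endian). The command byte 0x1e and the flag value 0x01 for BINLOG_DUMP_NON_BLOCK come from the same protocol documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;

const unsigned char kComBinlogDumpGtid = 0x1e;   // command byte
const std::uint16_t kBinlogDumpNonBlock = 0x01;  // flags bit

// Append v as an n-byte little-endian integer (n <= 8).
inline void put_le(Bytes* out, std::uint64_t v, std::size_t n) {
  assert(n <= 8);
  for (std::size_t i = 0; i < n; ++i)
    out->push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xff));
}

// Encode a GTID set with a single SID and a single interval first..last
// (inclusive). `sid` must point to the 16 raw bytes of the server UUID.
inline Bytes encode_gtid_set(const unsigned char* sid, std::uint64_t first,
                             std::uint64_t last) {
  Bytes data;
  put_le(&data, 1, 8);                     // number of SIDs
  data.insert(data.end(), sid, sid + 16);  // SID (UUID bytes)
  put_le(&data, 1, 8);                     // number of intervals
  put_le(&data, first, 8);                 // interval start
  put_le(&data, last + 1, 8);              // interval end, exclusive
  return data;
}

// Build the command payload in the order the server reads it:
// flags, server_id, name_size, name, pos, data_size, data.
inline Bytes build_dump_gtid(std::uint16_t flags, std::uint32_t server_id,
                             const std::string& binlog_name, std::uint64_t pos,
                             const Bytes& gtid_data) {
  Bytes pkt;
  pkt.push_back(kComBinlogDumpGtid);
  put_le(&pkt, flags, 2);
  put_le(&pkt, server_id, 4);
  put_le(&pkt, binlog_name.size(), 4);
  pkt.insert(pkt.end(), binlog_name.begin(), binlog_name.end());
  put_le(&pkt, pos, 8);
  put_le(&pkt, gtid_data.size(), 4);
  pkt.insert(pkt.end(), gtid_data.begin(), gtid_data.end());
  return pkt;
}
```

Passing kBinlogDumpNonBlock as flags asks the Master to finish with an EOF_Packet once it has sent everything it has; passing 0 keeps the stream open for new events.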
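Sending the Log
Log sending is handled by Binlog_sender, in a dedicated dump thread. The logic is as follows:
- Validate slave_gtid_executed and locate the name of the first file to send;
- Send a fake rotate_event and open the first file to send;
- Send each file in turn.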
void run()
{
  init();
  while (!has_error() && !m_thd->killed)
  {
    if (unlikely(fake_rotate_event(log_file, start_pos)))
      break;
    // Open the file by name
    file= open_binlog_file(&log_cache, log_file, &m_errmsg);
    // Send one file. A return value of 0 means the file was read to the end
    // (log_pos == end_pos), so continue with the next file
    if (send_binlog(&log_cache, start_pos))
      break;
    /* Will go to next file, need to copy log file name */
    set_last_file(log_file);
    // Locate the next file
    int error= mysql_bin_log.find_next_log(&m_linfo, 0);
  }