

Primary key conflict: Slave SQL: Error 'Duplicate entry '5218036-4f8c3555-1e5f-4c8e-bed7-9c78d6ab8725' for key 'PRIMARY'' on query. Default database: .....



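The master inserts a row and writes it to the binlog. The slave's I/O thread fetches this event from the master and writes it to the relay log. The slave's SQL thread then parses the event and applies it on the slave, which completes the replication of that row. The files involved in this process are: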


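relay-log.info: stores the master binlog file and position that the slave SQL thread has executed up to, together with the corresponding relay log file and position. This file is updated by the SQL thread.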

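master.info: stores the master binlog file and position that the slave has fetched up to, plus other replication settings such as master host, port, user and password. This file is updated by the I/O thread.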
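To make the division of labour between the two threads concrete, here is a minimal Python sketch of the flow described above. It is a toy model, not MySQL code: `Master`, `Slave`, `io_thread_step` and `sql_thread_step` are hypothetical names, every binlog event is assumed to be a single-row INSERT, and a "position" is just a list index standing in for the real file name plus offset.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    binlog: list = field(default_factory=list)  # events in commit order

    def insert(self, pk, value):
        # Committing on the master appends an event to the binlog.
        self.binlog.append(("INSERT", pk, value))


@dataclass
class Slave:
    relay_log: list = field(default_factory=list)
    table: dict = field(default_factory=dict)
    master_info_pos: int = 0     # next master binlog event the I/O thread will fetch
    relay_log_info_pos: int = 0  # next relay-log event the SQL thread will apply

    def io_thread_step(self, master):
        """Copy new binlog events into the relay log and advance master.info."""
        new_events = master.binlog[self.master_info_pos:]
        self.relay_log.extend(new_events)
        self.master_info_pos += len(new_events)

    def sql_thread_step(self):
        """Apply pending relay-log events and advance relay-log.info."""
        while self.relay_log_info_pos < len(self.relay_log):
            _op, pk, value = self.relay_log[self.relay_log_info_pos]
            self.table[pk] = value
            self.relay_log_info_pos += 1
```

The two positions advance independently, which is exactly why they live in two separate files updated by two separate threads.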


# sync_relay_log_info

If the value of this variable is greater than 0, a replication slave synchronizes its relay-log.info file to disk (using fdatasync()) after every sync_relay_log_info transactions. A value of 1 is generally the best choice.

The default value of sync_relay_log_info is 0, which does not force any synchronization to disk by the MySQL server; in this case, the server relies on the operating system to flush the relay-log.info file's contents from time to time as for any other file.
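The effect of sync_relay_log_info can be sketched as a writer that rewrites a position file after every transaction but only forces it to disk every N transactions. This is a hypothetical illustration: the `InfoFileWriter` class and the `mysql-bin.000001` file name are made up, and the two-line file layout is an assumption, not the real relay-log.info format.

```python
import os
import tempfile

# fdatasync() is what the MySQL docs mention; fall back to fsync() on
# platforms (e.g. macOS) where os.fdatasync is not available.
_sync = getattr(os, "fdatasync", os.fsync)


class InfoFileWriter:
    """Hypothetical writer: rewrites a position file after every commit and
    forces it to disk only every `sync_every` commits (0 = never)."""

    def __init__(self, path, sync_every):
        self.path = path
        self.sync_every = sync_every
        self.commits = 0
        self.syncs = 0

    def record_commit(self, log_file, log_pos):
        self.commits += 1
        with open(self.path, "w") as f:
            f.write(f"{log_file}\n{log_pos}\n")
            f.flush()  # hand the bytes to the OS page cache only
            if self.sync_every > 0 and self.commits % self.sync_every == 0:
                _sync(f.fileno())  # force the cached contents to disk
                self.syncs += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        w = InfoFileWriter(os.path.join(d, "relay-log.info"), sync_every=2)
        for pos in (107, 230, 353):
            w.record_commit("mysql-bin.000001", pos)
        print(w.commits, w.syncs)  # 3 commits, 1 explicit sync
```

Between two syncs, the newest position exists only in the page cache; a power loss in that window leaves an older position on disk.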

# sync_master_info

If the value of this variable is greater than 0, a replication slave synchronizes its master.info file to disk (using fdatasync()) after every sync_master_info events.

The default value of sync_master_info is 0 (recommended in most situations), which does not force any synchronization to disk by the MySQL server; in this case, the server relies on the operating system to flush the master.info file's contents from time to time as for any other file.
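The same reasoning applies to master.info. The toy function below, a hypothetical helper rather than anything in MySQL, computes which I/O-thread position would survive a power loss for a given sync_master_info value. It assumes the position advances by one per fetched event and models an opportunistic OS flush with a single optional parameter.

```python
def durable_master_info_pos(events_fetched, sync_every, os_flushed_upto=0):
    """Return the master.info position that survives a power loss.

    Simplified model (an assumption): master.info is updated after every
    fetched event, but an update only reaches disk if fdatasync() follows it
    (every `sync_every` events, 0 = never) or if the OS happened to flush it
    (up to `os_flushed_upto`).
    """
    synced = 0
    if sync_every > 0:
        # Last explicitly synced position: largest multiple of sync_every.
        synced = (events_fetched // sync_every) * sync_every
    return max(synced, min(os_flushed_upto, events_fetched))


if __name__ == "__main__":
    fetched = 7
    for n in (0, 1, 3):
        lag = fetched - durable_master_info_pos(fetched, n)
        print(f"sync_master_info={n}: on-disk master.info lags by {lag} event(s)")
```

With the default of 0, how far the on-disk file lags depends entirely on when the OS last flushed it.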

# sync_relay_log

If the value of this variable is greater than 0, the MySQL server synchronizes its relay log to disk (using fdatasync()) after every sync_relay_log writes to the relay log.

There is one write to the relay log per statement if autocommit is enabled, and one write per transaction otherwise.

The default value of sync_relay_log is 0, which does no synchronizing to disk; in this case, the server relies on the operating system to flush the relay log's contents from time to time as for any other file.

A value of 1 is the safest choice because in the event of a crash you lose at most one statement or transaction from the relay log.

However, it is also the slowest choice (unless the disk has a battery-backed cache, which makes synchronization very fast).
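Using the write-count rule quoted above, a small sketch can estimate how many relay-log writes and explicit syncs a workload produces, and the worst-case number of writes lost at a crash. The function names and the workload format (a list of statement counts per unit of work) are assumptions for illustration. The worst case of N follows from syncing after every N writes; for N = 1 it matches the "at most one statement or transaction" stated above.

```python
def relay_log_writes(workload, autocommit):
    """Count relay-log writes: one per statement with autocommit enabled,
    one per transaction otherwise. `workload` lists how many statements
    each unit of work contains (hypothetical format)."""
    if autocommit:
        return sum(workload)  # every statement commits and is written on its own
    return len(workload)


def relay_log_syncs(writes, sync_relay_log):
    """Explicit fdatasync() calls when syncing after every `sync_relay_log`
    writes; 0 means the server never syncs explicitly."""
    if sync_relay_log <= 0:
        return 0
    return writes // sync_relay_log


def max_writes_lost(sync_relay_log):
    """Worst-case writes not yet on disk at a crash (None = unbounded,
    left to the operating system)."""
    return sync_relay_log if sync_relay_log > 0 else None
```

For three units of work with four statements each, autocommit produces 12 writes versus 3 without it, so the same sync_relay_log value costs four times as many syncs.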


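All of these default to 0, meaning they rely on the operating system to flush the files. On the slave that failed, all of these parameters were at their defaults. Relying on the OS to flush means updates can be lost: after an unexpected power failure, data still in the OS cache never reaches disk, so the slave's latest replication positions are never written to relay-log.info and master.info. The on-disk relay-log.info and master.info then lag behind the replication threads, and the positions stored on disk no longer match the replication progress that was actually made. This is why replication in MySQL 5.5 is described as not crash-safe. MySQL 5.6 can optionally store this replication information in tables (master_info_repository=TABLE and relay_log_info_repository=TABLE), which is described as crash-safe; this is worth verifying when time permits.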
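Why storing the positions in a table helps can be illustrated with an analogy: if the row change and the position update commit in the same transaction, a crash keeps either both or neither. The sketch below uses Python's built-in sqlite3 as a stand-in for a transactional engine; the `relay_log_info` table name and layout are hypothetical, and this is not how MySQL 5.6 is implemented internally.

```python
import sqlite3


def make_slave_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    # Hypothetical stand-in for a position table: a single row holding the position.
    db.execute("CREATE TABLE relay_log_info (id INTEGER PRIMARY KEY, pos INTEGER)")
    db.execute("INSERT INTO relay_log_info VALUES (1, 0)")
    db.commit()
    return db


def apply_event(db, pos, row, crash_before_commit=False):
    """Apply one event and advance the stored position in the SAME transaction.
    Returns True if the transaction committed."""
    db.execute("INSERT INTO t VALUES (?, ?)", row)  # implicitly opens a transaction
    db.execute("UPDATE relay_log_info SET pos = ? WHERE id = 1", (pos,))
    if crash_before_commit:
        db.rollback()  # stand-in for a crash: the open transaction is discarded
        return False
    db.commit()
    return True
```

After a simulated crash, the data and the stored position still agree, so restarting from that position re-applies the lost event exactly once instead of hitting a duplicate key.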



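With the points above in mind, it is easy to see why the instance reported a primary-key conflict after it restarted. After the power loss, the replication positions in master.info and relay-log.info had not been flushed to disk and lagged behind the slave's actual replication progress. When the instance came up, replication started automatically from the positions in these two stale files, which in effect replayed part of the relay log and caused the primary-key conflicts. Since this was a replay and the statements were all primary-key operations, we chose to skip the errors.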
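The restart behaviour can be reproduced in miniature: replaying from a stale position re-inserts rows that already exist and trips the primary key, and skipping those errors converges to the same data only because the replayed events are plain primary-key inserts. The sketch uses sqlite3, whose `IntegrityError` plays the role of MySQL's "Duplicate entry" error; the `replay` function and table `t` are hypothetical.

```python
import sqlite3


def replay(db, relay_log, start_pos, skip_duplicates):
    """Re-apply relay-log events from start_pos, as the SQL thread does on restart.

    Each event is a hypothetical (id, value) INSERT. Returns the positions of
    events that hit a duplicate-key error and were skipped.
    """
    skipped = []
    for pos in range(start_pos, len(relay_log)):
        try:
            with db:  # commits on success, rolls back if the INSERT fails
                db.execute("INSERT INTO t VALUES (?, ?)", relay_log[pos])
        except sqlite3.IntegrityError:
            if not skip_duplicates:
                raise  # replication would stop here with a duplicate-key error
            skipped.append(pos)
    return skipped


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    relay_log = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
    with db:
        # Before the power loss the SQL thread really applied events 0..2 ...
        db.executemany("INSERT INTO t VALUES (?, ?)", relay_log[:3])
    # ... but the stale relay-log.info on disk still points at position 1.
    print(replay(db, relay_log, start_pos=1, skip_duplicates=True))  # [1, 2]
```

Skipping is only safe here because re-inserting an identical row is a no-op in effect; replayed UPDATEs or DELETEs would not be caught by a key error and need more care.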

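To avoid this problem, set the parameters above to 1, so that relay-log.info and master.info are flushed after every slave event is processed. This way, at most the rows affected by a single event are at risk.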
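The "at most one event" bound can be checked with a brute-force sketch that tries every crash point. It assumes a simplified model (hypothetical, not MySQL's code) in which the SQL thread applies an event first and its position update only becomes durable at an explicit sync every `sync_every` events.

```python
def max_replayed_after_crash(num_events, sync_every):
    """Worst case, over every crash point, of how many already-applied
    events are replayed after restart.

    A crash right after applying event number `applied` (before its sync)
    leaves the durable position at the last synced event.
    """
    if sync_every <= 0:
        # Nothing is forced to disk; assume the OS never flushed before the crash.
        return num_events
    worst = 0
    for applied in range(1, num_events + 1):
        durable = ((applied - 1) // sync_every) * sync_every
        worst = max(worst, applied - durable)
    return worst
```

With `sync_every=1` the worst case is a single replayed event, while larger values allow up to `sync_every` events to be replayed.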


From the ITPUB blog, link: http://blog.itpub.net/22418990/viewspace-1309000/. Please credit the source when reposting; otherwise legal liability will be pursued.

