Advanced use of Global Transaction Identifiers
Details of Re-execution and Empty Transactions
In my previous post, we saw how GTIDs are generated and propagated, we described the new replication protocol, and we saw how these simple elements fit together to allow failover in a range of examples, from simple tree topologies to circular topologies to any topology you could possibly think about.
We will now dive deeper into how GTIDs work. In particular, we will see how the slave thread ensures that no transaction is executed more than once. As a side note, mysqlbinlog uses the same mechanism. We will introduce the concept of empty transactions and see how this allows us to suppress transactions in a safe manner.
The replication thread
Let us start by looking at the replication thread. As we remember from the previous post, the master stores GTIDs as events in the binary log, and the GTID event precedes the transaction. When the slave thread reads the GTID, it sets the session server variable gtid_next to that GTID. For example, if the slave thread reads 4d8b564f-03f4-4975-856a-0e65c3105328:4711, then it executes the following SQL statement:
SET GTID_NEXT = "4d8b564f-03f4-4975-856a-0e65c3105328:4711";
This tells the server to use 4d8b564f-03f4-4975-856a-0e65c3105328:4711 instead of generating a new identifier.
Mysqlbinlog
The same statement can be executed by any client, if the client has SUPER privileges (the reason for this seemingly very strict requirement will soon be explained).
The same mechanism is also used by mysqlbinlog. (If you don't know: mysqlbinlog is a command line utility, separate from the core server, that reads a binary log file and outputs the contents in text form, as a sequence of SQL statements. By piping this output to a client, you make that client act as the replication thread.) When mysqlbinlog reads a GTID event, it outputs a SET GTID_NEXT statement. Thus, the client that executes the output from mysqlbinlog will correctly re-execute not only the transactions but also the GTIDs.
Transactions Must Only Execute Once
Now, what if the transaction specified by GTID_NEXT has already been executed? We don't want the transaction to be executed more than once: first, it would likely take the server to an inconsistent state. Second, and more fundamentally, we can't have two transactions with the same GTID in the binary log; that would lead to other errors on the next failover.
Therefore, when the server executes SET GTID_NEXT, it checks if that transaction has already been executed (i.e., if the GTID is in @@GLOBAL.GTID_DONE):
- If not, then everything is fine and the server executes the transaction.
- But if the GTID is in @@GLOBAL.GTID_DONE, then the transaction is not executed by the server – this second attempt to execute the same transaction is completely ignored by the server and has no effect whatsoever.
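The check above can be sketched as a toy model in Python. This is only an illustration of the bookkeeping, not MySQL code: the class Server, its method execute and the example INSERT statement are hypothetical names made up for the sketch, and the set gtid_done merely stands in for @@GLOBAL.GTID_DONE.

```python
class Server:
    """Toy model of a server's GTID bookkeeping (illustrative only)."""

    def __init__(self):
        self.gtid_done = set()  # stands in for @@GLOBAL.GTID_DONE
        self.data = []          # effects of the transactions that really ran

    def execute(self, gtid_next, statements):
        """Model SET GTID_NEXT = gtid_next followed by one transaction.

        Returns True if the transaction ran, False if it was ignored.
        """
        if gtid_next in self.gtid_done:
            return False        # already executed: ignored, no effect
        self.data.extend(statements)
        self.gtid_done.add(gtid_next)
        return True


server = Server()
gtid = "4d8b564f-03f4-4975-856a-0e65c3105328:4711"
first = server.execute(gtid, ["INSERT INTO t VALUES (1)"])   # runs
second = server.execute(gtid, ["INSERT INTO t VALUES (1)"])  # ignored
```

In this model the second attempt leaves server.data untouched, which is the "no effect whatsoever" behaviour described in the list.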
Empty Transactions – Making the Slave Skip Transactions
The fact that the server skips already executed transactions is more than a mechanism to prevent disastrous mistakes. It is also an important tool that allows us to make filters GTID-safe, let the DBA skip a transaction in a safe manner, or start replicating from a specific point in the replication stream in a safe manner. (And by “safe”, I wish to emphasize that an “obvious” way to accomplish these things, which we have deliberately not implemented, would be highly unsafe, risking such nasty things as data corruption not immediately but at the next failover.)
Let us start with an example:
A is the master, B is the slave; A has executed three transactions and B has replicated all of them. Now suppose we want to attach another slave, C, to A. Also suppose that we do not want C to execute trx1 or trx2. There may be many reasons to skip the transactions: perhaps C is only supposed to hold a subset of the tables, and trx1 and trx2 touch a table that C does not have; or maybe trx1 was a mistake and trx2 is an “anti-transaction” that undoes trx1, and both are really huge so it would be more efficient to skip both; or trx1 and trx2 are in some other way unnecessary or unwanted on C.
It is tempting to think that we can just skip trx1 and trx2 and make C replicate starting from trx3, so that we would have the following situation:
But let us look at what would happen on a failover. Suppose A crashes and we wish to make B the new master and C a slave of B. As we remember from the previous blog post, we have a new replication protocol that makes failover possible: when C connects to B, C sends the range of identifiers it has and B sends all other transactions back to C. That is, C sends id3 and B sends back id1, trx1, id2, trx2, followed by any transactions committed after trx3. See: the skipped transactions come back and bite us on failover!
Not only can this corrupt the database (because the transactions are re-executed out of order on C). It corrupts the database silently – no error message – and in a failover situation where a server has just crashed and the DBA certainly has enough trouble already. Moreover, the problematic transactions may be really old – maybe we did not have to fail over until years after trx1 and trx2 were skipped – so both the context of trx1 and trx2, and the context where they were skipped may be long forgotten by the DBA, possibly making the situation even more difficult to debug and correct.
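To make the problem concrete, here is a minimal Python sketch of the exchange, under the simplifying assumption that the protocol boils down to "send every transaction whose GTID the slave did not report". The function transactions_to_send and the list/set representation are hypothetical and only model the idea; they are not how the server is implemented.

```python
def transactions_to_send(master_binlog, slave_gtids):
    """Return the (gtid, trx) pairs the slave lacks, in binlog order."""
    return [(gtid, trx) for gtid, trx in master_binlog
            if gtid not in slave_gtids]


# B replicated everything from A before A crashed.
b_binlog = [("id1", "trx1"), ("id2", "trx2"), ("id3", "trx3")]

# C started at trx3 and simply skipped trx1 and trx2, so it only knows id3.
c_gtids = {"id3"}

# On failover C connects to B and reports c_gtids; B answers with the rest.
resent = transactions_to_send(b_binlog, c_gtids)
```

Here resent contains exactly trx1 and trx2, the two transactions we thought we had skipped, and C would apply them after trx3.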
The good news is that we don't allow the type of skipping that leads to this DBA nightmare. Instead, we provide a robust mechanism to achieve the wanted effect in a safe manner.
Recall that the server skips transactions if GTID_NEXT is set to a GTID that already exists in GTID_DONE. Therefore, to skip a transaction with a given GTID, all we have to do is to first execute a transaction that has no effect – a “no-op” – with the same GTID. It is as simple as this:
mysql> SET GTID_NEXT = "4d8b564f-03f4-4975-856a-0e65c3105328:4711";
mysql> COMMIT;
Normally, a single COMMIT statement would not make a difference in any way. But when GTID_NEXT is set to a GTID, it causes the server to write an empty transaction, just a BEGIN/COMMIT pair with nothing in between, to the binary log:
This makes the transaction permanently skipped: it cannot ever come back again and bite us.
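As a rough sketch of why this works, the toy Python model below records a GTID with no statements and then shows that the real transaction with the same GTID is ignored. The class Server, its commit method and the example UPDATE statement are hypothetical names for illustration; the model deliberately ignores everything about the server except the GTID set and the binary log.

```python
class Server:
    """Toy model: GTID set plus a binary log of (gtid, statements) pairs."""

    def __init__(self):
        self.gtid_done = set()  # stands in for GTID_DONE
        self.binlog = []        # what gets written to the binary log
        self.applied = []       # statements that actually changed data

    def commit(self, gtid_next, statements=()):
        """Model SET GTID_NEXT = gtid_next; <statements>; COMMIT;"""
        if gtid_next in self.gtid_done:
            return False                       # already executed: skipped
        self.applied.extend(statements)
        self.binlog.append((gtid_next, list(statements)))
        self.gtid_done.add(gtid_next)
        return True


slave = Server()
gtid = "4d8b564f-03f4-4975-856a-0e65c3105328:4711"

# SET GTID_NEXT = ...; COMMIT;  -> an empty transaction is logged.
slave.commit(gtid)

# Later the real transaction with the same GTID arrives from the master.
executed = slave.commit(gtid, ["UPDATE t SET x = 1"])
```

After this, the slave's binary log holds one empty entry for the GTID and executed is False, so the UPDATE never touches the slave's data.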
Empty Transactions and Failover
Let us revisit the last example and see what happens when we “skip” trx1, trx2 using empty transactions. Before we connect C as a slave of A, we commit two empty transactions on C, with GTID id1 and id2, respectively:
We connect C to A as usual using the new replication protocol: C sends GTID_DONE to A and A sends everything else to C. That is, C sends “id1,id2” to A and gets back id3, trx3, and so on.
So far we have accomplished what we want: C has skipped trx1 and trx2 (but it has id1 and id2) and started to replicate.
Now, what happens on failover? Suppose again that A crashes, we make B the new master and wish to connect C as a slave to B. C then sends “id1-id3” to B, and B will send everything else to C. trx1 and trx2 do not come back to C in this case because we committed empty transactions with the same GTIDs.
This example highlights one important point: GTIDs are a part of the server state. Two servers that have the exact same data but different sets of GTIDs in their binary logs should not be considered “the same”. Luckily, the tools we provide (e.g., empty transactions) ensure that server states do not diverge in unwanted ways.
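The same exchange can be sketched in Python, this time with C's binary log containing the two empty transactions. As before, this is a simplified model rather than server code: the helper names gtid_done and missing_on_slave are hypothetical, and id4/trx4 is an invented later transaction added only to show that new work still flows to C.

```python
def gtid_done(binlog):
    """Every GTID in the binary log counts, empty transaction or not."""
    return {gtid for gtid, _statements in binlog}


def missing_on_slave(master_binlog, slave_binlog):
    """Model the protocol: send what the slave's GTID set lacks."""
    have = gtid_done(slave_binlog)
    return [(gtid, stmts) for gtid, stmts in master_binlog
            if gtid not in have]


b_binlog = [("id1", ["trx1"]), ("id2", ["trx2"]), ("id3", ["trx3"])]

# C committed empty transactions for id1 and id2, then replicated trx3.
c_binlog = [("id1", []), ("id2", []), ("id3", ["trx3"])]

resent = missing_on_slave(b_binlog, c_binlog)   # nothing comes back

# Hypothetical later transaction committed on the new master B.
b_binlog.append(("id4", ["trx4"]))
later = missing_on_slave(b_binlog, c_binlog)    # only the new one is sent
```

In this model resent is empty, while later contains only the new transaction.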
Replication Filters and Empty Transactions
Another scenario where empty transactions play an important role is when using replication filters. Filters were designed to allow a slave to hold only a subset of the master's database. For instance, if the slave server is started with the command line option --replicate-ignore-db=mydatabase, then the slave will check the database of every binary log event it receives from the master and skip everything that belongs to mydatabase.
Suppose again we have a setup where A is the master, with two immediate slaves B and C, and C filters out updates from database mydatabase:
Suppose, moreover, that trx1 and trx2 operate on mydatabase, so that C skips them. We have implemented it so that C then commits empty transactions with GTIDs id1 and id2, as in the illustration.
If A crashes at this point, and B becomes the new master, and C a slave of B, then the empty transactions ensure that id1 and id2 are included in C's GTID_DONE. This in turn implies that C sends id1 and id2 to B when it connects as a slave, so that B does not send trx1 or trx2 to C again.
This is important for performance: the sum of all transactions that ever operated on database mydatabase naturally occupies no fewer gigabytes than the entire database mydatabase itself. If we did not have the empty transactions on C, blocking B from sending this potentially huge amount of data to C, then the failover would risk causing a significant disruption while C wades through oceans of transactions that it has already skipped.
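The filtering behaviour can be pictured with the small Python sketch below. It is an assumption-laden simplification: the function apply_with_filter is hypothetical, each event is reduced to a (gtid, database, statement) triple, and the database name otherdb is invented for the example. Real filters have more rules than a single database comparison.

```python
def apply_with_filter(events, ignored_db):
    """Model a filtering slave: a filtered transaction is still logged,
    but as an empty transaction that keeps only its GTID."""
    slave_binlog = []
    for gtid, database, statement in events:
        if database == ignored_db:
            slave_binlog.append((gtid, []))           # empty transaction
        else:
            slave_binlog.append((gtid, [statement]))  # applied normally
    return slave_binlog


events_from_a = [
    ("id1", "mydatabase", "trx1"),
    ("id2", "mydatabase", "trx2"),
    ("id3", "otherdb", "trx3"),
]

c_binlog = apply_with_filter(events_from_a, "mydatabase")
c_gtid_done = {gtid for gtid, _statements in c_binlog}
```

Even though C never applied trx1 or trx2, c_gtid_done still contains id1 and id2, which is what stops B from sending them again after a failover.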
GTID_NEXT is only settable by SUPER
Remember I said that GTID_NEXT is only settable by users that have SUPER privileges? Now it should be evident why this is the case: by setting GTID_NEXT, you make the replication thread suppress arbitrary transactions from the master. This is true no matter which user committed the transaction on the master. Therefore, it would not be safe to allow non-SUPER users to set GTID_NEXT.
Summary
We have seen that the slave thread executes SET GTID_NEXT to specify the GTID of the next transaction to come. The mysqlbinlog utility does the same, and a DBA can do the same if that is needed.
A transaction that has the same GTID (specified by GTID_NEXT) as an already committed transaction is skipped.
To suppress a transaction on a slave that has not yet replicated it, commit an empty transaction with the GTID of the transaction to skip.
From: svenmysql.blogspot.jp/2012/10/advanced-use-of-global-transaction.html