MySQL MVCC官网翻译与讲解

第一节:多版本
InnoDB is a multi-versioned storage engine: it keeps information about old versions of changed rows, to support transactional features such as concurrency and rollback. This information is stored in the tablespace in a data structure called a rollback segment (after an analogous data structure in Oracle).InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
译:InnoDB是一个多版本的存储引擎:它保留关于旧版本更改行的信息,以支持并发和回滚等事务特性。这个(行更改)信息存储在表空间中一个称为“回滚段”的数据结构中(在Oracle中类似的数据结构之后)。InnoDB使用回滚段中的信息来执行回滚事务所需的undo操作。它还使用这些信息来构建行的早期版本,以实现一致的读取。

关键字:回滚段、undo log、一致性读(行早期版本)

下面就对回滚段,回滚机制做出解释:
Internally, InnoDB adds three fields to each row stored in the database. A 6-byte DB_TRX_ID field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated. A 6-byte DB_ROW_ID field contains a row ID that increases monotonically as new rows are inserted. If InnoDB generates a clustered index automatically, the index contains row ID values. Otherwise, the DB_ROW_ID column does not appear in any index.

在InnoDB内部实现中,InnoDB为存储在数据库中的每一行添加三个字段。
(1)DB_TRX_ID:6字节,表示插入或更新该行的最后一个事务的事务标识符(事务ID)。
(2)DB_ROLL_PTR:7字节,回滚指针,指向写入回滚段的撤销日志记录。如果该行已更新,则撤消日志记录将包含在该行更新之前重建该行内容所需的信息。
(3)DB_ROW_ID:6字节,行隐藏ID,随着新行插入而单调增加。对于没有主键和唯一键的表,InnoDB会自动生成一个聚集索引,则聚簇索引以隐藏行ID为主键建立。(若表有主键或唯一键,那么表自然不会使用DB_ROW_ID去建聚簇索引,即不会使用)。

Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are needed only in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, but they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.
回滚段中的Undo日志分为两类,即insert undo log和update undo log。insert undo log只在事务回滚中需要,并且可以在事务提交后立即删除。update undo日志也用于读取一致性,但只有当InnoDB已经分配了为保证数据一致性而建立的早期版本数据行的快照,没有事务使用后,update undo log才会被删除。

有个很大的问题:insert undo log和update undo log到底长啥样?为啥insert在回滚时用到?update在数据一致性用到?
insert应该是只在delete回滚时需要啊? -- 这个问题需要研究透!!!!

一个解释:
undo log是一种日志,日志中记录对于数据库的反向操作。
如果把数据库的内容当做一种状态机,那么数据的写操作就是修改状态机的命令,而undo 就对应修改状态机的反向命令。
所以理论上每一个对于状态机修改的命令都会产生对应一条相当的undo log,以便事务回滚的时候,能够把状态机修改到事务原来的样子。
假如我们有一个事务:
create table table1(c1 int);
begin transaction;
insert into table1 (100); //数据库执行这一条命令的时候应该产生一条undo日志,能够把这语句对于状态机的修改回复到原来没有修改的状态 , undo 应该是 delete table1 where c1 = 100;
insert into table1(200); // undo 应该是 delete table1 where c1 = 200;
update table1 set c1 = 300 where c1 = 200; // undo: update table1 set c1 = 200 where c1 = 300

Commit your transactions regularly, including those transactions that issue only consistent reads. Otherwise, InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.
定期提交事务,包括那些只发出一致读取的事务。否则,InnoDB无法从update undo log中删除数据,并且回滚段可能会变得太大,填满表空间。

The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.
回滚段中的undo log记录物理空间大小通常小于相应的插入或更新行大小。您可以使用这些信息来计算回滚段所需的空间。

In the InnoDB multi-versioning scheme, a row is not physically removed from the database immediately when you delete it with an SQL statement. InnoDB only physically removes the corresponding row and its index records when it discards the update undo log record written for the deletion. This removal operation is called a purge, and it is quite fast, usually taking the same order of time as the SQL statement that did the deletion.

在InnoDB多版本方案中,使用SQL语句删除一行时,行数据不会立即从数据库中物理删除。InnoDB只有在删除标记为“已删除”的行记录和索引记录的undo log时,才会将行记录数据物理删除。这个删除操作称为“purge”,它非常快,执行顺序与SQL语句指定删除(标记为“deletion”)的时间顺序一致。

这一步初步理解是:直接的sql语句没有在数据库层面删除数据,只是标记为“删除”,只有undo log中也删除后才是真正的删除。

If you insert and delete rows in smallish batches at about the same rate in the table, the purge thread can start to lag behind and the table can grow bigger and bigger because of all the “dead” rows, making everything disk-bound and very slow. In such a case, throttle new row operations, and allocate more resources to the purge thread by tuning the innodb_max_purge_lag system variable.

如果表持续以相同的速度进行小批量插入和删除行,那么purge线程可能会开始滞后,而由于所有“dead”行,表可能会变得越来越大,使所有内容都绑定在磁盘上,导致数据库非常慢。在这种情况下,可以限制新的数据行操作,并通过调优innodb_max_purge_lag系统变量为清除线程分配更多的资源。

这里有个知识点,即ibdata1的解析:查看undo log占用的比例(innodb_ruby工具)

第二节(多版本和辅助索引):Multi-Versioning and Secondary Indexes
很多

问:为什么多版本控制还有change buffer都和辅助索引有关呢?这个辅助索引到底有什么特殊之处? 这里翻译成辅助索引更合适!!!

InnoDB multiversion concurrency control (MVCC) treats secondary indexes differently than clustered indexes. Records in a clustered index are updated in-place, and their hidden system columns point undo log entries from which earlier versions of records can be reconstructed. Unlike clustered index records, secondary index records do not contain hidden system columns nor are they updated in-place.

InnoDB多版本并发控制(MVCC)对二级索引和聚簇索引的处理是不同的。聚集索引中的记录就地更新,它们隐藏的系统列指向undo log项,可以从撤消日志项重新构建早期版本的记录。与聚集索引记录不同,二级索引记录不包含隐藏的系统列,也不就地更新。

上面的“in-place”就是指能够找到索引即可回滚数据,聚簇(聚集、主键)索引可以,因为撒个隐藏列直接指向undo log的回滚段,可以直接获取数据,而非聚簇的二级索引只存储主键值,需要通过主键值回表操作。

When a secondary index column is updated, old secondary index records are delete-marked, new records are inserted, and delete-marked records are eventually purged. When a secondary index record is delete-marked or the secondary index page is updated by a newer transaction, InnoDB looks up the database record in the clustered index. In the clustered index, the record's DB_TRX_ID is checked, and the correct version of the record is retrieved from the undo log if the record was modified after the reading transaction was initiated.

当更新辅助索引列时,旧的辅助索引记录将被标记为“删除”,插入新记录,并最终清除标记为“删除”的记录。当一个辅助索引记录被标记为删除或者辅助索引页面被一个新的事务更新时,InnoDB会在聚集索引中查找该数据库记录。在聚集索引中,检查行记录的DB_TRX_ID隐藏列,如果在读取事务启动后修改了记录,则从undo日志中检索记录的正确版本。

If a secondary index record is marked for deletion or the secondary index page is updated by a newer transaction, the covering index technique is not used. Instead of returning values from the index structure, InnoDB looks up the record in the clustered index.

如果辅助索引记录被标记为删除,或者辅助索引页被更新的事务更新,则覆盖索引就会失效。InnoDB不是从索引结构中返回值,而是在聚集索引中查找记录(就是覆盖索引失效,只能通过回表查找聚簇索引)。

However, if the index condition pushdown (ICP) optimization is enabled, and parts of the WHERE condition can be evaluated using only fields from the index, the MySQL server still pushes this part of the WHERE condition down to the storage engine where it is evaluated using the index. If no matching records are found, the clustered index lookup is avoided. If matching records are found, even among delete-marked records, InnoDB looks up the record in the clustered index.

但是,如果启用了索引条件下推(ICP)优化,并且部分WHERE条件仅可以使用索引中的字段进行评估,MySQL服务器仍然会将WHERE条件的这部分向下推到存储引擎,在那里使用索引进行评估。如果没有找到匹配的记录,则避免进行聚集索引查找。如果找到匹配的记录,即使是在删除标记的记录中,InnoDB也会在聚集索引中查找记录。

你可能感兴趣的:(MySQL MVCC官网翻译与讲解)