2.1 How a Two Component LSM-tree Grows
To trace the metamorphosis of an LSM-tree from the beginning of its growth, let us begin with a first insertion to the C0 tree component in memory.
Unlike the C1 tree, the C0 tree is not expected to have a B-tree-like structure.
For one thing, the nodes could be any size: there is no need to insist on disk page size nodes since the C0 tree never sits on disk, and so we need not sacrifice CPU efficiency to minimize depth.
Thus a (2-3) tree or an AVL-tree (as explained, for example, in [1]) is a possible alternative structure for a C0 tree.
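As a concrete illustration, here is a minimal sketch (not from the paper) of an AVL-style structure that could serve as the C0 component; the class and function names are illustrative, and a node holds a single entry rather than a disk page's worth of entries.

```python
class AvlNode:
    """One entry of the in-memory C0 component; no disk-page-size constraint."""
    __slots__ = ("key", "value", "left", "right", "height")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None
        self.height = 1

def _height(n):
    return n.height if n else 0

def _fix(n):
    n.height = 1 + max(_height(n.left), _height(n.right))
    return n

def _rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    _fix(y)
    return _fix(x)

def _rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    _fix(x)
    return _fix(y)

def insert(node, key, value):
    """Standard AVL insertion; an existing key simply has its value replaced."""
    if node is None:
        return AvlNode(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value
        return node
    _fix(node)
    balance = _height(node.left) - _height(node.right)
    if balance > 1 and key < node.left.key:      # left-left case
        return _rotate_right(node)
    if balance < -1 and key > node.right.key:    # right-right case
        return _rotate_left(node)
    if balance > 1:                              # left-right case
        node.left = _rotate_left(node.left)
        return _rotate_right(node)
    if balance < -1:                             # right-left case
        node.right = _rotate_right(node.right)
        return _rotate_left(node)
    return node
```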
When the growing C0 tree first reaches its threshold size, a leftmost sequence of entries is deleted from the C0 tree (this should be done in an efficient batch manner rather than one entry at a time) and reorganized into a C1 tree leaf node packed 100% full.
Successive leaf nodes are placed left-to-right in the initial pages of a buffer resident multi-page block until the block is full;
then this block is written out to disk to become the first part of the C1 tree disk-resident leaf level.
A directory node structure for the C1 tree is created in memory buffers as successive leaf nodes are added, with details explained below.
Successive multi-page blocks of the C1 tree leaf level in ever increasing key-sequence order are written out to disk to keep the C0 tree from exceeding its threshold size.
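Putting the last few steps together, the roll-out from C0 into full C1 leaf nodes and multi-page blocks might look roughly like the following sketch. The helper names (`pop_leftmost`, `write_block`) and the capacity constants are assumptions made for illustration, not part of the paper.

```python
ENTRIES_PER_LEAF = 128     # assumed capacity of one disk-page-sized leaf node
PAGES_PER_BLOCK = 64       # assumed number of pages in one multi-page block

def roll_out_leftmost(c0, block_buffer, write_block, batch_size):
    """Delete a leftmost batch of entries from C0 and pack them into C1 leaves.

    `c0` is assumed to support an efficient batched pop_leftmost(n);
    `write_block` writes one multi-page block to disk and returns its address.
    Both are placeholders for this sketch.
    """
    entries = c0.pop_leftmost(batch_size)          # batch delete, not one entry at a time
    block_addresses = []
    # Pack leaf nodes 100% full: a freshly merged run needs no slack space.
    for i in range(0, len(entries), ENTRIES_PER_LEAF):
        block_buffer.append(entries[i:i + ENTRIES_PER_LEAF])
        if len(block_buffer) == PAGES_PER_BLOCK:   # block full: one sequential write
            block_addresses.append(write_block(block_buffer))
            block_buffer.clear()
    return block_addresses                         # used to grow the C1 directory nodes
```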
Upper level C1 tree directory nodes are maintained in separate multi-page block buffers, or else in single page buffers, whichever makes more sense from a standpoint of total memory and disk arm cost;
entries in these directory nodes contain separators that channel access to individual single-page nodes below, as in a B-tree.
The intention is to provide efficient exact-match access along a path of single page index nodes down to the leaf level, avoiding multi-page block reads in such a case to minimize memory buffer requirements.
Thus we read and write multi-page blocks for the rolling merge or for long range retrievals, and single-page nodes for indexed find (exact-match) access.
A somewhat different architecture that supports such a dichotomy is presented in [21].
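One way to picture this dichotomy is as two separate access paths over the same C1 tree, sketched below with assumed helper functions (`read_page`, `read_multi_page_block`) and hypothetical page/block methods:

```python
def exact_match_find(root_page_id, key, read_page):
    """Point lookup: follow single-page directory nodes down to one leaf page."""
    page = read_page(root_page_id)                 # one page-sized I/O per level
    while page.is_directory:
        page = read_page(page.child_for(key))      # separators channel the search
    return page.lookup(key)

def range_scan(first_block_id, lo, hi, read_multi_page_block):
    """Long range retrieval: stream whole multi-page blocks of the leaf level."""
    block_id = first_block_id
    while block_id is not None:
        block = read_multi_page_block(block_id)    # one arm movement per block
        for entry in block.entries_between(lo, hi):
            yield entry
        if block.max_key >= hi:
            break
        block_id = block.next_block_id
```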
Partially full multi-page blocks of C1 directory nodes are usually allowed to remain in buffer while a sequence of leaf node blocks are written out.
C1 directory nodes are forced to new positions on disk when:
-A multi-page block buffer containing directory nodes becomes full
-The root node splits, increasing the depth of the C1 tree (to a depth greater than two)
-A checkpoint is performed
In the first case, the single multi-page block which has filled is written out to disk.
In the latter two cases, all multi-page block buffers and directory node buffers are flushed to disk.
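These three conditions could be expressed as a simple trigger, roughly as in the sketch below; `flush_block` and `flush_all_buffers` are hypothetical placeholders for the actual buffer-manager actions.

```python
def maybe_flush_directory_buffers(dir_block_buffer, root_split, checkpoint,
                                  flush_block, flush_all_buffers):
    """Force C1 directory nodes to new disk positions under the conditions above."""
    if dir_block_buffer.is_full():
        # Condition 1: only the single multi-page block that filled is written out.
        flush_block(dir_block_buffer)
    elif root_split or checkpoint:
        # Conditions 2 and 3: every multi-page block buffer and directory node
        # buffer is flushed, always to new positions on disk.
        flush_all_buffers()
```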
After the rightmost leaf entry of the C0 tree is written out to the C1 tree for the first time, the process starts over on the left end of the two trees, except that now and with successive passes multi-page leaf-level blocks of the C1 tree must be read into buffer and merged with the entries in the C0 tree, thus creating new multi-page leaf blocks of C1 to be written to disk.
Once the merge starts, the situation is more complex.
We picture the rolling merge process in a two component LSM-tree as having a conceptual cursor which slowly circulates in quantized steps through equal key values of the C0 tree and C1 tree components, drawing indexing data out from the C0 tree to the C1 tree on disk.
The rolling merge cursor has a position at the leaf level of the C1 tree and within each higher directory level as well.
At each level, all currently merging multi-page blocks of the C1 tree will in general be split into two blocks: the "emptying" block whose entries have been depleted but which retains information not yet reached by the merge cursor, and the "filling" block which reflects the result of the merge up to this moment.
There will be an analogous "filling node" and "emptying node" defining the cursor which will certainly be buffer resident.
For concurrent access purposes, both the emptying block and the filling block on each level contain an integral number of page-sized nodes of the C1 tree, which simply happen to be buffer resident.
(During the merge step that restructures individual nodes, other types of concurrent access to the entries on those nodes are blocked.) Whenever a complete flush of all buffered nodes to disk is required, all buffered information at each level must be written to new positions on disk (with positions reflected in superior directory information, and a sequential log entry for recovery purposes).
At a later point, when the filling block in buffer on some level of the C1 tree fills and must be flushed again, it goes to a new position.
Old information that might still be needed during recovery is never overwritten on disk, only invalidated as new writes succeed with more up-to-date information.
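A highly simplified sketch of one quantized step of the rolling merge at the leaf level is shown below. The names (`cursor`, `write_block_to_new_position`) are illustrative, and details such as C0 entries superseding older C1 entries with the same key are omitted.

```python
import heapq

def merge_step(cursor, c0_entries, c1_entries, write_block_to_new_position,
               entries_per_block):
    """One quantized step of the rolling merge cursor.

    `c0_entries` and `c1_entries` are iterators over the next key range of the
    C0 component and of the current 'emptying' C1 block; the merged output
    becomes the 'filling' block.  All names are illustrative.
    """
    filling = []
    # heapq.merge keeps key order while draining both inputs.
    for key, value in heapq.merge(c0_entries, c1_entries):
        filling.append((key, value))
        if len(filling) == entries_per_block:
            # The filling block always goes to a NEW disk position; the old
            # block is only invalidated after the write succeeds, so recovery
            # can still read the previous version.
            cursor.last_written = write_block_to_new_position(filling)
            filling = []
    return filling      # a partially full filling block stays buffer resident
```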
A somewhat more detailed explanation of the rolling merge process is presented in Section 4, where concurrency and recovery designs are considered.
It is an important efficiency consideration of the LSM-tree that when the rolling merge process on a particular level of the C1 tree passes through nodes at a relatively high rate, all reads and writes are in multi-page blocks.
By eliminating seek time and rotational latency, we expect to gain a large advantage over random page I/O involved in normal B-tree entry insertion.
(This advantage is analyzed below, in Section 3.2.) The idea of always writing multi-page blocks to new locations was inspired by the Log-Structured File System devised by Rosenblum and Ousterhout [23], from which the Log-Structured Merge-tree takes its name.
Note that the continuous use of new disk space for fresh multi-page block writes implies that the area of disk being written will wrap, and old discarded blocks must be reused.
This bookkeeping can be done in a memory table; old multi-page blocks are invalidated and reused as single units, and recovery is guaranteed by the checkpoint.
In the Log-Structured File System, the reuse of old blocks involves significant I/O because blocks are typically only partially freed up, so reuse requires a block read and block write.
In the LSM-Tree, blocks are totally freed up on the trailing edge of the rolling merge, so no extra I/O is involved.
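The bookkeeping table mentioned above can be pictured as a small in-memory structure, sketched below with illustrative names; the paper only stipulates that old blocks are invalidated and reused as whole units and that recovery is covered by the checkpoint.

```python
class BlockTable:
    """In-memory bookkeeping of multi-page disk blocks for wrap-around reuse."""

    def __init__(self, total_blocks):
        self.free = set(range(total_blocks))   # whole blocks, reused as single units
        self.live = set()

    def allocate(self):
        """Take a free block for a fresh multi-page write at a new position."""
        block_id = self.free.pop()
        self.live.add(block_id)
        return block_id

    def invalidate(self, block_id):
        """Called on the trailing edge of the rolling merge: the old block is
        fully emptied, so it returns straight to the free set with no extra
        read/write I/O (unlike partially freed segments in LFS)."""
        self.live.discard(block_id)
        self.free.add(block_id)
```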