3.3 Multi-Component LSM-trees (2)
To avoid a small value for M, the only course with a two-component LSM-tree is to increase the size of the C0 component relative to that of C1. Consider a two-component LSM-tree of given total leaf entry size S (S = S0 + S1, an approximately stable value), and assume we have a constant rate R in bytes per second of new entry inserts into C0. For simplicity, we assume that no entries inserted into C0 are deleted before they get out to component C1, and therefore entries must migrate out to component C1 through the rolling merge at the same rate that they are inserted into C0 to keep the size of C0 near its threshold size. (Given that the total size S is approximately stable, this also implies that the insertion rate into C0 must be balanced by a constant deletion rate from C1, possibly using a succession of predicate deletes.) As we vary the size of C0, we affect the circulation speed of the merge cursor. A constant migration rate out to C1 in bytes per second requires that the rolling merge cursor move through entries of C0 at a constant rate in bytes per second, and therefore as the size of C0 decreases, the circulation rate from smallest to largest index values in C0 will increase; as a result, the I/O rate for multi-page blocks in C1 to perform the rolling merge must also increase. If a C0 size of a single entry were possible, at this conceptual extreme point we would require a circulation through all multi-page blocks of C1 for each newly inserted entry, an immense demand on I/O. The approach of merging C0 and C1, rather than accessing relevant nodes of C1 for each newly inserted entry as is done with the B-tree, would become a millstone around our necks. By comparison, a larger C0 component will slow down the circulation of the merge cursor and decrease the I/O cost of inserts. However, this will increase the cost of the memory-resident component C0.
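The trade-off described above can be made concrete with a rough back-of-the-envelope model. The sketch below is only an illustration under simplifying assumptions that are not stated in the passage: the merge cursor completes one circulation of C0 every S0 / R seconds, and each circulation reads and rewrites all of C1 exactly once. The function name `merge_cursor_model` and the sample sizes are hypothetical.

```python
def merge_cursor_model(s0_bytes, s1_bytes, insert_rate):
    """Rough model of the two-component rolling merge (illustrative only).

    Assumptions (not from the source text):
      * entries leave C0 at the same rate R at which they arrive, so the
        cursor sweeps all of C0 once every S0 / R seconds;
      * one sweep reads and rewrites every byte of C1 once.
    Returns (seconds per circulation, merge I/O in bytes per second).
    """
    circulation_seconds = s0_bytes / insert_rate
    # Read S1 plus write S1 per circulation.
    merge_io_rate = 2 * s1_bytes / circulation_seconds
    return circulation_seconds, merge_io_rate


if __name__ == "__main__":
    TOTAL = 1_000_000_000  # S = S0 + S1, held fixed (hypothetical value)
    RATE = 1_000_000       # R, bytes per second of inserts (hypothetical value)
    for s0 in (1_000_000, 10_000_000, 100_000_000):
        s1 = TOTAL - s0
        period, io_rate = merge_cursor_model(s0, s1, RATE)
        print(f"S0={s0:>11,} B  circulation={period:8.1f} s  "
              f"merge I/O={io_rate:16,.0f} B/s")
```

Under these assumptions the merge I/O rate scales as 2 · R · S1 / S0, so shrinking C0 by a factor of ten raises the disk traffic by roughly the same factor, while enlarging C0 lowers it at the price of more memory.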