LSM-Tree(50)

3.4 LSM-trees: Component Sizes(16)

Here is a full explanation of the two-component solution. The insert rate R = 160,000 bytes/sec is turned into 40 pages/second that need to be merged from C0 to C1. Since C1 is 68 times larger than C0, merging a page from C0 requires 68 page reads and 68 writes to C1, a total of 5450 pages per second. But this is exactly what 13.5 disks provide in multiblock I/O capacity.
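
The two-component arithmetic above can be checked with a short sketch. The 4,000-byte page size is not stated here; it is only implied by 160,000 bytes/sec becoming 40 pages/second, and the helper name is made up for illustration. With the ratio rounded to 68, the product is 5440 pages per second, slightly under the 5450 quoted, presumably because the quoted figure uses an unrounded ratio (an assumption).

```python
# Two-component merge I/O, using only figures quoted in the text.
INSERT_BYTES_PER_SEC = 160_000
PAGE_BYTES = 4_000   # assumption: implied by 160,000 bytes/s -> 40 pages/s
SIZE_RATIO = 68      # C1 / C0, rounded as in the text


def merge_io_pages_per_sec(insert_pages_per_sec, ratio):
    """Pages read plus pages written in the larger component, per second."""
    return insert_pages_per_sec * 2 * ratio


insert_pages = INSERT_BYTES_PER_SEC // PAGE_BYTES
two_component_io = merge_io_pages_per_sec(insert_pages, SIZE_RATIO)
print(insert_pages, two_component_io)
```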

With an LSM-tree of three components for the R = 160,000 bytes/second case, the cost of the largest disk component and a cost-balanced I/O rate are calculated as for two components. With S_i/S_{i-1} = r for i = 1, 2, by Theorem 3.1, we calculate r = 23 and S0 = 17 MBytes (for a memory cost of $1,700) for fully occupied disk arms. The smaller disk component costs just 1/23 of the larger. Now increasing the memory size from this point has no good cost effect, and decreasing the memory size will result in a corresponding factor, squared, increase in the cost of disk. Since the cost for disk is currently a good deal higher than the cost of memory, we do not gain cost effectiveness by memory size reduction. Thus we have an analogous s = t solution in the three-component case. Allowing an additional 4 MBytes of memory for buffering, costing $400, for the two rolling merge operations, the total cost for a 3-component LSM-tree is therefore $9,200 for disk plus $2,100 for memory, or a total cost of $11,300, a further significant improvement over the cost of a 2-component LSM-tree.
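
The cost tally for the three-component case can be reproduced as a small sketch. The per-MByte memory price is not stated in this section; it is inferred from $1,700 buying the 17 MByte C0 (and is consistent with $400 for 4 MBytes of merge buffers). The function name is hypothetical.

```python
# Three-component cost tally, using the dollar figures quoted in the text.
MEMORY_DOLLARS_PER_MB = 1_700 / 17   # assumption: inferred from $1,700 for 17 MBytes
DISK_DOLLARS = 9_200                 # cost of the disk components, as quoted


def three_component_cost(c0_mb, merge_buffer_mb, disk_dollars):
    """Return (memory cost, total cost) in dollars."""
    memory = (c0_mb + merge_buffer_mb) * MEMORY_DOLLARS_PER_MB
    return memory, memory + disk_dollars


memory_cost, total_cost = three_component_cost(17, 4, DISK_DOLLARS)
print(memory_cost, total_cost)
```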

Here is a full explanation of the three-component solution. The in-memory component C0 has 17 Mbytes, the smaller disk component C1 is 23 times larger, at 400 Mbytes, and C2 is 23 times larger than C1, at 9.2 Gbytes. Each page of the 40 pages/second of data that must be merged from C0 to C1 entails 23 pages of reading and 23 of writing, or 1840 pages per second. Similarly, 40 pages/second are being merged from C1 to C2, each of which requires 23 pages of reads and writes of C2. The total of the two I/O rates is 3680, exactly the multiblock I/O capacity of the 9.2 Gbytes of disk.
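
The same bookkeeping for three components, as a minimal sketch. The component sizes are the rounded values quoted above, so the C1/C0 ratio comes out slightly above 23; the helper name is illustrative only.

```python
# Three-component merge I/O, using the rounded figures quoted in the text.
INSERT_PAGES_PER_SEC = 40
RATIO = 23
SIZES_MB = {"C0": 17, "C1": 400, "C2": 9_200}   # rounded sizes from the text


def merge_io(insert_pages_per_sec, ratio):
    """Each merged page costs `ratio` page reads and `ratio` page writes."""
    return insert_pages_per_sec * (ratio + ratio)


io_c0_to_c1 = merge_io(INSERT_PAGES_PER_SEC, RATIO)
io_c1_to_c2 = merge_io(INSERT_PAGES_PER_SEC, RATIO)
total_io = io_c0_to_c1 + io_c1_to_c2

# Ratios implied by the rounded sizes (C1/C0 is a little over 23).
size_ratios = (SIZES_MB["C1"] / SIZES_MB["C0"], SIZES_MB["C2"] / SIZES_MB["C1"])
print(io_c0_to_c1, io_c1_to_c2, total_io, size_ratios)
```

Dividing the 3680 pages per second by the 9.2 Gbytes of disk gives 400 pages per second per Gbyte, which is the multiblock capacity the paragraph relies on.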

An LSM-tree of two or three components will require more I/O for find operations than the simple B-tree. The largest component in either case will look very much like the corresponding simple B-tree, but in the LSM-tree case we have not paid the $6,400 for memory for buffering nodes just above the leaf level in the index. Nodes even higher in the tree are relatively so few as to be negligible, and we can assume they are buffered. Clearly we would be willing to pay for buffering all directory nodes if queries to find entries were sufficiently frequent to justify this cost. In the three-component case, we need to consider the C1 component as well. Since it is 23 times smaller than the largest component, we can easily afford to buffer all of its non-leaf nodes, and this cost should be added in the analysis. The unbuffered leaf access in C1 entails another additional read for the find in cases where an entry in C2 is being sought, and there is a decision to be made whether to buffer the directory of C2. Thus for the three-component case, there may be a few additional page reads over the two I/Os needed for finds in the simple B-tree (counting one I/O for a page write of a leaf node). For the two-component case, there may be one additional read. If we do buy the memory for the buffering of nodes above leaf level of the LSM-tree components, we can meet the B-tree speed in the two-component case and pay for one extra read only in some cases in the three-component case. The total cost to add buffering in the three-component case would then be $17,700, still far less than the B-tree. But it may well be better to use this money in other ways: a full analysis should minimize total cost over the workload, including both updates and retrievals.
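
As a quick check, the $17,700 figure is the three-component total of $11,300 plus the $6,400 of directory-buffering memory mentioned at the start of the paragraph. The sketch below only reproduces that sum. It does not model the smaller cost of buffering C1's non-leaf nodes, which the text says should be added but which does not appear to be part of the quoted figure.

```python
# Cost of the three-component LSM-tree once directory nodes are buffered.
THREE_COMPONENT_COST = 11_300    # dollars, total quoted for the 3-component tree
DIRECTORY_BUFFER_COST = 6_400    # dollars, memory for nodes just above leaf level

buffered_cost = THREE_COMPONENT_COST + DIRECTORY_BUFFER_COST
print(buffered_cost)
```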

We have minimized the total I/O needed for merge operations with given S0 by varying the size ratios r_i, with the result of Theorem 3.1, and then minimized the total cost by choosing S0 to achieve best disk arm and media cost. The only remaining variation possible in the LSM-tree is the total number, K+1, of components provided. It turns out that as we increase the number of components the size of S0 continues to decrease until the point is reached where the ratio r between component sizes reaches the value e = 2.71..., or until we reach the cold-data regime. However, we can see from Example 3.4 that successively smaller S0 components as the number of components increases make less and less difference to total cost; in an LSM-tree of three components, the memory size S0 has already been reduced to 17 MBytes. Furthermore, there are costs associated with increasing the number of components: a CPU cost to perform the additional rolling merges and a memory cost to buffer the nodes of those merges (which will actually swamp the memory cost of C0 in common cost regimes). In addition, indexed finds requiring immediate response will sometimes have to perform retrieval from all component trees. These considerations put a strong constraint on the appropriate number of components, and three components are probably the most that will be seen in practice.
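
The claim that S0 shrinks until r reaches e can be illustrated with a rough sketch under simplifying assumptions that are mine, not the paper's: the largest component stays at 9.2 Gbytes, the merge I/O budget stays at the 3680 pages per second used in the three-component example, every one of the K disk components has the same ratio r, and the cold-data regime and the per-merge buffering costs are ignored. The function names are hypothetical.

```python
import math

INSERT_PAGES_PER_SEC = 40
IO_BUDGET_PAGES_PER_SEC = 3_680   # assumption: budget held at the 3-component value
LARGEST_MB = 9_200                # assumption: largest component held at 9.2 Gbytes


def equal_ratio(num_disk_components):
    """Ratio r that uses the whole I/O budget when each of the K merges
    costs 2 * r page I/Os per inserted page."""
    return IO_BUDGET_PAGES_PER_SEC / (2 * INSERT_PAGES_PER_SEC * num_disk_components)


def memory_mb(num_disk_components):
    """Size of C0 when the largest component is r**K times larger than C0."""
    r = equal_ratio(num_disk_components)
    return LARGEST_MB / r ** num_disk_components


# K = 2 disk components is the three-component LSM-tree of the example.
for k in (2, 3, 4):
    print(k, round(equal_ratio(k), 2), round(memory_mb(k), 2))

# Number of disk components that minimizes S0 under these assumptions.
best_k = min(range(2, 40), key=memory_mb)
print(best_k, round(equal_ratio(best_k), 3), round(math.e, 3))
```

Under these assumptions S0 keeps falling as components are added, and the minimum occurs where the ratio is close to e, but the absolute memory saved after the first few components is small compared with the 17 MBytes already reached at three components.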

TODO: read through this carefully and translate it.
