The Log-Structured Merge-Bush & the Wacky Continuum (1)

  • LSM-tree

    • buffers writes in memory, flushes them to storage as sorted runs, and merges similarly sized runs across multiple levels of exponentially increasing capacities
    • uses in-memory fence pointers so that reads can locate the relevant key range within each run
    • uses in-memory Bloom filters so that point reads can skip runs that do not contain a target entry
    • applications: LevelDB, BigTable, RocksDB, etc. (a toy sketch of this read/write path follows below)
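A minimal sketch of the mechanism above, assuming nothing about any particular engine: writes accumulate in a buffer, full buffers are flushed as sorted runs, similarly sized runs are merged into the next level, and point reads consult per-run fence pointers (just min/max keys here) and a small Bloom filter before searching a run. All class names and parameters are invented for illustration; tombstones, concurrency, and on-disk layout are ignored.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: k hash probes into an m-bit array."""
    def __init__(self, n_entries, bits_per_entry=10):
        self.m = max(8, n_entries * bits_per_entry)
        self.k = 7  # ~ bits_per_entry * ln 2, fixed for simplicity
        self.bits = bytearray(self.m // 8 + 1)

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

class Run:
    """An immutable sorted run with fence pointers and a Bloom filter."""
    def __init__(self, entries):  # entries: sorted list of (key, value)
        self.entries = entries
        self.min_key, self.max_key = entries[0][0], entries[-1][0]  # fence pointers
        self.bloom = BloomFilter(len(entries))
        for k, _ in entries:
            self.bloom.add(k)

    def get(self, key):
        if not (self.min_key <= key <= self.max_key):  # fence-pointer skip
            return None
        if not self.bloom.might_contain(key):          # Bloom-filter skip
            return None
        lo, hi = 0, len(self.entries) - 1              # binary search within the run
        while lo <= hi:
            mid = (lo + hi) // 2
            k, v = self.entries[mid]
            if k == key:
                return v
            lo, hi = (mid + 1, hi) if k < key else (lo, mid - 1)
        return None

class LSMTree:
    def __init__(self, buffer_size=4, ratio=2):
        self.buffer, self.buffer_size = {}, buffer_size
        self.levels, self.ratio = [], ratio  # levels[i]: runs of similar size

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_size:  # flush full buffer as a sorted run
            run = Run(sorted(self.buffer.items()))
            self.buffer.clear()
            self._push(0, run)

    def _push(self, level, run):
        while len(self.levels) <= level:
            self.levels.append([])
        self.levels[level].append(run)
        if len(self.levels[level]) >= self.ratio:  # merge similarly sized runs
            merged = {}
            for r in self.levels[level]:           # older runs first, newer overwrite
                merged.update(r.entries)
            self.levels[level] = []
            self._push(level + 1, Run(sorted(merged.items())))

    def get(self, key):  # newest data first: buffer, then smaller levels
        if key in self.buffer:
            return self.buffer[key]
        for level in self.levels:
            for run in reversed(level):  # newer runs within a level first
                v = run.get(key)
                if v is not None:
                    return v
        return None

tree = LSMTree()
for i in range(20):
    tree.put(f"k{i:02d}", i)
assert tree.get("k07") == 7 and tree.get("missing") is None
```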
  • Problem 1: Three-Way Trade-Off

    • a three-way trade-off among the costs of reads, writes, and memory
    • it is not possible to achieve the best case on all three metrics simultaneously
    • to make point reads faster, one must either
      • use more memory to enlarge the Bloom filters and thereby reduce read I/Os, or
      • increase write cost by compacting runs more greedily to restrict the number of runs
        To sum up, tuning to improve one of these three metrics makes one or both of the other metrics worse (a worked example follows this list).
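A back-of-envelope example of the trade-off. With the optimal number of hash functions, a Bloom filter's false positive rate is about e^(-(bits/entry) * ln(2)^2), and a zero-result point read wastes roughly one storage I/O per run whose filter gives a false positive. The run counts below are invented for illustration; they show that spending either more memory (bits per entry) or more write cost (fewer runs via greedier compaction) reduces read I/Os:

```python
import math

def false_positive_rate(bits_per_entry):
    # Bloom filter with the optimal number of hash functions:
    # FPR ~= e^(-(bits/entry) * ln(2)^2)
    return math.exp(-bits_per_entry * math.log(2) ** 2)

def expected_wasted_io(num_runs, bits_per_entry):
    # A zero-result point read probes every run's filter; each false
    # positive costs one wasted storage I/O.
    return num_runs * false_positive_rate(bits_per_entry)

# Lazier merging keeps many runs (cheap writes); greedier merging keeps
# few runs (expensive writes). Either fewer runs or more bits help reads.
for num_runs, label in [(40, "40 runs (cheap writes) "), (5, " 5 runs (costly writes)")]:
    for bits in (5, 10, 15):
        io = expected_wasted_io(num_runs, bits)
        print(f"{label} @ {bits:2d} bits/entry -> {io:.3f} expected wasted I/Os")
```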
  • Problem 2: Worse Trade-Offs as the Data Grows

    • in the figure below, the x-axis measures write cost as write amplification: the average number of times a data entry is rewritten due to compactions
    • the y-axis measures memory as the average number of bits per entry needed across all Bloom filters to fix the expected number of false positives to a small constant
      As the data grows, the Pareto frontier moves outward, leading to worse trade-offs. The trade-off deteriorates at an even faster rate for the plain Tiered and Leveled designs (a back-of-envelope model follows the figure).


      [Figure: worse trade-offs as the data grows]
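The shape of that figure can be reproduced with the standard asymptotic cost formulas for a plain leveled design: write amplification grows with the number of levels (roughly ratio × levels), and, with one run per level, keeping the total expected false positives fixed requires more bits per entry as the level count grows. The constants below are illustrative assumptions, not the paper's measurements:

```python
import math

def leveled_costs(n_entries, buffer_entries=2**20, ratio=10, target_fp_sum=0.1):
    # The number of levels grows logarithmically with the data size.
    levels = max(1, math.ceil(math.log(n_entries / buffer_entries, ratio)))
    # Leveled write amplification: an entry is rewritten up to `ratio`
    # times at each level on its way down.
    write_amp = ratio * levels
    # One run per level; to keep the sum of expected false positives at
    # target_fp_sum, each run needs FPR = target_fp_sum / levels, i.e.
    # bits/entry ~= ln(levels / target_fp_sum) / ln(2)^2.
    bits_per_entry = math.log(levels / target_fp_sum) / math.log(2) ** 2
    return levels, write_amp, bits_per_entry

# Both axes of the trade-off get worse as N grows.
for n in (2**26, 2**32, 2**38, 2**44):
    levels, wa, bpe = leveled_costs(n)
    print(f"N=2^{int(math.log2(n))}: {levels} levels, "
          f"write-amp ~{wa}, ~{bpe:.1f} bits/entry")
```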
  • Write and Point Read Intensive Workloads

    • WPI: write-intensive workloads whose reads are mostly or only point reads
    • for WPI workloads, there are untapped opportunities to improve the scalability of the three-way read/write/memory trade-off
  • Insight: Not all Compactions are Created Equal

    • not all compactions are equally impactful: some substantially improve point reads and memory, while others improve them only negligibly
    • the larger levels contribute the most to the cost of point reads and memory; compaction overheads at the smaller levels increase logarithmically with the data size while yielding exponentially diminishing returns (the calculation below quantifies this)
    • existing designs assign a fixed, uniform capacity ratio between every pair of adjacent levels
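A quick calculation shows why the larger levels dominate. With a fixed capacity ratio T between adjacent levels, level i holds about T^i times the buffer's capacity, so the largest level contains almost all entries (and therefore almost all Bloom filter bits and lookup targets), while the smallest levels hold a vanishing fraction. The parameters below (T = 10, six levels) are chosen only for illustration:

```python
# With capacity ratio T, level i holds ~T^i * B entries, so the largest
# level dominates while the smallest levels are negligible.
B, T, NUM_LEVELS = 2**20, 10, 6
capacities = [B * T**i for i in range(1, NUM_LEVELS + 1)]
total = sum(capacities)
for i, cap in enumerate(capacities, start=1):
    print(f"level {i}: {cap / total:9.4%} of all entries")
# The largest level holds ~90% of entries; levels 1-3 together hold
# roughly 0.1%, so compacting them eagerly buys almost nothing for point
# reads or memory, yet that cost recurs on every flush.
```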
  • The Log-Structured Merge-Bush

    • a new data structure for WPI applications
    • sets increasing capacity ratios between adjacent pairs of smaller levels
      • smaller levels get lazier, gathering more runs before merging them
      • non-impactful compactions are eliminated, so overall compaction costs grow more slowly with the data size (see the level-count comparison below)
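To see why growing ratios flatten the structure, compare the level counts. The schedule below, where the ratio between the smallest pair of adjacent levels is the largest and ratios shrink doubly exponentially toward the largest level (r_i = 2^(2^(L-i))), is an illustrative assumption rather than the paper's exact parameterization, but it reproduces the key effect: an L-level bush then holds about B * 2^(2^L - 1) entries, so the level count grows as O(log log N) instead of O(log N):

```python
import math

def fixed_ratio_levels(n_entries, buffer_entries=2**20, ratio=10):
    # Plain tiered/leveled designs: uniform ratio, O(log N) levels.
    return max(1, math.ceil(math.log(n_entries / buffer_entries, ratio)))

def growing_ratio_levels(n_entries, buffer_entries=2**20):
    # Doubly exponential ratio schedule (assumed form): with ratios
    # r_i = 2^(2^(L-i)), total capacity is buffer * 2^(2^L - 1), so the
    # smallest sufficient L grows as O(log log N).
    L = 1
    while buffer_entries * 2 ** (2 ** L - 1) < n_entries:
        L += 1
    return L

for n in (2**30, 2**40, 2**50, 2**60):
    print(f"N=2^{int(math.log2(n))}: fixed ratio -> {fixed_ratio_levels(n):2d} levels, "
          f"growing ratios -> {growing_ratio_levels(n)} levels")
```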
