Translation: Basic Operations Merge Operator

Original URL: https://github.com/facebook/rocksdb/wiki/Merge-Operator

(Youdao)
Some of the content was not understood.

This page describes the atomic read-modify-write operation in RocksDB, known as the "Merge" operation. It is an interface overview aimed at the client or RocksDB user who asks: when and why should I use Merge, and how do I use it?

Why

RocksDB is a high-performance embedded persistent key-value store. It traditionally provides three simple operations, Get, Put and Delete, which give it an elegant lookup-table-like interface. https://github.com/facebook/rocksdb/blob/main/include/rocksdb/db.h

A common pattern is to update an existing value in some way. To do this in RocksDB, the client has to read (Get) the existing value, modify it, and then write (Put) it back to the DB. Let's look at a concrete example.

Imagine we are maintaining a set of uint64 counters. Each counter has a distinct name. We would like to support four high-level operations: Set, Add, Get and Remove.

First, we define the interface and get the semantics right. Error handling is omitted for clarity.

    class Counters {
     public:
      // (re)set the value of a named counter
      virtual void Set(const string& key, uint64_t value);

      // remove the named counter
      virtual void Remove(const string& key);

      // retrieve the current value of the named counter, return false if not found
      virtual bool Get(const string& key, uint64_t *value);

      // increase the named counter by value.
      // if the counter does not exist, treat it as if the counter was initialized to zero
      virtual void Add(const string& key, uint64_t value);
    };

Second, we implement it with the existing RocksDB support. Pseudo-code follows:

    class RocksCounters : public Counters {
     public:
      static constexpr uint64_t kDefaultCount = 0;
      RocksCounters(std::shared_ptr<DB> db);

      // mapped to a RocksDB Put
      virtual void Set(const string& key, uint64_t value) {
        string serialized = Serialize(value);
        db_->Put(put_option_, key, serialized);
      }

      // mapped to a RocksDB Delete
      virtual void Remove(const string& key) {
        db_->Delete(delete_option_, key);
      }

      // mapped to a RocksDB Get
      virtual bool Get(const string& key, uint64_t *value) {
        string str;
        auto s = db_->Get(get_option_, key,  &str);
        if (s.ok()) {
          *value = Deserialize(str);
          return true;
        } else {
          return false;
        }
      }

      // implemented as get -> modify -> set
      virtual void Add(const string& key, uint64_t value) {
        uint64_t base;
        if (!Get(key, &base)) {
          base = kDefaultCount;
        }
        Set(key, base + value);
      }
    };

Note that, other than Add, each of the other three operations maps directly to a single RocksDB operation. Coding-wise, it's not that bad. However, the conceptually single operation Add is mapped to two RocksDB operations. This has a performance implication too: random Get is relatively slow in RocksDB.
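The pseudo-code above calls Serialize and Deserialize without defining them. A minimal sketch of what they might look like follows, assuming a fixed 8-byte little-endian encoding; the encoding and both function bodies are assumptions made for illustration, not something the original page specifies. The two-argument Deserialize matches the form used later in the Counters v2 example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode a counter as 8 bytes, least significant byte first.
std::string Serialize(uint64_t value) {
  std::string out(sizeof(uint64_t), '\0');
  for (std::size_t i = 0; i < sizeof(uint64_t); ++i) {
    out[i] = static_cast<char>((value >> (8 * i)) & 0xff);
  }
  return out;
}

// Hypothetical helper: decode 8 bytes written by Serialize.
// Returns false (and leaves *value untouched) if the input has the wrong length.
bool Deserialize(const std::string& bytes, uint64_t* value) {
  if (bytes.size() != sizeof(uint64_t)) return false;
  uint64_t result = 0;
  for (std::size_t i = 0; i < sizeof(uint64_t); ++i) {
    result |= static_cast<uint64_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
  }
  *value = result;
  return true;
}
```

A fixed-width encoding keeps every stored value and every operand the same size, which is convenient for counters.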

Now, suppose we are going to host Counters as a service. Given the number of cores in servers nowadays, our service is almost certainly multithreaded. If the threads are not partitioned by key space, multiple Add requests for the same counter may be picked up by different threads and executed concurrently. If we also have a strict consistency requirement (missing an update is not acceptable), we would have to wrap Add with external synchronization, a lock of some sort. The overhead adds up.
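The lost-update problem can be made concrete without real threads by replaying the unlucky interleaving by hand. The sketch below uses a toy in-memory map in place of the database; ToyStore, RacyDoubleAdd and LockedCounters are hypothetical names invented for this illustration and are not part of RocksDB.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Toy in-memory stand-in for the database (an assumption, not the RocksDB API).
struct ToyStore {
  std::map<std::string, uint64_t> data;
  uint64_t Get(const std::string& key) const {
    auto it = data.find(key);
    return it == data.end() ? 0 : it->second;  // a missing counter counts as zero
  }
  void Put(const std::string& key, uint64_t value) { data[key] = value; }
};

// Replays two unsynchronized Add(key, 1) calls in the order that loses an update:
// both callers read the old value before either one writes.
uint64_t RacyDoubleAdd(ToyStore& store, const std::string& key) {
  uint64_t seen_by_a = store.Get(key);  // caller A reads
  uint64_t seen_by_b = store.Get(key);  // caller B reads the same value
  store.Put(key, seen_by_a + 1);        // caller A writes
  store.Put(key, seen_by_b + 1);        // caller B overwrites A's update
  return store.Get(key);
}

// The external synchronization the text mentions: one lock around read-modify-write.
class LockedCounters {
 public:
  void Add(const std::string& key, uint64_t delta) {
    std::lock_guard<std::mutex> guard(mu_);
    store_.Put(key, store_.Get(key) + delta);
  }
  uint64_t Get(const std::string& key) {
    std::lock_guard<std::mutex> guard(mu_);
    return store_.Get(key);
  }

 private:
  std::mutex mu_;
  ToyStore store_;
};
```

Starting from an empty store, RacyDoubleAdd leaves the counter at 1 rather than 2. LockedCounters avoids that, at the price of serializing every Add behind one mutex.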

What if RocksDB directly supported the Add functionality? We might then come up with something like this:

    virtual void Add(const string& key, uint64_t value) {
      string serialized = Serialize(value);
      db->Add(add_option, key, serialized);
    }

This seems reasonable for Counters. But not everything you store in RocksDB is a counter. Say we need to track the locations a user has been to. We could store a (serialized) list of locations as the value of a user key. Adding a new location to the existing list would be a common operation. We might want an Append operation in this case: db->Append(user_key, serialize(new_location)). This suggests that the semantics of the read-modify-write operation are really determined by the client's value type. To keep the library generic, we had better abstract out this operation and allow the client to specify the semantics. That brings us to our proposal: Merge.
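The idea of letting the client supply the semantics can be sketched with a toy store that is handed a merge function at construction time. Everything here (ToyMergeStore, MergeFn, AddDecimal, AppendLocation, and the choice of decimal strings and comma-separated lists) is a hypothetical illustration, not the RocksDB interface, which is introduced further down.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Client-supplied semantics: (existing value or nullptr, operand) -> new value.
using MergeFn = std::function<std::string(const std::string* existing,
                                          const std::string& operand)>;

// Toy store that applies the client's function on every Merge call.
class ToyMergeStore {
 public:
  explicit ToyMergeStore(MergeFn fn) : merge_(std::move(fn)) {}

  void Put(const std::string& key, const std::string& value) { data_[key] = value; }

  void Merge(const std::string& key, const std::string& operand) {
    auto it = data_.find(key);
    const std::string* existing = (it == data_.end()) ? nullptr : &it->second;
    std::string merged = merge_(existing, operand);
    data_[key] = std::move(merged);
  }

  bool Get(const std::string& key, std::string* value) const {
    auto it = data_.find(key);
    if (it == data_.end()) return false;
    *value = it->second;
    return true;
  }

 private:
  MergeFn merge_;
  std::map<std::string, std::string> data_;
};

// Counter semantics over decimal strings; a missing value counts as zero.
// Inputs are assumed to be valid non-negative decimal numbers.
std::string AddDecimal(const std::string* existing, const std::string& operand) {
  unsigned long long base = existing ? std::stoull(*existing) : 0;
  return std::to_string(base + std::stoull(operand));
}

// List semantics: append the operand to a comma-separated list.
std::string AppendLocation(const std::string* existing, const std::string& operand) {
  return existing ? *existing + "," + operand : operand;
}
```

The same store class serves both use cases; only the function passed to the constructor differs. Unlike this toy, which applies the function eagerly, the real library is free to defer the work.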

What

We have developed a generic Merge operation as a new first-class operation in RocksDB to capture the read-modify-write semantics.

This Merge operation:

  • Encapsulates the semantics for read-modify-write into a simple abstract interface.

  • Allows the user to avoid incurring extra cost from repeated Get() calls.

  • Performs back-end optimizations in deciding when/how to combine the operands without changing the underlying semantics.

  • Can, in some cases, amortize the cost over all incremental updates to provide asymptotic increases in efficiency.

How to Use It

In the following sections, the client-specific code changes are explained. We also provide a brief outline of how to use Merge.
It is assumed that the reader already knows how to use classic RocksDB (or LevelDB), including:

  • The DB class (including construction, DB::Put(), DB::Get(), and DB::Delete())

  • The Options class (and how to specify database options upon creation)

  • Knowledge that all keys/values written to the database are simple strings of bytes.

Overview of the Interface

We have defined a new interface/abstract-base-class: MergeOperator.
It exposes some functions telling RocksDB how to combine incremental update operations (called "merge operands") with base-values (Put/Delete).
These functions can also be used to tell RocksDB how to combine merge operands with each other to form new merge operands (called "Partial" or "Associative" merging).

For simplicity, we will temporarily ignore this concept of Partial vs. non-Partial merging.
To that end, we have provided a separate interface called AssociativeMergeOperator, which encapsulates and hides all of the details around partial merging.
For most simple applications (such as our 64-bit Counters example above), this will suffice.

So the reader should assume for now that all merging is handled via the AssociativeMergeOperator interface.
Here is the public interface:

    // The Associative Merge Operator interface.
    // Client needs to provide an object implementing this interface.
    // Essentially, this class specifies the SEMANTICS of a merge, which only
    // client knows. It could be numeric addition, list append, string
    // concatenation, ... , anything.
    // The library, on the other hand, is concerned with the exercise of this
    // interface, at the right time (during get, iteration, compaction...)
    class AssociativeMergeOperator : public MergeOperator {
     public:
      virtual ~AssociativeMergeOperator() {}

      // Gives the client a way to express the read -> modify -> write semantics
      // key:           (IN) The key that's associated with this merge operation.
      // existing_value:(IN) null indicates the key does not exist before this op
      // value:         (IN) the value to update/merge the existing_value with
      // new_value:    (OUT) Client is responsible for filling the merge result here
      // logger:        (IN) Client could use this to log errors during merge.
      //
      // Return true on success. Return false on failure / error / corruption.
      virtual bool Merge(const Slice& key,
                         const Slice* existing_value,
                         const Slice& value,
                         std::string* new_value,
                         Logger* logger) const = 0;

      // The name of the MergeOperator. Used to check for MergeOperator
      // mismatches (i.e., a DB created with one MergeOperator is
      // accessed using a different MergeOperator)
      virtual const char* Name() const = 0;

     private:
      ...
    };

Some Notes:

  • AssociativeMergeOperator is a sub-class of a class called MergeOperator. We will see later that the more generic MergeOperator class can be more powerful in certain cases. The AssociativeMergeOperator we use here is, on the other hand, a simpler interface.

  • existing_value could be nullptr. This is useful when the Merge operation is the first operation on a key. nullptr indicates that the 'existing' value does not exist. This leaves it to the client to interpret the semantics of a merge operation without a pre-value. The client can do whatever is reasonable. For example, Counters::Add assumes a zero value if none exists.

  • We pass in the key so that the client can multiplex the merge operator based on it, if the key space is partitioned and different subspaces refer to different types of data with different merge semantics. For example, the client might choose to store the current balance (a number) of a user account under the key "BAL:uid" and the history of the account activities (a list) under the key "HIS:uid", in the same DB. (Whether or not this is a good practice is debatable.) For the current balance, numeric addition is a perfect merge operator; for the activity history, we would need a list append. Thus, by passing the key back to the Merge callback, we allow the client to differentiate between the two types.

Example:

    void Merge(...) {
      if (key.starts_with("BAL:")) {
        NumericAddition(...);
      } else if (key.starts_with("HIS:")) {
        ListAppend(...);
      }
    }
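The dispatch in the example above can be written out as a small standalone function. This is only a sketch under stated assumptions: it uses std::string instead of Slice, stores balances as decimal strings and histories as ';'-separated lists, and the names StartsWith and MultiplexedMerge are hypothetical.

```cpp
#include <string>

// True if s begins with prefix.
bool StartsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Chooses the merge semantics from the key prefix.
// Returns false for key spaces this operator does not recognise.
// Balance values and operands are assumed to be valid decimal numbers.
bool MultiplexedMerge(const std::string& key, const std::string* existing,
                      const std::string& operand, std::string* new_value) {
  if (StartsWith(key, "BAL:")) {
    // Balance: numeric addition; a missing value counts as 0.
    long long base = existing ? std::stoll(*existing) : 0;
    *new_value = std::to_string(base + std::stoll(operand));
    return true;
  }
  if (StartsWith(key, "HIS:")) {
    // History: append the operand to the existing list.
    *new_value = existing ? *existing + ";" + operand : operand;
    return true;
  }
  return false;
}
```

For instance, merging "-30" into an existing "100" under "BAL:42" yields "70", while merging "withdraw" into "deposit" under "HIS:42" yields "deposit;withdraw".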

Other Changes to the client-visible interface

To use Merge in an application, the client must first define a class which inherits from the AssociativeMergeOperator interface (or the MergeOperator interface, as we will see later).
This class should implement the functions of the interface, which RocksDB will call at the appropriate time, whenever it needs to apply merging. In this way, the merge semantics are completely client-specified.

After defining this class, the user needs a way to tell RocksDB to use this merge operator for its merges. We have introduced additional fields/methods to the DB class and the Options class for this purpose:

    // In addition to Get(), Put(), and Delete(), the DB class now also has an additional method: Merge().
    class DB {
      ...
      // Merge the database entry for "key" with "value". Returns OK on success,
      // and a non-OK status on error. The semantics of this operation is
      // determined by the user provided merge_operator when opening DB.
      // Returns Status::NotSupported if DB does not have a merge_operator.
      virtual Status Merge(
        const WriteOptions& options,
        const Slice& key,
        const Slice& value) = 0;
      ...
    };

    struct Options {
      ...
      // REQUIRES: The client must provide a merge operator if Merge operation
      // needs to be accessed. Calling Merge on a DB without a merge operator
      // would result in Status::NotSupported. The client must ensure that the
      // merge operator supplied here has the same name and *exactly* the same
      // semantics as the merge operator provided to previous open calls on
      // the same DB. The only exception is reserved for upgrade, where a DB
      // previously without a merge operator is introduced to Merge operation
      // for the first time. It's necessary to specify a merge operator when
      // opening the DB in this case.
      // Default: nullptr
      std::shared_ptr<MergeOperator> merge_operator;
      ...
    };

Note: The Options::merge_operator field is defined as a shared-pointer to a MergeOperator. As specified above, AssociativeMergeOperator inherits from MergeOperator, so it is okay to specify an AssociativeMergeOperator here. This is the approach used in the following example.

Client code change:

Given the above interface change, the client can implement a version of Counters that directly utilizes the built-in Merge operation.

Counters v2:

    // A 'model' merge operator with uint64 addition semantics
    class UInt64AddOperator : public AssociativeMergeOperator {
     public:
      virtual bool Merge(
        const Slice& key,
        const Slice* existing_value,
        const Slice& value,
        std::string* new_value,
        Logger* logger) const override {

        // assuming 0 if no existing value
        uint64_t existing = 0;
        if (existing_value) {
          if (!Deserialize(*existing_value, &existing)) {
            // if existing_value is corrupted, treat it as 0
            Log(logger, "existing value corruption");
            existing = 0;
          }
        }

        uint64_t oper;
        if (!Deserialize(value, &oper)) {
          // if operand is corrupted, treat it as 0
          Log(logger, "operand value corruption");
          oper = 0;
        }

        uint64_t sum = existing + oper;
        *new_value = Serialize(sum);
        return true;        // always return true for this, since we treat all errors as "zero".
      }

      virtual const char* Name() const override {
        return "UInt64AddOperator";
      }
    };

    // Implement 'add' directly with the new Merge operation
    class MergeBasedCounters : public RocksCounters {
     public:
      MergeBasedCounters(std::shared_ptr<DB> db);

      // mapped to a RocksDB Merge operation
      virtual void Add(const string& key, uint64_t value) override {
        string serialized = Serialize(value);
        db_->Merge(merge_option_, key, serialized);
      }
    };

    // How to use it
    DB* dbp;
    Options options;
    options.merge_operator.reset(new UInt64AddOperator);
    DB::Open(options, "/tmp/db", &dbp);
    std::shared_ptr<DB> db(dbp);
    MergeBasedCounters counters(db);
    counters.Add("a", 1);
    ...
    uint64_t v;
    counters.Get("a", &v);

The user interface change is relatively small, and the RocksDB back-end takes care of the rest.

Associativity vs. Non-Associativity

Up until now, we have used the relatively simple example of maintaining a database of counters. It turns out that the aforementioned AssociativeMergeOperator interface is generally pretty good for handling many use-cases such as this. For instance, if you wanted to maintain a set of strings with an "append" operation, what we've seen so far could easily be adapted to handle that as well.

So, why are these cases considered "simple"? Well, implicitly, we have assumed something about the data: associativity. This means we have assumed that:

  • The values that are Put() into the RocksDB database have the same format as the merge operands called with Merge(); and

  • It is okay to combine multiple merge operands into a single merge operand using the same user-specified merge operator.

For example, look at the Counters case. The RocksDB database internally stores each value as a serialized 8-byte integer. So, when the client calls Counters::Set (corresponding to a DB::Put()), the argument is exactly in that format. Similarly, when the client calls Counters::Add (corresponding to a DB::Merge()), the merge operand is also a serialized 8-byte integer. This means that, in the client's UInt64AddOperator, the *existing_value may have corresponded to the original Put(), or it may have corresponded to a merge operand; it doesn't really matter! In all cases, as long as the *existing_value and value are given, the UInt64AddOperator behaves in the same way: it adds them together and computes the *new_value. In turn, this *new_value may be fed into the merge operator later, upon subsequent merge calls.
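The two assumptions can be checked on the counter case with plain integers. The sketch below compares applying operands to the base one at a time against first collapsing the operands into a single operand and then applying it once; the function names are hypothetical and serialization is left out to keep the point visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The counter merge function: the same code serves base values and operands.
uint64_t MergeAdd(uint64_t existing, uint64_t operand) { return existing + operand; }

// Apply every operand to the base value, one at a time.
uint64_t ApplyOneByOne(uint64_t base, const std::vector<uint64_t>& operands) {
  uint64_t current = base;
  for (uint64_t op : operands) current = MergeAdd(current, op);
  return current;
}

// First combine the operands with each other, then apply the result once.
uint64_t ApplyPreCombined(uint64_t base, const std::vector<uint64_t>& operands) {
  if (operands.empty()) return base;
  uint64_t combined = operands[0];
  for (std::size_t i = 1; i < operands.size(); ++i) {
    combined = MergeAdd(combined, operands[i]);
  }
  return MergeAdd(base, combined);
}
```

Because addition is associative and operands share the format of stored values, both functions return the same result for any input, which is what lets a library combine operands early without changing what the client observes.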

By contrast, it turns out that RocksDB merge can be used in more powerful ways than this. For example, suppose we wanted our database to store a set of json strings (such as PHP arrays or objects). Within the database, we would want them to be stored and retrieved as fully formatted json strings, but we might want the "update" operation to correspond to updating a property of the json object. So we might be able to write code like:

    ...
    // Put/store the json string into the database
    db_->Put(put_option_, "json_obj_key",
             "{ employees: [ {first_name: john, last_name: doe}, {first_name: adam, last_name: smith}] }");

    ...

    // Use a pre-defined "merge operator" to incrementally update the value of the json string
    db_->Merge(merge_option_, "json_obj_key", "employees[1].first_name = lucy");
    db_->Merge(merge_option_, "json_obj_key", "employees[0].last_name = dow");

In the above pseudo-code, we see that the data would be stored in RocksDB as a json string (corresponding to the original Put()), but when the client wants to update the value, a "javascript-like" assignment-statement string is passed as the merge-operand. The database would store all of these strings as-is, and would expect the user's merge operator to be able to handle them.

Now, the AssociativeMergeOperator model cannot handle this, simply because it assumes the associativity constraints mentioned above. That is, in this case, we have to distinguish between the base-values (json strings) and the merge-operands (the assignment statements); and we also don't have an (intuitive) way of combining the merge-operands into a single merge-operand. So this use-case does not fit into our "associative" merge model. That is where the generic MergeOperator interface becomes useful.

The Generic MergeOperator interface

The MergeOperator interface is designed to support generality and also to exploit some of the key ways in which RocksDB operates in order to provide an efficient solution for "incremental updates". As noted above in the json example, the base-value types (Put() into the database) may be formatted completely differently from the merge operands that are used to update them. Also, we will see that it is sometimes beneficial to exploit the fact that some merge operands can be combined to form a single merge operand, while others cannot. It all depends on the client's specific semantics. The MergeOperator interface gives the client a relatively simple way of providing these semantics.

    // The Merge Operator
    //
    // Essentially, a MergeOperator specifies the SEMANTICS of a merge, which only
    // client knows. It could be numeric addition, list append, string
    // concatenation, edit data structure, ... , anything.
    // The library, on the other hand, is concerned with the exercise of this
    // interface, at the right time (during get, iteration, compaction...)
    class MergeOperator {
     public:
      virtual ~MergeOperator() {}

      // Gives the client a way to express the read -> modify -> write semantics
      // key:         (IN) The key that's associated with this merge operation.
      // existing:    (IN) null indicates that the key does not exist before this op
      // operand_list:(IN) the sequence of merge operations to apply, front() first.
      // new_value:  (OUT) Client is responsible for filling the merge result here
      // logger:      (IN) Client could use this to log errors during merge.
      //
      // Return true on success. Return false on failure / error / corruption.
      virtual bool FullMerge(const Slice& key,
                             const Slice* existing_value,
                             const std::deque<std::string>& operand_list,
                             std::string* new_value,
                             Logger* logger) const = 0;

      struct MergeOperationInput { ... };
      struct MergeOperationOutput { ... };
      virtual bool FullMergeV2(const MergeOperationInput& merge_in,
                               MergeOperationOutput* merge_out) const;

      // This function performs merge(left_op, right_op)
      // when both the operands are themselves merge operation types.
      // Save the result in *new_value and return true. If it is impossible
      // or infeasible to combine the two operations, return false instead.
      virtual bool PartialMerge(const Slice& key,
                                const Slice& left_operand,
                                const Slice& right_operand,
                                std::string* new_value,
                                Logger* logger) const = 0;

      // The name of the MergeOperator. Used to check for MergeOperator
      // mismatches (i.e., a DB created with one MergeOperator is
      // accessed using a different MergeOperator)
      virtual const char* Name() const = 0;

      // Determines whether the MergeOperator can be called with just a single
      // merge operand.
      // Override and return true for allowing a single operand. FullMergeV2 and
      // PartialMerge/PartialMergeMulti should be implemented accordingly to handle
      // a single operand.
      virtual bool AllowSingleOperand() const { return false; }
    };

Some Notes:

  • MergeOperator has two methods, FullMerge() and PartialMerge(). The first method is used when a Put/Delete is the *existing_value (or nullptr). The latter method is used to combine two merge operands (if possible).
  • AssociativeMergeOperator simply inherits from MergeOperator and provides private default implementations of these methods, while exposing a wrapper function for simplicity.
  • In MergeOperator, the "FullMerge()" function takes in an *existing_value and a sequence (std::deque) of merge operands, rather than a single operand. We explain below.

How do these methods work?

At a high level, it should be noted that a call to DB::Put() or DB::Merge() does not necessarily force the value to be computed or the merge to occur immediately. RocksDB will more-or-less lazily decide when to actually apply the operations (e.g. the next time the user calls Get(), or when the system decides to do its clean-up process called "Compaction"). This means that, when the MergeOperator is actually invoked, it may have several "stacked" operands that need to be applied. Hence, the MergeOperator::FullMerge() function is given an *existing_value and a list of operands that have been stacked. The MergeOperator should then apply the operands one-by-one (or in whatever optimized way the client decides, so long as the final *new_value is computed as if the operands were applied one-by-one).
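The lazy behaviour described above can be modelled with a toy store that only records operations and does the actual merging inside Get(). This is a conceptual sketch, not how RocksDB is implemented: LazyStore, Entry and FullMergeAppend are hypothetical names, and list-append semantics were picked arbitrarily.

```cpp
#include <deque>
#include <map>
#include <string>
#include <vector>

// Toy full merge with list-append semantics: base value first, then each operand.
bool FullMergeAppend(const std::string* existing,
                     const std::deque<std::string>& operands,
                     std::string* new_value) {
  new_value->clear();
  if (existing) *new_value = *existing;
  for (const auto& op : operands) {
    if (!new_value->empty()) new_value->push_back(',');
    new_value->append(op);
  }
  return true;
}

struct Entry {
  enum Kind { kPut, kDelete, kMerge } kind;
  std::string payload;
};

// Writes only append to a per-key log; nothing is merged until Get().
class LazyStore {
 public:
  void Put(const std::string& key, const std::string& value) {
    log_[key].push_back({Entry::kPut, value});
  }
  void Delete(const std::string& key) {
    log_[key].push_back({Entry::kDelete, std::string()});
  }
  void Merge(const std::string& key, const std::string& operand) {
    log_[key].push_back({Entry::kMerge, operand});
  }

  bool Get(const std::string& key, std::string* value) const {
    auto it = log_.find(key);
    if (it == log_.end()) return false;
    std::deque<std::string> operands;
    const std::string* base = nullptr;
    // Walk from newest to oldest, stacking operands until a Put/Delete is found.
    for (auto e = it->second.rbegin(); e != it->second.rend(); ++e) {
      if (e->kind == Entry::kMerge) {
        operands.push_front(e->payload);  // keep oldest operand at the front
        continue;
      }
      if (e->kind == Entry::kPut) base = &e->payload;
      break;  // a Put or Delete ends the search
    }
    if (operands.empty()) {
      if (!base) return false;  // deleted, with nothing merged afterwards
      *value = *base;
      return true;
    }
    return FullMergeAppend(base, operands, value);
  }

 private:
  std::map<std::string, std::vector<Entry>> log_;
};
```

After Put("k", "a"), Merge("k", "b") and Merge("k", "c"), a Get("k") hands the base "a" and the stacked operands {"b", "c"} to the full merge and returns "a,b,c".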

Partial Merge vs. Stacking

Sometimes, it may be useful to start combining the merge-operands as soon as the system encounters them, instead of stacking them. The MergeOperator::PartialMerge() function is supplied for this case. If the client-specified operator can logically handle "combining" two merge-operands into a single operand, the semantics for doing so should be provided in this method, which should then return true. If it is not logically possible, then it should simply leave *new_value unchanged and return false.

Conceptually, when the library decides to begin its stacking and applying process, it first tries to apply the client-specified PartialMerge() on each pair of operands it encounters. Whenever this returns false, it will instead resort to stacking, until it finds a Put/Delete base-value, in which case it will call the FullMerge() function, passing the operands as a list parameter. Generally speaking, this final FullMerge() call should return true. It should really only return false if there is some form of corruption or bad data.
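A simplified model of "try PartialMerge, otherwise stack" is sketched below. It assumes assignment-style operands of the form "target = value" and combines two adjacent operands only when they write to the same target. The pairing strategy shown (always comparing against the most recently stacked operand) is an assumption made for illustration; the order in which RocksDB actually attempts partial merges is an internal detail not described here.

```cpp
#include <deque>
#include <string>

// Left-hand side of an operand such as "a.b = 5" (whole string if no " = ").
std::string TargetOf(const std::string& operand) {
  return operand.substr(0, operand.find(" = "));
}

// Combine two operands only if they assign to the same target; the later one wins.
// Leaves *new_value unchanged and returns false otherwise.
bool PartialMergeSameTarget(const std::string& left, const std::string& right,
                            std::string* new_value) {
  if (TargetOf(left) != TargetOf(right)) return false;
  *new_value = right;
  return true;
}

// Collapse neighbours where the partial merge succeeds; stack the rest in order.
std::deque<std::string> CollapseOperands(const std::deque<std::string>& operands) {
  std::deque<std::string> stacked;
  for (const auto& op : operands) {
    std::string combined;
    if (!stacked.empty() &&
        PartialMergeSameTarget(stacked.back(), op, &combined)) {
      stacked.back() = combined;  // two operands became one
    } else {
      stacked.push_back(op);      // cannot combine: keep stacking
    }
  }
  return stacked;
}
```

Given {"x = 1", "x = 2", "y = 3", "x = 4"}, the first two operands collapse into "x = 2" and the remaining ones stay stacked, leaving three operands for the eventual full merge.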

How AssociativeMergeOperator fits in

As alluded to above, AssociativeMergeOperator inherits from MergeOperator and allows the client to specify a single merge function. It overrides PartialMerge() and FullMerge() to use this AssociativeMergeOperator::Merge(). That single function is then used both for combining operands and when a base-value is encountered. That is why it only works under the "associativity" assumptions described above (and it also explains the name).
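The relationship can be sketched with toy classes in which both generic methods delegate to one Merge() function. The Toy* classes are hypothetical simplifications (std::string instead of Slice, no Logger, no key) and do not reproduce the library's actual implementation.

```cpp
#include <deque>
#include <string>
#include <utility>

// Toy version of the generic interface.
class ToyMergeOperator {
 public:
  virtual ~ToyMergeOperator() = default;
  virtual bool FullMerge(const std::string* existing,
                         const std::deque<std::string>& operands,
                         std::string* new_value) const = 0;
  virtual bool PartialMerge(const std::string& left, const std::string& right,
                            std::string* new_value) const = 0;
};

// Toy associative operator: the client writes Merge() only.
class ToyAssociativeMergeOperator : public ToyMergeOperator {
 public:
  virtual bool Merge(const std::string* existing, const std::string& value,
                     std::string* new_value) const = 0;

  // Fold the operands into the base value, oldest first, using Merge().
  bool FullMerge(const std::string* existing,
                 const std::deque<std::string>& operands,
                 std::string* new_value) const override {
    std::string current;
    const std::string* base = existing;
    for (const auto& op : operands) {
      std::string next;
      if (!Merge(base, op, &next)) return false;
      current = std::move(next);
      base = &current;
    }
    if (base) {
      *new_value = *base;
    } else {
      new_value->clear();
    }
    return true;
  }

  // Treat the left operand as if it were an existing value.
  bool PartialMerge(const std::string& left, const std::string& right,
                    std::string* new_value) const override {
    return Merge(&left, right, new_value);
  }
};

// Example client: addition over decimal strings (inputs assumed valid).
class ToyAddOperator : public ToyAssociativeMergeOperator {
 public:
  bool Merge(const std::string* existing, const std::string& value,
             std::string* new_value) const override {
    unsigned long long base = existing ? std::stoull(*existing) : 0;
    *new_value = std::to_string(base + std::stoull(value));
    return true;
  }
};
```

PartialMerge() passing an operand where Merge() expects an existing value is exactly the step that requires operands and stored values to share one format.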

When to allow a single merge operand

Typically a merge operator is invoked only if there are at least two merge operands to act on. Override AllowSingleOperand() to return true if you need the merge operator to be invoked even with a single operand. An example use case is using the merge operator to change the value based on a TTL so that it can be dropped during later compactions (perhaps by a compaction filter).

JSON Example

Using our generic MergeOperator interface, we now have the ability to implement the json example.

    // A 'model' pseudo-code merge operator with json update semantics
    // We pretend we have some in-memory data-structure (called JsonDataStructure) for
    // parsing and serializing json strings.
    class JsonMergeOperator : public MergeOperator {          // not associative
     public:
      virtual bool FullMerge(const Slice& key,
                             const Slice* existing_value,
                             const std::deque<std::string>& operand_list,
                             std::string* new_value,
                             Logger* logger) const override {
        JsonDataStructure obj;
        if (existing_value) {
          obj.ParseFrom(existing_value->ToString());
          // Check validity here, where existing_value is known to be non-null.
          if (obj.IsInvalid()) {
            Log(logger, "Invalid json string after parsing: %s",
                existing_value->ToString().c_str());
            return false;
          }
        }

        for (const auto& value : operand_list) {
          auto split_vector = Split(value, " = ");      // "xyz[0] = 5" might return ["xyz[0]", 5] as an std::vector, etc.
          obj.SelectFromHierarchy(split_vector[0]) = split_vector[1];
          if (obj.IsInvalid()) {
            Log(logger, "Invalid json after parsing operand: %s", value.c_str());
            return false;
          }
        }

        obj.SerializeTo(new_value);
        return true;
      }


      // Partial-merge two operands if and only if the two operands
      // both update the same value. If so, take the "later" operand.
      virtual bool PartialMerge(const Slice& key,
                                const Slice& left_operand,
                                const Slice& right_operand,
                                std::string* new_value,
                                Logger* logger) const override {
        auto split_vector1 = Split(left_operand, " = ");   // "xyz[0] = 5" might return ["xyz[0]", 5] as an std::vector, etc.
        auto split_vector2 = Split(right_operand, " = ");

        // If the two operations update the same value, just take the later one.
        if (split_vector1[0] == split_vector2[0]) {
          new_value->assign(right_operand.data(), right_operand.size());
          return true;
        } else {
          return false;
        }
      }

      virtual const char* Name() const override {
        return "JsonMergeOperator";
      }
    };

    ...

    // How to use it
    DB* dbp;
    Options options;
    options.merge_operator.reset(new JsonMergeOperator);
    DB::Open(options, "/tmp/db", &dbp);
    std::shared_ptr<DB> db_(dbp);
    ...
    // Put/store the json string into the database
    db_->Put(put_option_, "json_obj_key",
             "{ employees: [ {first_name: john, last_name: doe}, {first_name: adam, last_name: smith}] }");

    ...

    // Use the "merge operator" to incrementally update the value of the json string
    db_->Merge(merge_option_, "json_obj_key", "employees[1].first_name = lucy");
    db_->Merge(merge_option_, "json_obj_key", "employees[0].last_name = dow");

Error Handling

If MergeOperator::PartialMerge() returns false, this signals to RocksDB that the merge should be deferred (the operands stay stacked) until it finds a Put/Delete value to FullMerge() with.
However, if FullMerge() returns false, this is treated as "corruption" or an error. RocksDB will usually reply to the client with a Status::Corruption message or something similar.
Hence, MergeOperator::FullMerge() should return false only if there is absolutely no robust way of handling the error within the client logic itself.
(See the JsonMergeOperator example.)

For AssociativeMergeOperator, the Merge() method follows the same error-handling rules as MergeOperator::FullMerge(): return false only if there is no logical way of dealing with the values. In the Counters example above, our Merge() always returns true, since we can interpret any bad value as 0.
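
As a minimal sketch of this rule, the function below mirrors the shape of an associative Merge() for counters and never reports failure. It is not the wiki's Counters code: the decimal-string encoding and the names DecodeOrZero and MergeCounter are assumptions made for illustration, and the sketch uses only the standard library so that it compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: parse a decimal uint64_t. Any malformed input decodes
// as 0. Inputs longer than 19 digits are treated as malformed in this sketch,
// because 19 digits always fit in a uint64_t.
uint64_t DecodeOrZero(const std::string& s) {
  if (s.empty() || s.size() > 19) return 0;
  uint64_t v = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return 0;
    v = v * 10 + static_cast<uint64_t>(c - '0');
  }
  return v;
}

// Same shape as an associative merge: `existing` may be null when the key has
// no value yet. Always returns true, because a bad value is interpreted as 0.
bool MergeCounter(const std::string* existing, const std::string& operand,
                  std::string* new_value) {
  assert(new_value != nullptr);
  const uint64_t base = existing ? DecodeOrZero(*existing) : 0;
  *new_value = std::to_string(base + DecodeOrZero(operand));
  return true;
}
```

Because a malformed operand simply contributes 0, a read of the key never surfaces a corruption status for this kind of error; whether that trade-off is acceptable depends on the application.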

Get Merge Operands

This API fetches all the merge operands associated with a key. Its main motivation is to support performance-sensitive use cases in which a full online merge is not necessary. The API is available from version 6.4.0.

Example use cases:

  1. Storing a KV pair where V is a collection of sorted integers; new values may be appended to the collection, and users subsequently want to search for a value in it.
    Example KV:
    Key: "Some-Key" Value: [2], [3,4,5], [21,100], [1,6,8,9]
    To store such a KV pair, users would typically call the Merge API as:
    a. db->Merge(WriteOptions(), "Some-Key", "2");
    b. db->Merge(WriteOptions(), "Some-Key", "3,4,5");
    c. db->Merge(WriteOptions(), "Some-Key", "21,100");
    d. db->Merge(WriteOptions(), "Some-Key", "1,6,8,9");
    and implement a Merge Operator that simply converts the Value to [2,3,4,5,21,100,1,6,8,9] on a Get API call, and then search in the resulting value. In such a case, doing the merge online is unnecessary: simply returning all the operands [2], [3,4,5], [21,100] and [1,6,8,9] and then searching through the sub-lists is faster, saves CPU, and achieves the same outcome.

  2. Updating a subset of columns and reading a subset of columns: imagine a SQL table in which a row is encoded as a KV pair. If there are many columns and users update only one of them, the merge operator can be used to reduce write amplification. When a read query touches only one or two columns, this feature avoids a full merge of the whole row and saves some CPU.

  3. Updating very few attributes in a value that is a JSON-like document: updating one attribute can be done efficiently using the merge operator, and reading back one attribute can be done more efficiently if we don't need to do a full merge.
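
The idea in use case 1 can be sketched without RocksDB: given the operands as they would be returned (one string per Merge call), search each sub-list directly instead of building the merged list first. OperandContains and FindInOperands are hypothetical names, the comma-separated format is assumed from the example above, and operands are assumed to be well-formed (std::stoll throws on non-numeric tokens).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Returns true if `target` appears in a comma-separated operand such as "3,4,5".
bool OperandContains(const std::string& operand, int64_t target) {
  std::istringstream in(operand);
  std::string token;
  while (std::getline(in, token, ',')) {
    if (!token.empty() && std::stoll(token) == target) return true;
  }
  return false;
}

// Scans each operand in place; no merged list is ever built. On success,
// *operand_index is set to the position of the sub-list holding `target`.
bool FindInOperands(const std::vector<std::string>& operands, int64_t target,
                    std::size_t* operand_index) {
  assert(operand_index != nullptr);
  for (std::size_t i = 0; i < operands.size(); ++i) {
    if (OperandContains(operands[i], target)) {
      *operand_index = i;
      return true;
    }
  }
  return false;
}
```

With the operands "2", "3,4,5", "21,100" and "1,6,8,9", a search for 100 stops at the third sub-list and never touches the fourth.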

 API: 
  // Returns all the merge operands corresponding to the key. If the
  // number of merge operands in DB is greater than
  // merge_operands_options.expected_max_number_of_operands
  // no merge operands are returned and status is Incomplete. Merge operands
  // returned are in the order of insertion.
  // merge_operands- Points to an array of at-least
  //             merge_operands_options.expected_max_number_of_operands and the
  //             caller is responsible for allocating it. If the status
  //             returned is Incomplete then number_of_operands will contain
  //             the total number of merge operands found in DB for key.
  virtual Status GetMergeOperands(
      const ReadOptions& options, ColumnFamilyHandle* column_family,
      const Slice& key, PinnableSlice* merge_operands,
      GetMergeOperandsOptions* get_merge_operands_options,
      int* number_of_operands) = 0;

  Example: 
  int size = 100;
  int number_of_operands = 0;
  std::vector<PinnableSlice> values(size);
  GetMergeOperandsOptions merge_operands_info;
  merge_operands_info.expected_max_number_of_operands = size;
  db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), &merge_operands_info,
                        &number_of_operands);

The above API returns all the merge operands corresponding to the key. If the number of merge operands in the DB is greater than merge_operands_options.expected_max_number_of_operands, no merge operands are returned and the status is Incomplete. Merge operands are returned in the order of insertion.
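
The Incomplete contract suggests a caller pattern: try with a guessed capacity and, if the status is Incomplete, retry with the reported count. The sketch below models only that contract with the standard library. FetchStatus, FetchOperands and FetchAll are hypothetical stand-ins, not RocksDB code, and the model assumes the count is also reported on success.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class FetchStatus { kOk, kIncomplete };

// Stand-in for the documented contract: fills `out` only when all operands fit
// within `capacity`, and always reports the total number of operands found.
FetchStatus FetchOperands(const std::vector<std::string>& stored, int capacity,
                          std::vector<std::string>* out,
                          int* number_of_operands) {
  assert(out != nullptr && number_of_operands != nullptr);
  *number_of_operands = static_cast<int>(stored.size());
  out->clear();
  if (*number_of_operands > capacity) return FetchStatus::kIncomplete;
  *out = stored;  // insertion order is preserved
  return FetchStatus::kOk;
}

// Caller pattern: start with a guess, then retry once with the reported count.
std::vector<std::string> FetchAll(const std::vector<std::string>& stored,
                                  int initial_capacity) {
  std::vector<std::string> out;
  int count = 0;
  if (FetchOperands(stored, initial_capacity, &out, &count) ==
      FetchStatus::kIncomplete) {
    FetchOperands(stored, count, &out, &count);
  }
  return out;
}
```

Against a live database, more operands could be written between the two calls, so a real caller would probably loop or cap the number of retries rather than retry exactly once.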

DB Bench has a benchmark that uses Example 1 to demonstrate the performance difference between doing an online merge and then operating on the collection, and simply returning the sub-lists and operating on them. To run the benchmark, use the command:

./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist
The merge_operator used above sorts the data across all the sub-lists in the online merge case, which happens automatically when the Get API is called.

Review and Best Practices

Altogether, we have described the Merge Operator and how to use it. Here are a couple of tips on when and how to use MergeOperator and AssociativeMergeOperator, depending on the use case.

When to use merge

If the following are true:

  • You have data that needs to be incrementally updated.
  • You would usually need to read the data before knowing what the new value would be.

Then use one of the two Merge operators as specified in this wiki.

Associative Data

If the following are true:

  • Your merge operands are formatted the same as your Put values, AND
  • It is okay to combine multiple operands into one (as long as they are in the same order)

Then use AssociativeMergeOperator.

Generic Merge

If either of the two associativity constraints does not hold, then use MergeOperator.

If it is sometimes, but not always, okay to combine multiple operands into one:

  • Use MergeOperator
  • Have the PartialMerge() function return true in cases where the operands can be combined.
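
A standalone sketch of such a "sometimes combinable" rule, modeled on the JsonMergeOperator's PartialMerge() above: two operands of the form "path = value" are combined only when they assign to the same path. TargetPath and PartialMergeAssignments are hypothetical names, and std::string stands in for Slice so that the sketch needs only the standard library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the text before " = ", or an empty string when
// the separator is missing.
std::string TargetPath(const std::string& operand) {
  const std::string::size_type pos = operand.find(" = ");
  return pos == std::string::npos ? std::string() : operand.substr(0, pos);
}

// Combines two operands only when both assign to the same path; the later
// (right) operand wins. Returning false asks the caller to keep both operands.
bool PartialMergeAssignments(const std::string& left, const std::string& right,
                             std::string* new_value) {
  assert(new_value != nullptr);
  const std::string left_path = TargetPath(left);
  if (left_path.empty() || left_path != TargetPath(right)) return false;
  *new_value = right;
  return true;
}
```

Returning false here is not an error. It only tells the caller that these two operands must stay separate until a full merge can apply them in order.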

Tips

Multiplexing: While a RocksDB DB object can be given only one merge operator at the time of construction, your user-defined merge operator class can behave differently depending on the data passed to it. The key, as well as the values themselves, are passed to the merge operator, so one can encode different "operations" in the operands themselves and have the MergeOperator perform different functions accordingly.
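
One way to picture multiplexing is an operand that carries its own operation tag. The sketch below assumes a made-up operand format "<op>:<number>" (for example "add:5" or "max:7") and a hypothetical ApplyOperand function; neither comes from RocksDB, and a real operator would apply the same dispatch inside its merge method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <exception>
#include <string>

// Applies one tagged operand to `current`. Returns false for an unknown tag or
// an unparsable number. Integer overflow is not handled in this sketch.
bool ApplyOperand(int64_t current, const std::string& operand,
                  int64_t* result) {
  assert(result != nullptr);
  const std::string::size_type colon = operand.find(':');
  if (colon == std::string::npos) return false;
  const std::string op = operand.substr(0, colon);
  int64_t arg = 0;
  try {
    arg = std::stoll(operand.substr(colon + 1));
  } catch (const std::exception&) {
    return false;  // not a number, or out of range
  }
  if (op == "add") {
    *result = current + arg;
    return true;
  }
  if (op == "max") {
    *result = std::max(current, arg);
    return true;
  }
  return false;  // unknown operation tag
}
```

Note that mixing operations such as "add" and "max" is generally order-dependent, so an operator built this way would likely need the generic MergeOperator rather than the associative one.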

Is my use-case Associative?: If you are unsure whether the "associativity" constraints apply to your use case, you can ALWAYS use the generic MergeOperator. The AssociativeMergeOperator is a direct subclass of MergeOperator, so any use case that can be solved with the AssociativeMergeOperator can be solved with the more generic MergeOperator. The AssociativeMergeOperator is mostly provided for convenience.

Useful Links

  • [[Merge+Compaction Implementation Details|Merge-Operator-Implementation]]: For RocksDB engineers who want to know how MergeOperator affects their code.
