Let us start with the RPC (Remote Procedure Call) problem in distributed systems. A complete RPC module can be divided into three layers:

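  • Service layer: the RPC interface definition and its implementation
  • Protocol layer: the RPC message format and the data encoding format
  • Transport layer: the underlying communication (such as sockets) and system-dependent facilities (such as event loops and multithreading)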
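
The layering can be sketched in a few lines of Python. This is only a toy under stated assumptions: the names `CalculatorService`, `JsonProtocol`, and `LoopbackTransport` are hypothetical, JSON stands in for the wire format, and an in-process function call stands in for a socket.

```python
import json


class CalculatorService:
    """Service layer: the RPC interface implementation (hypothetical example)."""

    def add(self, a, b):
        return a + b


class JsonProtocol:
    """Protocol layer: message format and data encoding (JSON in this sketch)."""

    @staticmethod
    def encode_call(method, args):
        return json.dumps({"method": method, "args": args}).encode("utf-8")

    @staticmethod
    def decode_call(data):
        msg = json.loads(data.decode("utf-8"))
        return msg["method"], msg["args"]

    @staticmethod
    def encode_result(value):
        return json.dumps({"result": value}).encode("utf-8")

    @staticmethod
    def decode_result(data):
        return json.loads(data.decode("utf-8"))["result"]


class LoopbackTransport:
    """Transport layer: moves bytes. A real transport would wrap a socket."""

    def __init__(self, handler):
        self._handler = handler  # callable: request bytes -> response bytes

    def round_trip(self, request_bytes):
        return self._handler(request_bytes)


def serve(service, request_bytes):
    """Server side: decode the request, call the service, encode the reply."""
    method, args = JsonProtocol.decode_call(request_bytes)
    result = getattr(service, method)(*args)
    return JsonProtocol.encode_result(result)


def call(transport, method, *args):
    """Client side: encode the call, send it, decode the reply."""
    reply = transport.round_trip(JsonProtocol.encode_call(method, list(args)))
    return JsonProtocol.decode_result(reply)


service = CalculatorService()
transport = LoopbackTransport(lambda data: serve(service, data))
print(call(transport, "add", 2, 3))
```

Because each layer only talks to its neighbor, the encoding or the byte transport could be swapped without touching `CalculatorService`.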

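In real large-scale distributed systems, different services are often implemented in different languages, so an RPC system usually supports cross-language procedure calls; for example, client code written in C++ can remotely call a service implemented in Java. There are two ways to implement cross-language RPC: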

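  • Static code generation: developers define the RPC interfaces and data types in an intermediate language (an IDL, interface definition language), and a compiler generates code for the target languages (such as C++, Java, Python). The generated code takes care of the RPC protocol and transport layers. For example, if the service is implemented in C++, the server side generates C++ code that implements the protocol and transport layers, and the service layer uses that code to communicate with clients; if the client is written in Python, the client side generates Python code.
  • A dynamic type system based on introspection: the protocol and transport layers are implemented only once, as a library in a single language, but that language must be tied to a dynamic type system with introspection or reflection, and the library exposes bindings for other languages. Clients and servers use RPC through those bindings. For example, one could implement an RPC library in C with GObject and then provide bindings for other languages through GObject.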

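The advantage of the first approach is that the protocol and transport implementation is not tied to a dynamic type system (such as GObject), and it avoids run-time type checks and conversions, so it is relatively efficient. Its drawback is that the protocol and transport layers must be implemented separately for every language. The main difficulty of the second approach lies in implementing the language bindings and a general object serialization mechanism; efficiency also has to be considered.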
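
The trade-off can be illustrated with a small Python sketch. It is not how Thrift or GObject work internally; `EchoService`, `dispatch_generated`, and `dispatch_reflective` are hypothetical names. The first dispatcher mimics the shape of generated code (one hard-coded branch per method, no run-time type inspection), while the second looks the method up and checks argument types at run time through introspection.

```python
import inspect


class EchoService:
    """Hypothetical service used only for illustration."""

    def repeat(self, text: str, times: int) -> str:
        return text * times


def dispatch_generated(service, method, args):
    """Shape of generated code: one hard-coded branch per declared method."""
    if method == "repeat":
        text, times = args
        return service.repeat(text, times)
    raise ValueError("unknown method: " + method)


def dispatch_reflective(service, method, args):
    """Introspection-based dispatch: discover the method and check types at run time."""
    func = getattr(service, method, None)
    if func is None or method.startswith("_"):
        raise ValueError("unknown method: " + method)
    params = list(inspect.signature(func).parameters.values())
    if len(params) != len(args):
        raise TypeError("wrong number of arguments")
    for param, value in zip(params, args):
        expected = param.annotation
        if expected is not inspect.Parameter.empty and not isinstance(value, expected):
            raise TypeError(f"{param.name} must be {expected.__name__}")
    return func(*args)


svc = EchoService()
print(dispatch_generated(svc, "repeat", ["ab", 2]))
print(dispatch_reflective(svc, "repeat", ["ab", 2]))
```

The reflective version needs no per-method code, but every call pays for the lookup and the type checks, which is the efficiency concern mentioned above.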

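Thrift is a cross-language RPC stack based on static code generation. It can generate code for mainstream languages including C++, Java, Python, Ruby, and PHP. The generated code, together with the Thrift runtime library for each language, provides the RPC protocol and transport layers, so users can concentrate on calling and implementing services. Cassandra's service access protocol is built on Thrift.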


Introduction to Thrift and Its Usage

Thrift consists of five main parts:

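  • Type system and IDL compiler: generates interface code in the target language from a user-supplied IDL file.
  • TProtocol: implements the RPC protocol layer; several object serialization formats are available, such as JSON and Binary.
  • TTransport: implements the RPC transport layer; several implementations are available as well, such as a socket, a non-blocking socket, and a MemoryBuffer.
  • TProcessor: the link between the protocol layer and the user-supplied service implementation; it invokes the methods of the service implementation.
  • TServer: ties the TProtocol, TTransport, and TProcessor objects together.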

All five components are implemented as per-language libraries in the Thrift source tree, under the lib directory. Before using Thrift, get familiar with the interfaces of the library for your language.
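
How these roles fit together can be sketched with toy stand-ins. The classes below (`ToyTransport`, `ToyJsonProtocol`, `ToyBinaryProtocol`, `ToyProcessor`, `ToyServer`) are hypothetical and are not the Thrift library API; the wire formats are invented for this sketch. The point is only that the processor and the handler stay the same while the protocol is swapped.

```python
import json
import struct


class ToyTransport:
    """Plays the TTransport role: an in-memory byte buffer instead of a socket."""

    def __init__(self, incoming=b""):
        self.incoming = incoming
        self.outgoing = b""

    def read_all(self):
        data, self.incoming = self.incoming, b""
        return data

    def write(self, data):
        self.outgoing += data


class ToyJsonProtocol:
    """Plays the TProtocol role with a JSON encoding."""

    def write_call(self, transport, name, args):
        transport.write(json.dumps({"name": name, "args": args}).encode("utf-8"))

    def read_call(self, transport):
        msg = json.loads(transport.read_all().decode("utf-8"))
        return msg["name"], msg["args"]

    def write_reply(self, transport, value):
        transport.write(json.dumps(value).encode("utf-8"))

    def read_reply(self, data):
        return json.loads(data.decode("utf-8"))


class ToyBinaryProtocol:
    """Same role, different encoding: length-prefixed name plus big-endian i32 values."""

    def write_call(self, transport, name, args):
        raw = name.encode("ascii")
        transport.write(struct.pack(">B", len(raw)) + raw)
        transport.write(struct.pack(">B", len(args)))
        transport.write(struct.pack(">%di" % len(args), *args))

    def read_call(self, transport):
        data = transport.read_all()
        n = data[0]
        name = data[1:1 + n].decode("ascii")
        count = data[1 + n]
        return name, list(struct.unpack_from(">%di" % count, data, 2 + n))

    def write_reply(self, transport, value):
        transport.write(struct.pack(">i", value))

    def read_reply(self, data):
        return struct.unpack(">i", data)[0]


class ToyProcessor:
    """Plays the TProcessor role: reads a call and invokes the user's handler."""

    def __init__(self, handler):
        self.handler = handler

    def process(self, protocol, transport):
        name, args = protocol.read_call(transport)
        protocol.write_reply(transport, getattr(self.handler, name)(*args))


class ToyServer:
    """Plays the TServer role: ties processor, protocol, and transport together."""

    def __init__(self, processor, protocol):
        self.processor = processor
        self.protocol = protocol

    def handle(self, request_bytes):
        transport = ToyTransport(request_bytes)
        self.processor.process(self.protocol, transport)
        return transport.outgoing


class Adder:
    """User-supplied service implementation (hypothetical)."""

    def add(self, a, b):
        return a + b


def call_add(protocol, a, b):
    server = ToyServer(ToyProcessor(Adder()), protocol)
    request = ToyTransport()
    protocol.write_call(request, "add", [a, b])
    return protocol.read_reply(server.handle(request.outgoing))


for protocol in (ToyJsonProtocol(), ToyBinaryProtocol()):
    print(type(protocol).__name__, call_add(protocol, 2, 3))
```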

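Usage examples can be found in the tutorial directory of the Thrift source tree, which contains ready-made examples of using the generated code in various languages; for instance, the java/ subdirectory holds Java client and server code that uses Thrift. tutorial.thrift is the IDL file for the examples. Assuming Thrift is installed, the following command generates the Java interface code: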

thrift --gen java tutorial.thrift

Cassandra's Thrift Interface

IDL Interface Definition

Cassandra's Thrift interface is defined in interface/cassandra.thrift in the source tree.

#!/usr/local/bin/thrift --java --php --py
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Interface definition for Cassandra Service
namespace java org.apache.cassandra.service
namespace cpp org.apache.cassandra
namespace csharp Apache.Cassandra
namespace py cassandra
namespace php cassandra
namespace perl Cassandra
# Thrift.rb has a bug where top-level modules that include modules 
# with the same name are not properly referenced, so we can't do
# Cassandra::Cassandra::Client.
namespace rb CassandraThrift
# constants
# for clients checking that server and it have same thrift definitions.
# no promises are made other than "if both are equal, you're good."
# in particular, don't try to parse numeric information out and assume
# that a "greater" version is a superset of a "smaller" one.
const string VERSION = "0.5.1"
# data structures
/**
 * Basic unit of data within a ColumnFamily.
 * @param name. A column name can act both as structure (a label) or as data (like value). Regardless, the name of the column
 *        is used as a key to its value.
 * @param value. Some data
 * @param timestamp. Used to record when data was sent to be written.
 */
struct Column {
   1: required binary name,
   2: required binary value,
   3: required i64 timestamp,
}
/**
 * A named list of columns.
 * @param name. see
 * @param columns. A collection of standard Columns. The columns within a super column are defined in an adhoc manner.
 *        Columns within a super column do not have to have matching structures (similarly named child columns).
 */
struct SuperColumn {
   1: required binary name,
   2: required list<Column> columns,
}
/**
    Methods for fetching rows/records from Cassandra will return either a single instance of ColumnOrSuperColumn or a list
    of ColumnOrSuperColumns (get_slice()). If you're looking up a SuperColumn (or list of SuperColumns) then the resulting
    instances of ColumnOrSuperColumn will have the requested SuperColumn in the attribute super_column. For queries resulting
    in Columns, those values will be in the attribute column. This change was made between 0.3 and 0.4 to standardize on
    single query methods that may return either a SuperColumn or Column.

    @param column. The Column returned by get() or get_slice().
    @param super_column. The SuperColumn returned by get() or get_slice().
 */
struct ColumnOrSuperColumn {
    1: optional Column column,
    2: optional SuperColumn super_column,
}
# Exceptions
# (note that internal server errors will raise a TApplicationException, courtesy of Thrift)
/** A specific column was requested that does not exist. */
exception NotFoundException {
}
/** Invalid request could mean keyspace or column family does not exist, required parameters are missing, or a parameter is malformed.
    why contains an associated error message.
*/
exception InvalidRequestException {
    1: required string why
}
/** Not all the replicas required could be created and/or read. */
exception UnavailableException {
}
/** RPC timeout was exceeded. either a node failed mid-operation, or load was too high, or the requested op was too large. */
exception TimedOutException {
}
# service api
/**
 * The ConsistencyLevel is an enum that controls both read and write behavior based on <ReplicationFactor> in your
 * storage-conf.xml. The different consistency levels have different meanings, depending on if you're doing a write or read
 * operation. Note that if W + R > ReplicationFactor, where W is the number of nodes to block for on write, and R
 * the number to block for on reads, you will have strongly consistent behavior; that is, readers will always see the most
 * recent write. Of these, the most interesting is to do QUORUM reads and writes, which gives you consistency while still
 * allowing availability in the face of node failures up to half of <ReplicationFactor>. Of course if latency is more
 * important than consistency then you can use lower values for either or both.
 *
 * Write:
 *      ZERO    Ensure nothing. A write happens asynchronously in background
 *      ONE     Ensure that the write has been written to at least 1 node's commit log and memory table before responding to the client.
 *      QUORUM  Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes before responding to the client.
 *      ALL     Ensure that the write is written to <code>&lt;ReplicationFactor&gt;</code> nodes before responding to the client.
 *
 * Read:
 *      ZERO    Not supported, because it doesn't make sense.
 *      ONE     Will return the record returned by the first node to respond. A consistency check is always done in a
 *              background thread to fix any consistency issues when ConsistencyLevel.ONE is used. This means subsequent
 *              calls will have correct data even if the initial read gets an older value. (This is called 'read repair'.)
 *      QUORUM  Will query all storage nodes and return the record with the most recent timestamp once it has at least a
 *              majority of replicas reported. Again, the remaining replicas will be checked in the background.
 *      ALL     Not yet supported, but we plan to eventually.
 */
enum ConsistencyLevel {
    ZERO = 0,
    ONE = 1,
    QUORUM = 2,
    DCQUORUM = 3,
    ALL = 5,
}
/**
    ColumnParent is used when selecting groups of columns from the same ColumnFamily. In directory structure terms, imagine
    ColumnParent as ColumnPath + '/../'.

    See also <a href="cassandra.html#Struct_ColumnPath">ColumnPath</a>
 */
struct ColumnParent {
    3: required string column_family,
    4: optional binary super_column,
}
/**
 * The ColumnPath is the path to a single column in Cassandra. It might make sense to think of ColumnPath and
 * ColumnParent in terms of a directory structure.
 *
 * ColumnPath is used to looking up a single column.
 *
 * @param column_family. The name of the CF of the column being looked up.
 * @param super_column. The super column name.
 * @param column. The column name.
 */
struct ColumnPath {
    3: required string column_family,
    4: optional binary super_column,
    5: optional binary column,
}
/**
    A slice range is a structure that stores basic range, ordering and limit information for a query that will return
    multiple columns. It could be thought of as Cassandra's version of LIMIT and ORDER BY

    @param start. The column name to start the slice with. This attribute is not required, though there is no default value,
                  and can be safely set to '', i.e., an empty byte array, to start with the first column name. Otherwise, it
                  must a valid value under the rules of the Comparator defined for the given ColumnFamily.
    @param finish. The column name to stop the slice at. This attribute is not required, though there is no default value,
                   and can be safely set to an empty byte array to not stop until 'count' results are seen. Otherwise, it
                   must also be a value value to the ColumnFamily Comparator.
    @param reversed. Whether the results should be ordered in reversed order. Similar to ORDER BY blah DESC in SQL.
    @param count. How many keys to return. Similar to LIMIT 100 in SQL. May be arbitrarily large, but Thrift will
                  materialize the whole result into memory before returning it to the client, so be aware that you may
                  be better served by iterating through slices by passing the last value of one call in as the 'start'
                  of the next instead of increasing 'count' arbitrarily large.
 */
struct SliceRange {
    1: required binary start,
    2: required binary finish,
    3: required bool reversed=0,
    4: required i32 count=100,
}
/**
    A SlicePredicate is similar to a mathematic predicate (see, which is described as "a property that the elements of a
    set have in common."

    SlicePredicate's in Cassandra are described with either a list of column_names or a SliceRange. If column_names is
    specified, slice_range is ignored.

    @param column_name. A list of column names to retrieve. This can be used similar to Memcached's "multi-get" feature
                        to fetch N known column names. For instance, if you know you wish to fetch columns 'Joe', 'Jack',
                        and 'Jim' you can pass those column names as a list to fetch all three at once.
    @param slice_range. A SliceRange describing how to range, order, and/or limit the slice.
 */
struct SlicePredicate {
    1: optional list<binary> column_names,
    2: optional SliceRange   slice_range,
}
/**
    A KeySlice is key followed by the data it maps to. A collection of KeySlice is returned by the get_range_slice operation.

    @param key. a row key
    @param columns. List of data represented by the key. Typically, the list is pared down to only the columns specified by
                    a SlicePredicate.
 */
struct KeySlice {
    1: required string key,
    2: required list<ColumnOrSuperColumn> columns,
}
service Cassandra {
  # retrieval methods
  /** Get the Column or SuperColumn at the given column_path. If no value is present, NotFoundException is thrown. (This is the only method that can throw an exception under non-failure conditions.) */
  ColumnOrSuperColumn get(1:required string keyspace,
                          2:required string key,
                          3:required ColumnPath column_path,
                          4:required ConsistencyLevel consistency_level=1)
                      throws (1:InvalidRequestException ire, 2:NotFoundException nfe, 3:UnavailableException ue, 4:TimedOutException te),
  /** Get the group of columns contained by column_parent (either a ColumnFamily name or a ColumnFamily/SuperColumn name pair) specified by the given SlicePredicate. If no matching values are found, an empty list is returned. */
  list<ColumnOrSuperColumn> get_slice(1:required string keyspace, 
                                      2:required string key, 
                                      3:required ColumnParent column_parent, 
                                      4:required SlicePredicate predicate, 
                                      5:required ConsistencyLevel consistency_level=1)
                            throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
  /** Perform a get for column_path in parallel on the given list<string> keys. The return value maps keys to the ColumnOrSuperColumn found. If no value corresponding to a key is present, the key will still be in the map, but both the column and super_column references of the ColumnOrSuperColumn object it maps to will be null. */
  map<string,ColumnOrSuperColumn> multiget(1:required string keyspace, 
                                           2:required list<string> keys, 
                                           3:required ColumnPath column_path, 
                                           4:required ConsistencyLevel consistency_level=1)
                                  throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
  /** Performs a get_slice for column_parent and predicate for the given keys in parallel. */
  map<string,list<ColumnOrSuperColumn>> multiget_slice(1:required string keyspace, 
                                                       2:required list<string> keys, 
                                                       3:required ColumnParent column_parent, 
                                                       4:required SlicePredicate predicate, 
                                                       5:required ConsistencyLevel consistency_level=1)
                                        throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
  /** returns the number of columns for a particular <code>key</code> and <code>ColumnFamily</code> or <code>SuperColumn</code>. */
  i32 get_count(1:required string keyspace, 
                2:required string key, 
                3:required ColumnParent column_parent, 
                4:required ConsistencyLevel consistency_level=1)
      throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
  /** @deprecated; use get_range_slice instead */
  list<string> get_key_range(1:required string keyspace, 
                             2:required string column_family, 
                             3:required string start="", 
                             4:required string finish="", 
                             5:required i32 count=100,
                             6:required ConsistencyLevel consistency_level=1)
               throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
  /** returns a subset of columns for a range of keys. */
  list<KeySlice> get_range_slice(1:required string keyspace, 
                                 2:required ColumnParent column_parent, 
                                 3:required SlicePredicate predicate,
                                 4:required string start_key="", 
                                 5:required string finish_key="", 
                                 6:required i32 row_count=100, 
                                 7:required ConsistencyLevel consistency_level=1)
                 throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
  # modification methods
  /** Insert a Column consisting of (column_path.column, value, timestamp) at the given column_path.column_family and optional column_path.super_column. Note that column_path.column is here required, since a SuperColumn cannot directly contain binary values -- it can only contain sub-Columns. */
  void insert(1:required string keyspace, 
              2:required string key, 
              3:required ColumnPath column_path, 
              4:required binary value, 
              5:required i64 timestamp, 
              6:required ConsistencyLevel consistency_level=0)
       throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
  /** Insert Columns or SuperColumns across different Column Families for the same row key. batch_mutation is a map<string, list<ColumnOrSuperColumn>> -- a map which pairs column family names with the relevant ColumnOrSuperColumn objects to insert. */
  void batch_insert(1:required string keyspace, 
                    2:required string key, 
                    3:required map<string, list<ColumnOrSuperColumn>> cfmap, 
                    4:required ConsistencyLevel consistency_level=0)
       throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
  /** Remove data from the row specified by key at the granularity specified by column_path, and the given timestamp. Note that all the values in column_path besides column_path.column_family are truly optional: you can remove the entire row by just specifying the ColumnFamily, or you can remove a SuperColumn or a single Column by specifying those levels too. */
  void remove(1:required string keyspace,
              2:required string key, 
              3:required ColumnPath column_path,
              4:required i64 timestamp,
              5:ConsistencyLevel consistency_level=0)
       throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
  // Meta-APIs -- APIs to get information about the node or cluster,
  // rather than user data. The nodeprobe program provides usage examples.
  /** get property whose value is of type string. */
  string get_string_property(1:required string property),
  /** get property whose value is list of strings. */
  list<string> get_string_list_property(1:required string property),
  /** describe specified keyspace */
  map<string, map<string, string>> describe_keyspace(1:required string keyspace)
                                   throws (1:NotFoundException nfe),
}
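
The arithmetic in the ConsistencyLevel comment (QUORUM blocks for ReplicationFactor / 2 + 1 nodes, and W + R > ReplicationFactor gives strongly consistent reads) can be checked with a short Python sketch. The function names are hypothetical, levels are passed as plain strings, DCQUORUM is left out because the comment does not define its node count, and the sketch only does the counting; it does not model which levels the server accepts for reads.

```python
def quorum(replication_factor):
    """QUORUM blocks for ReplicationFactor / 2 + 1 nodes (integer division)."""
    return replication_factor // 2 + 1


def nodes_blocked_for(level, replication_factor):
    """Number of replicas a request waits for at the given level name."""
    if level == "ZERO":
        return 0
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported level: " + level)


def is_strongly_consistent(write_level, read_level, replication_factor):
    """True when W + R > ReplicationFactor, so every read overlaps the latest write."""
    w = nodes_blocked_for(write_level, replication_factor)
    r = nodes_blocked_for(read_level, replication_factor)
    return w + r > replication_factor


for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(w, r, is_strongly_consistent(w, r, 3))
```

With a replication factor of 3, QUORUM is 2 nodes, so QUORUM writes plus QUORUM reads touch 4 > 3 replicas and must overlap, while ONE plus ONE touches only 2 and may miss the latest write.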

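The SliceRange comment recommends iterating through slices by passing the last value of one call as the 'start' of the next, rather than making 'count' very large. A minimal Python sketch of that loop follows. It does not call Cassandra: `fake_get_slice` is a hypothetical in-memory stand-in that returns sorted column names, and it assumes 'start' is inclusive, so each later page repeats one name that has to be dropped.

```python
def fake_get_slice(columns, start, count):
    """Stand-in for get_slice: sorted names >= start (inclusive), at most count.

    An empty start means "begin at the first column", as in SliceRange.
    """
    names = sorted(columns)
    if start != b"":
        names = [n for n in names if n >= start]
    return names[:count]


def iter_columns(columns, page_size):
    """Yield every column name by chaining bounded slices."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start = b""
    first_page = True
    while True:
        page = fake_get_slice(columns, start, page_size)
        # After the first page, the first entry repeats the previous 'start'.
        fresh = page if first_page else page[1:]
        for name in fresh:
            yield name
        if len(page) < page_size:
            return  # a short page means the end of the row was reached
        start = page[-1]
        first_page = False


names = [b"a", b"b", b"c", b"d", b"e", b"f", b"g"]
print(list(iter_columns(names, 3)))
```

Each request returns at most `page_size` names, so the client never has to hold the whole row in memory at once.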

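The Java client code that calls this interface is in the src/java/org/apache/cassandra/cli/ directory of the source tree, and the code that implements the service interface is in the src/java/org/apache/cassandra/service/ directory.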

