Cassandra-0.7.0-beta1 was released a while ago. Today I pulled the code and skimmed through it, and found the following main changes:
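1 Keyspaces and ColumnFamilies in the data model can be modified dynamically:
In earlier versions, changing a Keyspace or ColumnFamily meant stopping Cassandra, editing the configuration file, and then restarting Cassandra before the change took effect.
In the current version, we only need to define the new Keyspace and ColumnFamily and then call the Thrift interface to send those definitions to Cassandra.
The relevant struct and interface definitions can be found in the cassandra.thrift file: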
/* Relevant struct definitions. */

/* describes a column in a column family. */
struct ColumnDef {
    1: required binary name,
    2: required string validation_class,
    3: optional IndexType index_type,
    4: optional string index_name
}

/* describes a column family. */
struct CfDef {
    1: required string keyspace,
    2: required string name,
    3: optional string column_type="Standard",
    4: optional string clock_type="Timestamp",
    5: optional string comparator_type="BytesType",
    6: optional string subcomparator_type="",
    7: optional string reconciler="",
    8: optional string comment="",
    9: optional double row_cache_size=0,
    10: optional bool preload_row_cache=0,
    11: optional double key_cache_size=200000,
    12: optional double read_repair_chance=1.0
    13: optional list<ColumnDef> column_metadata
    14: optional i32 gc_grace_seconds
}

/* describes a keyspace. */
struct KsDef {
    1: required string name,
    2: required string strategy_class,
    3: optional map<string,string> strategy_options,
    4: required i32 replication_factor,
    5: required list<CfDef> cf_defs,
}

/* Relevant interface definitions. */

/** adds a column family. returns the new schema id. */
string system_add_column_family(1:required CfDef cf_def)
    throws (1:InvalidRequestException ire),

/** drops a column family. returns the new schema id. */
string system_drop_column_family(1:required string column_family)
    throws (1:InvalidRequestException ire),

/** renames a column family. returns the new schema id. */
string system_rename_column_family(1:required string old_name, 2:required string new_name)
    throws (1:InvalidRequestException ire),

/** adds a keyspace and any column families that are part of it. returns the new schema id. */
string system_add_keyspace(1:required KsDef ks_def)
    throws (1:InvalidRequestException ire),

/** drops a keyspace and any column families that are part of it. returns the new schema id. */
string system_drop_keyspace(1:required string keyspace)
    throws (1:InvalidRequestException ire),

/** renames a keyspace. returns the new schema id. */
string system_rename_keyspace(1:required string old_name, 2:required string new_name)
    throws (1:InvalidRequestException ire),
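To make the flow concrete, here is a minimal Python sketch of a live schema change. It is a toy model only: `ToySchemaServer` is a hypothetical in-memory stand-in that I made up for illustration, not the real Cassandra client or the Thrift-generated code. The dataclasses mirror only a few fields of the structs quoted from cassandra.thrift, the strategy class string is a placeholder, and the returned schema id is simulated with a random UUID string.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CfDef:
    """Mirrors a few fields of the Thrift CfDef struct (not all of them)."""
    keyspace: str
    name: str
    column_type: str = "Standard"
    comparator_type: str = "BytesType"


@dataclass
class KsDef:
    """Mirrors the required fields of the Thrift KsDef struct."""
    name: str
    strategy_class: str
    replication_factor: int
    cf_defs: List[CfDef] = field(default_factory=list)


class InvalidRequestException(Exception):
    """Stand-in for the Thrift exception of the same name."""


class ToySchemaServer:
    """Hypothetical in-memory stand-in for a node that accepts schema changes."""

    def __init__(self) -> None:
        # keyspace name -> {column family name -> definition}
        self.keyspaces: Dict[str, Dict[str, CfDef]] = {}

    def system_add_keyspace(self, ks_def: KsDef) -> str:
        if ks_def.name in self.keyspaces:
            raise InvalidRequestException("keyspace already exists")
        self.keyspaces[ks_def.name] = {cf.name: cf for cf in ks_def.cf_defs}
        return str(uuid.uuid4())  # simulated "new schema id"

    def system_add_column_family(self, cf_def: CfDef) -> str:
        cfs = self.keyspaces.get(cf_def.keyspace)
        if cfs is None:
            raise InvalidRequestException("unknown keyspace")
        if cf_def.name in cfs:
            raise InvalidRequestException("column family already exists")
        cfs[cf_def.name] = cf_def
        return str(uuid.uuid4())  # simulated "new schema id"


# No restart and no config file edit: the definitions are simply sent over.
server = ToySchemaServer()
server.system_add_keyspace(
    KsDef("Demo", "ExampleStrategy", 1, [CfDef("Demo", "Users")])
)
server.system_add_column_family(CfDef("Demo", "Events"))
```

With the real server, the same two calls would go through a Thrift client generated from cassandra.thrift; the point here is only the order of operations.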
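2 Secondary indexes added, making it possible to query by Column value:
Like almost all K/V systems, Cassandra could only query by key. If we wanted to find the entries under a key whose value equals a specific value, we had to either fetch all the data and iterate over it, or use some other scheme to improve query efficiency and avoid a full scan. Examples: my earlier article 《反转Cassandra索引》 (Reversing the Cassandra Index), and a project called Lucandra.
To use secondary indexes in the new version, you specify in the ColumnFamily which Column should be indexed, together with the index type (currently only IndexType.KEYS is supported).
When a ColumnFamily that contains indexes is created, Cassandra additionally creates a separate IndexedColumnFamily for each Column in it that needs an index.
When data is written, it is stored not only in the ColumnFamily the data belongs to; the IndexedColumnFamily also stores all data relevant to that index.
When data is queried by index, Cassandra looks up the corresponding data directly in the IndexedColumnFamily.
The relevant struct and interface definitions can be found in the cassandra.thrift file: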
/* Relevant struct definitions. */

enum IndexType {
    KEYS,
}

/* describes a column in a column family. */
struct ColumnDef {
    1: required binary name,
    2: required string validation_class,
    3: optional IndexType index_type,
    4: optional string index_name
}

/* Relevant interface definitions. */

/** Returns the subset of columns specified in SlicePredicate for the rows matching the IndexClause */
list<KeySlice> get_indexed_slices(1:required ColumnParent column_parent,
                                  2:required IndexClause index_clause,
                                  3:required SlicePredicate column_predicate,
                                  4:required ConsistencyLevel consistency_level=ONE)
    throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
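The write path and query path of a KEYS index, as described earlier in this section, can be modelled roughly as follows. This is a simplified in-memory sketch under my own assumptions, not Cassandra's implementation: `ToyIndexedColumnFamily` and its methods are hypothetical names, values are plain strings, and each index is just a map from a column value to the set of row keys that hold it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ToyIndexedColumnFamily:
    """Toy model: one data store plus one value->keys index per indexed column."""

    def __init__(self, indexed_columns: Iterable[str]) -> None:
        # row key -> {column name -> value}
        self.rows: Dict[str, Dict[str, str]] = {}
        # indexed column name -> {value -> set of row keys}
        self.indexes: Dict[str, Dict[str, Set[str]]] = {
            column: defaultdict(set) for column in indexed_columns
        }

    def insert(self, key: str, column: str, value: str) -> None:
        """Write to the data rows and, if the column is indexed, to its index."""
        row = self.rows.setdefault(key, {})
        if column in self.indexes:
            old_value = row.get(column)
            if old_value is not None:
                # Drop the stale index entry before adding the new one.
                self.indexes[column][old_value].discard(key)
            self.indexes[column][value].add(key)
        row[column] = value

    def get_indexed_keys(self, column: str, value: str) -> List[str]:
        """Answer 'which rows have column == value' from the index alone."""
        if column not in self.indexes:
            raise KeyError(f"column {column!r} is not indexed")
        return sorted(self.indexes[column].get(value, set()))


users = ToyIndexedColumnFamily(indexed_columns=["state"])
users.insert("u1", "state", "UT")
users.insert("u2", "state", "UT")
users.insert("u3", "state", "NY")
users.insert("u1", "name", "alice")  # not indexed, stored in the row only
```

Calling `users.get_indexed_keys("state", "UT")` then returns the matching row keys without scanning `users.rows`, which is the benefit over the fetch-everything-and-iterate approach.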
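3 Configuration file format changed
The new version of Cassandra uses the YAML format for configuration; the benefit is better readability.
We can compare the two formats using the option that sets the cluster name:
Old version (storage-conf.xml):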
<!--
~ The name of this cluster. This is mainly used to prevent machines in
~ one logical cluster from joining another.
-->
<ClusterName>Test Cluster</ClusterName>
New version (cassandra.yaml):
# name of the cluster
cluster_name: 'Test Cluster'
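As a small illustration of how the two formats map onto each other, the sketch below reads the cluster name from an old-style XML fragment and prints the equivalent YAML line. It uses only the Python standard library. `cluster_name_to_yaml` is a hypothetical helper written for this post, it handles just this one option, and the `<Storage>` wrapper element in the sample input is an assumption about the surrounding file.

```python
import xml.etree.ElementTree as ET


def cluster_name_to_yaml(storage_conf_xml: str) -> str:
    """Convert the ClusterName element of an XML config into a YAML line."""
    root = ET.fromstring(storage_conf_xml)
    # Accept either a bare <ClusterName> element or one nested in a wrapper.
    element = root if root.tag == "ClusterName" else root.find(".//ClusterName")
    if element is None or element.text is None:
        raise ValueError("no ClusterName element found")
    name = element.text.strip()
    # Inside a single-quoted YAML scalar, a quote is escaped by doubling it.
    escaped = name.replace("'", "''")
    return f"cluster_name: '{escaped}'"


old_config = """
<Storage>
  <!-- The name of this cluster. -->
  <ClusterName>Test Cluster</ClusterName>
</Storage>
"""
print(cluster_name_to_yaml(old_config))
```

A real migration would need to cover every option in storage-conf.xml; this only shows the shape of the change for one setting.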
Beyond these, there are many other changes:
0.7-beta1
* sstable versioning (CASSANDRA-389)
* switched to slf4j logging (CASSANDRA-625)
* add (optional) expiration time for column (CASSANDRA-699)
* access levels for authentication/authorization (CASSANDRA-900)
* add ReadRepairChance to CF definition (CASSANDRA-930)
* fix heisenbug in system tests, especially common on OS X (CASSANDRA-944)
* convert to byte[] keys internally and all public APIs (CASSANDRA-767)
* ability to alter schema definitions on a live cluster (CASSANDRA-44)
* renamed configuration file to cassandra.xml, and log4j.properties to
log4j-server.properties, which must now be loaded from
the classpath (which is how our scripts in bin/ have always done it)
(CASSANDRA-971)
* change get_count to require a SlicePredicate. create multi_get_count
(CASSANDRA-744)
* re-organized endpointsnitch implementations and added SimpleSnitch
(CASSANDRA-994)
* Added preload_row_cache option (CASSANDRA-946)
* add CRC to commitlog header (CASSANDRA-999)
* removed deprecated batch_insert and get_range_slice methods (CASSANDRA-1065)
* add truncate thrift method (CASSANDRA-531)
* http mini-interface using mx4j (CASSANDRA-1068)
* optimize away copy of sliced row on memtable read path (CASSANDRA-1046)
* replace constant-size 2GB mmaped segments and special casing for index
entries spanning segment boundaries, with SegmentedFile that computes
segments that always contain entire entries/rows (CASSANDRA-1117)
* avoid reading large rows into memory during compaction (CASSANDRA-16)
* added hadoop OutputFormat (CASSANDRA-1101)
* efficient Streaming (no more anticompaction) (CASSANDRA-579)
* split commitlog header into separate file and add size checksum to
mutations (CASSANDRA-1179)
* avoid allocating a new byte[] for each mutation on replay (CASSANDRA-1219)
* revise HH schema to be per-endpoint (CASSANDRA-1142)
* add joining/leaving status to nodetool ring (CASSANDRA-1115)
* allow multiple repair sessions per node (CASSANDRA-1190)
* optimize away MessagingService for local range queries (CASSANDRA-1261)
* make framed transport the default so malformed requests can't OOM the
server (CASSANDRA-475)
* significantly faster reads from row cache (CASSANDRA-1267)
* take advantage of row cache during range queries (CASSANDRA-1302)
* make GCGraceSeconds a per-ColumnFamily value (CASSANDRA-1276)
* keep persistent row size and column count statistics (CASSANDRA-1155)
* add IntegerType (CASSANDRA-1282)
* page within a single row during hinted handoff (CASSANDRA-1327)
* push DatacenterShardStrategy configuration into keyspace definition,
eliminating datacenter.properties. (CASSANDRA-1066)
* optimize forward slices starting with '' and single-index-block name
queries by skipping the column index (CASSANDRA-1338)
* streaming refactor (CASSANDRA-1189)
* faster comparison for UUID types (CASSANDRA-1043)
* secondary index support (CASSANDRA-749 and subtasks)
More articles about Cassandra: http://www.cnblogs.com/gpcuster/tag/Cassandra/