====16 Feb 2012, by Bright Zheng (IT进行时)====
We will learn it step by step, covering the concepts and the Java API usage, by means of the following (a shared setup sketch for the Java samples follows this list):
1. Concept Introduction
2. CLI
3. Java Sample Code
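All of the Java samples below reference a keyspace and a couple of serializers that the tutorial project wires up once in its runner/base class. As a rough sketch of my own (the cluster label and class name here are made up; HFactory, StringSerializer and LongSerializer are the real Hector classes), that shared setup looks something like this:

import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class TutorialSetup {
    // Connect to a local single-node Cassandra; "TutorialCluster" is just a label.
    static Cluster cluster = HFactory.getOrCreateCluster("TutorialCluster", "localhost:9160");

    // The 'Tutorial' keyspace referenced by all the queries below.
    static Keyspace keyspace = HFactory.createKeyspace("Tutorial", cluster);

    // Serializers used throughout the samples.
    static StringSerializer stringSerializer = StringSerializer.get();
    static LongSerializer longSerializer = LongSerializer.get();
}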
public QueryResult<HColumn<String,String>> execute() {
    ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily("Npanxx");
    columnQuery.setKey("512204");
    columnQuery.setName("city");
    QueryResult<HColumn<String, String>> result = columnQuery.execute();
    return result;
}

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---
HColumn(city=Austin)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[default@Tutorial] get Npanxx['512204']['city'];
=> (column=city, value=Austin, timestamp=1329234388328000)
Elapsed time: 16 msec(s).
public QueryResult<ColumnSlice<Long,String>> execute() {
    SliceQuery<String, Long, String> sliceQuery =
            HFactory.createSliceQuery(keyspace, stringSerializer, longSerializer, stringSerializer);
    sliceQuery.setColumnFamily("StateCity");
    sliceQuery.setKey("TX Austin");

    // way 1: set multiple column names
    sliceQuery.setColumnNames(202L, 203L, 204L);

    // way 2: use setRange
    // change 'reversed' to true to get the columns in reverse order
    //sliceQuery.setRange(202L, 204L, false, 5);

    QueryResult<ColumnSlice<Long, String>> result = sliceQuery.execute();
    return result;
}

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_sc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---
ColumnSlice([HColumn(202=30.27x097.74), HColumn(203=30.27x097.74), HColumn(204=30.32x097.73)]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
TODO: Judging by the CLI syntax, can Cassandra not fetch multiple columns in a single 'get' command?
public QueryResult<Rows<String,String,String>> execute() {
    MultigetSliceQuery<String, String, String> multigetSlicesQuery =
            HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    multigetSlicesQuery.setColumnFamily("Npanxx");
    multigetSlicesQuery.setColumnNames("city", "state", "lat", "lng");
    multigetSlicesQuery.setKeys("512202", "512203", "512205", "512206");
    QueryResult<Rows<String, String, String>> results = multigetSlicesQuery.execute();
    return results;
}

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="multiget_slice" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512203=Row(512203,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),
512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)]))})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
TODO: N/A?
GetRangeSlicesForStateCity.java
public QueryResult<OrderedRows<String,String,String>> execute() {
    RangeSlicesQuery<String, String, String> rangeSlicesQuery =
            HFactory.createRangeSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    rangeSlicesQuery.setColumnFamily("Npanxx");
    rangeSlicesQuery.setColumnNames("city", "state", "lat", "lng");
    rangeSlicesQuery.setKeys("512202", "512205");
    rangeSlicesQuery.setRowCount(5);
    QueryResult<OrderedRows<String, String, String>> results = rangeSlicesQuery.execute();
    return results;
}
Important Note: the result is actually NOT meaningful. You might expect rows 512202 through 512205 (4 rows), but that is not what comes back, because row keys are ordered by RandomPartitioner (the partitioner can be changed in /conf/cassandra.yaml, but doing so is not recommended). See the output under "Sample Code run by Maven" below, and the paging sketch after this example.

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_range_slices" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({
512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)]))
})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
TODO: N/A
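Since RandomPartitioner orders rows by token rather than by key, the practical use of RangeSlicesQuery is to walk an entire column family page by page rather than to ask for a key interval. Here is a paging sketch of my own (not part of the tutorial; it reuses the keyspace and serializers from the setup sketch above):

import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public void pageThroughNpanxx() {
    String startKey = "";
    final int pageSize = 100;
    while (true) {
        RangeSlicesQuery<String, String, String> query = HFactory.createRangeSlicesQuery(
                keyspace, stringSerializer, stringSerializer, stringSerializer);
        query.setColumnFamily("Npanxx");
        query.setColumnNames("city", "state", "lat", "lng");
        query.setKeys(startKey, "");      // from the last key seen to the end of the token range
        query.setRowCount(pageSize);
        OrderedRows<String, String, String> rows = query.execute().get();
        for (Row<String, String, String> row : rows) {
            if (row.getKey().equals(startKey)) {
                continue;                 // the start key is inclusive, so skip the overlap row
            }
            System.out.println(row.getKey() + " -> " + row.getColumnSlice());
        }
        if (rows.getCount() < pageSize) {
            break;                        // fewer rows than requested: this was the last page
        }
        startKey = rows.peekLast().getKey();
    }
}

The start key of each page is inclusive, which is why every page after the first skips its first row (it is the same as the last row of the previous page).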
GetSliceForAreaCodeCity.java
public QueryResult<ColumnSlice<String,String>> execute() {
    SliceQuery<String, String, String> sliceQuery =
            HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    sliceQuery.setColumnFamily("AreaCode");
    sliceQuery.setKey("512");
    // change the order argument to 'true' to get the last 2 columns in descending order
    // gets the first 4 columns "between" Austin and Austin__204 according to the comparator
    sliceQuery.setRange("Austin", "Austin__204", false, 5);
    QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();
    return result;
}

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_acc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
ColumnSlice([
HColumn(Austin__202=30.27x097.74),
HColumn(Austin__203=30.27x097.74),
HColumn(Austin__204=30.32x097.73)
])
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
N/A
GetIndexedSlicesForCityState.java
public QueryResult<OrderedRows<String, String, String>> execute() {
    IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
            HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    indexedSlicesQuery.setColumnFamily("Npanxx");
    indexedSlicesQuery.setColumnNames("city", "lat", "lng");
    indexedSlicesQuery.addEqualsExpression("state", "TX");
    indexedSlicesQuery.addEqualsExpression("city", "Austin");
    indexedSlicesQuery.addGteExpression("lat", "30.30");
    QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
    return result;
}
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({512204=Row(
512204,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)]))})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[default@Tutorial] get npanxx where state='TX' and city='Austin' and lat>'30.30';
-------------------
RowKey: 512204
=> (column=city, value=Austin, timestamp=1329299521508000)
=> (column=lat, value=30.32, timestamp=1329299521540000)
=> (column=lng, value=097.73, timestamp=1329299521555000)
=> (column=state, value=TX, timestamp=1329299521524000)
-------------------
RowKey: 512206
=> (column=city, value=Austin, timestamp=1329299521618000)
=> (column=lat, value=30.32, timestamp=1329299521633000)
=> (column=lng, value=097.73, timestamp=1329299522491000)
=> (column=state, value=TX, timestamp=1329299521618000)
-------------------
RowKey: 512205
=> (column=city, value=Austin, timestamp=1329299521555000)
=> (column=lat, value=30.32, timestamp=1329299521586000)
=> (column=lng, value=097.73, timestamp=1329299521602000)
=> (column=state, value=TX, timestamp=1329299521571000)

3 Rows Returned.
Elapsed time: 16 msec(s).
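One prerequisite that is easy to miss: IndexedSlicesQuery needs at least one equality expression on a column that has a secondary index ('state' here), and the tutorial's schema script is assumed to have created those indexes already. If you had to add such an index yourself through Hector's DDL API, a rough sketch (my own, the index name is made up and the exact API can differ between Hector versions) would look like this:

import me.prettyprint.cassandra.model.BasicColumnDefinition;
import me.prettyprint.cassandra.model.BasicColumnFamilyDefinition;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.ThriftCfDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.ColumnIndexType;
import me.prettyprint.hector.api.ddl.ComparatorType;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;

public void addStateIndex() {
    Cluster cluster = HFactory.getOrCreateCluster("TutorialCluster", "localhost:9160");
    KeyspaceDefinition ksDef = cluster.describeKeyspace("Tutorial");
    for (ColumnFamilyDefinition cfDef : ksDef.getCfDefs()) {
        if ("Npanxx".equals(cfDef.getName())) {
            // copy the existing definition and add a KEYS index on the 'state' column
            BasicColumnFamilyDefinition updated = new BasicColumnFamilyDefinition(cfDef);
            BasicColumnDefinition stateCol = new BasicColumnDefinition();
            stateCol.setName(StringSerializer.get().toByteBuffer("state"));
            stateCol.setValidationClass(ComparatorType.UTF8TYPE.getClassName());
            stateCol.setIndexType(ColumnIndexType.KEYS);
            stateCol.setIndexName("Npanxx_state_idx");   // hypothetical index name
            updated.addColumnDefinition(stateCol);
            cluster.updateColumnFamily(new ThriftCfDef(updated));
        }
    }
}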
InsertRowsForColumnFamilies.java
public QueryResult<?> execute() {
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    mutator.addInsertion("CA Burlingame", "StateCity",
            HFactory.createColumn(650L, "37.57x122.34", longSerializer, stringSerializer));
    mutator.addInsertion("650", "AreaCode",
            HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));

    MutationResult mr = mutator.execute();
    return null;
}

Omitted
[default@Tutorial] set StateCity['CA Burlingame']['650']='37.57x122.34';
[default@Tutorial] set AreaCode['650']['Burlingame__650']='37.57x122.34';
[default@Tutorial] set Npanxx['650222']['lat']='37.57';
…
DeleteRowsForColumnFamily.java
public QueryResult<?> execute() {
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    // Mutator.addDeletion(String key, String cf, String columnName, Serializer<String> nameSerializer)
    // a null columnName means: delete the whole row
    mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);
    mutator.addDeletion("650", "AreaCode", null, stringSerializer);
    mutator.addDeletion("650222", "Npanxx", null, stringSerializer);

    // adding a non-existent key like the following will cause the insertion of a tombstone
    // mutator.addDeletion("652", "AreaCode", null, stringSerializer);

    MutationResult mr = mutator.execute();
    return null;
}

Omitted…
[default@Tutorial] del StateCity['CA Burlingame'];
[default@Tutorial] del AreaCode['650'];
[default@Tutorial] del Npanxx['650222'];
Important Note: whether you delete through Java code or the CLI, the deletion still leaves the deleted row key behind, marked as a tombstone (heh, quite an apt name), and it can still be pulled back by a 'list' command like this:
[default@Tutorial] list StateCity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
-------------------
RowKey: TX Austin
=> (column=202, value=30.27x097.74, timestamp=1329297768323000)
=> (column=203, value=30.27x097.74, timestamp=1329297768338000)
=> (column=204, value=30.32x097.73, timestamp=1329297768354000)
=> (column=205, value=30.32x097.73, timestamp=1329297768370000)
=> (column=206, value=30.32x097.73, timestamp=1329297768385000)

2 Rows Returned.
Elapsed time: 16 msec(s).
As you can see, two rows are returned, even though the 'CA Burlingame' row has been deleted.
Even worse, deleting a non-existent key causes the 'insertion of a tombstone' issue mentioned in the code comment above, i.e. it adds one more row to the column family!
Fortunately, the 'get' command no longer returns it.
[default@Tutorial] get StateCity['CA Burlingame'];
Returned 0 results.
Elapsed time: 0 msec(s).
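The same check works from Hector (a sketch of my own, reusing the keyspace and serializers from the setup sketch): a slice over the deleted row simply comes back empty, even though 'list' still shows the key.

import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public void readBackDeletedRow() {
    // read the deleted 'CA Burlingame' row back; the row tombstone hides all of its columns
    SliceQuery<String, Long, String> readBack = HFactory.createSliceQuery(
            keyspace, stringSerializer, longSerializer, stringSerializer);
    readBack.setColumnFamily("StateCity");
    readBack.setKey("CA Burlingame");
    readBack.setRange(null, null, false, 10);   // first 10 columns, no bounds
    ColumnSlice<Long, String> slice = readBack.execute().get();
    System.out.println("columns returned: " + slice.getColumns().size());   // prints 0
}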
Go deeper? Please read on.
When will Cassandra remove these tombstones? As far as I know, there are two ways:
1. Wait until gc_grace_seconds expires (not verified yet)
gc_grace_seconds is set per column family and can be updated without a restart.
How do you check gc_grace_seconds? Simply use the CLI:
[default@Tutorial] show schema;
…
create column family StateCity
  with column_type = 'Standard'
  and comparator = 'LongType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and rows_cached = 0.0
  and row_cache_save_period = 0
  and row_cache_keys_to_save = 2147483647
  and keys_cached = 200000.0
  and key_cache_save_period = 14400
  and read_repair_chance = 1.0
  and gc_grace = 864000   // 10 days, OMG
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';
…
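For the record, gc_grace can also be changed on the fly, through the same describeKeyspace / updateColumnFamily pattern used in the secondary-index sketch earlier. A compact sketch of my own (imports as in that sketch; unverified against every Hector version, and 3600 is just an example value):

public void lowerGcGrace() {
    Cluster cluster = HFactory.getOrCreateCluster("TutorialCluster", "localhost:9160");
    for (ColumnFamilyDefinition cfDef : cluster.describeKeyspace("Tutorial").getCfDefs()) {
        if ("StateCity".equals(cfDef.getName())) {
            BasicColumnFamilyDefinition updated = new BasicColumnFamilyDefinition(cfDef);
            updated.setGcGraceSeconds(3600);                       // e.g. one hour instead of 10 days
            cluster.updateColumnFamily(new ThriftCfDef(updated));  // takes effect without a restart
        }
    }
}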
2. A compaction event (under investigation, but no luck yet)
Compaction is triggered automatically.
But how do you trigger a compaction manually? Use nodetool as well:
C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost flush Tutorial
Starting NodeTool

C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost compact Tutorial
Starting NodeTool
Then we can see some log messages in the Cassandra console.
But as far as I can tell, the tombstones are still there. (WHY??? My guess is that compaction only purges tombstones older than gc_grace_seconds, which is 10 days here, but that is still to be verified.)
C:\java\apache-cassandra-1.0.7\bin>sstable2json ..\runtime\data\Tutorial\StateCity-hc-9-Data.db
{
  "4341204275726c696e67616d65": [["650","37.57x122.34",1329316454906000]],
  "54582041757374696e": [["202","30.27x097.74",1329297768323000], ["203","30.27x097.74",1329297768338000], ["204","30.32x097.73",1329297768354000], ["205","30.32x097.73",1329297768370000], ["206","30.32x097.73",1329297768385000]],
  "616263": []
}
And they still appear in the 'list' command. (Argh, they just refuse to go away. Big why???)
[default@Tutorial] list statecity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
-------------------
RowKey: TX Austin
=> (column=202, value=30.27x097.74, timestamp=1329297768323000)
=> (column=203, value=30.27x097.74, timestamp=1329297768338000)
=> (column=204, value=30.32x097.73, timestamp=1329297768354000)
=> (column=205, value=30.32x097.73, timestamp=1329297768370000)
=> (column=206, value=30.32x097.73, timestamp=1329297768385000)
-------------------
RowKey: abc

3 Rows Returned.
Elapsed time: 31 msec(s).
A few gripes at this point:
1. Maybe it is just that I have not dug deep enough yet, but the CLI feels rather weak; it seems suited only to initial schema/DDL modeling and simple data inspection.
2. The tombstone cleanup question has not been conclusively verified. I am shelving it as an open case for now and will supplement or correct this section once I have an answer.