Cassandra
Download
http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.5.1/apache-cassandra-0.5.1-bin.tar.gz
The hector-0.5.0-7.jar that comes with apache-cassandra-0.5.1 has a serious performance problem; replace it with hector-0.5.1-9.jar.
Resources
http://cassandra.apache.org/
http://wiki.apache.org/cassandra/FrontPage
Deployment
http://kauu.net/2010/02/27/cassandra%E5%88%9D%E4%BD%93%E9%AA%8C/
192.168.2.79
/home/bmb/apache-cassandra-0.5.1
Data model design
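l All columns in a row are sorted by column name according to a comparator you define (this is also the order in which they are stored). To store a user's friends, you can use each friend's join time as the column name and sort the columns in reverse chronological order, which makes it easy to fetch the user's newest friends.
Test: the columns of Keyspace1.Standard1 are compared as BytesType, so whatever order the columns are set in, get_slice returns them in the order a, b, c: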
set Keyspace1.Standard1['jsmith']['c'] = 'c'
set Keyspace1.Standard1['jsmith']['a'] = 'a'
set Keyspace1.Standard1['jsmith']['b'] = 'b'
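To make the ordering rule concrete, here is a small plain-Java model of one row. It is a sketch, not Cassandra code: it assumes BytesType compares column names as unsigned bytes from left to right (a shorter prefix sorts first), and it uses a `TreeMap` as a stand-in for the row. For the friends case it keys columns by join time in milliseconds and reads them newest first; in the Thrift API the corresponding read would be a slice with the `reversed` flag of `SliceRange` set. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ColumnOrderingSketch {
    // Assumed BytesType behaviour: compare names byte by byte as unsigned values;
    // if one name is a prefix of the other, the shorter one sorts first.
    static final Comparator<byte[]> BYTES_TYPE = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    };

    // Insert columns in the given order and return the names in row (sorted) order.
    static List<String> sliceNames(String... insertOrder) {
        TreeMap<byte[], String> row = new TreeMap<>(BYTES_TYPE);
        for (String name : insertOrder) {
            row.put(name.getBytes(StandardCharsets.UTF_8), name);
        }
        return new ArrayList<>(row.values());
    }

    // Friends keyed by join time (millis); walking the keys in descending order
    // returns the newest friends first, like a reversed slice.
    static List<String> newestFriends(Map<Long, String> friendsByJoinTime, int count) {
        TreeMap<Long, String> row = new TreeMap<>(friendsByJoinTime);
        List<String> result = new ArrayList<>();
        for (String friendId : row.descendingMap().values()) {
            if (result.size() == count) break;
            result.add(friendId);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sliceNames("c", "a", "b")); // [a, b, c]
        Map<Long, String> friends = new TreeMap<>();
        friends.put(1000L, "alice");
        friends.put(3000L, "carol");
        friends.put(2000L, "bob");
        System.out.println(newestFriends(friends, 2)); // [carol, bob]
    }
}
```

The unsigned comparison matters for non-ASCII names: with signed bytes, a UTF-8 name such as "é" would sort before "z".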
l Super columns
Super columns are a great way to store one-to-many indexes to other records: make the sub column names TimeUUIDs (or whatever you'd like to use to sort the index), and have the values be the foreign key.
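For example, for a user's friends: the sub column name is the time the friend was added, and the sub column value is the friend's ID, which acts as a foreign key into the user information table.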
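The friends index above can be sketched with nested sorted maps standing in for a super column family. This is an illustrative model, not a client API: the super column family "UserFriends", the super column name "friends", and the "Users" lookup are hypothetical, and a `long` join time stands in for the TimeUUID sub column name. Reading the index newest first and then following each ID into Users is the "foreign key" step described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SuperColumnIndexSketch {
    // Hypothetical "UserFriends": row key -> super column name -> (sub column name -> value).
    // Sub column name = join time (stand-in for a TimeUUID), value = friend's user ID.
    private final Map<String, TreeMap<String, TreeMap<Long, String>>> userFriends = new HashMap<>();
    // Hypothetical "Users": user ID -> display name.
    private final Map<String, String> users = new HashMap<>();

    void addUser(String id, String name) {
        users.put(id, name);
    }

    void addFriend(String userId, long joinedAt, String friendId) {
        userFriends.computeIfAbsent(userId, k -> new TreeMap<>())
                   .computeIfAbsent("friends", k -> new TreeMap<>())
                   .put(joinedAt, friendId);
    }

    // Read the index newest first, then resolve each friend ID in Users.
    List<String> newestFriendNames(String userId, int count) {
        List<String> names = new ArrayList<>();
        TreeMap<String, TreeMap<Long, String>> row = userFriends.get(userId);
        if (row == null || !row.containsKey("friends")) return names;
        for (String friendId : row.get("friends").descendingMap().values()) {
            if (names.size() == count) break;
            names.add(users.getOrDefault(friendId, "<missing " + friendId + ">"));
        }
        return names;
    }

    public static void main(String[] args) {
        SuperColumnIndexSketch db = new SuperColumnIndexSketch();
        db.addUser("u1", "Alice");
        db.addUser("u2", "Bob");
        db.addUser("u3", "Carol");
        db.addFriend("jsmith", 100L, "u1");
        db.addFriend("jsmith", 300L, "u3");
        db.addFriend("jsmith", 200L, "u2");
        System.out.println(db.newestFriendNames("jsmith", 2)); // [Carol, Bob]
    }
}
```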
l A composite row key can do the job of super columns, with time as the column name:
Alternatively, we could preface the status keys with the user key, which has less temporal locality. If we used user_id:status_id
as the status key, we could do range queries on the user fragment to get tweets-by-user, avoiding the need for a user_timeline
super column.
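A sketch of the user_id:status_id idea, assuming the cluster uses an order-preserving partitioner so that rows are stored in key order (with RandomPartitioner, keys are hashed and a key range does not correspond to a user prefix). A `TreeMap` plays the part of the ordered rows; the class and method names are hypothetical. The range trick relies on ';' being the ASCII character right after ':', so every key with the prefix "user:" falls in the half-open interval ["user:", "user;").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class CompositeKeySketch {
    // Rows kept in key order, as an order-preserving partitioner would store them.
    // Row key = "user_id:status_id" (hypothetical layout), value = status text.
    private final TreeMap<String, String> statuses = new TreeMap<>();

    void putStatus(String userId, String statusId, String text) {
        // A ':' inside the user ID would make the prefix range ambiguous.
        if (userId.indexOf(':') >= 0) throw new IllegalArgumentException("user id must not contain ':'");
        statuses.put(userId + ":" + statusId, text);
    }

    // Every key that starts with "userId:" lies in [userId + ":", userId + ";").
    List<String> tweetsByUser(String userId) {
        return new ArrayList<>(statuses.subMap(userId + ":", true, userId + ";", false).values());
    }

    public static void main(String[] args) {
        CompositeKeySketch s = new CompositeKeySketch();
        // Zero-padded status IDs so that string order matches numeric order.
        s.putStatus("jsmith", "0002", "second");
        s.putStatus("jsmith", "0001", "first");
        s.putStatus("jsmithx", "0001", "other user");
        s.putStatus("alice", "0001", "hello");
        System.out.println(s.tweetsByUser("jsmith")); // [first, second]
    }
}
```

Note that status IDs compare as strings, so they need a fixed width (or a time-ordered encoding) for the range to come back in posting order.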
l In column-orientation, the column names are the data
l Using TimeUUIDs as column names and JSON-formatted column values can solve some modeling problems
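A sketch of the TimeUUID-name / JSON-value pattern using only the JDK. It builds version-1 UUIDs by hand from a millisecond timestamp (random clock sequence and node bits, not a MAC address, which is an assumption of this sketch) and orders a row by the embedded timestamp, which is how TimeUUIDType is meant to sort, with `UUID.compareTo` as a tiebreak so equal timestamps are not collapsed. The JSON is assembled with minimal hand-written escaping rather than a JSON library; the class, method, and field names are illustrative.

```java
import java.security.SecureRandom;
import java.util.Comparator;
import java.util.TreeMap;
import java.util.UUID;

public class TimeUuidJsonSketch {
    // Number of 100-ns intervals from 1582-10-15 (UUID epoch) to 1970-01-01 (Unix epoch).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Order by the timestamp embedded in the UUID; fall back to UUID.compareTo
    // so that two UUIDs with the same timestamp are both kept.
    static final Comparator<UUID> TIME_ORDER = (a, b) -> {
        int c = Long.compare(a.timestamp(), b.timestamp());
        return c != 0 ? c : a.compareTo(b);
    };

    // Version-1 (time-based) UUID for a Unix time in milliseconds.
    static UUID timeUuid(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;
        long timeLow = ts & 0xFFFFFFFFL;
        long timeMid = (ts >>> 32) & 0xFFFFL;
        long timeHi = (ts >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;            // version 1
        long lsb = (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L; // IETF variant
        return new UUID(msb, lsb);
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Column value: a small JSON document instead of one column per field.
    static String statusJson(String user, String text) {
        return "{\"user\":" + quote(user) + ",\"text\":" + quote(text) + "}";
    }

    public static void main(String[] args) {
        TreeMap<UUID, String> row = new TreeMap<>(TIME_ORDER);
        row.put(timeUuid(1249930058345L), statusJson("jsmith", "second"));
        row.put(timeUuid(1249930053103L), statusJson("jsmith", "first"));
        for (String json : row.values()) System.out.println(json);
    }
}
```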
http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
How Twitter uses Cassandra: Twitter's data model and a blog data model
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
A complete example from Digg
http://wiki.apache.org/cassandra/DataModel
http://wiki.apache.org/cassandra/CassandraLimitations
http://www.hellodba.net/2010/02/cassandra.html
(Chinese translation, not entirely faithful to the original) Introduces Twitter's data model; a useful reference.
Changing the schema definition requires two restarts
https://issues.apache.org/jira/browse/CASSANDRA-44
Creating column families dynamically, without restarting the server
http://github.com/NZKoz/cassandra_object
Starting a single-node cluster
Install JDK 6
tar -zxvf apache-cassandra-$VERSION-bin.tar.gz
cd apache-cassandra-$VERSION
sudo mkdir -p /var/log/cassandra
sudo chown -R `whoami` /var/log/cassandra
sudo mkdir -p /var/lib/cassandra
sudo chown -R `whoami` /var/lib/cassandra
In bin/cassandra.in.sh, change the JMX port if needed (-Dcom.sun.management.jmxremote.port=8080)
bin/cassandra -f
View the log
tail -f /var/log/cassandra/system.log
Connecting with the command-line client
cd /home/bmb/apache-cassandra-0.5.1
bin/cassandra-cli --host 192.168.2.79 --port 9160
cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
Value inserted.
cassandra> get Keyspace1.Standard1['jsmith']
(column=age, value=42; timestamp=1249930062801)
(column=first, value=John; timestamp=1249930053103)
(column=last, value=Smith; timestamp=1249930058345)
Returned 3 rows.
cassandra>
Java clients
l Raw Thrift
http://apache.freelamp.com/incubator/thrift/0.2.0-incubating/thrift-0.2.0-incubating.tar.gz
Wrapper
http://github.com/charliem/OCM
Building the Java Thrift library manually
cd D:\7g\Personal\Resources\Architecture\Cassandra\thrift-0.2.0\lib\java
ant
l hector
http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/
http://github.com/rantav/hector/downloads
l OCM
http://github.com/charliem/OCM/downloads
Git
Download Git
http://kernel.org/pub/software/scm/git/git-1.7.0.3.tar.gz
Install
cd /home/bmb/apache-cassandra-0.5.1/git-1.7.0.3
./configure
make
make install
Git clients (Windows)
http://msysgit.googlecode.com/files/msysGit-fullinstall-1.7.0.2-preview20100309.exe
Run D:\7g\Personal\Resources\Architecture\Cassandra\msysgit\msysgit\git-cmd.bat
Check out OCM
git clone http://github.com/charliem/OCM.git
http://tortoisegit.googlecode.com/files/TortoiseGit-1.3.6.0-32bit.msi
Cluster configuration
http://pan-java.iteye.com/blog/604672
192.168.5.11
/u/iic/bmb/apache-cassandra-0.5.1
Edit the files under conf/:
Change /var/log/cassandra to /u/iic/bmb/apache-cassandra-0.5.1/log
Change /var/lib/cassandra to /u/iic/bmb/apache-cassandra-0.5.1/log
Set JAVA_HOME in bin/cassandra:
export JAVA_HOME=/u/iic/bmb/jdk6
192.168.5.12 (same directory layout as 5.11; both use 192.168.2.79 as the seed)
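Note: when deploying the cluster, do not copy the whole 5.11 directory to 5.12. Otherwise both nodes end up with the same token (the token is saved in the copied data), and 5.11 and 5.12 show up as duplicates in the ring. Fix: delete the data and commit log directories on the copy.
Also, every address must be configured as the node's actual IP address; if localhost is used, the ring will be incomplete.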
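Why a copied token causes trouble can be pictured with a toy model of the token ring. This is a sketch, not Cassandra's implementation: tokens live in a sorted map, a key's token is taken from an MD5 hash in roughly the way RandomPartitioner does (an assumption here), and the node with the first token at or after the key's token owns it, wrapping around past the largest token. Because a token can map to only one node, a node that arrives with a copied token displaces the other instead of taking its own share of the ring. The addresses come from these notes; the class and the token values are illustrative.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class TokenRingSketch {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    // Place a node on the ring; returns the node that previously held this token, or null.
    String join(String address, BigInteger token) {
        return ring.put(token, address);
    }

    int size() {
        return ring.size();
    }

    // MD5-based key token, similar in spirit to RandomPartitioner (assumed).
    static BigInteger keyToken(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // A node owns (previous token, its token]; keys past the largest token wrap to the first node.
    String ownerOf(String key) {
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(keyToken(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        BigInteger copied = BigInteger.ONE.shiftLeft(126);
        ring.join("192.168.2.79", BigInteger.ZERO);
        ring.join("192.168.5.11", copied);
        // 5.12 was started from a copy of 5.11's directory, so it reuses the same token.
        String displaced = ring.join("192.168.5.12", copied);
        System.out.println("displaced=" + displaced + ", ring size=" + ring.size());
        System.out.println("jsmith -> " + ring.ownerOf("jsmith"));
    }
}
```

Deleting the copied data forces the new node to pick a fresh token, which in this model is simply a different map key.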
192.168.2.79
/home/bmb/apache-cassandra-0.5.1
bin/cassandra -f
Using OCM
l Define the data structures: D:\7g\Personal\Resources\Architecture\Cassandra\Client\OCM Compiler\OCMSpecSample.txt
l Generate the Java classes by running com.kissintellignetsystems.ocm.compiler.Compiler with the following command-line arguments: "OCMSpecSample.txt", "keyspace1", "Java", "mynamespace", "output/"
l Test example: D:\7g\Personal\Resources\Architecture\Cassandra\Client\Output Languages\Java\TestHarness
CreatTest.java
Check the record from the CLI, connecting to each node:
bin/cassandra-cli --host 192.168.2.79 --port 9160
bin/cassandra-cli --host 192.168.5.11 --port 9160
bin/cassandra-cli --host 192.168.5.12 --port 9160
cassandra> get keyspace1.Users['charlie']
View the cluster's node (ring) information
bin/nodeprobe -host localhost -port 8090 ring
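nodeprobe reads this information over JMX, which is why its -port is the JMX port set with -Dcom.sun.management.jmxremote.port in bin/cassandra.in.sh (apparently changed to 8090 on this cluster) rather than the Thrift port 9160. The sketch below connects to the same port from plain Java, assuming remote JMX is enabled without authentication. It only lists MBean names under org.apache.cassandra, because the exact MBean names and attributes vary between versions; the class name and the default host and port are assumptions.

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxProbeSketch {
    // Standard RMI-based JMX URL for a JVM started with -Dcom.sun.management.jmxremote.port=PORT.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8090;
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl(host, port)));
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Domain pattern: matches org.apache.cassandra and any sub-domains.
            Set<ObjectName> names = mbs.queryNames(new ObjectName("org.apache.cassandra*:*"), null);
            for (ObjectName name : names) {
                System.out.println(name);
            }
        } finally {
            connector.close();
        }
    }
}
```

Usage, for example: java JmxProbeSketch 192.168.5.11 8090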
Hadoop & Cassandra
Using Hadoop to load data into Cassandra through the Binary Memtable
http://github.com/lenn0x/Cassandra-Hadoop-BMT/blob/master/src/java/org/digg/CassandraBulkLoader.java
http://blog.csdn.net/wdwbw/archive/2010/03/10/5366739.aspx
http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/
Lucene + Cassandra
Lucandra