Downloading (a pain)
axel -n 20 http://mirrors.hust.edu.cn/apache/cassandra/3.10/apache-cassandra-3.10-bin.tar.gz
With 20 axel connections the download finished in about 16 seconds.
tar xzvf apache-cassandra-3.10-bin.tar.gz
cd apache-cassandra-3.10/
ls
bin/cassandra -f         # run in the foreground
bin/cassandra            # run in the background (default)
ps -ef | grep cassandra
history
Installing pycassa
Point pip at a faster mirror in ~/.pip/pip.conf:
[global]
index-url = http://pypi.douban.com/simple/
sudo pip install pycassa
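A quick sanity check (a minimal sketch): the package should now import cleanly. Note that pycassa talks to Cassandra over Thrift on port 9160, which is disabled by default on Cassandra 3.x, so you may need start_rpc: true in conf/cassandra.yaml (or nodetool enablethrift) before the later Python examples can connect.
import pycassa                  # should succeed without errors after the install
print(pycassa.__file__)         # shows where the package was installed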
Getting to know Cassandra
Start cqlsh (bin/cqlsh); it connects to the local node:
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> help
Documented shell commands:
===========================
CAPTURE CLS COPY DESCRIBE EXPAND LOGIN SERIAL SOURCE UNICODE
CLEAR CONSISTENCY DESC EXIT HELP PAGING SHOW TRACING
CQL help topics:
================
AGGREGATES CREATE_KEYSPACE DROP_TRIGGER TEXT
ALTER_KEYSPACE CREATE_MATERIALIZED_VIEW DROP_TYPE TIME
ALTER_MATERIALIZED_VIEW CREATE_ROLE DROP_USER TIMESTAMP
ALTER_TABLE CREATE_TABLE FUNCTIONS TRUNCATE
ALTER_TYPE CREATE_TRIGGER GRANT TYPES
ALTER_USER CREATE_TYPE INSERT UPDATE
APPLY CREATE_USER INSERT_JSON USE
ASCII DATE INT UUID
BATCH DELETE JSON
BEGIN DROP_AGGREGATE KEYWORDS
BLOB DROP_COLUMNFAMILY LIST_PERMISSIONS
BOOLEAN DROP_FUNCTION LIST_ROLES
COUNTER DROP_INDEX LIST_USERS
CREATE_AGGREGATE DROP_KEYSPACE PERMISSIONS
CREATE_COLUMNFAMILY DROP_MATERIALIZED_VIEW REVOKE
CREATE_FUNCTION DROP_ROLE SELECT
CREATE_INDEX DROP_TABLE SELECT_JSON
cqlsh> help CREATE_KEYSPACE
cqlsh> Created new window in existing browser session.
Basics
Cassandra is an open-source distributed database that combines Dynamo's key/value model with Bigtable's column-oriented data model.
Its main characteristics:
1. Flexible schema: unlike a relational database, the schema does not have to be designed up front; columns can be added or removed on the fly.
2. Range queries: keys can be queried by range.
3. High availability and scalability: a single-node failure does not take the cluster down, and capacity scales linearly.
We can picture Cassandra's data model as a four- or five-dimensional hash.
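To make that "four- or five-dimensional hash" concrete, here is a plain Python sketch of the nesting (the names are illustrative, borrowed from the examples further down):
# keyspace -> column family -> row key -> column name -> value   (4 levels)
# a Super column family adds one more level between row key and column name (5 levels)
data = {
    'Keyspace1': {                     # keyspace
        'UserProfile': {               # column family
            'jsmith': {                # row key
                'first': 'John',       # column name -> value
                'last': 'Smith',
            }
        }
    }
}
print(data['Keyspace1']['UserProfile']['jsmith']['first'])   # -> John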
**View cluster information:
describe cluster;
desc cluster;
**List all keyspaces:
describe keyspaces;
desc keyspaces;
**View the contents of a keyspace:
describe keyspace knet; --(knet is the keyspace name)
desc keyspace knet; --(knet is the keyspace name)
**Create a keyspace:
CREATE KEYSPACE knet WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
Replication Factor: how many copies of each piece of data are kept within one DC. An odd number is commonly used; our project sets replication_factor=3, for example.
Replica placement strategy: the replication strategy. The default is SimpleStrategy; for a single-rack, single-data-center deployment, SimpleStrategy is the one to keep.
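The same keyspace can also be created from Python with pycassa's SystemManager; a minimal sketch, assuming the Thrift interface is reachable on localhost:9160 (the strategy and replication factor mirror the CQL statement above):
from pycassa.system_manager import SystemManager, SIMPLE_STRATEGY

sys_mgr = SystemManager('localhost:9160')             # Thrift endpoint
sys_mgr.create_keyspace('knet', SIMPLE_STRATEGY,
                        {'replication_factor': '1'})  # one replica, as in the CQL above
sys_mgr.close()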
**Switch to a keyspace:
use knet;
**List all tables:
describe tables;
desc tables;
**View a table's schema:
describe columnfamily abc;
desc table stocks;
**Create a table:
create table abc ( id int primary key, name varchar, age int );
**Drop a table:
drop table user;
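For completeness, pycassa's SystemManager can create and drop column families programmatically as well; note that this goes through Thrift and creates a Thrift column family rather than a CQL3 table like the statements above, so treat it as a rough sketch (the name abc2 is made up):
from pycassa.system_manager import SystemManager, UTF8_TYPE

sys_mgr = SystemManager('localhost:9160')
sys_mgr.create_column_family('knet', 'abc2', comparator_type=UTF8_TYPE)  # column names compared as UTF-8
# sys_mgr.drop_column_family('knet', 'abc2')                             # the Thrift counterpart of drop table
sys_mgr.close()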
**COLUMN
A Column is the smallest unit of data in Cassandra. It is a triplet consisting of a name, a value, and a timestamp.
Expressed as JSON, a Column looks like this:
{ // this is a Column
  name: "逖靖寒的世界",
  value: "[email protected]",
  timestamp: 123456789
}
For simplicity we can ignore the timestamp and just think of a Column as a name/value pair.
Note that both name and value are byte[] values of unlimited length.
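In pycassa a Column write is exactly that, one name/value pair under a row key, with the timestamp filled in by the client. A minimal sketch, assuming the keyspace 'knet' and the column family 'abc2' from the sketch above exist and Thrift is enabled:
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('knet', ['localhost:9160'])
cf = ColumnFamily(pool, 'abc2')
cf.insert('row1', {'name': 'gpcuster'})   # one column: name -> value; timestamp is added automatically
print(cf.get('row1'))                     # OrderedDict([('name', 'gpcuster')])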
**SUPERCOLUMN
A SuperColumn can be thought of as an array of Columns: it has a name plus a series of Columns.
Expressed as JSON, a SuperColumn looks like this:
{ // this is a SuperColumn
  name: "逖靖寒的世界",
  // it contains a series of Columns
  value: {
    street: {name: "street", value: "1234 x street", timestamp: 123456789},
    city: {name: "city", value: "san francisco", timestamp: 123456789},
    zip: {name: "zip", value: "94107", timestamp: 123456789},
  }
}
Both Columns and SuperColumns are combinations of a name and a value. The key difference is that a Column's value is a "string", whereas a SuperColumn's value is a map of Columns.
One more thing to note: a SuperColumn itself does not carry a timestamp.
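With pycassa a SuperColumn write is simply a nested dict: the outer key is the SuperColumn name and the inner dict holds its Columns. A sketch, assuming a super column family named 'Addresses' (made up for illustration) exists in the keyspace:
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('knet', ['localhost:9160'])
addresses = ColumnFamily(pool, 'Addresses')   # a column family defined with super=True
addresses.insert('user1', {
    'homeAddress': {                          # SuperColumn name
        'street': '1234 x street',            # the Columns inside it
        'city': 'san francisco',
        'zip': '94107',
    }
})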
**COLUMNFAMILY
A ColumnFamily is a structure that contains many Rows; you can think of it as a Table in an RDBMS.
Each Row has a client-supplied Key and a series of Columns associated with that Key.
The structure looks like this:
UserProfile = { // this is a ColumnFamily
  phatduckk: { // this is a row key in the ColumnFamily
    // the Columns under this key
    username: "gpcuster",
    email: "[email protected]",
    phone: "6666"
  }, // end of the first row
  ieure: { // another row key of the same ColumnFamily
    // the Columns under this key
    username: "pengguo",
    email: "[email protected]",
    phone: "888",
    age: "66"
  },
}
A ColumnFamily can be of type Standard or Super.
The example we just saw is a Standard ColumnFamily: it contains plain Columns (not SuperColumns).
A Super ColumnFamily contains a series of SuperColumns, but those SuperColumns cannot in turn contain further Standard ColumnFamilies.
Here is a simple example:
AddressBook = { // this is a Super ColumnFamily
  phatduckk: { // row key
    friend1: {street: "8th street", zip: "90210", city: "Beverley Hills", state: "CA"},
    John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
    Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
    Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"},
    Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
    ...
  }, // end of row
  ieure: { // another row key
    joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
    William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
  },
}
**KEYSPACE
A Keyspace is the outermost container for data: every ColumnFamily belongs to exactly one Keyspace. An application will normally use a single Keyspace.
A quick test
With Cassandra running, start the command-line client and execute:
cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
Value inserted.
At this point Cassandra holds three pieces of data.
In these insert statements, Keyspace1 is the keyspace, Standard1 is the column family, 'jsmith' is the row key, 'first'/'last'/'age' are the column names, and the value on the right is the column value.
Next, run a query:
cassandra> get Keyspace1.Standard1['jsmith']
(column=age, value=42; timestamp=1249930062801)
(column=first, value=John; timestamp=1249930053103)
(column=last, value=Smith; timestamp=1249930058345)
Returned 3 rows.
This returns the data we inserted earlier.
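The same round trip can be done from Python with pycassa; a sketch, assuming the Keyspace1/Standard1 schema used above exists and the Thrift server is enabled:
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('Keyspace1', ['localhost:9160'])
standard1 = ColumnFamily(pool, 'Standard1')
standard1.insert('jsmith', {'first': 'John', 'last': 'Smith', 'age': '42'})
print(standard1.get('jsmith'))   # OrderedDict([('age', '42'), ('first', 'John'), ('last', 'Smith')])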
Sorting
One thing needs to be clear: with Cassandra, data is already sorted at write time.
Within a given Key, all Columns are sorted by their name, and the sort type is specified in storage-conf.xml.
The sort types Cassandra currently provides are BytesType, UTF8Type, LexicalUUIDType, TimeUUIDType, AsciiType, and LongType.
Now suppose the raw data looks like this:
{name: 123, value: "hello there"},
{name: 832416, value: "kjjkbcjkcbbd"},
{name: 3, value: "101010101010"},
{name: 976, value: "kjjkbcjkcbbd"}
If we set the sort type in storage-conf.xml to LongType,
the sorted data looks like this:
{name: 3, value: "101010101010"},
{name: 123, value: "hello there"},
{name: 976, value: "kjjkbcjkcbbd"},
{name: 832416, value: "kjjkbcjkcbbd"}
If we set the sort type to UTF8Type instead,
the sorted data looks like this:
{name: 123, value: "hello there"},
{name: 3, value: "101010101010"},
{name: 832416, value: "kjjkbcjkcbbd"},
{name: 976, value: "kjjkbcjkcbbd"}
As you can see, different sort types produce completely different orderings.
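With pycassa the comparator chosen at column family creation time plays the role of this sort setting; a sketch (LONG_TYPE and UTF8_TYPE are pycassa constants, the column family names ByLong and ByString are made up):
from pycassa.system_manager import SystemManager, LONG_TYPE, UTF8_TYPE

sys_mgr = SystemManager('localhost:9160')
# column names treated as longs, sorted numerically: 3, 123, 976, 832416
sys_mgr.create_column_family('knet', 'ByLong', comparator_type=LONG_TYPE)
# column names treated as UTF-8 strings, sorted lexically: "123", "3", "832416", "976"
sys_mgr.create_column_family('knet', 'ByString', comparator_type=UTF8_TYPE)
sys_mgr.close()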
For SuperColumns there is an extra sorting dimension, so we can additionally specify CompareSubcolumnsWith to choose the sort type for that second dimension.
Suppose the raw data is:
{ // first SuperColumn from a Row
name: "workAddress",
// and the columns within it
value: {
street: {name: "street", value: "1234 x street"},
city: {name: "city", value: "san francisco"},
zip: {name: "zip", value: "94107"}
}
},
{ // another SuperColumn from same Row
name: "homeAddress",
// and the columns within it
value: {
street: {name: "street", value: "1234 x street"},
city: {name: "city", value: "san francisco"},
zip: {name: "zip", value: "94107"}
}
}
If we then set both CompareSubcolumnsWith and CompareWith to UTF8Type, the sorted result is:
{ // this SuperColumn comes first because, as UTF8 strings, "homeAddress" sorts before "workAddress"
  name: "homeAddress",
  // the Columns inside this SuperColumn are also sorted by their names
  value: {
    city: {name: "city", value: "san francisco"},
    street: {name: "street", value: "1234 x street"},
    zip: {name: "zip", value: "94107"}
  }
},
{ // the other SuperColumn from the same Row
  name: "workAddress",
  value: {
    // again sorted by Column name
    city: {name: "city", value: "san francisco"},
    street: {name: "street", value: "1234 x street"},
    zip: {name: "zip", value: "94107"}
  }
}
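In pycassa terms, CompareWith and CompareSubcolumnsWith correspond to comparator_type and subcomparator_type when defining a super column family; a sketch with a hypothetical 'AddressBook' column family:
from pycassa.system_manager import SystemManager, UTF8_TYPE

sys_mgr = SystemManager('localhost:9160')
sys_mgr.create_column_family('knet', 'AddressBook',
                             super=True,                     # SuperColumn layout
                             comparator_type=UTF8_TYPE,      # sorts SuperColumn names (CompareWith)
                             subcomparator_type=UTF8_TYPE)   # sorts Columns inside each SuperColumn (CompareSubcolumnsWith)
sys_mgr.close()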
Hands-on practice
cqlsh> desc keyspaces;
knet system_schema system_auth system system_distributed system_traces
cqlsh> CREATE KEYSPACE knet WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> use knet;
cqlsh:knet> desc tables;
cqlsh:knet> create table userUser ( id int primary key, name varchar, age int );
cqlsh:knet> desc table userUser;
CREATE TABLE knet.useruser (
id int PRIMARY KEY,
age int,
name text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
cqlsh:knet>
For real this time (serious face)
import pycassa
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

# connect to the 'idmapping' keyspace over Thrift (the server list defaults to localhost:9160)
pool = ConnectionPool('idmapping', ['localhost:9160'])
col_fam = ColumnFamily(pool, 'imeitoimsi')
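To round the snippet off, a hedged sketch of a write and a read against that column family (the IMEI/IMSI values are made up; it assumes the 'imeitoimsi' column family exists in 'idmapping' and the Thrift server is enabled):
col_fam.insert('860000000000001', {'imsi': '460000000000001'})   # map an IMEI to an IMSI
print(col_fam.get('860000000000001'))                            # OrderedDict([('imsi', '460000000000001')])
pool.dispose()                                                    # release the connections when done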