Primary Key实际上是一个非常通用的概念。在Cassandra中,其表示用来从Cassandra中取得数据的一个或多个列:
create table sample (
key text PRIMARY KEY,
data text
);
在上面的示例中,我们指定了key域作为sample的PRIMARY KEY。而在需要的情况下,一个Primary Key也可以由多个列共同组成:
create table sample {
key_one text,
key_two text,
data text,
PRIMARY KEY(key_one, key_two)
};
在上面的示例中,我们所创建的Primary Key就是一个由两个列key_one和key_two组成的Composite Key。其中该Composite Key的第一个组成被称为是Partition Key,而后面的各组成则被称为是Clustering Key。Partition Key用来决定Cassandra会使用集群中的哪个结点来记录该数据,每个Partition Key对应着一个特定的Partition。而Clustering Key则用来在Partition内部排序。如果一个Primary Key只包含一个域,那么其将只拥有Partition Key而没有Clustering Key。
Partition Key和Clustering Key同样也可以由多个列组成:
create table sample (
key_primary_one text,
key_primary_two text,
key_cluster_one text,
key_cluster_two text,
data text,
PRIMARY KEY((key_primary_one, key_primary_two), key_cluster_one, key_cluster_two)
);
Partition Key : Cassandra会对partition key 做一个hash计算,并自己决定将这一条记录放在哪个node;
Clustering Key : 主要用于进行Range Query. 并且使用的时候需要按照建表顺序进行提供信息;
Cassandra对查询的支持很弱,只支持主键列及索引列的查询,而且主键列还有各种限制,不过查询弱归弱,但它还是支持索引和排序的。
cassandra:
索引列 支持 like
只有主键支持 group by
主键的限制:
• cassandra的查询必须在主键列上,或者查询的字段有二级索引。
• 对于(A,B)形式的主键,假如查询条件不带分区键A,则查询语句需要开启allow filtering。
• 对于((A,B),C,D)形式的主键,可以认为是第2点的变种。A,B必须同时出现在查询条件中(A和B合在一起决定分区),且C,D不可以跳跃,像where A and B and D的查询是非法的。
• 以上查询不考虑范围查询的情况。
• 所以因为第三点的关系,parition key字段过多会对以后的查询造成很大困扰,在建表的时候首先一定要考虑好数据模型,以免后期掉坑。
• 此外假如与spark集成的话,可以在一定程度上规避掉上面非法查询的问题,通过sparksql可以近似实现关系型数据库sql的查询,而不用考虑查询中一定要带上所有partition key字段
二级索引:
Cassandra之中的索引的实现相对MySQL的索引来说就要简单粗暴很多了。他实际上是自动偷偷新创建了一张表格,同时将原始表格之中的索引字段作为新索引表的Primary Key!并且存储的值为原始数据的Primary Key
cqlsh> CREATE KEYSPACE mydemo WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};
cqlsh> use mydemo;
cqlsh:mydemo> create table sample (
key_primary_one text,
key_primary_two text,
key_cluster_one text,
key_cluster_two text,
data text,
PRIMARY KEY((key_primary_one, key_primary_two), key_cluster_one, key_cluster_two)
);
---插入数据
insert into sample(key_primary_one,key_primary_two,key_cluster_one,key_cluster_two,data) values('a','b','a','b','test1');
insert into sample(key_primary_one,key_primary_two,key_cluster_one,key_cluster_two,data) values('a','c','c','d','test2');
insert into sample(key_primary_one,key_primary_two,key_cluster_one,key_cluster_two,data) values('a','d','e','f','test3');
insert into sample(key_primary_one,key_primary_two,key_cluster_one,key_cluster_two,data) values('b','a','g','h','test4');
insert into sample(key_primary_one,key_primary_two,key_cluster_one,key_cluster_two,data) values('b','c','i','j','test5');
insert into sample(key_primary_one,key_primary_two,key_cluster_one,key_cluster_two,data) values('b','d','k','l','test6');
insert into sample(key_primary_one,key_primary_two,key_cluster_one,key_cluster_two,data) values('c','a','m','n','test7');
cqlsh:mydemo> select * from sample;
key_primary_one | key_primary_two | key_cluster_one | key_cluster_two | data
-----------------+-----------------+-----------------+-----------------+-------
b | d | k | l | test6
a | c | c | d | test2
b | c | i | j | test5
a | d | e | f | test3
b | a | g | h | test4
a | b | a | b | test1
c | a | m | n | test7
(7 rows)
cqlsh:mydemo>
cqlsh:mydemo> select * from sample where key_primary_one='a' and key_cluster_one='c';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh:mydemo> select * from sample where key_primary_one='a' and key_cluster_one='c' ALLOW FILTERING;
key_primary_one | key_primary_two | key_cluster_one | key_cluster_two | data
-----------------+-----------------+-----------------+-----------------+-------
a | c | c | d | test2
(1 rows)
cqlsh:mydemo>
cqlsh:mydemo> select * from sample where key_primary_one='a' ;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh:mydemo> select * from sample where key_primary_one='a' and key_primary_two='b';
key_primary_one | key_primary_two | key_cluster_one | key_cluster_two | data
-----------------+-----------------+-----------------+-----------------+-------
a | b | a | b | test1
(1 rows)
-----查询中的where子句,必须包含Partition key
cqlsh:mydemo> select * from sample where key_primary_one='a' and key_primary_two='b' and key_cluster_two='b';
InvalidRequest: Error from server: code=2200 [Invalid query] message="PRIMARY KEY column "key_cluster_two" cannot be restricted as preceding column "key_cluster_one" is not restricted"
cqlsh:mydemo> select * from sample where key_primary_one='a' and key_primary_two='b' and key_cluster_one='a';
key_primary_one | key_primary_two | key_cluster_one | key_cluster_two | data
-----------------+-----------------+-----------------+-----------------+-------
a | b | a | b | test1
(1 rows)
qlsh:mydemo> select * from sample where key_primary_one='a' and key_primary_two='b' and key_cluster_one='a' and key_cluster_two='b';
key_primary_one | key_primary_two | key_cluster_one | key_cluster_two | data
-----------------+-----------------+-----------------+-----------------+-------
a | b | a | b | test1
(1 rows)
cqlsh:mydemo>
-----如果有多个Clustering key,不可以跳跃,本示例中不能越过key_cluster_one
cqlsh:mydemo> select * from sample where data='test1';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh:mydemo> select * from sample where data='test1' ALLOW FILTERING;
key_primary_one | key_primary_two | key_cluster_one | key_cluster_two | data
-----------------+-----------------+-----------------+-----------------+-------
a | b | a | b | test1
(1 rows)
cqlsh:mydemo>
cqlsh:mydemo> create index ind_sam on sample(data);
cqlsh:mydemo> select * from sample where data='test1';
key_primary_one | key_primary_two | key_cluster_one | key_cluster_two | data
-----------------+-----------------+-----------------+-----------------+-------
a | b | a | b | test1
(1 rows)
cqlsh:mydemo>
cqlsh:mydemo> select * from sample;
key_primary_one | key_primary_two | key_cluster_one | key_cluster_two | data
-----------------+-----------------+-----------------+-----------------+-------
b | d | k | l | test6
a | c | c | d | test2
b | c | i | j | test5
a | d | e | f | test3
b | a | g | h | test4
a | b | a | b | test1
c | a | m | n | test7
(7 rows)
-----Cassandra支持二级索引列的查询
cqlsh:mydemo> update sample set data='test100' where key_primary_one='c' and key_primary_two='a';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Some clustering keys are missing: key_cluster_one, key_cluster_two"
cqlsh:mydemo> update sample set data='test100' where key_primary_one='c' and key_primary_two='a' and key_cluster_one='m' and key_cluster_two='n';
cqlsh:mydemo> select * from sample;
key_primary_one | key_primary_two | key_cluster_one | key_cluster_two | data
-----------------+-----------------+-----------------+-----------------+---------
b | d | k | l | test6
a | c | c | d | test2
b | c | i | j | test5
a | d | e | f | test3
b | a | g | h | test4
a | b | a | b | test1
c | a | m | n | test100
(7 rows)
cqlsh:mydemo>
cqlsh:mydemo> update sample set key_cluster_two='z' where key_primary_one='c' and key_primary_two='a';
InvalidRequest: Error from server: code=2200 [Invalid query] message="PRIMARY KEY part key_cluster_two found in SET part"
cqlsh:mydemo> update sample set key_cluster_two='z' where key_primary_one='c' and key_primary_two='a' and key_cluster_one='m' and key_cluster_two='n';
InvalidRequest: Error from server: code=2200 [Invalid query] message="PRIMARY KEY part key_cluster_two found in SET part"
cqlsh:mydemo> select * from sample;
key_primary_one | key_primary_two | key_cluster_one | key_cluster_two | data
-----------------+-----------------+-----------------+-----------------+---------
b | d | k | l | test6
a | c | c | d | test2
b | c | i | j | test5
a | d | e | f | test3
b | a | g | h | test4
a | b | a | b | test1
c | a | m | n | test100
(7 rows)
cqlsh:mydemo>
-----Cassandra中不支持修改主键列