Cassandra Database Study Notes

http://wayneshawn.github.io/2015/04/07/Cassandra-get-started/


Online Resources

Cassandra Getting Started

  • 2010-07-15 Distributed Key-Value Storage System: Getting Started with Cassandra
  • 2015-03-25 Apache Cassandra Wiki
    -DataStax Documentation
    -Cassandra 2.x Chinese tutorial series (blog)

Python Cassandra-driver

  • cassandra-driver 2.5.0

Single-Node Cassandra Walkthrough

1. Start Cassandra

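If the environment variables are not set, first change into Cassandra's bin directory:
[root@server1 bin]# ./cassandra -f
The -f option keeps Cassandra in the foreground. Without -f, Cassandra runs as a daemon process.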

2. Connect to the local Cassandra instance with cqlsh

[root@server1 bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.13 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.

cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use mykeyspace ;

cqlsh:mykeyspace> create table users( name text primary key, age int, email text );

cqlsh:mykeyspace> insert into users(name, age, email) values('wayne', 21, '[email protected]');
cqlsh:mykeyspace> insert into users(name, age, email) values('kerr', 22, '[email protected]');
cqlsh:mykeyspace>  

cqlsh:mykeyspace> select * from users;
 name   | age | email
--------+-----+-------------------
   kerr |  22 |  singleon@126.com
 lambda |  20 | 227089@qq.com
  wayne |  21 |  leon@126.com

CQL stands for Cassandra Query Language.

3. cassandra-driver example: cassandraDriverTest.py

from __future__ import print_function  # print() then works on Python 2.6+ and Python 3
from cassandra.cluster import Cluster

cluster = Cluster()  # no contact points given: connects to 127.0.0.1 on port 9042
session = cluster.connect('mykeyspace')

# 1. use %s as the placeholder for every argument type
# 2. the second argument must be a sequence; a one-element tuple is written ('blah',)
session.execute('INSERT INTO users(name, age, email) VALUES(%s, %s, %s)', ('shawn', 21, '[email protected]'))

rows = session.execute('SELECT name, age, email FROM users')
for (name, age, email) in rows:
    print(name, age, email)

cluster.shutdown()  # close the driver's connections cleanly
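Comment 2 in the script is easy to get wrong, because parentheses alone do not make a tuple in Python. A bare string is itself a sequence of characters, so the driver would most likely not treat it as one value. The stdlib-only Python 3 sketch below needs no Cassandra. It shows the difference and defines `as_params`, a hypothetical helper that is not part of cassandra-driver. The helper normalizes a single value into a one-element tuple before it is passed to `session.execute`.

```python
def as_params(value):
    """Hypothetical helper: wrap a single bind value in a one-element tuple.

    Parameters for %s placeholders are passed as a sequence, so a lone
    string must become ('blah',), not ('blah').
    """
    if isinstance(value, (tuple, list)):
        return tuple(value)
    return (value,)


not_a_tuple = ('blah')   # just a parenthesized string
a_tuple = ('blah',)      # the trailing comma makes it a tuple

print(type(not_a_tuple).__name__)  # str
print(type(a_tuple).__name__)      # tuple
print(as_params('shawn'))          # ('shawn',)
print(as_params(('shawn', 21)))    # ('shawn', 21)
```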

4. Stop the Cassandra process

Use ps -ef | grep cassandra to find the process ID, then kill it.
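Reading the PID out of that listing by eye is easy. For a script, here is a minimal Python 3 sketch that parses `ps -ef` text. The function name and the sample lines are illustrative assumptions, not real output from this machine. `ps -ef` prints UID, PID, PPID, C, STIME, TTY, TIME and CMD, so the PID is the second column. The `grep cassandra` line that matches itself is skipped.

```python
def find_cassandra_pids(ps_output):
    """Return the PIDs of Cassandra processes found in `ps -ef` output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # CMD (8th field) may contain spaces
        if len(fields) < 8 or fields[1] == "PID":
            continue  # skip the header line and malformed lines
        cmd = fields[7]
        # The JVM command line contains org.apache.cassandra...; ignore grep itself
        if "cassandra" in cmd and not cmd.startswith("grep"):
            pids.append(int(fields[1]))
    return pids


# Hypothetical sample in `ps -ef` format
sample = """UID        PID  PPID  C STIME TTY          TIME CMD
root      2817     1 12 10:01 pts/0    00:01:30 java -ea -Xms1G org.apache.cassandra.service.CassandraDaemon
root      3050  2990  0 10:05 pts/1    00:00:00 grep cassandra"""

print(find_cassandra_pids(sample))  # [2817]
```

On a live system the text could come from `subprocess.check_output(["ps", "-ef"]).decode()`. Each PID can then be passed to `os.kill(pid, signal.SIGTERM)`, which sends the same signal as a plain `kill`.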

A Simple Two-Node Cassandra Cluster

References
-Initializing a multiple node cluster (single data center)
-A simple Cassandra cluster configuration

0. Test environment

VMware 9.0.2, CentOS 6.5 64-bit, Cassandra 2.0.13

1. Assume Cassandra is already installed on both of the following hosts

node0 192.168.56.100 (seed)
node1 192.168.56.201

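2. Change the firewall settings, or simply turn the firewall off

On CentOS, run $ setup to open the settings menu, where the firewall can be turned off.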

3. Stop Cassandra and clear its data

$ps -ef|grep cassandra
$kill pid
$rm -rf /var/lib/cassandra/data/system/*

4. Configure /conf/cassandra.yaml

node0:

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.56.100"

listen_address: 192.168.56.100
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch

node1:

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.56.100"

listen_address: 192.168.56.201
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch
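The two fragments above differ only in listen_address. Below is a small Python 3 sketch that renders them from one node list; the function and variable names are illustrative. It makes that explicit and avoids copy-paste mistakes such as tab indentation, which YAML does not allow. The output is only the handful of settings changed in this step. It is not a complete cassandra.yaml, and the lines still have to be merged into the real file by hand.

```python
SEED_PROVIDER = "org.apache.cassandra.locator.SimpleSeedProvider"


def render_node_settings(listen_address, seeds,
                         rpc_address="0.0.0.0",
                         snitch="GossipingPropertyFileSnitch"):
    """Render the cassandra.yaml settings changed in step 4 for one node.

    Indentation uses spaces only; tab characters are invalid in YAML.
    Multiple seeds go into one comma-separated string.
    """
    return "\n".join([
        "seed_provider:",
        "  - class_name: " + SEED_PROVIDER,
        "    parameters:",
        '      - seeds: "' + ",".join(seeds) + '"',
        "",
        "listen_address: " + listen_address,
        "rpc_address: " + rpc_address,
        "endpoint_snitch: " + snitch,
    ])


NODES = {"node0": "192.168.56.100", "node1": "192.168.56.201"}
SEEDS = ["192.168.56.100"]

for name, ip in NODES.items():
    print("# " + name)
    print(render_node_settings(ip, SEEDS))
    print()
```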

5. Configure /conf/cassandra-rackdc.properties

For example:

# indicate the rack and dc for this node
dc=DC1
rack=RAC1
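With GossipingPropertyFileSnitch, each node reads its own dc and rack from this file and propagates them through gossip. The values should therefore reappear as "Datacenter: DC1" and the Rack column in nodetool status. Below is a simplified Python 3 parser, useful for checking the file on each node. It handles only the plain key=value lines and # comments shown here. Java .properties files also allow ':' separators and line continuations, which this sketch does not handle.

```python
def parse_rackdc(text):
    """Parse cassandra-rackdc.properties-style text into a dict.

    Blank lines and lines starting with '#' are skipped; each remaining
    line is split on its first '='.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


example = """# indicate the rack and dc for this node
dc=DC1
rack=RAC1
"""
print(parse_rackdc(example))  # {'dc': 'DC1', 'rack': 'RAC1'}
```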

6. Start Cassandra

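In my setup, node0's hostname is master and node1's hostname is slave1. I named them this way because I originally set up the machines by following a Hadoop cluster tutorial. For a Cassandra cluster built on VMware, the key requirement is simply two virtual machines that can ping each other.
Start Cassandra on node0 first: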
[root@master bin]# ./cassandra

Then start Cassandra on node1:
[root@slave1 bin]# ./cassandra

7. Check that the ring is running

Every listed node should have status UN (Up/Normal):

[root@master bin]# ./nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.56.201  74.89 KB   256     100.0%            e6121751-682e-4833-8de7-718eac08e718  RAC1
UN  192.168.56.100  105.21 KB  256     100.0%            a153a679-5add-4995-adbf-
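Checking for UN by eye works for two nodes. For scripting, here is a minimal Python 3 sketch that parses nodetool status text like the output above. It assumes the layout of this Cassandra 2.0 output, where each node line starts with a two-letter state code followed by the address; other versions may format the output differently. The sample is a trimmed copy of the output above, with the Host ID and Rack columns dropped.

```python
import re

# A node line starts with U/D (Up/Down) then N/L/J/M
# (Normal/Leaving/Joining/Moving), followed by the node's address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def parse_nodetool_status(text):
    """Map each node address to its two-letter status code, e.g. 'UN'."""
    nodes = {}
    for line in text.splitlines():
        m = NODE_LINE.match(line)
        if m:
            nodes[m.group(3)] = m.group(1) + m.group(2)
    return nodes


def all_up_normal(text):
    """True if at least one node is listed and every node is UN."""
    nodes = parse_nodetool_status(text)
    return bool(nodes) and all(code == "UN" for code in nodes.values())


SAMPLE = """Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)
UN  192.168.56.201  74.89 KB   256     100.0%
UN  192.168.56.100  105.21 KB  256     100.0%
"""

print(parse_nodetool_status(SAMPLE))
print(all_up_normal(SAMPLE))  # True
```

On a node, the text could come from `subprocess.check_output(["./nodetool", "status"]).decode()`.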

8. Test

During the earlier single-node test, I had already inserted 4 rows into the users table of mykeyspace.
Now insert a fifth row from node0.

[root@master bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.13 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> use mykeyspace ;
cqlsh:mykeyspace> select * from users;

 name   | age | email
--------+-----+-------------------
   kerr |  22 |  [email protected]
 lambda |  20 | [email protected]
  wayne |  21 |  [email protected]
  shawn |  21 |     [email protected]

(4 rows)

cqlsh:mykeyspace> insert into users(name, age, email) values('slave', 40, '[email protected]');
cqlsh:mykeyspace> select * from users;

 name   | age | email
--------+-----+-------------------
  slave |  40 |      [email protected]
   kerr |  22 |  [email protected]
 lambda |  20 | [email protected]
  wayne |  21 |  [email protected]
  shawn |  21 |     [email protected]

(5 rows)

cqlsh:mykeyspace>

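Next, query from node1. node1 was created from master with VMware's clone feature and then adjusted, so it also started with the same 4 rows in the users table. Now verify that the new row has appeared.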

[root@slave1 bin]# ./cassandra-cli -h 192.168.56.201
Connected to: "Test Cluster" on 192.168.56.201/9160
Welcome to Cassandra CLI version 2.0.13

The CLI is deprecated and will be removed in Cassandra 3.0.  Consider migrating to cqlsh.
CQL is fully backwards compatible with Thrift data; see http://www.datastax.com/dev/blog/thrift-to-cql3

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default@mykeyspace]
[default@mykeyspace] list users;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: slave
=> (name=, value=, timestamp=1428733896613000)
=> (name=age, value=00000028, timestamp=1428733896613000)
=> (name=email, value=7a777878403132362e636f6d, timestamp=1428733896613000)
-------------------
RowKey: kerr
=> (name=, value=, timestamp=1428733672723000)
=> (name=age, value=00000016, timestamp=1428733672723000)
=> (name=email, value=73696e676c656f6e403132362e636f6d, timestamp=1428733672723000)
-------------------
RowKey: lambda
=> (name=, value=, timestamp=1428414359621000)
=> (name=age, value=00000014, timestamp=1428414359621000)
=> (name=email, value=323237303839313030314071712e636f6d, timestamp=1428414359621000)
-------------------
RowKey: wayne
=> (name=, value=, timestamp=1428733660801000)
=> (name=age, value=00000015, timestamp=1428733660801000)
=> (name=email, value=6c656f6e5f73696e403132362e636f6d, timestamp=1428733660801000)
-------------------
RowKey: shawn
=> (name=, value=, timestamp=1428417278072000)
=> (name=age, value=00000015, timestamp=1428417278072000)
=> (name=email, value=736861776e403136332e636f6d, timestamp=1428417278072000)

5 Rows Returned.
Elapsed time: 572 msec(s).
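cassandra-cli prints raw cell bytes in hex. The age column is a CQL int, stored as a 4-byte big-endian signed integer. The email column is text, stored as UTF-8. The cell with an empty name appears to be the CQL3 row marker rather than a user column. Below is a small Python 3 sketch for decoding the values by hand. The function names are illustrative.

```python
def decode_int(hex_value):
    """CQL int: 4-byte big-endian signed integer."""
    return int.from_bytes(bytes.fromhex(hex_value), "big", signed=True)


def decode_text(hex_value):
    """CQL text: UTF-8 encoded bytes."""
    return bytes.fromhex(hex_value).decode("utf-8")


print(decode_int("00000028"))  # 40  (row 'slave')
print(decode_int("00000016"))  # 22  (row 'kerr')
print(decode_text("73696e676c656f6e403132362e636f6d"))  # singleon@126.com (row 'kerr')
```

The decoded ages (40, 22, 20, 21, 21) match the cqlsh output from node0. This confirms that node1 sees the same rows.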

Running cassandraDriverTest.py also shows the newly added row 'slave':

[Kerr@slave1 ~]$ python cassandraDriverTest.py 
slave 40 [email protected]
kerr 22 [email protected]
lambda 20 [email protected]
wayne 21 [email protected]
shawn 21 [email protected]

Address Issues in Multi-Node Cassandra Configuration

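Scenario: I set up a 3-node Cassandra cluster with IPs 172.16.37.17, 172.16.37.18 and 172.16.37.19 (seed: 172.16.37.18), and started Cassandra only on .18 and .19. Can node .17 connect to the cluster with cassandra-driver and run queries? (All nodes can ping each other.)

Configuration 1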

IP 172.16.37.18
seeds: "172.16.37.18"
listen_address: c37b18
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.17
seeds: "172.16.37.18"
listen_address: c37b17
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.19
seeds: "172.16.37.18"
listen_address: c37b19
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

cassandra-driver test program on node .17:

from cassandra.cluster import Cluster
cluster = Cluster(['c37b18','c37b19'])
session = cluster.connect('lsflog')
res = session.execute('SELECT * FROM jcleanlog')
print(res)

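Result: both nodes refuse connections on the native transport port 9042. With rpc_address: localhost, each node accepts client connections only on its loopback interface, so a client on .17 cannot reach it: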
    session = cluster.connect('lsflog')
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 756, in connect
    self.control_connection.connect()
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 1867, in connect
    self._set_new_connection(self._reconnect_internal())
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 1902, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'c37b18': error(111, "Tried connecting to [('172.16.37.18', 9042)]. Last error: Connection refused"), 'c37b19': error(111, "Tried connecting to [('172.16.37.19', 9042)]. Last error: Connection refused")})

Background (added 2015-05-13)

broadcast_rpc_address

  • The broadcast_rpc_address should be an IP address that drivers/clients can connect to.
  • RPC address to broadcast to drivers and other Cassandra nodes. This cannot be set to 0.0.0.0. If left blank, this will be set to the value of rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must be set. (/conf/cassandra.yaml)
  • If broadcast_rpc_address is not set, it defaults to the configured rpc_address.

rpc_address

  • unset:
    Resolves the address using the hostname configuration of the node. If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS.
  • 0.0.0.0:
    Listens on all configured interfaces, but you must set the broadcast_rpc_address to a value other than 0.0.0.0.
  • IP address
  • hostname
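The rules above decide which address drivers are told to use. Whether remote clients can connect at all depends on the interface that rpc_address binds to. Below is a Python 3 sketch that encodes these rules and compares configuration 1 with configuration 2 for node .18. The function names are illustrative. The unset-rpc_address case (hostname resolution) is deliberately left out. The loopback check is simplified to the two common spellings.

```python
LOOPBACK = {"localhost", "127.0.0.1"}  # simplified set of loopback spellings


def advertised_rpc_address(rpc_address, broadcast_rpc_address=None):
    """Address a node advertises to drivers, per the cassandra.yaml rules above."""
    if broadcast_rpc_address == "0.0.0.0":
        raise ValueError("broadcast_rpc_address cannot be 0.0.0.0")
    if broadcast_rpc_address:
        return broadcast_rpc_address
    if rpc_address == "0.0.0.0":
        raise ValueError("broadcast_rpc_address must be set when rpc_address is 0.0.0.0")
    return rpc_address  # left blank: falls back to rpc_address


def accepts_remote_clients(rpc_address):
    """A loopback rpc_address means only local clients can connect."""
    return rpc_address not in LOOPBACK


# Configuration 1 vs configuration 2 for node .18
print(advertised_rpc_address("localhost"), accepts_remote_clients("localhost"))
print(advertised_rpc_address("0.0.0.0", "172.16.37.18"), accepts_remote_clients("0.0.0.0"))
```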

Ports used by Cassandra

  • 7199 - JMX (was 8080 pre Cassandra 0.8.xx)
  • 7000 - Internode communication (not used if TLS enabled)
  • 7001 - TLS Internode communication (used if TLS enabled)
  • 9160 - Thrift client API
  • 9042 - CQL native transport port
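When a driver or nodetool reports "Connection refused" (errno 111 in the traceback above), testing the port directly from the client machine narrows things down. A refused connection means nothing is listening on that address and port. This happens, for example, with rpc_address: localhost, or with JMX bound to localhost only. Below is a minimal Python 3 sketch using only the standard library. The helper name and the dict are illustrative.

```python
import socket

# Ports from the list above
CASSANDRA_PORTS = {
    "jmx": 7199,
    "internode": 7000,
    "internode_tls": 7001,
    "thrift": 9160,
    "native_cql": 9042,
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False
```

Usage, from node .17: `for name, port in CASSANDRA_PORTS.items(): print(name, port, port_open("172.16.37.18", port))`.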

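Using nodetool

  • Running ./nodetool -h <node2-ip> from node1 fails with Connection refused.
    So far, I can only run ./nodetool status on a node where Cassandra itself is running.
    For example, specifying -h 172.16.37.18 from node .17 fails with: Failed to connect to '172.16.37.18:7199' - ConnectException: 'Connection refused'.
    Notably, when run on node .18 itself:
    - ./nodetool status works
    - ./nodetool -h 172.16.37.18 status fails with Connection refused
    - ./nodetool -h localhost status works
    This appears to be related to the JMX settings (see stackoverflow problem1).
    /conf/cassandra-env.sh contains the following: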
    # jmx: metrics and administration interface
    #
    # add this if you're having trouble connecting:
    # JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname="
    #
    # see
    # https://blogs.oracle.com/jmxetc/entry/troubleshooting_connection_problems_in_jconsole
    # for more on configuring JMX through firewalls, etc. (Short version:
    # get it working with no firewall first.)
    #
    # Cassandra ships with JMX accessible *only* from localhost.
    # To enable remote JMX connections, uncomment lines below
    # with authentication and/or ssl enabled. See https://wiki.apache.org/cassandra/JmxSecurity
    #
    LOCAL_JMX=yes
    if [ "$LOCAL_JMX" = "yes" ]; then
      JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT -XX:+DisableExplicitGC"
    else
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
    fi
    

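Note the comment above: JMX is accessible *only* from localhost. I tried commenting out LOCAL_JMX=yes, along with the authentication-related lines after it, but Cassandra still failed with: Error: Password file not found: /etc/cassandra/jmxremote.password
I still need to read more of the JMX documentation.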
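A likely explanation for the error above: commenting out -Dcom.sun.management.jmxremote.authenticate=true does not turn authentication off, because the JVM's default for that property is already true. The error also names /etc/cassandra/jmxremote.password, so the password.file option was apparently still being passed. That leaves two options:

- Set authenticate=false explicitly. This is acceptable only on a trusted network.
- Make sure the password file actually exists.

The Python 3 sketch below covers the second option. The helper name and the credentials are hypothetical. On a real node, the file must also be owned by the user running Cassandra. That user probably also needs an entry in the JVM's jmxremote.access file; this is an assumption to verify against the JMX documentation.

```python
import os
import stat
import tempfile


def write_jmx_password_file(path, credentials):
    """Write a JMX password file: one "<user> <password>" pair per line.

    The JVM rejects a password file that other users can read, so the
    file is made owner read-only (mode 0400) after writing.
    """
    with open(path, "w") as f:
        for user, password in credentials.items():
            f.write("%s %s\n" % (user, password))
    os.chmod(path, stat.S_IRUSR)


# Demo: write into a temporary directory instead of /etc/cassandra.
# The user name and password are hypothetical placeholders.
demo_path = os.path.join(tempfile.mkdtemp(), "jmxremote.password")
write_jmx_password_file(demo_path, {"cassandra": "changeme"})
print(oct(stat.S_IMODE(os.stat(demo_path).st_mode)))  # 0o400
```

nodetool can then pass the credentials with -u and -pw, for example ./nodetool -u cassandra -pw changeme -h 172.16.37.18 status.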

Configuration 2

IP 172.16.37.18
seeds: "172.16.37.18"
listen_address: c37b18
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.18
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.17
seeds: "172.16.37.18"
listen_address: c37b17
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.17
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.19
seeds: "172.16.37.18"
listen_address: c37b19
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.19
endpoint_snitch: GossipingPropertyFileSnitch

Output of the cassandra-driver test program on node .17:
[Row(job_id=1, event_time=2, idx=0)]
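The result is a list of Row objects. By default, cassandra-driver returns each row as a named tuple, so columns can be read by name or unpacked by position. This is also why the earlier `for (name, age, email) in rows` loop works. Below is a stdlib-only Python 3 sketch that mimics that shape with `collections.namedtuple`. The `Row` type here is a stand-in built from the column names in the output, not the driver's own class.

```python
from collections import namedtuple

# Stand-in for the driver's per-query Row type (columns of lsflog.jcleanlog)
Row = namedtuple("Row", ["job_id", "event_time", "idx"])

res = [Row(job_id=1, event_time=2, idx=0)]  # same shape as the output above

for row in res:
    print(row.job_id, row.event_time, row.idx)  # access by column name
    job_id, event_time, idx = row               # or unpack positionally

print(res)  # [Row(job_id=1, event_time=2, idx=0)]
```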
