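1. ZooKeeper data model
ZooKeeper has a hierarchical namespace, much like a distributed file system. The only difference is that every node in the namespace can have data associated with it as well as children. It is like a file system in which a file can also be a directory. Paths to nodes are always expressed as canonical, absolute, slash-separated paths; there are no relative paths. Any Unicode character can be used in a path, subject to the following constraints: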
• The null character (\u0000) cannot be part of a path name.
• The following characters can't be used because they don't display well, or render in confusing ways: \u0001 - \u001F and \u007F - \u009F.
• The following characters are not allowed: \uD800 - \uF8FF, \uFFF0 - \uFFFF, \uXFFFE - \uXFFFF (where X is a digit 1 - E), \uF0000 - \uFFFFF.
• The "." character can be used as part of another name, but "." and ".." cannot alone be used to indicate a node along a path, because ZooKeeper doesn't use relative paths. The following would be invalid: "/a/b/./c" or "/a/b/../c".
• The token "zookeeper" is reserved.
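The path rules above can be sketched as a small checker. This is a minimal illustration, not ZooKeeper's own validation code: the function name validate_path is hypothetical, the control-character range is assumed to be \u0001 - \u001F, only characters in the Basic Multilingual Plane are checked, and the reserved "zookeeper" token is not enforced.

```python
def validate_path(path: str) -> None:
    """Raise ValueError if `path` breaks one of the rules listed above.

    Hypothetical helper for illustration; not part of any ZooKeeper client.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute (start with '/')")
    if path == "/":
        return  # the root path is always valid
    if path.endswith("/"):
        raise ValueError("path must not end with '/'")
    for segment in path[1:].split("/"):
        if segment == "":
            raise ValueError("empty node name (two adjacent slashes)")
        if segment in (".", ".."):
            raise ValueError("'.' and '..' cannot be used as node names")
        for ch in segment:
            code = ord(ch)
            if code == 0x0000:
                raise ValueError("null character is not allowed")
            # Assumed range: all C0 control characters plus DEL and C1 controls.
            if 0x0001 <= code <= 0x001F or 0x007F <= code <= 0x009F:
                raise ValueError(f"control character U+{code:04X} is not allowed")
            if 0xD800 <= code <= 0xF8FF or 0xFFF0 <= code <= 0xFFFF:
                raise ValueError(f"character U+{code:04X} is not allowed")


validate_path("/a/b/c")    # accepted
validate_path("/a/.b/c.")  # "." inside a longer name is accepted
```

Calling validate_path("/a/b/./c") or validate_path("/a/b/../c") raises ValueError, matching the two invalid examples in the list.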
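1.1 ZNodes (data nodes)
Every node in a ZooKeeper tree is referred to as a znode. A znode maintains a stat structure that includes version numbers for data changes and for ACL changes. The stat structure also has timestamps. The version number, together with the timestamps, allows ZooKeeper to validate caches and to coordinate updates.
Znodes are the main entity that a programmer accesses. They have several characteristics worth describing: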
• Watches
• Data Access. ZooKeeper was not designed to be a general database or large object store. Instead, it manages coordination data. This data can come in the form of configuration, status information, rendezvous, etc. A common property of the various forms of coordination data is that they are relatively small: measured in kilobytes. The ZooKeeper client and the server implementations have sanity checks to ensure that znodes have less than 1 MB of data, but the data should be much less than that on average. Operating on relatively large data sizes will cause some operations to take much more time than others and will affect the latencies of some operations because of the extra time needed to move more data over the network and onto storage media. If large data storage is needed, the usual pattern of dealing with such data is to store it on a bulk storage system, such as NFS or HDFS, and store pointers to the storage locations in ZooKeeper.
• Ephemeral Nodes
• Sequence Nodes: Unique Naming
1.2 Time in ZooKeeper
ZooKeeper tracks time in several ways:
• Zxid (ZooKeeper transaction id). Every change to the ZooKeeper state receives a stamp in the form of a zxid. Zxids expose the total order of all changes to ZooKeeper.
• Version numbers. Every change to a node increases one of that node's version numbers.
• Ticks. When using multi-server ZooKeeper, servers use ticks to define timing of events such as status uploads, session timeouts, connection timeouts between peers, etc.
• Real time. ZooKeeper doesn't use real time, or clock time, at all except to put timestamps into the stat structure on znode creation and znode modification.
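One way to see why per-node version numbers help coordinate updates is a conditional write: the writer states which version it last read, and the write is rejected if the node has changed since. The sketch below is a toy in-memory model, not a ZooKeeper client; ToyZnode and BadVersionError are hypothetical names, and the use of -1 to mean "skip the check" mirrors the convention of the real setData call.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version is stale."""


class ToyZnode:
    """In-memory stand-in for a single znode (illustration only)."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data
        self.version = 0  # data version of a freshly created node

    def set_data(self, data: bytes, expected_version: int = -1) -> int:
        # -1 means "do not check the version".
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, current {self.version}"
            )
        self.data = data
        self.version += 1  # every change to the data bumps the version
        return self.version


node = ToyZnode(b"initial")
seen = node.version          # client A reads the node at version 0
node.set_data(b"from B")     # client B writes first; the version becomes 1
try:
    node.set_data(b"from A", expected_version=seen)
    conflict = False
except BadVersionError:
    conflict = True          # A's stale update is rejected and must be retried
```

Client A would normally re-read the node, take the new version, and try again.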
1.3 ZooKeeper stat structure
The stat structure of each znode consists of the following fields:
• czxid: the zxid of the change that created this znode.
• mzxid: the zxid of the change that last modified this znode.
• ctime: the time, in milliseconds since the epoch, when this znode was created.
• mtime: the time, in milliseconds since the epoch, when this znode was last modified.
• version: the number of changes to the data of this znode.
• cversion: the number of changes to the children of this znode.
• aversion: the number of changes to the ACL of this znode.
• ephemeralOwner: the session id of the owner of this znode if it is an ephemeral node; zero otherwise.
• dataLength: the length of the data field of this znode.
• numChildren: the number of children of this znode.
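The field list above maps naturally onto a record type. The sketch below is an illustration only: the Stat class and its snake_case field names are hypothetical, and it assumes ctime and mtime are milliseconds since the Unix epoch and that ephemeralOwner is zero for non-ephemeral nodes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Stat:
    """Hypothetical mirror of the stat fields listed above."""
    czxid: int
    mzxid: int
    ctime: int            # creation time, ms since the epoch (assumed unit)
    mtime: int            # last modification time, ms since the epoch
    version: int
    cversion: int
    aversion: int
    ephemeral_owner: int  # owning session id, 0 if the node is not ephemeral
    data_length: int
    num_children: int

    @property
    def is_ephemeral(self) -> bool:
        return self.ephemeral_owner != 0

    @property
    def created_at(self) -> datetime:
        # Convert milliseconds to seconds before building the datetime.
        return datetime.fromtimestamp(self.ctime / 1000, tz=timezone.utc)


# Made-up sample values, chosen only to exercise the helpers.
stat = Stat(czxid=2, mzxid=5, ctime=1_700_000_000_000, mtime=1_700_000_060_000,
            version=3, cversion=0, aversion=0, ephemeral_owner=0,
            data_length=11, num_children=0)
print(stat.is_ephemeral, stat.created_at.isoformat())
```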
2. ZooKeeper watches
All read operations in ZooKeeper, getData(), getChildren(), and exists(), have the option of setting a watch as a side effect. A watch event is a one-time trigger, sent to the client that set the watch, which occurs when the data for which the watch was set changes.
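The one-time nature of a watch can be shown with a toy in-memory store. This is a sketch of the semantics only, not a ZooKeeper client: ToyStore and its methods are hypothetical, and the callback simply receives the path that changed.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional


class ToyStore:
    """In-memory model of one-shot data watches (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watches: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def get_data(self, path: str,
                 watch: Optional[Callable[[str], None]] = None) -> Optional[bytes]:
        # Reading is what registers the watch, as with getData().
        if watch is not None:
            self._watches[path].append(watch)
        return self._data.get(path)

    def set_data(self, path: str, data: bytes) -> None:
        self._data[path] = data
        # Pop before firing: each watch is delivered at most once.
        for callback in self._watches.pop(path, []):
            callback(path)


events: List[str] = []
store = ToyStore()
store.set_data("/config", b"v1")
store.get_data("/config", watch=events.append)  # set a watch while reading
store.set_data("/config", b"v2")                # the watch fires here
store.set_data("/config", b"v3")                # no watch is left, nothing fires
```

To keep receiving notifications, a client has to set a new watch each time it reads the node again.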
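3. ZooKeeper configuration options
①. tickTime: client-server heartbeat interval
The interval at which heartbeats are exchanged between ZooKeeper servers, or between a client and a server; one heartbeat is sent every tickTime. The value is in milliseconds.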
tickTime=2000
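②. initLimit: leader-follower initial connection limit
The maximum number of heartbeats (multiples of tickTime) that a follower (F) may take to establish its initial connection to the leader (L).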
initLimit=5
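③. syncLimit: leader-follower sync limit
The maximum number of heartbeats (multiples of tickTime) that may elapse between a request and its reply between a follower and the leader.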
syncLimit=2
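Because initLimit and syncLimit are counted in ticks, the wall-clock limits follow by multiplying by tickTime. A short sketch with the sample values shown above (the function name limit_ms is hypothetical):

```python
def limit_ms(tick_time_ms: int, ticks: int) -> int:
    """Convert a limit expressed in ticks into milliseconds."""
    return tick_time_ms * ticks


tick_time = 2000   # tickTime=2000
init_limit = 5     # initLimit=5
sync_limit = 2     # syncLimit=2

init_timeout_ms = limit_ms(tick_time, init_limit)  # 2000 * 5 = 10000 ms
sync_timeout_ms = limit_ms(tick_time, sync_limit)  # 2000 * 2 = 4000 ms
print(init_timeout_ms, sync_timeout_ms)
```

With these values a follower has 10 seconds for its initial connection and 4 seconds between a request and its reply.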
④. dataDir: data directory
The directory where ZooKeeper stores its data. By default, ZooKeeper also writes its transaction log files to this directory.
dataDir=/home/michael/opt/zookeeper/data
⑤. dataLogDir: transaction log directory
The directory where ZooKeeper stores its transaction log files.
dataLogDir=/home/michael/opt/zookeeper/log
⑥. clientPort: client connection port
The port on which the ZooKeeper server listens for client connections.
clientPort=2333
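⑦. Server names and addresses: cluster membership (server id, server address, leader-follower communication port, election port)
This option has a special format: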
server.N=YYY:A:B
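Here N is the server id and YYY is the server's IP address. A is the leader-follower communication port, which the server uses to exchange information with the cluster's leader. B is the election port, which servers use to talk to each other when electing a new leader (when the leader fails, the remaining servers communicate to choose a new one). When each server runs on its own host, every server in the cluster normally uses the same A port and the same B port.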
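To make the server.N=YYY:A:B format concrete, here is a minimal parser sketch. ServerEntry and parse_server_line are hypothetical names, and the sketch handles only the three-part form shown above (an IPv4 address or host name followed by two ports).

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    server_id: int      # N
    host: str           # YYY
    quorum_port: int    # A: leader-follower communication port
    election_port: int  # B: leader election port


def parse_server_line(line: str) -> ServerEntry:
    """Parse one 'server.N=YYY:A:B' line; raise ValueError if malformed."""
    key, _, value = line.strip().partition("=")
    prefix, _, server_id = key.partition(".")
    if prefix != "server" or not server_id.isdigit():
        raise ValueError(f"not a server entry: {line!r}")
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected host:A:B, got {value!r}")
    host, quorum_port, election_port = parts
    return ServerEntry(int(server_id), host, int(quorum_port), int(election_port))


entry = parse_server_line("server.1=127.0.0.1:2881:3881")
print(entry.server_id, entry.host, entry.quorum_port, entry.election_port)
```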
4. Starting ZooKeeper in standalone mode
Rename conf/zoo_sample.cfg in the installation package to zoo.cfg, then run the zkServer.sh script in the bin directory: ./zkServer.sh start
5. Starting a ZooKeeper cluster on a single machine
Change the configuration file of instance 1 to:
tickTime=2000
dataDir=/opt/zookeeper1/data
clientPort=2181
initLimit=10
syncLimit=5
server.1=127.0.0.1:2881:3881
server.2=127.0.0.1:2882:3882
server.3=127.0.0.1:2883:3883
Change the configuration file of instance 2 to:
tickTime=2000
dataDir=/opt/zookeeper2/data
clientPort=2182
initLimit=10
syncLimit=5
server.1=127.0.0.1:2881:3881
server.2=127.0.0.1:2882:3882
server.3=127.0.0.1:2883:3883
Change the configuration file of instance 3 to:
tickTime=2000
dataDir=/opt/zookeeper3/data
clientPort=2183
initLimit=10
syncLimit=5
server.1=127.0.0.1:2881:3881
server.2=127.0.0.1:2882:3882
server.3=127.0.0.1:2883:3883
Create the myid files:
echo 1 > /opt/zookeeper1/data/myid
echo 2 > /opt/zookeeper2/data/myid
echo 3 > /opt/zookeeper3/data/myid
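The content of each myid file is the part that follows "server." in the matching configuration entry (for example, 1 in server.1=127.0.0.1:2881:3881); it is the id of that ZooKeeper instance.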
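Since the three configuration files differ only in dataDir and clientPort, they can be generated rather than edited by hand. The script below is a sketch under stated assumptions: it writes into a temporary directory instead of /opt, the zookeeperN/conf/zoo.cfg layout and the write_instance helper are hypothetical, and the clientPort rule (2180 + id) simply reproduces the values used above.

```python
import tempfile
from pathlib import Path

# The shared server list, copied from the configuration files above.
SERVERS = {
    1: "127.0.0.1:2881:3881",
    2: "127.0.0.1:2882:3882",
    3: "127.0.0.1:2883:3883",
}


def write_instance(base: Path, server_id: int) -> Path:
    """Write zoo.cfg and myid for one instance; return the zoo.cfg path."""
    home = base / f"zookeeper{server_id}"
    data_dir = home / "data"
    conf_dir = home / "conf"
    data_dir.mkdir(parents=True, exist_ok=True)
    conf_dir.mkdir(parents=True, exist_ok=True)

    lines = [
        "tickTime=2000",
        f"dataDir={data_dir}",
        f"clientPort={2180 + server_id}",  # 2181, 2182, 2183
        "initLimit=10",
        "syncLimit=5",
    ]
    lines += [f"server.{n}={addr}" for n, addr in sorted(SERVERS.items())]

    cfg = conf_dir / "zoo.cfg"
    cfg.write_text("\n".join(lines) + "\n")
    # myid holds only the number that follows "server." for this instance.
    (data_dir / "myid").write_text(f"{server_id}\n")
    return cfg


base = Path(tempfile.mkdtemp())  # stand-in for /opt in this sketch
configs = [write_instance(base, n) for n in sorted(SERVERS)]
```

Each instance would then be started with its own zoo.cfg.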
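6. Single-threaded and multi-threaded libraries
The ZooKeeper C client ships as two libraries, a single-threaded one and a multi-threaded one, named zookeeper_st and zookeeper_mt.
zookeeper_st is mainly intended for systems where pthread is unavailable or poorly supported. It offers only the asynchronous API, and the application must call the API from its own event loop.
zookeeper_mt starts two separate threads, an I/O thread and a completion thread, and wraps the event loop internally. It offers both the synchronous and the asynchronous API and is the recommended library in most cases.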
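7. Synchronous and asynchronous APIs
zookeeper_st offers only the asynchronous API; zookeeper_mt offers both.
With the synchronous API of zookeeper_mt, the main thread calls the API function and then waits; once the I/O thread has completed the operation, it notifies the main thread.
Both zookeeper_st and zookeeper_mt offer the asynchronous API, but it is driven differently. With zookeeper_mt, the API call on the main thread returns immediately and the completion thread runs the callback. How does the single-threaded zookeeper_st achieve asynchronous behavior? It exposes two event-handling functions, zookeeper_interest and zookeeper_process, with which the application builds its own event loop to drive the asynchronous calls (see the implementation of cli_st).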
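The relationship between the two call styles in zookeeper_mt can be modelled in a few lines: a synchronous call is an asynchronous call plus a wait for the background thread's notification. This is a toy model, not the C client: ToyClient, aget, and get are hypothetical names, the "reply" is fabricated locally, and a single background thread stands in for both the I/O thread and the completion thread.

```python
import queue
import threading
from typing import Callable, List


class ToyClient:
    """Toy model of sync and async calls over one background thread."""

    def __init__(self) -> None:
        self._requests: queue.Queue = queue.Queue()
        self._io = threading.Thread(target=self._io_loop, daemon=True)
        self._io.start()

    def _io_loop(self) -> None:
        while True:
            item = self._requests.get()
            if item is None:  # shutdown marker
                break
            path, callback = item
            callback(f"data-of-{path}")  # pretend a server reply arrived

    def aget(self, path: str, callback: Callable[[str], None]) -> None:
        """Asynchronous style: queue the request and return immediately."""
        self._requests.put((path, callback))

    def get(self, path: str) -> str:
        """Synchronous style: issue the request, then block until notified."""
        done = threading.Event()
        result: List[str] = []

        def on_reply(value: str) -> None:
            result.append(value)
            done.set()  # the background thread wakes the caller

        self.aget(path, on_reply)
        done.wait()
        return result[0]

    def close(self) -> None:
        self._requests.put(None)
        self._io.join()


client = ToyClient()
sync_value = client.get("/a")  # the caller blocks until the reply arrives

replies: List[str] = []
arrived = threading.Event()
client.aget("/b", lambda v: (replies.append(v), arrived.set()))
arrived.wait(timeout=2)        # the callback ran on the background thread
client.close()
```

In the single-threaded library there is no background thread, so the application's own loop must play the role of _io_loop.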