Hadoop ZooKeeper HBase Cluster

Note: for the Hadoop environment setup, see the previous document.

 

Environment:

10.0.30.235 nn0001 NameNode/HBase HMaster

10.0.30.236 snn0001 SecondaryNameNode/HBase HMaster

10.0.30.237 dn0001 DataNode/Zookeeper/HBase HRegionServer

10.0.30.238 dn0002 DataNode/Zookeeper/HBase HRegionServer

10.0.30.239 dn0003 DataNode/Zookeeper/HBase HRegionServer

 

(In a production environment, ZooKeeper should be installed on its own dedicated servers.)

 

Cluster startup order: Hadoop --> ZooKeeper --> HBase Master

 

Download zookeeper-3.3.4.tar.gz

[root@nn0001 conf]# tar zxvf zookeeper-3.3.4.tar.gz

 

[root@nn0001 conf]# cp zoo_sample.cfg zoo.cfg

 

Running ZooKeeper in standalone mode is convenient for evaluation, some development, and testing. But in production, you should run ZooKeeper in replicated mode.

 

[root@nn0001 conf]# vim zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/hadoop/zookeeper
# the port at which the clients will connect
clientPort=2181

server.1=dn0001:2888:3888
server.2=dn0002:2888:3888
server.3=dn0003:2888:3888

 

Note: in server.A=B:C:D, A is a number identifying which server this is; B is the server's IP address or hostname; C is the port this server uses to exchange data with the cluster's leader; and D is the port the servers use to talk to each other during leader election when the current leader goes down. In a pseudo-cluster setup, B is the same for every instance, so each ZooKeeper instance must be assigned different C and D port numbers.

 

Besides editing zoo.cfg, replicated mode also requires a myid file in the dataDir directory. It holds a single value, the A number; on startup ZooKeeper reads this file and compares its value with the configuration in zoo.cfg to determine which server it is.

 

Create a myid file under dataDir on each of the three servers dn0001/dn0002/dn0003.

Its contents are 1, 2, and 3 respectively, matching the server numbers above; the file must contain only the number.
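The myid step can be sketched as a small helper. `write_myid` is a hypothetical name, and the demo directory below is a local stand-in for the real dataDir (`/hadoop/zookeeper` in the zoo.cfg above):

```shell
# Hypothetical helper: write a ZooKeeper server id into dataDir/myid.
# The file must contain only the number matching server.N in zoo.cfg.
write_myid() {
  local datadir=$1 id=$2
  mkdir -p "$datadir"
  printf '%s\n' "$id" > "$datadir/myid"
}

# Demo against a local stand-in directory; on the real nodes you would run
# write_myid /hadoop/zookeeper 1 on dn0001, ... 2 on dn0002, ... 3 on dn0003.
write_myid ./zk-demo 1
cat ./zk-demo/myid
```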

 

Copy the configured zookeeper-3.3.4 directory to dn0001/dn0002/dn0003.

 

Run the following on each of the three servers:

[root@nn0001 bin]# ./zkServer.sh start
JMX enabled by default
Using config: /download/zookeeper-3.3.4/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

 

[root@nn0001 bin]# ./zkServer.sh status
JMX enabled by default
Using config: /download/zookeeper-3.3.4/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

 

Checking the ZooKeeper status produces the error below. This is because the nc version shipped with CentOS 5.6 has no -q option; edit the zkServer.sh script and remove `-q 1` to fix it.

[root@nn0001 bin]# echo stat|nc -q 1 localhost
nc: invalid option -- q
usage: nc [-46DdhklnrStUuvzC] [-i interval] [-p source_port]
          [-s source_ip_address] [-T ToS] [-w timeout] [-X proxy_version]
          [-x proxy_address[:port]] [hostname] [port[s]]
-bash: echo: write error: Broken pipe
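The script fix can be done with a one-line sed. The demo below edits a stand-in file rather than the real zkServer.sh, since the exact line varies by ZooKeeper version:

```shell
# Demo: the nc invocation inside zkServer.sh looks roughly like this stand-in.
printf 'STAT=`echo stat | nc -q 1 localhost $clientPortAddress`\n' > zkServer-demo.sh

# Strip the unsupported "-q 1" flag in place, keeping a .bak backup.
sed -i.bak 's/nc -q 1/nc/' zkServer-demo.sh
cat zkServer-demo.sh
```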

 

Alternatively, the status can be checked with `echo stat | nc localhost 2181`:

[root@nn0001 bin]# echo stat | nc localhost 2181
Zookeeper version: 3.3.3-1203054, built on 11/17/2011 05:47 GMT
Clients:
 /127.0.0.1:34378[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Outstanding: 0
Zxid: 0x100000000
Mode: follower
Node count: 4

 

 

Configure HBase (on all servers)

[root@nn0001 conf]# vim hbase-env.sh

export JAVA_HOME=/usr/java/jdk1.6.0_26

export HBASE_MANAGES_ZK=false

 

[root@nn0001 conf]# vim hbase-site.xml

<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://nn0001:9000/hbase</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
        <property>
                <name>hbase.master</name>
                <value>nn0001</value>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>dn0001,dn0002,dn0003</value>
        </property>
        <property>
                <name>zookeeper.session.timeout</name>
                <value>60000</value>
        </property>
        <!--<property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
        </property>
        <property>
                <name>dfs.datanode.max.xcievers</name>
                <value>4096</value>
        </property>-->
        <property>
                <name>hbase.zookeeper.property.dataDir</name>
                <value>/hadoop/zookeeper</value>
        </property>
</configuration>

 

Note two important configuration parameters:

First: Hadoop versions before 0.20.205.x have no durable sync mechanism, which can cause HBase to lose data:

<property>
 <name>dfs.support.append</name>
 <value>true</value>
</property>

Second: each DataNode has an upper bound on the number of files it can serve at one time:
<property>
 <name>dfs.datanode.max.xcievers</name>
 <value>4096</value>
</property>

For a detailed explanation, see:

http://hbase.apache.org/book/hadoop.html

 

[root@nn0001 conf]# vim regionservers

dn0001
dn0002
dn0003

 

Start HBase on the NameNode:

[root@nn0001 bin]# ./start-hbase.sh

 

java.io.IOException: Call to nn0001/10.0.30.235:9000 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
        at org.apache.hadoop.ipc.Client.call(Client.java:743)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy5.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:215)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:363)
        at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:81)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:342)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:279)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

 

Official explanation: http://hbase.apache.org/book/hadoop.html

HBase 0.90.x does not ship with hadoop-0.20.205.x, etc. To make it run, you need to replace the hadoop jars that HBase shipped with in its lib directory with those of the Hadoop you want to run HBase on. If even after replacing Hadoop jars you get the below exception:

sv4r6s38: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
sv4r6s38:       at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
sv4r6s38:       at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)
sv4r6s38:       at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
sv4r6s38:       at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:209)
sv4r6s38:       at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:177)
sv4r6s38:       at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:229)
sv4r6s38:       at org.apache.hadoop.security.KerberosName.<clinit>(KerberosName.java:83)
sv4r6s38:       at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:202)
sv4r6s38:       at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:177)

you need to copy the commons-configuration-X.jar found in your Hadoop lib directory into hbase/lib. That should fix the above complaint.

 

At first I suspected hadoop-0.20.205.0 was unsupported, but switching to hadoop-0.20.203.0 left the same problem in place.

It turned out that the jars under hbase/lib had not been replaced with Hadoop's own jars. The replacement steps:

1. Delete hbase/lib/hadoop-core-0.20-append-r1056497.jar

2. Copy in hadoop/hadoop-core-0.20.203.0.jar and hadoop/lib/commons-collections-3.2.1.jar
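The two steps can be sketched as a helper. `swap_hadoop_jars` is a hypothetical name, and the demo uses empty placeholder files standing in for the real jars:

```shell
# Hypothetical helper for the swap described above: remove the hadoop-core jar
# bundled with HBase and copy in the jars from the Hadoop actually being run.
swap_hadoop_jars() {
  local hbase_lib=$1; shift
  rm -f "$hbase_lib"/hadoop-core-*.jar
  cp "$@" "$hbase_lib"/
}

# Demo with placeholder files in a local directory tree.
mkdir -p demo/hbase/lib demo/hadoop/lib
touch demo/hbase/lib/hadoop-core-0.20-append-r1056497.jar
touch demo/hadoop/hadoop-core-0.20.203.0.jar
touch demo/hadoop/lib/commons-collections-3.2.1.jar
swap_hadoop_jars demo/hbase/lib \
    demo/hadoop/hadoop-core-0.20.203.0.jar \
    demo/hadoop/lib/commons-collections-3.2.1.jar
ls demo/hbase/lib
```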

 

 

2012-01-25 12:09:31,554 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
        at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1065)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:102)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1079)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)
        at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:196)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
        at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
        at org.apache.hadoop.hbase.security.User.call(User.java:457)
        at org.apache.hadoop.hbase.security.User.callStatic(User.java:447)
        at org.apache.hadoop.hbase.security.User.access$200(User.java:49)
        at org.apache.hadoop.hbase.security.User$SecureHadoopUser.isSecurityEnabled(User.java:435)
        at org.apache.hadoop.hbase.security.User$SecureHadoopUser.login(User.java:406)
        at org.apache.hadoop.hbase.security.User.login(User.java:146)
        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:202)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1060)
        ... 5 more

 

This is caused by the missing hadoop/lib/commons-configuration-1.6.jar; copy that jar into hbase/lib.

 

WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)

 

This happens when ZooKeeper on the dn nodes has not been started first; start ZooKeeper before starting HBase.

 

WARN org.apache.hadoop.hbase.master.ServerManager: Server dn0001,60020,1327465650410 has been rejected; Reported time is too far out of sync with master.  Time difference of 162817ms > max allowed of 30000ms

 

This is because the servers' clocks are not synchronized.
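The rejection comes from the master's clock-skew limit, 30000 ms by default (in this HBase generation the setting is, as best I recall, `hbase.master.maxclockskew`; verify against your version). The arithmetic of the check, using the skew from the log above, and the usual fix:

```shell
# Reproduce the check behind the log message: a reported skew of 162817 ms
# against the default 30000 ms maximum.
max_skew_ms=30000
skew_ms=162817
if [ "$skew_ms" -gt "$max_skew_ms" ]; then
  echo "rejected: ${skew_ms}ms > ${max_skew_ms}ms"
fi

# The fix is to synchronize clocks on all nodes, e.g. (run as root on each node;
# using nn0001 as the time source is an assumption for this cluster):
#   ntpdate nn0001
```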

 

 

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)

        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy7.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1176)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:415)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:240)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:487)
        at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:425)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:383)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:279)

 

 org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper

The log also shows the two issues above; I have not yet found the cause, but the system starts normally.

 

Master startup log on nn0001:

Wed Jan 25 11:45:42 CST 2012 Starting master on nn0001
ulimit -n 1024
2012-01-25 11:45:44,226 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics: Initializing RPC Metrics with hostName=HMaster, port=60000
2012-01-25 11:45:46,357 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server Responder: starting
2012-01-25 11:45:46,364 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60000: starting
2012-01-25 11:45:46,364 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: starting
2012-01-25 11:45:46,419 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: starting
2012-01-25 11:45:46,419 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: starting
2012-01-25 11:45:46,415 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: starting
2012-01-25 11:45:46,415 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: starting
2012-01-25 11:45:46,415 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: starting
2012-01-25 11:45:46,414 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: starting
2012-01-25 11:45:46,414 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: starting
2012-01-25 11:45:46,365 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: starting
2012-01-25 11:45:46,365 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: starting
2012-01-25 11:45:46,628 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT
2012-01-25 11:45:46,628 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=nn0001
2012-01-25 11:45:46,628 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_26
2012-01-25 11:45:46,628 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
2012-01-25 11:45:46,628 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_26/jre

 

 

To be continued!
