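I. Environment Setup
Since the cluster needs at least three servers, I reused the MongoDB Master, Slave, Arbiter environment from my previous experiment for the Hadoop cluster. The servers are still the free ones provided by ibmcloud. In this setup the Arbiter also plays the slave role.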
Hostname | IP | Server Type |
Master | 192.168.0.28 | CentOS 6.2 |
Slave | 192.168.0.29 | Ubuntu 14.04 |
Arbiter | 192.168.0.30 | Ubuntu 14.04 |
The hosts file is configured on all three machines. On Master it looks like this:
$ cat /etc/hosts
127.0.0.1 localhost Database-Master localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.28 Database-Master master
192.168.0.29 Database-Slave slave
192.168.0.30 Database-Arbiter arbiter
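To see which address a name picks up from a hosts file like the one above, a small lookup helper can be handy. This is only a sketch: `lookup_host` is a name I made up for illustration, and it reads the file directly rather than going through the system resolver.

```shell
# lookup_host FILE NAME: print the first address that FILE maps NAME to.
# Comment lines and blank lines are skipped.
# (lookup_host is a hypothetical helper, not part of any tool used here.)
lookup_host() {
  awk -v name="$2" '
    /^[[:space:]]*#/ || NF == 0 { next }
    { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
  ' "$1"
}

# Example, commented out so the block has no side effects:
# for h in master slave arbiter; do
#   echo "$h -> $(lookup_host /etc/hosts "$h")"
# done
```

With the file shown above, the short names master, slave and arbiter map to the 192.168.0.x addresses, while Database-Master matches the 127.0.0.1 line first. The Hadoop and HBase configuration in this post uses the short names.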
Ansible is already installed on the Master machine. The other required packages can be downloaded from:
http://apache.fayea.com/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
http://mirrors.hust.edu.cn/apache/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
http://apache.opencas.org/hbase/1.2.0/hbase-1.2.0-bin.tar.gz
http://download.oracle.com/otn-pub/java/jdk/8u73-b02/jdk-8u73-linux-x64.tar.gz
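I extracted Java into /usr/java/ and then added the environment variables to .zshrc:
export JAVA_HOME=/usr/java/jdk1.8.0_73
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Then reload the file so the variables take effect: source .zshrc
The cluster nodes also need passwordless SSH login between them. I already set that up during the MongoDB experiment, so I won't repeat it here.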
II. Installing and Configuring Hadoop
1. First extract the hadoop-2.6.4.tar.gz file downloaded earlier to /home/ibmcloud/hadoop, then edit etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
2. Add the JAVA_HOME variable to hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_73
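3. Edit hdfs-site.xml. Note that this cluster only has two DataNodes (slave and arbiter), so a dfs.replication of 3 cannot actually be met and HDFS will report under-replicated blocks; a value of 2 would match the cluster.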
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/ibmcloud/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/ibmcloud/hadoop/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
4. Rename mapred-site.xml.template to mapred-site.xml and edit it
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
5. Add the master and slave nodes
echo "master" >~/hadoop/etc/hadoop/master
echo -e "slave\narbiter" >~/hadoop/etc/hadoop/slaves
6. Copy the hadoop folder to slave and arbiter
ansible all -m copy -a "src=hadoop dest=~"
7. Start the Hadoop cluster
On the first run the NameNode has to be formatted. Later startups do not need this step.
hadoop/bin/hdfs namenode -format
Then start Hadoop:
hadoop/sbin/start-all.sh
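Once startup finishes without errors, run jps to list the current processes. NameNode is the HDFS master process; SecondaryNameNode and ResourceManager are also Hadoop processes (ResourceManager is the YARN master).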
$ jps
23076 NameNode
20788 ResourceManager
23302 SecondaryNameNode
27559 Jps
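The "format only on the first run" rule from step 7 is easy to get wrong in a startup script, because reformatting wipes the existing HDFS metadata. Below is a rough sketch of a guard. It assumes that a formatted name directory contains a current/VERSION file, and the function names are my own; it only prints what it would do instead of running the format command.

```shell
# namenode_formatted DIR: succeed if DIR already holds NameNode metadata.
# Assumption: a formatted name directory contains current/VERSION.
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# format_once DIR: print the action a startup script would take for DIR.
# It only prints; the real format command is left for the operator to run.
# (Both function names are hypothetical, chosen for this example.)
format_once() {
  if namenode_formatted "$1"; then
    echo "skip: $1 is already formatted"
  else
    echo "run: hadoop/bin/hdfs namenode -format"
  fi
}
```

In this setup the directory to pass would be the dfs.name.dir value, for example `format_once /home/ibmcloud/hadoop/name`.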
III. Installing the ZooKeeper Cluster
1. Extract zookeeper-3.4.8.tar.gz and rename the directory to zookeeper. Go into zookeeper/conf, run cp zoo_sample.cfg zoo.cfg, and edit the copy:
$ egrep -v '^$|^#' zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/ibmcloud/zookeeper/data
clientPort=2181
server.1=192.168.0.28:2888:3888
server.2=192.168.0.29:2888:3888
server.3=192.168.0.30:2888:3888
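2. Create and edit the myid file. It has to live in the dataDir set in zoo.cfg:
mkdir -p /home/ibmcloud/zookeeper/data
echo "1" > /home/ibmcloud/zookeeper/data/myid
3. Then sync zookeeper to the other two nodes, and on each of them change myid to the matching number from the server.N lines in zoo.cfg (2 on slave, 3 on arbiter).
ansible all -m copy -a "src=zookeeper dest=~"
4. Start zookeeper and check the startup log. Output from Master: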
2016-03-15 06:43:00,421 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@645] - Established session 0x15378e1050a0005 with negotiated timeout 40000 for client /192.168.0.28:57372
2016-03-15 06:43:01,755 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.0.28:57379
2016-03-15 06:43:01,757 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish new session at /192.168.0.28:57379
2016-03-15 06:43:01,760 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@645] - Established session 0x15378e1050a0006 with negotiated timeout 40000 for client /192.168.0.28:57379
2016-03-15 06:43:02,211 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.0.28:57383
2016-03-15 06:43:02,215 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish new session at /192.168.0.28:57383
2016-03-15 06:43:02,217 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@645] - Established session 0x15378e1050a0007 with negotiated timeout 40000 for client /192.168.0.28:57383
2016-03-15 06:46:57,531 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.0.28:57379 which had sessionid 0x15378e1050a0006
2016-03-15 06:46:57,544 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.0.28:57383 which had sessionid 0x15378e1050a0007
2016-03-15 06:46:57,555 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.0.28:57372 which had sessionid 0x15378e1050a0005
2016-03-15 06:47:10,171 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.0.30:60866
2016-03-15 06:47:10,184 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish new session at /192.168.0.30:60866
2016-03-15 06:47:10,186 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@645] - Established session 0x15378e1050a0008 with negotiated timeout 40000 for client /192.168.0.30:60866
2016-03-15 06:47:10,625 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.0.28:58169
2016-03-15 06:47:10,626 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish new session at /192.168.0.28:58169
2016-03-15 06:47:10,629 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@645] - Established session 0x15378e1050a0009 with negotiated timeout 40000 for client /192.168.0.28:58169
2016-03-15 06:47:11,199 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.0.30:60867
2016-03-15 06:47:11,200 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish new session at /192.168.0.30:60867
2016-03-15 06:47:11,204 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@645] - Established session 0x15378e1050a000a with negotiated timeout 40000 for client /192.168.0.30:60867
Output from Slave:
2016-03-15 06:43:02,667 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.0.28:58604
2016-03-15 06:43:02,667 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish new session at /192.168.0.28:58604
2016-03-15 06:43:02,670 [myid:2] - INFO [CommitProcessor:2:ZooKeeperServer@645] - Established session 0x25378e0edf00006 with negotiated timeout 40000 for client /192.168.0.28:58604
2016-03-15 06:46:55,407 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.0.28:59328
2016-03-15 06:46:55,410 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish new session at /192.168.0.28:59328
2016-03-15 06:46:55,415 [myid:2] - INFO [CommitProcessor:2:ZooKeeperServer@645] - Established session 0x25378e0edf00007 with negotiated timeout 40000 for client /192.168.0.28:59328
2016-03-15 06:46:57,242 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.0.28:59328 which had sessionid 0x25378e0edf00007
2016-03-15 06:46:57,928 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x25378e0edf00006, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
    at java.lang.Thread.run(Thread.java:745)
2016-03-15 06:46:57,929 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.0.28:58604 which had sessionid 0x25378e0edf00006
2016-03-15 06:47:08,780 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.0.28:59377
2016-03-15 06:47:08,786 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish new session at /192.168.0.28:59377
2016-03-15 06:47:08,789 [myid:2] - INFO [CommitProcessor:2:ZooKeeperServer@645] - Established session 0x25378e0edf00008 with negotiated timeout 40000 for client /192.168.0.28:59377
2016-03-15 06:49:57,202 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.0.28:59911
2016-03-15 06:49:57,212 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@900] - Client attempting to establish new session at /192.168.0.28:59911
2016-03-15 06:49:57,215 [myid:2] - INFO [CommitProcessor:2:ZooKeeperServer@645] - Established session 0x25378e0edf00009 with negotiated timeout 40000 for client /192.168.0.28:59911
2016-03-15 06:52:15,489 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x25378e0edf00009, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
    at java.lang.Thread.run(Thread.java:745)
2016-03-15 06:52:15,490 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.0.28:59911 which had sessionid 0x25378e0edf00009
5. Run jps again. The ZooKeeper process QuorumPeerMain now shows up:
$ jps
23076 NameNode
20788 ResourceManager
30821 Jps
23302 SecondaryNameNode
30538 QuorumPeerMain
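Editing myid by hand on each node is easy to get wrong, so here is a sketch of deriving the number from the server.N lines in zoo.cfg instead. `myid_for` is a helper name I invented for this example; it assumes each server line has the form server.N=ADDRESS:port:port, as in the zoo.cfg above.

```shell
# myid_for CFG ADDR: print N for the line "server.N=ADDR:port:port" in CFG.
# Prints nothing if no server line matches ADDR.
# (myid_for is a hypothetical helper, not a ZooKeeper command.)
myid_for() {
  awk -v addr="$2" '
    /^server\.[0-9]+=/ {
      # The id is the text between "server." and "=".
      id = $0
      sub(/^server\./, "", id)
      sub(/=.*/, "", id)
      # The host is the text between "=" and the first ":".
      host = $0
      sub(/^server\.[0-9]+=/, "", host)
      sub(/:.*/, "", host)
      if (host == addr) { print id; exit }
    }
  ' "$1"
}
```

On slave, for example, `myid_for zookeeper/conf/zoo.cfg 192.168.0.29 > /home/ibmcloud/zookeeper/data/myid` would write 2 into the file.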
IV. Installing and Configuring the HBase Cluster
1. Extract hbase-1.2.0-bin.tar.gz and rename the directory to hbase, then edit hbase/conf/hbase-env.sh
$ egrep -v '^$|^#' hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_73
export HBASE_CLASSPATH=/home/ibmcloud/hadoop/etc/hadoop
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_MANAGES_ZK=false
2. Edit hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave,arbiter</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000000</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
3. Add Slave and Arbiter to hbase/conf/regionservers, one hostname per line
4. Distribute hbase to the other two nodes
ansible all -m copy -a "src=hbase dest=~"
V. Starting the Cluster
1. Start zookeeper (run this on every node)
zookeeper/bin/zkServer.sh start
2. Start Hadoop
$ hadoop/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
16/03/15 07:33:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: namenode running as process 23076. Stop it first.
arbiter: datanode running as process 2111. Stop it first.
slave: datanode running as process 19992. Stop it first.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 23302. Stop it first.
16/03/15 07:33:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
resourcemanager running as process 20788. Stop it first.
arbiter: starting nodemanager, logging to /home/ibmcloud/hadoop/logs/yarn-ibmcloud-nodemanager-Database-Arbiter.out
slave: starting nodemanager, logging to /home/ibmcloud/hadoop/logs/yarn-ibmcloud-nodemanager-Database-Slave.out
3. Start hbase
$ hbase/bin/start-hbase.sh
master running as process 10144. Stop it first.
arbiter: regionserver running as process 3515. Stop it first.
slave: starting regionserver, logging to /home/ibmcloud/hbase/bin/../logs/hbase-ibmcloud-regionserver-Database-Slave.out
slave: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
slave: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
Check the cluster processes on each node.
Master:
# ibmcloud at Database-Master in ~ [7:33:44]
$ jps
10144 HMaster
23076 NameNode
20788 ResourceManager
20773 Jps
23302 SecondaryNameNode
30538 QuorumPeerMain
Slave:
# ibmcloud at Database-Slave in ~/hbase/bin [6:47:55]
$ jps
19992 DataNode
26794 Jps
16397 QuorumPeerMain
26526 HRegionServer
Arbiter:
# ibmcloud at Database-Arbiter in ~/hbase/bin [6:46:34]
$ jps
2016 QuorumPeerMain
3515 HRegionServer
3628 Jps
2111 DataNode
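Reading three jps listings by eye gets tedious, so here is a sketch of a check that reports which expected daemons are missing. The function name is made up for this example. It takes the jps output as a string and compares the second field exactly, so NameNode does not accidentally match SecondaryNameNode.

```shell
# missing_daemons "JPS_OUTPUT" NAME...: print each NAME absent from the output.
# jps prints one "PID ClassName" pair per line, so the second field is compared.
# (missing_daemons is a hypothetical helper written for this post.)
missing_daemons() {
  out=$1
  shift
  for name in "$@"; do
    printf '%s\n' "$out" |
      awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
      echo "$name"
  done
}
```

On Master it could be called as `missing_daemons "$(jps)" HMaster NameNode ResourceManager SecondaryNameNode QuorumPeerMain`; empty output means everything in the list is running.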
All the processes are up, so enter the hbase shell:
# ibmcloud at Database-Master in ~ [7:34:03]
$ hbase/bin/hbase shell
2016-03-15 07:35:04,687 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0, r25b281972df2f5b15c426c8963cbf77dd853a5ad, Thu Feb 18 23:01:49 CST 2016
hbase(main):001:0>
hbase(main):002:0* status
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
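The shell reports that the master is still initializing. My VM has 1.5 GB of RAM, a single core and a 10 GB disk, and it is already running MongoDB, PHP and Nginx, so adding the Hadoop cluster on top is clearly more than it can digest (see screenshot).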
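On a small VM the master can take a while to finish initializing, so instead of retyping status it may be easier to poll. The following is a generic retry loop, not anything HBase-specific; `retry_until` is my own name for it, and the commented-out check at the bottom assumes that piping a command into hbase shell works in your environment.

```shell
# retry_until TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES
# times, sleeping DELAY seconds between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
# (retry_until is a hypothetical helper, not part of HBase.)
retry_until() {
  tries=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    n=$((n + 1))
    if [ "$n" -le "$tries" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Possible use, commented out because it needs a running cluster. It treats
# the master as ready once the PleaseHoldException message no longer appears:
# master_ready() {
#   ! echo status | hbase/bin/hbase shell 2>&1 | grep -q PleaseHoldException
# }
# retry_until 30 10 master_ready && echo "master is up"
```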
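Checking the connections showed that Hadoop listens on the internal network interface, so I added a port-forwarding rule to view Hadoop's status through the public address: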
sudo iptables -t nat -I PREROUTING -d 129.41.153.232 -p tcp --dport 50070 -j DNAT --to 192.168.0.28:50070
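Then open http://129.41.153.232:50070/dfshealth.html#tab-overview in a browser. Screenshot: Hadoop status
YARN status:
HBase status:
One problem I ran into was that hbase would not start and kept reporting permission denied. It turned out that the scripts in the bin directory on Slave and Arbiter had not been given execute permission, and the log files under logs had the wrong permissions.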
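For the missing execute bits, something along these lines would have saved me some time. It is a sketch with invented function names: it lists the regular files directly under a directory that lack the owner execute bit and then adds it. It does not tell scripts apart from other files, so it is only meant for a bin directory.

```shell
# non_executable DIR: list regular files directly under DIR that lack the
# owner execute bit (octal 100).
non_executable() {
  find "$1" -maxdepth 1 -type f ! -perm -100
}

# make_executable DIR: add owner execute permission to every file listed by
# non_executable. (Both names are hypothetical helpers for this example.)
make_executable() {
  non_executable "$1" | while IFS= read -r f; do
    chmod u+x "$f"
  done
}
```

Running `non_executable ~/hbase/bin` on Slave and Arbiter first shows what would change, and `make_executable ~/hbase/bin` applies it.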
References:
http://songlee24.github.io/2015/07/20/hadoop-hbase-zookeeper-distributed-mode/
http://stackoverflow.com/questions/21166542/hbase-does-not-run-after-start-hbase-sh-permission-denied