Deploying a distributed environment with hadoop2.2.0 + zookeeper3.4.5 + hbase0.96.2 + hive0.13.1

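I. What are hadoop2.2.0, zookeeper3.4.5, hbase0.96.2, and hive0.13.1?
  For an introduction to hadoop2.2.0 and its features, see: http://blog.yidooo.net/archives/hadoop-2-2-0-new-features.html
  For an introduction to zookeeper, see: http://baike.baidu.com/view/3061646.htm
  For an introduction to hbase, see: http://baike.baidu.com/view/1993870.htm
  For an introduction to hive0.13 and its features, see: http://www.csdn.net/article/2014-04-22/2819438-Cloud-Hive
 
  I have bundled the four packages and put them here: http://pan.baidu.com/s/1i35PlI1
  I assume that anyone reading this article already has the basics, so I will not introduce the software further.
  BTW: I used Mac OS X 10.9 + Parallels 9 to virtualize four Ubuntu machines: two masters (m1, m2) and two slaves (s1, s2).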
 
II. Where can the software be downloaded?
  hadoop2.2.0: http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
  zookeeper3.4.5: http://apache.dataguru.cn/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
  hbase0.96.2: http://mirrors.hust.edu.cn/apache/hbase/hbase-0.96.2/hbase-0.96.2-hadoop2-bin.tar.gz
  hive0.13.1: http://mirrors.cnnic.cn/apache/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz
  JDK1.7.0_65: installed with apt-get
 
  I use the source package for hadoop2.2.0 because I run 64-bit Ubuntu, and the official binary release only ships 32-bit native libraries. On a 64-bit system it reports util.NativeCodeLoader - Unable to load native-hadoop library for your platform.., so Hadoop has to be recompiled on 64-bit. I will cover how to do that in a separate article.
 
  III. How to install
  1. Install the JDK (the current hostname is m1)
    1) Run the following command
#sudo apt-get install oracle-java7-installer
    2) Configure the Java environment variables
#sudo vi /etc/environment
      Append the Java bin path to the end of PATH on the first line.
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java-7-oracle/bin"
      Add the following three lines after PATH
CLASSPATH="/usr/lib/jvm/java-7-oracle/lib"
JAVA_HOME="/usr/lib/jvm/java-7-oracle"
JRE_HOME="/usr/lib/jvm/java-7-oracle/jre"

      Tell the system that we now use the Sun (Oracle) JDK rather than OpenJDK

#sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java-7-oracle/bin/java 300
#sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/java-7-oracle/bin/javac 300
#sudo update-alternatives --config java

      Several options are listed; choose the Oracle JDK entry (option 2 on my machine), then run java -version to confirm that the new version is active

 
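   2. Clone three machines with Parallels
    1) In the Parallels hardware network settings, pick the network mode shown in the screenshot of the original post; after that, ping www.163.com should succeed


    2) In Parallels, click File => Clone in the top-left corner and clone three virtual machines named m2, s1, and s2 (stop the virtual machine before cloning)
    Run sudo vi /etc/hostname on each clone to set its hostname; a reboot is needed for the change to take effect.
    Run ifconfig on m1, m2, s1, and s2 to see the IP address each was assigned, then run sudo vi /etc/hosts on each machine and add one line per node mapping its IP address to its hostname (my file was shown in a screenshot in the original post). Then run "sudo /etc/init.d/networking restart" for the change to take effect.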
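
One way to double-check the edited hosts file is to confirm that every cluster hostname has an entry. The sketch below is only an illustration: the function name missing_hosts is hypothetical, and it assumes the usual "IP name [alias...]" layout of /etc/hosts.

```shell
#!/usr/bin/env bash
# Print every hostname (arguments 2..n) that has no entry in the hosts file
# given as argument 1. Commented-out lines are ignored.
# missing_hosts is a hypothetical helper name, not part of any tool.
missing_hosts() {
  local file="$1"; shift
  local h
  for h in "$@"; do
    # Fields 2..NF of a hosts line are the names bound to the IP in field 1.
    awk -v h="$h" '
      !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }
    ' "$file" || echo "$h"
  done
}
```

Running missing_hosts /etc/hosts m1 m2 s1 s2 on each machine should print nothing once all four names are mapped.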

 

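    3) Configure passwordless sshd login (I use the root account)
    Install the SSH tools

#sudo apt-get install ssh openssh-server
(If ssh is already available by default, there is no need to install it)

    Run ssh-keygen on every machine and press Enter at each prompt; this generates id_rsa and id_rsa.pub in the user's .ssh directory.
    On m1, run: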

#scp -r root@m2:/root/.ssh/id_rsa.pub ~/.ssh/m2.pub
#scp -r root@s1:/root/.ssh/id_rsa.pub ~/.ssh/s1.pub
#scp -r root@s2:/root/.ssh/id_rsa.pub ~/.ssh/s2.pub
#cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#cat ~/.ssh/m2.pub >> ~/.ssh/authorized_keys
#cat ~/.ssh/s1.pub >> ~/.ssh/authorized_keys
#cat ~/.ssh/s2.pub >> ~/.ssh/authorized_keys
#scp -r ~/.ssh/authorized_keys root@m2:~/.ssh/
#scp -r ~/.ssh/authorized_keys root@s1:~/.ssh/
#scp -r ~/.ssh/authorized_keys root@s2:~/.ssh/
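
If the steps above are repeated, the plain `cat >>` commands append the same key more than once. A minimal sketch of a duplicate-safe merge follows; merge_keys is a hypothetical helper name, and the sketch assumes each .pub file holds exactly one key line.

```shell
#!/usr/bin/env bash
# Append each public key file (arguments 2..n) to the authorized_keys file
# given as argument 1, skipping keys that are already present.
# merge_keys is a hypothetical helper; it assumes one key per .pub file.
merge_keys() {
  local target="$1"; shift
  touch "$target"
  chmod 600 "$target"   # keep the file private to its owner
  local pub
  for pub in "$@"; do
    # -x whole-line match, -F fixed string, -f read the pattern from the key file
    grep -qxF -f "$pub" "$target" || cat "$pub" >> "$target"
  done
}
```

On m1 this could be called as merge_keys ~/.ssh/authorized_keys ~/.ssh/id_rsa.pub ~/.ssh/m2.pub ~/.ssh/s1.pub ~/.ssh/s2.pub before copying the result to the other machines.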


 
  3. Install Zookeeper-3.4.5
    1) Configure zoo.cfg (there is no zoo.cfg by default; copy zoo_sample.cfg and name the copy zoo.cfg)
root@m1:/home/hadoop/zookeeper-3.4.5/conf# vi zoo.cfg 
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/home/hadoop/zookeeper-3.4.5/data
dataLogDir=/home/hadoop/zookeeper-3.4.5/logs
server.1=m1:2888:3888
server.2=m2:2888:3888
server.3=s1:2888:3888
server.4=s2:2888:3888
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

    2) Copy zookeeper from m1 to m2, s1, and s2
root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@m2:/home/hadoop
root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@s1:/home/hadoop
root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@s2:/home/hadoop

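    3) On m1, m2, s1, and s2, create a myid file in the directory that dataDir points to in zoo.cfg (/home/hadoop/zookeeper-3.4.5/data). Its content is the number that follows "server." for that machine in zoo.cfg, and it must contain nothing but that number
    m1 is 1
    m2 is 2
    s1 is 3
    s2 is 4
    This completes the zookeeper configuration.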
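
Rather than typing each number by hand, the myid value can be looked up from the server.N lines in zoo.cfg. This is a minimal sketch; myid_for is a hypothetical helper name, and it assumes the server.N=host:port:port format used in the zoo.cfg above.

```shell
#!/usr/bin/env bash
# Print the myid value for a host by finding its "server.N=host:..." line.
# Usage: myid_for <hostname> <path-to-zoo.cfg>
# Returns non-zero if the host is not listed. myid_for is a hypothetical name.
myid_for() {
  local host="$1" cfg="$2"
  # Splitting "server.3=s1:2888:3888" on . = : gives: server, 3, s1, 2888, 3888
  awk -F'[.=:]' -v h="$host" '
    $1 == "server" && $3 == h { print $2; found = 1 }
    END { exit !found }
  ' "$cfg"
}
```

On each machine this could be used as myid_for "$(hostname)" /home/hadoop/zookeeper-3.4.5/conf/zoo.cfg > /home/hadoop/zookeeper-3.4.5/data/myid, provided the data directory already exists.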
 
  4. Install hadoop2.2.0
    Edit the following 7 configuration files:
    1) /home/hadoop/hadoop-2.2.0/etc/hadoop/hadoop-env.sh (mainly to set the Java path)
root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
#export JAVA_HOME=${JAVA_HOME}
 
    2) /home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-env.sh (mainly to set the Java path)
root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi yarn-env.sh 
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# User for YARN daemons
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
 
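    3) /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>

  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>m1,m2</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.mycluster.m1</name>
    <value>m1:9000</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.mycluster.m2</name>
    <value>m2:9000</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.mycluster.m1</name>
    <value>m1:50070</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.mycluster.m2</name>
    <value>m2:50070</value>
  </property>

  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://m1:8485;m2:8485/mycluster</value>
  </property>

  <property>
    <name>dfs.ha.automatic-failover.enabled.mycluster</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>

  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>

  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/hadoop-2.2.0/tmp/journal</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>

  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>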
 
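    4) /home/hadoop/hadoop-2.2.0/etc/hadoop/mapred-site.xml
root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework set to Hadoop YARN.</description>
  </property>
</configuration>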
 
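    5) /home/hadoop/hadoop-2.2.0/etc/hadoop/core-site.xml
root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>

  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>m1:2181,m2:2181,s1:2181,s2:2181</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-2.2.0/tmp</value>
  </property>
</configuration>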
 
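    6) /home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-site.xml
root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>m1</value>
  </property>
</configuration>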

 
    7) /home/hadoop/hadoop-2.2.0/etc/hadoop/slaves
root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi slaves 
s1
s2
This completes the hadoop configuration.
 
 
    5. Start zookeeper
    1) Run the following on all of m1, m2, s1, and s2; the transcript below shows it on m1:
root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower
root@m1:/home/hadoop#

    2) Run the commands below on every machine to check its status; in my cluster s1 is the leader and the other machines are followers
root@s1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@s1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader
root@s1:/home/hadoop#
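
Logging in to each machine to read the Mode line gets tedious, so the check can be scripted from m1 over the passwordless ssh set up earlier. This is a sketch under assumptions: zk_mode and zk_quorum_status are hypothetical helper names, and ZK_HOME defaults to the install path used in this article.

```shell
#!/usr/bin/env bash
ZK_HOME="${ZK_HOME:-/home/hadoop/zookeeper-3.4.5}"

# Read `zkServer.sh status` output on stdin and print only the mode,
# for example "leader" or "follower". (hypothetical helper name)
zk_mode() {
  sed -n 's/^Mode: *//p'
}

# Query each host given as an argument over ssh and print "<host> <mode>".
# Defined only; call it by hand, e.g.: zk_quorum_status m1 m2 s1 s2
zk_quorum_status() {
  local host
  for host in "$@"; do
    # -n stops ssh from reading the caller's stdin
    echo "$host $(ssh -n "root@$host" "$ZK_HOME/bin/zkServer.sh status" 2>/dev/null | zk_mode)"
  done
}
```

A healthy ensemble should show one leader, with the remaining hosts reporting follower.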

    3) Test whether zookeeper started successfully
root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkCli.sh

Connecting to localhost:2181
2014-07-27 00:27:16,621 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2014-07-27 00:27:16,628 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=m1
2014-07-27 00:27:16,628 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.7.0_65
2014-07-27 00:27:16,629 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2014-07-27 00:27:16,629 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
2014-07-27 00:27:16,630 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/home/hadoop/zookeeper-3.4.5/bin/../build/classes:/home/hadoop/zookeeper-3.4.5/bin/../build/lib/*.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/home/hadoop/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/home/hadoop/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/home/hadoop/zookeeper-3.4.5/bin/../conf:/usr/lib/jvm/java-7-oracle/lib
2014-07-27 00:27:16,630 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=:/usr/local/lib:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-07-27 00:27:16,631 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2014-07-27 00:27:16,631 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=
2014-07-27 00:27:16,632 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2014-07-27 00:27:16,632 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2014-07-27 00:27:16,632 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.11.0-15-generic
2014-07-27 00:27:16,633 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=root
2014-07-27 00:27:16,633 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/root
2014-07-27 00:27:16,634 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/hadoop
2014-07-27 00:27:16,636 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@19b1ebe5
Welcome to ZooKeeper!
2014-07-27 00:27:16,672 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@966] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-07-27 00:27:16,685 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@849] - Socket connection established to localhost/127.0.0.1:2181, initiating session
JLine support is enabled
2014-07-27 00:27:16,719 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x147737cd5d30000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1]


    4) On m1, format the HA state (the zkfc znode) in zookeeper
root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs zkfc -formatZK
14/07/27 00:31:59 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at m1/192.168.1.50:9000
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:host.name=m1
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_65
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop-2.2.0/etc/hadoop:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/guava-11.0.2.jar:[... remaining jar entries under /home/hadoop/hadoop-2.2.0/share/hadoop omitted ...]:/home/hadoop/hadoop-2.2.0/contrib/capacity-scheduler/*.jar
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-2.2.0/lib/native
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.compiler=
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.version=3.11.0-15-generic
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.name=root
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=m1:2181,m2:2181,s1:2181,s2:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5990054a
14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Opening socket connection to server m1/192.168.1.50:2181. Will not attempt to authenticate using SASL (unknown error)
14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Socket connection established to m1/192.168.1.50:2181, initiating session
14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Session establishment complete on server m1/192.168.1.50:2181, sessionid = 0x147737cd5d30001, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/mycluster already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/mycluster? (Y or N) 14/07/27 00:32:00 INFO ha.ActiveStandbyElector: Session connected.
y
14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/mycluster from ZK...
14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Successfully deleted /hadoop-ha/mycluster from ZK.
14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
14/07/27 00:32:13 INFO zookeeper.ClientCnxn: EventThread shut down
14/07/27 00:32:13 INFO zookeeper.ZooKeeper: Session: 0x147737cd5d30001 closed
root@m1:/home/hadoop#

    5) Verify that the zkfc format succeeded: it worked if an extra hadoop-ha znode now appears.
root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
[zk: localhost:2181(CONNECTED) 1]
  6. Start the JournalNode cluster
     1) Run the following on m1, m2, s1, and s2 in turn
root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-journalnode-m1.out
root@m1:/home/hadoop# jps
2884 JournalNode
2553 QuorumPeerMain
2922 Jps
root@m1:/home/hadoop#
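     2) Format one of the cluster's NameNodes (m1). There are two ways to do this; I used the first
    Method 1
root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode -format
    Method 2
root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode -format -clusterId m1

     3) On m1, start the namenode that was just formatted. After running the command, open http://m1:50070/dfshealth.jsp to see the status of m1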
root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode

 
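     4) Copy the data from m1 to m2 by running the following on m2
root@m2:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode -bootstrapStandby
 
     5) Start the namenode on m2. After running the command, open http://m2:50070/dfshealth.jsp to see the status of m2. At this point the web pages show that both m1 and m2 are in the standby state.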
root@m2:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode

 
     6) Start all the datanodes by running the following on m1
root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemons.sh start datanode
s2: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-datanode-s2.out
s1: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-datanode-s1.out
root@m1:/home/hadoop#

     7) Start yarn by running the following command on m1, then open http://m1:8088/cluster to see the result
root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-resourcemanager-m1.out
s1: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-nodemanager-s1.out
s2: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-nodemanager-s2.out
root@m1:/home/hadoop#

 
     8) Start the ZooKeeperFailoverController by running the following command on m1 and then on m2. If you open port 50070 again, m1 has now become active while m2 is still standby
root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-zkfc-m1.out
root@m1:/home/hadoop#
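
The active/standby state can also be read from the command line with hdfs haadmin -getServiceState, which takes a NameNode ID (m1 or m2, as listed in dfs.ha.namenodes.mycluster). The wrapper below is a sketch: check_ha is a hypothetical helper name, and HDFS_BIN defaults to the install path used in this article.

```shell
#!/usr/bin/env bash
HDFS_BIN="${HDFS_BIN:-/home/hadoop/hadoop-2.2.0/bin/hdfs}"

# Print "<id> <state>" for every NameNode ID given as an argument and
# return success only if exactly one of them reports "active".
# check_ha is a hypothetical helper name.
check_ha() {
  local id state active=0
  for id in "$@"; do
    # A NameNode that cannot be queried is reported as "unreachable".
    state="$("$HDFS_BIN" haadmin -getServiceState "$id" 2>/dev/null)" || state="unreachable"
    echo "$id $state"
    if [ "$state" = "active" ]; then
      active=$((active + 1))
    fi
  done
  [ "$active" -eq 1 ]
}
```

For the cluster in this article the call would be check_ha m1 m2; a non-zero exit status means there is no active NameNode, or more than one.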

 
     9) Test whether HDFS is usable
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /
Found 2 items
drwx------   - root supergroup          0 2014-07-17 23:54 /tmp
drwxr-xr-x   - lion supergroup          0 2014-07-21 00:40 /user
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -mkdir /input  
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /        
Found 3 items
drwxr-xr-x   - root supergroup          0 2014-07-27 01:20 /input
drwx------   - root supergroup          0 2014-07-17 23:54 /tmp
drwxr-xr-x   - lion supergroup          0 2014-07-21 00:40 /user
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /input      
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -put hadoop.cmd /input
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /input            
Found 1 items
-rw-r--r--   3 root supergroup       7530 2014-07-27 01:20 /input/hadoop.cmd
root@m1:/home/hadoop/hadoop-2.2.0/bin#
     10) Test whether YARN is usable with the classic example: count the word frequencies of the hadoop.cmd file that was just put under /input
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hadoop jar /home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
14/07/27 01:22:41 INFO client.RMProxy: Connecting to ResourceManager at m1/192.168.1.50:8032
14/07/27 01:22:43 INFO input.FileInputFormat: Total input paths to process : 1
14/07/27 01:22:44 INFO mapreduce.JobSubmitter: number of splits:1
14/07/27 01:22:44 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/07/27 01:22:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406394452186_0001
14/07/27 01:22:46 INFO impl.YarnClientImpl: Submitted application application_1406394452186_0001 to ResourceManager at m1/192.168.1.50:8032
14/07/27 01:22:46 INFO mapreduce.Job: The url to track the job: http://m1:8088/proxy/application_1406394452186_0001/
14/07/27 01:22:46 INFO mapreduce.Job: Running job: job_1406394452186_0001
14/07/27 01:23:10 INFO mapreduce.Job: Job job_1406394452186_0001 running in uber mode : false
14/07/27 01:23:10 INFO mapreduce.Job:  map 0% reduce 0%
14/07/27 01:23:31 INFO mapreduce.Job:  map 100% reduce 0%
14/07/27 01:23:48 INFO mapreduce.Job:  map 100% reduce 100%
14/07/27 01:23:48 INFO mapreduce.Job: Job job_1406394452186_0001 completed successfully
14/07/27 01:23:49 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=6574
                FILE: Number of bytes written=175057
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=7628
                HDFS: Number of bytes written=5088
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=18062
                Total time spent by all reduces in occupied slots (ms)=14807
        Map-Reduce Framework
                Map input records=240
                Map output records=827
                Map output bytes=9965
                Map output materialized bytes=6574
                Input split bytes=98
                Combine input records=827
                Combine output records=373
                Reduce input groups=373
                Reduce shuffle bytes=6574
                Reduce input records=373
                Reduce output records=373
                Spilled Records=746
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=335
                CPU time spent (ms)=2960
                Physical memory (bytes) snapshot=270057472
                Virtual memory (bytes) snapshot=1990762496
                Total committed heap usage (bytes)=136450048
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=7530
        File Output Format Counters 
                Bytes Written=5088
root@m1:/home/hadoop/hadoop-2.2.0/bin#
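
The job writes its result under /output on HDFS (typically a file named part-r-00000, which is an assumption here because the listing is not shown). As a rough local cross-check, the sketch below counts whitespace-separated words in the same spirit as the wordcount example. The function name wordcount_local is hypothetical, and its tokenization may differ slightly from the Hadoop example on unusual whitespace.

```shell
# Hypothetical helper: count whitespace-separated words read from stdin
# and print "word<TAB>count", sorted by word in byte order.
wordcount_local() {
  tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Usage (the path is an assumption; point it at the file that was uploaded):
#   wordcount_local < /home/hadoop/hadoop-2.2.0/bin/hadoop.cmd | wc -l
```

The line count of the local result should be close to the "Reduce output records" counter reported by the job, since both count distinct words.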
     11) Verify HA failover. When we opened port 50070 on m1 and m2 in a browser earlier, m1 was active and m2 was standby.
        a) Kill the NameNode process on m1
root@m1:/home/hadoop/hadoop-2.2.0/bin# jps
5492 Jps
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
3898 NameNode
4075 ResourceManager
root@m1:/home/hadoop/hadoop-2.2.0/bin# kill -9 3898
root@m1:/home/hadoop/hadoop-2.2.0/bin# jps
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
4075 ResourceManager
5627 Jps
root@m1:/home/hadoop/hadoop-2.2.0/bin#
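       b) Open port 50070 on m1 and m2 again: m1 no longer responds, and m2 is now active.

    HDFS and MapReduce still work normally through m2. Even though the NameNode process on m1 has been killed, the cluster remains usable. This is the benefit of failover.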
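
Instead of reading the jps listing by eye, the check can be scripted. The following is a minimal sketch; has_daemon is a hypothetical helper name, and it only inspects jps output, so it says nothing about which NameNode is active. For that, hdfs haadmin -getServiceState <serviceId> can be used with the NameNode IDs defined in your hdfs-site.xml.

```shell
# Hypothetical helper: succeed if the jps output on stdin lists the daemon
# whose class name is given as $1 (second column of jps output).
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage on m1 after the kill:
#   jps | has_daemon NameNode || echo "NameNode is not running on this host"
```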

 

  7. HBase 0.96.2-hadoop2 (dual-HMaster configuration: m1 is the primary HMaster, m2 is the backup HMaster)

     1) Edit hbase-env.sh. The main changes are the JAVA_HOME directory and HBASE_MANAGES_ZK

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi hbase-env.sh 
#
#/**
# * Copyright 2007 The Apache Software Foundation
# *
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.6 required.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Extra Java CLASSPATH elements.  Optional.
# export HBASE_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000

# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc: -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc: -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.
# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="
# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.


# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers 
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
# false means HBase uses an independently running ZooKeeper; true means HBase starts and manages its own bundled ZooKeeper.
# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the 
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to ",DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as 
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.


     2) Edit hbase-site.xml

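root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi hbase-site.xml
<configuration>
       <property>
               <name>hbase.rootdir</name>
               <value>hdfs://mycluster/hbase</value>
       </property>

       <property>
               <name>hbase.cluster.distributed</name>
               <value>true</value>
       </property>

       <property>
               <name>hbase.tmp.dir</name>
               <value>/home/hadoop/hbase-0.96.2-hadoop2/tmp</value>
       </property>

       <property>
               <name>hbase.master</name>
               <value>60000</value>
       </property>

       <property>
               <name>hbase.zookeeper.quorum</name>
               <value>m1,m2,s1,s2</value>
       </property>

       <property>
               <name>hbase.zookeeper.property.clientPort</name>
               <value>2181</value>
       </property>

       <property>
               <name>hbase.zookeeper.property.dataDir</name>
               <value>/home/hadoop/zookeeper-3.4.5/data</value>
       </property>
</configuration>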


     2) Edit the regionservers file

   Machines that run a master usually do not also run a slave, so the two slave machines (s1 and s2) act as the HBase region servers

 

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi regionservers 
s1
s2

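     3) Create a symlink to Hadoop's hdfs-site.xml in the HBase conf directory, so that HBase can resolve the HA nameservice mycluster used in hbase.rootdir

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ll
total 40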
drwxr-xr-x 2 root root  4096 Jul 27 09:15 ./
drwxr-xr-x 9 root root  4096 Jul 20 21:40 ../
-rw-r--r-- 1 root staff 1026 Mar 25 06:29 hadoop-metrics2-hbase.properties
-rw-r--r-- 1 root staff 4023 Mar 25 06:29 hbase-env.cmd
-rw-r--r-- 1 root staff 7129 Jul 27 08:58 hbase-env.sh
-rw-r--r-- 1 root staff 2257 Mar 25 06:29 hbase-policy.xml
-rw-r--r-- 1 root staff 2550 Jul 27 09:10 hbase-site.xml
-rw-r--r-- 1 root staff 3451 Mar 25 06:29 log4j.properties
-rw-r--r-- 1 root staff    6 Jul 20 21:38 regionservers
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ln -s /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml hdfs-site.xml
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ll
total 40
drwxr-xr-x 2 root root  4096 Jul 27 09:16 ./
drwxr-xr-x 9 root root  4096 Jul 20 21:40 ../
-rw-r--r-- 1 root staff 1026 Mar 25 06:29 hadoop-metrics2-hbase.properties
-rw-r--r-- 1 root staff 4023 Mar 25 06:29 hbase-env.cmd
-rw-r--r-- 1 root staff 7129 Jul 27 08:58 hbase-env.sh
-rw-r--r-- 1 root staff 2257 Mar 25 06:29 hbase-policy.xml
-rw-r--r-- 1 root staff 2550 Jul 27 09:10 hbase-site.xml
lrwxrwxrwx 1 root root    50 Jul 27 09:16 hdfs-site.xml -> /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml*
-rw-r--r-- 1 root staff 3451 Mar 25 06:29 log4j.properties
-rw-r--r-- 1 root staff    6 Jul 20 21:38 regionservers
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf#
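
If this step may be repeated (for example when re-running the setup), a plain ln -s fails once the link exists. The sketch below is one way to make it repeatable; link_hdfs_site is a hypothetical helper name, and the directories are passed in rather than hard-coded.

```shell
# Hypothetical helper: (re)create the hdfs-site.xml symlink in the HBase
# conf directory and print the link target for verification.
# $1 = Hadoop conf directory, $2 = HBase conf directory
link_hdfs_site() {
  ln -sfn "$1/hdfs-site.xml" "$2/hdfs-site.xml" &&
    readlink "$2/hdfs-site.xml"
}

# Usage with the paths from this walkthrough:
#   link_hdfs_site /home/hadoop/hadoop-2.2.0/etc/hadoop /home/hadoop/hbase-0.96.2-hadoop2/conf
```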

    3) With HBase 0.96.2 there is no need to copy the Hadoop jars: the official hadoop2 build already bundles them

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# ls | grep hadoop
hadoop-annotations-2.2.0.jar
hadoop-auth-2.2.0.jar
hadoop-client-2.2.0.jar
hadoop-common-2.2.0.jar
hadoop-hdfs-2.2.0.jar
hadoop-hdfs-2.2.0-tests.jar
hadoop-mapreduce-client-app-2.2.0.jar
hadoop-mapreduce-client-common-2.2.0.jar
hadoop-mapreduce-client-core-2.2.0.jar
hadoop-mapreduce-client-jobclient-2.2.0.jar
hadoop-mapreduce-client-jobclient-2.2.0-tests.jar
hadoop-mapreduce-client-shuffle-2.2.0.jar
hadoop-yarn-api-2.2.0.jar
hadoop-yarn-client-2.2.0.jar
hadoop-yarn-common-2.2.0.jar
hadoop-yarn-server-common-2.2.0.jar
hadoop-yarn-server-nodemanager-2.2.0.jar
hbase-client-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2-tests.jar
hbase-examples-0.96.2-hadoop2.jar
hbase-hadoop2-compat-0.96.2-hadoop2.jar
hbase-hadoop-compat-0.96.2-hadoop2.jar
hbase-it-0.96.2-hadoop2.jar
hbase-it-0.96.2-hadoop2-tests.jar
hbase-prefix-tree-0.96.2-hadoop2.jar
hbase-protocol-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2-tests.jar
hbase-shell-0.96.2-hadoop2.jar
hbase-testing-util-0.96.2-hadoop2.jar
hbase-thrift-0.96.2-hadoop2.jar
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib#

    4) Copy HBase 0.96.2 from m1 to the same directory on m2, s1, and s2

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@m2:/home/hadoop
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@s1:/home/hadoop
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@s2:/home/hadoop
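
The three scp commands differ only in the host name, so they can be generated in a loop. This is a small sketch, not part of the original procedure: hbase_scp_commands is a hypothetical helper that only prints the commands, so they can be reviewed before being executed.

```shell
# Hypothetical helper: print one scp command per target host given as arguments.
hbase_scp_commands() {
  src=/home/hadoop/hbase-0.96.2-hadoop2
  for host in "$@"; do
    printf 'scp -r %s root@%s:/home/hadoop\n' "$src" "$host"
  done
}

# Review the output first, then pipe it to sh to actually copy:
#   hbase_scp_commands m2 s1 s2
#   hbase_scp_commands m2 s1 s2 | sh
```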

    5) Start HBase 0.96.2 on m1. After running the command, check the result in a browser at http://m1:60010/master-status

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/start-hbase.sh
starting master, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-master-m1.out
s1: starting regionserver, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-regionserver-s1.out
s2: starting regionserver, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-regionserver-s2.out
root@m1:/home/hadoop# jps
6688 NameNode
7540 HMaster
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
7769 Jps
4075 ResourceManager
root@m1:/home/hadoop#

 

    6) On m1, test the connection to HBase with the HBase shell

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-07-27 09:31:07,601 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014
hbase(main):001:0> list
TABLE                                                                                                                                                                                                                  
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
0 row(s) in 2.8030 seconds
=> []
hbase(main):002:0> version
0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014
hbase(main):003:0> status
2 servers, 0 dead, 1.0000 average load
hbase(main):004:0> create 'test_idoall_org','uid','name'
0 row(s) in 0.5800 seconds
=> Hbase::Table - test_idoall_org
hbase(main):005:0> list
TABLE                                                                                                                                                                                                                  
test_idoall_org                                                                                                                                                                                                        
1 row(s) in 0.0320 seconds
=> ["test_idoall_org"]
hbase(main):006:0> put 'test_idoall_org','10086','name:idoall','idoallvalue'
0 row(s) in 0.1090 seconds
hbase(main):009:0> get 'test_idoall_org','10086'
COLUMN                                                 CELL                                                                                                                                                            
 name:idoall                                           timestamp=1406424831473, value=idoallvalue                                                                                                                      
1 row(s) in 0.0450 seconds
hbase(main):010:0> scan 'test_idoall_org'
ROW                                                    COLUMN+CELL                                                                                                                                                     
 10086                                                 column=name:idoall, timestamp=1406424831473, value=idoallvalue                                                                                                  
1 row(s) in 0.0620 seconds
hbase(main):011:0>

 

    7) Start HBase (the backup HMaster) on m2. After running the command, the HBase status on m2 can likewise be viewed in a browser at http://m2:60010/master-status

root@m2:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase-daemon.sh start master
starting master, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-master-m2.out
root@m2:/home/hadoop#

 

 

    8) Test the primary/backup switchover between m1 and m2
       a) Open http://m1:60010/master-status and http://m2:60010/master-status in a browser to see the status shown in the figure below

 

 

       b) Stop the HBase master process on m1 and reload the pages: m1 no longer opens, and the HBase cluster status shown on m2 has changed

root@m1:/home/hadoop# jps
6688 NameNode
7540 HMaster
2884 JournalNode
8645 Jps
4375 DFSZKFailoverController
2553 QuorumPeerMain
4075 ResourceManager
root@m1:/home/hadoop# kill -9 7540
root@m1:/home/hadoop# jps
6688 NameNode
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
4075 ResourceManager
8655 HMaster
8719 Jps
root@m1:/home/hadoop#

 

     HBase is now fully configured, and primary/backup failover works.

 

  8. Install MySQL 5.5.x on m1 (Ubuntu 12.04)

     1) apt-get install mysql-server mysql-client mysql-common
     During installation a dialog asks for the root password. I set it to 123456
     After installation, test the MySQL connection: mysql -uroot -p123456
     Use service mysql stop / service mysql start to stop and start MySQL

     2) Grant remote access to MySQL
root@m1:/home/hadoop# mysql -uroot -p123456
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 36
Server version: 5.5.22-0ubuntu1 (Ubuntu)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant all on *.* to 'root'@'%'  identified by '123456' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit
Bye
root@m1:/home/hadoop#

     3) If remote connections still fail, open the config file with vi /etc/mysql/my.cnf, change bind-address=127.0.0.1 to this machine's IP, and restart MySQL
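
The same change can be made without an editor. The sketch below is an assumption-laden illustration: set_bind_address is a hypothetical helper, it replaces the whole bind-address line with a normalized "bind-address = <ip>" form, and the address 192.168.1.50 in the usage line is m1's address as it appears in the job log earlier in this article.

```shell
# Hypothetical helper: rewrite the bind-address line in a my.cnf-style file.
# $1 = config file, $2 = new address (must not contain "/").
# The original file is kept as "$1.bak".
set_bind_address() {
  sed -i.bak "s/^[[:space:]]*bind-address[[:space:]]*=.*/bind-address = $2/" "$1"
}

# Usage on m1, followed by a restart of MySQL:
#   set_bind_address /etc/mysql/my.cnf 192.168.1.50 && service mysql restart
```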

 

  9. Install Hive 0.13.1 (on m1)

     1) Extract apache-hive-0.13.1-bin.tar.gz to /home/hadoop/hive-0.13.1
 
     2) Go to Hive's conf directory and copy the template files to the corresponding configuration files
root@m1:/home/hadoop/hive-0.13.1/conf# cp hive-env.sh.template hive-env.sh
root@m1:/home/hadoop/hive-0.13.1/conf# cp hive-default.xml.template hive-site.xml

     3) Edit hive-env.sh; the main setting is the Hadoop directory
root@m1:/home/hadoop/hive-0.13.1/conf# vi hive-env.sh
HADOOP_HOME=/home/hadoop/hadoop-2.2.0

     4) Edit hive-site.xml
root@m1:/home/hadoop/hive-0.13.1/conf# vi hive-site.xml
<configuration>
      <property>
               <name>hive.metastore.warehouse.dir</name>
               <value>hdfs://mycluster/user/hive/warehouse</value>
      </property>

      <property>
               <name>hive.exec.scratchdir</name>
               <value>hdfs://mycluster/user/hive/scratchdir</value>
      </property>

      <property>
               <name>hive.querylog.location</name>
               <value>/home/hadoop/hive-0.13.1/logs</value>
      </property>

      <property>
               <name>javax.jdo.option.ConnectionURL</name>
               <value>jdbc:mysql://m1:3306/hiveMeta?createDatabaseIfNotExist=true</value>
      </property>

      <property>
               <name>javax.jdo.option.ConnectionDriverName</name>
               <value>com.mysql.jdbc.Driver</value>
      </property>

      <property>
               <name>javax.jdo.option.ConnectionUserName</name>
               <value>root</value>
      </property>

      <property>
               <name>javax.jdo.option.ConnectionPassword</name>
               <value>123456</value>
      </property>

      <property>
               <name>hive.aux.jars.path</name>
               <value>file:///home/hadoop/hive-0.13.1/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hive-hbase-handler-0.13.1.jar,file:///home/hadoop/hive-0.13.1/lib/protobuf-java-2.5.0.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-client-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-common-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-protocol-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-server-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/zookeeper-3.4.5.jar,file:///home/hadoop/hive-0.13.1/lib/guava-11.0.2.jar,file:///home/hadoop/hive-0.13.1/lib/htrace-core-2.04.jar</value>
      </property>

      <property>
               <name>hive.zookeeper.quorum</name>
               <value>m1,m2,s1,s2</value>
               <description>The list of zookeeper servers to talk to. This is only needed for read/write locks.</description>
      </property>
</configuration>

 

     5) Of the jars listed in hive.aux.jars.path in hive-site.xml, hive-hbase-handler-0.13.1.jar and guava-11.0.2.jar already ship with Hive. The others only need to be copied over from hadoop/zookeeper/hbase with the following commands

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/protobuf-java-2.5.0.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-client-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-common-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-protocol-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-server-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/htrace-core-2.04.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/zookeeper-3.4.5/dist-maven/zookeeper-3.4.5.jar /home/hadoop/hive-0.13.1/lib
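
The eight jars that come from HBase's lib directory can also be copied in one loop, which stops early if a jar is missing. This is a sketch under the assumption that the jar names match the cp commands above; HBASE_JARS and copy_hbase_jars are hypothetical names. The ZooKeeper jar lives in a different directory and is still copied separately.

```shell
# Jar names taken from the cp commands above (all from HBase's lib directory).
HBASE_JARS="protobuf-java-2.5.0.jar
hbase-client-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2.jar
hbase-protocol-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2.jar
hbase-hadoop2-compat-0.96.2-hadoop2.jar
hbase-hadoop-compat-0.96.2-hadoop2.jar
htrace-core-2.04.jar"

# Hypothetical helper: copy each jar from $1 (HBase lib) to $2 (Hive lib),
# stopping at the first jar that is missing or cannot be copied.
copy_hbase_jars() {
  for jar in $HBASE_JARS; do
    cp "$1/$jar" "$2/" || return 1
  done
}

# Usage with the paths from this walkthrough:
#   copy_hbase_jars /home/hadoop/hbase-0.96.2-hadoop2/lib /home/hadoop/hive-0.13.1/lib
```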

 

    6) The MySQL JDBC driver (Connector/J) can be downloaded from http://dev.mysql.com/downloads/connector/j/. After extracting it, copy mysql-connector-java-5.1.31-bin.jar from the extracted directory to /home/hadoop/hive-0.13.1/lib

 

     7) Create the test data and the warehouse directory

root@m1:/home/hadoop/hive-0.13.1/conf# vi /home/hadoop/hive-0.13.1/testdata001.dat
12306,mname,yname
10086,myidoall,youidoall
root@m1:/home/hadoop/hive-0.13.1/conf# /home/hadoop/hadoop-2.2.0/bin/hadoop fs -mkdir -p /user/hive/warehouse
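
If you prefer not to type the rows in vi, the same file can be written with a here-document. This is a minimal sketch; write_testdata is a hypothetical helper name, and the two rows are exactly the ones shown above.

```shell
# Hypothetical helper: write the two sample rows to the file given as $1.
write_testdata() {
  cat > "$1" <<'EOF'
12306,mname,yname
10086,myidoall,youidoall
EOF
}

# Usage with the path from this walkthrough:
#   write_testdata /home/hadoop/hive-0.13.1/testdata001.dat
```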

 

    8) Test Hive from its command-line shell

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive 
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.464 seconds, Fetched: 1 row(s)
hive> create database testidoall;
OK
Time taken: 0.279 seconds
hive>  show databases;
OK
default
testidoall
Time taken: 0.021 seconds, Fetched: 2 row(s)
hive> use testidoall;
OK
Time taken: 0.039 seconds
hive> create external table testtable(uid int,myname string,youname string) row format delimited fields terminated by ',' location '/user/hive/warehouse/testtable';
OK
Time taken: 0.205 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/hive-0.13.1/testdata001.dat' OVERWRITE INTO TABLE testtable;
Copying data from file:/home/hadoop/hive-0.13.1/testdata001.dat
Copying file: file:/home/hadoop/hive-0.13.1/testdata001.dat
Loading data to table testidoall.testtable
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://mycluster/user/hive/warehouse/testtable
Table testidoall.testtable stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]
OK
Time taken: 0.77 seconds
hive> select * from testtable;
OK
12306   mname   yname
10086   myidoall        youidoall
Time taken: 0.279 seconds, Fetched: 2 row(s)
hive>

     Hive is now installed.

 

  10. Hive to HBase (import data from a Hive table into HBase)

     1) Create a table that HBase can recognize
root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
testidoall
Time taken: 0.45 seconds, Fetched: 2 row(s)
hive> use testidoall;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
testtable
Time taken: 0.032 seconds, Fetched: 1 row(s)
hive> CREATE TABLE hive2hbase_idoall(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "hive2hbase_idoall");
OK
Time taken: 2.332 seconds
hive> show tables;
OK
hive2hbase_idoall
testtable
Time taken: 0.036 seconds, Fetched: 2 row(s)
hive>

 

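    2) Create a native Hive table to hold the data that will then be inserted into HBase; it serves as an intermediate table. Also load the earlier test data into this intermediate table.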

hive> create table hive2hbase_idoall_middle(foo int,bar string)row format delimited fields terminated by ',';
OK
Time taken: 0.086 seconds
hive> show tables;                                                                                           
OK
hive2hbase_idoall
hive2hbase_idoall_middle
testtable
Time taken: 0.03 seconds, Fetched: 3 row(s)
hive> load data local inpath '/home/hadoop/hive-0.13.1/testdata001.dat' overwrite into table hive2hbase_idoall_middle;
Copying data from file:/home/hadoop/hive-0.13.1/testdata001.dat
Copying file: file:/home/hadoop/hive-0.13.1/testdata001.dat
Loading data to table testidoall.hive2hbase_idoall_middle
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://mycluster/user/hive/warehouse/testidoall.db/hive2hbase_idoall_middle
Table testidoall.hive2hbase_idoall_middle stats: [numFiles=1, numRows=0, totalSize=43, rawDataSize=0]
OK
Time taken: 0.683 seconds    
hive>

 

    3) Insert the data of the intermediate table (hive2hbase_idoall_middle) into the table hive2hbase_idoall; it is synchronized to HBase automatically.

hive> insert overwrite table hive2hbase_idoall select * from hive2hbase_idoall_middle;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1406394452186_0002, Tracking URL = http://m1:8088/proxy/application_1406394452186_0002/
Kill Command = /home/hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1406394452186_0002
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2014-07-27 11:44:11,491 Stage-0 map = 0%,  reduce = 0%
2014-07-27 11:44:22,684 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.51 sec
MapReduce Total cumulative CPU time: 1 seconds 510 msec
Ended Job = job_1406394452186_0002
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 1.51 sec   HDFS Read: 288 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 510 msec
OK
Time taken: 25.613 seconds
hive> select * from hive2hbase_idoall;
OK
10086   myidoall
12306   mname
Time taken: 0.179 seconds, Fetched: 2 row(s)
hive> select * from hive2hbase_idoall_middle;
OK
12306   mname
10086   myidoall
Time taken: 0.088 seconds, Fetched: 2 row(s)
hive>
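
Note that testdata001.dat has three comma-separated fields per row, while the intermediate table declares only two columns (foo, bar). As the query output above shows, only the first two fields end up in the table; the third is dropped. The sketch below approximates that projection locally; first_two_fields is a hypothetical helper name and is not something Hive itself runs.

```shell
# Hypothetical helper: keep the first two comma-separated fields of each
# input line and print them tab-separated, approximating the rows that a
# two-column comma-delimited table exposes.
first_two_fields() {
  cut -d, -f1,2 | tr ',' '\t'
}

# Usage with the test data file from this walkthrough:
#   first_two_fields < /home/hadoop/hive-0.13.1/testdata001.dat
```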

 

    4) Connect to HBase with the shell and check that the data from Hive has arrived

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-07-27 11:47:14,454 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> list
TABLE                                                                                                                                                                                                                  
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hive2hbase_idoall                                                                                                                                                                                                      
test_idoall_org                                                                                                                                                                                                        
2 row(s) in 2.9480 seconds

=> ["hive2hbase_idoall", "test_idoall_org"]
hbase(main):002:0> scan "hive2hbase_idoall"
ROW                                                    COLUMN+CELL                                                                                                                                                     
 10086                                                 column=cf1:val, timestamp=1406432660860, value=myidoall                                                                                                         
 12306                                                 column=cf1:val, timestamp=1406432660860, value=mname                                                                                                            
2 row(s) in 0.0540 seconds

hbase(main):003:0> get "hive2hbase_idoall",'12306'
COLUMN                                                 CELL                                                                                                                                                            
 cf1:val                                               timestamp=1406432660860, value=mname                                                                                                                            
1 row(s) in 0.0110 seconds

hbase(main):004:0>

    At this point, the Hive-to-HBase test works as expected.
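
    The check above can also be scripted instead of read by eye. The following is a minimal sketch, not part of the original test: hbase_row_count is a hypothetical helper that extracts the "N row(s) in ... seconds" summary from HBase shell output. It is run here on text copied from the scan above, so it works without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the row count from the last
# "N row(s) in X seconds" summary line found on standard input.
hbase_row_count() {
    sed -n 's/^\([0-9][0-9]*\) row(s) in .*$/\1/p' | tail -n 1
}

# Sample text copied from the scan output in step 4).
sample_scan='ROW      COLUMN+CELL
 10086   column=cf1:val, timestamp=1406432660860, value=myidoall
 12306   column=cf1:val, timestamp=1406432660860, value=mname
2 row(s) in 0.0540 seconds'

printf '%s\n' "$sample_scan" | hbase_row_count
```

    Against a live cluster, the helper could presumably be fed from the shell non-interactively, for example echo "scan 'hive2hbase_idoall'" | /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell | hbase_row_count, and the result compared with the row count Hive reported.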

 

  11. HBase to Hive (importing table data from HBase into Hive)

     1) Create the table hbase2hive_idoall in HBase

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-07-27 11:54:25,844 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> create 'hbase2hive_idoall','gid','info'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
0 row(s) in 3.4970 seconds

=> Hbase::Table - hbase2hive_idoall
hbase(main):002:0> put 'hbase2hive_idoall','3344520','info:time','20140704'
0 row(s) in 0.1020 seconds

hbase(main):003:0> put 'hbase2hive_idoall','3344520','info:address','HK'
0 row(s) in 0.0090 seconds

hbase(main):004:0> scan 'hbase2hive_idoall'
ROW                                                    COLUMN+CELL                                                                                                                                                    
 3344520                                               column=info:address, timestamp=1406433302317, value=HK                                                                                                          
 3344520                                               column=info:time, timestamp=1406433297567, value=20140704                                                                                                       
1 row(s) in 0.0330 seconds

hbase(main):005:0>
 
    2) Create a table in Hive that maps to the HBase table
root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
testidoall
Time taken: 0.449 seconds, Fetched: 2 row(s)
hive> use testidoall;
OK
Time taken: 0.02 seconds
hive> show tables;
OK
hive2hbase_idoall
hive2hbase_idoall_middle
testtable
Time taken: 0.026 seconds, Fetched: 3 row(s)
hive> create external table hbase2hive_idoall (key string,gid map<string,string>)STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" ="info:")  TBLPROPERTIES ("hbase.table.name" = "hbase2hive_idoall");
OK
Time taken: 1.696 seconds
hive> show tables;                       
OK
hbase2hive_idoall
hive2hbase_idoall
hive2hbase_idoall_middle
testtable
Time taken: 0.034 seconds, Fetched: 4 row(s)
hive> select * from hbase2hive_idoall;
OK
3344520 {"address":"HK","time":"20140704"}
Time taken: 0.701 seconds, Fetched: 1 row(s)
hive>
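
The map column above exposes the whole info column family as a single value. When the qualifiers are known in advance, each one can be mapped to its own Hive column instead. The sketch below is a variant that was not run in the original test: the table name hbase2hive_idoall_cols, the column names visit_time and address, and the file path /tmp/hbase2hive_cols.hql are all assumptions made for illustration. The script only writes the statements to a file so they can be reviewed first.

```shell
#!/bin/sh
# Write HiveQL that maps individual HBase qualifiers to separate Hive columns.
# Table name, column names and file path are illustrative assumptions.
hql_file="${HQL_FILE:-/tmp/hbase2hive_cols.hql}"

cat > "$hql_file" <<'EOF'
USE testidoall;
CREATE EXTERNAL TABLE hbase2hive_idoall_cols (key string, visit_time string, address string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:time,info:address")
TBLPROPERTIES ("hbase.table.name" = "hbase2hive_idoall");
SELECT * FROM hbase2hive_idoall_cols;
EOF

echo "wrote $hql_file"
```

The generated file could then be run with /home/hadoop/hive-0.13.1/bin/hive -f /tmp/hbase2hive_cols.hql. Listing :key explicitly in hbase.columns.mapping makes it clear which Hive column holds the HBase row key.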

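At this point, the distributed deployment described in the title (ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1) has been fully tested. I ran into a few pitfalls along the way, which are covered under Common Problems below. I hope these test notes help others.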

 
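  IV. Common Problems
  1. If hadoop (namenode/datanode/yarn), hbase, or hive fails to start, always inspect the relevant log with tail -n 100 ***.log; it usually contains a lot of useful information. The following commands also help trace errors from the command line.

      1) Make hadoop print debug output to the console. After running the command below, start namenode, datanode, and yarn to see the effect.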

export HADOOP_ROOT_LOGGER=DEBUG,console
 
      2) Make hive print debug output to the console
/home/hadoop/hive-0.13.1/bin/hive --hiveconf  hive.root.logger=DEBUG,console
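
  When a log is long, filtering its tail for error lines is often quicker than reading it in full. A minimal sketch follows. show_log_errors is a hypothetical helper, and the patterns ERROR, FATAL and Exception are an assumption about what these log4j-style logs contain. The demo runs against a throwaway file with made-up lines, so it works without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: show ERROR/FATAL/Exception lines from the last N lines of a log.
# Usage: show_log_errors <logfile> [lines]   (lines defaults to 100)
show_log_errors() {
    tail -n "${2:-100}" "$1" | grep -E 'ERROR|FATAL|Exception'
}

# Demo on a temporary file containing made-up log lines.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
2014-07-27 11:40:01,000 INFO  [main] demo: starting
2014-07-27 11:40:02,000 ERROR [main] demo: made-up failure
2014-07-27 11:40:03,000 INFO  [main] demo: done
EOF

show_log_errors "$demo_log"
rm -f "$demo_log"
```

  In practice the first argument would be the path of whichever namenode, datanode, hbase or hive log is under investigation.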
 
  2. When starting mysql I ran into "job failed to start"; reinstalling it with the following commands fixed the problem.
rm /var/lib/mysql/ -R
rm /etc/mysql/ -R
apt-get autoremove mysql* --purge
apt-get remove apparmor
apt-get install mysql-server mysql-client mysql-common
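
  The first two commands above delete /var/lib/mysql/ and /etc/mysql/ outright, so it may be worth archiving them first. A minimal sketch, assuming tar with gzip support is available: backup_dir is a hypothetical helper, and the demo below uses a temporary directory rather than the real MySQL paths.

```shell
#!/bin/sh
# Hypothetical helper: archive a directory into a gzip-compressed tarball.
# Usage: backup_dir <directory> <archive.tar.gz>
backup_dir() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Demo on a throwaway directory instead of /var/lib/mysql.
demo_root=$(mktemp -d)
mkdir "$demo_root/mysql"
echo "placeholder" > "$demo_root/mysql/ibdata1"

backup_dir "$demo_root/mysql" "$demo_root/mysql-backup.tar.gz"
tar -tzf "$demo_root/mysql-backup.tar.gz"   # list what was archived

rm -rf "$demo_root"
```

  On the real machine this would be something like backup_dir /var/lib/mysql /root/mysql-data.tar.gz, run before the rm commands; the destination path here is only an example.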

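  3. If apt reports "dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem", the following commands resolve it: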
sudo rm /var/lib/dpkg/updates/*
sudo apt-get update
sudo apt-get upgrade
 
