Integrating Hadoop 2.2.0 with HBase 0.94.14

       


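This post sets up a distributed environment using a current Hadoop cluster together with HBase. At the time of writing, the latest stable Hadoop release is 2.2.0 and the stable HBase release is 0.94.14. The setup steps are as follows: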

1. Install Hadoop

         For this step, see my blog post on Hadoop cluster installation.


2. Install HBase

        HBase can be installed in one of three modes:

  1. Standalone Mode
  2. Pseudo-Distributed Mode
  3. Fully-Distributed Mode


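By default, HBase manages its own ZooKeeper ensemble, running ZooKeeper in an embedded mode; in other words, HBase starts and stops ZooKeeper itself.
If you want ZooKeeper to be managed externally, set the following in the configuration (hbase-env.sh):
export HBASE_MANAGES_ZK=false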

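2.1 Preparation

     2.1.1 Build HBase against the matching Hadoop version
           Download the stable HBase release. The stable version is 0.94.14, though it will probably move to 0.96 before long.

Edit pom.xml to target the matching Hadoop version:
+++ pom.xml (working copy)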
@@ -1034,7 +1034,7 @@
     <slf4j.version>1.4.3</slf4j.version>
     <log4j.version>1.2.16</log4j.version>
     <mockito-all.version>1.8.5</mockito-all.version>
-    <protobuf.version>2.4.0a</protobuf.version>
+    <protobuf.version>2.5.0</protobuf.version>
     <stax-api.version>1.0.1</stax-api.version>
     <thrift.version>0.8.0</thrift.version>
     <zookeeper.version>3.4.5</zookeeper.version>
@@ -2241,7 +2241,7 @@
         </property>
       </activation>
       <properties>
-        <hadoop.version>2.0.0-alpha</hadoop.version>
+        <hadoop.version>2.2.0</hadoop.version>
         <slf4j.version>1.6.1</slf4j.version>
       </properties>
       <dependencies>
The Hadoop/HBase version compatibility matrix is:

                       HBase-0.92.x  HBase-0.94.x  HBase-0.96.0  HBase-0.98.0
Hadoop-0.20.205        S             X             X             X
Hadoop-0.22.x          S             X             X             X
Hadoop-1.0.0-1.0.2[a]  S             S             X             X
Hadoop-1.0.3+          S             S             S             X
Hadoop-1.1.x           NT            S             S             X
Hadoop-0.23.x          X             S             NT            X
Hadoop-2.0.x-alpha     X             NT            X             X
Hadoop-2.1.0-beta      X             NT            S             X
Hadoop-2.2.0           X             NT[b]         S             S
Hadoop-2.x             X             NT            S             S

Where

S = supported and tested,
X = not supported,
NT = it should run, but not tested enough.

After making these changes, run the Maven build:
mvn clean install assembly:single -Dhadoop.profile=2.0 -DskipTests
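
Since the whole point of the rebuild is to bundle the right Hadoop version, it can be worth checking the result before deploying it. The following is only a sketch: the function name `mismatched_hadoop_jars` is hypothetical, and it assumes the unpacked build keeps its bundled Hadoop jars in a `lib` directory with the version embedded in the file name, which this post does not confirm.

```shell
# Hypothetical helper: list hadoop-*.jar files in a lib directory ($1)
# whose file name does not end in the expected Hadoop version ($2).
# Assumes jar names look like hadoop-<module>-<version>.jar.
mismatched_hadoop_jars() {
    for jar in "$1"/hadoop-*.jar; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$jar" ] || continue
        case "$(basename "$jar")" in
            *-"$2".jar|*-"$2"-tests.jar) ;;   # expected version, nothing to report
            *) basename "$jar" ;;             # some other version: report it
        esac
    done
}

# Example (path is an assumption): mismatched_hadoop_jars /opt/hbase/lib 2.2.0
```

An empty result would suggest that every bundled Hadoop jar carries the 2.2.0 version.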
 

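2.2 Configure the operating system

     Configure the hosts file. Assume a four-machine cluster: one machine acts as the Master and the other three as RegionServers.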
192.168.177.168 machine-2
192.168.177.167 machine-1
192.168.177.158 machine-0
192.168.177.172 hadoop-master hbase-master
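
Every machine needs the same name mappings, so a quick check on each node can save debugging later. Below is a minimal sketch; the function name `missing_hosts` is hypothetical, and it only inspects a hosts file rather than testing real name resolution.

```shell
# Hypothetical helper: print each expected host name that does NOT appear
# in the given hosts file.
# Usage: missing_hosts <hosts-file> <name>...
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the IP.
        if ! sed 's/#.*//' "$hosts_file" | awk -v n="$name" '
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }'; then
            echo "$name"
        fi
    done
}

# Example:
# missing_hosts /etc/hosts machine-0 machine-1 machine-2 hadoop-master hbase-master
```

No output means all of the listed names are present in the file.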

2.3 Configure HBase

    Edit hbase-env.sh to set the Java environment and how ZooKeeper is managed:
     vim  hbase-env.sh
     export JAVA_HOME=your_java_home
     export HBASE_MANAGES_ZK=false



    Edit hbase-site.xml:
<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://hadoop-master:9000/hbase</value>
                <description>The directory shared by region servers.</description>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
                <description>The mode the cluster will be in. Possible values are
                false: standalone and pseudo-distributed setups with managed
                Zookeeper true: fully-distributed with unmanaged Zookeeper
                Quorum (see hbase-env.sh)
                </description>
        </property>
        <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2222</value>
                <description>Property from ZooKeeper's config zoo.cfg.
                The port at which the clients will connect.
                </description>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>machine-0,machine-1,machine-2</value>
                <description>Comma separated list of servers in the ZooKeeper Quorum.
                For example, "host1.mydomain.com,host2.mydomain.com".
                By default this is set to localhost for local and
                pseudo-distributed modes of operation. For a
                fully-distributed setup, this should be set to a full
                list of ZooKeeper quorum servers. If
                HBASE_MANAGES_ZK is set in hbase-env.sh
                this is the list of servers which we will start/stop
                ZooKeeper on.
                </description>
        </property>
</configuration>
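
The hbase.rootdir value has to point at the same HDFS address the Hadoop cluster actually uses. The sketch below compares it with `fs.defaultFS` from Hadoop's core-site.xml. Both function names are hypothetical, the Hadoop configuration is not shown in this post (an older setup may use the deprecated key `fs.default.name` instead), and the parsing assumes each `<name>` and `<value>` sits on its own line as in the file above; it is not a general XML parser.

```shell
# Hypothetical helper: print the <value> that follows <name>KEY</name>
# in a Hadoop-style XML file.
# Usage: get_prop <file> <key>
get_prop() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { want = 1; next }
        want && /<value>/ {
            sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
            print; exit
        }' "$1"
}

# Succeeds when hbase.rootdir (from hbase-site.xml, $1) lives under the
# fs.defaultFS URI (from core-site.xml, $2).
rootdir_matches_hdfs() {
    rootdir=$(get_prop "$1" hbase.rootdir)
    fs=$(get_prop "$2" fs.defaultFS)
    [ -n "$fs" ] || return 1
    case "$rootdir" in
        "${fs%/}"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (paths are assumptions):
# rootdir_matches_hdfs /opt/hbase/conf/hbase-site.xml /opt/hadoop/etc/hadoop/core-site.xml
```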


2.4 Distribute the configured HBase installation to each machine

   #machine-0
   scp -r /opt/hbase machine-0:/opt/

   #machine-1
   scp -r /opt/hbase machine-1:/opt/

   #machine-2
   scp -r /opt/hbase machine-2:/opt/


    On the master, configure conf/regionservers:
machine-0
machine-1
machine-2
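
With more than a handful of nodes, the per-host scp commands can be driven from the same host list. This is a sketch under assumptions: the function name `distribute_hbase` and the `DRY_RUN` switch are hypothetical, and passwordless SSH to each host is assumed.

```shell
# Hypothetical helper: copy an HBase directory to every host named in a
# regionservers-style file (one host per line). Blank lines and # comments
# are skipped by this helper. Set DRY_RUN=1 to print the commands only.
# Usage: distribute_hbase <src-dir> <dest-dir> <hosts-file>
distribute_hbase() {
    src=$1          # e.g. /opt/hbase
    dest=$2         # e.g. /opt/
    hosts_file=$3   # e.g. /opt/hbase/conf/regionservers
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$hosts_file" |
    while read -r host; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "scp -r $src $host:$dest"
        else
            # Redirect stdin so scp/ssh cannot swallow the rest of the host list.
            scp -r "$src" "$host:$dest" < /dev/null
        fi
    done
}

# Example: DRY_RUN=1 distribute_hbase /opt/hbase /opt/ /opt/hbase/conf/regionservers
```

Running it once with DRY_RUN=1 shows exactly which copies would be made before anything is transferred.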


    Start HBase on hadoop-master:
   # Note: start the Hadoop cluster first (and, because HBASE_MANAGES_ZK=false, the external ZooKeeper ensemble)
   /opt/hbase/bin/start-hbase.sh



2.5 Test HBase

         Check the processes running on the HBase master:
        [app@hadoop-master ~]$ jps
        3453 Jps
        3166 HMaster
        2779 ResourceManager
        2022 Bootstrap
        2466 NameNode
        2618 SecondaryNameNode
   
       Output like the above indicates that the service processes have started.
        To view the status of the HBase cluster, open a browser and go to:
http://192.168.177.172:60010
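
The jps check can also be scripted instead of read by eye. A minimal sketch, with a hypothetical function name `has_daemons`; it only looks at the second column of jps-style output and says nothing about whether the daemons are healthy.

```shell
# Hypothetical helper: succeed only if every named daemon appears in the
# jps-style output ("<pid> <name>" per line) read from standard input.
# Usage: jps | has_daemons <name>...
has_daemons() {
    out=$(cat)
    for d in "$@"; do
        printf '%s\n' "$out" |
            awk -v d="$d" '$2 == d { ok = 1 } END { exit ok ? 0 : 1 }' || return 1
    done
}

# Example on the master:
# jps | has_daemons HMaster NameNode ResourceManager && echo "master daemons are up"
```

On a region server the process to look for would be HRegionServer rather than HMaster.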
      

         
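        You can also use the HBase shell to create a few tables. (The SLF4J warnings in the output below are noise from duplicate bindings, mainly caused by the pom.xml changes to HBase; I will fix that later.)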

[app@hadoop-master ~]$ hbase shell
14/01/10 20:19:33 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.14, rUnknown, Wed Jan  8 04:02:25 EST 2014

hbase(main):001:0> list
TABLE                                                                                                       
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
car                                                                                                         
weblogs                                                                                                     
2 row(s) in 3.1790 seconds
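
For a repeatable check, the same kind of session can be fed to the shell non-interactively. The sketch below only generates the commands; the function name `smoke_test_commands` and the table and column family names are hypothetical, and the table is assumed not to exist beforehand.

```shell
# Hypothetical helper: emit HBase shell commands that create a scratch table,
# write one cell, scan it, and then drop the table again.
# Usage: smoke_test_commands <table> <column-family>
smoke_test_commands() {
    table=$1    # scratch table name (assumed not to exist yet)
    family=$2   # column family name
    cat <<EOF
create '$table', '$family'
put '$table', 'row1', '$family:a', 'value1'
scan '$table'
disable '$table'
drop '$table'
EOF
}

# Example: smoke_test_commands smoke_test cf | hbase shell
```

Dropping the table at the end keeps the check from leaving data behind; remove the last two commands to inspect the table afterwards.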

     


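     PS: I have completed this installation successfully myself. If anything is missing or wrong, please point it out.
     Please credit the source when reposting. Thanks!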
