Installing, Deploying, Configuring, and Managing HBase 0.98 (Integrating Hadoop 2.3, HBase 0.98, and Hive 0.13)


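Introduction:

The HStore is the core of HBase storage. It consists of two parts: the MemStore and the StoreFiles. The MemStore is a sorted memory buffer; data written by a user first goes into the MemStore, and when the MemStore fills up it is flushed to a StoreFile (implemented underneath as an HFile). When the number of StoreFiles grows past a threshold, a compaction is triggered that merges multiple StoreFiles into one, merging versions and removing deleted data along the way. In other words, HBase only ever appends data: updates and deletes are written as new records and are actually applied during later compactions. This lets a write return as soon as it has reached memory (and the WAL, covered below), which is what gives HBase its high I/O performance.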
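The write path described above can be sketched as a toy model. This is only an illustration in Python, not HBase code: the class name ToyStore, the TOMBSTONE marker, and the two thresholds are assumptions made for the example, and real HBase flushes by memory size rather than by entry count.

```python
# Toy model of one HStore: a sorted in-memory buffer plus immutable store files.
# Hypothetical names and thresholds; real HBase flushes by bytes, not by entry count.
TOMBSTONE = object()  # marker recorded by delete()


class ToyStore:
    def __init__(self, flush_size=3, compact_threshold=3):
        self.memstore = {}       # key -> value (or TOMBSTONE); newest write wins
        self.store_files = []    # sorted (key, value) lists, oldest file first
        self.flush_size = flush_size
        self.compact_threshold = compact_threshold

    def put(self, key, value):
        # Writes only touch memory; a full memstore is flushed to a new file.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def delete(self, key):
        # A delete is just another write: it appends a tombstone.
        self.put(key, TOMBSTONE)

    def flush(self):
        if not self.memstore:
            return
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        if len(self.store_files) >= self.compact_threshold:
            self.compact()

    def compact(self):
        # Merge all files oldest-first so newer versions overwrite older ones,
        # then drop tombstones: this is where deletes are physically applied.
        merged = {}
        for store_file in self.store_files:
            merged.update(store_file)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.store_files = [live] if live else []

    def get(self, key):
        # Read the memstore first, then the files from newest to oldest.
        if key in self.memstore:
            value = self.memstore[key]
        else:
            value = None
            for store_file in reversed(self.store_files):
                found = dict(store_file)
                if key in found:
                    value = found[key]
                    break
        return None if value is TOMBSTONE else value
```

Note that until a compaction runs, the old value and the tombstone both still sit in older files; reads simply prefer the newest record.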

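As StoreFiles are compacted, progressively larger StoreFiles form. Once a single StoreFile exceeds a size threshold, a split is triggered: the current region is split into 2 regions, the parent region goes offline, and the 2 new daughter regions are assigned by the HMaster to suitable HRegionServers, so that the load on the original region is spread across 2 regions.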
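A minimal sketch of the split step, again in Python and again only a model: the function name split_region is hypothetical, and it uses a row count as the threshold where HBase uses StoreFile size.

```python
def split_region(row_keys, max_rows):
    """Split a region's sorted row keys into two daughters once it grows too large.

    Assumption for illustration: the threshold is a row count, whereas HBase
    decides based on StoreFile size and picks the split point itself.
    """
    if len(row_keys) <= max_rows:
        return [row_keys]               # small enough: the region stays as it is
    mid = len(row_keys) // 2            # split roughly in the middle of the key range
    return [row_keys[:mid], row_keys[mid:]]


# A region holding four rows with a limit of three becomes two daughter regions.
daughters = split_region(["r1", "r2", "r3", "r4"], max_rows=3)
```

Each daughter covers a contiguous key range, which is what allows the two halves to be served by different region servers.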
 

 

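1. HBase architecture:

LSM - solves the random disk write problem (sequential writes are what perform well);

HFile - solves the data indexing problem (reads are only efficient with an index);

WAL - solves data durability (persistence in the face of failures);

ZooKeeper - solves consistency of core data and cluster recovery;

Replication - adds a MySQL-like data replication scheme to address availability;

In addition: automatic region splitting, automatic compaction (the companion technique of LSM), automatic load balancing, and automatic region migration.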

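An HBase cluster depends on a ZooKeeper ensemble. Every node in the HBase cluster, and every client that accesses HBase, must be able to reach that ensemble. HBase ships with its own ZooKeeper, but so that other applications can use ZooKeeper too, a separately installed ensemble is preferable. A ZooKeeper ensemble is normally configured with an odd number of nodes. The Hadoop cluster, the ZooKeeper ensemble, and the HBase cluster are three independent clusters; they do not need to be deployed on the same physical nodes, and they communicate with one another over the network.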
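The preference for an odd number of ZooKeeper nodes follows from majority-quorum arithmetic. The short Python sketch below (the function name is made up for this example) shows that going from 3 to 4 nodes does not let the ensemble survive any additional failure.

```python
def tolerated_failures(ensemble_size):
    """Number of nodes that can fail while a strict majority is still alive."""
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


# 3 nodes tolerate 1 failure, 4 nodes still only 1, 5 nodes tolerate 2.
for size in (3, 4, 5):
    print(size, "nodes tolerate", tolerated_failures(size), "failure(s)")
```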

The relationship between HBase and Hadoop is shown in the figure below:
[Figure 1: relationship between HBase and Hadoop]

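2. Hadoop and HBase version compatibility

The information below comes from the official site.
The symbols have the following meanings:
S = supported and tested,
X = not supported,
NT = should work, but not tested. See the figure below:
[Figure 2: Hadoop/HBase version support matrix]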




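3. Download location

As the matrix in step 2 shows, with Hadoop 2.3.0 installed any HBase release from 0.96 onward can be used; I chose the relatively stable 0.98 release.

Go to the HBase website:

http://hbase.apache.org/

Find the download link there and follow it:

http://www.apache.org/dyn/closer.cgi/hbase/

Then, under HTTP, pick the first mirror. The download URL is: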

http://mirrors.cnnic.cn/apache/hbase/hbase-0.98.9/hbase-0.98.9-hadoop2-bin.tar.gz


 

 

 

4. Installation

tar zxvf hbase-0.98.9-hadoop2-bin.tar.gz -C /home/hadoop/src/

 

5. Configuration

5.1 Configure hbase-site.xml

Edit the configuration files under /home/hadoop/src/hbase-0.98.9-hadoop2/conf

Fully distributed installation:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://192.168.52.128:9000/hbase</value>
    <description>HBase data directory</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>HBase run mode. false: standalone/pseudo-distributed; true: fully distributed</description>
  </property>
  <property>
    <name>hbase.master</name>
    <value>192.168.52.128:60000</value>
    <description>Master host and port</description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>192.168.52.128,192.168.52.129,192.168.52.130</value>
    <description>ZooKeeper ensemble hosts</description>
  </property>
  <property>
    <name>hbase.master.info.bindAddress</name>
    <value>192.168.52.128</value>
    <description>The bind address for the HBase Master web UI</description>
  </property>
</configuration>
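Before copying the file to the other nodes, it can be worth sanity-checking it. The following is a minimal Python sketch using only the standard library; the function names and the particular checks are assumptions chosen for this walkthrough, not an official HBase validation tool.

```python
import xml.etree.ElementTree as ET


def load_site_properties(xml_text):
    """Parse hbase-site.xml content into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_site(props):
    """Return a list of human-readable problems (empty when nothing looks wrong)."""
    problems = []
    if not props.get("hbase.rootdir", "").startswith("hdfs://"):
        problems.append("hbase.rootdir should be an hdfs:// URI in fully distributed mode")
    if props.get("hbase.cluster.distributed") != "true":
        problems.append("hbase.cluster.distributed should be true")
    hosts = props.get("hbase.zookeeper.quorum", "").split(",")
    if any(not host or host != host.strip() for host in hosts):
        # Stray whitespace is easy to avoid, so this sketch simply flags it.
        problems.append("hbase.zookeeper.quorum has empty or whitespace-padded entries")
    return problems


SAMPLE = """<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://192.168.52.128:9000/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>192.168.52.128,192.168.52.129,192.168.52.130</value></property>
</configuration>"""

print(check_site(load_site_properties(SAMPLE)))
```

To check the real file, read it with open(path, encoding="utf-8") and pass the text to load_site_properties; a decoding error at that point would also reveal the encoding problem described in section 11.2.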

 

 

5.2 Configure the regionservers file:

[root@name01 conf]# more regionservers

192.168.52.128

192.168.52.129

192.168.52.130

[root@name01 conf]#

 


5.3 Set the environment variables in hbase-env.sh:

vim hbase-env.sh

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_60

export HBASE_CLASSPATH=/home/hadoop/src/hbase-0.98.9-hadoop2/conf

export HBASE_HEAPSIZE=2048

export HBASE_MANAGES_ZK=false

 

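Here JAVA_HOME is the Java installation directory. HBASE_CLASSPATH is meant to point to the directory holding the Hadoop configuration files, so that HBase can find the HDFS settings; since Hadoop and HBase share the same physical nodes in this setup, it should point to the configuration directory under the Hadoop installation path (etc/hadoop in Hadoop 2.x). Note that the value shown above is HBase's own conf directory. HBASE_HEAPSIZE is in MB and can be set according to need and available memory; the default is 1000. HBASE_MANAGES_ZK=false tells HBase to use an existing ZooKeeper ensemble rather than its bundled one. The HQuorumPeer processes in the startup log and jps output later in this post are what HBase starts when it manages ZooKeeper itself, so those runs were evidently made with HBase managing ZooKeeper; with an external ensemble the process appears as QuorumPeerMain instead.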

 

6. Copy to the other nodes, then configure the environment variables on each node

 

Second node:

scp -r /home/hadoop/zookeeper hadoop@data01:/home/hadoop/zookeeper

scp -r /home/hadoop/src/hbase-0.98.9-hadoop2/ hadoop@data01:/home/hadoop/src/hbase-0.98.9-hadoop2

 

Third node:

scp -r /home/hadoop/zookeeper hadoop@data02:/home/hadoop/zookeeper

scp -r /home/hadoop/src/hbase-0.98.9-hadoop2/ hadoop@data02:/home/hadoop/src/hbase-0.98.9-hadoop2

 

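7. Starting and stopping HBase

Before starting HBase, HDFS and ZooKeeper must already be running; the startup order is HDFS -> ZooKeeper -> HBase. To shut down, go in the reverse order, beginning with bin/stop-hbase.sh.

7.1 Start the Hadoop processes first: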

[hadoop@name01 conf]$ /home/hadoop/src/hadoop-2.3.0/sbin/start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Starting namenodes on [name01]

name01: starting namenode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-namenode-name01.out

data01: starting datanode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-datanode-data01.out

data02: starting datanode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-datanode-data02.out

Starting secondary namenodes [name01]

name01: starting secondarynamenode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-secondarynamenode-name01.out

starting yarn daemons

starting resourcemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-resourcemanager-name01.out

data02: starting nodemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-nodemanager-data02.out

data01: starting nodemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-nodemanager-data01.out

[hadoop@name01 conf]$

 

7.2 Then, on the first node (name01), start all HBase nodes with start-hbase.sh:

[hadoop@name01 conf]$ /home/hadoop/src/hbase-0.98.9-hadoop2/bin/start-hbase.sh

192.168.52.129: starting zookeeper, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-zookeeper-data01.out

192.168.52.130: starting zookeeper, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-zookeeper-data02.out

192.168.52.128: starting zookeeper, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-zookeeper-name01.out

starting master, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/logs/hbase-hadoop-master-name01.out

192.168.52.129: starting regionserver, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-regionserver-data01.out

192.168.52.130: starting regionserver, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-regionserver-data02.out

192.168.52.128: starting regionserver, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-regionserver-name01.out

 

8. Managing and operating HBase

8.1 Once startup has finished, check the running processes with jps

[hadoop@name01 conf]$ jps

8939 Jps

8755 HMaster

8890 HRegionServer

6794 NameNode

7117 ResourceManager

8691 HQuorumPeer

6971 SecondaryNameNode

[hadoop@name01 conf]$

 

8.2 Enter the HBase shell and check the status

[hadoop@name01 conf]$ hbase shell

2015-01-08 01:11:25,986 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.98.9-hadoop2, r96878ece501b0643e879254645d7f3a40eaf101f, Mon Dec 15 23:00:20 PST 2014

 

hbase(main):001:0> status

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/src/hbase-0.98.9-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

3 servers, 0 dead, 0.6667 average load

 

hbase(main):002:0>

 

8.3 Check the version:

hbase(main):002:0> version

0.98.9-hadoop2, r96878ece501b0643e879254645d7f3a40eaf101f, Mon Dec 15 23:00:20 PST 2014

 

hbase(main):003:0>

 

 

8.4 In the HBase shell, create a table:

hbase(main):003:0> list

TABLE                                                                                                                                                                                         

0 row(s) in 0.1400 seconds

 

=> []

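Create the table. The trailing semicolon appears to leave the shell waiting for more input (note the "*" continuation prompt on the next command), so the result of create is printed only after list is entered; the semicolon can be omitted.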

hbase(main):004:0> create 'member','member_id','address','info';

List all tables

hbase(main):005:0* list

0 row(s) in 2.2460 seconds

 

TABLE                                                                                                                                                                                          

member                                                                                                                                                                                        

1 row(s) in 0.0100 seconds

 

=> ["member"]

hbase(main):006:0>

 

8.5 View the table schema:

hbase(main):006:0> describe 'member'

Table member is ENABLED                                                                                                                                                                        

COLUMN FAMILIES DESCRIPTION                                                                                                                                                                   

{NAME => 'address', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_

CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                                            

{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CEL

LS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                                              

{NAME => 'member_id', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETE

D_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                                          

3 row(s) in 0.1310 seconds

 

hbase(main):007:0>

 

8.6 Check whether the table is enabled (to check whether it exists at all, use exists 'member')

hbase(main):007:0> is_enabled 'member'

true                                                                                                                                                                                          

0 row(s) in 0.1030 seconds

 

hbase(main):008:0>

 

8.7 Insert data into the table

Data is inserted with put

hbase(main):008:0> put 'member','xiaofeng','info:company','alibaba'

0 row(s) in 0.4970 seconds

 

hbase(main):009:0> put 'member','xiaofeng','address:company','alibaba'

0 row(s) in 0.0660 seconds

 

hbase(main):010:0>

 

8.8 Add a new column age with the value 27

hbase(main):018:0> put 'member','zhijie','info:age','27'

0 row(s) in 0.0550 seconds

hbase(main):019:0>

 

8.9 Query data with get: fetch the row for zhijie

hbase(main):025:0* get 'member','zhijie'

COLUMN                                           CELL                                                                                                                                         

 address:dingxilu                                timestamp=1420709522821, value=pl                                                                                                            

 info:age                                        timestamp=1420710488841, value=27                                                                                                             

2 row(s) in 0.7950 seconds

 

hbase(main):026:0>

 

8.10 Query all data in the info column family:

hbase(main):026:0> scan 'member',{COLUMNS => 'info'}

ROW                                              COLUMN+CELL                                                                                                                                   

 xiaofeng                                        column=info:company, timestamp=1420708739539, value=alibaba                                                                                   

 zhijie                                          column=info:age, timestamp=1420710488841, value=27                                                                                           

2 row(s) in 0.2380 seconds

 

hbase(main):027:0>

 

8.11 Drop the member table (a table must be disabled before it can be dropped):

hbase(main):027:0> disable 'member'

0 row(s) in 4.9110 seconds

 

hbase(main):028:0> drop 'member'

0 row(s) in 2.1370 seconds

 

hbase(main):029:0> list

TABLE                                                                                                                                                                                         

0 row(s) in 0.1030 seconds

 

=> []

hbase(main):030:0>

 

9. Viewing the cluster in the web UI:

The HBase Master web UI listens on port 60010 by default; the URL is http://192.168.52.128:60010/master-status

It looks like the figure below:
[Figure 3: HBase Master status page]

 

10. Finally, check the Hadoop and HBase processes on all 3 nodes with jps:

On name01:

[hadoop@name01 conf]$ jps

9292 Main

8755 HMaster

8890 HRegionServer

6794 NameNode

11972 Jps

7117 ResourceManager

8691 HQuorumPeer

6971 SecondaryNameNode

[hadoop@name01 conf]$

 

On data01:

[hadoop@data01 root]$ jps

3201 DataNode

3854 HRegionServer

3773 HQuorumPeer

3307 NodeManager

9948 Jps

[hadoop@data01 root]$

 

On data02:

[hadoop@data02 root]$ jps

5840 Jps

3853 HRegionServer

3219 DataNode

3774 HQuorumPeer

3325 NodeManager

[hadoop@data02 root]$
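Comparing jps listings by eye gets tedious with more nodes. The Python sketch below automates the check; the role names and the expected daemon sets are assumptions taken from the three-node layout in this post (ZooKeeper is left out because its process name depends on whether HBase manages it).

```python
# Daemons expected per role in this post's layout (an assumption, adjust as needed).
EXPECTED_DAEMONS = {
    "master": {"NameNode", "SecondaryNameNode", "ResourceManager", "HMaster"},
    "worker": {"DataNode", "NodeManager", "HRegionServer"},
}


def running_daemons(jps_output):
    """Extract process names from 'PID Name' lines of jps output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output, role):
    """Return the sorted list of expected daemons that jps did not report."""
    return sorted(EXPECTED_DAEMONS[role] - running_daemons(jps_output))


# The data01 listing from above: nothing is missing for a worker node.
DATA01 = "3201 DataNode\n3854 HRegionServer\n3773 HQuorumPeer\n3307 NodeManager\n9948 Jps\n"
print(missing_daemons(DATA01, "worker"))
```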

 

 

11. Errors encountered:

11.1 Garbled Chinese text:

Garbled Chinese text on CentOS

1) yum install font* -y

2) Edit this file: vi /etc/sysconfig/i18n (note: whether this second step is strictly required is unverified, but following it worked for me)

Change the original content

LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

to

LANG="zh_CN.GB18030"
LANGUAGE="zh_CN.GB18030:zh_CN.GB2312:zh_CN"
SUPPORTED="zh_CN.UTF-8:zh_CN:zh:en_US.UTF-8:en_US:en"
SYSFONT="lat0-sun16"

3) The key step: run the following two commands and wait for fc-cache -fv to finish:

cd /usr/share/fonts/
fc-cache -fv

4) Reboot with reboot

 

 

11.2 Startup fails with the following error:

2015-01-08 00:14:29,707 FATAL [main] conf.Configuration: error parsing conf hbase-site.xml

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: 1 字节的 UTF-8 序列的字节 1 无效。

 

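Solution: the message means "Invalid byte 1 of 1-byte UTF-8 sequence", that is, the file contains bytes that are not valid UTF-8. Remove the Chinese comments from hbase-site.xml (saving the file as UTF-8 should also avoid the error).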
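To find where the offending bytes are, a small check like the following can help. It is a Python sketch with a hypothetical function name; it only reports the offset of the first byte that cannot be decoded as UTF-8.

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Chinese text saved in a GB encoding is not valid UTF-8; the same text in UTF-8 is.
print(first_invalid_utf8_offset("指定".encode("gb18030")))
print(first_invalid_utf8_offset("指定".encode("utf-8")))
```

For the real file, pass open(path, "rb").read() to the function.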

 

11.3 Adding column data fails:

hbase(main):016:0> put 'member','zhijie','age:27','pl'

 

ERROR: Unknown column family! Valid column names: address:*, info:*, member_id:*

 

Here is some help for this command:

Put a cell 'value' at specified table/row/column and optionally

timestamp coordinates.  To put a cell value into table 'ns1:t1' or 't1'

at row 'r1' under column 'c1' marked with the time 'ts1', do:

 

  hbase> put 'ns1:t1', 'r1', 'c1', 'value'

  hbase> put 't1', 'r1', 'c1', 'value'

  hbase> put 't1', 'r1', 'c1', 'value', ts1

  hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}

  hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}

  hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}

 

The same commands also can be run on a table reference. Suppose you had a reference

t to table 't1', the corresponding command would be:

 

  hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}

 

 

hbase(main):017:0>

 

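Solution: a column is written as family:qualifier, and the family must be one that already exists in the table. In 'age:27' the shell treats age as the column family. Put age under the info family and pass 27 as the value: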

hbase(main):018:0> put 'member','zhijie','info:age','27'

0 row(s) in 0.0550 seconds

 

hbase(main):019:0>
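The rule behind this error can be modelled in a few lines of Python. The function below is a hypothetical illustration, not part of any HBase client: it splits a column name at the first colon and rejects families the table does not have.

```python
def check_column(column, families):
    """Split 'family:qualifier' and verify that the family exists in the table."""
    family, _, qualifier = column.partition(":")
    if family not in families:
        valid = ", ".join(name + ":*" for name in sorted(families))
        raise ValueError("Unknown column family %r. Valid column names: %s" % (family, valid))
    return family, qualifier


MEMBER_FAMILIES = {"member_id", "address", "info"}

# 'info:age' is accepted; 'age:27' would raise because 'age' is not a family.
print(check_column("info:age", MEMBER_FAMILIES))
```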

 

 

  ----------------------------------------------------------------------------------------------------------------
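<All rights reserved. Reposting is permitted, but the source address must be credited with a link; otherwise legal action may be pursued.>
Original blog post:   http://blog.itpub.net/26230597/viewspace-1400535/
Original author: 黄杉 (mchdba)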
----------------------------------------------------------------------------------------------------------------
