Introduction:
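The HStore is the core of HBase storage. It has two parts: a MemStore and a set of StoreFiles. The MemStore is a sorted in-memory buffer: data written by a client goes into the MemStore first, and when the MemStore fills up it is flushed to a new StoreFile (implemented as an HFile). When the number of StoreFiles reaches a threshold, a compaction is triggered that merges several StoreFiles into one, consolidating versions and discarding deleted data along the way. In other words, HBase only ever appends data; updates and deletes are written as new entries and are resolved later during compaction. A write can therefore return as soon as it has been recorded in the write-ahead log and the MemStore, which is what gives HBase its high write I/O performance.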
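The flush and compaction cycle described above can be sketched with a toy model. This is not HBase code: the class name `TinyStore`, the thresholds counted in entries rather than bytes, and the single-version cells are all simplifying assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = None  # marker value meaning "this key was deleted"


class TinyStore:
    """Toy model of one HStore: a MemStore plus immutable, sorted store files."""

    def __init__(self, flush_threshold: int = 3, compact_threshold: int = 3) -> None:
        self.memstore: Dict[str, Optional[str]] = {}
        # Each store file is a sorted list of (key, value); oldest file first.
        self.store_files: List[List[Tuple[str, Optional[str]]]] = []
        self.flush_threshold = flush_threshold
        self.compact_threshold = compact_threshold

    def put(self, key: str, value: str) -> None:
        self._write(key, value)

    def delete(self, key: str) -> None:
        # A delete is just another write: it records a tombstone.
        self._write(key, TOMBSTONE)

    def _write(self, key: str, value: Optional[str]) -> None:
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the MemStore out as a new sorted store file."""
        if not self.memstore:
            return
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        if len(self.store_files) >= self.compact_threshold:
            self.compact()

    def compact(self) -> None:
        """Merge all store files; newer values win, tombstones are dropped."""
        merged: Dict[str, Optional[str]] = {}
        for store_file in self.store_files:  # oldest first, so newer overwrite
            merged.update(store_file)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.store_files = [live]

    def get(self, key: str) -> Optional[str]:
        """Read the newest value: MemStore first, then newest store file."""
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            for k, v in store_file:
                if k == key:
                    return v
        return None
```

In this sketch the old value of an updated key and the tombstone of a deleted key stay on disk until `compact()` runs, which mirrors the point made above that updates and deletes are only resolved during compaction.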
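As StoreFiles are compacted, progressively larger StoreFiles form. When a single StoreFile grows past a size threshold, a split is triggered: the current Region is split into two, the parent Region goes offline, and the two daughter Regions are assigned by the HMaster to HRegionServers, so the load that fell on one Region is spread across two.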
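A minimal sketch of the split decision, assuming a region is modelled as a sorted list of row keys and that the row count stands in for the StoreFile size threshold. The function name `split_region` and the choice of the middle key as split point are illustrative assumptions, not the HBase implementation.

```python
from typing import List


def split_region(rows: List[str], max_rows: int) -> List[List[str]]:
    """Return the region unchanged, or its two daughter regions.

    `rows` is the sorted list of row keys held by the parent region.
    When it grows past `max_rows`, the parent is cut at the middle key
    and replaced by two daughters covering adjacent key ranges.
    """
    if len(rows) <= max_rows:
        return [rows]
    mid = len(rows) // 2
    return [rows[:mid], rows[mid:]]
```

Each daughter covers a contiguous key range, so the two can be served by different region servers without overlapping.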
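1. HBase architecture:
LSM - solves the random disk write problem (sequential writes are what make it fast);
HFile - solves data indexing (only an index makes reads efficient);
WAL - solves durability (persistence in the face of failures);
ZooKeeper - keeps core cluster state consistent and supports cluster recovery;
Replication - a data replication scheme similar to MySQL's, for availability;
In addition: automatic region splitting, automatic compaction (a companion technique to LSM), automatic load balancing, and automatic region migration.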
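An HBase cluster depends on a ZooKeeper ensemble. Every node in the HBase cluster, and every client that accesses HBase, must be able to reach that ensemble. HBase ships with a bundled ZooKeeper, but so that other applications can use ZooKeeper too, a separately installed ensemble is preferable. A ZooKeeper ensemble is normally configured with an odd number of nodes. The Hadoop cluster, the ZooKeeper ensemble, and the HBase cluster are three independent clusters that communicate over the network and need not be deployed on the same physical nodes.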
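The preference for an odd number of ZooKeeper nodes follows from majority voting: the ensemble stays available only while more than half of its members are up. The short calculation below is a sketch of that arithmetic; the function names are chosen here for illustration.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can fail while a majority is still alive."""
    return ensemble_size - majority(ensemble_size)


for n in range(1, 7):
    print(f"{n} nodes: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

A 4-node ensemble tolerates one failure, the same as a 3-node ensemble, so the extra even-numbered node adds cost without adding fault tolerance.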
2. Hadoop and HBase version compatibility
http://hbase.apache.org/book.html#configuration
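The official guide lists the supported combinations, using these symbols:
S = supported and tested,
X = not supported,
NT = should work, but not tested.
The matrix is shown in the figure below: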
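3. Download
According to the matrix in step 2, and since my installed Hadoop is 2.3.0, any HBase release from 0.96 onward can be used; the relatively stable 0.98 release is chosen here.
Go to the HBase site:
http://hbase.apache.org/
Follow the download link:
http://www.apache.org/dyn/closer.cgi/hbase/
Choose HTTP and the first mirror, which gives the following download URL: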
http://mirrors.cnnic.cn/apache/hbase/hbase-0.98.9/hbase-0.98.9-hadoop2-bin.tar.gz
4. Install
tar zxvf hbase-0.98.9-hadoop2-bin.tar.gz -C /home/hadoop/src/
5. Configuration
5.1) Configure hbase-site.xml
Edit the configuration files in /home/hadoop/src/hbase-0.98.9-hadoop2/conf
Fully distributed installation:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.52.128:9000/hbase</value>
<description>HBase data directory</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>HBase run mode: false = standalone/pseudo-distributed; true = fully distributed</description>
</property>
<property>
<name>hbase.master</name>
<value>hdfs://192.168.52.128:60000</value>
<description>Location of the Master</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>192.168.52.128, 192.168.52.129, 192.168.52.130</value>
<description>ZooKeeper ensemble hosts</description>
</property>
<property>
<name>hbase.master.info.bindAddress</name>
<value>192.168.52.128</value>
<description>The bind address for the HBase Master web UI
</description>
</property>
</configuration>
The resulting file on name01:
[root@name01 conf]# more hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.52.128:9000/hbase</value>
<description>HBase data directory</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>HBase run mode: false = standalone/pseudo-distributed; true = fully distributed</description>
</property>
<property>
<name>hbase.master</name>
<value>hdfs://192.168.52.128:60000</value>
<description>Location of the Master</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>192.168.52.128, 192.168.52.129, 192.168.52.130</value>
<description>ZooKeeper ensemble hosts</description>
</property>
<property>
<name>hbase.master.info.bindAddress</name>
<value>192.168.52.128</value>
<description>The bind address for the HBase Master web UI
</description>
</property>
</configuration>
[root@name01 conf]#
5.2) The regionservers configuration file:
[root@name01 conf]# more regionservers
192.168.52.128
192.168.52.129
192.168.52.130
[root@name01 conf]#
5.3) Set environment variables in hbase-env.sh:
vim hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_60
export HBASE_CLASSPATH=/home/hadoop/src/hbase-0.98.9-hadoop2/conf
export HBASE_HEAPSIZE=2048
export HBASE_MANAGES_ZK=false
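Here JAVA_HOME is the Java installation directory. HBASE_CLASSPATH is meant to point at the directory holding the Hadoop configuration files, so that HBase can find the HDFS configuration; since Hadoop and HBase are deployed on the same physical nodes in this setup, it can point straight at Hadoop's configuration directory. (Note that the value shown above is HBase's own conf directory; with Hadoop 2.3.0 the Hadoop configuration files normally live under etc/hadoop in the Hadoop installation.) HBASE_HEAPSIZE is in MB and defaults to 1000; set it according to your needs and the memory actually available. HBASE_MANAGES_ZK=false tells HBase to use an existing ZooKeeper ensemble rather than start its bundled one.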
6. Copy to the other nodes, then configure the environment variables on each node
Second node:
scp -r /home/hadoop/zookeeper hadoop@data01:/home/hadoop/zookeeper
scp -r /home/hadoop/src/hbase-0.98.9-hadoop2/ hadoop@data01:/home/hadoop/src/hbase-0.98.9-hadoop2
Third node:
scp -r /home/hadoop/zookeeper hadoop@data02:/home/hadoop/zookeeper
scp -r /home/hadoop/src/hbase-0.98.9-hadoop2/ hadoop@data02:/home/hadoop/src/hbase-0.98.9-hadoop2
7. Starting and stopping HBase
Starting HBase: HDFS and ZooKeeper must already be running, so the startup order is HDFS -> ZooKeeper -> HBase
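Before running start-hbase.sh it can help to confirm that the dependencies are actually listening. The helper below is a sketch using only TCP connection attempts; the function name `unreachable` is made up here, and it only shows that a port accepts connections, not that the service behind it is healthy.

```python
import socket
from typing import Iterable, List, Tuple


def unreachable(endpoints: Iterable[Tuple[str, int]],
                timeout: float = 2.0) -> List[Tuple[str, int]]:
    """Return the (host, port) pairs that do not accept a TCP connection."""
    failed = []
    for host, port in endpoints:
        try:
            # create_connection raises OSError on refusal, timeout, or no route.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failed.append((host, port))
    return failed
```

For the cluster in this article one might call `unreachable([("192.168.52.128", 9000), ("192.168.52.128", 2181)])`: port 9000 is the HDFS address from hbase.rootdir, while 2181 is ZooKeeper's usual client port and is an assumption, since the article does not state it. An empty result means both ports accepted a connection.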
7.1 Start the Hadoop processes first:
[hadoop@name01 conf]$ /home/hadoop/src/hadoop-2.3.0/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [name01]
name01: starting namenode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-namenode-name01.out
data01: starting datanode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-datanode-data01.out
data02: starting datanode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-datanode-data02.out
Starting secondary namenodes [name01]
name01: starting secondarynamenode, logging to /home/hadoop/src/hadoop-2.3.0/logs/hadoop-hadoop-secondarynamenode-name01.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-resourcemanager-name01.out
data02: starting nodemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-nodemanager-data02.out
data01: starting nodemanager, logging to /home/hadoop/src/hadoop-2.3.0/logs/yarn-hadoop-nodemanager-data01.out
[hadoop@name01 conf]$
7.2 Then start HBase from the first node (name01); start-hbase.sh starts the daemons on all nodes:
[hadoop@name01 conf]$ /home/hadoop/src/hbase-0.98.9-hadoop2/bin/start-hbase.sh
192.168.52.129: starting zookeeper, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-zookeeper-data01.out
192.168.52.130: starting zookeeper, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-zookeeper-data02.out
192.168.52.128: starting zookeeper, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-zookeeper-name01.out
starting master, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/logs/hbase-hadoop-master-name01.out
192.168.52.129: starting regionserver, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-regionserver-data01.out
192.168.52.130: starting regionserver, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-regionserver-data02.out
192.168.52.128: starting regionserver, logging to /home/hadoop/src/hbase-0.98.9-hadoop2/bin/../logs/hbase-hadoop-regionserver-name01.out
8. Managing and using HBase
8.1 Once startup finishes, check the running processes with jps
[hadoop@name01 conf]$ jps
8939 Jps
8755 HMaster
8890 HRegionServer
6794 NameNode
7117 ResourceManager
8691 HQuorumPeer
6971 SecondaryNameNode
[hadoop@name01 conf]$
8.2 Open the HBase shell and check the status
[hadoop@name01 conf]$ hbase shell
2015-01-08 01:11:25,986 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.9-hadoop2, r96878ece501b0643e879254645d7f3a40eaf101f, Mon Dec 15 23:00:20 PST 2014
hbase(main):001:0> status
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/src/hbase-0.98.9-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/src/hadoop-2.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
3 servers, 0 dead, 0.6667 average load
hbase(main):002:0>
8.3 Check the version:
hbase(main):002:0> version
0.98.9-hadoop2, r96878ece501b0643e879254645d7f3a40eaf101f, Mon Dec 15 23:00:20 PST 2014
hbase(main):003:0>
8.4 In the HBase shell, create a table:
hbase(main):003:0> list
TABLE
0 row(s) in 0.1400 seconds
=> []
Create the table
hbase(main):004:0> create 'member','member_id','address','info';
List all tables
hbase(main):005:0* list
0 row(s) in 2.2460 seconds
TABLE
member
1 row(s) in 0.0100 seconds
=> ["member"]
hbase(main):006:0>
8.5 View the table structure:
hbase(main):006:0> describe 'member'
Table member is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'address', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'member_id', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
3 row(s) in 0.1310 seconds
hbase(main):007:0>
8.6 Check whether the table is enabled (to check whether it exists, use exists 'member')
hbase(main):007:0> is_enabled 'member'
true
0 row(s) in 0.1030 seconds
hbase(main):008:0>
8.7 Insert data into the table
Data is inserted with put
hbase(main):008:0> put 'member','xiaofeng','info:company','alibaba'
0 row(s) in 0.4970 seconds
hbase(main):009:0> put 'member','xiaofeng','address:company','alibaba'
0 row(s) in 0.0660 seconds
hbase(main):010:0>
8.8 Add a new column info:age with value 27:
hbase(main):018:0> put 'member','zhijie','info:age','27'
0 row(s) in 0.0550 seconds
hbase(main):019:0>
8.9 Query data with get; fetch the row zhijie
hbase(main):025:0* get 'member','zhijie'
COLUMN CELL
address:dingxilu timestamp=1420709522821, value=pl
info:age timestamp=1420710488841, value=27
2 row(s) in 0.7950 seconds
hbase(main):026:0>
8.10 Query all data in the info column family:
hbase(main):026:0> scan 'member',{COLUMNS => 'info'}
ROW COLUMN+CELL
xiaofeng column=info:company, timestamp=1420708739539, value=alibaba
zhijie column=info:age, timestamp=1420710488841, value=27
2 row(s) in 0.2380 seconds
hbase(main):027:0>
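The put, get and scan commands above rely on HBase's data model: each row key maps to a set of family:qualifier columns. The toy class below is a sketch of that model in plain Python, useful for reasoning about what the shell returns. `ToyTable` is an invented name, and timestamps and versions are left out deliberately.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyTable:
    """In-memory stand-in for an HBase table: row -> {"family:qualifier": value}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, row: str, column: str, value: str) -> None:
        # A later put to the same row and column overwrites the earlier value.
        self.rows[row][column] = value

    def get(self, row: str) -> Dict[str, str]:
        """All columns of one row, sorted by column name."""
        return dict(sorted(self.rows.get(row, {}).items()))

    def scan(self, family: str) -> List[Tuple[str, str, str]]:
        """(row, column, value) for every cell in one column family, in row order."""
        prefix = family + ":"
        cells = []
        for row in sorted(self.rows):
            for column, value in sorted(self.rows[row].items()):
                if column.startswith(prefix):
                    cells.append((row, column, value))
        return cells
```

Replaying the puts from this section into a `ToyTable` and calling `scan("info")` yields the xiaofeng and zhijie cells in the same order as the shell output above.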
8.11 Delete the member table (a table must be disabled before it can be dropped):
hbase(main):027:0> disable 'member'
0 row(s) in 4.9110 seconds
hbase(main):028:0> drop 'member'
0 row(s) in 2.1370 seconds
hbase(main):029:0> list
TABLE
0 row(s) in 0.1030 seconds
=> []
hbase(main):030:0>
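9. Viewing the cluster in the web UI:
The HBase Master web UI uses port 60010 by default; the default URL is http://192.168.52.128:60010/master-status
10. Finally, use jps on the three nodes to check the running Hadoop and HBase processes:
On name01: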
[hadoop@name01 conf]$ jps
9292 Main
8755 HMaster
8890 HRegionServer
6794 NameNode
11972 Jps
7117 ResourceManager
8691 HQuorumPeer
6971 SecondaryNameNode
[hadoop@name01 conf]$
On data01:
[hadoop@data01 root]$ jps
3201 DataNode
3854 HRegionServer
3773 HQuorumPeer
3307 NodeManager
9948 Jps
[hadoop@data01 root]$
On data02:
[hadoop@data02 root]$ jps
5840 Jps
3853 HRegionServer
3219 DataNode
3774 HQuorumPeer
3325 NodeManager
[hadoop@data02 root]$
11. Errors encountered:
11.1 Garbled Chinese characters:
Garbled text on CentOS
1) yum install font* -y
2) Edit /etc/sysconfig/i18n (note: it is not confirmed whether this step is strictly required, but following it achieved the goal)
Change the original content
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"
to
LANG="zh_CN.GB18030"
LANGUAGE="zh_CN.GB18030:zh_CN.GB2312:zh_CN"
SUPPORTED="zh_CN.UTF-8:zh_CN:zh:en_US.UTF-8:en_US:en"
SYSFONT="lat0-sun16"
3) The key step: run the following two commands and wait for fc-cache -fv to finish:
cd /usr/share/fonts/
fc-cache -fv
4) Reboot
11.2 Startup error:
2015-01-08 00:14:29,707 FATAL [main] conf.Configuration: error parsing conf hbase-site.xml
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
Fix: remove the Chinese comments from hbase-site.xml (or save the file in UTF-8 encoding)
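An encoding problem like this can be caught before starting HBase by checking that the file is valid UTF-8 and well-formed XML. The function below is a sketch using only the Python standard library; `check_site_xml` is a name chosen here and is not an HBase tool.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def check_site_xml(data: bytes) -> Dict[Optional[str], Optional[str]]:
    """Validate hbase-site.xml content and return its name -> value pairs.

    Raises ValueError if the bytes are not valid UTF-8 (for example a file
    with Chinese comments saved in GBK), and xml.etree.ElementTree.ParseError
    if the XML itself is malformed.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid UTF-8 at byte offset {exc.start}") from exc
    root = ET.fromstring(data)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
```

To use it, read the file in binary mode, for example `open("/home/hadoop/src/hbase-0.98.9-hadoop2/conf/hbase-site.xml", "rb").read()`, and pass the bytes in. The reported byte offset points at the first offending character.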
11.3 Error when adding column data:
hbase(main):016:0> put 'member','zhijie','age:27','pl'
ERROR: Unknown column family! Valid column names: address:*, info:*, member_id:*
Here is some help for this command:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 'ns1:t1' or 't1'
at row 'r1' under column 'c1' marked with the time 'ts1', do:
hbase> put 'ns1:t1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value', ts1
hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:
hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase(main):017:0>
Fix: the column must be written as family:qualifier using an existing column family, here info:age:
hbase(main):018:0> put 'member','zhijie','info:age','27'
0 row(s) in 0.0550 seconds
hbase(main):019:0>
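The rule behind this error can be sketched as a small check: the part before the colon must be one of the table's column families, while the qualifier after it is free-form. The function name `validate_column` is an assumption made for illustration; it only mimics the shell's error message.

```python
from typing import Set, Tuple


def validate_column(column: str, families: Set[str]) -> Tuple[str, str]:
    """Split "family:qualifier" and reject families the table does not have."""
    family, _, qualifier = column.partition(":")
    if family not in families:
        valid = ", ".join(f"{name}:*" for name in sorted(families))
        raise ValueError(f"Unknown column family! Valid column names: {valid}")
    return family, qualifier
```

With the member table's families, `validate_column("info:age", {"member_id", "address", "info"})` is accepted, while `"age:27"` is rejected because age is not a column family.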
----------------------------------------------------------------------------------------------------------------
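<All rights reserved. Reposting is permitted, but the source must be credited with a link; otherwise legal action may be pursued.>
Original blog post: http://blog.itpub.net/26230597/viewspace-1400535/
Original author: 黄杉 (mchdba)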