Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
Use cases:
- Storing big data: billions of rows × millions of columns (TB/PB scale)
- Real-time random reads and writes against big data
*Theoretical foundations of Hadoop*
Project | Theory |
---|---|
Apache Hadoop | Google MapReduce |
Apache HBase | BigTable |
Apache HDFS | Google File System |
This article covers two things: how to install a single-node Standalone HBase, and how to manage and access HBase through the HBase Shell.
HBase consists of a Master, Region Servers, and ZooKeeper.
$tar zxf hbase-1.2.6-bin.tar.gz
$cd hbase-1.2.6
$vim ./conf/hbase-env.sh // hbase-env.sh holds the Java environment settings
#The java implementation to use. Java 1.7+ required.
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_144/
$vim ./conf/hbase-site.xml // hbase-site.xml is the main HBase configuration file
<configuration>
<property>
<name>hbase.rootdir</name> <!-- HBase data directory -->
<value>file:///data/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name> <!-- ZooKeeper data directory -->
<value>/data/zookeeper</value>
</property>
</configuration>
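As a quick sanity check of a `hbase-site.xml`-style configuration, the minimal Python sketch below parses the XML and collects the properties into a dictionary; malformed XML raises a parse error. The helper name `read_hbase_site` is hypothetical, and the XML is inlined as a string purely for illustration (in practice it would be read from `./conf/hbase-site.xml`).

```python
import xml.etree.ElementTree as ET

# Inlined copy of the two properties configured in this article (illustration only).
HBASE_SITE = """\
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///data/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/data/zookeeper</value>
  </property>
</configuration>
"""


def read_hbase_site(xml_text):
    """Return {property name: value}; raises ET.ParseError on malformed XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


for name, value in read_hbase_site(HBASE_SITE).items():
    print(f"{name} = {value}")
```

Running the check before starting HBase catches unclosed tags early, which is cheaper than reading the master log after a failed start.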
~/install/hbase-1.2.6$ ./bin/start-hbase.sh // start HBase
starting master, logging to /home/jack/install/hbase-1.2.6/bin/../logs/hbase-jack-master-jack-Alienware-14.out
~/install/hbase-1.2.6$ jps // list the running Java processes
7091 HMaster
7403 Jps
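The `jps` check can also be scripted. The sketch below is a minimal example, assuming the plain `PID name` output format shown above; the function name `running_java_processes` and the `SAMPLE` string are hypothetical and only serve the illustration.

```python
def running_java_processes(jps_output):
    """Map process name -> list of PIDs, parsed from `jps` output text."""
    processes = {}
    for line in jps_output.splitlines():
        parts = line.split(None, 1)
        # Keep only well-formed "PID name" lines.
        if len(parts) == 2 and parts[0].isdigit():
            processes.setdefault(parts[1].strip(), []).append(int(parts[0]))
    return processes


# Sample text copied from the session in this article.
SAMPLE = "7091 HMaster\n7403 Jps\n"
procs = running_java_processes(SAMPLE)
print("HMaster running:", "HMaster" in procs)
```

On a real machine the input could come from `subprocess.run(["jps"], capture_output=True, text=True).stdout` instead of the sample string.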
In Standalone mode, HBase runs all of its roles (the HMaster, a single HRegionServer, and ZooKeeper) inside one JVM. The admin page is available through the [HBase Web UI](http://localhost:16010/).
~/install/hbase-1.2.6$ ./bin/hbase shell // 1. Start the shell and connect to HBase
hbase(main):001:0>
hbase(main):001:0> help // 2. Show help for the shell commands
hbase(main):004:0> create 'course','topic','course' // 3. Create table 'course' with column families 'topic' and 'course'
hbase(main):004:0> list // 4. List all tables
TABLE
course
test
2 row(s) in 0.1380 seconds
=> ["course", "test"]
hbase(main):005:0> describe 'course' // 5. Describe a table; describe can be abbreviated to desc
Table course is ENABLED
course
COLUMN FAMILIES DESCRIPTION
{NAME => 'course', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0',
BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'topic', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', B
LOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0740 seconds
# 6. put: add or update row data
# HBase is a column-oriented NoSQL database:
# 'course' is the table name; 'c1' and 'c2' are row keys
# 'topic:name', 'topic:id' and 'topic:description' are columns, written as 'ColumnFamily:ColumnName'
# 'java', '13051' and so on are the column values
hbase(main):006:0> put 'course','c1','topic:name','java'
0 row(s) in 0.0560 seconds
hbase(main):007:0> put 'course','c1','topic:id','13051'
0 row(s) in 0.0040 seconds
hbase(main):008:0> put 'course','c1','topic:description','Core Java Description'
0 row(s) in 0.0030 seconds
hbase(main):009:0> put 'course','c2','topic:description','Lucence Description'
0 row(s) in 0.0030 seconds
# 7.1 Read data: scan reads through the rows of a table
hbase(main):010:0> scan 'course'
ROW COLUMN+CELL
c1 column=topic:description, timestamp=1502630720311, value=Core Java Description
c1 column=topic:id, timestamp=1502630703966, value=13051
c1 column=topic:name, timestamp=1502630673421, value=java
c2 column=topic:description, timestamp=1502630739399, value=Lucence Description
2 row(s) in 0.0170 seconds
# 7.2 Read data: get fetches a single row
hbase(main):019:0> get 'course','c1'
COLUMN CELL
topic:description timestamp=1502630720311, value=Core Java Description
topic:id timestamp=1502630703966, value=13051
topic:name timestamp=1502630673421, value=java
3 row(s) in 0.0180 seconds
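The `put`, `scan` and `get` commands above follow a simple addressing scheme: table, then row key, then `family:qualifier`, then a timestamped value. The toy Python model below sketches that scheme under stated assumptions. `TinyTable` is a hypothetical in-memory class, not part of HBase or any client library, and it keeps only the newest cell per column, mirroring `VERSIONS => '1'` from the `describe` output.

```python
import time


class TinyTable:
    """Toy in-memory model of an HBase table; not the real storage engine."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)
        self.rows = {}  # row key -> {"family:qualifier": (timestamp_ms, value)}

    def put(self, row, column, value, timestamp=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = int(time.time() * 1000) if timestamp is None else timestamp
        cells = self.rows.setdefault(row, {})
        current = cells.get(column)
        # Keep only the newest version of each cell.
        if current is None or ts >= current[0]:
            cells[column] = (ts, value)

    def get(self, row):
        """Return the cells of one row, sorted by column name."""
        return dict(sorted(self.rows.get(row, {}).items()))

    def scan(self):
        """Yield (row key, cells) in row-key order."""
        for row in sorted(self.rows):
            yield row, self.get(row)


course = TinyTable("course", ["topic", "course"])
course.put("c1", "topic:name", "java")
course.put("c1", "topic:id", "13051")
course.put("c1", "topic:description", "Core Java Description")
course.put("c2", "topic:description", "Lucence Description")

for row, cells in course.scan():
    for column, (ts, value) in cells.items():
        print(row, column, value)
```

The model shows why a second `put` to the same row and column acts as an update: the new cell simply carries a newer timestamp than the old one.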
# 8. Disable or enable a table
hbase(main):032:0> disable 'course'
0 row(s) in 0.0130 seconds
hbase(main):033:0> enable 'course'
0 row(s) in 1.2330 seconds
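# 9. Drop a table (it must be disabled first, so run disable 'course' again before dropping)
hbase(main):032:0> drop 'course'
# 10. Exit the HBase Shell
# This only leaves the shell; HBase keeps running in the background
hbase(main):032:0> quit
or
hbase(main):032:0> exit
# 11. Stop HBase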
~/install/hbase-1.2.6$ ./bin/stop-hbase.sh
stopping hbase..................
~/install/hbase-1.2.6$ jps
10540 Jps
# 12. Check the HBase version
~/install/hbase-1.2.6$ ./bin/hbase version
HBase 1.2.6
Source code repository file:///home/busbey/projects/hbase/hbase-assembly/target/hbase-1.2.6 revision=Unknown
Compiled by busbey on Mon May 29 02:25:32 CDT 2017
From source with checksum 7e8ce83a648e252758e9dae1fbe779c9
# 13. List namespaces
hbase(main):066:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.0070 seconds
# 14. List the tables inside a namespace
hbase(main):066:0> list_namespace_tables 'default'
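Table names relate to namespaces through a simple convention: a fully qualified name is written `namespace:table`, and a table created without a prefix, such as `course`, lives in the `default` namespace. The sketch below illustrates the convention; the helper name `split_table_name` is hypothetical.

```python
def split_table_name(full_name):
    """Split 'namespace:table' into (namespace, table); bare names use 'default'."""
    namespace, sep, table = full_name.partition(":")
    if not sep:
        return "default", full_name
    return namespace, table


for name in ("course", "hbase:meta"):
    print(name, "->", split_table_name(name))
```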
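# Other: check the Hadoop version
~$ hadoop version
# Note: if you forget any of the commands above, press Tab for hints and auto-completion
Open http://localhost:16010 in a browser to reach the HBase Web admin page, which shows the following information and configuration details.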
Region name naming rule:
Table Name + ',' + start key + ',' + region ID (generated when the region is created, typically from a timestamp)
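The naming rule can be sketched in a few lines of Python. This is a simplified illustration that follows only the three comma-separated parts described here; the function names are hypothetical, and the names displayed by a real cluster may carry an additional encoded suffix that this sketch ignores.

```python
def region_name(table, start_key, region_id):
    """Join the three parts of a region name with commas."""
    return f"{table},{start_key},{region_id}"


def parse_region_name(name):
    """Split a region name back into (table, start key, region id).

    The start key may itself contain commas, so the table is split off
    from the left and the region id from the right.
    """
    table, rest = name.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    return table, start_key, region_id


# An empty start key marks the first region of a table.
print(region_name("course", "", "1"))
print(parse_region_name("course,c1,1"))
```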
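Conclusions:
(1) A table and its regions are in a one-to-many relationship: one table is split into one or more regions, and each region belongs to exactly one table.
(2) The start key and end key give the range of the table's data that a region stores.
If a region's start key is the empty key, it is the first region of the table;
if a region's end key is the empty key, it is the last region of the table;
if both the start key and the end key are empty, it is the **only** region of the table.
(3) The 'hbase:meta' table is an internal system (catalog) table that records information about every region in HBase.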
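The start key and end key rules from conclusion (2) can be illustrated with a small lookup sketch. It assumes regions are sorted by start key, that each range includes its start key and excludes its end key, and that an empty string stands for the empty key. The `REGIONS` split points and both function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical split of one table into three regions: (start key, end key).
REGIONS = [("", "g"), ("g", "p"), ("p", "")]


def find_region(regions, row_key):
    """Return the region whose [start, end) range contains row_key."""
    starts = [start for start, _ in regions]
    # The owning region is the last one whose start key is <= row_key.
    return regions[bisect_right(starts, row_key) - 1]


def position(region):
    """Classify a region as the only, first, last or a middle region."""
    start, end = region
    if start == "" and end == "":
        return "only"
    if start == "":
        return "first"
    if end == "":
        return "last"
    return "middle"


for key in ("c1", "g", "zz"):
    region = find_region(REGIONS, key)
    print(key, "->", region, position(region))
```

Because the first region starts at the empty key and the last one ends at it, every possible row key falls into exactly one region.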