1. Installation Environment Overview
Physical laptop: i5 2.27GHz (4 CPUs), 4GB RAM, 320GB disk, 32-bit Windows 7
Virtualization: VMware® Workstation Version 7.0.0 build-203739
If you have not set up the VM yet, see this guide: http://ideapad.it168.com/thread-2088751-1-1.html
(it covers VMware Tools and Linux/Windows shared-folder configuration)
Linux VMs: master (h1), slave1 (h2), slave2 (h4)
CPU: 1 socket, 2 cores
Memory: 512MB
Disk: 10GB
Linux ISO: CentOS-6.0-i386-bin-DVD.iso (32-bit)
JDK version: 1.6.0_25-ea
Hadoop software version: hadoop-0.20.205.0.tar.gz
Eclipse version: eclipse-SDK-4.2-linux-gtk.tar.gz and eclipse-SDK-4.2.1-linux-gtk.tar.gz
2. HBase Installation Modes
Standalone mode
Pseudo-distributed mode: distributed operation simulated on a single host
Fully distributed mode: this post deploys the HBase column-oriented database in fully distributed mode
3. Pre-deployment Preparation
(1) The JDK is already installed; version 1.6 or later is generally required. For installation steps see step 4 of
http://f.dataguru.cn/forum.php?mod=viewthread&tid=18315&fromuid=303
(2) The Hadoop cluster is already installed. For installation steps see
http://f.dataguru.cn/forum.php?mod=viewthread&tid=18315&fromuid=303
(fully distributed Hadoop cluster installation and configuration)
(3) Choosing the HBase version
HBase and Hadoop versions must match: each HBase release only works with certain Hadoop releases, and a mismatch will make the installation fail. How do you find the right pairing quickly? Either search the web or read the official documentation.
Hadoop version used in this installation: hadoop-0.20.205.0.tar.gz
HBase version used in this installation: hbase-0.90.5
(4) Tip: a complete hbase-0.90.5.tar.gz package is 31662866 bytes
Why mention this? Network problems sometimes leave a download incomplete, and the damage is not obvious. Installing from a truncated package fails, and the failure is hard to trace back to the download. A small lesson learned.
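A quick way to check the downloaded package before using it (a small sketch; run it wherever the tarball sits, and compare the checksum against the one published on the Apache mirror you downloaded from):
[grid@h1 ~]$ ls -l hbase-0.90.5.tar.gz -- the size should be 31662866 bytes
[grid@h1 ~]$ md5sum hbase-0.90.5.tar.gz -- compare against the published .md5 file
[grid@h1 ~]$ tar -tzf hbase-0.90.5.tar.gz > /dev/null && echo OK -- a truncated archive will fail here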
(5) Verify that the Hadoop cluster has started correctly
Three verification methods are commonly used; see
http://f.dataguru.cn/forum.php?mod=viewthread&tid=23054&fromuid=303
1. Shell command: bin/hadoop dfsadmin -report, run on one node only
2. JPS processes: [grid@h1 bin]$ jps, run on every node
21465 Jps
15378 SecondaryNameNode
9488
15248 NameNode
15443 JobTracker
3. Browser: open http://h1:50070 or http://192.168.2.102:50070/dfshealth.jsp
We use the first method, which is the most convenient.
[grid@h1 hadoop-0.20.2]$ bin/hadoop dfsadmin -report
Configured Capacity: 19865944064 (18.5 GB)
Present Capacity: 8932794368 (8.32 GB)
DFS Remaining: 8932655104 (8.32 GB)
DFS Used: 139264 (136 KB)
DFS Used%: 0%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead) -- both datanodes alive, none shut down
Name: 192.168.2.103:50010 -- slave h2
Decommission Status : Normal -- status normal
Configured Capacity: 9932972032 (9.25 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 5351727104 (4.98 GB)
DFS Remaining: 4581175296(4.27 GB)
DFS Used%: 0%
DFS Remaining%: 46.12%
Last contact: Sun Oct 28 18:18:08 CST 2012
Name: 192.168.2.105:50010 -- slave h4
Decommission Status : Normal -- status normal
Configured Capacity: 9932972032 (9.25 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 5581422592 (5.2 GB)
DFS Remaining: 4351479808(4.05 GB)
DFS Used%: 0%
DFS Remaining%: 43.81%
Last contact: Sun Oct 28 18:18:09 CST 2012
(6) Check the contents of /etc/hosts
192.168.2.102 h1
192.168.2.103 h2
192.168.2.105 h4
Hadoop normally communicates with cluster nodes by hostname, so every node's IP-to-hostname mapping must be written into /etc/hosts for communication to work. If you used IP addresses in the configuration files and something misbehaves, change them to the corresponding hostnames.
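A quick sanity check of name resolution and connectivity from the master (assuming passwordless SSH for the grid user is already configured, which the Hadoop cluster needs anyway):
[grid@h1 ~]$ ping -c 1 h2
[grid@h1 ~]$ ping -c 1 h4
[grid@h1 ~]$ ssh grid@h2 hostname -- should print h2 without asking for a password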
4. HBase Fully Distributed Installation and Configuration
(1) Upload hbase-0.90.5.tar.gz to h1:/home/grid/
[grid@h1 grid]$ pwd
/home/grid
[grid@h1 grid]$ ll
total 30996
-rwxrwxrwx. 1 grid hadoop 44 Sep 18 19:10 abc.txt
-rwxrwxrwx. 1 grid hadoop 5519 Oct 12 22:09 Exercise_1.jar
drwxr-xr-x. 14 grid hadoop 4096 Sep 18 07:05 hadoop-0.20.2
-rwxrw-rw-. 1 grid hadoop 31662866 Oct 27 20:38 hbase-0.90.5.tar.gz
Note the size: 31662866 bytes. A damaged download will not install successfully :)
(2) Unpack the archive: tar -xzvf hbase-0.90.5.tar.gz
(3) Replace the Hadoop core jar
Note that different HBase/Hadoop version combinations require replacing different jars.
This fixes the Hadoop/HBase core version mismatch: the Hadoop jar shipped in HBase's lib directory, hadoop-core-0.20-append-r1056497.jar, is newer than the Hadoop 0.20.2 installed here, so it must be replaced with the 0.20.2 core jar, as the official documentation notes. This post uses /home/grid/hadoop-0.20.2/hadoop-0.20.2-core.jar to replace /home/grid/hbase-0.90.5/lib/hadoop-core-0.20-append-r1056497.jar.
Also copy over: cp ~/hadoop-0.20.2/lib/commons-configuration-1.6.jar ~/hbase-0.90.5/lib/
If you use Cloudera's packaged Hadoop and HBase you can skip the jar replacement entirely, because Cloudera has already sorted out the compatibility issues.
First rename hadoop-core-0.20-append-r1056497.jar as a backup, then copy the new jar in:
mv hadoop-core-0.20-append-r1056497.jar hadoop-core-0.20-append-r1056497.jar.bak
cp /home/grid/hadoop-0.20.2/hadoop-0.20.2-core.jar /home/grid/hbase-0.90.5/lib/
Change the permissions so the file is executable:
chmod 755 hadoop-0.20.2-core.jar
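To double-check the swap before moving on, a quick look at the lib directory should list the 0.20.2 core jar plus the .bak backup (a small verification sketch):
[grid@h1 hbase-0.90.5]$ ls -l lib/ | grep hadoop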
(4) Edit /home/grid/hbase-0.90.5/conf/hbase-env.sh; only the three settings below need to be changed
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/usr/java/jdk1.6.0_25
# Point to the JDK installation directory so HBase can find the JDK
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=/home/grid/hadoop-0.20.2/conf
# Point to the Hadoop configuration directory so HBase can locate Hadoop; HBase stores its data on top of Hadoop
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=true
# Let HBase start and manage its bundled ZooKeeper
# Where log files are stored. $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs
# HBase logs go under $HBASE_HOME/logs by default; change this if you want them elsewhere
(5) Edit /home/grid/hbase-0.90.5/conf/hbase-site.xml and add the following properties inside the <configuration> element:
<property>
  <name>hbase.rootdir</name>                      <!-- directory where HBase stores its data -->
  <value>hdfs://h1:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>          <!-- turn on distributed mode -->
  <value>true</value>
</property>
<property>
  <name>hbase.master</name>                       <!-- the HBase cluster master node -->
  <value>h1:60000</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>             <!-- ZooKeeper quorum nodes; use an odd number, because ZooKeeper decides by majority vote -->
  <value>h1,h2,h4</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>   <!-- ZooKeeper data directory -->
  <value>/home/grid/hbase-0.90.5/zookeeper</value>
</property>
Note: setting the data directory above is strongly recommended. By default the data would land under /tmp, which Linux clears on reboot, and then your data is gone. Don't be the guinea pig.
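Optionally, create the ZooKeeper data directory up front on h1 (HBase normally creates it at first start, so this is just a precaution; the directory is copied to the other nodes in step (7) below):
[grid@h1 ~]$ mkdir -p /home/grid/hbase-0.90.5/zookeeper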
(6) Edit /home/grid/hbase-0.90.5/conf/regionservers and add the region server nodes
It contains localhost by default; change it to
h2
h4
[grid@h1 conf]$ cat regionservers
h2
h4
(7) Copy the configured hbase-0.90.5 directory to every node; in this cluster, from h1 to h2 and h4
scp -r /home/grid/hbase-0.90.5 grid@h2:/home/grid/
scp -r /home/grid/hbase-0.90.5 grid@h4:/home/grid/
The -r flag copies a directory; without it scp copies individual files
[grid@h2 ~]$ ll
drwxr-xr-x. 8 grid hadoop 4096 Oct 28 21:00 hbase-0.90.5 -- copy complete
[grid@h4 ~]$ ll
drwxr-xr-x. 8 grid hadoop 4096 Oct 28 21:04 hbase-0.90.5 -- copy complete
5. Starting and Stopping the HBase Cluster
With the configuration above in place we can start the HBase cluster. Before starting it, check that the Hadoop cluster is already running: Hadoop must be started before HBase, because Hadoop hosts HBase's data.
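As a minimal sketch of the start-up order (paths assume the layout used in this post; waiting until HDFS has left safe mode avoids region server start-up errors):
[grid@h1 ~]$ cd ~/hadoop-0.20.2 && bin/start-all.sh
[grid@h1 hadoop-0.20.2]$ bin/hadoop dfsadmin -safemode get -- wait until it reports "Safe mode is OFF"
[grid@h1 hadoop-0.20.2]$ cd ~/hbase-0.90.5 && bin/start-hbase.sh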
[grid@h1 hbase-0.90.5]$ bin/start-hbase.sh -- command that starts the HBase cluster
h2: starting zookeeper, logging to -- the HQuorumPeer process on h2 has started
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-h2.out
h4: starting zookeeper, logging to -- the HQuorumPeer process on h4 has started
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-h4.out
h1: starting zookeeper, logging to -- the HQuorumPeer process on h1 has started
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-h1.out
starting master, logging to -- the HMaster process on h1 has started
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-master-h1.out
h4: starting regionserver, logging to -- the region server on h4 has started
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-h4.out
h2: starting regionserver, logging to -- the region server on h2 has started
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-h2.out
Processes on the h1 master node
[grid@h1 hbase-0.90.5]$ jps
8817 HMaster -- the HBase cluster master process
9149 Jps
4709 JobTracker
4515 NameNode
4650 SecondaryNameNode
8781 HQuorumPeer -- ZooKeeper quorum process
Processes on the h2 and h4 slave nodes
[grid@h2 ~]$ jps
17188 TaskTracker
31445 HRegionServer -- HBase region server process
31355 HQuorumPeer -- ZooKeeper quorum process
17077 DataNode
[grid@h4 ~]$ jps
27829 TaskTracker
17119 DataNode
29134 HQuorumPeer -- ZooKeeper quorum process
29208 HRegionServer -- HBase region server process
Browser check: open http://192.168.2.102:60010/master.jsp; when the master page appears, HBase is installed and working.
When browsing the master page you may see the warning: You are currently running the HMaster without HDFS append support enabled. This may result in data loss. Please see the HBase wiki for details.
A look at hdfs-default.xml shows the description below: Hadoop 0.20.2 has bugs that keep it from supporting HDFS append, so there is nothing to be done about the warning here. With other Hadoop versions you may not see it.
<property>
  <name>dfs.support.append</name>
  <value>false</value>
  <description>Does HDFS allow appends to files? This is currently set to false because there are bugs in the "append code" and is not supported in any production cluster.</description>
</property>
Stopping the HBase cluster
[grid@h1 bin]$ ./stop-hbase.sh
stopping hbase
h2: stopping zookeeper..
h4: stopping zookeeper...
h1: stopping zookeeper..
Summary: the HBase installation is now complete. In the steps above, note that different versions require different files to be replaced, and mind the version compatibility requirements. If you install on VMware virtual machines, you may find after a reboot that some of the HMaster, HQuorumPeer, or HRegionServer processes fail to come up. Run bin/stop-hbase.sh to stop all cluster processes, then bin/start-hbase.sh to start them again. Only work with the database after every process has started normally, otherwise operations will be rejected with errors.
6. Working with the HBase Shell
(1) Entering the shell
[grid@h1 hbase-0.90.5]$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands. (help opens the list of commands)
Type "exit<RETURN>" to leave the HBase Shell (exit leaves the shell)
Version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
hbase(main):001:0> exit -- leave the HBase shell completely
(2) Checking database status
hbase(main):001:0> status -- current cluster status
2 servers, 0 dead, 1.0000 average load
2 servers are alive and 0 are dead (unreachable); the last figure is the current average load (the larger the number, the heavier the load).
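If you want more than the one-line summary, status also takes a format argument in this shell version (as far as I recall; run help 'status' to confirm):
hbase(main):002:0> status 'detailed'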
(3) Shell command help
hbase(main):001:0> help
HBase Shell, version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011 -- version information
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general -- general command group
Commands: status, version -- command list
Group name: ddl -- data definition language commands
Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
Group name: dml -- data manipulation language commands
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
Group name: tools -- tools group
Commands: assign, balance_switch, balancer, close_region, compact, flush, major_compact, move, split, unassign, zk_dump
Group name: replication -- replication commands
Commands: add_peer, disable_peer, enable_peer, remove_peer, start_replication, stop_replication
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html
(4) Checking the database version
hbase(main):002:0> version
0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011 -- version 0.90.5
(5) Creating tables with create
Note: tables are the only database objects HBase has, so create is all you need.
First table, for the packet-network signaling monitoring system: hot websites
Table name: heat_sites
Column family 1: msisdn
Column family 2: user
Column family 3: sites
Syntax: create 'heat_sites','msisdn','user','sites'
hbase(main):001:0> create 'heat_sites','msisdn','user','sites'
0 row(s) in 24.3540 seconds -- creation time
Second table: user business preference analysis
Table name: user_business
Column family 1: msisdn
Column family 2: user
Column family 3: business
Syntax: create 'user_business','msisdn','user','business'
hbase(main):002:0> create 'user_business','msisdn','user','business'
0 row(s) in 1.7390 seconds -- creation time
(6) Listing all tables
hbase(main):003:0> list
TABLE
heat_sites -- the first table (hot websites)
user_business -- the second table (user business preference analysis)
2 row(s) in 0.3040 seconds
(7) Describing table structure
hbase(main):004:0> describe 'heat_sites'
DESCRIPTION ENABLED
{NAME => 'heat_sites', FAMILIES => [{NAME => 'msisdn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COM true
PRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCK
CACHE => 'true'}, {NAME => 'sites', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE',
VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {
NAME => 'user', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TT
L => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.2780 seconds
hbase(main):005:0> describe 'user_business'
DESCRIPTION ENABLED
{NAME => 'user_business', FAMILIES => [{NAME => 'business', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0' true
, COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}, {NAME => 'msisdn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => '
NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'tru
e'}, {NAME => 'user', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '
3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0370 seconds
(8) Deleting a column family from a table
HBase requires that any change to a table's structure follow disable the table -> alter it -> enable it; altering it directly produces an error.
hbase(main):007:0> alter 'user_business',{NAME=>'user',METHOD=>'delete'}
ERROR: Table user_business is enabled. Disable it first before altering.
Error: the table is enabled, so disable it before altering it. Note that the keywords in the syntax are case-sensitive.
hbase(main):008:0> disable 'user_business' -- disable the table
0 row(s) in 2.1850 seconds
hbase(main):009:0> alter 'user_business',{NAME=>'user',METHOD=>'delete'} -- alter the table
0 row(s) in 0.0950 seconds
hbase(main):010:0> enable 'user_business' -- re-enable the table
(9) Dropping a table
Likewise, the table must be disabled before it can be dropped
hbase(main):011:0> disable 'user_business' -- disable the table
0 row(s) in 2.6690 seconds
hbase(main):013:0> is_disabled 'user_business' -- check whether it is disabled
true -- it is
0 row(s) in 0.0150 seconds
hbase(main):014:0> drop 'user_business' -- drop the table
0 row(s) in 1.9930 seconds
(10) Checking whether a table exists
hbase(main):015:0> exists 'heat_sites'
Table heat_sites does exist -- this table exists
0 row(s) in 0.1540 seconds
hbase(main):016:0> exists 'user_business'
Table user_business does not exist -- this table has been dropped
0 row(s) in 0.0650 seconds
(11) Checking whether a table is enabled or disabled
hbase(main):005:0* is_enabled 'heat_sites' -- this table is enabled
true
0 row(s) in 0.0550 seconds
hbase(main):006:0> is_disabled 'user_business' -- the table no longer exists, but this command cannot tell
false
0 row(s) in 0.9820 seconds
hbase(main):007:0> is_enabled 'user_business' -- same problem here
true
0 row(s) in 0.0150 seconds
(12) Inserting records
A. To HBase, insert and update are essentially the same operation; both are puts.
B. HBase has no notion of data types; everything is stored as "characters", and the meaning is interpreted by the application.
C. Every inserted cell automatically gets a system-generated timestamp. You can also force a timestamp of your own, for example when inserting records into n tables and requiring all of them to carry the same timestamp (see the sketch right after this list).
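A small sketch of point C, forcing a timestamp: the optional last argument to put is the timestamp in milliseconds. The row key and value below are made up just for illustration (deleteall 'heat_sites','demo_row' removes the row again if you try it):
hbase(main):001:0> put 'heat_sites','demo_row','user:name','demo',1351560000000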
hbase(main):015:0* put 'heat_sites','leonarding','msisdn:id','13672122125'
0 row(s) in 0.6950 seconds -- leonarding's phone number
hbase(main):016:0> put 'heat_sites','leonarding','msisdn:*#06#','100'
0 row(s) in 0.0920 seconds -- leonarding's handset information
hbase(main):017:0> put 'heat_sites','leonarding','user:name','liusheng'
0 row(s) in 0.1620 seconds -- leonarding's user name
hbase(main):018:0> put 'heat_sites','leonarding','user:age','28'
0 row(s) in 0.0410 seconds -- leonarding's age
hbase(main):019:0> put 'heat_sites','leonarding','sites:http','www.dataguru.cn'
0 row(s) in 0.2090 seconds -- the URL leonarding visits
hbase(main):020:0> put 'heat_sites','leonarding','sites:name','lianshuchengjin'
0 row(s) in 0.0460 seconds -- the name of the site leonarding visits
hbase(main):021:0> put 'heat_sites','sunev_yu','msisdn:id','18866662222'
0 row(s) in 0.0570 seconds -- sunev_yu's phone number
hbase(main):022:0> put 'heat_sites','sunev_yu','msisdn:*#06#','101'
0 row(s) in 0.0200 seconds -- sunev_yu's handset information
hbase(main):023:0> put 'heat_sites','sunev_yu','user:name','yushuanghai'
0 row(s) in 0.0110 seconds -- sunev_yu's user name
hbase(main):024:0> put 'heat_sites','sunev_yu','user:age','26'
0 row(s) in 0.3310 seconds -- sunev_yu's age
hbase(main):025:0> put 'heat_sites','sunev_yu','sites:http','www.dataguru.cn'
0 row(s) in 0.0530 seconds -- the URL sunev_yu visits
hbase(main):026:0>
hbase(main):027:0* put 'heat_sites','sunev_yu','sites:name','lianshuchengjin'
0 row(s) in 0.0790 seconds -- the name of the site sunev_yu visits
hbase(main):028:0> put 'heat_sites','tigerfish','msisdn:id','15911112222'
0 row(s) in 0.0350 seconds -- tigerfish's phone number
hbase(main):029:0> put 'heat_sites','tigerfish','msisdn:*#06#','102'
0 row(s) in 0.0360 seconds -- tigerfish's handset information
hbase(main):001:0> put 'heat_sites','tigerfish','user:name','huangzhihong'
0 row(s) in 0.5160 seconds -- tigerfish's user name
hbase(main):002:0> put 'heat_sites','tigerfish','user:age','100'
0 row(s) in 0.0430 seconds -- tigerfish's age
hbase(main):003:0> put 'heat_sites','tigerfish','sites:http','www.itpub.net'
0 row(s) in 0.0150 seconds -- the URL tigerfish visits
hbase(main):004:0> put 'heat_sites','tigerfish','sites:name','itpub'
0 row(s) in 0.0460 seconds -- the name of the site tigerfish visits
(13) Getting all data for a row key
Note: data must be queried by row key
hbase(main):001:0> get 'heat_sites','leonarding' -- data for row key leonarding
COLUMN CELL
msisdn:*#06# timestamp=1351560318018, value=100
msisdn:id timestamp=1351560274457, value=13672122125
sites:http timestamp=1351560423739, value=www.dataguru.cn
sites:name timestamp=1351560476264, value=lianshuchengjin
user:age timestamp=1351560350911, value=28
user:name timestamp=1351560335833, value=liusheng
hbase(main):006:0* get 'heat_sites','sunev_yu' -- data for row key sunev_yu
COLUMN CELL
msisdn:*#06# timestamp=1351560560622, value=101
msisdn:id timestamp=1351560540173, value=18866662222
sites:http timestamp=1351560630783, value=www.dataguru.cn
sites:name timestamp=1351560664387, value=lianshuchengjin
user:age timestamp=1351560606783, value=26
user:name timestamp=1351560585193, value=yushuanghai
hbase(main):005:0> get 'heat_sites','tigerfish' -- data for row key tigerfish
COLUMN CELL
msisdn:*#06# timestamp=1351560873212, value=102
msisdn:id timestamp=1351560851244, value=15911112222
sites:http timestamp=1351562148765, value=www.itpub.net
sites:name timestamp=1351562171874, value=itpub
user:age timestamp=1351562118827, value=100
user:name timestamp=1351562102858, value=huangzhihong
(14) Getting all data for one row key and one column family (i.e. specifying the column family, also called the column key)
hbase(main):006:0> get 'heat_sites','leonarding','sites' -- column family sites
COLUMN CELL
sites:http timestamp=1351560423739, value=www.dataguru.cn
sites:name timestamp=1351560476264, value=lianshuchengjin
2 row(s) in 0.0760 seconds
hbase(main):009:0> get 'heat_sites','sunev_yu','msisdn' -- column family msisdn
COLUMN CELL
msisdn:*#06# timestamp=1351560560622, value=101
msisdn:id timestamp=1351560540173, value=18866662222
2 row(s) in 0.0370 seconds
hbase(main):010:0> get 'heat_sites','tigerfish','user' -- column family user
COLUMN CELL
user:age timestamp=1351562118827, value=100
user:name timestamp=1351562102858, value=huangzhihong
2 row(s) in 0.0320 seconds
(15) Getting the data of a single column in a column family for one row key
hbase(main):011:0> get 'heat_sites','leonarding','user:name' -- the name column
COLUMN CELL
user:name timestamp=1351560335833, value=liusheng
1 row(s) in 0.0360 seconds
hbase(main):012:0> get 'heat_sites','sunev_yu','msisdn:id' -- the id column
COLUMN CELL
msisdn:id timestamp=1351560540173, value=18866662222
1 row(s) in 0.0850 seconds
hbase(main):013:0> get 'heat_sites','tigerfish','sites:http' -- the http column
COLUMN CELL
sites:http timestamp=1351562148765, value=www.itpub.net
1 row(s) in 0.0110 seconds
(16) Updating a record
In essence, this is the same as inserting a record
hbase(main):014:0> put 'heat_sites','leonarding','msisdn:id','18977777777' -- update the phone number
0 row(s) in 0.0220 seconds
hbase(main):003:0* get 'heat_sites','leonarding','msisdn:id'
COLUMN CELL
msisdn:id timestamp=1351563680951, value=18977777777
1 row(s) in 1.2500 seconds -- the timestamp has been updated and only the latest version is shown
(17) Getting data by timestamp
hbase(main):004:0> get 'heat_sites','leonarding',{COLUMN=>'msisdn:id',TIMESTAMP=>1351563680951}
COLUMN CELL
msisdn:id timestamp=1351563680951, value=18977777777
1 row(s) in 0.0230 seconds -- the specified timestamp selects the version you want
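Since every column family above keeps VERSIONS => '3' (see the describe output earlier), several versions of a cell can also be fetched in one call; a small sketch (the output depends on how many versions still exist):
hbase(main):005:0> get 'heat_sites','leonarding',{COLUMN=>'msisdn:id',VERSIONS=>3}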
(18) Scanning the whole table
hbase(main):005:0> scan 'heat_sites'
ROW COLUMN+CELL
leonarding column=msisdn:*#06#, timestamp=1351560318018, value=100
leonarding column=msisdn:id, timestamp=1351563680951, value=18977777777
leonarding column=sites:http, timestamp=1351560423739, value=www.dataguru.cn
leonarding column=sites:name, timestamp=1351560476264, value=lianshuchengjin
leonarding column=user:age, timestamp=1351560350911, value=28
leonarding column=user:name, timestamp=1351560335833, value=liusheng
sunev_yu column=msisdn:*#06#, timestamp=1351560560622, value=101
sunev_yu column=msisdn:id, timestamp=1351560540173, value=18866662222
sunev_yu column=sites:http, timestamp=1351560630783, value=www.dataguru.cn
sunev_yu column=sites:name, timestamp=1351560664387, value=lianshuchengjin
sunev_yu column=user:age, timestamp=1351560606783, value=26
sunev_yu column=user:name, timestamp=1351560585193, value=yushuanghai
tigerfish column=msisdn:*#06#, timestamp=1351560873212, value=102
tigerfish column=msisdn:id, timestamp=1351560851244, value=15911112222
tigerfish column=sites:http, timestamp=1351562148765, value=www.itpub.net
tigerfish column=sites:name, timestamp=1351562171874, value=itpub
tigerfish column=user:age, timestamp=1351562118827, value=100
tigerfish column=user:name, timestamp=1351562102858, value=huangzhihong
3 row(s) in 0.4740 seconds
Note: the listing above has 18 lines (one per cell), but HBase storage is organized by row key, which is why the result says 3 row(s): each row key counts as one actual row.
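scan can also take a dictionary to narrow the output instead of dumping everything; a quick sketch using the COLUMNS and LIMIT options of this shell version:
hbase(main):006:0> scan 'heat_sites',{COLUMNS=>'user:name',LIMIT=>2}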
(19) Deleting a specific column-family:column from a row key
hbase(main):006:0> create 'user_business','msisdn','user','business' -- create the table again
0 row(s) in 2.8970 seconds
hbase(main):007:0> put 'user_business','leonarding','business:type','E-mail' -- insert the first record
0 row(s) in 0.1930 seconds
hbase(main):012:0> put 'user_business','leonarding','user:name','liusheng' -- insert the second record
0 row(s) in 0.1360 seconds
hbase(main):008:0> get 'user_business','leonarding' -- view the row key's data
COLUMN CELL
business:type timestamp=1351564724811, value=E-mail
user:name timestamp=1351565015611, value=liusheng
row(s) in 0.1330 seconds
hbase(main):009:0> delete 'user_business','leonarding','business:type' -- delete the specified column-family:column
0 row(s) in 0.0970 seconds
hbase(main):015:0> get 'user_business','leonarding' -- the other column-family:columns are unaffected
COLUMN CELL
user:name timestamp=1351565015611, value=liusheng
1 row(s) in 0.1420 seconds
(20) Deleting an entire row
hbase(main):020:0> deleteall 'user_business','leonarding' -- delete all data for the row key
0 row(s) in 0.0460 seconds
hbase(main):021:0> get 'user_business','leonarding' -- no data left
COLUMN CELL
0 row(s) in 0.0150 seconds
(21) Counting the rows in a table
hbase(main):022:0> count 'heat_sites' -- the number of row keys: leonarding, sunev_yu, tigerfish
3 row(s) in 0.1310 seconds
(22) Emptying a table with truncate
hbase(main):023:0> put 'user_business','leonarding','user:name','liusheng' -- insert a record
0 row(s) in 0.0960 seconds
hbase(main):024:0> get 'user_business','leonarding' -- the data is there
COLUMN CELL
user:name timestamp=1351567430718, value=liusheng
1 row(s) in 0.0150 seconds
hbase(main):025:0> truncate 'user_business' -- truncate the table
Truncating 'user_business' table (it may take a while): -- this can take a moment
- Disabling table... -- first disable
- Dropping table... -- then drop
- Creating table... -- then recreate
0 row(s) in 8.2200 seconds
hbase(main):026:0> get 'user_business','leonarding' -- the table is now empty
COLUMN CELL
0 row(s) in 1.2120 seconds
How truncate works: HDFS files cannot be modified in place, so the only way to empty a table is to drop it and recreate it.
Summary: this chapter walked through managing and using the HBase cluster in detail, covering the common operations of creating, altering, dropping, and truncating tables, and inserting, updating, and deleting records. The key point is that a table's attributes cannot be changed directly: always disable -> alter -> enable. This is an important HBase characteristic.