HBase stores data in tables. Each table consists of rows and columns, and every column belongs to a particular column family (Column Family). The storage unit identified by a row and a column is called a cell (Cell); a cell can hold multiple versions of the same piece of data, distinguished by timestamp.
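To make the row / column-family / version model concrete, here is a minimal Java sketch (not part of the original walkthrough) that reads back several timestamped versions of one cell. The table name "mytable", row key "row1", and column family "cf" are hypothetical; the sketch assumes such a table already exists and uses the HBase 0.94-era client API that the rest of this post relies on.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CellVersions {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // picks up conf/hbase-site.xml
        HTable table = new HTable(conf, "mytable");          // hypothetical, pre-existing table
        Get get = new Get(Bytes.toBytes("row1"));            // read one row
        get.setMaxVersions(3);                               // ask for up to 3 versions per cell
        Result result = table.get(get);
        // each KeyValue is one version of a cell: row + column family + qualifier + timestamp
        for (KeyValue kv : result.raw()) {
            System.out.println(kv.getTimestamp() + " -> " + Bytes.toString(kv.getValue()));
        }
        table.close();
    }
}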
Let's start with the installation.
1. Download and Installation
Pick an Apache download mirror and download an HBase release: go into the stable directory and download the file ending in .tar.gz, for example hbase-0.95-SNAPSHOT.tar.gz.
Extract the archive and change into the extracted directory:
$ tar xfz hbase-0.95-SNAPSHOT.tar.gz
$ cd hbase-0.95-SNAPSHOT
Edit conf/hbase-site.xml (the configuration below is for pseudo-distributed mode):
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Edit conf/hbase-env.sh and add a line pointing at your JDK:
export JAVA_HOME=/usr/lib/jdk/jdk1.7.0_40
Run HBase
To start it in pseudo-distributed mode:
$ bin/start-hbase.sh
Enter the HBase shell:
$ bin/hbase shell
The session looks like this:
root@ubuntu:~/hbase/hbase-0.94.18# bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.18, r1577788, Sat Mar 15 04:46:47 UTC 2014
hbase(main):001:0>
To exit:
hbase(main):001:0> exit
To stop HBase:
$ bin/stop-hbase.sh
2. Using MapReduce with HBase
We rewrite the earlier WordCount code so that its results are written into HBase.
Java APIs used:
1. HBaseConfiguration
Configures and initializes HBase (in newer releases HBaseConfiguration.create() is the preferred way to build the configuration).
Usage:
HBaseConfiguration config = new HBaseConfiguration();
2. HBaseAdmin
Used for administrative operations such as creating and deleting tables.
Usage:
HBaseAdmin admin = new HBaseAdmin(config);
3. HTableDescriptor
The HTableDescriptor class holds a table's name and its list of column families, and can be used to add or remove column families.
Usage:
HTableDescriptor htd = new HTableDescriptor(tablename);
4. HColumnDescriptor
The HColumnDescriptor class holds the settings of a single column family.
Usage:
HColumnDescriptor col = new HColumnDescriptor("content");
5. Put
Inserts or updates data for a single row; outside of a MapReduce job a Put is handed to an HTable (see the standalone sketch after this list).
Usage:
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(Bytes.toBytes("content"),Bytes.toBytes("count"),Bytes.toBytes(String.valueOf(count)));
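As promised above, here is a minimal standalone sketch of how these classes fit together outside a MapReduce job: open a table and write a single Put to it. The row key "hello" and the value "2" are made-up illustration data; the table name "wordcount" and column family "content" match the ones created later in this post, which is assumed to already exist when this runs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();   // reads conf/hbase-site.xml
        HTable table = new HTable(config, "wordcount");       // table created later in this post
        Put put = new Put(Bytes.toBytes("hello"));            // row key (illustration data)
        // column family "content", qualifier "count", value "2"
        put.add(Bytes.toBytes("content"), Bytes.toBytes("count"), Bytes.toBytes("2"));
        table.put(put);                                       // write the row
        table.close();
    }
}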
Code:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WordCountHbase {

    // Standard WordCount mapper: emit (word, 1) for every token in the input line.
    public static class MapClass extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // TableReducer writes Puts into HBase instead of text files on HDFS.
    public static class Reduce extends TableReducer<Text, IntWritable, NullWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable val : values) {
                count += val.get();
            }
            // Row key = the word; column family "content", qualifier "count", value = the total.
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.add(Bytes.toBytes("content"), Bytes.toBytes("count"),
                    Bytes.toBytes(String.valueOf(count)));
            context.write(NullWritable.get(), put);
        }
    }

    // (Re)create the output table with a single column family "content".
    public static void createHBaseTable(String tablename) throws IOException {
        HTableDescriptor htd = new HTableDescriptor(tablename);
        HColumnDescriptor col = new HColumnDescriptor("content");
        htd.addFamily(col);
        HBaseConfiguration config = new HBaseConfiguration();
        HBaseAdmin admin = new HBaseAdmin(config);
        if (admin.tableExists(tablename)) {
            System.out.println("table exists, trying to recreate table!");
            admin.disableTable(tablename);
            admin.deleteTable(tablename);
        }
        System.out.println("create new table: " + tablename);
        admin.createTable(htd);
    }

    public static void main(String[] args) throws Exception {
        String tablename = "wordcount";
        Configuration conf = new Configuration();
        conf.set(TableOutputFormat.OUTPUT_TABLE, tablename);
        createHBaseTable(tablename);

        Job job = new Job(conf, "word count table with " + args[0]);
        job.setJarByClass(WordCountHbase.class);
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
To compile and run this, copy hbase-0.94.18.jar, zookeeper-3.4.5.jar, protobuf-java-2.4.0a.jar and guava-11.0.2.jar from the HBase installation directory into hadoop/lib; otherwise you will get a pile of errors.
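As an alternative to copying jars by hand, HBase's TableMapReduceUtil can ship its dependency jars with the job via the distributed cache. This is only a sketch of where such a call could go in the main() above, not something the original walkthrough uses, and the jars still need to be on the client classpath when compiling and submitting.

import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;

// In main(), after the Job has been configured and before waitForCompletion(true):
// adds the HBase, ZooKeeper, Guava and Protobuf jars to the job's distributed cache
TableMapReduceUtil.addDependencyJars(job);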
Compile command:
$ javac -classpath hadoop-core-1.2.1.jar:lib/commons-cli-1.2.jar:lib/commons-logging-api-1.0.4.jar:hbase-0.94.18.jar:zookeeper-3.4.5.jar -d practise/WordCountHbase/classes practise/WordCountHbase/src/WordCountHbase.java
After packaging it into a jar, run:
$ bin/hadoop jar practise/WordCountHbase/WordCountHbase.jar WordCountHbase 1.txt
Here 1.txt contains:
hello world
hello hadoop
Then go back into HBase and check whether MapReduce has written the results into the table:
hbase(main):001:0> list
TABLE
tab1
wordcount
2 row(s) in 8.3730 seconds
hbase(main):002:0> scan 'wordcount'
ROW COLUMN+CELL
hadoop column=content:count, timestamp=1397376976400, value=1
hello column=content:count, timestamp=1397376976400, value=2
world column=content:count, timestamp=1397376976400, value=1
3 row(s) in 0.5460 seconds
hbase(main):003:0>
Below are some related notes on troubleshooting common HBase errors:
Errors caused by using hadoop together with hbase
Fixing the HBase exception "hbase-default.xml file seems to be for and old version of HBase"
An HBase MapReduce example analysis
Things to watch out for when writing MR jobs that run against HBase
Fixing the problem where, after integrating Hive with HBase, the reduce phase cannot find ZooKeeper