1. Starting from scratch
Start the HBase shell:
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.0, r1231986, Mon Jan 16 13:16:35 UTC 2012

hbase(main):001:0>

The shell opens a connection to HBase and greets you with a prompt. With the shell
prompt ahead of you, create your first table:

hbase(main):001:0> create 'users', 'info'

A word about Java
The vast majority of code used in this book is written in Java. We use pseudo-code
here and there to help teach concepts, but the working code is Java. Java is a practical
reality of using HBase. The entire Hadoop stack, including HBase, is implemented
in Java. The HBase client library is Java. The MapReduce library is Java. An HBase
deployment requires tuning the JVM for optimal performance. But there are ways
to interact with Hadoop and HBase from non-Java and non-JVM languages. We
cover many of these options in chapter 6.
Presumably 'users' is the name of the table, but what about this 'info' business?
Just like tables in a relational database, tables in HBase are organized into rows and columns.
HBase treats columns a little differently than a relational database. Columns in
HBase are organized into groups called column families. info is a column family in the
users table. A table in HBase must have at least one column family. Among other
things, column families impact physical characteristics of the data store in HBase. For
this reason, at least one column family must be specified at table creation time. You
can alter column families after the table is created, but doing so is a little tedious.
We’ll discuss column families in more detail later. For now, know that your users table
is as simple as it gets—a single column family with default parameters.
2. Examine table schema
If you’re familiar with relational databases, you’ll notice right away that the table creation
didn’t involve any columns or types. Other than the column family name, HBase
doesn’t require you to tell it anything about your data ahead of time. That’s why HBase
is often described as a schema-less database.
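To make the layout concrete, here is a plain-Java sketch of the logical model described above. A row key maps to column families, and each family maps column qualifiers to values. Families are fixed when the table is created, but qualifiers can be anything at write time. The class `ToyTable` and its methods are hypothetical illustrations, not the HBase client API. For readability it also stores values as `String`, whereas HBase itself stores byte arrays.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical in-memory model of HBase's logical layout (illustration only):
 * row key -> column family -> column qualifier -> value.
 */
public class ToyTable {
    private final String name;
    private final Set<String> families = new TreeSet<>();
    // Rows are kept sorted by row key.
    private final NavigableMap<String, Map<String, NavigableMap<String, String>>> rows = new TreeMap<>();

    public ToyTable(String name, String... families) {
        // Mirrors the rule that a table needs at least one column family at creation.
        if (families.length == 0) {
            throw new IllegalArgumentException("a table needs at least one column family");
        }
        this.name = name;
        for (String f : families) {
            this.families.add(f);
        }
    }

    public String name() {
        return name;
    }

    /** Any qualifier is accepted; only the column family must have been declared. */
    public void put(String row, String family, String qualifier, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    /** Returns the stored value, or null if the row, family, or qualifier is absent. */
    public String get(String row, String family, String qualifier) {
        Map<String, NavigableMap<String, String>> r = rows.get(row);
        if (r == null) {
            return null;
        }
        NavigableMap<String, String> columns = r.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    public static void main(String[] args) {
        ToyTable users = new ToyTable("users", "info");
        // No columns were declared up front: new qualifiers simply appear on write.
        users.put("alice", "info", "name", "Alice");
        users.put("alice", "info", "email", "alice@example.com");
        System.out.println(users.get("alice", "info", "name"));   // prints Alice
        try {
            users.put("alice", "profile", "bio", "x");             // family never declared
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());                     // prints unknown column family: profile
        }
    }
}
```

The row key `alice` and the qualifiers `name` and `email` are made-up examples. The point is that only `info` had to be known ahead of time.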
You can verify that your users table was created by asking HBase for a listing of all
registered tables:
hbase(main):002:0> list
TABLE
users
1 row(s) in 0.0220 seconds

hbase(main):003:0>
The list command proves the table exists, but HBase can also give you extended
details about your table. You can see all those default parameters using the describe
command:
hbase(main):003:0> describe 'users'
DESCRIPTION                                          ENABLED
 {NAME => 'users', FAMILIES => [{NAME => 'info',     true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
 COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
 '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0330 seconds

hbase(main):004:0>
The shell describes your table as a map with two properties: the table name and a list
of column families. Each column family has a number of associated configuration
parameters; the values shown here are the defaults.
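It can help to read the `describe` output as plain data. The sketch below transcribes it into ordinary Java maps and checks two of the defaults. `TTL => '2147483647'` is Java's `Integer.MAX_VALUE`, which HBase uses as its "keep forever" TTL. `BLOCKSIZE => '65536'` is 64 KB. The class `DescribeUsers` is a hypothetical illustration. It hand-copies the shell output and does not call HBase.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: the describe 'users' output, copied by hand into plain maps. */
public class DescribeUsers {
    public static Map<String, Object> usersDescriptor() {
        // One column family, 'info', with the default settings shown by the shell.
        Map<String, String> info = new LinkedHashMap<>();
        info.put("NAME", "info");
        info.put("BLOOMFILTER", "NONE");
        info.put("REPLICATION_SCOPE", "0");
        info.put("COMPRESSION", "NONE");
        info.put("VERSIONS", "3");
        info.put("TTL", "2147483647");
        info.put("BLOCKSIZE", "65536");
        info.put("IN_MEMORY", "false");
        info.put("BLOCKCACHE", "true");

        // The table itself: a name plus a list of column families.
        Map<String, Object> table = new LinkedHashMap<>();
        table.put("NAME", "users");
        table.put("FAMILIES", List.of(info));
        return table;
    }

    public static void main(String[] args) {
        Map<String, Object> users = usersDescriptor();
        @SuppressWarnings("unchecked")
        List<Map<String, String>> families = (List<Map<String, String>>) users.get("FAMILIES");
        Map<String, String> info = families.get(0);
        int ttl = Integer.parseInt(info.get("TTL"));
        int blockSize = Integer.parseInt(info.get("BLOCKSIZE"));
        System.out.println("TTL is Integer.MAX_VALUE: " + (ttl == Integer.MAX_VALUE)); // prints true
        System.out.println("BLOCKSIZE in KB: " + blockSize / 1024);                    // prints 64
    }
}
```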
3. Establish a connection
The shell is well and good, but who wants to implement TwitBase in shell commands?
Those wise HBase developers thought of this and equipped HBase with a complete
Java client library. A similar API is exposed to other languages too; we’ll cover those in
chapter 6. For now, we’ll stick with Java. The Java code for opening a connection to
the users table looks like this:
HTableInterface usersTable = new HTable("users");
The HTable constructor reads the default configuration information to locate HBase,
similar to the way the shell did. It then locates the users table you created earlier and
gives you a handle to it.
You can also pass a custom configuration object to the HTable constructor:
Configuration myConf = HBaseConfiguration.create();
HTableInterface usersTable = new HTable(myConf, "users");
This is equivalent to letting the HTable object create the configuration object on its
own. To customize the configuration, you can define parameters like this:
myConf.set("parameter_name", "parameter_value");
HBase client configuration
HBase client applications need only one piece of configuration to access HBase:
the ZooKeeper quorum address. You can set this configuration manually
like this:
myConf.set("hbase.zookeeper.quorum", "serverip");
Both ZooKeeper and the exact interaction between the client and the HBase cluster are
covered in the next chapter, where we go into the details of HBase as a distributed store.
For now, all you need to know is that the Java client picks up configuration parameters
either from the hbase-site.xml file on its classpath or from values you set explicitly
on the configuration object. When you leave the configuration completely
unspecified, as this sample code does, the default configuration is read
and localhost is used as the ZooKeeper quorum address. When working in local
mode, as you are here, that’s exactly what you want.
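The lookup order described in this sidebar can be sketched in a few lines. A value you set explicitly wins. Otherwise a value from the site file is used. Otherwise the client falls back to the default of localhost. `QuorumResolution` is a hypothetical class, not Hadoop's `Configuration`, and a plain map stands in for a parsed hbase-site.xml. The hostname `zk1.example.com` is made up.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of how a client ends up with a ZooKeeper quorum address:
 * explicit set() > site file on the classpath > built-in default ("localhost").
 */
public class QuorumResolution {
    public static final String QUORUM_KEY = "hbase.zookeeper.quorum";

    private final Map<String, String> defaults = new HashMap<>();
    private final Map<String, String> siteFile;            // stand-in for hbase-site.xml
    private final Map<String, String> explicit = new HashMap<>();

    public QuorumResolution(Map<String, String> siteFile) {
        defaults.put(QUORUM_KEY, "localhost");
        this.siteFile = new HashMap<>(siteFile);
    }

    /** Plays the role of myConf.set(...) in the text. */
    public void set(String key, String value) {
        explicit.put(key, value);
    }

    /** Explicit values override the site file, which overrides the defaults. */
    public String get(String key) {
        if (explicit.containsKey(key)) {
            return explicit.get(key);
        }
        if (siteFile.containsKey(key)) {
            return siteFile.get(key);
        }
        return defaults.get(key);
    }

    public static void main(String[] args) {
        // Nothing configured: local mode falls back to localhost.
        System.out.println(new QuorumResolution(Map.of()).get(QUORUM_KEY));
        // A site file supplies the quorum.
        QuorumResolution conf = new QuorumResolution(Map.of(QUORUM_KEY, "zk1.example.com"));
        System.out.println(conf.get(QUORUM_KEY));
        // An explicit set() overrides the site file.
        conf.set(QUORUM_KEY, "serverip");
        System.out.println(conf.get(QUORUM_KEY));
    }
}
```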
Connection management
Creating a table instance is a relatively expensive operation, requiring a bit of network
overhead. Rather than create a new table handle on demand, it’s better to use a
connection pool. Connections are allocated from and returned to the pool. Using an
HTablePool is more common in practice than instantiating HTables directly:

HTablePool pool = new HTablePool();
HTableInterface usersTable = pool.getTable("users");
... // work with the table
usersTable.close();

Closing the table when you’re finished with it allows the underlying connection
resources to be returned to the pool.

What good is a table without data in it? No good at all. Let’s store some data.
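The allocate-and-return cycle behind HTablePool can be illustrated with a small pool of reusable handles. The expensive handle is created once, handed out on request, and put back when `close()` is called, so `try`-with-resources returns it automatically. `ToyTablePool` and `PooledHandle` are hypothetical names. This shows the pooling idea only, not HTablePool's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical sketch of connection pooling: handles are reused instead of recreated. */
public class ToyTablePool {
    private final Deque<PooledHandle> idle = new ArrayDeque<>();
    private int created = 0;   // how many (expensive) handles were ever built

    /** A pooled handle; close() returns it to the pool instead of discarding it. */
    public final class PooledHandle implements AutoCloseable {
        private final String tableName;
        private boolean checkedOut = true;

        private PooledHandle(String tableName) {
            this.tableName = tableName;
        }

        public String tableName() {
            return tableName;
        }

        @Override
        public void close() {
            // Guard against double close putting the same handle in the pool twice.
            if (checkedOut) {
                checkedOut = false;
                idle.push(this);
            }
        }
    }

    public PooledHandle getTable(String tableName) {
        // Reuse an idle handle for the same table if one is available.
        Iterator<PooledHandle> it = idle.iterator();
        while (it.hasNext()) {
            PooledHandle h = it.next();
            if (h.tableName.equals(tableName)) {
                it.remove();
                h.checkedOut = true;
                return h;
            }
        }
        created++;   // stands in for the network overhead of building a fresh handle
        return new PooledHandle(tableName);
    }

    public int createdCount() {
        return created;
    }

    public int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        ToyTablePool pool = new ToyTablePool();
        try (PooledHandle users = pool.getTable("users")) {
            System.out.println("working with " + users.tableName());
        } // close() puts the handle back in the pool
        try (PooledHandle users = pool.getTable("users")) {
            System.out.println("working with " + users.tableName());
        }
        System.out.println("handles created: " + pool.createdCount()); // prints handles created: 1
    }
}
```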