HBase Study Notes 1 》Summary

Summary

In case you missed something along the way, here is a quick overview of the material covered in this chapter.

HBase is a database designed for semistructured data and horizontal scalability. It stores data in tables. Within a table, data is organized over a four-dimensional coordinate system: rowkey, column family, column qualifier, and version. HBase is schema-less, requiring only that column families be defined ahead of time. It's also type-less, storing all data as uninterpreted arrays of bytes. There are five basic commands for interacting with data in HBase: Get, Put, Delete, Scan, and Increment. The only way to query HBase based on non-rowkey values is by a filtered scan.
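To make the four coordinates and the filtered scan concrete, here is a minimal in-memory sketch. It is not the HBase client API: `ToyTable` and its methods are hypothetical, and keys are simplified to Strings (HBase itself stores rowkeys, families, and qualifiers as byte arrays). It shows that a cell is addressed by rowkey, column family, qualifier, and version, that a read returns the newest version by default, and that a query on a non-rowkey value has to visit every row.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical toy model, not the HBase API:
// rowkey -> column family -> column qualifier -> version (timestamp) -> value bytes.
final class ToyTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>>> rows =
            new TreeMap<>();

    // Store a value at one four-dimensional coordinate; versions are kept newest-first.
    void put(String row, String family, String qualifier, long version, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, byte[]>(Collections.reverseOrder()))
            .put(version, value);
    }

    // Return the newest version of a cell, or null if the cell does not exist.
    byte[] get(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, byte[]> versions = columns.get(qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Filtered scan: walk every row in rowkey order and keep the rows whose
    // newest family:qualifier value satisfies the predicate.
    List<String> filteredScan(String family, String qualifier, Predicate<byte[]> filter) {
        List<String> matches = new ArrayList<>();
        for (String row : rows.keySet()) {
            byte[] value = get(row, family, qualifier);
            if (value != null && filter.test(value)) matches.add(row);
        }
        return matches;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ToyTable users = new ToyTable();
        users.put("user1", "info", "email", 1L, bytes("old@example.com"));
        users.put("user1", "info", "email", 2L, bytes("new@example.com"));
        users.put("user2", "info", "email", 1L, bytes("user2@example.org"));
        // prints new@example.com (the newest version wins)
        System.out.println(new String(users.get("user1", "info", "email"), StandardCharsets.UTF_8));
        // prints [user2]
        System.out.println(users.filteredScan("info", "email",
                v -> new String(v, StandardCharsets.UTF_8).endsWith(".org")));
    }
}
```

Because the filter runs against every row, the cost of such a query grows with the table size, which is why the rowkey design discussed later matters so much.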

HBase is not an ACID-compliant database

HBase isn't an ACID-compliant database, but it does provide some guarantees that you can use to reason about how your application interacts with the system. These guarantees are as follows:

1 Operations are row-level atomic. In other words, any Put() on a given row either succeeds in its entirety or fails and leaves the row as it was before the operation started. Part of a row is never written while another part is left out. This holds regardless of the number of column families the operation spans.

2 Inter-row operations are not atomic. There is no guarantee that operations spanning several rows will all complete or all fail together. Each individual row operation is still atomic, as described in the previous point.

3 checkAnd* and increment* operations are atomic.

4 Multiple write operations to a given row are always independent of each other in their entirety. This is an extension of the first point.

5 Any Get() operation on a given row returns the complete row as it exists at that point in time in the system.

6 A scan across a table is not a scan over a snapshot of the table at any point in time. If a row R is mutated after the scan has started but before R is read by the scanner object, the scanner reads the updated version of R. The data read by the scanner is nonetheless consistent: it contains the complete row as it exists at the time it's read.
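The row-level guarantees can be illustrated with a small single-process sketch. `RowAtomicStore` is a hypothetical class, not part of HBase, and it does not reproduce how a RegionServer implements atomicity. Each row is held as an immutable map and every write goes through `ConcurrentHashMap.compute()`, which runs atomically per key, so a put, a checkAndPut-style write, and an increment-style write each change one row as a unit, and a get always returns a whole row. Values are Strings for readability; HBase stores bytes and keeps counters as 8-byte longs. Nothing in the sketch makes two different rows change together, which mirrors point 2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical toy store (not the HBase API): each row is an immutable column -> value map,
// and ConcurrentHashMap.compute() makes every single-row update atomic.
final class RowAtomicStore {
    private final ConcurrentHashMap<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    // Copy the old row, apply the new columns, and freeze the result.
    private static Map<String, String> withColumns(Map<String, String> old, Map<String, String> columns) {
        Map<String, String> next = (old == null) ? new HashMap<String, String>() : new HashMap<String, String>(old);
        next.putAll(columns);
        return Collections.unmodifiableMap(next);
    }

    // Row-level atomic put: readers see either none or all of the new columns.
    void put(String row, Map<String, String> columns) {
        rows.compute(row, (key, old) -> withColumns(old, columns));
    }

    // checkAndPut-style write: applied only if `column` currently holds `expected`
    // (a null `expected` means the column must be absent).
    boolean checkAndPut(String row, String column, String expected, Map<String, String> columns) {
        boolean[] applied = {false};
        rows.compute(row, (key, old) -> {
            String current = (old == null) ? null : old.get(column);
            if (!Objects.equals(current, expected)) return old; // condition failed: leave the row unchanged
            applied[0] = true;
            return withColumns(old, columns);
        });
        return applied[0];
    }

    // increment-style write: read, add, and write back as one atomic step on the row.
    long increment(String row, String column, long delta) {
        Map<String, String> updated = rows.compute(row, (key, old) -> {
            long current = (old == null) ? 0L : Long.parseLong(old.getOrDefault(column, "0"));
            return withColumns(old, Collections.singletonMap(column, Long.toString(current + delta)));
        });
        return Long.parseLong(updated.get(column));
    }

    // Get returns the complete row as it exists right now (an immutable map).
    Map<String, String> get(String row) {
        return rows.getOrDefault(row, Collections.<String, String>emptyMap());
    }

    // Demo helper: several threads increment the same cell concurrently; returns the final count.
    static long concurrentIncrements(RowAtomicStore store, String row, String column, int threads, int perThread) {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) store.increment(row, column, 1L);
            });
            workers.add(w);
            w.start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Long.parseLong(store.get(row).get(column));
    }

    public static void main(String[] args) {
        RowAtomicStore store = new RowAtomicStore();
        // prints "hits = 1000": no increment is lost because each one is atomic on the row
        System.out.println("hits = " + concurrentIncrements(store, "row1", "hits", 4, 250));
    }
}
```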

In the context of building applications with HBase, these are the important points to be aware of.

The data model is logically organized as either a key-value store or a sorted map of maps. The physical data model is column-oriented along column families, and individual records are stored in a key-value style. HBase persists data records into HFiles, an immutable file format. Because records can't be modified once written, new values are persisted to new HFiles. The data view is reconciled on the fly at read time and during compactions.
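The write-once storage idea can be sketched with plain JDK collections, assuming a hypothetical `ImmutableFileStore` in which each flush produces a new immutable sorted map standing in for an HFile. Reads reconcile the files on the fly by picking the newest timestamp for a key, and `compact()` merges all files into one. This is a deliberate simplification: real compactions also honor the configured number of versions and process delete markers, both of which the sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not HBase internals: every flush writes a new immutable sorted "file".
final class ImmutableFileStore {
    // A stored cell: version timestamp plus value.
    static final class Cell {
        final long timestamp;
        final String value;

        Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final List<SortedMap<String, Cell>> files = new ArrayList<>();

    // Persist a batch of writes as a brand-new immutable file; existing files are never modified.
    void flush(Map<String, Cell> batch) {
        files.add(Collections.unmodifiableSortedMap(new TreeMap<>(batch)));
    }

    // Read-time reconciliation: consult every file and return the newest version of the key.
    String read(String key) {
        Cell newest = null;
        for (SortedMap<String, Cell> file : files) {
            Cell c = file.get(key);
            if (c != null && (newest == null || c.timestamp > newest.timestamp)) newest = c;
        }
        return newest == null ? null : newest.value;
    }

    // Compaction: merge all files into a single file, keeping only the newest version of each key.
    void compact() {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (SortedMap<String, Cell> file : files) {
            for (Map.Entry<String, Cell> e : file.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
        files.clear();
        files.add(Collections.unmodifiableSortedMap(merged));
    }

    int fileCount() {
        return files.size();
    }

    public static void main(String[] args) {
        ImmutableFileStore store = new ImmutableFileStore();
        Map<String, Cell> first = new TreeMap<>();
        first.put("user1/info:email", new Cell(1L, "old@example.com"));
        store.flush(first);
        Map<String, Cell> second = new TreeMap<>();
        second.put("user1/info:email", new Cell(2L, "new@example.com"));
        store.flush(second);
        // prints "new@example.com, files: 2"
        System.out.println(store.read("user1/info:email") + ", files: " + store.fileCount());
        store.compact();
        // prints "new@example.com, files: 1"
        System.out.println(store.read("user1/info:email") + ", files: " + store.fileCount());
    }
}
```

The trade-off shows up directly: before compaction a read has to look at every file, and compaction reduces that work by rewriting the data once.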

The HBase Java client API exposes tables via the HTableInterface. A table connection can be established by constructing an HTable instance directly, but instantiating HTable is expensive, so the preferred method is via HTablePool, which manages connection reuse. Tables are created and manipulated via instances of the HBaseAdmin, HTableDescriptor, and HColumnDescriptor classes. All five commands are exposed via their respective command objects: Get, Put, Delete, Scan, and Increment. Commands are sent to the HTableInterface instance for execution. A variant of Increment is also available through the HTableInterface.incrementColumnValue() method. Executing a Get or an Increment returns a Result instance, and executing a Scan returns a ResultScanner that iterates over Result instances. Each record returned is represented by a KeyValue instance. All of these operations are also available on the command line via the HBase shell.

Schema designs in HBase are heavily influenced by anticipated data-access patterns. Ideally, the tables in your schema are organized according to these patterns. The rowkey is the only fully indexed coordinate in HBase, so queries are often implemented as rowkey scans, and compound rowkeys are commonly used to support these scans. An even distribution of rowkey values is often desirable; hashing algorithms such as MD5 or SHA-1 are commonly used to achieve it.
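The rowkey advice can be sketched with plain JDK classes, assuming a hypothetical layout of MD5(userId) followed by an 8-byte big-endian timestamp. Hashing the user ID spreads users evenly across the keyspace, while the shared 16-byte prefix keeps all of one user's rows contiguous, so a prefix scan finds them. The unsigned byte comparator mirrors the lexicographic order in which HBase sorts rowkeys; `RowKeys` and its methods are illustrative names, not part of the HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper illustrating a hashed compound rowkey: MD5(userId) + timestamp.
final class RowKeys {
    static final int HASH_LEN = 16; // MD5 digest length in bytes

    // MD5 of the user ID gives a fixed-length prefix that spreads users across the keyspace.
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }

    // Compound rowkey: 16-byte hash prefix followed by the 8-byte big-endian timestamp.
    static byte[] compoundKey(String userId, long timestamp) {
        return ByteBuffer.allocate(HASH_LEN + Long.BYTES).put(md5(userId)).putLong(timestamp).array();
    }

    // The start key of a prefix scan for one user is just the hash prefix.
    static byte[] scanPrefix(String userId) {
        return md5(userId);
    }

    // Lexicographic comparison of unsigned bytes, the order in which HBase sorts rowkeys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    static boolean hasPrefix(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(Arrays.copyOfRange(key, 0, prefix.length), prefix);
    }

    // Prefix scan: start at the prefix in sorted order and stop at the first key without it.
    static List<String> prefixScan(NavigableMap<byte[], String> table, byte[] prefix) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<byte[], String> e : table.tailMap(prefix, true).entrySet()) {
            if (!hasPrefix(e.getKey(), prefix)) break;
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> table = new TreeMap<byte[], String>(RowKeys::compareUnsigned);
        table.put(compoundKey("user1", 100L), "user1 event at t=100");
        table.put(compoundKey("user2", 150L), "user2 event at t=150");
        table.put(compoundKey("user1", 200L), "user1 event at t=200");
        // prints [user1 event at t=100, user1 event at t=200]
        System.out.println(prefixScan(table, scanPrefix("user1")));
    }
}
```

One consequence of the hashed prefix is that rows are no longer ordered by the natural user ID, so only per-user prefix scans remain efficient; a scan over a range of user IDs would have to touch the whole table.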
