Tutorials for HBase: concepts, architecture, mapreduce, etc.

I still remember my 'column family' aha moment two years ago. It has been quite a challenging journey from RDBMS to BigTable. Here are some good materials to get you started:

  • Treat ColumnFamily as multi-dimensional maps: a great way to carry existing knowledge over to a new field. I especially like how the author explains how rowkey, family, and qualifier work.
  • HBase schema design model: another concrete example that compares modeling the same data in an RDBMS and in HBase.
  • WTF is a SuperColumn? An Intro to the Cassandra Data Model
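
To make the "multi-dimensional map" idea concrete, here is a minimal sketch in plain Python. It is not the HBase API: the class `ToyTable` and its methods are hypothetical names made up for illustration, and it only mimics the shape rowkey -> (family, qualifier) -> timestamp -> value, with rows returned in sorted key order and the newest version winning on reads.

```python
class ToyTable:
    """Toy model of a BigTable-style table as nested maps (not the HBase API)."""

    def __init__(self):
        # rowkey -> (family, qualifier) -> timestamp -> value
        self.rows = {}

    def put(self, rowkey, family, qualifier, value, ts):
        columns = self.rows.setdefault(rowkey, {})
        versions = columns.setdefault((family, qualifier), {})
        versions[ts] = value

    def get(self, rowkey, family, qualifier):
        versions = self.rows.get(rowkey, {}).get((family, qualifier))
        if not versions:
            return None
        # The newest timestamp wins, as with a default read.
        return versions[max(versions)]

    def scan(self):
        # Rows come back in ascending rowkey order.
        for rowkey in sorted(self.rows):
            latest = {
                f"{family}:{qualifier}": versions[max(versions)]
                for (family, qualifier), versions in sorted(self.rows[rowkey].items())
            }
            yield rowkey, latest


table = ToyTable()
table.put("row2", "info", "name", "Bob", ts=1)
table.put("row1", "info", "name", "Al", ts=1)
table.put("row1", "info", "name", "Alice", ts=2)
latest_name = table.get("row1", "info", "name")
scanned = list(table.scan())
```

The point of the nesting is that a "column" is just a key inside a family's map, so rows in the same table can carry entirely different qualifiers without any schema change.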
More in-depth information to get started with HBase
  • HBase shell and 0.18 programming API: the API usage is a bit out of date, but the concepts are still valid.
  • Official HBase Architecture: unlike the one below, this one focuses on the physical design of data location, etc. A must-read for serious HBase performance tuning and sound schema design. The sorted byte order of the physical layout (rows ascending by key, versions descending by timestamp) is key to understanding the "pagination" link below.
  • HBase Architecture: Storage: an in-depth article on how HBase uses HDFS, with details on region server communication.
  • HBase pagination like SQL's LIMIT/OFFSET: the key is to create a composite row key and use a scanner to show the results within a range, stopping with a trusty old counter.
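
The pagination idea can be sketched without a cluster. The snippet below is a toy simulation in plain Python, not HBase client code: a sorted list of byte strings stands in for the table, and the composite key layout `<user>|<zero-padded sequence>` as well as the helper names `make_key` and `scan_page` are assumptions made up for this example. It shows the three ingredients: a composite key that sorts in the order you want, a scan that starts at a given row, and a counter that stops the scan after one page.

```python
import bisect


def make_key(user, seq):
    # Zero-padding keeps numeric order identical to byte order.
    return f"{user}|{seq:010d}".encode()


def scan_page(sorted_keys, prefix, start_row, limit):
    """Return (page, next_start_row) for one page of keys under `prefix`."""
    start = start_row or prefix
    begin = bisect.bisect_left(sorted_keys, start)
    page = []
    next_start = None
    for key in sorted_keys[begin:]:
        if not key.startswith(prefix):
            break  # left the key range for this prefix
        if len(page) == limit:
            next_start = key  # first row of the following page
            break
        page.append(key)
    return page, next_start


keys = sorted(
    [make_key("alice", i) for i in range(5)] + [make_key("bob", i) for i in range(2)]
)
page1, cursor = scan_page(keys, b"alice|", None, 2)
page2, cursor2 = scan_page(keys, b"alice|", cursor, 2)
page3, cursor3 = scan_page(keys, b"alice|", cursor2, 2)
```

Passing the returned cursor back in as the next start row avoids re-reading earlier rows, which is the main difference from a literal OFFSET.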
Use HBase with Hadoop MapReduce:
  • HBase MapReduce 101 - Part I
  • Use HBase as input and output for Hadoop MapReduce tasks
