HBase Tutorial: Theory and Practice of a Distributed Data Store (2)

Non-Relational Databases

Originally, they did not support SQL

(1).In practice, SQL support is becoming too thin a line to draw the distinction on.

(2).One difference is in the data model.

(3).Another difference is in the consistency model (ACID guarantees and transactions are generally sacrificed).

Consistency models and the CAP theorem

Strict: all changes to data are atomic and appear to take effect instantaneously.

Sequential: changes to data are seen in the same order as they were applied.

Causal: causally related changes are seen in the same order.

Eventual: once no further updates occur, all updates propagate through the system and the replicas converge to a consistent state.

Weak: no guarantee.
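The difference between a strict and an eventual read can be sketched with a toy store. This is an illustration only: the `Store` class, its `propagate` step, and the two-replica setup are hypothetical and do not describe how any particular database replicates data.

```python
class Store:
    """Toy primary/replica store with delayed replication (hypothetical)."""

    def __init__(self, n_replicas=2):
        self.primary = {}
        self.replicas = [dict() for _ in range(n_replicas)]
        self.pending = []  # writes not yet shipped to the replicas

    def write(self, key, value):
        # The primary sees the change immediately; replicas do not.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_primary(self, key):
        return self.primary.get(key)

    def read_replica(self, index, key):
        return self.replicas[index].get(key)

    def propagate(self):
        # Apply pending writes in the order they were made.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


store = Store()
store.write("k", "v1")
stale = store.read_replica(0, "k")   # replica has not caught up yet
store.propagate()
fresh = store.read_replica(0, "k")   # replicas have converged
```

Before `propagate` runs, a replica read returns nothing for the key while a primary read already returns the new value; afterwards both agree.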

Data model:

How the data is stored: key/value, semi-structured, column-oriented, …

Consistency model: this translates into how fast the system handles READS and WRITES.

Atomic read-modify-write

(1).Easy in a centralized system, difficult in a distributed one.

(2).Prevents race conditions in multi-threaded or shared-nothing designs.

(3).Can reduce client-side complexity.

(4).Support for multiple clients accessing data simultaneously.
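The race that an atomic read-modify-write prevents can be sketched in a few lines. This is not HBase's client API: the `AtomicCell` class and its `check_and_put` method are hypothetical names, and a `threading.Lock` stands in for whatever the server does to make the operation atomic.

```python
import threading


class AtomicCell:
    """A single value with a compare-and-set operation (hypothetical)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def check_and_put(self, expected, new_value):
        # Compare and write happen under one lock, so no other writer
        # can slip in between the check and the update.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new_value
            return True


def increment(cell):
    # Client-side retry loop: re-read and retry if another client won.
    while True:
        current = cell.get()
        if cell.check_and_put(current, current + 1):
            return


def worker(cell, times):
    for _ in range(times):
        increment(cell)


cell = AtomicCell(0)
threads = [threading.Thread(target=worker, args=(cell, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_value = cell.get()
```

Because the check and the write are one atomic step, none of the 4 x 1000 increments is lost, and the client needs no locking of its own, only a retry loop.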

Database Normalization

Schema design at scale

(1).A good methodology is to apply the DDI principle

      Denormalization

      Duplication

      Intelligent Key design

Denormalization

     Duplicate data in more than one table such that at READ time no further aggregation is required.
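A minimal sketch of the idea, using plain dictionaries as stand-ins for tables. The `users` and `orders` tables, their fields, and the function names are hypothetical and chosen only for illustration.

```python
# Normalized layout: the customer name lives only in the users table.
users = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
orders = {
    "o1": {"user_id": "u1", "total": 30},
    "o2": {"user_id": "u2", "total": 12},
}


def read_order_normalized(order_id):
    order = orders[order_id]
    user = users[order["user_id"]]  # second lookup (a join) at READ time
    return {"order_id": order_id, "customer": user["name"], "total": order["total"]}


# Denormalized layout: the customer name is copied into each order row.
orders_denorm = {}


def write_order_denormalized(order_id, user_id, total):
    orders_denorm[order_id] = {
        "user_id": user_id,
        "customer": users[user_id]["name"],  # duplicated at WRITE time
        "total": total,
    }


def read_order_denormalized(order_id):
    row = orders_denorm[order_id]  # single lookup, nothing to combine
    return {"order_id": order_id, "customer": row["customer"], "total": row["total"]}


for oid, order in orders.items():
    write_order_denormalized(oid, order["user_id"], order["total"])
```

The cost moves to the write path: if a customer's name changes, every row that carries a copy of it has to be rewritten.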

What is BigTable?

      BigTable is a distributed storage system for managing structured data designed to scale to a very large size

      BigTable is a sparse, distributed, persistent multi-dimensional sorted map
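The "sparse, sorted, multi-dimensional map" part of that definition can be sketched in memory (the distributed and persistent parts are left out). The `SortedMap` class and its methods are hypothetical names; the sketch assumes keys of the form (row, column, timestamp) kept in sorted order.

```python
import bisect


class SortedMap:
    """In-memory map keyed by (row, column, timestamp), kept sorted (hypothetical)."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, timestamp) tuples
        self._values = {}  # key tuple -> value

    def put(self, row, column, timestamp, value):
        key = (row, column, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, row_prefix):
        """Yield (key, value) for every row starting with row_prefix, in key order."""
        # Rows sharing a prefix are adjacent in sorted order, so one
        # binary search finds the start and a linear walk finds the end.
        i = bisect.bisect_left(self._keys, (row_prefix,))
        while i < len(self._keys) and self._keys[i][0].startswith(row_prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


table = SortedMap()
table.put("org.example.www", "anchor:home", 1, "Home")
table.put("com.example.www", "contents:html", 2, "<html>www</html>")
table.put("com.example.mail", "contents:html", 1, "<html>mail</html>")

com_rows = [key[0] for key, _ in table.scan_prefix("com.example.")]
```

The map is sparse because only cells that were written take up space; a row has no placeholder for columns it never used. Sorting is what makes key design matter: with reversed domain names as row keys, all pages of one domain sit next to each other and can be read with a single scan.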

What is HBase?

      Essentially it’s an open-source version of BigTable

The most basic unit in HBase is a column

(1).Each column may have multiple versions, with each distinct value contained in a separate cell

(2).One or more columns form a row, which is addressed uniquely by a row key.
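The row / column / version relationship can be sketched as nested dictionaries. This is a toy model, not the HBase client API: the `Table` class, its `put` and `get` methods, and the `max_versions` limit (assumed to be at least 1) are hypothetical.

```python
class Table:
    """Toy model: row key -> column -> {timestamp: value} (hypothetical)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # assumed >= 1
        self._rows = {}

    def put(self, row_key, column, value, timestamp):
        cells = self._rows.setdefault(row_key, {}).setdefault(column, {})
        cells[timestamp] = value  # each version is its own cell
        # Drop the oldest cells once the version limit is exceeded.
        for old_ts in sorted(cells)[:-self.max_versions]:
            del cells[old_ts]

    def get(self, row_key, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cells = self._rows.get(row_key, {}).get(column, {})
        return sorted(cells.items(), reverse=True)[:versions]


t = Table(max_versions=2)
t.put("row1", "info:email", "a@example.com", timestamp=1)
t.put("row1", "info:email", "b@example.com", timestamp=2)
t.put("row1", "info:email", "c@example.com", timestamp=3)

latest = t.get("row1", "info:email")
history = t.get("row1", "info:email", versions=5)
```

A read returns the newest cell unless older versions are asked for, and the oldest version is discarded once the limit is reached.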
