Hadoop 2.2

Resource

git clone https://github.com/tomwhite/hadoop-book.git -b 3e

 

Apache Hadoop and the Hadoop Ecosystem

The Hadoop projects that are covered in this book are described briefly here:

  • Common

A set of components and interfaces for distributed filesystems and general I/O (serialization, Java RPC, persistent data structures).

  • Avro

A serialization system for efficient, cross-language RPC and persistent data storage.

  • MapReduce

A distributed data processing model and execution environment that runs on large clusters of commodity machines.

  • HDFS

A distributed filesystem that runs on large clusters of commodity machines.

  • Pig

A data flow language and execution environment for exploring very large datasets. Pig runs on HDFS and MapReduce clusters.

  • Hive

A distributed data warehouse. Hive manages data stored in HDFS and provides a query language based on SQL for querying the data; the runtime engine translates queries into MapReduce jobs.

  • HBase

A distributed, column-oriented database. HBase uses HDFS for its underlying storage, and supports both batch-style computations using MapReduce and point queries (random reads).

  • ZooKeeper

A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building distributed applications.

  • Sqoop

A tool for efficient bulk transfer of data between structured data stores (such as relational databases) and HDFS.

  • Oozie

 

A service for running and scheduling workflows of Hadoop jobs (including MapReduce, Pig, Hive, and Sqoop jobs).
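The MapReduce processing model listed above can be hard to picture from a one-line description, so here is a rough single-machine analogy using only standard Unix tools. It is a sketch of the idea (map, then group by key, then reduce), not Hadoop code; `wordcount` is a hypothetical helper name chosen for this illustration.

```shell
# Word count as map -> shuffle -> reduce, using only standard Unix tools.
# wordcount is a hypothetical helper, not a Hadoop command.
wordcount() {
  tr -s '[:space:]' '\n' |    # map: emit one word (key) per line
    grep -v '^$' |            # drop empty records
    sort |                    # shuffle: bring identical keys together
    uniq -c |                 # reduce: count each group of identical keys
    awk '{print $2 "\t" $1}'  # format as key<TAB>count
}

printf 'the quick fox\nthe lazy dog\n' | wordcount
```

On a real cluster the map and reduce steps run in parallel on many machines and the framework handles the sort and grouping between them; the pipeline above only mirrors the shape of the computation.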

 

Installation

Download

 http://codesfusion.blogspot.hk/2013/10/setup-hadoop-2x-220-on-ubuntu.html

% tar xzf hadoop-x.y.z.tar.gz
% export HADOOP_INSTALL=/home/tom/hadoop-x.y.z
% export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin

% hadoop version
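If `hadoop version` fails, the usual cause is that the exports above were not applied in the current shell. The following is a minimal sketch of a sanity check, assuming the `HADOOP_INSTALL` variable from the steps above; `check_hadoop` is a hypothetical helper name, and it only inspects the environment before calling `hadoop version`.

```shell
# check_hadoop is a hypothetical helper; it only inspects the environment
# and then runs the real `hadoop version` command if everything looks sane.
check_hadoop() {
  # HADOOP_INSTALL should point at the unpacked distribution directory.
  if [ -z "${HADOOP_INSTALL:-}" ]; then
    echo "HADOOP_INSTALL is not set"
    return 1
  fi
  # The hadoop launcher should be reachable through PATH.
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop not found on PATH"
    return 1
  fi
  hadoop version
}
```

Run `check_hadoop` after the exports; to make the settings survive new shell sessions, the two `export` lines can be added to a shell startup file such as `~/.bashrc`.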
