Introduction to Hadoop/MapReduce

What is MapReduce?

A parallel programming model for big data processing:

split the data into chunks

define the steps to process the chunks

process the chunks in parallel

Hadoop is a platform that implements MapReduce.
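The three steps above can be sketched on a single machine. This is only an illustration under stated assumptions: the helper names (split_into_chunks, process_chunk, process_in_parallel) are hypothetical, threads stand in for the cluster nodes Hadoop would use, and summing numbers stands in for whatever per-chunk step a real job defines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunk(chunk):
    """The per-chunk step (an assumption for this sketch): sum the numbers."""
    return sum(chunk)


def process_in_parallel(data, chunk_size=3, workers=4):
    """Split the data, process every chunk concurrently, combine the results."""
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(process_chunk, chunks))
    return sum(partial_sums)


total = process_in_parallel(list(range(1, 11)))
```

Each chunk is processed independently of the others, which is what allows the work to be spread across many workers.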

1. Map

input record -> <key, value>

e.g.: line of text -> <word, count>

After mapping, the output is passed to the Reduce phase.
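As a minimal sketch of the word-count example, a Map step could emit one (word, 1) pair per word. The function name map_words and the choice to lowercase words are assumptions made for illustration, not part of Hadoop's API.

```python
def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


pairs = map_words("the quick fox jumps over the lazy dog")
```

The mapper does not add up repeated words itself; duplicates such as ("the", 1) are left for the Reduce phase to merge.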

2. Reduce

Merges/reduces the output of the Map phase. This phase is optional.

The output of MapReduce can be printed, summed, counted, loaded into a database, or sent to the next MapReduce job.
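A Reduce step for the word-count case might look like the sketch below, which merges (word, count) pairs by summing the counts for each word. The name reduce_counts and the sample input are hypothetical; a real Hadoop job would receive its pairs from the Map phase.

```python
from collections import defaultdict


def reduce_counts(pairs):
    """Reduce step: merge (word, count) pairs by summing counts per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Sample mapper output, made up for this example.
mapped = [("the", 1), ("fox", 1), ("the", 1), ("dog", 1)]
word_counts = reduce_counts(mapped)
```

The resulting dictionary could then be printed, loaded into a database, or used as input to another job.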



Idea: MapReduce plus massive unstructured data storage

Physical: Java classes for MapReduce and the Hadoop Distributed File System (HDFS)

Hadoop Operational Modes

Java MapReduce mode: reads records incrementally

Streaming mode: works with any language; input can be a line or a stream
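To illustrate Streaming mode, here is a rough Python sketch of a word-count mapper and reducer that exchange plain text lines. It assumes the usual streaming convention of tab-separated key/value lines, with the reducer's input already sorted by key. The function names run_mapper and run_reducer are hypothetical, and the sort is simulated locally here rather than done by Hadoop.

```python
import io
from itertools import groupby


def run_mapper(lines, out):
    """Mapper: read text lines, write one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            out.write(f"{word.lower()}\t1\n")


def run_reducer(lines, out):
    """Reducer: read 'word<TAB>count' lines sorted by word, write totals."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        out.write(f"{word}\t{total}\n")


# Local simulation of the pipeline: map, sort by key, then reduce.
mapped = io.StringIO()
run_mapper(["the fox", "the dog"], mapped)
sorted_lines = sorted(mapped.getvalue().splitlines(keepends=True))
reduced = io.StringIO()
run_reducer(sorted_lines, reduced)
result = reduced.getvalue()
```

In an actual streaming job, the mapper and reducer would typically be separate scripts called with sys.stdin and sys.stdout in place of the in-memory buffers used above.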


MapReduce and HDFS

Query Languages for Hadoop

These build on core Hadoop to enhance the development and manipulation of Hadoop clusters.

Pig: data flow language and execution environment

Hive (HiveQL): query language based on SQL for building MapReduce jobs

HBase: column-oriented database



Pig (data flow language, Pig Latin)

Two execution environment modes:

Local file system

MapReduce in a Hadoop environment

Suitable for large datasets and batch processing
