Hadoop MapReduce Study Notes

MapReduce

1. Map: (K, V) -> (K', V')

2. Group the items that share the same K'
(K', V') -> (K', V'*)

3. Reduce
(K', V'*) -> new (K, V)
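
The three steps above can be sketched as a small in-memory word count. This is only an illustration of the data flow, not Hadoop's API: the names map_fn, shuffle, reduce_fn and run_mapreduce are hypothetical, and a real job would distribute each phase across machines.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: (K, V) -> (K', V'). Here K is a line number, V is the line text."""
    for word in value.split():
        yield word, 1


def shuffle(pairs):
    """Group step: collect every V' that shares the same K' into one list."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups


def reduce_fn(key, values):
    """Reduce: (K', V'*) -> new (K, V). Here the values are summed."""
    yield key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Run the map, group, and reduce phases sequentially in one process."""
    intermediate = []
    for k, v in records:
        intermediate.extend(mapper(k, v))
    grouped = shuffle(intermediate)
    output = []
    for k in sorted(grouped):  # sorted only to make the output deterministic
        output.extend(reducer(k, grouped[k]))
    return output


# Hypothetical input: (line number, line text) pairs.
records = [(0, "a b a"), (1, "b c")]
result = run_mapreduce(records, map_fn, reduce_fn)
print(result)
```

Swapping in a different map_fn and reduce_fn changes the job without touching the grouping step, which is the part a framework like Hadoop takes care of.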

Hadoop handles fail-over automatically and redistributes M/R tasks.

Hadoop uses a Master/Slave architecture:

1 Master (JobTracker) / M Slaves (TaskTracker)
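
As a rough illustration of redistributing tasks after a failure, the sketch below moves the tasks of a failed worker onto the remaining workers in round-robin order. It is a toy model under stated assumptions, not Hadoop's scheduler: the function reassign_tasks, the task names and the worker names (tt1, tt2, tt3) are all hypothetical.

```python
def reassign_tasks(assignments, failed, live_workers):
    """Return a new task -> worker mapping with the failed worker's tasks moved.

    assignments: dict of task name -> worker name (hypothetical bookkeeping
    kept by the master). Tasks on healthy workers are left where they are.
    """
    if not live_workers:
        raise ValueError("no live workers left to take over the tasks")
    new_assignments = dict(assignments)
    orphaned = sorted(t for t, w in assignments.items() if w == failed)
    for i, task in enumerate(orphaned):
        # Round-robin is an assumption made here for simplicity.
        new_assignments[task] = live_workers[i % len(live_workers)]
    return new_assignments


assignments = {"map-0": "tt1", "map-1": "tt2", "map-2": "tt1", "reduce-0": "tt3"}
after = reassign_tasks(assignments, failed="tt1", live_workers=["tt2", "tt3"])
print(after)
```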


HDFS
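Files are split into fixed-size blocks. The model is "write once", and a file can have only one writer at a time.
HDFS also uses a Master/Slave architecture:

1 NameNode / N DataNodes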
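
To make the fixed-size block idea concrete, here is a minimal sketch that computes the block boundaries for a file and records a block-to-node mapping of the kind a master would keep. The 64 MB block size, the node names and the round-robin placement are assumptions for illustration only; real HDFS placement also involves replication and is not modeled here.

```python
def split_into_blocks(file_size, block_size):
    """Return (offset, length) for each block of a file of file_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # assumed block size for this example

blocks = split_into_blocks(150 * MB, BLOCK_SIZE)

# Hypothetical master-side metadata: block index -> node holding that block.
data_nodes = ["dn1", "dn2"]
block_map = {i: data_nodes[i % len(data_nodes)] for i in range(len(blocks))}
print(blocks, block_map)
```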

Other related resources:

Distributed Systems course: http://www.cs.brandeis.edu/~cs147a/
An evaluation of HBase: www.cs.duke.edu/~kcd/hadoop/
Cloud9, a library for Hadoop: http://www.umiacs.umd.edu/~jimmylin/cloud9/umd-hadoop-dist/cloud9-docs/index.html
Cloud Computing course: http://www.umiacs.umd.edu/~jimmylin/cloud-computing/index.html
UCSD Network Services course: http://www-cse.ucsd.edu/classes/fa07/cse124/assignments.html
Virginia introductory course: http://www.cs.virginia.edu/~cbs6n/hadoop/

Hadoop monitoring: http://www.x-trace.net/wiki/doku.php

Scaling up Hadoop (excellent, on improving Hadoop): http://www.cs.washington.edu/homes/ak/clusterworkshop/slides/YahooHadoopDISC08.pdf
(The author has started a consulting company: http://www.spinnakerlabs.com/ )
Washington course: http://www.cs.washington.edu/education/courses/cse490h/07sp/index.html

Distributed Systems courses: http://www.cs.williams.edu/~jeannie/cs339/index.html
http://pages.cs.wisc.edu/~dusseau/Classes/CS739/index.html
Parallel Processing: http://www.cs.colostate.edu/~cs575dl/
