Core questions of MapReduce programming



1- How do we break up a large problem into smaller tasks? More specifically, how do we decompose the problem so that the smaller tasks can be executed in parallel?

2- How do we assign tasks to workers distributed across a potentially large number
of machines (while keeping in mind that some workers are better suited to running
some tasks than others, e.g., due to available resources, locality constraints, etc.)?

3- How do we ensure that the workers get the data they need?

4- How do we coordinate synchronization among the different workers?

5- How do we share partial results produced by one worker that are needed by another?

6- How do we accomplish all of the above in the face of software errors and hardware
faults?
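MapReduce answers these questions with a fixed programming model: the developer writes only a map function and a reduce function, while the framework handles task assignment, data movement, synchronization (the shuffle phase), and fault recovery. As a rough sketch of that data flow, here is a local word-count simulation in plain Python; the function names are illustrative, not part of the Hadoop API, and a real job would distribute each phase across workers:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit (word, 1) for every word. Each document is processed
    # independently, so documents can run in parallel on different workers
    # (question 1: decomposition into parallel tasks).
    for word in document.split():
        yield (word, 1)

def shuffle(mapped_pairs):
    # Shuffle/sort: group intermediate values by key. This is the
    # synchronization point where partial results move between workers
    # (questions 4 and 5).
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate all values observed for one key.
    return key, sum(values)

def word_count(documents):
    # Driver: run the three phases sequentially in one process,
    # standing in for the distributed framework.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For example, `word_count(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`. The remaining questions (2, 3, and 6: scheduling, data locality, and fault tolerance) are exactly what the framework, not this sketch, is responsible for.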
