When MapReduce Is a Good Fit

 

MapReduce is a good fit for problems that need to analyze the whole dataset in a batch fashion, particularly for ad hoc analysis.

 

MapReduce suits applications where the data is written once and read many times, whereas a relational database is good for datasets that are continually updated.

 

MapReduce works well on unstructured or semistructured data, since it is designed to interpret the data at processing time.
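To make "interpreting the data at processing time" concrete, here is a minimal sketch in plain Python (not the Hadoop API). The log format, the `RAW_LINES` sample, and the function names are assumptions made up for illustration. The point is that nothing is validated when the data is stored; the map step decides what each line means, and skips lines it cannot interpret.

```python
# Hypothetical raw input: free-form text lines, stored without any schema.
RAW_LINES = [
    "2024-01-01 station=A temp=12",
    "2024-01-01 station=B temp=7",
    "garbled line without fields",
    "2024-01-02 station=A temp=15",
]


def map_record(line):
    """Interpret one raw line at processing time.

    Yields a (station, temperature) pair, or nothing if the line
    does not contain the fields this particular analysis needs.
    """
    fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
    if "station" in fields and "temp" in fields:
        yield fields["station"], int(fields["temp"])


def max_temp_by_station(lines):
    """Fold the mapped pairs into the maximum temperature per station."""
    best = {}
    for line in lines:
        for station, temp in map_record(line):
            best[station] = max(temp, best.get(station, temp))
    return best


if __name__ == "__main__":
    print(max_temp_by_station(RAW_LINES))
```

A different analysis could read the same stored lines with a different `map_record`, which is what makes this style convenient for unstructured or semistructured input.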

 

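MapReduce is a linearly scalable programming model: the map and reduce functions are oblivious to the size of the data or of the cluster, so doubling the input roughly doubles the running time, and doubling the cluster size as well brings it back to the original.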

 

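An architecture in which compute nodes read from a shared filesystem over the network works well for predominantly compute-intensive jobs, but becomes a problem when nodes need to access larger data volumes (hundreds of gigabytes, the point at which MapReduce really starts to shine), since the network bandwidth is the bottleneck and compute nodes become idle.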

MapReduce tries to colocate the data with the compute node, so data access is fast because it is local. This feature, known as data locality, is at the heart of MapReduce and is the reason for its good performance.
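The idea of data locality can be sketched as a scheduling preference. The following is a toy illustration in Python, not how Hadoop's scheduler is actually implemented; the node names and the `assign_task` helper are hypothetical. Given the set of nodes that hold a copy of a data block, the task is placed on one of those nodes if any is free, and only otherwise on a node that would have to fetch the block over the network.

```python
def assign_task(block_replicas, free_nodes):
    """Pick a node for a task that reads one data block.

    block_replicas: set of node names holding a copy of the block.
    free_nodes: list of node names currently able to run a task.
    Returns (node, is_local); node is None if no node is free.
    """
    # Prefer a free node that already stores the block: no network transfer.
    for node in free_nodes:
        if node in block_replicas:
            return node, True
    # Fall back to any free node, which must read the block remotely.
    if free_nodes:
        return free_nodes[0], False
    return None, False


if __name__ == "__main__":
    print(assign_task({"n2", "n3"}, ["n1", "n2"]))
    print(assign_task({"n3"}, ["n1", "n2"]))
```

The first call finds a free node that already holds the block; the second does not, so the read would go over the network, which is the case the paragraph above describes as the bottleneck.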

MPI gives great control to the programmer, but requires that he or she explicitly handle the mechanics of the data flow, exposed via low-level C routines and constructs such as sockets, as well as the higher-level algorithm for the analysis. MapReduce operates only at the higher level: the programmer thinks in terms of functions of key and value pairs, and the data flow is implicit.
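A word count is the usual way to show what "functions of key and value pairs" looks like. The sketch below is a single-process imitation in Python, not Hadoop code; `run_job` is a hypothetical stand-in for the framework. The programmer writes only `map_fn` and `reduce_fn`, while the grouping of values by key (the data flow) happens inside `run_job` and never appears in user code.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: (offset, line) -> a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: (word, [1, 1, ...]) -> (word, total)."""
    yield word, sum(counts)


def run_job(records, mapper, reducer):
    """Toy stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the implicit "shuffle"
    results = {}
    for out_key in sorted(groups):
        for final_key, final_value in reducer(out_key, groups[out_key]):
            results[final_key] = final_value
    return results


if __name__ == "__main__":
    lines = [(0, "the quick fox"), (1, "the lazy dog the end")]
    print(run_job(lines, map_fn, reduce_fn))
```

In an MPI program the equivalent of the grouping loop, plus the message passing between processes, would be the programmer's responsibility.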

MapReduce is designed to run jobs that last minutes or hours on trusted, dedicated hardware running in a single data center with very high aggregate bandwidth interconnects.
