CanChen
[email protected]
Spark on Hadoop vs MPI/OpenMP on Beowulf
Today I read this paper to broaden my knowledge.
- Spark and OpenMP/MPI are two cluster computing frameworks. Spark is strong on fault tolerance and data replication, while OpenMP/MPI is designed to maximize performance for high-performance computing.
- GCP: Google Cloud Platform
- Core data units in Spark: Resilient Distributed Datasets (RDDs)
- Hadoop: a programming model, MapReduce (map: process data locally; shuffle: redistribute data over the network; reduce: summarize data); a distributed file system (HDFS); and a cluster manager, YARN (handles resources and job scheduling).
- MPI: Message Passing Interface, a communication protocol supporting point-to-point and collective communication. Disadvantages: no fault tolerance support, and not well suited to fine-grained parallelism.
- OpenMP: a user-friendly interface that makes it easy to parallelize complex algorithms; supports fine-grained parallelism.
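
To make the map / shuffle / reduce steps from the Hadoop bullet concrete, here is a minimal single-process sketch in plain Python. It does not use the Hadoop or Spark APIs. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the word-count task are my own assumptions for illustration, and the "network" shuffle is only simulated by grouping values in a dictionary.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit (word, 1) pairs from one locally held chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Shuffle: group values by key (stands in for redistribution over the network)."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: summarize each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk plays the role of a data block stored on a different node.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    chunks = ["spark on hadoop", "mpi on beowulf", "spark and mpi"]
    print(word_count(chunks))
```

In a real cluster each `map_phase` call would run on the node that holds its chunk, and only the shuffle step would move data between machines, which is why that step tends to dominate the communication cost.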