Benchmarking Big Data Systems: A Review (Reading Notes)

Link to the original paper

Abstract

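Big data systems have developed rapidly in recent years. Many open-source benchmarks have been designed to compare and evaluate their performance, and these benchmarks have in turn helped drive performance improvements. The paper first gives an overview of popular benchmarks. It then summarizes three aspects that benchmarks focus on: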

  1. Workload generation techniques
  2. Workload input data generation techniques (how the data consumed by a workload is produced)
  3. Metrics (the measures used to evaluate results)
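Two metrics that many benchmarks report are throughput (operations per second) and latency percentiles, and the minimal Python sketch below computes them to illustrate the third aspect. The function names, the nearest-rank percentile definition, and the sample latencies are assumptions for illustration. They are not taken from the paper or from any particular benchmark.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ms, elapsed_s):
    """Compute throughput and latency metrics for one (hypothetical) benchmark run.

    latencies_ms: per-operation latencies in milliseconds
    elapsed_s:    wall-clock duration of the whole run in seconds
    """
    return {
        "ops": len(latencies_ms),
        "throughput_ops_per_s": len(latencies_ms) / elapsed_s,
        "mean_latency_ms": statistics.mean(latencies_ms),
        "p50_latency_ms": percentile(latencies_ms, 50),
        "p99_latency_ms": percentile(latencies_ms, 99),
    }


if __name__ == "__main__":
    # Made-up latencies for illustration only, not measured data.
    lat = [2.0, 3.0, 2.5, 10.0, 2.2, 2.8, 3.1, 2.4, 2.6, 50.0]
    print(summarize(lat, elapsed_s=0.5))
```

The example shows why a single average can mislead. The two slow operations pull the mean latency to about 8 ms, while the median stays near 2.6 ms. For that reason, benchmarks usually report tail percentiles alongside averages.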

Current mainstream big data systems fall into three main categories:

  1. Hadoop and its related systems
  2. Data stores (database management systems (DBMSs) and NoSQL systems)
  3. Specialized systems (for connected graphs, continuous streams, and complex scientific data)
See the figure and table below for details (all figures and tables in these notes are taken from the original paper).

Figure 1. Overview of big data systems
Table 1. Overview of the State-of-the-Art Open Source Big Data Benchmarks

Existing benchmarks can be divided into three main categories:

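  1. Micro benchmarks. These evaluate an individual system component or a specific system behavior. Common examples include WordCount, NNBench, and TestDFSIO.
  2. End-to-end benchmarks. These evaluate the whole system using typical application scenarios, where each scenario corresponds to a set of related workloads. Common examples are the OLTP (On-Line Transaction Processing) query workloads provided by the TPC (Transaction Processing Performance Council).
  3. Benchmark suites. These combine multiple benchmarks of types 1 and 2. Common examples include HiBench, CloudSuite, and BigDataBench.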
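WordCount, the first micro benchmark named above, exercises the map, shuffle, and reduce steps of MapReduce. The following single-process Python sketch imitates those three steps and times them. It is not Hadoop's API, and the helper names (`map_phase`, `shuffle`, `reduce_phase`, `run_micro_benchmark`) are hypothetical. Reporting the best of several repeats is an assumed way of reducing timing noise, not something the paper prescribes.

```python
import time
from collections import defaultdict


def map_phase(line):
    """Map: emit (word, 1) for every whitespace-separated, lower-cased word."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))


def run_micro_benchmark(lines, repeats=5):
    """Time word_count several times; return (result, best wall-clock seconds)."""
    best = float("inf")
    result = {}
    for _ in range(repeats):
        start = time.perf_counter()
        result = word_count(lines)
        best = min(best, time.perf_counter() - start)
    return result, best


if __name__ == "__main__":
    # Tiny synthetic input; a real run would use a large generated corpus.
    corpus = ["the quick brown fox", "the lazy dog", "The fox"] * 1000
    counts, seconds = run_micro_benchmark(corpus)
    print(counts["the"], counts["fox"], f"{seconds:.4f}s")
```

Because the input here is generated in code, the run is repeatable. This ties back to the second aspect above: a micro benchmark is only as meaningful as the input data fed to it.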


Figure 2. Advent of big data benchmarks: A timeline

Common types of NoSQL stores, with examples:

  1. Key/value stores (e.g., Amazon Dynamo, Cassandra, LinkedIn Voldemort)
  2. Column-oriented databases (e.g., BigTable and Hypertable)
  3. Document-oriented stores (e.g., CouchDB and MongoDB)
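Benchmarks for key/value stores typically load a set of records first and then issue a mix of reads and updates. YCSB (Yahoo! Cloud Serving Benchmark), for example, defines core workloads such as a 50/50 read/update mix. The sketch below imitates that load/run structure against a hypothetical in-memory store. The class name, function names, and key format are assumptions. Keys are also drawn uniformly here, whereas real benchmarks often use skewed (e.g., Zipfian) key popularity.

```python
import random


class InMemoryKVStore:
    """Hypothetical stand-in for a key/value store such as those listed above."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def load(store, record_count):
    """Load phase: insert record_count records before any measurement."""
    for i in range(record_count):
        store.put(f"user{i}", f"value{i}")


def run_workload(store, record_count, op_count, read_ratio, seed=42):
    """Run phase: issue a random mix of reads and updates on existing keys.

    Keys are chosen uniformly here (an assumption for simplicity).
    """
    rng = random.Random(seed)  # fixed seed makes runs reproducible
    stats = {"read": 0, "update": 0, "miss": 0}
    for _ in range(op_count):
        key = f"user{rng.randrange(record_count)}"
        if rng.random() < read_ratio:
            stats["read"] += 1
            if store.get(key) is None:
                stats["miss"] += 1
        else:
            stats["update"] += 1
            store.put(key, f"updated-{rng.random()}")
    return stats


if __name__ == "__main__":
    kv = InMemoryKVStore()
    load(kv, record_count=1000)
    # A read-heavy mix; changing read_ratio produces a different workload.
    print(run_workload(kv, record_count=1000, op_count=10_000, read_ratio=0.95))
```

Separating the load phase from the run phase means setup cost is not counted in the measured workload. Only `read_ratio` needs to change to produce a different operation mix.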

Two kinds of systems for graph data:

  1. Graph databases, such as Neo4j
  2. Distributed graph processing systems, such as Google Pregel
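Pregel-style systems express graph algorithms as vertex programs executed in synchronous supersteps. In each superstep, a vertex reads the messages sent to it in the previous superstep, updates its value, and messages its neighbours. The computation stops once no messages remain. The following minimal single-process Python simulation of that model computes connected components by propagating the smallest vertex id. It is an illustration, not Pregel's actual API, and the function name is hypothetical.

```python
def pregel_connected_components(edges, num_vertices, max_supersteps=100):
    """Label each vertex with the smallest vertex id in its (undirected) component."""
    neighbours = {v: set() for v in range(num_vertices)}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    value = list(range(num_vertices))  # initial label = the vertex's own id

    # Superstep 0: every vertex is active and sends its label to all neighbours.
    inbox = {v: [] for v in range(num_vertices)}
    for v in range(num_vertices):
        for n in neighbours[v]:
            inbox[n].append(value[v])

    for _ in range(max_supersteps):
        if not any(inbox.values()):
            break  # no messages in flight: every vertex has halted
        outbox = {v: [] for v in range(num_vertices)}
        for v, messages in inbox.items():
            # A vertex only does work (and sends messages) if its label improves.
            if messages and min(messages) < value[v]:
                value[v] = min(messages)
                for n in neighbours[v]:
                    outbox[n].append(value[v])
        inbox = outbox  # messages are delivered at the next superstep
    return value


if __name__ == "__main__":
    # Two components {0, 1, 2} and {3, 4}, plus an isolated vertex 5.
    print(pregel_connected_components([(0, 1), (1, 2), (3, 4)], 6))
```

In this simulation, messages written in one superstep are visible only in the next. That mirrors the bulk-synchronous design of the model, and it keeps the result independent of the order in which vertices are processed within a superstep.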
