Adapting HiBench to Hadoop 3.1.1

Parameter description

/*
 * • read or write test
 * • date and time the test finished
 * • number of files
 * • total number of bytes processed
 * • throughput in mb/sec (total number of bytes / sum of processing times)
 * • average i/o rate in mb/sec per file
 * • standard deviation of i/o rate
 */
    • Number of files (the number of files processed; each file corresponds to one map task)
    • throughput (computed as follows)

    $$Throughput(N)=\frac{\sum_{i=0}^{N}{filesize_i}}{\sum_{i=0}^{N}{time_i}}$$

    • average i/o rate (computed as follows); note that this is an average of the per-file rates

    $$Average\ IO\ rate(N)=\frac{\sum_{i=0}^{N}{rate_i}}{N}=\frac{\sum_{i=0}^{N}{\frac{filesize_i}{time_i}}}{N}$$

    • concurrent throughput (average throughput under concurrency)

    $$Concurrent\ Throughput=Throughput(N)*N=\frac{\sum_{i=0}^{N}{filesize_i}}{\sum_{i=0}^{N}{time_i}}*N$$

    • concurrent average IO rate (average IO rate under concurrency)

    $$Concurrent\ Average\ IO\ rate=Average\ IO\ rate(N)*N=\sum_{i=0}^{N}{\frac{filesize_i}{time_i}}$$

    In the two concurrent formulas above, N is the number of map slots in the cluster; in this test it was 6.
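As a rough illustration of the four formulas above, the following Java sketch computes them from per-file sizes (in MB) and processing times (in seconds). The class name `DfsioMetrics`, its method names, and the sample numbers are invented for this example and are not taken from the HiBench source; the map-slot count is passed in separately, since the text uses it as the multiplier in the two concurrent formulas.

```java
import java.util.Locale;

/** Illustrative only: a hypothetical helper, not part of HiBench. */
public class DfsioMetrics {

    /** Throughput: total size divided by the sum of processing times. */
    public static double throughput(double[] sizeMb, double[] timeSec) {
        double totalSize = 0.0;
        double totalTime = 0.0;
        for (int i = 0; i < sizeMb.length; i++) {
            totalSize += sizeMb[i];
            totalTime += timeSec[i];
        }
        return totalSize / totalTime;
    }

    /** Average IO rate: mean of the per-file rates size_i / time_i. */
    public static double averageIoRate(double[] sizeMb, double[] timeSec) {
        double sumOfRates = 0.0;
        for (int i = 0; i < sizeMb.length; i++) {
            sumOfRates += sizeMb[i] / timeSec[i];
        }
        return sumOfRates / sizeMb.length;
    }

    /** Concurrent throughput: throughput scaled by the number of map slots. */
    public static double concurrentThroughput(double[] sizeMb, double[] timeSec, int mapSlots) {
        return throughput(sizeMb, timeSec) * mapSlots;
    }

    /** Concurrent average IO rate: average IO rate scaled by the number of map slots. */
    public static double concurrentAverageIoRate(double[] sizeMb, double[] timeSec, int mapSlots) {
        return averageIoRate(sizeMb, timeSec) * mapSlots;
    }

    public static void main(String[] args) {
        // Made-up sample: three 100 MB files, one of them processed twice as fast.
        double[] sizeMb = {100.0, 100.0, 100.0};
        double[] timeSec = {10.0, 20.0, 20.0};
        int mapSlots = 6; // the value reported for this test

        System.out.printf(Locale.ROOT, "throughput              = %.2f MB/s%n",
                throughput(sizeMb, timeSec));
        System.out.printf(Locale.ROOT, "average IO rate         = %.2f MB/s%n",
                averageIoRate(sizeMb, timeSec));
        System.out.printf(Locale.ROOT, "concurrent throughput   = %.2f MB/s%n",
                concurrentThroughput(sizeMb, timeSec, mapSlots));
        System.out.printf(Locale.ROOT, "concurrent avg IO rate  = %.2f MB/s%n",
                concurrentAverageIoRate(sizeMb, timeSec, mapSlots));
    }
}
```

With these made-up inputs the throughput is 6.00 MB/s while the average IO rate is 6.67 MB/s: the first weights every second of processing equally, the second weights every file equally, so a single fast file raises the average IO rate more than it raises the throughput.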

    • Formula for N (mapred.tasktracker.map.tasks.maximum defaults to 2, mapred.tasktracker.reduce.tasks.maximum defaults to 2, and maxreduces is the number of machines in your MapReduce cluster)

    $$N=mapSlots=mapred.tasktracker.map.tasks.maximum*\frac{maxreduces}{mapred.tasktracker.reduce.tasks.maximum}$$
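The map-slot formula can be checked with a few lines of Java. This is only a sketch: the class `MapSlotEstimate` and its method are hypothetical names, and the value 6 for maxreduces is an assumption chosen so that, with both defaults at 2, the result matches the N = 6 reported for this test.

```java
/** Illustrative only: a hypothetical helper, not part of HiBench or Hadoop. */
public class MapSlotEstimate {

    /**
     * N = map.tasks.maximum * maxreduces / reduce.tasks.maximum.
     * The multiplication is done first so that integer division truncates only once.
     */
    public static int mapSlots(int mapTasksMaximum, int maxReduces, int reduceTasksMaximum) {
        if (reduceTasksMaximum <= 0) {
            throw new IllegalArgumentException("reduceTasksMaximum must be positive");
        }
        return mapTasksMaximum * maxReduces / reduceTasksMaximum;
    }

    public static void main(String[] args) {
        int mapTasksMaximum = 2;    // default stated in the text
        int reduceTasksMaximum = 2; // default stated in the text
        int maxReduces = 6;         // assumed value for illustration
        System.out.println("N = " + mapSlots(mapTasksMaximum, maxReduces, reduceTasksMaximum));
    }
}
```

When both per-node maxima are left at their defaults, they cancel out and N is simply equal to maxreduces.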

    Environment

    • hdp 3.1.4
    • os CentOS 7.6
    • hibench

    The gap between hadoop-2.7.3 and hadoop-3.1.1 is large and some interfaces have changed, so HiBench has to be modified before the test can run.

    // pom.xml: set hadoop.mr2.version to 3.1.1
    <hadoop.mr2.version>3.1.1</hadoop.mr2.version>
    // TestDFSIOEnh.java: remove the call to FileUtil.copyMerge, which is no longer available in Hadoop 3.x, and read the reducer output file directly
    
    -			 FileUtil.copyMerge(fs, DfsioeConfig.getInstance().getReportDir(fsConfig), fs, DfsioeConfig.getInstance().getReportTmp(fsConfig), false, fsConfig, null);
    -			 LOG.info("remote report file " + DfsioeConfig.getInstance().getReportTmp(fsConfig) + " merged.");
    +			 BufferedReader lines = new BufferedReader(new InputStreamReader(fs.open(new Path(DfsioeConfig.getInstance().getReportDir(fsConfig),"part-r-00000"))));
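    // TestDFSIO.java: FileInputFormat.LOG is an org.apache.commons.logging.Log in 2.7 but an org.slf4j.Logger in 3.1.1, so change how the log object is obtained
    // Note: the cast compiles, but it succeeds at runtime only if the logger object also implements org.apache.commons.logging.Log
    - private static final Log LOG = FileInputFormat.LOG;
    + private static final Log LOG = (Log) FileInputFormat.LOG;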
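The patch above reads a single part-r-00000 file instead of merging the report directory. For comparison, here is a minimal sketch of what a merge-style read of all part files could look like. It deliberately uses the local filesystem through `java.nio.file` as a stand-in for HDFS so that it runs without Hadoop on the classpath; the class name `PartFileMerge` and the `part-*` glob are assumptions made for this example, not HiBench code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: local-filesystem stand-in for reading a job's report directory. */
public class PartFileMerge {

    /** Returns the lines of every part-* file in reportDir, in file-name order. */
    public static List<String> readAllParts(Path reportDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The glob skips marker files such as _SUCCESS.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(reportDir, "part-*")) {
            for (Path part : stream) {
                parts.add(part);
            }
        }
        // Directory listing order is unspecified, so sort by path.
        Collections.sort(parts);

        List<String> lines = new ArrayList<>();
        for (Path part : parts) {
            lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PartFileMerge <report-dir>
        for (String line : readAllParts(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```

If the job runs with exactly one reducer, reading part-r-00000 alone, as the patch does, returns the same lines as this merge.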
    

    Build:

     ~/apache-maven-3.5.4/bin/mvn -Phadoopbench -Dspark=2.2 -Dscala=2.11 clean package -X
    
