Hadoop 2.2 Programming: Hadoop Performance Testing

 

In "Hadoop: The Definitive Guide" (3rd edition), the classes for the "Benchmarking a Hadoop Cluster" test cases were shipped in hadoop-*-test.jar. That jar no longer exists in newer releases; in the new versions, benchmark tests should be run as follows:

1. TestDFSIO

  • write

TestDFSIO measures the I/O performance of HDFS. It uses a MapReduce job to read or write files in parallel: each file is read or written in its own map task, and the output of each map is used to collect statistics about the file it just processed.

  • Write 2 files, 10 MB each:

    $yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar \
        TestDFSIO -write -nrFiles 2 -fileSize 10
  • Console output when the job is submitted:

13/11/13 01:59:06 INFO fs.TestDFSIO: TestDFSIO.1.7
13/11/13 01:59:06 INFO fs.TestDFSIO: nrFiles = 2
13/11/13 01:59:06 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
13/11/13 01:59:06 INFO fs.TestDFSIO: bufferSize = 1000000
13/11/13 01:59:06 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
13/11/13 01:59:15 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 2 files
13/11/13 01:59:26 INFO fs.TestDFSIO: created control files for: 2 files
13/11/13 01:59:27 INFO client.RMProxy: Connecting to ResourceManager at cluster1/172.16.102.201:8032
13/11/13 01:59:27 INFO client.RMProxy: Connecting to ResourceManager at cluster1/172.16.102.201:8032
13/11/13 01:59:56 INFO mapred.FileInputFormat: Total input paths to process : 2
13/11/13 02:00:21 INFO mapreduce.JobSubmitter: number of splits:2
13/11/13 02:00:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1384321503481_0003
13/11/13 02:00:34 INFO impl.YarnClientImpl: Submitted application application_1384321503481_0003 to ResourceManager at cluster1/172.16.102.201:8032
13/11/13 02:00:36 INFO mapreduce.Job: The url to track the job: http://cluster1:8888/proxy/application_1384321503481_0003/
13/11/13 02:00:36 INFO mapreduce.Job: Running job: job_1384321503481_0003
  • From the console output we can see:

(1) By default the output files are written to the /benchmarks/TestDFSIO directory in HDFS (on this cluster, under the current user's directory, /user/grid/). The default location can be changed via the test.build.data property.

(2) There are 2 map tasks (number of splits:2), which also confirms that each file is written or read by its own map task.

  • Console output after the job finishes:

13/11/13 02:08:15 INFO mapreduce.Job:  map 100% reduce 100%

13/11/13 02:08:17 INFO mapreduce.Job: Job job_1384321503481_0003 completed successfully

13/11/13 02:08:21 INFO mapreduce.Job: Counters: 43

    File System Counters

        FILE: Number of bytes read=174

        FILE: Number of bytes written=240262

        FILE: Number of read operations=0

        FILE: Number of large read operations=0

        FILE: Number of write operations=0

        HDFS: Number of bytes read=468

        HDFS: Number of bytes written=20971595

        HDFS: Number of read operations=11

        HDFS: Number of large read operations=0

        HDFS: Number of write operations=4

    Job Counters

        Launched map tasks=2

        Launched reduce tasks=1

        Data-local map tasks=2

        Total time spent by all maps in occupied slots (ms)=63095

        Total time spent by all reduces in occupied slots (ms)=14813

    Map-Reduce Framework

        Map input records=2

        Map output records=10

        Map output bytes=148

        Map output materialized bytes=180

        Input split bytes=244

        Combine input records=0

        Combine output records=0

        Reduce input groups=5

        Reduce shuffle bytes=180

        Reduce input records=10

        Reduce output records=5

        Spilled Records=20

        Shuffled Maps =2

        Failed Shuffles=0

        Merged Map outputs=2

        GC time elapsed (ms)=495

        CPU time spent (ms)=3640

        Physical memory (bytes) snapshot=562757632

        Virtual memory (bytes) snapshot=2523807744

        Total committed heap usage (bytes)=421330944

    Shuffle Errors

        BAD_ID=0

        CONNECTION=0

        IO_ERROR=0

        WRONG_LENGTH=0

        WRONG_MAP=0

        WRONG_REDUCE=0

    File Input Format Counters

        Bytes Read=224

    File Output Format Counters

        Bytes Written=75

13/11/13 02:08:23 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write

13/11/13 02:08:23 INFO fs.TestDFSIO:            Date & time: Wed Nov 13 02:08:22 PST 2013

13/11/13 02:08:23 INFO fs.TestDFSIO:        Number of files: 2

13/11/13 02:08:23 INFO fs.TestDFSIO: Total MBytes processed: 20.0

13/11/13 02:08:23 INFO fs.TestDFSIO:      Throughput mb/sec: 0.5591277606933184

13/11/13 02:08:23 INFO fs.TestDFSIO: Average IO rate mb/sec: 0.5635650753974915

13/11/13 02:08:23 INFO fs.TestDFSIO:  IO rate std deviation: 0.05000733272172887

13/11/13 02:08:23 INFO fs.TestDFSIO:     Test exec time sec: 534.566

13/11/13 02:08:23 INFO fs.TestDFSIO:

 

  • From the output above we can see 2 map tasks and 1 reduce task; the summary reports the average I/O rate, the overall throughput, the job execution time, and the number of files written.
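Two of the summary figures are easy to confuse: "Throughput mb/sec" divides the total bytes by the sum of all the map tasks' I/O times, while "Average IO rate mb/sec" is the mean of the per-file rates, with the standard deviation taken over those same rates. A minimal sketch of the arithmetic, using made-up per-file times rather than values from this run:

```python
import math

def dfsio_stats(sizes_mb, times_sec):
    """Reproduce TestDFSIO's summary arithmetic from per-file sizes and times."""
    rates = [s / t for s, t in zip(sizes_mb, times_sec)]  # per-file MB/s
    throughput = sum(sizes_mb) / sum(times_sec)           # aggregate rate
    avg_rate = sum(rates) / len(rates)                    # mean of per-file rates
    # population standard deviation of the per-file rates
    std_dev = math.sqrt(sum(r * r for r in rates) / len(rates) - avg_rate ** 2)
    return throughput, avg_rate, std_dev

# Two 10 MB files with hypothetical write times of 20 s and 25 s
print(dfsio_stats([10, 10], [20, 25]))  # roughly (0.444, 0.45, 0.05)
```

Note that throughput (0.444) is lower than the average rate (0.45) whenever task times differ: slower files drag the aggregate down more than they drag the mean down.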

  • read

    $yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar \
        TestDFSIO -read -nrFiles 2 -fileSize 10

    We won't analyze this one in detail; try it yourself. When you are done, the benchmark files can be removed by running the same jar with the -clean option.
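TestDFSIO also appends each run's summary to a local results file (TestDFSIO_results.log by default), so the write and read figures can be compared afterwards. A rough sketch of pulling the numbers out of summary lines in the format shown above (the regex is an assumption tuned to that format, not part of TestDFSIO itself):

```python
import re

def parse_dfsio_summary(text):
    """Extract 'key: value' pairs from TestDFSIO summary log lines."""
    results = {}
    for line in text.splitlines():
        # Match lines like: "... INFO fs.TestDFSIO: Throughput mb/sec: 0.559..."
        m = re.search(r"fs\.TestDFSIO:\s+(.+?):\s+(\S+)\s*$", line)
        if m:
            results[m.group(1).strip()] = m.group(2)
    return results

sample = """13/11/13 02:08:23 INFO fs.TestDFSIO: Number of files: 2
13/11/13 02:08:23 INFO fs.TestDFSIO: Total MBytes processed: 20.0
13/11/13 02:08:23 INFO fs.TestDFSIO: Throughput mb/sec: 0.5591277606933184"""
print(parse_dfsio_summary(sample))
```

Values are kept as strings here; convert to float as needed when charting write vs. read runs.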

2. MapReduce Test with Sort

Hadoop ships with a MapReduce program that exercises the entire MapReduce system. This benchmark runs in three steps:

  1. Generate random data

  2. Sort the data

  3. Validate the results

The steps are as follows:

  • Generate random data

    $yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
        randomwriter random-data

        RandomWriter generates the random data. Running it on YARN launches a MapReduce job that, by default, starts 10 map tasks per node, each map writing 1 GB of random data.

        To change the defaults, set the test.randomwriter.maps_per_host and test.randomwrite.bytes_per_map properties.

  • sort data

    $yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
        sort random-data sorted-data

    # This command sorts random-data into sorted-data; it does not validate
    # anything itself. Step 3, validation, is done by the SortValidator program,
    # which runs a series of checks to confirm the sorted data is complete and
    # correctly ordered:

    $yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar \
        testmapredsort -sortInput random-data -sortOutput sorted-data
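Before launching the random-data step above, it is worth estimating how much data the job will generate and how much HDFS space it will occupy once replicated. A quick sanity-check calculation, where the node count and replication factor are illustrative assumptions and the defaults (10 maps per host, 1 GB per map) come from the text:

```python
def randomwriter_footprint(nodes, maps_per_host=10, gb_per_map=1, replication=3):
    """Estimate randomwriter output: (raw GB generated, GB stored after HDFS replication)."""
    raw_gb = nodes * maps_per_host * gb_per_map  # total data written by all map tasks
    return raw_gb, raw_gb * replication          # HDFS keeps `replication` copies

# e.g. a hypothetical 4-node cluster with the default settings
print(randomwriter_footprint(nodes=4))  # (40, 120): 40 GB generated, 120 GB on disk
```

If the cluster doesn't have that much free space, lower the bytes-per-map property before running the benchmark.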

     

3. Other Tests

  • MRBench (invoked with mrbench): launches a small job and runs it many times, to check whether small jobs run responsively

  • NNBench (invoked with nnbench): a load test for the NameNode

  • Gridmix: skipped for now

                                                                                                                    (End)
