In "Hadoop: The Definitive Guide" (third edition), the benchmark classes described in "Benchmarking a Hadoop Cluster" are no longer shipped in hadoop-*-test.jar. In newer versions, run the benchmarks as follows:
1. TestDFSIO
write
TestDFSIO measures the I/O performance of HDFS. It uses a MapReduce job to read or write files in parallel: each file is read or written in its own map task, and the map output is used to collect statistics about how that file was processed.
Test 1: write 2 files of 10 MB each:
% yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 2 -fileSize 10
Console output when the job is submitted:
13/11/13 01:59:06 INFO fs.TestDFSIO: TestDFSIO.1.7
13/11/13 01:59:06 INFO fs.TestDFSIO: nrFiles = 2
13/11/13 01:59:06 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
13/11/13 01:59:06 INFO fs.TestDFSIO: bufferSize = 1000000
13/11/13 01:59:06 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
13/11/13 01:59:15 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 2 files
13/11/13 01:59:26 INFO fs.TestDFSIO: created control files for: 2 files
13/11/13 01:59:27 INFO client.RMProxy: Connecting to ResourceManager at cluster1/172.16.102.201:8032
13/11/13 01:59:27 INFO client.RMProxy: Connecting to ResourceManager at cluster1/172.16.102.201:8032
13/11/13 01:59:56 INFO mapred.FileInputFormat: Total input paths to process : 2
13/11/13 02:00:21 INFO mapreduce.JobSubmitter: number of splits:2
13/11/13 02:00:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1384321503481_0003
13/11/13 02:00:34 INFO impl.YarnClientImpl: Submitted application application_1384321503481_0003 to ResourceManager at cluster1/172.16.102.201:8032
13/11/13 02:00:36 INFO mapreduce.Job: The url to track the job: http://cluster1:8888/proxy/application_1384321503481_0003/
13/11/13 02:00:36 INFO mapreduce.Job: Running job: job_1384321503481_0003
From the console output we can see:
(1) By default, the output is written under /benchmarks/TestDFSIO (the test data files go into its io_data subdirectory). The base directory can be changed via the test.build.data system property.
(2) There are 2 map tasks (number of splits:2), which confirms that each file is read or written by its own map task.
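A quick way to confirm point (1) is to list the benchmark directory in HDFS after the write run. This is only a sketch: it assumes the default base directory /benchmarks/TestDFSIO, a running cluster, and the io_data subdirectory name used by the 2.x TestDFSIO (names may differ in other versions):

```shell
# List the TestDFSIO working directories (requires a running HDFS)
hdfs dfs -ls /benchmarks/TestDFSIO

# The written test files themselves live under io_data
hdfs dfs -ls /benchmarks/TestDFSIO/io_data
```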
Console output after the job finishes:
13/11/13 02:08:15 INFO mapreduce.Job: map 100% reduce 100%
13/11/13 02:08:17 INFO mapreduce.Job: Job job_1384321503481_0003 completed successfully
13/11/13 02:08:21 INFO mapreduce.Job: Counters: 43
	File System Counters
		FILE: Number of bytes read=174
		FILE: Number of bytes written=240262
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=468
		HDFS: Number of bytes written=20971595
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=63095
		Total time spent by all reduces in occupied slots (ms)=14813
	Map-Reduce Framework
		Map input records=2
		Map output records=10
		Map output bytes=148
		Map output materialized bytes=180
		Input split bytes=244
		Combine input records=0
		Combine output records=0
		Reduce input groups=5
		Reduce shuffle bytes=180
		Reduce input records=10
		Reduce output records=5
		Spilled Records=20
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=495
		CPU time spent (ms)=3640
		Physical memory (bytes) snapshot=562757632
		Virtual memory (bytes) snapshot=2523807744
		Total committed heap usage (bytes)=421330944
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=224
	File Output Format Counters
		Bytes Written=75
13/11/13 02:08:23 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
13/11/13 02:08:23 INFO fs.TestDFSIO:            Date & time: Wed Nov 13 02:08:22 PST 2013
13/11/13 02:08:23 INFO fs.TestDFSIO:        Number of files: 2
13/11/13 02:08:23 INFO fs.TestDFSIO: Total MBytes processed: 20.0
13/11/13 02:08:23 INFO fs.TestDFSIO:      Throughput mb/sec: 0.5591277606933184
13/11/13 02:08:23 INFO fs.TestDFSIO: Average IO rate mb/sec: 0.5635650753974915
13/11/13 02:08:23 INFO fs.TestDFSIO:  IO rate std deviation: 0.05000733272172887
13/11/13 02:08:23 INFO fs.TestDFSIO:     Test exec time sec: 534.566
13/11/13 02:08:23 INFO fs.TestDFSIO:
From this output we can see 2 map tasks and 1 reduce task. The summary reports the overall throughput, the average I/O rate, the I/O rate standard deviation, the job execution time, and the number of files written.
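Besides printing to the console, TestDFSIO also appends the same summary to a TestDFSIO_results.log file in the local working directory, which is handy for comparing runs. A minimal sketch for pulling one metric out of such a log, using a sample copied from the summary above (the log file name and line format are those of the 2.x TestDFSIO):

```shell
# Sample summary, copied from the TestDFSIO output above
cat > sample_results.log <<'EOF'
----- TestDFSIO ----- : write
           Date & time: Wed Nov 13 02:08:22 PST 2013
       Number of files: 2
Total MBytes processed: 20.0
     Throughput mb/sec: 0.5591277606933184
Average IO rate mb/sec: 0.5635650753974915
 IO rate std deviation: 0.05000733272172887
    Test exec time sec: 534.566
EOF

# Split each line on "colon + spaces" and print the throughput value
awk -F': *' '/Throughput mb\/sec/ {print $2}' sample_results.log
# → 0.5591277606933184
```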
read
% yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -read -nrFiles 2 -fileSize 10
The read test is not analyzed in detail here; try it yourself.
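When you are done benchmarking, the data left under /benchmarks/TestDFSIO should be deleted so it does not take up space or skew later runs. TestDFSIO provides a -clean switch for this (a sketch, assuming the same 2.2.0 tests jar and a running cluster):

```shell
# Remove the TestDFSIO benchmark directory from HDFS
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean
```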
2. MapReduce Test with Sort
Hadoop ships with a MapReduce program that can exercise the entire MapReduce system. This benchmark runs in three steps:
# generate random data
# sort the data
# validate the results
The steps are as follows:
1. Generate random data
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter random-data
RandomWriter generates the random data. Running it on YARN starts a MapReduce job that, by default, launches 10 map tasks per node, each writing 1 GB of random data.
To change the defaults, set: test.randomwriter.maps_per_host, test.randomwrite.bytes_per_map
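These properties can be overridden on the command line with -D, placed before the output directory argument. A sketch, assuming the property names above apply to your version (in Hadoop 2.x the examples RandomWriter renamed them to mapreduce.randomwriter.mapsperhost and mapreduce.randomwriter.bytespermap, so check the RandomWriter source in your distribution):

```shell
# Generate a much smaller data set: 2 maps per host, 10 MB per map
# (property names vary by Hadoop version -- verify before relying on them)
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter \
    -D test.randomwriter.maps_per_host=2 \
    -D test.randomwrite.bytes_per_map=10485760 \
    random-data
```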
2. sort data
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar sort random-data sorted-data
3. Validate the results
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar testmapredsort -sortInput random-data -sortOutput sorted-data
This command runs the SortValidator program, which performs a series of checks, e.g., verifying that the sorted data is an accurate sort of the unsorted input. (Note: testmapredsort lives in the jobclient tests jar, not the examples jar.)
3. Other tests
MRBench -- invoked with mrbench; runs a small job a number of times.
NNBench -- invoked with nnbench; a load test for the namenode.
Gridmix -- not of interest here.
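For completeness, a sketch of invoking the first two from the same jobclient tests jar. The flags shown are the common ones in the 2.x tests jar and are assumptions here; run each command without arguments to see the options your version actually supports:

```shell
# MRBench: run a tiny MapReduce job 50 times and report the average runtime
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar mrbench -numRuns 50

# NNBench: stress the namenode with 1000 empty-file create/write operations
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar nnbench \
    -operation create_write -maps 4 -bytesToWrite 0 -numberOfFiles 1000
```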