Step 1: Data preparation. Prepare two datasets: one in key-value format and one in non-key-value format.
For the key-value data, I wrote a short Python script:

import random

letters = list('abcdefghijklmnopqrstuvwxyz')
digits = list(range(10))

with open('testdata.txt', 'w') as f:
    # the range bound controls how many lines of data are generated
    for j in range(20000000):
        astr = ''.join(random.sample(letters, 5))
        bstr = ''.join(str(i) for i in random.sample(digits, 5))
        f.write("%s\t%s\n" % (astr, bstr))
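The output format can be spot-checked with a small sketch that uses the same sampling logic (the helper name and the sample size of 5 lines are my own, not from the original script):

```python
import random
import string

def make_lines(n):
    """Generate n tab-separated key-value lines, mirroring the script above."""
    letters = list(string.ascii_lowercase)
    digits = list(range(10))
    lines = []
    for _ in range(n):
        key = ''.join(random.sample(letters, 5))
        value = ''.join(str(d) for d in random.sample(digits, 5))
        lines.append("%s\t%s" % (key, value))
    return lines

# every line should be a 5-letter key, a tab, then a 5-digit value
for line in make_lines(5):
    key, value = line.split('\t')
    assert len(key) == 5 and key.isalpha()
    assert len(value) == 5 and value.isdigit()
```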
Import the data into HDFS:
hadoop fs -put testdata.txt /test/input/
The other dataset is generated by the randomwriter example in hadoop-examples.jar:
cd /usr/lib/hadoop/
hadoop jar hadoop-examples.jar randomwriter /test/input1/
Step 2: Run the tests
MRReliabilityTest:
hadoop jar hadoop-test.jar MRReliabilityTest -libjars hadoop-examples.jar
loadgen:
Usage: [-m <maps>] [-r <reduces>]
[-keepmap <percent>] [-keepred <percent>]
[-indir <path>] [-outdir <path>]
[-inFormat[Indirect] <InputFormat>] [-outFormat <OutputFormat>]
[-outKey <WritableComparable>] [-outValue <Writable>]
Set the parameters as appropriate for your situation:
hadoop jar hadoop-test.jar loadgen -m 6 -r 3 -indir /test/input/ -outdir /test/output/
mapredtest:
Usage: TestMapRed <range> <counts>
hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar mapredtest 2 10
testarrayfile:
Usage: TestArrayFile [-count N] [-nocreate] [-nocheck] file
hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testarrayfile -count 4 /test/input/testdata.txt
testsequencefile:
Usage: SequenceFile [-count N] [-seed #] [-check] [-compressType <NONE|RECORD|BLOCK>] -codec <compressionCodec> [[-rwonly] | {[-megabytes M] [-factor F] [-nocreate] [-fast] [-merge]}] file
hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testsequencefile -count 4 -check -fast /test/input/testdata.txt
testsetfile:
Usage: TestSetFile [-count N] [-nocreate] [-nocheck] [-compress type] file
hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testsetfile -count 4 /test/input/testdata.txt
threadedmapbench:
hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar threadedmapbench
testfilesystem:
Usage: TestFileSystem -files N -megaBytes M [-noread] [-nowrite] [-noseek] [-fastcheck]
hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testfilesystem -files 1 -megaBytes 1000
testmapredsort:
sortvalidate [-m <maps>] [-r <reduces>] [-deep] -sortInput <sort-input-dir> -sortOutput <sort-output-dir>
hadoop jar hadoop-test.jar testmapredsort -m 10 -r 5 -sortInput /test/input/ -sortOutput /test/output/
testbigmapoutput:
BigMapOutput -input <input-dir> -output <output-dir> [-create <filesize in MB>]
hadoop jar hadoop-test.jar testbigmapoutput -input /test/input1/ -output /test/output1/
TestDFSIO (HDFS benchmark):
Run the write test first, then the read test.
Usage: TestDFSIO -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile resultFileName] [-bufferSize Bytes]
Write test:
Use 10 map tasks to write 10 files of 1,000 MB each (the -fileSize value is in MB):
hadoop jar $HADOOP_HOME/hadoop-test-1.2.0.1.3.0.0-107.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 -resFile /tmp/TestDFSIO_log.txt
When the run finishes, the results are printed to the console and recorded in /tmp/TestDFSIO_log.txt.
The test data itself is written under /benchmarks/TestDFSIO by default.
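The result file is plain text, so the headline throughput number can be pulled out with a short sketch. The "Throughput mb/sec:" label matches the TestDFSIO output on this Hadoop line, but treat the exact wording as an assumption, and note the sample log below is hypothetical:

```python
def parse_throughput(log_text):
    """Extract the 'Throughput mb/sec' value from a TestDFSIO result log.

    The label is an assumption based on Hadoop 1.x TestDFSIO output;
    adjust the match string if your version prints a different format.
    """
    for line in log_text.splitlines():
        if 'Throughput mb/sec' in line:
            return float(line.split(':')[-1])
    return None  # label not found

# hypothetical sample of a TestDFSIO write-test log
sample = """----- TestDFSIO ----- : write
           Number of files: 10
Throughput mb/sec: 4.989
Average IO rate mb/sec: 5.185
"""
print(parse_throughput(sample))
```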
Read test:
hadoop jar $HADOOP_HOME/hadoop-test-1.2.0.1.3.0.0-107.jar TestDFSIO -read -nrFiles 10 -fileSize 1000 -resFile /tmp/TestDFSIO_log.txt
Clean up the test data:
hadoop jar $HADOOP_HOME/hadoop-test-1.2.0.1.3.0.0-107.jar TestDFSIO -clean
NameNode benchmark (nnbench):
Use 12 mappers and 6 reducers to create 1,000 files:
hadoop jar hadoop-test.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
MapReduce benchmark (mrbench):
mrbench runs a small job many times over, to check whether small jobs run repeatably and efficiently on the cluster.
Run a small job 50 times:
hadoop jar hadoop-test.jar mrbench -numRuns 50
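"Repeatable" here means the per-run times should cluster tightly. mrbench prints an average time itself, but if you collect the individual run times you can check the spread with a short sketch (the run times below are a hypothetical example, not real output):

```python
import statistics

def repeatability(times_ms):
    """Summarize per-run job times (in ms) from repeated mrbench runs.

    Returns mean, standard deviation, and the coefficient of variation;
    a small CV suggests the small-job runs are repeatable.
    """
    mean = statistics.mean(times_ms)
    stdev = statistics.stdev(times_ms)
    return mean, stdev, stdev / mean

# hypothetical times for 5 of the 50 runs
mean, stdev, cv = repeatability([21000, 20500, 21200, 20800, 21100])
print("mean=%.0f ms, stdev=%.0f ms, cv=%.3f" % (mean, stdev, cv))
```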
testipc and testrpc:
hadoop jar hadoop-test.jar testipc
hadoop jar hadoop-test.jar testrpc
PS: choose and tune the command parameters according to your hardware environment.
Some common errors and fixes:
Destination directory already exists: delete the target directory (e.g. hadoop fs -rmr /test/output/), then rerun the command.
Insufficient Java heap size: raise the corresponding heap setting, or configure more map and reduce tasks before submitting the job so that each task processes less data.
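For the heap-size case, on this Hadoop 1.x line the child-JVM heap is usually raised via mapred.child.java.opts in mapred-site.xml; a sketch of the fragment (the 1024m value is only an example, size it to your nodes):

```xml
<!-- mapred-site.xml: heap for each map/reduce child JVM (example value) -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```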