Hadoop benchmark testing

Step 1: Data preparation. Prepare two datasets: one in key-value form and one not in key-value form.

Key-value data: I wrote a Python script to generate it:

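import random
import string

# NUM_LINES sets how many lines of data to generate
NUM_LINES = 20000000

letters = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'
digits = string.digits            # '0123456789'

# Text mode ('w'): writing str to a file opened with 'wb' fails on Python 3
with open('testdata.txt', 'w') as f:
    for _ in range(NUM_LINES):
        key = ''.join(random.sample(letters, 5))    # 5 distinct letters
        value = ''.join(random.sample(digits, 5))   # 5 distinct digits
        f.write("%s\t%s\n" % (key, value))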
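Before uploading 20 million lines, it can be worth spot-checking the file. The following is a minimal sketch that assumes the tab-separated layout the script produces (5 distinct lowercase letters, a tab, 5 distinct digits); `is_valid_line` and `count_invalid` are hypothetical helper names, not part of any Hadoop tooling.

```python
import re
from typing import Iterable

# Assumed layout, matching the generator script: key<TAB>value
LINE_RE = re.compile(r"^([a-z]{5})\t([0-9]{5})$")


def is_valid_line(line: str) -> bool:
    """Return True if one line matches the key<TAB>value layout."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return False
    key, value = m.groups()
    # random.sample never repeats an element, so neither field has duplicates
    return len(set(key)) == 5 and len(set(value)) == 5


def count_invalid(lines: Iterable[str]) -> int:
    """Count lines that do not match the expected layout."""
    return sum(1 for line in lines if not is_valid_line(line))
```

Usage: `with open('testdata.txt') as f: print(count_invalid(f))` should print 0 for a file produced by the script.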

Load the data into HDFS:

      hadoop fs -put testdata.txt /test/input/

The other dataset is generated with randomwriter from hadoop-examples.jar:

      cd /usr/lib/hadoop/

      hadoop jar hadoop-examples.jar randomwriter /test/input1/

 

Step 2: Run the tests

MRReliabilityTest:

 

hadoop jar hadoop-test.jar MRReliabilityTest -libjars hadoop-examples.jar

loadgen:

Usage: [-m <maps>] [-r <reduces>]
       [-keepmap <percent>] [-keepred <percent>]
       [-indir <path>] [-outdir <path>]
       [-inFormat[Indirect] <InputFormat>] [-outFormat <OutputFormat>]
       [-outKey <WritableComparable>] [-outValue <Writable>]

Set the parameters to suit your environment.

 

hadoop jar hadoop-test.jar loadgen -m 6 -r 3 -indir /test/input/ -outdir /test/output/

 

mapredtest:

Usage: TestMapRed <range> <counts>

 

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar mapredtest 2 10

 

testarrayfile:

Usage: TestArrayFile [-count N] [-nocreate] [-nocheck] file

 

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testarrayfile -count 4 /test/input/testdata.txt

testsequencefile:

 

Usage: SequenceFile [-count N] [-seed #] [-check] [-compressType <NONE|RECORD|BLOCK>] -codec <compressionCodec> [[-rwonly] | {[-megabytes M] [-factor F] [-nocreate] [-fast] [-merge]}] file

 

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testsequencefile -count 4 -check -fast /test/input/testdata.txt

 

testsetfile:

 

Usage: TestSetFile [-count N] [-nocreate] [-nocheck] [-compress type] file

 

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testsetfile -count 4 /test/input/testdata.txt

 

threadedmapbench:

 

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar threadedmapbench

testfilesystem:

 

Usage: TestFileSystem -files N -megaBytes M [-noread] [-nowrite] [-noseek] [-fastcheck]

 

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testfilesystem -files 1 -megaBytes 1000

testmapredsort:

 

sortvalidate [-m <maps>] [-r <reduces>] [-deep] -sortInput <sort-input-dir> -sortOutput <sort-output-dir>

 

hadoop jar hadoop-test.jar testmapredsort -m 10 -r 5 -sortInput /test/input/ -sortOutput /test/output

testbigmapoutput:

BigMapOutput -input <input-dir> -output <output-dir> [-create <filesize in MB>]

hadoop jar hadoop-test.jar testbigmapoutput -input /test/input1/ -output /test/output1/

 

TestDFSIO: benchmarking HDFS

 

Run the write test before the read test; the read test reads back the files written by the write test.

 

Usage: TestDFSIO -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile resultFileName] [-bufferSize Bytes]

 

Write test:

 

Use 10 map tasks to write 10 files of 1000 MB each:

 

hadoop jar $HADOOP_HOME/hadoop-test-1.2.0.1.3.0.0-107.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 -resFile /tmp/TestDFSIO_log.txt

 

At the end of the run, the results are printed to the console and also recorded in /tmp/TestDFSIO_log.txt.
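To compare several runs, the result log can be read back programmatically. This is a minimal sketch assuming the Hadoop 1.x layout, where each run starts with a `----- TestDFSIO ----- : <type>` header followed by `label: value` lines such as `Throughput mb/sec: ...`; `parse_dfsio_log` is a hypothetical helper name.

```python
from typing import Dict, List


def parse_dfsio_log(text: str) -> List[Dict[str, str]]:
    """Split a TestDFSIO result log into one dict per run.

    Assumes each run starts with a '----- TestDFSIO ----- : <type>' header
    followed by 'label: value' lines (the Hadoop 1.x layout).
    """
    runs: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("----- TestDFSIO -----"):
            # e.g. '----- TestDFSIO ----- : write' -> test type 'write'
            runs.append({"test": line.rpartition(":")[2].strip()})
        elif runs and ":" in line:
            # Split on the first colon only: the date value contains colons too
            label, _, value = line.partition(":")
            runs[-1][label.strip()] = value.strip()
    return runs
```

Each returned dict maps the labels to their raw string values, so `float(run["Throughput mb/sec"])` gives a number to compare across runs.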

 

By default, the data is written to the /benchmarks/TestDFSIO directory in HDFS.
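The report contains two different rate figures, and the difference is easy to miss. As far as I can tell from the Hadoop 1.x TestDFSIO source, "Throughput mb/sec" divides the total data by the summed run time of all map tasks, while "Average IO rate mb/sec" averages each map task's own rate. The sketch below reproduces that arithmetic under this assumption; `dfsio_rates` is a hypothetical name.

```python
import math
from typing import List, Tuple


def dfsio_rates(tasks: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Aggregate per-map (megabytes, seconds) pairs as TestDFSIO reports them.

    Returns (throughput, average IO rate, IO rate std deviation), all in MB/s.
    """
    total_mb = sum(mb for mb, _ in tasks)
    total_sec = sum(sec for _, sec in tasks)
    # Throughput: total data divided by the summed time of all map tasks
    throughput = total_mb / total_sec
    # Average IO rate: plain mean of each task's own MB/s rate
    rates = [mb / sec for mb, sec in tasks]
    avg_rate = sum(rates) / len(rates)
    # Population standard deviation of the per-task rates
    std_dev = math.sqrt(abs(sum(r * r for r in rates) / len(rates) - avg_rate ** 2))
    return throughput, avg_rate, std_dev
```

For example, two 1000 MB tasks taking 50 s and 100 s give a throughput of about 13.33 MB/s but an average IO rate of 15 MB/s (std deviation 5), because the faster task counts equally in the average. With the write command above, the total data processed should be 10 × 1000 = 10000 MB.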

 

Read test:

 

hadoop jar $HADOOP_HOME/hadoop-test-1.2.0.1.3.0.0-107.jar TestDFSIO -read -nrFiles 10 -fileSize 1000 -resFile /tmp/TestDFSIO_log.txt

 

Clean up the test data:

 

hadoop jar $HADOOP_HOME/hadoop-test-1.2.0.1.3.0.0-107.jar TestDFSIO -clean
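The required ordering (write, then read, then clean) can be captured in a small driver so the read test never runs against missing files. This is a sketch only: `JAR`, `dfsio_plan`, and `run_plan` are illustrative names, the jar path is the one used in the commands above, and the result file is passed through `-resFile` as listed in the TestDFSIO usage line. With `dry_run=True` it only prints the commands.

```python
import os
import shlex
import subprocess
from typing import List

# subprocess does not go through a shell, so expand $HADOOP_HOME here.
# Assumed jar name (the one used above); adjust for your installation.
JAR = os.path.expandvars("$HADOOP_HOME/hadoop-test-1.2.0.1.3.0.0-107.jar")


def dfsio_plan(jar: str, nr_files: int, file_size_mb: int,
               res_file: str) -> List[List[str]]:
    """Return TestDFSIO argv lists in the required order: write, read, clean."""
    common = ["-nrFiles", str(nr_files), "-fileSize", str(file_size_mb),
              "-resFile", res_file]
    return [
        ["hadoop", "jar", jar, "TestDFSIO", "-write", *common],
        # The read test reads the files produced by the write test
        ["hadoop", "jar", jar, "TestDFSIO", "-read", *common],
        ["hadoop", "jar", jar, "TestDFSIO", "-clean"],
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also execute it."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
```

Usage: `run_plan(dfsio_plan(JAR, 10, 1000, "/tmp/TestDFSIO_log.txt"))` prints the three commands; pass `dry_run=False` on a node with the `hadoop` client to run them.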

NameNode benchmark:

 

Create 1000 files with 12 mappers and 6 reducers:

 

hadoop jar hadoop-test.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
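The backticked `hostname -s` in the command above is shell substitution for the short host name. If you script the benchmark, the same command can be assembled as below; this is a sketch, `short_hostname` and `nnbench_argv` are hypothetical helpers, and every flag value is copied from the command above.

```python
import socket
from typing import List


def short_hostname() -> str:
    """Python equivalent of `hostname -s`: the host name up to the first dot."""
    return socket.gethostname().split(".")[0]


def nnbench_argv(maps: int = 12, reduces: int = 6, files: int = 1000,
                 host: str = "") -> List[str]:
    """Build the nnbench create_write command above as an argv list."""
    host = host or short_hostname()
    return [
        "hadoop", "jar", "hadoop-test.jar", "nnbench",
        "-operation", "create_write",
        "-maps", str(maps), "-reduces", str(reduces),
        "-blockSize", "1", "-bytesToWrite", "0",
        "-numberOfFiles", str(files),
        "-replicationFactorPerFile", "3",
        "-readFileAfterOpen", "true",
        # Including the host name presumably keeps runs launched from
        # different client machines in separate base directories
        "-baseDir", f"/benchmarks/NNBench-{host}",
    ]
```

The resulting list can be passed directly to `subprocess.run`, which avoids relying on shell backtick substitution.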

MapReduce benchmark:

 

mrbench runs a small job many times in a row; it checks whether small jobs on the cluster run repeatably and efficiently.

 

Run a small job 50 times:

 

hadoop jar hadoop-test.jar mrbench -numRuns 50

testipc and testrpc:

hadoop jar hadoop-test.jar testipc

hadoop jar hadoop-test.jar testrpc

 

PS: Choose and size the command parameters according to your hardware.

Fixes for some common errors:

Output directory already exists: delete the target directory, then rerun the command.

Java heap size too small: increase the relevant heap settings, or configure more map and reduce tasks before running the job.
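The "output directory already exists" fix can be automated when rerunning jobs. A minimal sketch, assuming Hadoop 1.x, where `hadoop fs -rmr` deletes a directory recursively (newer releases use `hadoop fs -rm -r`); `rerun_with_clean_output` is a hypothetical helper, and with `dry_run=True` it only returns the planned commands.

```python
import subprocess
from typing import List


def rerun_with_clean_output(job_argv: List[str], output_dir: str,
                            dry_run: bool = True) -> List[List[str]]:
    """Delete an existing output directory, then rerun the job.

    Assumes Hadoop 1.x, where `hadoop fs -rmr` removes a directory
    recursively.
    """
    steps = [["hadoop", "fs", "-rmr", output_dir], job_argv]
    if not dry_run:
        # The delete fails if the directory does not exist; ignore that case
        subprocess.run(steps[0], check=False)
        subprocess.run(steps[1], check=True)
    return steps
```

For example, passing the loadgen command above together with `/test/output/` removes the old output before the job starts again. Double-check the path first, since the delete is recursive.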
