HiBench是一款用于hadoop集群性能测试的开源工具。支持MR,HIVE,SPARK等计算框架,且支持多种维度的测试。早在HiBench5.0的时候作者就用过,个人感觉还不错,无论是apache hadoop,cdh还是hdp都可以支持。
最新发现,HiBench 7.x 版本发布了。因此就趁着机会,结合新版本总结一下使用中遇到的问题,助大家过坑。
HiBench的github地址如下:https://github.com/intel-hadoop/HiBench
OS:CentOS 7.3
Hadoop: HDP 2.6.5.x
HiBench: 7.1
使用maven命令即可,具体请见:https://github.com/intel-hadoop/HiBench/blob/master/docs/build-hibench.md
问题描述:
maven构建过程中报以下错误(以mahout为例):
INFO] --- download-maven-plugin:1.2.0:wget (default) @ mahout ---
Downloading: http://archive.apache.org/dist/mahout/0.11.0//apache-mahout-distribution-0.11.0.tar.gz
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
at com.googlecode.WGet.doGet(WGet.java:293)
at com.googlecode.WGet.execute(WGet.java:223)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
at com.googlecode.WGet.doGet(WGet.java:293)
at com.googlecode.WGet.execute(WGet.java:223)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[WARNING] Could not get content
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
at com.googlecode.WGet.doGet(WGet.java:293)
at com.googlecode.WGet.execute(WGet.java:223)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[WARNING] Retrying (1 more)
Downloading: http://archive.apache.org/dist/mahout/0.11.0//apache-mahout-distribution-0.11.0.tar.gz
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
at com.googlecode.WGet.doGet(WGet.java:293)
at com.googlecode.WGet.execute(WGet.java:223)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
at com.googlecode.WGet.doGet(WGet.java:293)
at com.googlecode.WGet.execute(WGet.java:223)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[WARNING] Could not get content
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
at com.googlecode.WGet.doGet(WGet.java:293)
at com.googlecode.WGet.execute(WGet.java:223)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
层主在构建"hadoopbench-sql","mahout"和"nutchindexing"时都遇到了类似问题。
问题原因:
网站访问协议更换导致的对应文件下载异常。
解决方法:
方式1. 将pom.xml中的"http"修改为"https",例如:
https://archive.apache.org
dist/hive/hive-0.14.0/apache-hive-0.14.0-bin.tar.gz
方式2. 将对应文件在"https://archive.apache.org/dist"此源中下载至本地http,再修改pom.xml中的文件路径,例如:
http://private.repo.io/archive.apache.org
dist/hive/hive-0.14.0/apache-hive-0.14.0-bin.tar.gz
第一种方式最简单。如果网速不给力,请用第二种方法。
具体不同的测试请参见:https://github.com/intel-hadoop/HiBench/blob/master/docs/ 中的"run-*.md"
===================================================================
问题描述:
执行HiBench命令报错:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/var/tmp":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:325)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:246)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1934)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1917)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4185)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1109)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:645)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:610)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:290)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:202)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:184)
at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3099)
... 21 more
问题原因:
HiBench在使用过程中,会需要在HDFS下建立目录,如果默认使用root用户进行测试,且HDFS开启简单权限模式的话(hdfs为超级用户),自然会权限限制。
解决方法:
预先创建以下目录,并修改权限(暴力点可以直接给777,或是确定进行测试用户具备相应目录的写权限):
注:部分目录可通过conf/hibench.conf进行修改,以下目录参考是以默认配置文件为基准的。
/HiBench
/var
/user/root # root是作者用来测试的用户
===================================================================
问题描述:
使用默认的HiBench的运行命令执行spark作业测试,例如:
bin/workloads/sql/scan/spark/run.sh
该spark作业的日志将不会出现在job history server中。
问题原因:
HiBench的作业默认使用的是自生成的spark.conf而不是使用实际环境下的spark-defaults.conf。
解决方法:
首先,根据实际环境下的spark-defaults.conf找到以下用于规定日志输出的三个参数:
spark.eventLog.enabled
spark.yarn.historyServer.address
spark.eventLog.dir
将上述三个参数拼接成用于spark-submit的conf信息:
--conf "spark.eventLog.enabled=true" --conf "spark.yarn.historyServer.address=node1.com:18081" --conf "spark.eventLog.dir=hdfs:///spark2-history/"
修改HiBench的"bin/functions/workload_functions.sh"文件,将上述conf信息添加至提交命令中,大概在216行(修改前请备份):
修改前:
if [[ "$CLS" == *.py ]]; then
LIB_JARS="$LIB_JARS --jars ${SPARKBENCH_JAR}"
SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${CLS} $@"
else
SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${SPARKBENCH_JAR} $@"
fi
修改后:
if [[ "$CLS" == *.py ]]; then
LIB_JARS="$LIB_JARS --jars ${SPARKBENCH_JAR}"
SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --conf "spark.eventLog.enabled=true" --conf "spark.yarn.historyServer.address=node1.com:18081" --conf "spark.eventLog.dir=hdfs:///spark2-history/" --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${CLS} $@"
else
SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --conf "spark.eventLog.enabled=true" --conf "spark.yarn.historyServer.address=node1.com:18081" --conf "spark.eventLog.dir=hdfs:///spark2-history/" --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${SPARKBENCH_JAR} $@"
fi
===================================================================
问题描述:
执行以下命令将在HDFS生成的种子数据发往Kafka:
bin/workloads/streaming/identity/prepare/dataGen.sh
发生异常:
====================================================
log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.producer.ProducerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "pool-1-thread-1" java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:144)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:579)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:524)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at com.intel.hibench.datagen.streaming.util.SourceFileReader.getReader(SourceFileReader.java:33)
at com.intel.hibench.datagen.streaming.util.CachedData.(CachedData.java:56)
at com.intel.hibench.datagen.streaming.util.CachedData.getInstance(CachedData.java:43)
at com.intel.hibench.datagen.streaming.util.KafkaSender.(KafkaSender.java:55)
at com.intel.hibench.datagen.streaming.DataGenerator$DataGeneratorJob.run(DataGenerator.java:116)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: mycluster
... 20 more
问题原因:
作者本人的hadoop集群的HDFS是采用高可用部署的,namenode为active/standby模式,拥有一个namespace叫"mycluster",因此默认在HiBench的conf/hadoop.conf中的配置信息为:
# The root HDFS path to store HiBench data
hibench.hdfs.master hdfs://mycluster
具体原因应该出自数据传输的实现,请见"com.intel.hibench.datagen.streaming.DataGenerator"。目测是该实现不支持namespace的访问方式,本人没有细究,有兴趣的可以去看下上述类的源码。
解决方法:
将上述conf/hadoop.conf中的配置信息修改为指定的namenode信息即可,例如:
# The root HDFS path to store HiBench data
hibench.hdfs.master hdfs://node1.com:8020
===================================================================
很多默认的HiBench的测试结果文件都是以SequenceFile进行输出的(mahout必须是SequenceFile),具体请见各个作业的执行脚本,例如Wordcount的run.sh(截取部分):
run_hadoop_job ${HADOOP_EXAMPLES_JAR} wordcount \
-D mapreduce.job.maps=${NUM_MAPS} \
-D mapreduce.job.reduces=${NUM_REDS} \
-D mapreduce.inputformat.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat \
-D mapreduce.outputformat.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat \
-D mapreduce.job.inputformat.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat \
-D mapreduce.job.outputformat.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat \
${INPUT_HDFS} ${OUTPUT_HDFS}
END_TIME=`timestamp`
为了方便阅读或是结果数据的后续使用,可通过如下方式进行转化:
使用FS SHELL进行文本输出:
sudo -u hdfs hadoop fs -text /path/xxx
使用MR(转载)实现文件转换:
https://blog.csdn.net/sinat_29508201/article/details/49155127
使用Spark(pyspark):
# SparkSession available as 'spark'
reader=sc.sequenceFile("xxxx",keyClass="org.apache.hadoop.io.Text",valueClass="org.apache.hadoop.io.IntWritable")
# 直接打印出来
def out(x):
print x
reader.foreach(out)
# 转为sql
df=spark.read.json(reader.map(lambda x: x))