Tip: to view the screenshots in high resolution, open this article on a phone and tap an image to enlarge it.

1. Problem Description

Hive queries that run as MapReduce jobs fail, with the following output:

0: jdbc:hive2://localhost:10000>select count(*) from student;

command(queryId=hive_20170902081616_d676f921-c62c-4fac-84b9-272663a2fca0); Time taken: 10.029 seconds

Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

0: jdbc:hive2://localhost:10000>

[Figure 1]

Plain MapReduce jobs fail as well (here the bundled pi example), with the following log:

[root@ip-172-31-6-148 hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar pi 5 5
...
Diagnostics: Exception from container-launch.
Container id: container_1504338960864_0005_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
        at org.apache.hadoop.util.Shell.run(Shell.java:504)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
17/09/02 08:19:36 INFO mapreduce.Job: Counters: 0
Job Finished in 8.452 seconds
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-6-148:8020/user/root/QuasiMonteCarlo_1504340365604_1994724640/out/reduce-out
        at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
        at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1820)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1844)
        at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
        at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
[root@ip-172-31-6-148 hadoop-mapreduce]# 

[Figure 2]

The job logs cannot be viewed on the JobHistory web page either:

[Figure 3]

2. Problem Analysis

1. Check the YARN ResourceManager log. Containers could not be launched, with the following exception:

Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
        at org.apache.hadoop.util.Shell.run(Shell.java:504)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
…
Container id: container_1504341269835_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
        at org.apache.hadoop.util.Shell.run(Shell.java:504)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

[Figure 4]

2. Check the NodeManager log on the worker node. The exception is as follows:

2017-09-02 08:37:35,317 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1504341269835_0001_01_000001 and exit code: 1
ExitCodeException exitCode=1: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
        at org.apache.hadoop.util.Shell.run(Shell.java:504)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2017-09-02 08:37:35,326 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2017-09-02 08:37:35,326 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1504341269835_0001_01_000001

[Figure 5]

3. Check the JobHistory Server log:

2017-09-02 08:40:31,676 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
2017-09-02 08:40:32,880 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:PROXY) via mapred (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /user/root/.staging/job_1504341269835_0001/job_1504341269835_0001.summary
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2037)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2007)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1920)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)

2017-09-02 08:40:32,882 WARN org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService: Could not process job files
java.io.FileNotFoundException: File does not exist: /user/root/.staging/job_1504341269835_0001/job_1504341269835_0001.summary
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2037)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2007)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1920)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

[Figure 6]
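
The IDs in these logs are related. The container ID in the NodeManager log (container_1504341269835_0001_01_000001) embeds the cluster timestamp and application sequence number, and these also appear in the MapReduce job ID that the JobHistory Server complains about (job_1504341269835_0001). The Python sketch below derives the related IDs and staging paths from a container ID so the entries can be matched across the logs. It is written for this article and is not a Hadoop API. The function name correlate is hypothetical. The /user/<user>/.staging layout is an assumption taken from the log above; it actually depends on yarn.app.mapreduce.am.staging-dir.

```python
import re

# Container IDs in these logs look like
#   container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
# Newer Hadoop versions may add an epoch segment such as "e17_"; it is accepted and ignored.
CONTAINER_RE = re.compile(
    r"^container_(?:e\d+_)?(?P<ts>\d+)_(?P<app>\d+)_(?P<attempt>\d+)_(?P<seq>\d+)$"
)


def correlate(container_id: str, user: str) -> dict:
    """Map a YARN container ID to its application ID, MapReduce job ID and staging paths.

    Assumption: the staging directory is /user/<user>/.staging/<job_id>, as in the
    JobHistory Server log of this cluster (configurable via yarn.app.mapreduce.am.staging-dir).
    """
    m = CONTAINER_RE.match(container_id)
    if m is None:
        raise ValueError(f"not a container ID: {container_id!r}")
    suffix = f"{m['ts']}_{m['app']}"
    job_id = f"job_{suffix}"
    staging_dir = f"/user/{user}/.staging/{job_id}"
    return {
        "application_id": f"application_{suffix}",
        "attempt": int(m["attempt"]),
        # Container sequence 1 of each attempt is normally the ApplicationMaster.
        "is_am_container": int(m["seq"]) == 1,
        "job_id": job_id,
        "staging_dir": staging_dir,
        "summary_file": f"{staging_dir}/{job_id}.summary",
    }


if __name__ == "__main__":
    info = correlate("container_1504341269835_0001_01_000001", "root")
    for key, value in info.items():
        print(f"{key}: {value}")
```

For the container in the NodeManager log, the derived summary_file is exactly the path that the JobHistory Server reports as missing. The failing container is attempt 1's ApplicationMaster.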

4. Check the HDFS NameNode log. The exceptions are as follows:

2017-09-02 08:37:29,445 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /user/root/.staging/job_1504341269835_0001/job.xml is closed by DFSClient_NONMAPREDUCE_478129775_1
2017-09-02 08:37:29,451 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.10.118:50010 is added to blk_1073744484_3660 size 106954
2017-09-02 08:37:35,265 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:35,265 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.31.5.190:46293 Call#5 Retry#0: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:40,188 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:40,188 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.31.10.118:49343 Call#5 Retry#0: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:41,200 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/fail/root_appattempt_1504341269835_0001_000002 is closed by DFSClient_NONMAPREDUCE_-860670620_215
2017-09-02 08:37:41,276 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073744476_3652 172.31.10.118:50010 172.31.9.33:50010 172.31.5.190:50010 

[Figure 7]
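
On a busy cluster, the NameNode log is large, so it helps to pull out the permission-denied messages mechanically. The Python sketch below does this and counts them per user, access type and inode. The regular expression was written against the message format shown above and is an assumption rather than a documented log format, so other Hadoop versions may need a different pattern. It expects each message on a single line, as it appears in the raw log file. The names Denial and find_denials are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

# Written against messages such as
#   Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
# The pattern is an assumption; adjust it if your Hadoop version words the message differently.
DENIED_RE = re.compile(
    r"Permission denied: user=(?P<user>[^,]+), access=(?P<access>\w+), "
    r'inode="(?P<inode>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>\S+)'
)


class Denial(NamedTuple):
    user: str
    access: str
    inode: str
    owner: str
    group: str
    mode: str


def find_denials(lines: Iterable[str]) -> Iterator[Denial]:
    """Yield a Denial for every 'Permission denied' message in the given log lines."""
    for line in lines:
        for m in DENIED_RE.finditer(line):
            yield Denial(**m.groupdict())


if __name__ == "__main__":
    sample = [
        "2017-09-02 08:37:35,265 WARN org.apache.hadoop.security.UserGroupInformation: "
        "PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security."
        "AccessControlException: Permission denied: user=root, access=EXECUTE, "
        'inode="/user/history":mapred:supergroup:drwxrwx---',
        "2017-09-02 08:37:41,276 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073744476_3652",
    ]
    # Count denials per (user, access, inode) to see which directory is blocking the job.
    counts = Counter((d.user, d.access, d.inode) for d in find_denials(sample))
    for (user, access, inode), n in counts.items():
        print(f"{n}x user={user} access={access} inode={inode}")
```

To scan a real log, pass an open file object (for example, the NameNode log opened with open()) to find_denials instead of the sample list. Each denial in the log above is logged twice, once as WARN and once as INFO, so the counts are doubled.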

Analysis:

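  1. The ResourceManager log did not reveal the cause.
  2. The NodeManager log did not reveal the cause either.
  3. The job logs could not be viewed in JobHistory. A MapReduce job first writes temporary log files under the submitting user's directory (in the logs above, /user/root/.staging/job_1504341269835_0001), and these files are then moved into the /user/history directory.
  4. The HDFS NameNode log shows that the job's temporary log files could not be written to /user/history. User root was denied EXECUTE (traverse) access on /user/history, which is owned by mapred:supergroup with mode drwxrwx---.
  5. Root cause: the HDFS /user/history directory was too restrictive. root is neither the owner (mapred) nor, evidently, a member of supergroup, so it falls into the "other" class, which has no permissions (---). HDFS does not treat root as a superuser (the superuser is the user that runs the NameNode, hdfs here). As a result, the YARN job history could not be recorded and the MapReduce jobs failed.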
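
The reasoning in steps 4 and 5 follows the HDFS permission model. For any inode, exactly one class is checked: owner if the user owns the inode, otherwise group if the user belongs to the inode's group, otherwise other. EXECUTE is needed on a directory to traverse it. The sketch below is a simplified Python model of that check, not the NameNode's actual code: it ignores ACLs and sticky-bit delete rules. The function name hdfs_allows is hypothetical. The superuser name hdfs and superuser group supergroup are assumed defaults. root's group list is also an assumption, because HDFS resolves group membership on the NameNode host.

```python
from typing import Iterable

# Index of the first r/w/x character of each permission class in a mode string
# such as "drwxrwx---" (index 0 is the file-type character).
CLASS_OFFSET = {"owner": 1, "group": 4, "other": 7}
ACTION_INDEX = {"READ": 0, "WRITE": 1, "EXECUTE": 2}


def hdfs_allows(user: str, user_groups: Iterable[str], access: str,
                owner: str, group: str, mode: str,
                superuser: str = "hdfs", superusergroup: str = "supergroup") -> bool:
    """Simplified model of an HDFS permission check on a single inode.

    Ignores ACLs and sticky-bit delete rules. Exactly one class is checked:
    owner if the user owns the inode, otherwise group if the user is in the
    inode's group, otherwise other.
    """
    if len(mode) < 10:
        raise ValueError(f"unexpected mode string: {mode!r}")
    groups = set(user_groups)
    # The NameNode user and members of the superuser group bypass all checks.
    if user == superuser or superusergroup in groups:
        return True
    if user == owner:
        cls = "owner"
    elif group in groups:
        cls = "group"
    else:
        cls = "other"
    ch = mode[CLASS_OFFSET[cls] + ACTION_INDEX[access]]
    # "-" means the bit is unset and "T" is sticky without execute;
    # "r", "w", "x" or "t" means the requested bit is set.
    return ch not in ("-", "T")


if __name__ == "__main__":
    # From the NameNode log: inode="/user/history":mapred:supergroup:drwxrwx---
    # Assumption: on the NameNode host, root belongs only to the group "root".
    before = hdfs_allows("root", ["root"], "EXECUTE", "mapred", "supergroup", "drwxrwx---")
    # After the fix described below: mapred:hadoop with mode 777.
    after = hdfs_allows("root", ["root"], "EXECUTE", "mapred", "hadoop", "drwxrwxrwx")
    print(f"before fix: {before}, after fix: {after}")  # before fix: False, after fix: True
```

The model makes the design of the check visible. Because only one class is evaluated, adding permissions to the group bits would not have helped root here. Only the "other" bits, group membership, or ownership matter for it.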

3. Solution

Change the permissions and the owner of the /user/history directory:

sudo -u hdfs hdfs dfs -chmod 777 /user/history
sudo -u hdfs hdfs dfs -chown mapred:hadoop /user/history
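
For reference, the octal modes correspond to the symbolic strings HDFS prints as follows. The original drwxrwx--- is 770, and 777 adds rwx for the "other" class, which is what lets root traverse the directory. The small Python helper below renders an octal mode as a symbolic string; its name, symbolic_mode, is hypothetical. It also handles 1777, an alternative commonly used for shared directories. 1777 adds the sticky bit, which, according to the HDFS permissions guide, restricts deleting or moving files in the directory to the superuser, the directory owner or the file owner. Whether 1777 is preferable to 777 for this cluster is a judgment call not covered by the original article.

```python
def symbolic_mode(octal: str, is_dir: bool = True) -> str:
    """Render an octal mode such as '777' or '1777' the way `hdfs dfs -ls` prints it."""
    value = int(octal, 8)
    if not 0 <= value <= 0o1777:
        raise ValueError(f"unsupported mode: {octal!r}")
    chars = []
    # Owner, group and other bits, from the most significant triplet down.
    for shift in (6, 3, 0):
        bits = (value >> shift) & 0o7
        chars.append("r" if bits & 4 else "-")
        chars.append("w" if bits & 2 else "-")
        chars.append("x" if bits & 1 else "-")
    if value & 0o1000:
        # The sticky bit is shown in the "other execute" slot: t if x is set, T if not.
        chars[-1] = "t" if chars[-1] == "x" else "T"
    return ("d" if is_dir else "-") + "".join(chars)


if __name__ == "__main__":
    for octal in ("770", "777", "1777"):
        print(octal, symbolic_mode(octal))
```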

Before the change:

[Figure 8]

After the change, the log data is written normally and MapReduce jobs run successfully:

[Figure 9]


Follow Hadoop实操 for timely, hands-on Hadoop content. If you find it useful, please follow and share.

Original article. Reposting is welcome; please credit: reposted from the WeChat official account Hadoop实操.