Troubleshooting MapReduce Problems

I have recently been debugging cross-datacenter data transfer via MapReduce. Since the MR jobs run on a separate cluster, I ran into quite a few problems along the way. This post is a short summary of those problems and their fixes, for future reference and practice.

I. How to check and troubleshoot MR failures

1. Check that the HDFS cluster is healthy by opening the NameNode web UI:

       http://namenodeip:port/dfshealth.jsp

2. Check that the ResourceManager, JobHistoryServer, and NodeManagers have started correctly.

3. Check that scheduling looks normal on the ResourceManager UI: http://resourcemanager:webport/cluster

     Note: you can run one of the example jobs bundled with MapReduce to verify that MR itself is healthy.
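For example, the examples jar that ships with Hadoop can run a tiny pi job; a minimal sketch, assuming HADOOP_HOME points at the install (the jar name varies by version, so it is globbed):

```shell
# Assumes HADOOP_HOME points at your Hadoop install; the examples jar name
# varies by version, so glob for it instead of hardcoding.
EXAMPLES_JAR=$(ls "${HADOOP_HOME:-/usr/local/hadoop}"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar 2>/dev/null | head -n 1)

if [ -n "$EXAMPLES_JAR" ]; then
  # 2 mappers, 10 samples each -- a tiny job, just enough to exercise scheduling
  hadoop jar "$EXAMPLES_JAR" pi 2 10
else
  echo "examples jar not found; check HADOOP_HOME" >&2
fi
```

If the pi job completes, the MR framework itself is fine and the problem is in your own job or its environment.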

4. Check that the access policies from the MR cluster machines to ZooKeeper, to the NameNode, and to the DataNodes are all OK.

5. Check the logs of the NodeManager the containers were scheduled onto.

6. Check the ResourceManager logs.

7. Check the details of the failed application's aggregated logs, e.g.:

    hadoop fs -cat /tmp/logs/hbaseadmin/logs/application_1538048614615_0001/10.128.167.230_46881
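If log aggregation is enabled, the same container logs can also be fetched with the YARN CLI instead of reading the files off HDFS directly (the application id below is the one from this cluster; substitute your own):

```shell
# Fetch the aggregated container logs for one application via the YARN CLI.
APP_ID=application_1538048614615_0001

if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "$APP_ID"
else
  echo "yarn CLI not on PATH; the command would be: yarn logs -applicationId $APP_ID" >&2
fi
```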

II. Errors encountered and how they were fixed

1. Jar file missing

[Figure 1: screenshot of the missing-jar error]

Cause: the required jar files did not exist on HDFS.

Fix: upload the lib directory to the corresponding HDFS path.
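A sketch of the fix, assuming the job looks for its dependencies under /user/hbaseadmin/lib on HDFS (a hypothetical path; use the one reported in your job's error message):

```shell
# Hypothetical target path -- use whatever path the job's error message reports.
HDFS_LIB_DIR=/user/hbaseadmin/lib

if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p "$HDFS_LIB_DIR"
  hadoop fs -put -f lib/*.jar "$HDFS_LIB_DIR"/   # upload the local lib directory's jars
  hadoop fs -ls "$HDFS_LIB_DIR"                  # verify they arrived
fi
```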

2. UnknownHostException

yarn-hdfsadmin-nodemanager-10.128.168.169.log:2018-09-14 19:47:55,713 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://hbase-ns-cft-history-db-ss/tmp/hadoop-yarn/staging/hbaseadmin/.staging/job_1536806516885_0007/libjars/hbase-hadoop-compat-0.98-tdw-1.6.8.jar, 1536925670561, FILE, null }, java.net.UnknownHostException: hbase-ns-cft-history-db-ss

Fix: add client-side configuration for the remote cluster's nameservice, for example (in hdfs-site.xml):

    <property>
      <name>dfs.nameservices</name>
      <value>hbase-ns-cft-mr,hbase-ns-cft-history-db-ss</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.hbase-ns-cft-history-db-ss</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.hbase-ns-cft-history-db-ss.nn2</name>
      <value>9.7.158.40:9000</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.hbase-ns-cft-history-db-ss.nn1</name>
      <value>9.7.157.26:9000</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.hbase-ns-cft-history-db-ss</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
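After adding the properties, you can check that the client actually resolves the nameservice; `hdfs getconf` reads the effective configuration, and a listing through the nameservice URI is a quick end-to-end test:

```shell
# Quick checks that the new nameservice config is actually picked up.
NS=hbase-ns-cft-history-db-ss

if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey dfs.nameservices         # should list both nameservices
  hdfs getconf -confKey "dfs.ha.namenodes.$NS"   # should print nn1,nn2
  hadoop fs -ls "hdfs://$NS/"                    # end-to-end: resolve and list the remote root
fi
```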

3. DataNode failure

ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbaseadmin (auth:SIMPLE) cause:java.io.IOException: File /tmp/logs/hbaseadmin/logs/application_1536806516885_0017/10.128.167.182_51815.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

2018-09-26 19:23:45,472 INFO org.apache.hadoop.ipc.Server: IPC Server handler 72 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.128.167.182:47057 Call#1154172 Retry#0: error: java.io.IOException: File /tmp/logs/hbaseadmin/logs/application_1536806516885_0017/10.128.167.182_51815.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

java.io.IOException: File /tmp/logs/hbaseadmin/logs/application_1536806516885_0017/10.128.167.182_51815.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

Cause: the HDFS cluster was unhealthy (as the error says, no DataNodes were running).

Fix: restore the HDFS cluster and the error goes away.
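Before digging further, it helps to confirm the "There are 0 datanode(s) running" claim from the NameNode's side; a quick check (the exact report wording varies slightly across Hadoop versions, hence the loose grep):

```shell
# Ask the NameNode how many DataNodes it currently sees.
REPORT_FILTER='Live datanodes|Dead datanodes|Datanodes available'

if command -v hdfs >/dev/null 2>&1; then
  hdfs dfsadmin -report | grep -i -E "$REPORT_FILTER" || true
fi
```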

4. Leftovers after reformatting the NameNode

2018-09-27 17:35:34,879 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService for Block pool BP-1391819990-10.128.167.196-1538040017684 (storage id DS-1807265670-10.128.169.160-9003-1463040846426) service to 10.128.167.196/10.128.167.196:9000

Cause: after the NameNode was reformatted, the DataNode data disks were not reinitialized as well, so they still referenced the old block pool.

Fix: reinitialize the DataNode data disks and remount them (mind the permissions on the data directories after remounting).
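A sketch of the recovery on one DataNode, assuming the data dir is /data/hadoop/dfs/data and the HDFS user is hdfsadmin (both are guesses; check dfs.datanode.data.dir and your deployment user). Wiping the data dir is destructive, so only do this when the old blocks are genuinely obsolete, as after a NameNode reformat:

```shell
# Hypothetical data dir -- read the real one from dfs.datanode.data.dir.
DN_DATA_DIR=/data/hadoop/dfs/data

if command -v hadoop-daemon.sh >/dev/null 2>&1; then
  hadoop-daemon.sh stop datanode
  rm -rf "${DN_DATA_DIR:?}"/*                  # destructive: drops the stale block pool
  chown -R hdfsadmin:hdfsadmin "$DN_DATA_DIR"  # restore ownership after the wipe/remount
  hadoop-daemon.sh start datanode
fi
```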

5. The configured mapreduce.jobhistory.intermediate-done-dir does not exist

2018-09-27 19:46:48,264 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Creating intermediate history logDir: [hdfs://cft-history-db-ss-hbase-nn-1.tencent-distribute.com:9000/history/log] + based on conf. Should ideally be created by the JobHistoryServer: yarn.app.mapreduce.am.create-intermediate-jh-base-dir

2018-09-27 19:46:48,280 ERROR [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Failed checking for the existance of history intermediate done directory: [hdfs://cft-history-db-ss-hbase-nn-1.tencent-distribute.com:9000/history/log]

2018-09-27 19:46:48,280 INFO [main] org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbaseadmin, access=WRITE, inode="/":hdfsadmin:supergroup:drwxr-xr-x

Cause: the directory configured as mapreduce.jobhistory.intermediate-done-dir did not exist on the MR cluster's HDFS, and (as the AccessControlException shows) the job user had no permission to create it under /.

Fix: create the directory up front with suitable permissions.
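A sketch of the fix, using the path from the JobHistoryEventHandler log above. A common setup for the intermediate done dir is wide-open permissions with the sticky bit (1777), so every job user can write its own history files there:

```shell
# Path taken from the error log above; adjust to your own configuration.
INTERMEDIATE_DONE_DIR=/history/log

if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p "$INTERMEDIATE_DONE_DIR"
  hadoop fs -chmod 1777 "$INTERMEDIATE_DONE_DIR"  # world-writable + sticky bit
fi
```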
