Viewing Spark and MapReduce logs on YARN

To view job logs through the web UI, two services must be running:
the Hadoop JobHistory Server and the Spark History Server.

etc/hadoop/mapred-site.xml

   
   
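    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>spark-master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>spark-master:19888</value>
    </property>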

yarn-site.xml


     
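    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://spark-master:19888/jobhistory/logs/</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>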

spark-defaults.conf (in the conf directory under the Spark installation directory)

spark.eventLog.enabled=true
spark.eventLog.compress=true
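# Store event logs on the local file system (a local path needs three slashes after "file:")
#spark.eventLog.dir=file:///usr/local/hadoop-2.7.6/logs/userlogs
#spark.history.fs.logDirectory=file:///usr/local/hadoop-2.7.6/logs/userlogs

# Store event logs on HDFS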
spark.eventLog.dir=hdfs://spark-master:9000/tmp/logs/root/logs
spark.history.fs.logDirectory=hdfs://spark-master:9000/tmp/logs/root/logs
spark.yarn.historyServer.address=spark-master:18080
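
The History Server can only show applications whose event logs it can read, so spark.eventLog.dir and spark.history.fs.logDirectory should point at the same location. The following is a minimal sketch of a sanity check for that, assuming the file uses the key=value or key-whitespace-value forms shown above. The function names parse_spark_defaults and check_history_config are made up for this illustration and are not part of Spark.

```python
import re


def parse_spark_defaults(text):
    """Parse spark-defaults.conf text into a dict.

    Accepts both 'key=value' and 'key value' lines; skips blanks and '#' comments.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split once on the first '=' (with optional spaces) or run of whitespace.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def check_history_config(conf):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if conf.get("spark.eventLog.enabled", "false").lower() != "true":
        problems.append("spark.eventLog.enabled is not true")
    write_dir = conf.get("spark.eventLog.dir")
    read_dir = conf.get("spark.history.fs.logDirectory")
    if write_dir != read_dir:
        problems.append(
            "spark.eventLog.dir (%s) differs from spark.history.fs.logDirectory (%s)"
            % (write_dir, read_dir)
        )
    return problems
```

To use it, read conf/spark-defaults.conf into a string, pass it to parse_spark_defaults, and print whatever check_history_config returns.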

Startup

1. First start the Hadoop JobHistory Server

[root@spark-master hadoop-2.7.6]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.7.6/logs/mapred-root-historyserver-spark-master.out

2. Start the Spark History Server

[root@spark-master spark-2.3.0]# sbin/start-history-server.sh 
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark-2.3.0/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-spark-master.out

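If the configuration is correct, then once both services have started you can open ports 18080 (Spark History Server) and 19888 (JobHistory Server) in a browser.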
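
A quick way to confirm that both web UIs are listening, without opening a browser, is a plain TCP connection test. This is only a sketch: it checks that the ports accept connections, not that the pages render correctly. The helper names port_open and check_history_uis are hypothetical.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_history_uis(host, timeout=2.0):
    """Probe the two ports used in this post.

    18080: Spark History Server web UI
    19888: MapReduce JobHistory Server web UI
    """
    return {port: port_open(host, port, timeout) for port in (18080, 19888)}
```

For the cluster in this post, check_history_uis("spark-master") would return a dict mapping each port to True or False.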

Jobs submitted through YARN report: Failed while trying to construct the redirect url to the log server. Log Server url may

1. When a job is submitted in yarn-client mode, opening http://master:8088/ shows the following error:

Error:
Aggregation is not enabled. Try the nodemanager at server-3:44981
Or see application

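2. The job is nevertheless shown as having run successfully, and its results are produced.

3. This happens because the historyserver service has not been started. It is a standalone service and is off by default. First, add the following to yarn-site.xml: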


    <property>
        <name>yarn.log.server.url</name>
        <value>http://master:19888/jobhistory/logs</value>
    </property>

4. Then add the following to mapred-site.xml. The web UI port must match the one used in yarn-site.xml, which is 19888:

    
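    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>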

5. Distribute the changed configuration files to the other nodes, for example with the following commands:

scp /usr/local/src/hadoop-2.6.5/etc/hadoop/yarn-site.xml root@slave1:/usr/local/src/hadoop-2.6.5/etc/hadoop/
scp /usr/local/src/hadoop-2.6.5/etc/hadoop/yarn-site.xml root@slave2:/usr/local/src/hadoop-2.6.5/etc/hadoop/
scp /usr/local/src/hadoop-2.6.5/etc/hadoop/mapred-site.xml root@slave1:/usr/local/src/hadoop-2.6.5/etc/hadoop/
scp /usr/local/src/hadoop-2.6.5/etc/hadoop/mapred-site.xml root@slave2:/usr/local/src/hadoop-2.6.5/etc/hadoop/
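
With more than a couple of nodes, typing one scp line per file and host gets error-prone. As a sketch, the same commands can be generated from a list of files and hosts. The function name build_scp_commands is invented for this example, and the host names and directory are simply the ones used above.

```python
import subprocess


def build_scp_commands(conf_dir, files, hosts, user="root"):
    """Return one scp argument list per (file, host) pair.

    Assumes the configuration directory has the same path on every node.
    """
    commands = []
    for name in files:
        for host in hosts:
            source = "%s/%s" % (conf_dir, name)
            target = "%s@%s:%s/" % (user, host, conf_dir)
            commands.append(["scp", source, target])
    return commands


def distribute(conf_dir, files, hosts, user="root"):
    """Run each scp command in turn; raises CalledProcessError on the first failure."""
    for cmd in build_scp_commands(conf_dir, files, hosts, user):
        subprocess.run(cmd, check=True)
```

Calling distribute("/usr/local/src/hadoop-2.6.5/etc/hadoop", ["yarn-site.xml", "mapred-site.xml"], ["slave1", "slave2"]) would run the same four copies as the commands above.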

6. On master, start the historyserver with the following command:

/usr/local/src/hadoop-2.6.5/sbin/mr-jobhistory-daemon.sh start historyserver

7. The page at http://master:19888 can now be opened.

However, clicking a logs link there produces an "Aggregation is not enabled" error. To see the log of each Map and Reduce task, log aggregation must also be set to true in yarn-site.xml.

  
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Configuration to enable or disable log aggregation</description>
  </property>

Then sync yarn-site.xml to all nodes and restart the cluster. Clicking the logs link again now shows the log of each task, including everything the job's Loggers wrote.
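
Before restarting the cluster, it can be worth confirming that yarn-site.xml really contains the two settings discussed here. The sketch below parses the standard Hadoop property layout with the Python standard library. The function names are hypothetical, and the check covers only the two properties named in this post.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return {name: value} for every <property> element in a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def log_aggregation_problems(props):
    """Return a list of problems with the log-related settings (empty if none)."""
    problems = []
    # YARN's default for this property is false, so a missing entry counts as disabled.
    if props.get("yarn.log-aggregation-enable", "false").lower() != "true":
        problems.append("yarn.log-aggregation-enable is not true")
    if not props.get("yarn.log.server.url"):
        problems.append("yarn.log.server.url is not set")
    return problems
```

Reading etc/hadoop/yarn-site.xml into a string and passing the parsed properties to log_aggregation_problems gives an empty list when both settings are present.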

That leaves one question: where are these log files stored? A look through yarn-site.xml reveals the location of the MapReduce task logs.

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
    <description>HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled. The default value is "/logs" or "/tmp/logs"</description>
  </property>

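This states clearly that the logs are stored in HDFS, not on the Linux file system. Under hdfs://namenode/logs/hadoop/logs there is one log folder per job. Each job folder here contains two files, which hold the logs of the Map and Reduce tasks (YARN writes one aggregated file per NodeManager that ran the job's containers).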
[hadoop@SXV2V999 ~]$ hdfs dfs -ls hdfs://namenode/logs/hadoop/logs/application_1430285399789_0001
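
The path in the command above follows the pattern remote-app-log-dir / user / suffix / application ID. A small sketch of that composition is shown below, assuming the Hadoop 2.x layout seen in this post and the suffix "logs" that appears in the path (newer Hadoop releases may arrange the directory differently). The function name aggregated_log_dir is hypothetical.

```python
def aggregated_log_dir(remote_root, user, app_id, suffix="logs"):
    """Build the HDFS directory holding one application's aggregated logs.

    remote_root: value of yarn.nodemanager.remote-app-log-dir, optionally
                 prefixed with the file system URI (e.g. hdfs://namenode/logs)
    user:        the user who submitted the job
    app_id:      the YARN application ID
    suffix:      directory name under the user folder (assumed to be "logs")
    """
    return "/".join([remote_root.rstrip("/"), user, suffix, app_id])


# The example from this post:
print(aggregated_log_dir("hdfs://namenode/logs", "hadoop",
                         "application_1430285399789_0001"))
```

The files are stored in an aggregated format, so for reading their contents the yarn logs -applicationId command is usually more convenient than hdfs dfs -cat.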
