spark on yarn exitCode: -104


When running this Spark job, the task would die some time after each launch: sometimes within an hour, sometimes only after two or three days. The YARN log reported the following error:

AM Container for appattempt_1554609747730_49028_000001 exited with exitCode: -104
For more detailed output, check application tracking page:http://xxx:8088/cluster/app/application_1554609747730_49028 Then, click on links to logs of each attempt.
Diagnostics: Container [pid=14954,containerID=container_e06_1554609747730_49028_01_000001] is running beyond physical memory limits. Current usage: 2.5 GB of 2.5 GB physical memory used; 4.3 GB of 12.2 TB virtual memory used. Killing container.
Dump of the process-tree for container_e06_1554609747730_49028_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 15519 14954 14954 14954 (java) 23448925 131083 4470353920 656393 /usr/lib/jvm/java-1.8.0/bin/java -server -Xmx2048m -Djava.io.tmpdir=/mnt/disk1/yarn/usercache/hadoop/appcache/application_1554609747730_49028/container_e06_1554609747730_49028_01_000001/tmp -Dlog4j.ignoreTCL=true -Dspark.yarn.app.container.log.dir=/mnt/disk4/log/hadoop-yarn/containers/application_1554609747730_49028/container_e06_1554609747730_49028_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class com.miaoke.job.online.realhouse.RealClassBeforeStuNum --jar file:/data/job/mkspark.jar --properties-file /mnt/disk1/yarn/usercache/hadoop/appcache/application_1554609747730_49028/container_e06_1554609747730_49028_01_000001/__spark_conf__/__spark_conf__.properties
|- 14954 14952 14954 14954 (bash) 5 7 115855360 358 /bin/bash -c LD_LIBRARY_PATH=/usr/lib/hadoop-current/lib/native::/usr/lib/hadoop-current/lib/native::/opt/apps/ecm/service/hadoop/2.7.2-1.2.15/package/hadoop-2.7.2-1.2.15/lib/native:/usr/lib/hadoop-current/lib/native::/opt/apps/ecm/service/hadoop/2.7.2-1.2.15/package/hadoop-2.7.2-1.2.15/lib/native:/opt/apps/ecm/service/hadoop/2.7.2-1.2.15/package/hadoop-2.7.2-1.2.15/lib/native /usr/lib/jvm/java-1.8.0/bin/java -server -Xmx2048m -Djava.io.tmpdir=/mnt/disk1/yarn/usercache/hadoop/appcache/application_1554609747730_49028/container_e06_1554609747730_49028_01_000001/tmp '-Dlog4j.ignoreTCL=true' -Dspark.yarn.app.container.log.dir=/mnt/disk4/log/hadoop-yarn/containers/application_1554609747730_49028/container_e06_1554609747730_49028_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.miaoke.job.online.realhouse.RealClassBeforeStuNum' --jar file:/data/job/mkspark.jar --properties-file /mnt/disk1/yarn/usercache/hadoop/appcache/application_1554609747730_49028/container_e06_1554609747730_49028_01_000001/__spark_conf__/__spark_conf__.properties 1> /mnt/disk4/log/hadoop-yarn/containers/application_1554609747730_49028/container_e06_1554609747730_49028_01_000001/stdout 2> /mnt/disk4/log/hadoop-yarn/containers/application_1554609747730_49028/container_e06_1554609747730_49028_01_000001/stderr
 
Container killed on request. Exit code is 143

 

Solution:

  The container's physical memory usage exceeded the configured limit: YARN's NodeManager monitors each container's memory usage and, once it detects usage above the threshold, forcibly kills the container process.
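
  For reference, the 2.5 GB limit in the log is consistent with how Spark on YARN sizes the AM/driver container: the request is the driver heap plus the memory overhead. A rough sketch, assuming the default overhead formula max(384 MB, 10% of driver memory) and a 512 MB YARN minimum allocation (both are assumptions, not taken from this cluster's config):

   container request = 2048 MB (from -Xmx2048m) + max(384 MB, 2048 MB x 10%)
                     = 2048 MB + 384 MB
                     = 2432 MB, rounded up to a 512 MB multiple = 2560 MB = 2.5 GB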

  Add the following parameters to the spark-defaults.conf file on the Spark client, or specify them with --conf when submitting the job, to increase the memoryOverhead.

   spark.yarn.driver.memoryOverhead: sets the off-heap memory size for the driver (used in cluster mode).

   spark.yarn.am.memoryOverhead: sets the off-heap memory size for the ApplicationMaster (used in client mode).
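
   In spark-defaults.conf the settings look like this (a sketch; 768m is the illustrative value used below, tune it to your job):

   spark.yarn.driver.memoryOverhead  768m
   spark.yarn.am.memoryOverhead      768m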

   Or pass them with --conf on the submit command line:

   --conf spark.yarn.driver.memoryOverhead=768m
   --conf spark.yarn.am.memoryOverhead=768m
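
   Putting it together, a full submit command might look like the following. This is only a sketch: the class, jar path, cluster deploy mode, and 2 GB driver memory are taken from the log above, and 768m is the illustrative overhead value from this post:

   spark-submit \
     --master yarn \
     --deploy-mode cluster \
     --class com.miaoke.job.online.realhouse.RealClassBeforeStuNum \
     --driver-memory 2g \
     --conf spark.yarn.driver.memoryOverhead=768m \
     /data/job/mkspark.jar

   Note: from Spark 2.3 onward this setting is named spark.driver.memoryOverhead; spark.yarn.driver.memoryOverhead is the older, now-deprecated key.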

 

Reposted from: https://www.cnblogs.com/liugui/p/10892621.html
