Handling the "OOM: GC overhead limit exceeded" error

*Keywords: OOM: GC overhead limit exceeded, spark.driver.memory, spark.sql.shuffle.partitions*
Symptom:
No matter how the resources were adjusted, the job kept failing with the following error:
18/07/26 17:02:03 INFO spark.ContextCleaner: Cleaned accumulator 18
Exception in thread “broadcast-hash-join-1” 18/07/26 17:10:59 WARN nio.NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded

18/07/26 15:11:04 INFO spark.ContextCleaner: Cleaned accumulator 18
Exception in thread “broadcast-hash-join-1” java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.copy(UnsafeRow.java:537)
at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:403)
at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:128)
at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:92)
at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:82)
at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:90)
at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:82)
at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:82)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Searching online turned up two suggestions, both amounting to raising executor.memory or driver.memory when running in yarn-standalone mode (most advise going from 512m or 1g up to 2g). On the cluster where this happened, the driver.memory specified for the job was already 3g (the cluster capped driver.memory at 3g), yet the error persisted, and repeatedly rewriting the SQL and tuning other settings did not help either. So we proposed raising the cluster's driver.memory limit.
The cluster configuration was then changed: spark.driver.memory in SPARK_HOME/conf/spark-defaults.conf was raised to 5g. After that, submitting the job with spark-submit and a driver memory of 5g allowed it to run to completion.
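For reference, a minimal sketch of the two changes, assuming a YARN deployment; the property name, config file, and spark-submit flags are standard Spark, while the job class and jar names are placeholders:

```bash
# In SPARK_HOME/conf/spark-defaults.conf (cluster-wide default):
#   spark.driver.memory    5g

# Or per job at submit time. In client mode, driver memory must be set here
# (or in spark-defaults.conf), because the driver JVM is already running
# before any SparkConf set in application code is read.
spark-submit \
  --master yarn \
  --driver-memory 5g \
  --class com.example.MyJob \
  my-job.jar            # com.example.MyJob and my-job.jar are placeholders
```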

In addition, while running the job we noticed that the cluster was writing out a rather large number of files. At the time, spark.sql.shuffle.partitions and spark.default.parallelism were set to 1-3 times the product of spark.executor.cores and spark.executor.instances. After repeated tests, the file count turned out to be directly tied to spark.sql.shuffle.partitions: a single write to Hive typically produces roughly that many files (plus one or two). Lowering this parameter reduced the number of files accordingly; see the sketch below. The exact reason was not clear to me at the time, so explanations from anyone who knows the details are welcome.
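A rough illustration of the relationship described above; the executor counts and partition numbers are made up for the example, and only the property names come from this post:

```bash
# Hypothetical sizing: 10 executors x 4 cores = 40 concurrent tasks,
# so parallelism / shuffle partitions are set to 2x that product (80).
# Observation from this post: each shuffled write to Hive produces roughly
# spark.sql.shuffle.partitions output files, so lowering this value
# directly lowers the number of files written.
spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-cores 4 \
  --conf spark.default.parallelism=80 \
  --conf spark.sql.shuffle.partitions=80 \
  my-job.jar
```

A likely explanation for the file count: when the write to Hive is preceded by a shuffle (a join or aggregation), the final stage runs one task per shuffle partition and each task writes its own output file, so the number of files tracks spark.sql.shuffle.partitions.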
