Tuning common Spark configuration parameters

Spark parameter tuning

(spark.sql.hive.metastore.version,1.2.1)


III. ERRORS

Problem 1:

ERROR YarnScheduler: Lost executor 53 on node100p32: Container killed by YARN for exceeding memory limits.
10.0 GB of 10 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
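To see why YARN kills the container, it helps to spell out the arithmetic. A minimal sketch, assuming Spark 2.x's default overhead formula (max of 384 MB and 10% of executor memory): the container YARN reserves is executor heap plus overhead, and once the process touches more than that, the NodeManager kills it.

```python
# Sketch of the memory check YARN applies to each executor container.
# Assumption: overhead defaults to max(384 MB, 10% of executor memory)
# unless spark.yarn.executor.memoryOverhead is set explicitly.
MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10

def container_request_mb(executor_memory_mb, memory_overhead_mb=None):
    """Total memory YARN reserves for one executor container (MB)."""
    if memory_overhead_mb is None:
        memory_overhead_mb = max(MIN_OVERHEAD_MB,
                                 int(executor_memory_mb * OVERHEAD_FACTOR))
    return executor_memory_mb + memory_overhead_mb

# A 9 GB heap plus default overhead is 9216 + 921 = 10137 MB: right at
# the 10 GB limit from the error above, so any off-heap spike kills it.
print(container_request_mb(9216))        # 10137
print(container_request_mb(9216, 2048))  # 11264 -> needs a bigger container
```

Raising spark.yarn.executor.memoryOverhead grows the container instead of letting off-heap usage (netty buffers, JVM metaspace, etc.) push past the limit.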

Solution:

Temporary workaround — switch to Hive, capping the reducer count at 750. Peak allocated memory was about 3.3 TB, and the 20 jobs finished in exactly 8 hours.

-- set mapreduce.map.memory.mb=3000;
-- set mapreduce.reduce.memory.mb=6000;
set hive.hadoop.supports.splittable.combineinputformat=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.max.split.size=512000000;
set mapred.min.split.size.per.node=128000000;
set mapred.min.split.size.per.rack=128000000;
set hive.merge.mapfiles=true;
set hive.map.aggr=true;
set hive.merge.smallfiles.avgsize=128000000;
set hive.exec.reducers.max=750;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1500;
set hive.exec.max.dynamic.partitions.pernode=1500;
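The split-size settings above control how CombineHiveInputFormat packs small files into map splits, which is what keeps the map-task count (and thus memory pressure) down. A rough model of the effect (an illustration, not Hive's exact packing algorithm):

```python
# Rough model of combine-input-format splitting: files are packed into
# splits of at most mapred.max.split.size bytes, so many small files
# produce far fewer map tasks than one-task-per-file would.
def estimate_map_tasks(file_sizes, max_split=512_000_000):
    splits, current = 0, 0
    for size in file_sizes:
        if current + size > max_split and current > 0:
            splits += 1       # close the current split, start a new one
            current = 0
        current += size
    return splits + (1 if current > 0 else 0)

# 1000 files of 10 MB each combine into ~20 splits instead of 1000 maps
print(estimate_map_tasks([10_000_000] * 1000))
```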

Root-cause analysis (to be written):

set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=6000;
set spark.yarn.executor.memoryOverhead
set yarn.nodemanager.vmem-check-enabled
set hive.groupby.skewindata=true;
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=5000000;

I. Common Spark configuration

1.spark-sql:

spark-sql --name "$0" \
--master yarn --deploy-mode client --queue deve \
--driver-memory 4g --executor-memory 6g --num-executors 50 --executor-cores 3 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.minExecutors=20 \
--conf spark.dynamicAllocation.maxExecutors=56 \
--conf spark.sql.adaptive.enabled=true \
--conf spark.sql.adaptive.maxNumPostShufflePartitions=500 \
--conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=256000000 \
--conf spark.yarn.executor.memoryOverhead=1200m \
-i /opt/data/dev/util/spark_com.sql \
--hiveconf hive.cli.print.header=true \
--hiveconf hive.resultset.use.unique.column.names=false \
--conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/data/dev/spark/log4j.properties' \
-v -e \
" ${sql_query_insert} "

2.spark-submit:

spark-submit --master yarn --queue deve \
--driver-memory 6G --executor-memory 7G --num-executors 32 --executor-cores 3 \
--conf spark.yarn.executor.memoryOverhead=8096M \
--conf spark.sql.shuffle.partitions=1000 \
--conf spark.default.parallelism=150 \
--conf spark.shuffle.service.enabled=true \
--conf spark.shuffle.service.port= \
--class com.ecnomic.test \
/package/package.jar 2 2020-10-01 2020-10-01 > /log.log 2>&1

3. Loading UDFs (Hive UDFs need not be thread-safe; Spark UDFs must be thread-safe):

Method 1. Init file: spark-sql -i /opt/data/dev/util/spark_com.sql
Method 2. source: source /opt/data/dev/util/spark_com.sql;

Example:
add jar /opt/data/lib/udf.jar;
create temporary function udf_date_format as 'com.hive.udf.DateFormat';
spark/hive -e "source /opt/data/dev/util/spark_com.sql; select * from table_test limit 5;"


II. Resource tuning

mapreduce.map.memory.mb=3000  memory allotted to each map task of the MapReduce job
mapreduce.reduce.memory.mb=6000  memory allotted to each reduce task
spark.yarn.executor.memoryOverhead=6000  helps resolve OOM; sizes the off-heap memory that covers the JVM's own overhead
spark.shuffle.service.enabled=true  a long-running auxiliary service inside the NodeManager that improves shuffle performance; default false (disabled).
    (1). When an application with a shuffle stage runs, the executor process not only runs tasks but also writes shuffle data and serves it to other executors. When an executor is overloaded and stalls in GC, it cannot serve shuffle data to other executors, which delays the job.
    (2). The external shuffle service is an auxiliary service that lives inside the NodeManager process. Fetching shuffle data through it offloads the executors, so even while an executor is in GC, tasks on other executors are unaffected.
        
Reference: https://blog.csdn.net/zuodaoyong/article/details/107172810 (Spark shuffle parameter tuning)

1. Adaptive execution framework

spark.sql.adaptive.enabled  switch for the adaptive execution framework; default false. Enabling it turns on Adaptive Execution, which sets the number of shuffle reducers automatically.
spark.sql.adaptive.minNumPostShufflePartitions  default 1; lower bound of the reducer-count range.
spark.sql.adaptive.maxNumPostShufflePartitions  default 500; upper bound of the reducer-count range.
spark.sql.adaptive.shuffle.targetPostShuffleInputSize  default 67108864 (64 MB); the per-partition size target used to adjust the reducer count dynamically, i.e. the amount of data each reducer should read. At 64 MB, each reduce task processes at least 64 MB; it is usually set to the cluster's block size.
spark.sql.adaptive.shuffle.targetPostShuffleRowCount  default 20000000; the per-partition row-count target used to adjust the reducer count dynamically. At 20000000, each reduce task processes at least 20,000,000 rows.
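The interaction of these three settings can be sketched as follows. This is a simplification under stated assumptions: real adaptive execution coalesces contiguous shuffle partitions per stage, while this just divides total shuffle bytes by the target size and clamps to the min/max range.

```python
# Simplified model of how adaptive execution picks the post-shuffle
# reducer count from the target input size and the min/max bounds.
def post_shuffle_partitions(total_shuffle_bytes,
                            target_size=67_108_864,   # 64 MB default
                            min_parts=1, max_parts=500):
    wanted = -(-total_shuffle_bytes // target_size)   # ceiling division
    return max(min_parts, min(max_parts, wanted))

print(post_shuffle_partitions(10 * 1024**3))   # 10 GB / 64 MB -> 160 reducers
# Raising the target (as in the spark-sql example: 256 MB) cuts reducers:
print(post_shuffle_partitions(10 * 1024**3, target_size=256_000_000))
```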
Reference: https://blog.csdn.net/qq_14950717/article/details/105302842 (Spark SQL adaptive execution framework)

2. Dynamic resource allocation:

spark.dynamicAllocation.enabled  whether to enable dynamic allocation, which grows or shrinks the executor pool based on workload; default false.
spark.shuffle.service.enabled=true  prerequisite: dynamic allocation requires the external shuffle service.
spark.dynamicAllocation.minExecutors  minimum number of executors under dynamic allocation, requested at startup; default 0.
spark.dynamicAllocation.maxExecutors  maximum number of executors under dynamic allocation (default: infinity, i.e. unbounded; ## to be verified).
spark.dynamicAllocation.initialExecutors  initial number of executors; defaults to spark.dynamicAllocation.minExecutors. If --num-executors is set to a larger value, that value is used as the initial executor count.
spark.dynamicAllocation.executorIdleTimeout  an executor idle longer than this is killed; default 60s.
spark.dynamicAllocation.cachedExecutorIdleTimeout  an executor holding cached data that has been idle this long is removed; default: infinity.
spark.dynamicAllocation.schedulerBacklogTimeout  interval between executor requests while tasks are queued and resources are short; default 1s.
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout  like schedulerBacklogTimeout, but governs the follow-up requests after the first one; defaults to schedulerBacklogTimeout.
Reference: https://blog.csdn.net/zyzzxycj/article/details/82256893
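The interplay between minExecutors, initialExecutors and --num-executors can be sketched as follows — an assumed reading of the rules above (the largest of the three wins as the starting count), not a transcription of Spark's internal code:

```python
# Sketch of how the starting executor count is resolved when dynamic
# allocation is enabled, per the parameter descriptions above.
def initial_executor_count(min_executors=0,
                           initial_executors=None,
                           num_executors=None):
    # initialExecutors defaults to minExecutors when unset
    candidates = [min_executors,
                  initial_executors if initial_executors is not None
                  else min_executors]
    if num_executors is not None:
        candidates.append(num_executors)     # --num-executors, if larger, wins
    return max(candidates)

# With the spark-sql example above: minExecutors=20, --num-executors 50
print(initial_executor_count(min_executors=20, num_executors=50))  # 50
```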

3. Data skew

spark.sql.adaptive.enabled  default false; switch for the adaptive execution framework.
spark.sql.adaptive.skewedJoin.enabled  default false; switch for skewed-join handling.
spark.sql.adaptive.skewedPartitionFactor  default 10; a partition is treated as skewed only if its size exceeds this factor times the median partition size and also exceeds spark.sql.adaptive.skewedPartitionSizeThreshold, or its row count exceeds this factor times the median partition row count and also exceeds spark.sql.adaptive.skewedPartitionRowCountThreshold.
spark.sql.adaptive.skewedPartitionSizeThreshold  default 67108864; a skewed partition's size cannot be below this value. Tune it with the HDFS compression codec and storage format (ORC, Parquet, etc.) in mind.
spark.sql.adaptive.skewedPartitionRowCountThreshold  default 10000000; a skewed partition's row count cannot be below this value.
spark.shuffle.statistics.verbose  default false; when enabled, MapStatus collects per-partition row counts, which skew handling relies on.

Reference: https://blog.csdn.net/qq_14950717/article/details/105302842 (Spark SQL adaptive execution framework)
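The skew test described above can be sketched as a predicate. This is an assumed reading of the rule (size must exceed both the factor-times-median bound and the absolute threshold, or the same pair of conditions must hold for row count), not Spark's internal implementation:

```python
# Sketch of the skewed-partition test from the parameters above.
from statistics import median

def is_skewed(sizes, rows, idx,
              factor=10,
              size_threshold=67_108_864,     # skewedPartitionSizeThreshold
              row_threshold=10_000_000):     # skewedPartitionRowCountThreshold
    size_skew = (sizes[idx] > factor * median(sizes)
                 and sizes[idx] > size_threshold)
    row_skew = (rows[idx] > factor * median(rows)
                and rows[idx] > row_threshold)
    return size_skew or row_skew

sizes = [8_000_000] * 9 + [900_000_000]   # one 900 MB partition among 8 MB ones
rows  = [100_000] * 10
print(is_skewed(sizes, rows, 9))   # True: 900 MB > 10 * median and > 64 MB
print(is_skewed(sizes, rows, 0))   # False
```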

4. Memory management

See: https://www.iteblog.com/archives/2342.html
https://blog.csdn.net/zyzzxycj/article/details/81011540
https://my.oschina.net/freelili/blog/1853714
https://blog.yoodb.com/sugarliny/article/detail/1307

III. ERRORS (continued)

Problem 2:

WARN TaskSetManager: Lost task 90.0 in stage 17.0 (TID 8770, n20p191,
executor 136): FetchFailed(BlockManagerId(65, n20p193, 7337, None),
shuffleId=3, mapId=247, reduceId=90, message=
org.apache.spark.shuffle.FetchFailedException: Connection reset by
peer
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:554)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:485)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:64)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:…)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.sort_addToSorter_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83)
at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:811)
at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextOuterJoinRows(SortMergeJoinExec.scala:770)
at org.apache.spark.sql.execution.joins.OneSideOuterIterator.advanceStream(SortMergeJoinExec.scala:934)
at org.apache.spark.sql.execution.joins.OneSideOuterIterator.advanceNext(SortMergeJoinExec.scala:970)
at org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage6.sort_addToSorter_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage6.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:80)
at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$13.apply(RDD.scala:845)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$13.apply(RDD.scala:845)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
… 1 more

Solution:

deep sleep
