Key Spark on YARN configuration

Node profile: 42 GB RAM, 12 physical cores, 1.6 TB disk per node; ~1.3 TB of data in total.

Spark version: 1.6

Per-container virtual memory limit = allocated physical container memory (at least yarn.scheduler.minimum-allocation-mb) * yarn.nodemanager.vmem-pmem-ratio
yarn.scheduler.minimum-allocation-mb defaults to 1024 MB (1 GB)
yarn.nodemanager.vmem-pmem-ratio defaults to 2.1 (a ratio, not a size)
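As a quick sketch, the virtual-memory ceiling YARN enforces per container can be computed from the two settings above (using both the defaults and the values set in yarn-site.xml later in this note):

```python
# Sketch: YARN kills a container when its virtual memory exceeds
# (allocated physical memory) * yarn.nodemanager.vmem-pmem-ratio.
def vmem_limit_mb(container_phys_mb: int, vmem_pmem_ratio: float) -> float:
    """Virtual-memory ceiling YARN enforces for one container."""
    return container_phys_mb * vmem_pmem_ratio

# Defaults (1024 MB minimum allocation, ratio 2.1):
print(vmem_limit_mb(1024, 2.1))   # ~2150.4 MB

# Values from yarn-site.xml in this note (3072 MB, 2.1):
print(vmem_limit_mb(3072, 2.1))   # ~6451.2 MB
```

This check only matters while yarn.nodemanager.vmem-check-enabled is true; the yarn-site.xml below disables it.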

Job 1 — 120 GB memory, 72 cores available:
spark-submit \
--class com.cloudbase.od.runner.PositionTraceWithLaccellDataGroup \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--driver-cores 1 \
--executor-memory 6g \
--executor-cores 4 \
--num-executors 17 \
--conf spark.storage.memoryFraction=0.2 \
--conf spark.shuffle.memoryFraction=0.5 \
--conf spark.sql.shuffle.partitions=272 \
--conf spark.default.parallelism=272 \
--conf spark.shuffle.file.buffer=512k \
--conf spark.reducer.maxSizeInFlight=96m \
--conf spark.locality.wait=20s \
--conf spark.core.connection.ack.wait.timeout=300 \
--conf spark.yarn.executor.memoryOverhead=1024 \
--conf spark.yarn.driver.memoryOverhead=1024 \
hdfs://CXGHDSJFXJM-10-242-24-1.domain.localdomain:8020/input/jars/1.6.jar \
hdfs://CXGHDSJFXJM-10-242-24-1.domain.localdomain:8020/input/conf/*    
 

Job 2 — 120 GB memory, 72 cores available:
spark-submit \
--class com.cloudbase.od.main.K_Train \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--driver-cores 1 \
--executor-memory 6g \
--executor-cores 4 \
--num-executors 17 \
--conf spark.storage.memoryFraction=0.2 \
--conf spark.shuffle.memoryFraction=0.5 \
--conf spark.sql.shuffle.partitions=272 \
--conf spark.default.parallelism=272 \
--conf spark.shuffle.file.buffer=512k \
--conf spark.reducer.maxSizeInFlight=96m \
--conf spark.locality.wait=20s \
--conf spark.core.connection.ack.wait.timeout=300 \
--conf spark.yarn.executor.memoryOverhead=1024 \
--conf spark.yarn.driver.memoryOverhead=1024 \
--jars /jxdsj/opt/modules/spark-1.6.3-bin-hadoop2.6/lib/mysql-connector-java-5.1.46-bin.jar,/jxdsj/opt/modules/spark-1.6.3-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar,/jxdsj/opt/modules/spark-1.6.3-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar,/jxdsj/opt/modules/spark-1.6.3-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar \
--files /jxdsj/opt/modules/spark-1.6.3-bin-hadoop2.6/conf/hive-site.xml \
hdfs://CXGHDSJFXJM-10-242-24-1.domain.localdomain:8020/input/jars/1.6.jar \
hdfs://CXGHDSJFXJM-10-242-24-1.domain.localdomain:8020/input/params_k_conf/

hive-site.xml

<configuration>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://CXGHDSJFXJM-10-242-24-3.domain.localdomain:3306/metastore?useSSL=false</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>jzserver</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>123456</value>
	</property>
	<property>
		<name>hive.cli.print.header</name>
		<value>true</value>
	</property>
	<property>
		<name>hive.cli.print.current.db</name>
		<value>true</value>
	</property>
	<property>
		<name>hive.metastore.uris</name>
		<value>thrift://CXGHDSJFXJM-10-242-24-3.domain.localdomain:9083</value>
	</property>
	<property>
		<name>hive.exec.dynamic.partition</name>
		<value>true</value>
	</property>
	<property>
		<name>hive.exec.dynamic.partition.mode</name>
		<value>nonstrict</value>
	</property>
</configuration>


capacity-scheduler.xml

<property>
	<name>yarn.scheduler.capacity.resource-calculator</name>
	<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
	<description>DefaultResourceCalculator only accounts for memory; DominantResourceCalculator accounts for both vcores and memory.</description>
</property>
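An illustrative sketch of the difference (not Hadoop's actual code): DominantResourceCalculator compares container requests by whichever resource takes the larger fraction of the cluster, so CPU-heavy requests are scheduled fairly instead of being counted by memory alone.

```python
# Sketch: dominant share of a container request relative to one node.
def dominant_share(mem_mb, vcores, node_mem_mb, node_vcores):
    """Larger of the memory share and the vcore share."""
    return max(mem_mb / node_mem_mb, vcores / node_vcores)

# One node per yarn-site.xml below: 34816 MB, 24 vcores.
# A 3072 MB / 4-vcore container: memory share ~8.8%, cpu share ~16.7%.
# The CPU share dominates, so vcores are what limit packing here.
share = dominant_share(3072, 4, 34816, 24)
```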


yarn-site.xml

<configuration>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>CXGHDSJFXJM-10-242-24-3.domain.localdomain</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle,spark_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
		<value>org.apache.spark.network.yarn.YarnShuffleService</value>
	</property>
	<property>
		<name>spark.shuffle.service.port</name>
		<value>7337</value>
	</property>
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.log-aggregation.retain-seconds</name>
		<value>86400</value>
	</property>
	<property>
		<name>yarn.resourcemanager.address</name>
		<value>CXGHDSJFXJM-10-242-24-3.domain.localdomain:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address</name>
		<value>CXGHDSJFXJM-10-242-24-3.domain.localdomain:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>CXGHDSJFXJM-10-242-24-3.domain.localdomain:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
		<value>1000</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>CXGHDSJFXJM-10-242-24-3.domain.localdomain:8088</value>
	</property>
	<property>
		<name>yarn.nodemanager.vmem-check-enabled</name>
		<value>false</value>
	</property>
	<property>
		<name>yarn.nodemanager.pmem-check-enabled</name>
		<value>false</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>24</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>34816</value>
	</property>
	<property>
		<name>yarn.scheduler.maximum-allocation-mb</name>
		<value>34816</value>
	</property>
	<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>3072</value>
	</property>
	<property>
		<name>yarn.nodemanager.vmem-pmem-ratio</name>
		<value>2.1</value>
	</property>
	<property>
		<name>yarn.scheduler.minimum-allocation-vcores</name>
		<value>1</value>
	</property>
	<property>
		<name>yarn.scheduler.maximum-allocation-vcores</name>
		<value>24</value>
	</property>
	<property>
		<name>yarn.scheduler.increment-allocation-mb</name>
		<value>1024</value>
	</property>
</configuration>


Total memory requested by a Spark application = (driver-memory + driver memoryOverhead) + (executor-memory + executor memoryOverhead) * num-executors, with each container rounded up to a multiple of yarn.scheduler.increment-allocation-mb.

Total cores requested = driver-cores + executor-cores * num-executors

spark.default.parallelism is usually set to executor-cores * num-executors * 3 (or * 4)
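Plugging the spark-submit values above into these formulas gives a quick sanity check (a sketch; overhead taken as the configured 1 GB per container):

```python
import math

def round_up(mb: int, increment_mb: int) -> int:
    """YARN rounds each container request up to a multiple of the increment."""
    return math.ceil(mb / increment_mb) * increment_mb

INCREMENT_MB = 1024  # yarn.scheduler.increment-allocation-mb

driver_mb   = round_up(1 * 1024 + 1024, INCREMENT_MB)  # driver-memory + overhead
executor_mb = round_up(6 * 1024 + 1024, INCREMENT_MB)  # executor-memory + overhead

total_mem_gb = (driver_mb + executor_mb * 17) / 1024   # 121.0 GB requested
total_cores  = 1 + 4 * 17                              # 69 cores
parallelism  = 4 * 17 * 4                              # 272, matching the submit scripts
```

Note that yarn.scheduler.minimum-allocation-mb (3072 here) can push small containers, such as the 2 GB driver, up further than this sketch shows.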
 
