spark-defaults.conf:
spark.master spark://master:7077
spark.eventLog.enabled true
spark.eventLog.dir a directory reachable by every machine in the cluster (e.g. on HDFS), or the same local path on every machine
# spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory default amount of memory allocated to the driver at startup
spark.cores.max maximum total number of cores a single application may use
spark.yarn.archive archive containing the Spark jars, used in YARN mode
spark.yarn.submit.waitAppCompletion false submit and return immediately instead of waiting for the application to finish (useful for offline/batch jobs)
spark.driver.extraClassPath extra classpath entries for the driver
spark.executor.extraClassPath extra classpath entries for the executors
spark.ui.port 4042
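Putting the keys above together, a minimal spark-defaults.conf might look like the sketch below. All values (the HDFS path, memory and core sizes) are illustrative placeholders, not recommendations; note that key and value are separated by whitespace, not `=`:

```properties
# spark-defaults.conf (illustrative values)
spark.master                         spark://master:7077
spark.eventLog.enabled               true
# must be visible to every node, e.g. a shared HDFS path:
spark.eventLog.dir                   hdfs://master:9000/spark-logs
spark.driver.memory                  2g
spark.cores.max                      8
spark.yarn.submit.waitAppCompletion  false
spark.ui.port                        4042
```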
spark-env.sh:
HADOOP_CONF_DIR, directory containing the Hadoop configuration files
SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
SPARK_EXECUTOR_INSTANCES, Number of executors to start (Default: 2)
SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: 1).
SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)
SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)
SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors
SPARK_WORKER_DIR, to set the working directory of worker processes
SPARK_HISTORY_OPTS="-Dspark.history.ui.port=<port>", to change the history server's web UI port (default 18080)
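Taken together, a spark-env.sh using these variables might look like the following sketch. Every path, hostname, and size here is an illustrative assumption to be adapted to your cluster:

```shell
#!/usr/bin/env bash
# spark-env.sh -- sourced by the Spark start scripts (illustrative values)
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_LOCAL_DIRS=/data/spark/local   # shuffle and RDD spill data
export SPARK_EXECUTOR_CORES=2
export SPARK_EXECUTOR_MEMORY=2g
export SPARK_DRIVER_MEMORY=2g
export SPARK_MASTER_HOST=master
export SPARK_WORKER_MEMORY=8g               # total memory a worker can hand to executors
export SPARK_WORKER_DIR=/data/spark/work
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080"
```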
When the SSH port is not the default 22:
hadoop
If the SSH port is not the default 22, change it in etc/hadoop/hadoop-env.sh, e.g.:
export HADOOP_SSH_OPTS="-p 18921"
spark
Add to spark-env.sh:
export SPARK_SSH_OPTS="-p 37294"
Related links:
http://spark.apache.org/docs/2.1.0/submitting-applications.html
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
http://stackoverflow.com/questions/37871194/how-to-tune-spark-executor-number-cores-and-executor-memory
http://blog.csdn.net/le119126/article/details/51891656