Spark basic configuration

spark-defaults.conf:

spark.master                    spark://master:7077

spark.eventLog.enabled          true

spark.eventLog.dir              a location reachable from every machine in the cluster, or an identical directory present on all machines

# spark.serializer              org.apache.spark.serializer.KryoSerializer

spark.driver.memory             default amount of memory the driver starts with

spark.cores.max                 maximum total number of cores a single application may use

spark.yarn.archive              archive containing the Spark jars for YARN mode (see the sketch after this list for how to build it)

spark.yarn.submit.waitAppCompletion false   return immediately after submitting instead of waiting for the application to finish (useful for offline/batch submission)

spark.driver.extraClassPath=extra classpath entries for the driver

spark.executor.extraClassPath=extra classpath entries for the executors

spark.ui.port 4042
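
Putting these together, a minimal spark-defaults.conf sketch. The hostname master, the HDFS URIs, the jar paths, and the sizes are placeholders for illustration, not values from this post:

spark.master                    spark://master:7077
spark.eventLog.enabled          true
# must be reachable from every node, e.g. a directory on HDFS
spark.eventLog.dir              hdfs://master:9000/spark-logs
spark.driver.memory             2g
spark.cores.max                 8
# hypothetical dependency jar, present at the same path on every node
spark.driver.extraClassPath     /opt/libs/mydep.jar
spark.executor.extraClassPath   /opt/libs/mydep.jar
# only read when submitting with --master yarn
spark.yarn.archive              hdfs://master:9000/spark/spark-libs.jar
spark.yarn.submit.waitAppCompletion  false

To produce the archive that spark.yarn.archive points at, one common approach is to jar up the contents of $SPARK_HOME/jars and upload the result to HDFS:

# bundle the Spark runtime jars without compression and publish them
jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
hdfs dfs -mkdir -p /spark
hdfs dfs -put spark-libs.jar /spark/spark-libs.jar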


spark-env.sh:

HADOOP_CONF_DIR=path to the Hadoop configuration directory

SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data

SPARK_EXECUTOR_INSTANCES, Number of executors to start (Default: 2)

SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: 1).

SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)

SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)

SPARK_MASTER_HOST, to bind the master to a different IP address or hostname

SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors

SPARK_WORKER_DIR, to set the working directory of worker processes

SPARK_HISTORY_OPTS, history server options, e.g. "-Dspark.history.ui.port= -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=" (port and log directory left to fill in; see the sketch below)
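
For reference, a spark-env.sh sketch with the variables above filled in; the hostname, paths, and sizes are assumptions for illustration. 18080 is the history server's default UI port, and spark.history.fs.logDirectory should point at the same location as spark.eventLog.dir:

# placeholder host and paths; adjust to your cluster
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_MASTER_HOST=master
export SPARK_LOCAL_DIRS=/data/spark-local
export SPARK_EXECUTOR_INSTANCES=2
export SPARK_EXECUTOR_CORES=2
export SPARK_EXECUTOR_MEMORY=2g
export SPARK_DRIVER_MEMORY=2g
export SPARK_WORKER_MEMORY=8g
export SPARK_WORKER_DIR=/data/spark-work
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://master:9000/spark-logs"

With these set, sbin/start-history-server.sh serves the logged applications on port 18080.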


When the SSH port is not the default 22:

Hadoop:

If the SSH port is not the default 22, set it in etc/hadoop/hadoop-env.sh, e.g.:

export HADOOP_SSH_OPTS="-p 18921"

Spark:

Add the following to spark-env.sh:

export SPARK_SSH_OPTS="-p 37294"
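
Both variables are just extra flags appended to every ssh invocation made by the cluster start/stop scripts under sbin/. A quick sanity check against a hypothetical worker host named worker1:

# should print the worker's hostname if the port is right
ssh -p 37294 worker1 hostname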


Related links:

http://spark.apache.org/docs/2.1.0/submitting-applications.html

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

http://stackoverflow.com/questions/37871194/how-to-tune-spark-executor-number-cores-and-executor-memory

http://blog.csdn.net/le119126/article/details/51891656
