Spark 2.3.0 Standalone Mode

Reference documentation: http://spark.apache.org/docs/latest/spark-standalone.html

Spark Standalone Mode

  • Installing Spark Standalone to a Cluster
  • Starting a Cluster Manually
  • Cluster Launch Scripts
  • Connecting an Application to the Cluster
  • Launching Spark Applications
  • Resource Scheduling
  • Executors Scheduling
  • Monitoring and Logging
  • Running Alongside Hadoop
  • Configuring Ports for Network Security
  • High Availability
    • Standby Masters with ZooKeeper
    • Single-Node Recovery with Local File System

In addition to running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. You can launch a standalone cluster either manually, by starting a master and workers by hand, or use our provided launch scripts. It is also possible to run these daemons on a single machine for testing.

Spark Standalone Mode

  • Installing Spark Standalone to a Cluster
  • Starting a Cluster Manually
  • Cluster Launch Scripts
  • Connecting an Application to the Cluster
  • Launching Spark Applications
  • Resource Scheduling
  • Executors Scheduling
  • Monitoring and Logging
  • Running Alongside Hadoop
  • Configuring Ports for Network Security
  • High Availability
    • Standby Masters with ZooKeeper
    • Single-Node Recovery with Local File System

In addition to running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. You can launch a standalone cluster manually, by starting a master and workers by hand, or by using the provided launch scripts. It is also possible to run these daemons on a single machine for testing.

Installing Spark Standalone to a Cluster

To install Spark Standalone mode, you simply place a compiled version of Spark on each node on the cluster. You can obtain pre-built versions of Spark with each release or build it yourself.

To install Spark in standalone mode, you simply place a compiled version of Spark on each node of the cluster. You can obtain a pre-built version of Spark with each release, or build it yourself.
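As an illustrative sketch only, one node could be prepared as follows; the download URL, package name, and the /opt install path are assumptions here and should be adapted to your release and environment:

# Download a pre-built Spark 2.3.0 release (choose the package matching your Hadoop version).
wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz

# Unpack it to the same path on every node of the cluster.
tar -xzf spark-2.3.0-bin-hadoop2.7.tgz -C /opt
export SPARK_HOME=/opt/spark-2.3.0-bin-hadoop2.7

Repeat these steps (or copy the unpacked directory) on each node of the cluster.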

Starting a Cluster Manually

You can start a standalone master server by executing:

./sbin/start-master.sh

Once started, the master will print out a spark://HOST:PORT URL for itself, which you can use to connect workers to it, or pass as the “master” argument to SparkContext. You can also find this URL on the master’s web UI, which is http://localhost:8080 by default.
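For example, assuming the master printed spark://myhost.example.com:7077 (a placeholder hostname, not a value from the original text), an interactive shell can be pointed at it like this:

# Connect an interactive Spark shell to the standalone master.
./bin/spark-shell --master spark://myhost.example.com:7077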

Similarly, you can start one or more workers and connect them to the master via:

./sbin/start-slave.sh <master-spark-URL>
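For instance, with the same placeholder master URL as above, a worker on another node would be started as:

./sbin/start-slave.sh spark://myhost.example.com:7077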

Once you have started a worker, look at the master’s web UI (http://localhost:8080 by default). You should see the new node listed there, along with its number of CPUs and memory (minus one gigabyte left for the OS).
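To check from the command line instead of the browser, something like the sketch below can be used; jps ships with the JDK, and the /json view of the master UI is assumed to be available in your build:

# The standalone daemons run as Java processes named Master and Worker.
jps
# The master UI also reports its status (registered workers, cores, memory) as JSON.
curl http://localhost:8080/json/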

Finally, the following configuration options can be passed to the master and worker:

Argument                    Meaning
-h HOST, --host HOST        Hostname to listen on
-i HOST, --ip HOST          Hostname to listen on (deprecated, use -h or --host)
-p PORT, --port PORT        Port for service to listen on (default: 7077 for master, random for worker)
--webui-port PORT           Port for web UI (default: 8080 for master, 8081 for worker)
-c CORES, --cores CORES     Total CPU cores to allow Spark applications to use on the machine (default: all available); only on worker
-m MEM, --memory MEM        Total amount of memory to allow Spark applications to use on the machine, in a format like 1000M or 2G (default: your machine's total RAM minus 1 GB); only on worker
-d DIR, --work-dir DIR      Directory to use for scratch space and job output logs (default: SPARK_HOME/work); only on worker
--properties-file FILE      Path to a custom Spark properties file to load (default: conf/spark-defaults.conf)
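Putting a few of these worker-side options together, an illustrative invocation (placeholder master URL, values chosen arbitrarily) might look like:

# Offer 4 cores and 8 GB of RAM to Spark applications and keep scratch space under /data/spark-work.
./sbin/start-slave.sh spark://myhost.example.com:7077 -c 4 -m 8G -d /data/spark-work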







