Building a Cross-Host Spark Cluster with Docker and Configuring a Notebook (Part 2)

The previous article built the Docker image; now we deploy the Spark cluster across two servers.

Host 1 IP: 192.168.0.21; Host 2 IP: 192.168.0.30

(1) Install Weave on the hosts so that containers on different hosts can communicate with each other

Reference: Docker Learning Notes — cross-host container networking with Weave
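For reference, the Weave setup itself boils down to a few commands; a minimal sketch, assuming the two host IPs above and an install into /usr/local/bin (adjust to your environment):

#curl -L git.io/weave -o /usr/local/bin/weave && chmod a+x /usr/local/bin/weave   # run on both hosts
#weave launch                    # on host 1 (192.168.0.21)
#weave launch 192.168.0.21       # on host 2 (192.168.0.30), peering with host 1
#weave status                    # confirm the two peers see each other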

(2) Set up passwordless SSH login from host 1 to host 2

Reference: Configuring passwordless SSH login on CentOS
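The key-based setup is only a few commands; a minimal sketch, run on host 1 (assumes root login is permitted on host 2):

#ssh-keygen -t rsa                 # accept the defaults, empty passphrase
#ssh-copy-id [email protected]      # install the public key on host 2
#ssh [email protected] hostname     # should now log in without a password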

(3) On host 1, write a script that starts the 12 containers and run it

#vim docker_start.sh
#!/bin/bash
  weave run 172.16.0.2/24 -itd --name master -p 8090:18090 -p 8998:18998 -p 23233:23233 -p 50070:50070 -p 8080:8080 -p 9000:19000 -p 8088:8088 -p 4040:4040 -p 6066:16066 -p 7077:17077 orient/spark:1.0.2 /root/run.sh
  weave run 172.16.0.3/24 -itd --name node1 orientsoft/spark:1.0 /root/run.sh
  weave run 172.16.0.4/24 -itd --name node2 orientsoft/spark:1.0 /root/run.sh
  weave run 172.16.0.5/24 -itd --name node3 orientsoft/spark:1.0 /root/run.sh
  weave run 172.16.0.6/24 -itd --name node4 orientsoft/spark:1.0 /root/run.sh
  weave run 172.16.0.7/24 -itd --name node5 orientsoft/spark:1.0 /root/run.sh
  ssh [email protected] "weave run 172.16.0.8/24 -itd --name node6 orientsoft/spark:1.0 /root/run.sh"
  ssh [email protected] "weave run 172.16.0.9/24 -itd --name node7 orientsoft/spark:1.0 /root/run.sh"
  ssh [email protected] "weave run 172.16.0.10/24 -itd --name node8 orientsoft/spark:1.0 /root/run.sh"
  ssh [email protected] "weave run 172.16.0.11/24 -itd --name node9 orientsoft/spark:1.0 /root/run.sh"
  ssh [email protected] "weave run 172.16.0.12/24 -itd --name node10 orientsoft/spark:1.0 /root/run.sh"
  ssh [email protected] "weave run 172.16.0.13/24 -itd --name node11 orientsoft/spark:1.0 /root/run.sh"

The -p options on the first line map the master node's ports to host ports; the reason for this is explained further down.

Run the script and the 12 containers will be started across the two hosts.
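To verify, you can list the containers on each host and check that they all joined the Weave network (a quick sanity check, not part of the original steps):

#docker ps        # master and node1~node5 should be on host 1, node6~node11 on host 2
#weave ps         # shows the containers attached to Weave with their 172.16.0.x addresses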

(4) Start the Spark cluster

1. Prepare ZooKeeper
      Set myid on node4 and node5:
#echo 2 > ~/zookeeper/tmp/myid   # on node4
#echo 3 > ~/zookeeper/tmp/myid   # on node5
2. Start the ZooKeeper ensemble (node3-node5):
#~/zookeeper/bin/zkServer.sh start
Check that it started (one node should report Mode: leader, the others Mode: follower):
#~/zookeeper/bin/zkServer.sh status
3. From master, start all JournalNodes:
#~/hadoop/sbin/hadoop-daemons.sh start journalnode
(node3, node4 and node5 now each show an extra JournalNode process)
4. Format HDFS (run on master):
#~/hadoop/bin/hdfs namenode -format
5. Format the HA state in ZooKeeper (run from hadoop's bin directory on master):
#~/hadoop/bin/hdfs zkfc -formatZK
6. Start HDFS (run on master):
#~/hadoop/sbin/start-dfs.sh
7. Start YARN (run start-yarn.sh on node2):
#~/hadoop/sbin/start-yarn.sh
8. Start the Spark cluster from master:
#~/spark/sbin/start-all.sh 
If everything started successfully, the web UIs can be opened in a browser on the host:
HDFS: master:50070
YARN: node2:8088
Spark: master:8080
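For these hostname-based URLs to work from a browser on the host, the host itself needs a route into the Weave network and entries resolving the container names; one way to do this (my own sketch — 172.16.0.100 is just an unused example address in the same subnet as the start script):

#weave expose 172.16.0.100/24            # give the host an interface on the Weave network
#echo "172.16.0.2  master" >> /etc/hosts
#echo "172.16.0.4  node2"  >> /etc/hosts

The master's HDFS and Spark UIs can also be reached directly through the -p mappings in the start script, e.g. http://192.168.0.21:50070 and http://192.168.0.21:8080.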

(5) Remote monitoring and debugging

Reference: http://dockone.io/article/1047. That article explains why the master's ports have to be mapped and how to set up remote monitoring and debugging.
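The article has the details; as a single concrete illustration (my own sketch, not taken from that article), a driver running in the master container can be opened up for remote debugging by attaching a JDWP agent, after which the chosen port has to be published to the host just like the ports in the start script:

#~/spark/bin/spark-shell --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
(the driver JVM now accepts a remote debugger, e.g. from IntelliJ or Eclipse, on port 5005)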

(6) Configure the notebook

Reference: http://www.tuicool.com/articles/nmquqi3
1. Create an IPython profile
#vim ~/.bashrc
export PYSPARK_DRIVER_PYTHON=ipython2 # As pyspark only works with python2 and not python3
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_SUBMIT_ARGS="--master spark://172.17.0.2:7077  pyspark-shell"
$ ipython profile create pyspark

# Possible outputs
# [ProfileCreate] Generating default config file: u'/Users/lim/.ipython/profile_pyspark/ipython_config.py'
# [ProfileCreate] Generating default config file: u'/Users/lim/.ipython/profile_pyspark/ipython_kernel_config.py'
2. Create a startup file for this profile
$ touch ~/.ipython/profile_pyspark/startup/00-spark-setup.py
import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))  # replace with the py4j zip shipped with your own Spark
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))
3. Start IPython with the pyspark profile
$ ipython --profile=pyspark
IPython starts with a Spark shell session already initialized.
4. Configure a PySpark kernel for the notebook
$ mkdir -p ~/.ipython/kernels/pyspark
$ touch ~/.ipython/kernels/pyspark/kernel.json
{
    "display_name": "PySpark (Spark 2.1.0)",
    "language": "python",
    "argv": [
        "/root/anaconda2/bin/python",
        "-m",
        "ipykernel",
        "--profile=pyspark",
        "-f",
        "{connection_file}"
    ]
}

$ jupyter notebook --ip=172.17.0.2
# ipython notebook works too
5. Access the notebook from the LAN
Map the notebook's port 8888 to the host:
$iptables -t nat -A DOCKER ! -i docker0 -p tcp -m tcp --dport 8888 -j DNAT --to-destination 172.17.0.2:8888
$iptables -t nat -A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 8888 -j  MASQUERADE
$iptables -t filter -A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 8888 -j ACCEPT
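With these rules in place, the notebook should be reachable from other machines on the LAN through the host's address (assuming the container is running on host 1):

$ curl -I http://192.168.0.21:8888     # or simply open this URL in a browser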
