Network Security Tools on CentOS (XXII): Containerized Spark HA Cluster Deployment on Swarm

        Building on the swarm deployment of the Hadoop cluster, we now go a step further and bring Spark in as well. Relatively speaking, once Hadoop is sorted out, Spark is much simpler.

        1. Downloading Spark

[Screenshot 1]

         The only reason this step deserves a mention at all is, of course, that I fell into a pit here. My Hadoop installation is 3.3.5, so downloading the Spark build prebuilt for Hadoop 3.3 is the right choice — make sure the versions line up. Note, however, that this build actually bundles a Hadoop environment; Spark also offers a prebuilt package without Hadoop, the so-called "user-provided Hadoop" build.

         Since Hadoop was already installed, I switched to the "no Hadoop" build after most of the Spark configuration was done, to avoid possible jar conflicts. The result: when Spark HA started, every master sat in standby. I am not going to reconstruct the whole failure scene here. In short, the pignodes turned out to be unable to connect to ZooKeeper. The network and ZooKeeper itself were ruled out first, since Hadoop HA was working fine. After a fair bit of digging, the cause was the way Spark talks to ZooKeeper: it goes through the Curator wrapper, and the Curator that Hadoop provides is apparently either missing or the wrong version, so the call ends in java.lang.NoSuchMethodError.

         My guess is that the Spark build I picked does not bundle its own Java libraries for this, while the Hadoop environment it borrows lacks what Spark needs. Either way, a lot of fiddling did not fix it, and switching back to the build that bundles Hadoop made everything work perfectly. No conflicts have shown up so far, so that is the version I will keep using for now.
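        If you would rather track this kind of mismatch down than switch packages, a quick way to see what each side actually ships is to compare the Curator jars on both classpaths. A rough diagnostic sketch, assuming the installation paths used throughout this series:

# Curator jars bundled with the "with Hadoop" Spark build
ls $SPARK_HOME/jars | grep -i curator
# Curator jars the "without Hadoop" build would borrow from Hadoop instead
ls $HADOOP_HOME/share/hadoop/common/lib | grep -i curator
# if the version numbers differ, a NoSuchMethodError like the one above is the likely outcome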

        2. Spark configuration files

        Spark's configuration is comparatively simple. If your requirements are modest, it is enough to set the global environment variables and edit the spark-env.sh file.

        (1) Global environment variables

#Define the SPARK_HOME environment variable and add its bin directory to PATH; ideally add sbin as well, since it is needed all over the place
RUN  echo -e "export SPARK_HOME=/root/spark \nexport PATH=\$PATH:\$SPARK_HOME/bin">>/root/.bashrc\
#Without the Java native libraries on LD_LIBRARY_PATH as below, spark-shell prints a warning at startup; you can of course just ignore it
&&  echo -e "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:\$HADOOP_HOME/lib/native">>/root/.bashrc
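        A quick sanity check inside a container built from this image — just a sketch, assuming the /root/spark layout above — to confirm the variables actually landed:

source /root/.bashrc
echo $SPARK_HOME                # expect /root/spark
echo $LD_LIBRARY_PATH           # should now include $HADOOP_HOME/lib/native
spark-submit --version          # prints the Spark/Scala version banner if PATH is set correctly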

        (2) Spark configuration options

        In the spark-env.sh file under $SPARK_HOME/conf, define the following options. They mainly cover where the data files live, where the configuration files live, and where the Java native libraries live.

        For an HA deployment, the important part is pointing Spark at the ZooKeeper ensemble. If HA mode is not needed, setting SPARK_MASTER_HOST is enough.

#!/usr/bin/env bash

# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
#Point to the directory that holds the Hadoop configuration files
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program

# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
#Where Spark stores its local data; we run in containers and need to map this path, so it is set explicitly
SPARK_LOCAL_DIRS=/sparkdata
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos

# Options read in any mode
# - SPARK_CONF_DIR, Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)

# Options read in any cluster manager using HDFS
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files

# Options read in YARN client/cluster mode
# - YARN_CONF_DIR, to point Spark towards YARN configuration files when you use YARN
#Point to the directory that holds the YARN configuration files, which is simply the Hadoop configuration directory
YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
# This option turns on Spark HA: just point zookeeper.url at our own ZooKeeper ensemble
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181 -Dspark.deploy.zookeeper.dir=/spark"

# Options for launcher
# - SPARK_LAUNCHER_OPTS, to set config properties and Java options for the launcher (e.g. "-Dx=y")

# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR      Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR       Where log files are stored.  (Default: ${SPARK_HOME}/logs)
# - SPARK_LOG_MAX_FILES Max log files of Spark daemons can rotate to. Default is 5.
# - SPARK_PID_DIR       Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING  A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS      The scheduling priority for daemons. (Default: 0)
# - SPARK_NO_DAEMONIZE  Run the proposed command in the foreground. It will not output a PID file.
# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
# - MKL_NUM_THREADS=1        Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS=1   Disable multi-threading of OpenBLAS

# Options for beeline
# - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli (e.g. "-Dx=y")
# - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)

#This option is needed when using the "without Hadoop" Spark build -- and it is exactly where the Curator NoSuchMethodError came from
#export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
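        Once the HA options above are in place and the masters are running, it is worth confirming that they really registered their recovery state under the configured znode. A minimal check, assuming the zookeeper1..3 hostnames and the /spark directory set above, run from one of the zookeeper containers:

# the /spark znode is created by the masters; its children typically include leader_election and master_status
zkCli.sh -server zookeeper1:2181 ls /spark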

        3. The Dockerfile

        Since all of this builds on the pig/hadoop:ha image created earlier (see "Network Security Tools on CentOS (XXI): Containerized Hadoop HA Cluster Deployment on Swarm"), there is not much to explain — here it is:

FROM pig/hadoop:ha

#Unpack the Spark tarball into the container and rename the directory to something friendlier
#PS: in hindsight, not renaming it would have been better -- when things go wrong the version number stays visible
ADD spark-3.4.0-bin-hadoop3.tgz /root
RUN mv /root/spark-3.4.0-bin-hadoop3 /root/spark

#Set the Spark global environment variables
RUN echo -e "export SPARK_HOME=/root/spark \nexport PATH=\$PATH:\$SPARK_HOME/bin">>/root/.bashrc
RUN echo -e "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:\$HADOOP_HOME/lib/native">>/root/.bashrc

#Install python3 offline; the offline trick is not worth repeating here: pull down all the rpms and yum localinstall them in one go
COPY ./pythonrpm /root/pythonrpm
RUN yum localinstall -y /root/pythonrpm/*.rpm \
&&  rm -rf /root/pythonrpm

#Copy the init script; later, to make debugging edits easier, this was switched to a volume mapping
COPY ./init-spark.sh /root/.

##Default startup script
CMD ["/root/init-spark.sh"]
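        Building the image is then routine; a sketch of the commands, assuming the Spark tarball, the pythonrpm directory and init-spark.sh sit next to the Dockerfile, and using the pig/spark tag that the compose file below expects:

docker build -t pig/spark .
# the image has to exist on every swarm host, or be pushed to a registry they can all reach
docker image ls pig/spark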

        4. The init script

#! /bin/bash
# NODE_COUNT and ZOOKEEPER_COUNT are passed in via the swarm compose yml file (they only matter on the first node, which uses them to check that all nodes are reachable); they are defined under the environment key in the yml file.
NODECOUNT=$NODE_COUNT
TRYLOOP=50
ZOOKEEPERNODECOUNT=$ZOOKEEPER_COUNT

############################################################################################################
##   1. Load the global environment variables
############################################################################################################
source /etc/profile
source /root/.bashrc

############################################################################################################
##   2. Start the SSH service on this node. We could simply have called init-hadoop.sh from pig/hadoop:ha instead,
##      but that only occurred to us once this script was nearly finished.
############################################################################################################
/sbin/sshd -D &

############################################################################################################
##   3. Internal function definitions
############################################################################################################

#FUNCTION: ping-test whether all the nodes are online and ready ------------------------------------------------------------
#param1: node's hostname prefix
#param2: node count
#param3: how many times the manager node try connect
isAllNodesConnected(){
	PIGNODE_PRENAME=$1
	PIGNODE_COUNT=$2
	TRYLOOP_COUNT=$3
	tryloop=0
	ind=1
	#init pignode hostname array,and pignode status array
	while(( $ind <= $PIGNODE_COUNT ))
	do
		pignodes[$ind]="$PIGNODE_PRENAME$ind"
		pignodes_stat[$ind]=0
		let "ind++"
	done
	
	#check whether all the pignodes can be connected
	noactivecount=$PIGNODE_COUNT
	while(( $noactivecount > 0 ))
	do
		noactivecount=$PIGNODE_COUNT
		ind=1
		while(( $ind <= $PIGNODE_COUNT ))
		do
			if (( ${pignodes_stat[$ind]}==0 ))
			then
				ping -c 1 ${pignodes[$ind]} > /dev/null
				if (($?==0))
				then
					pignodes_stat[$ind]=1
					let "noactivecount-=1"
					echo "Try to connect ${pignodes[$ind]}:successed." >>init.log
				else
					echo "Try to connect ${pignodes[$ind]}: failed." >>init.log
				fi
			else
				let "noactivecount-=1"
			fi
			let "ind++"
		done
		if (( ${noactivecount}>0 ))
		then
			let "tryloop++"
			if (($tryloop>$TRYLOOP_COUNT))
			then
				echo "ERROR Tried ${TRYLOOP_COUNT} loops. ${noactivecount} nodes failed, exit." >>init.log
				break;
			fi
			echo "${noactivecount} left for ${PIGNODE_COUNT} nodes not connected, waiting for next try">>init.log
			sleep 5
		else
			echo "All nodes are connected.">>init.log
		fi
	done
	return $noactivecount
}
#----------------------------------------------------------------------------------------------------------
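#Usage sketch (added for clarity): arguments are the hostname prefix, the node count and the retry limit;
#the return value is 0 only when every node answered the ping, e.g.
#    isAllNodesConnected pignode 12 50 ; echo $?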

#FUNCTION: read the Hadoop data directory from the config file, used to decide whether the node is already formatted --------------------------------------------------------------------
getDataDirectory(){
#when using the tmp data directory
#        configfiledir=`echo "${HADOOP_HOME}/etc/hadoop/core-site.xml"`
#        datadir=`cat ${configfiledir} | grep -A 2 'hadoop.tmp.dir' | grep '<value>' | sed 's/^[[:blank:]]*<value>//g' | sed 's/<\/value>$//g'`
#        echo $datadir

#when using the namenode.name.dir directory
	datadir=`cat ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml|grep -A 2 "dfs.namenode.name.dir"|grep "<value>"|sed -e "s/<value>//g"|sed -e "s/<\/value>//g"`
	echo $datadir
}
#---------------------------------------------------------------------------------------------------------
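#For reference, the hdfs-site.xml fragment this function parses looks roughly like the following (the path is only an example):
#    <name>dfs.namenode.name.dir</name>
#    <value>/hadoopdata/dfs/name</value>
#so on a configured node simply running getDataDirectory should print the bare directory path.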

#FUNCTION: if the DFS file system has not been formatted yet, run the format-and-initialize flow ------------------------------------------------------------
initHadoop_format(){
	#First start the JournalNodes, which provide the channel the NameNodes use to synchronize their metadata
	echo 'start all Journalnode' >> init.log
	journallist=`cat $HADOOP_HOME/etc/hadoop/hdfs-site.xml |grep -A 2 'dfs.namenode.shared.edits.dir'|grep '<value>'|sed -e "s/<value>qjournal:\/\/\(.*\)\/.*<\/value>/\1/g"|sed "s/;/ /g"|sed -e "s/:[[:digit:]]\{2,5\}/ /g"`
	for journalnode in $journallist;do
		ssh root@${journalnode} "hdfs --daemon start journalnode"
	done

	#Wait 15 seconds; without this, formatting tends to start before the JournalNodes have opened their ports, so the format fails -- that mistake cost a whole Saturday...
	echo 'waiting 15 seconds for the journal nodes to start, or the format will fail because the journalnodes cannot be connected.'>>init.log
	sleep 15
	echo 'format and start namenode 1'>>init.log
	hdfs namenode -format
	if (( $?!=0 )); then
		echo 'format namenode 1 error'>>init.log
		return 1
	fi
    #Start the NameNode on this master node (3 in total; this is the 1st)
	sleep 3
	hdfs --daemon start namenode
	if (( $?!=0 )); then
		echo 'start namenode 1 error'>>init.log
		return 1
	fi

	#The first NameNode must be running before its data can be synchronized to the other NameNodes
	echo 'sync and start others.'>>init.log
	sleep 3
	dosyncid=2
	while (($dosyncid<=3));do
        #Bootstrap (synchronize) NameNodes 2 and 3 in turn
		ssh root@$nodehostnameprefix$dosyncid "hdfs namenode -bootstrapStandby"
		if (( $?!=0 )); then
			echo 'namenode bootstrap standby error'>>init.log
			return 1
		fi
        #Start each NameNode as soon as its sync is done
		ssh root@$nodehostnameprefix$dosyncid "hdfs --daemon start namenode"
		if (( $?!=0 )); then
			echo 'other namenodes start error'>>init.log
			return 1
		fi
		let "dosyncid++"
	done
	
	sleep 3
	#Format the HA state directory in ZooKeeper; until this step runs, all the Hadoop NameNodes stay in standby
	hdfs zkfc -formatZK
	return 0
}
#---------------------------------------------------------------------------------------------------------

#FUNCTION: if DFS is already formatted, we only need to start the various services -----------------------------------------------------------------
initHadoop_noformat(){
    #Start all DFS-related services directly; the official script brings up the HDFS services on every node
	echo 'name node formatted. go on to start dfs related nodes and service'>>init.log
	sbin/start-dfs.sh
	if (( $?!=0 )); then
		echo 'start dfs error'>>init.log
		return 1
	fi

    #Start all YARN-related services directly; the official script brings up the YARN services on every node
	sleep 5
	echo 'start yarn resourcemanager and node manager'>>init.log
	sbin/start-yarn.sh
	if (( $?!=0 )); then
		echo 'start yarn error'>>init.log
		return 1
	fi

    #Get the hostname of the history server node and start the history server on it remotely
	sleep 3
	echo 'start mapreduce history server'>>init.log
	historyservernode=`cat $HADOOP_HOME/etc/hadoop/mapred-site.xml |grep -A 2 'mapreduce.jobhistory.address'|grep '<value>' |sed -e "s/^.*<value>//g"|sed -e "s/<\/value>//g"|sed -e "s/:[[:digit:]]*//g"`
	ssh root@$historyservernode "mapred --daemon start historyserver"
	if (( $?!=0 )); then
		echo 'start mapreduce history server error'>>init.log
		return 1
	fi
	return 0
}
#----------------------------------------------------------------------------------------------------------

#FUNCTION: "exit" the init script by blocking forever, so that swarm does not shut the container down --------------------------------------------------------------------------------
exitinit()
{
	tail -f /dev/null
}
#----------------------------------------------------------------------------------------------------------

############################################################################################################
##   Node initialization                                                          ##
############################################################################################################
#Get this node's hostname, the hostname prefix and the node index
#This was not thought through at the start: the hostnames of the various roles should really be read from the
#config files, which would make the script more robust. To be iterated on later; this will do for now.
nodehostname=`hostname`
nodehostnameprefix=`echo $nodehostname|sed -e 's|[[:digit:]]\+$||g'`
nodeindex=`hostname | sed "s/${nodehostnameprefix}//g"`

#Read the zookeeper cluster hostname prefix from yarn-site.xml; it is used for the all-nodes-online check below
zookeepernameprefix=`cat ${HADOOP_HOME}/etc/hadoop/yarn-site.xml |grep -A 2 'yarn.resourcemanager.zk-address'|grep '<value>'|sed -e "s/[[:blank:]]\+<value>\([[:alpha:]]\+\)[[:digit:]]\+:.*/\1/g"`


#1. Change to the working directory.
cd $HADOOP_HOME
    #HA needs more than 3 nodes (3 masters plus at least one worker), so with 3 or fewer do nothing and exit
if (($NODECOUNT<=3));then
	echo "Node count must be more than 3.">>init.log
	exitinit
fi

#If this is not the first node, wait 5 minutes for the possible initialization run on node 1. This also covers the
#case where swarm restarts a failed node: after 5 minutes it calls start-dfs.sh, start-yarn.sh, start-master.sh or
#start-worker.sh itself. The official scripts ensure that even if the services were already started by the primary
#node, starting them again does not cause errors.
if (($nodeindex!=1));then
	echo $nodehostname waiting for init...>>init.log
	sleep 5m
	cd $HADOOP_HOME
	echo "try to start dfs and yarn again.">>init.log
	sbin/start-dfs.sh
	sbin/start-yarn.sh
	if (($nodeindex==3));then
		echo "try to start historyserver again">>init.log
		mapred --daemon start historyserver
	fi

    #If this is one of the first 3 nodes, start a Spark master; otherwise start a worker
	echo "try to start spark again">>init.log
	if (($nodeindex>3));then
		$SPARK_HOME/sbin/start-worker.sh
	else
		$SPARK_HOME/sbin/start-master.sh
	fi
	exitinit
fi


#2. If this is the primary node, it has to handle formatting (when needed) and the initialization of every node.
#   In principle each child node could retry starting its own services in a loop until the primary node finishes
#   initialization and the services come up. The current implementation pushes all of the cluster initialization
#   onto the primary node and lets the other nodes start services redundantly, which is not pretty; to be reworked later.
echo $nodehostname is the init manager node...>>init.log
#   Wait until all cluster nodes and the ZooKeeper ensemble are reachable
isAllNodesConnected $nodehostnameprefix $NODECOUNT $TRYLOOP
isHadoopOK=$?
isAllNodesConnected $zookeepernameprefix $ZOOKEEPERNODECOUNT $TRYLOOP
isZookeeperOK=$?
#   Exit if any cluster node or any zookeeper node is unreachable
if ([ $isHadoopOK != 0 ] || [ $isZookeeperOK != 0 ]);then
	echo "Not all the host nodes or zookeeper nodes are active. exit 1">>init.log
	exitinit
fi


#3. Decide whether DFS has already been formatted, by locating the DFS directory and checking whether it contains any files
datadirectory=$(getDataDirectory)
if [ -n "$datadirectory" ];then
        datadircontent=`ls -A ${datadirectory}`
        if [ -z "$datadircontent" ];then
        	echo "dfs is not formatted.">>init.log
		isDfsFormat=0
	else
		echo "dfs is already formatted.">>init.log
		isDfsFormat=1
        fi
else
        echo "ERROR:Can not get hadoop tmp data directory.init can not be done. ">>init.log
	exitinit
fi

#4. If DFS has not been formatted yet, format it first and run the HA synchronization steps
if (( $isDfsFormat == 0 ));then 
	initHadoop_format
fi
if (( $? != 0 ));then
	echo "ERROR:Init Hadoop interruptted...">>init.log
	exitinit
fi

#5. After formatting, start DFS, YARN and the history server
initHadoop_noformat
if (( $? != 0 ));then
	echo "ERROR:Init Hadoop interruptted...">>init.log
	exitinit
fi

echo "hadoop init work has been done. spark init start.">>init.log

#6. Start Spark. Do not use start-all.sh: in HA mode SPARK_MASTER_HOST is not set, so Spark does not know who the master is and start-all.sh would start every node as a master. Instead, only the first 3 nodes are started as masters via script; the rest are started with start-workers.sh.
echo "start masters">>init.log
$SPARK_HOME/sbin/start-master.sh
masterindex=2
while (( ${masterindex} <= 3 ));do
	echo "ssh root@${nodehostnameprefix}${masterindex} '$SPARK_HOME/sbin/start-master.sh'">>init.log
	ssh root@${nodehostnameprefix}${masterindex} '$SPARK_HOME/sbin/start-master.sh'
	let "masterindex++"
done
$SPARK_HOME/sbin/start-workers.sh
if (( $? != 0 ));then
	echo "ERROR:Init spark interruptted...">>init.log
	exitinit
fi
echo "spark init work has been done.">>init.log

tail -f /dev/null

        5. The docker-compose.yml file

version: "3.7"
services:
   pignode1:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         constraints:
           - node.hostname==pighost1
     hostname: pignode1
     environment:
       - NODE_COUNT=12
       - ZOOKEEPER_COUNT=3
     networks:
       - pig
     ports:
       - target: 22
         published: 9011
         protocol: tcp
         mode: host
       - target: 9000
         published: 9000
         protocol: tcp
         mode: host
       - target: 9870
         published: 9870
         protocol: tcp
         mode: host
       - target: 8088
         published: 8088
         protocol: tcp
         mode: host
       - target: 8080
         published: 8080
         protocol: tcp
         mode: host
       - target: 4040
         published: 4040
         protocol: tcp
         mode: host
       - target: 7077
         published: 7077
         protocol: tcp
         mode: host
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/1:/hadoopdata:rw
       - /sparkdata/1:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro

   pignode2:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin the second NameNode to the second host
         constraints:
           - node.hostname==pighost2
     networks:
       - pig
     hostname: pignode2
     ports:
        # Ports for the second NameNode
       - target: 22
         published: 9012
         protocol: tcp
         mode: host
       - target: 9890
         published: 9890
         protocol: tcp
         mode: host
       - target: 9870
         published: 9871
         protocol: tcp
         mode: host
       - target: 8088
         published: 8089
         protocol: tcp
         mode: host
       - target: 8080
         published: 8081
         protocol: tcp
         mode: host
       - target: 4040
         published: 4041
         protocol: tcp
         mode: host
       - target: 7077
         published: 7078
         protocol: tcp
         mode: host
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/2:/hadoopdata:rw
       - /sparkdata/2:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro

   pignode3:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin the MapReduce Job History Server to the third host
         constraints:
           - node.hostname==pighost3
     networks:
       - pig
     hostname: pignode3
     ports:
       - target: 22
         published: 9013
         protocol: tcp
         mode: host
       - target: 9870
         published: 9872
         protocol: tcp
         mode: host
       - target: 8088
         published: 8087
         protocol: tcp
         mode: host
       - target: 8090
         published: 8090
         protocol: tcp
         mode: host
       - target: 10020
         published: 10020
         protocol: tcp
         mode: host
       - target: 19888
         published: 19888
         protocol: tcp
         mode: host
       - target: 8080
         published: 8082
         protocol: tcp
         mode: host
       - target: 4040
         published: 4042
         protocol: tcp
         mode: host
       - target: 7077
         published: 7079
         protocol: tcp
         mode: host
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/3:/hadoopdata:rw
       - /sparkdata/3:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro

#------------------------------------------------------------------------------------------------
# The services below are all worker nodes; they can be placed on any host other than the swarm leader

   pignode4:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin this worker to pighost3
         constraints:
           # node.role==manager
           # node.role==worker
           - node.hostname==pighost3
     networks:
       - pig
     ports:
       - target: 22
         published: 9014
         protocol: tcp
         mode: host
     hostname: pignode4
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/4:/hadoopdata:rw
       - /sparkdata/4:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro

   pignode5:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin this worker to pighost3
         constraints:
           # node.role==manager
           - node.hostname==pighost3
     networks:
       - pig
     ports:
       - target: 22
         published: 9015
         protocol: tcp
         mode: host
     hostname: pignode5
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/5:/hadoopdata:rw
       - /sparkdata/5:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro

   pignode6:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin this worker to pighost3
         constraints:
           # node.role==manager
           - node.hostname==pighost3
     networks:
       - pig
     ports:
       - target: 22
         published: 9016
         protocol: tcp
         mode: host
     hostname: pignode6
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/6:/hadoopdata:rw
       - /sparkdata/6:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro
       
   pignode7:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin this worker to pighost4
         constraints:
           # node.role==manager
           - node.hostname==pighost4
     networks:
       - pig
     ports:
       - target: 22
         published: 9017
         protocol: tcp
         mode: host
     hostname: pignode7
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/7:/hadoopdata:rw
       - /sparkdata/7:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro

   pignode8:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin this worker to pighost4
         constraints:
           # node.role==manager
           - node.hostname==pighost4
     networks:
       - pig
     ports:
       - target: 22
         published: 9018
         protocol: tcp
         mode: host
     hostname: pignode8
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/8:/hadoopdata:rw
       - /sparkdata/8:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro

   pignode9:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin this worker to pighost4
         constraints:
           # node.role==manager
           - node.hostname==pighost4
     networks:
       - pig
     ports:
       - target: 22
         published: 9019
         protocol: tcp
         mode: host
     hostname: pignode9
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/9:/hadoopdata:rw
       - /sparkdata/9:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro
     
   pignode10:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin this worker to pighost5
         constraints:
           # node.role==manager
           - node.hostname==pighost5
     networks:
       - pig
     ports:
       - target: 22
         published: 9020
         protocol: tcp
         mode: host
     hostname: pignode10
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/10:/hadoopdata:rw
       - /sparkdata/10:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro

   pignode11:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin this worker to pighost5
         constraints:
           # node.role==manager
           - node.hostname==pighost5
     networks:
       - pig
     ports:
       - target: 22
         published: 9021
         protocol: tcp
         mode: host
     hostname: pignode11
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/11:/hadoopdata:rw
       - /sparkdata/11:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro
 
   pignode12:
     image: pig/spark
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         # Pin this worker to pighost5
         constraints:
           # node.role==manager
           - node.hostname==pighost5
     networks:
       - pig
     ports:
       - target: 22
         published: 9022
         protocol: tcp
         mode: host
     hostname: pignode12
     volumes:
       # Map the Hadoop XML configuration files
       - ./config/core-site.xml:/root/hadoop/etc/hadoop/core-site.xml:ro
       - ./config/hdfs-site.xml:/root/hadoop/etc/hadoop/hdfs-site.xml:ro
       - ./config/yarn-site.xml:/root/hadoop/etc/hadoop/yarn-site.xml:ro
       - ./config/mapred-site.xml:/root/hadoop/etc/hadoop/mapred-site.xml:ro
       # Map the Hadoop workers file
       - ./config/workers:/root/hadoop/etc/hadoop/workers:ro
       # Map the Spark configuration files
       - ./sparkconf/spark-env.sh:/root/spark/conf/spark-env.sh:ro
       - ./sparkconf/spark-defaults.conf:/root/spark/conf/spark-defaults.conf:ro
       - ./sparkconf/workers:/root/spark/conf/workers:ro
       # Map the data directories
       - /hadoopdata/12:/hadoopdata:rw
       - /sparkdata/12:/sparkdata:rw
       # Map the init script
       - ./init-spark.sh:/root/init-spark.sh:ro

   zookeeper1:
     image: zookeeper:latest
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         constraints:
           - node.hostname==pighost1
     networks:
       - pig
     ports:
       - target: 2181
         published: 2181
         protocol: tcp
         mode: host
     hostname: zookeeper1
     environment:
         - ZOO_MY_ID=1
         - ZOO_SERVERS=server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181
     volumes:
         - /hadoopdata/zoo/1/data:/data
         - /hadoopdata/zoo/1/datalog:/datalog
         - /hadoopdata/zoo/1/logs:/logs

   zookeeper2:
     image: zookeeper:latest
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         constraints:
           - node.hostname==pighost2
     networks:
       - pig
     ports:
       - target: 2181
         published: 2182
         protocol: tcp
         mode: host
     hostname: zookeeper2
     environment:
         - ZOO_MY_ID=2
         - ZOO_SERVERS=server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181
     volumes:
         - /hadoopdata/zoo/2/data:/data
         - /hadoopdata/zoo/2/datalog:/datalog
         - /hadoopdata/zoo/2/logs:/logs

   zookeeper3:
     image: zookeeper:latest
     deploy:
       endpoint_mode: dnsrr
       restart_policy:
         condition: on-failure
       placement:
         constraints:
           - node.hostname==pighost3
     networks:
       - pig
     ports:
       - target: 2181
         published: 2183
         protocol: tcp
         mode: host
     hostname: zookeeper3
     environment:
         - ZOO_MY_ID=3
         - ZOO_SERVERS=server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181
     volumes:
         - /hadoopdata/zoo/3/data:/data
         - /hadoopdata/zoo/3/datalog:/datalog
         - /hadoopdata/zoo/3/logs:/logs
networks:
  pig:

        To make editing easier, the init script is mapped in as a volume; it can be removed later. Another lesson learned: never nest the mapped data directories. At one point I lazily mapped the Spark data directory under a directory that was already mapped for Hadoop. No error is reported; all the mapped directories simply come up empty, which took a long time to track down...

        6. Running the cluster

        With swarm, once everything is fully configured, bringing the cluster up is genuinely simple — deploying the stack takes a couple of commands, roughly as sketched below. To celebrate, some screenshots:
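        A minimal sketch of the deployment commands, run on the swarm manager; the stack name pigcluster is arbitrary, and the yml above is assumed to be in the current directory:

docker stack deploy -c docker-compose.yml pigcluster
# watch the services come up
docker stack ps pigcluster
docker service ls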

        (1) NameNodes

[Screenshot 2]

[Screenshot 3]

[Screenshot 4]

         As the screenshots show, there are 3 NameNodes under HA, and the second one is currently active.

        (2) DataNodes

         All 9 DataNodes look healthy as well.

[Screenshot 5]

[Screenshot 6]

        (3) YARN ResourceManager HA

[Screenshot 7]

 

        As mentioned before, only the active ResourceManager serves the web UI.
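        The same thing can be checked from the command line with rmadmin; a small sketch, run inside any pignode container, assuming ResourceManager ids rm1/rm2/rm3 (use whatever yarn.resourcemanager.ha.rm-ids is actually set to):

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
yarn rmadmin -getServiceState rm3   # exactly one should report "active", the rest "standby"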

        (4) Job History Server

[Screenshot 8]

        (5) Spark Master

[Screenshot 9]

[Screenshot 10]

[Screenshot 11]

         As you can see, the standby master nodes do not manage any workers.
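        The masters' status can also be read off their JSON endpoint instead of the web pages; a quick sketch, run from inside any container on the overlay network, using the hostnames from the compose file:

# the active master reports "status" : "ALIVE" and lists its workers; the standbys report "STANDBY"
curl -s http://pignode1:8080/json/ | grep '"status"'
curl -s http://pignode2:8080/json/ | grep '"status"'
curl -s http://pignode3:8080/json/ | grep '"status"'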

        (6) Spark interactive shells

[Screenshot 12]

[Screenshot 13]

         These are spark-shell and pyspark. Python may be the more popular choice, but Scala is well worth a look too — its lambda syntax in particular makes the language feel as if it were built for Spark and handed to users for free, so the entry barrier is really low. I can hardly wait to start playing with it.
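        As a first taste, a tiny sketch of driving spark-shell against the HA masters non-interactively (the comma-separated master list is standard syntax for standalone HA; the hostnames are the ones used in this series):

spark-shell --master spark://pignode1:7077,pignode2:7077,pignode3:7077 <<'EOF'
// double the numbers 1..100 and sum them -- the lambda syntax at work
println(sc.parallelize(1 to 100).map(_ * 2).sum)
EOF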

 
