Purpose
PySpark originally ran on YARN, which had the following drawbacks:
1. Users rely on many Python packages and need to change them frequently; deploying them by hand on every machine is hard to maintain.
2. The Hadoop cluster runs an old OS (CentOS 6.5) with an old GCC (4.4.7), and upgrading it is impractical.
3. Resources are hard to manage.
Process
The key is Spark on Mesos with its Docker support, which requires installing Mesos and Docker on every machine, plus a private Docker registry. OS version: CentOS 7.2.1511; Mesos version: 1.4.1.
See the official references: the Mesos site, the Docker site, and the Docker CE installation docs.
Installing Docker
Docker must be installed on every machine; install it with yum as described on the official site.
1. Install prerequisite tools
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
2. Add the Docker yum repository
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
3. Install Docker CE
sudo yum install docker-ce
4. Start Docker
sudo systemctl start docker
5. Test with hello-world
sudo docker run hello-world
Setting up a private Docker registry
1. Generate a certificate
Docker connects to registries over HTTPS by default, so a certificate must be generated.
cd /etc/docker
mkdir certs
openssl req -newkey rsa:4096 -nodes -sha256 -keyout certs/node01.key -x509 -days 365 -out certs/node01.crt
When prompted for the Common Name, enter the registry host's hostname:
Common Name (eg, your name or your server's hostname) []:node01
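The interactive prompt can be skipped entirely by passing the subject on the command line; a minimal sketch, assuming the registry host is node01 (writing to /tmp/certs just for illustration):

```shell
# Generate a self-signed certificate non-interactively: -subj supplies
# the Common Name that the interactive prompt would otherwise ask for.
mkdir -p /tmp/certs
openssl req -newkey rsa:4096 -nodes -sha256 \
  -subj "/CN=node01" \
  -keyout /tmp/certs/node01.key \
  -x509 -days 365 -out /tmp/certs/node01.crt
```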
2. Run the registry container
Pull the registry image
sudo docker pull registry
Run the registry container
docker run -d -P -it -p 5000:5000 --name registry_https01 -v `pwd`/certs:/etc/docker/certs/ -e REGISTRY_HTTP_TLS_CERTIFICATE=/etc/docker/certs/node01.crt -e REGISTRY_HTTP_TLS_KEY=/etc/docker/certs/node01.key registry
Run docker ps and confirm the container is up.
3. Install the certificate on the clients
Copy the certificate file to each client
scp certs/node01.crt node02:/etc/pki/ca-trust/source/anchors
Update the client's trusted certificates
cd /etc/pki/ca-trust/source/anchors
update-ca-trust
Restart Docker on the client
systemctl restart docker
4. Test uploading an image
First tag the image for the private registry
docker tag centos:7.2.1511 node01:5000/centos:7.2.1511
Push it to the private registry
docker push node01:5000/centos:7.2.1511
Pull the image back from the private registry
docker pull node01:5000/centos:7.2.1511
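The re-tagging rule is simple: the private-registry name is just the image name prefixed with the registry's host and port. A tiny sketch of that convention:

```shell
# A private-registry tag is "<registry-host:port>/<image:tag>".
REGISTRY="node01:5000"
IMAGE="centos:7.2.1511"
TARGET="$REGISTRY/$IMAGE"
echo "$TARGET"   # node01:5000/centos:7.2.1511
```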
Installing Mesos
Build and install version 1.4.1 following the official documentation.
1. System requirements
Minimum: Linux kernel >= 2.6.23; full functionality requires kernel >= 3.10
GCC 4.8.1+
2. Download Mesos
wget http://www.apache.org/dist/mesos/1.4.1/mesos-1.4.1.tar.gz
tar -zxf mesos-1.4.1.tar.gz
3. Configure the Maven repo
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo yum install -y epel-release
4. Install Subversion (Mesos requires Subversion 1.8 or newer)
sudo bash -c 'cat > /etc/yum.repos.d/wandisco-svn.repo <<EOF
[WANdiscoSVN]
name=WANdisco SVN Repo 1.9
enabled=1
baseurl=http://opensource.wandisco.com/centos/7/svn-1.9/RPMS/\$basearch/
gpgcheck=1
gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
EOF'
5. Upgrade related packages
sudo yum update -y systemd
6. Install development tools
sudo yum groupinstall -y "Development Tools"
7. Install the remaining Mesos dependencies
sudo yum install -y apache-maven python-devel python-six python-virtualenv java-1.8.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel
8. Build
cd mesos
./bootstrap
mkdir build
cd build
../configure --prefix=PATH
make -j4
make check
make install
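The build steps above can be collected into one script. A sketch, wrapped in a function so it can be sourced without running anything; PREFIX is an assumed install path, and note that configure is run from inside the build directory:

```shell
# Sketch of the Mesos out-of-tree build; nothing runs until the
# function is called. PREFIX is an assumed install location.
build_mesos() {
  set -e
  PREFIX=/opt/mesos-1.4.1
  cd mesos
  ./bootstrap                      # generates the configure script
  mkdir -p build && cd build
  ../configure --prefix="$PREFIX"
  make -j4                         # parallel build; this takes a while
  make check                       # optional test suite
  sudo make install
}
```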
9. Start Mesos
Start the master
nohup bin/mesos-master.sh --ip=192.168.0.19 --work_dir=/var/lib/mesos &
Start the agent (with Docker containerizer support and the private-registry credentials)
nohup /opt/download/mesos-1.4.1-bin/bin/mesos-agent.sh --master=192.168.0.19:5050 --work_dir=/var/lib/mesos --executor_registration_timeout=5mins --containerizers=docker,mesos --docker_config="{ \
\"auths\": { \
\"https://node01/v2/\": { \
\"auth\": \"\", \
\"email\": \"\" \
} \
} \
}" &
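Written out as a file, the JSON passed to --docker_config is just the standard Docker client config (auth and email left empty, as in the command above):

```json
{
  "auths": {
    "https://node01/v2/": {
      "auth": "",
      "email": ""
    }
  }
}
```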
The work_dir directories must be created beforehand.
10. Mesos web UI
Served on the master's port 5050.
Spark on Mesos with Docker
First build a Docker image, then point Spark's Docker configuration at it.
Build the image, based on the latest CentOS 7.4.1708
docker pull centos:7.4.1708
mkdir spark-docker
cd spark-docker
Copy everything the image needs into the spark-docker directory: the Spark distribution, the Python installer, and the other required packages.
Edit the Dockerfile
FROM centos:7.4.1708
MAINTAINER peng23
#install system tools
RUN yum update -y
RUN yum upgrade -y
RUN yum install -y mesa-libGL-devel mesa-libGLU-devel bzip2 mysql-devel python-devel python-six java-1.8.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel blas blas-devel lapack lapack-devel atlas atlas-devel gcc gcc-c++
RUN yum clean all
#install anaconda2.4.4.0
RUN mkdir -p /opt/download/
ADD xgboost.tar.gz /opt/download/
ADD Anaconda2-4.4.0-Linux-x86_64.tar.gz /opt/download/
RUN chmod 777 /opt/download/Anaconda2-4.4.0-Linux-x86_64.sh
RUN cd /opt/download/ && /opt/download/Anaconda2-4.4.0-Linux-x86_64.sh -p /opt/anaconda2 -b
RUN yum install -y make
RUN cd /opt/download/xgboost && make -j4
RUN /opt/anaconda2/bin/pip install theano==0.9
RUN /opt/anaconda2/bin/pip install keras==2.0.4
RUN /opt/anaconda2/bin/pip install chaid==4.0.0
RUN /opt/anaconda2/bin/pip install MySQL-python
RUN /opt/anaconda2/bin/pip install DBUtils
RUN /opt/anaconda2/bin/pip install pathos==0.2
RUN /opt/anaconda2/bin/pip uninstall -y dill
RUN /opt/anaconda2/bin/pip install dill==0.2.6
RUN /opt/anaconda2/bin/pip install pykalman==0.9.5
RUN /opt/anaconda2/bin/pip install fbprophet==0.2
RUN cd /opt/download/xgboost/python-package/ && /opt/anaconda2/bin/python /opt/download/xgboost/python-package/setup.py install
RUN yum remove -y bzip2 make
RUN yum clean all
# update boot script
COPY bootstrap.sh /etc/bootstrap.sh
RUN chown root.root /etc/bootstrap.sh
RUN chmod 700 /etc/bootstrap.sh
ENTRYPOINT ["/etc/bootstrap.sh"]
Edit bootstrap.sh
#!/bin/bash
service sshd start
bash -c 'cat >> /etc/hosts <<EOF
10.120.193.8 node8.test
10.120.193.5 node5.test
10.120.193.6 node6.test
10.120.193.7 node7.test
10.120.193.15 node15.test
10.120.193.16 node16.test
EOF
'
CMD=${1:-"exit 0"}
if [[ "$CMD" == "-d" ]];
then
service sshd stop
/usr/sbin/sshd -D -d
else
/bin/bash -c "$*"
fi
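The dispatch at the bottom of bootstrap.sh can be exercised outside a container; a minimal sketch of the same logic, with the sshd calls replaced by an echo so it runs anywhere:

```shell
# Mirrors bootstrap.sh: "-d" would keep sshd in the foreground as the
# container's main process; any other argument is run as a command.
entrypoint() {
  CMD=${1:-"exit 0"}
  if [[ "$CMD" == "-d" ]]; then
    echo "would exec: /usr/sbin/sshd -D -d"
  else
    /bin/bash -c "$*"
  fi
}
OUT=$(entrypoint echo hello from container)
echo "$OUT"   # hello from container
```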
If the resulting image is large (it can exceed 1 GB), pull it manually on each machine.
Build the Docker image
docker build --rm -t node01:5000/spark-mesos:2.2.0 ./
Push it to the private registry
docker push node01:5000/spark-mesos:2.2.0
Test the image
docker run -it -p 4040:4040 -h spark-test node01:5000/spark-mesos:2.2.0 bash
It is enough that Spark comes up in local mode inside the container.
Edit spark-env.sh to point at the Mesos native library and the Python installation
export MESOS_NATIVE_JAVA_LIBRARY=/opt/download/mesos-1.4.1-bin/src/.libs/libmesos.so
export PYSPARK_PYTHON=/opt/anaconda2/bin/python
export PYTHON_HOME=/opt/anaconda2
export PATH=$PYTHON_HOME/bin:$PATH
Edit spark-defaults.conf
spark.master mesos://192.168.0.19:5050
spark.driver.memory 4g
spark.executor.cores 2
spark.cores.max 10
spark.mesos.executor.docker.image node01:5000/spark-mesos:2.2.0
Resource configuration for Spark on Mesos differs from YARN.
On YARN you set each executor's cores and memory, plus the number of executors.
On Mesos you set the total cores (spark.cores.max), the cores per executor (spark.executor.cores), and the memory (spark.driver.memory).
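With these settings the executor count is implied rather than configured directly; a quick check of the arithmetic, using the values from the spark-defaults.conf above:

```shell
# On Mesos (coarse-grained), the number of executors is bounded by
# floor(spark.cores.max / spark.executor.cores).
CORES_MAX=10
EXECUTOR_CORES=2
EXECUTORS=$(( CORES_MAX / EXECUTOR_CORES ))
echo "up to $EXECUTORS executors"   # up to 5 executors
```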