Spark Standalone Deployment
1. Configure the hosts file (all nodes)
vi /etc/hosts
192.168.xxx.xxx master
192.168.xxx.xxx slave1
192.168.xxx.xxx slave2
2. Configure SSH
- Disable the firewall (all nodes)
service iptables stop
chkconfig iptables off
- Generate and distribute the SSH key
ssh-keygen -t rsa
# press Enter at every prompt to accept the defaults
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys root@slave1:/root/.ssh/
scp /root/.ssh/authorized_keys root@slave2:/root/.ssh/
- Set the SSH directory permissions (all nodes); a login check follows this step
chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys
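A quick check from master that key-based login works (assuming the keys were copied to every slave as above); each command should print the slave's hostname without asking for a password:
ssh slave1 hostname
ssh slave2 hostname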
3. Install Java and Scala (all nodes)
- Uninstall OpenJDK
rpm -qa | grep java
The output will look something like this (yours may differ):
java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
tzdata-java-2013g-1.el6.noarch
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
Uninstall the listed packages (a re-check follows these commands):
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
rpm -e --nodeps tzdata-java-2013g-1.el6.noarch
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
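Re-run the query to confirm the removal; it should no longer list any OpenJDK packages:
rpm -qa | grep java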
- Extract the archives
tar -zxvf jdk-8u101-linux-x64.tar.gz -C /home/cloud
tar -zxvf scala-2.10.6.tgz -C /home/cloud
- Configure the environment variables
vi /etc/profile
export JAVA_HOME=/home/cloud/jdk1.8.0_101
export SCALA_HOME=/home/cloud/scala-2.10.6
export JRE_HOME=/home/cloud/jdk1.8.0_101/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$SCALA_HOME/bin:$PATH
- Apply the environment variables
source /etc/profile
- Verify the installation and that the reported versions match what was installed
java -version
scala -version
4. Install Spark
- Extract the package into the install directory
tar -zxvf spark-1.4.0-bin-hadoop2.6.tgz -C /home/cloud
- Edit the configuration files
cd /home/cloud/spark-1.4.0-bin-hadoop2.6/conf
# configure the worker (slave) nodes
cp slaves.template slaves
vi slaves
slave1
slave2
# configure the runtime environment
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
export JAVA_HOME=/home/cloud/jdk1.8.0_101
export SCALA_HOME=/home/cloud/scala-2.10.6
export HADOOP_HOME=/home/cloud/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# periodically clean up finished applications' work directories
# (interval and TTL are in seconds; 864000 s = 10 days)
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
-Dspark.worker.cleanup.interval=864000 -Dspark.worker.cleanup.appDataTtl=864000"
export SPARK_LOCAL_HOSTNAME=`hostname`
- Distribute Spark to the slave nodes
scp -r /home/cloud/spark-1.4.0-bin-hadoop2.6 root@slave1:/home/cloud/
scp -r /home/cloud/spark-1.4.0-bin-hadoop2.6 root@slave2:/home/cloud/
5. Start Spark
cd /home/cloud/spark-1.4.0-bin-hadoop2.6/sbin/
./start-all.sh
Check the Spark master web UI (a jps process check is sketched below):
http://master:8080 (use localhost:8080 when browsing on the master node itself)
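To confirm the daemons started (assuming the JDK's jps tool is on the PATH), the master node should show a Master process and each slave a Worker process:
# on master
jps | grep Master
# on slave1 and slave2
jps | grep Worker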
6. Test Spark
$ cd /home/cloud/spark-1.4.0-bin-hadoop2.6/bin
$ ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 ../lib/spark-examples-1.4.0-hadoop2.6.0.jar 4
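The trailing 4 is the number of partitions SparkPi spreads the work over; a successful run should print a line like "Pi is roughly 3.14" in the driver output.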
7. Stop Spark
cd /home/cloud/spark-1.4.0-bin-hadoop2.6/sbin/
./stop-all.sh
Spark Application Environment
1. Runtime environment
Network: the machine running the program must be able to reach the cluster
System configuration: the hosts file must map the IPs of the HBase and Spark cluster nodes to their hostnames (a sketch follows this list)
Language runtimes: JDK 1.8, Scala 2.10.6 (if the program is written in Scala)
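A sketch of the development machine's hosts file, assuming the Spark and HBase clusters share the nodes from step 1; substitute your actual IPs and hostnames:
192.168.xxx.xxx master
192.168.xxx.xxx slave1
192.168.xxx.xxx slave2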
2. Dependency jars to import into the project (a runtime submission sketch follows the list)
HBase jars, found in the lib folder of the HBase installation directory:
guava-12.0.1.jar
hbase-client-1.0.3.jar
hbase-common-1.0.3.jar
hbase-prefix-tree-1.0.3.jar
hbase-protocol-1.0.3.jar
hbase-server-1.0.3.jar
htrace-core-3.1.0-incubating.jar
httpclient-4.2.5.jar
httpcore-4.1.3.jar
zookeeper-3.4.6.jar
Spark jar, found in the lib folder of the Spark installation directory:
spark-assembly-1.4.0-hadoop2.6.0.jar
Scala SDK (if the program is written in Scala, point the import at the Scala installation directory)
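The jars above are compile-time dependencies for the IDE project. At run time they also need to reach the executors; one common way is to pass the HBase jars to spark-submit with --jars. This is a sketch only: the HBase lib path, main class, and application jar below are placeholders, and the Spark assembly does not need to be shipped because it is already installed on every node:
# placeholders: adjust the HBase path, main class, and application jar to your setup
# add any remaining jars from the list above the same way if the job needs them
HBASE_LIB=/home/cloud/hbase-1.0.3/lib
./spark-submit \
  --class com.example.MyHBaseJob \
  --master spark://master:7077 \
  --jars $HBASE_LIB/hbase-client-1.0.3.jar,$HBASE_LIB/hbase-common-1.0.3.jar,$HBASE_LIB/hbase-server-1.0.3.jar,$HBASE_LIB/hbase-protocol-1.0.3.jar,$HBASE_LIB/guava-12.0.1.jar,$HBASE_LIB/htrace-core-3.1.0-incubating.jar,$HBASE_LIB/zookeeper-3.4.6.jar \
  /home/cloud/my-hbase-job.jar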