Setting Up a Spark Development Environment on Linux
1. Spark
2. Setting up the Spark development environment
【1】A Spark development environment depends on Hadoop, Java, and Scala. Java and Hadoop are already installed on this machine, so those steps are not repeated here.
【2】Set up the Scala and SBT environments:
(2.1) Download the Scala and SBT packages:
Scala: http://www.scala-lang.org/
SBT: https://www.scala-sbt.org/download.html
(2.2) Install Scala and SBT:
Install Scala:
Use Xftp to upload scala-2.12.5.tgz to the Linux server under /usr/local/scala.
Log in to the server with Xshell, change into /usr/local/scala, and extract Scala: tar -xvf scala-2.12.5.tgz
Last login: Sat Apr 7 07:22:36 2018 from 192.168.3.4
[root@marklin ~]# cd /usr/local/scala
[root@marklin scala]# ll
total 19832
-rw-r--r--. 1 root root 20303983 Apr 7 10:10 scala-2.12.5.tgz
[root@marklin scala]# tar -xvf scala-2.12.5.tgz
Configure the environment variables, run: vim /etc/profile
#Setting SCALA_HOME PATH
export SCALA_HOME=/usr/local/scala/scala-2.12.5
export PATH=${PATH}:${SCALA_HOME}/bin
Run source /etc/profile to make the changes take effect.
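Before touching /etc/profile, the two export lines can be dry-run in a throwaway file to confirm the PATH wiring. A minimal sketch, assuming the paths used above:

```shell
# Replay the two profile lines in a temp file and confirm PATH picks up Scala.
PROFILE=$(mktemp)
cat > "$PROFILE" <<'EOF'
export SCALA_HOME=/usr/local/scala/scala-2.12.5
export PATH=${PATH}:${SCALA_HOME}/bin
EOF
. "$PROFILE"
# Verify the Scala bin directory is now on PATH.
echo "$PATH" | grep -q "$SCALA_HOME/bin" && echo "PATH ok"
rm -f "$PROFILE"
```

Once this prints "PATH ok", the same two lines are safe to append to /etc/profile.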
【3】Download the package spark-2.3.0-bin-hadoop2.7.tgz from the official site ( http://spark.apache.org/ ).
【4】Upload spark-2.3.0-bin-hadoop2.7.tgz to /usr/local/spark.
【5】Change into /usr/local/spark and extract it:
tar -xvf spark-2.3.0-bin-hadoop2.7.tgz
Note that the archive extracts to a directory named spark-2.3.0-bin-hadoop2.7; rename it to match the SPARK_HOME set below: mv spark-2.3.0-bin-hadoop2.7 spark-2.3.0
[root@marklin scala]# cd /usr/local/spark
[root@marklin spark]# ll
total 220832
-rw-r--r--. 1 root root 226128401 Apr 7 10:38 spark-2.3.0-bin-hadoop2.7.tgz
[root@marklin spark]# tar -xvf spark-2.3.0-bin-hadoop2.7.tgz
【6】Configure the environment variables: vim /etc/profile
#Setting SPARK_HOME PATH
export SPARK_HOME=/usr/local/spark/spark-2.3.0
export PATH=${PATH}:${SPARK_HOME}/bin
Run source /etc/profile to make the changes take effect.
【7】Edit the configuration files:
Change into the conf directory: cd /usr/local/spark/spark-2.3.0/conf. First make a working copy of the template with cp slaves.template slaves, then replace localhost in the slaves file with your hostname (marklin.com in this walkthrough):
[root@marklin conf]# cp slaves.template slaves
[root@marklin conf]# ll
total 40
-rw-r--r--. 1 1311767953 1876110778 996 Feb 22 14:42 docker.properties.template
-rw-r--r--. 1 1311767953 1876110778 1105 Feb 22 14:42 fairscheduler.xml.template
-rw-r--r--. 1 1311767953 1876110778 2025 Feb 22 14:42 log4j.properties.template
-rw-r--r--. 1 1311767953 1876110778 7801 Feb 22 14:42 metrics.properties.template
-rw-r--r--. 1 root root 865 Apr 7 10:54 slaves
-rw-r--r--. 1 1311767953 1876110778 865 Feb 22 14:42 slaves.template
-rw-r--r--. 1 1311767953 1876110778 1292 Feb 22 14:42 spark-defaults.conf.template
-rwxr-xr-x. 1 1311767953 1876110778 4221 Feb 22 14:42 spark-env.sh.template
[root@marklin conf]# chmod +x slaves
[root@marklin conf]# ll
total 40
-rw-r--r--. 1 1311767953 1876110778 996 Feb 22 14:42 docker.properties.template
-rw-r--r--. 1 1311767953 1876110778 1105 Feb 22 14:42 fairscheduler.xml.template
-rw-r--r--. 1 1311767953 1876110778 2025 Feb 22 14:42 log4j.properties.template
-rw-r--r--. 1 1311767953 1876110778 7801 Feb 22 14:42 metrics.properties.template
-rwxr-xr-x. 1 root root 865 Apr 7 10:54 slaves
-rw-r--r--. 1 1311767953 1876110778 865 Feb 22 14:42 slaves.template
-rw-r--r--. 1 1311767953 1876110778 1292 Feb 22 14:42 spark-defaults.conf.template
-rwxr-xr-x. 1 1311767953 1876110778 4221 Feb 22 14:42 spark-env.sh.template
[root@marklin conf]# vim slaves
[root@marklin conf]#
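The localhost-to-hostname edit shown in vim above can also be scripted. The sketch below uses a minimal stand-in slaves file so it is self-contained; on the server you would run only the sed line inside the conf directory, substituting your own hostname for marklin.com:

```shell
# Stand-in for conf/slaves (the real file comes from slaves.template).
printf '# A Spark Worker will be started on each host listed below\nlocalhost\n' > slaves
# Replace the default localhost entry with the worker's hostname.
sed -i 's/^localhost$/marklin.com/' slaves
# Show the active (non-comment) entries.
grep -v '^#' slaves
```

This prints marklin.com, confirming the worker host was substituted.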
Edit spark-env.sh. As before, first make a working copy of the template: cp spark-env.sh.template spark-env.sh
[root@marklin conf]# cp spark-env.sh.template spark-env.sh
[root@marklin conf]#
Then open spark-env.sh and append the following:
export JAVA_HOME=/usr/local/java/jdk1.8.0_162
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.5
export SCALA_HOME=/usr/local/scala/scala-2.12.5
export SPARK_HOME=/usr/local/spark/spark-2.3.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_LOCAL_IP=marklin.com
export SPARK_MASTER_HOST=marklin.com
export SPARK_WORKER_MEMORY=512M
export SPARK_CONF_DIR=${SPARK_HOME}/conf
export SPARK_LOG_DIR=/usr/local/spark/repository/logs
export SPARK_PID_DIR=/usr/local/spark/repository/pids
export SPARK_LIBRARY_PATH=.:${JAVA_HOME}/lib:${JAVA_HOME}/jre/lib:${HADOOP_HOME}/lib/native
export SPARK_WORKER_DIR=/usr/local/spark/repository/worker
export SPARK_MASTER_PORT=8188
export SPARK_MASTER_WEBUI_PORT=8180
export SPARK_WORKER_PORT=8181
export SPARK_WORKER_WEBUI_PORT=8182
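The four ports chosen above (master, master web UI, worker, worker web UI) must not collide with one another or with services already running on the host. A quick uniqueness check over the values used here:

```shell
# The four Spark ports configured above; duplicates would break startup.
ports="8188 8180 8181 8182"
# uniq -d prints only repeated values, so an empty result means no conflicts.
dups=$(printf '%s\n' $ports | sort | uniq -d)
[ -z "$dups" ] && echo "no duplicate ports"
```

On a shared host you could extend this by checking each port against the output of ss -tln before starting the daemons.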
Open the required ports in the firewall (note: the transcript below stops firewalld again at the end, which disables these rules; leave the service running if you want them enforced):
[root@marklin ~]# systemctl start firewalld.service
[root@marklin ~]# firewall-cmd --zone=public --add-port=8180/tcp --permanent
success
[root@marklin ~]# firewall-cmd --zone=public --add-port=8188/tcp --permanent
success
[root@marklin ~]# firewall-cmd --zone=public --add-port=8181/tcp --permanent
success
[root@marklin ~]# firewall-cmd --zone=public --add-port=8182/tcp --permanent
success
[root@marklin ~]# firewall-cmd --reload
success
[root@marklin ~]# systemctl stop firewalld.service
[root@marklin ~]#
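The four repetitive firewall-cmd invocations can be generated in a single loop. Shown here as a dry run (the commands are echoed, not executed); drop the echo wrapper to apply them for real on the server:

```shell
# Generate one firewall-cmd call per Spark port configured in spark-env.sh.
cmds=$(for p in 8180 8181 8182 8188; do
    echo "firewall-cmd --zone=public --add-port=${p}/tcp --permanent"
done)
printf '%s\n' "$cmds"
```

After running the real commands, firewall-cmd --reload is still needed to activate the permanent rules.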
【8】Start and test
Change into /usr/local/spark/spark-2.3.0/sbin and run: start-master.sh
[root@marklin sbin]# start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/repository/logs/ spark-root-org.apache.spark.deploy.master.Master-1-marklin.com.out
Open http://192.168.3.4:8180/#running-app in a browser to check the master web UI (8180 is the SPARK_MASTER_WEBUI_PORT configured above).
Then change into the bin directory and run spark-shell:
[root@marklin sbin]# cd ..
[root@marklin spark-2.3.0]# cd bin
[root@marklin bin]# spark-shell
2018-04-07 11:43:08 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://marklin.com:4040
Spark context available as 'sc' (master = local[*], app id = local-1523115824100).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_162)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Note that spark-shell reports Scala 2.11.8 rather than the 2.12.5 installed earlier: Spark 2.3.0 ships with its own bundled Scala 2.11, which the shell uses regardless of the system-wide Scala installation.