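In part (3) of "Spark cluster: detailed setup process and solutions to problems encountered", we covered the installation of Hadoop. This article mainly covers installing and configuring Spark.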
spark@master:~/spark$ cd hadoop
spark@master:~/spark/hadoop$ cd $SPARK_HOME/conf
spark@master:~/spark/spark/conf$ cp slaves.template slaves
spark@master:~/spark/spark/conf$ vim slaves
Add the hostname of each node that should run a Worker, one per line.
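The contents of the slaves file are not shown in the text. As a rough sketch, and judging only from the start-all.sh output later in this article (which starts a Worker on master, worker1 and worker2), the file could be written like this. The hostnames are an inference from that log, and `SLAVES_FILE` is a hypothetical variable introduced for illustration.

```shell
# Hypothetical contents: the hostnames are inferred from the start-all.sh log,
# not copied from the original listing.
SLAVES_FILE="${SLAVES_FILE:-slaves}"

# One hostname per line; start-all.sh starts a Worker on each listed host over SSH.
printf '%s\n' master worker1 worker2 > "$SLAVES_FILE"

# Show the result so it can be checked by eye.
cat "$SLAVES_FILE"
```

Run it from the conf directory so that the file lands next to slaves.template.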
spark@master:~/spark/spark/conf$ cp spark-env.sh.template spark-env.sh
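spark-env.sh holds the settings that are loaded when the Spark processes start.
The template documents each available option; the settings below were added with reference to it: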
spark@master:~/spark/spark/conf$ vim spark-env.sh
Add the following content:
export SPARK_PID_DIR=/home/spark/spark/spark/tmp/pid
export SCALA_HOME=/home/spark/spark/scala
export JAVA_HOME=/home/spark/spark/jdk
export HADOOP_HOME=/home/spark/spark/hadoop
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=2G
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Adjust the directories to match your own installation, then save the file.
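Next, create the HDFS directory that will hold the Spark event logs:
spark@master:~/spark/spark/conf$ hadoop fs -mkdir hdfs://master:9000/sparkHistoryLogs
mkdir: Cannot create directory /sparkHistoryLogs. Name node is in safe mode.
The directory cannot be created because the NameNode is in safe mode, so we first leave safe mode:
spark@master:~/spark/spark/conf$ hdfs dfsadmin -safemode leave
Safe mode is OFF
Then create the directory again: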
spark@master:~/spark/spark/conf$ hadoop fs -mkdir hdfs://master:9000/sparkHistoryLogs
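The NameNode normally leaves safe mode on its own once enough blocks have been reported, so an alternative to forcing it off is to wait. The sketch below assumes that `hdfs dfsadmin -safemode get` prints "Safe mode is ON" or "Safe mode is OFF" and that `-safemode wait` blocks until safe mode ends. `HDFS_CMD`, `safemode_is_on` and `make_event_log_dir` are hypothetical names introduced here, not part of Hadoop.

```shell
# HDFS_CMD is a hypothetical override so the functions can be tried without a cluster.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Succeeds (exit status 0) when the NameNode reports that safe mode is on.
safemode_is_on() {
  "$HDFS_CMD" dfsadmin -safemode get 2>/dev/null | grep -q "Safe mode is ON"
}

# Creates the given HDFS directory, waiting for safe mode to end first if needed.
make_event_log_dir() {
  target_dir="$1"
  if safemode_is_on; then
    echo "NameNode is in safe mode; waiting for it to leave" >&2
    "$HDFS_CMD" dfsadmin -safemode wait
  fi
  "$HDFS_CMD" dfs -mkdir -p "$target_dir"
}
```

A call such as `make_event_log_dir hdfs://master:9000/sparkHistoryLogs` would then replace the two manual steps above. On a small test cluster, forcing safe mode off as the article does is usually harmless.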
spark@master:~/spark/spark/conf$ cp spark-defaults.conf.template spark-defaults.conf
Configure spark-defaults.conf, the configuration file that Spark reads by default when a job is submitted:
spark@master:~/spark/spark/conf$ vim spark-defaults.conf
Add the following content:
spark.master spark://master:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/sparkHistoryLogs
spark.eventLog.compress true
spark.history.updateInterval 5
spark.history.ui.port 7777
spark.history.fs.logDirectory hdfs://master:9000/sparkHistoryLogs
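Applications write their event logs to spark.eventLog.dir and the history server reads from spark.history.fs.logDirectory, so the two values need to point at the same location for finished jobs to show up. A minimal sketch of a consistency check follows. It assumes the whitespace-separated "key value" layout used above, and `conf_get` and `check_event_log_dirs` are hypothetical helper names.

```shell
# conf_get FILE KEY: print the value of KEY from a spark-defaults.conf style file.
# Assumes one whitespace-separated "key value" pair per line.
conf_get() {
  awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# check_event_log_dirs FILE: succeed only when the directory Spark writes event
# logs to is the same one the history server reads from.
check_event_log_dirs() {
  write_dir="$(conf_get "$1" spark.eventLog.dir)"
  read_dir="$(conf_get "$1" spark.history.fs.logDirectory)"
  if [ -n "$write_dir" ] && [ "$write_dir" = "$read_dir" ]; then
    echo "OK: $write_dir"
  else
    echo "MISMATCH: eventLog.dir='$write_dir' history.fs.logDirectory='$read_dir'" >&2
    return 1
  fi
}
```

For example, `check_event_log_dirs "$SPARK_HOME/conf/spark-defaults.conf"` could be run before the configuration is copied to the workers.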
Copy the configured spark directory to the worker1 and worker2 nodes.
Switch to the worker1 node and run:
spark@worker1:~/spark$ scp -r spark@master:/home/spark/spark/spark ./spark
Note that the copy is placed under the spark directory.
Switch to the worker2 node and run:
spark@worker2:~/spark$ scp -r spark@master:/home/spark/spark/spark ./spark
Note that the copy is placed under the spark directory.
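Instead of logging in to each worker and pulling the directory, the copy could also be pushed from master in a loop. This is only a sketch of that alternative, not the procedure used in this article. It assumes passwordless SSH from master to the workers is already set up, and `push_spark`, `RUN`, `SRC_DIR` and `DEST_PARENT` are hypothetical names.

```shell
# Set RUN=echo to print the scp commands instead of executing them (dry run).
RUN="${RUN:-}"
SRC_DIR="/home/spark/spark/spark"    # configured Spark directory on master
DEST_PARENT="/home/spark/spark"      # parent directory on each worker

# push_spark HOST...: copy the Spark directory from master to every listed host.
push_spark() {
  for host in "$@"; do
    $RUN scp -r "$SRC_DIR" "spark@${host}:${DEST_PARENT}/"
  done
}
```

Running `RUN=echo` followed by `push_spark worker1 worker2` on master prints the two scp commands, which makes it easy to check the paths before copying for real.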
Switch back to master and start Spark:
spark@master:~/spark/spark/conf$ $SPARK_HOME/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/spark/spark/spark/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out
master: starting org.apache.spark.deploy.worker.Worker, logging to /home/spark/spark/spark/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-master.out
worker2: starting org.apache.spark.deploy.worker.Worker, logging to /home/spark/spark/spark/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-worker2.out
worker1: starting org.apache.spark.deploy.worker.Worker, logging to /home/spark/spark/spark/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-worker1.out
The output shows that the Master and all three Workers started successfully.
To stop Spark, use:
$SPARK_HOME/sbin/stop-all.sh
Start the Spark history server:
spark@master:~/spark/spark/conf$ $SPARK_HOME/sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /home/spark/spark/spark/logs/spark-spark-org.apache.spark.deploy.history.HistoryServer-1-master.out
List all processes related to Spark and Hadoop:
spark@master:~/spark/spark/conf$ jps -l
6711 org.apache.hadoop.hdfs.server.namenode.NameNode
18863 org.apache.spark.deploy.master.Master
7053 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
18966 org.apache.spark.deploy.worker.Worker
19122 sun.tools.jps.Jps
19070 org.apache.spark.deploy.history.HistoryServer
15529 org.apache.hadoop.hdfs.server.datanode.DataNode
7352 org.apache.hadoop.yarn.server.nodemanager.NodeManager
7222 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
At this point the Spark cluster is up and running as well.
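Reading the jps listing by eye gets tedious when the check is repeated on several nodes. As a small sketch, the function below reads `jps -l` output on standard input and reports any expected class name that is absent. `check_daemons` is a hypothetical helper, and the class names to pass are the ones shown in the listing above.

```shell
# check_daemons CLASS...: read `jps -l` output on stdin and print every expected
# class name that is not running. Returns 1 if anything is missing.
check_daemons() {
  jps_output="$(cat)"
  missing=0
  for daemon in "$@"; do
    # The second column of `jps -l` is the fully qualified main class.
    if ! printf '%s\n' "$jps_output" |
         awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

On master it could be used as `jps -l | check_daemons org.apache.spark.deploy.master.Master org.apache.spark.deploy.history.HistoryServer`, and on a worker with `org.apache.spark.deploy.worker.Worker`.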
Test the Spark cluster with spark-shell:
First run
hdfs dfsadmin -safemode leave
to turn off safe mode (event logging writes to HDFS, which rejects writes while it is in safe mode).
spark@master:~/spark/spark/conf$ $SPARK_HOME/bin/spark-shell --master spark://master:7077
The shell starts successfully.
Run a quick test:
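The test that was run at this point is not reproduced in the text. As a stand-in, and not necessarily what the author ran, the sketch below writes a two-line Scala job to a file and feeds it to spark-shell on standard input. The job sums the integers 1 to 100 across the cluster, so it should print "sum = 5050". `SMOKE_TEST` is a hypothetical variable naming the script file.

```shell
# Hypothetical smoke test; the file name and the job itself are illustrative.
SMOKE_TEST="${SMOKE_TEST:-/tmp/spark-smoke-test.scala}"

cat > "$SMOKE_TEST" <<'EOF'
// Distribute the numbers 1..100 across the cluster and add them up.
val total = sc.parallelize(1 to 100).reduce(_ + _)
println("sum = " + total)
EOF

# Only launch spark-shell when it is actually installed on this machine.
if [ -n "${SPARK_HOME:-}" ] && [ -x "$SPARK_HOME/bin/spark-shell" ]; then
  "$SPARK_HOME/bin/spark-shell" --master spark://master:7077 < "$SMOKE_TEST"
fi
```

The same two Scala lines can also be typed directly at the scala> prompt of the shell started above.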
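Some web UIs:
Cluster node information: http://master:8080 (master can be replaced with its IP address)
Job history: http://master:7777 (empty at this point, because no job has been run yet)
Hadoop cluster information: http://master:50070/
The NameNode page shows that safe mode is off. To turn safe mode back on for the Hadoop cluster, run the following command: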
spark@master:~/spark/spark/conf$ hdfs dfsadmin -safemode enter
At this point the installation is complete.
To stop the Spark cluster, run:
spark@master:~/spark/spark/conf$ $SPARK_HOME/sbin/stop-all.sh
master: stopping org.apache.spark.deploy.worker.Worker
worker2: stopping org.apache.spark.deploy.worker.Worker
worker1: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
To stop the Hadoop cluster, run:
spark@master:~/spark/spark/conf$ $HADOOP_HOME/sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: no namenode to stop
master: stopping datanode
worker1: stopping datanode
worker2: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: no secondarynamenode to stop
stopping yarn daemons
no resourcemanager to stop
master: no nodemanager to stop
worker1: no nodemanager to stop
worker2: no nodemanager to stop
no proxyserver to stop
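The deprecation notice above recommends stop-yarn.sh and stop-dfs.sh in place of Hadoop's stop-all.sh. A possible shutdown sequence for the whole stack is sketched below. The ordering (Spark first, HDFS last) is a choice made here, not something prescribed by the article, and `stop_everything` and `RUN` are hypothetical names.

```shell
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN="${RUN:-}"

# Stop Spark first, then YARN, then HDFS, so that nothing is still
# writing to HDFS when the NameNode and DataNodes go down.
stop_everything() {
  $RUN "$SPARK_HOME/sbin/stop-history-server.sh"
  $RUN "$SPARK_HOME/sbin/stop-all.sh"
  $RUN "$HADOOP_HOME/sbin/stop-yarn.sh"
  $RUN "$HADOOP_HOME/sbin/stop-dfs.sh"
}
```

With SPARK_HOME and HADOOP_HOME set as in spark-env.sh, calling `stop_everything` on master runs the four scripts in order.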
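Finally, here are some commonly used checking commands, taken from (http://ciscolinux.blog.51cto.com/746827/1313110). They were written for a JobTracker/TaskTracker-based Hadoop, so those items do not match the YARN daemons (ResourceManager, NodeManager) seen in the jps output above.
1. Check whether the ports are open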
netstat -tupln | grep 9000
netstat -tupln | grep 9001
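2. Check in a browser that the NameNode and JobTracker started normally: http://192.168.0.202:50070 for the NameNode and port 50030 for the JobTracker
3. Use jps to check that the daemons are running
master shows: JobTracker, Jps, SecondaryNameNode, NameNode
slave shows: DataNode, Jps, TaskTracker
4. View cluster status statistics (hadoop dfsadmin -report)
Output on master and slave:
Common commands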
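hadoop dfs -ls #list the files in HDFS
hadoop dfs -ls in #list the files under the HDFS directory named in
hadoop dfs -put test.txt test #upload a file to the given path under a new name; the upload counts as successful only once the DataNodes have all received the data
hadoop dfs -get in getin #fetch from HDFS and rename to getin; like put, this works on both files and directories
hadoop dfs -rmr out #delete the out directory on HDFS
hadoop dfs -cat in/* #show the contents of the in directory on HDFS
hadoop dfsadmin -safemode leave #leave safe mode
hadoop dfsadmin -safemode enter #enter safe mode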
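Adding a new node
Follow the same configuration steps as for worker1 and worker2, and add the new node's hostname to the slaves file on master so that start-all.sh starts a Worker on it.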