Prepare 3 virtual machines
1. Install the Linux virtual machines
Unless otherwise noted, perform the following steps on every machine.
Install Ubuntu on each VM using the ubuntukylin-14.04.2-desktop-amd64 installer image.
1.1 Enable the root user, to avoid permission issues.
See http://jingyan.baidu.com/article/148a1921a06bcb4d71c3b1af.html
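The linked article walks through the details; as a minimal sketch (assuming the usual Ubuntu approach), setting a root password and switching to root is enough for the command-line steps that follow:
sudo passwd root
su -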
1.2 Install vim
apt-get install vim
1.3 Change the hostname
vi /etc/hostname
On the three machines, set the hostname to spark-master, spark-worker1, and spark-worker2 respectively.
1.4 Edit the /etc/hosts file
192.168.255.129 spark-master
192.168.255.130 spark-worker1
192.168.255.131 spark-worker2
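As a quick check, the hostnames should now resolve from every machine, for example:
ping -c 1 spark-worker1
ping -c 1 spark-worker2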
1.5 Install ssh
root@spark-master:~# apt-get install ssh
To allow root to log in over ssh, make the following change:
vi /etc/ssh/sshd_config
PermitRootLogin without-password  -->  change to: PermitRootLogin yes
Restart the machine.
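Alternatively, instead of a full reboot, restarting only the ssh service should be enough for sshd to pick up the new setting:
service ssh restart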
1.6 Configure passwordless ssh login
root@spark-worker2:~# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
a5:ee:f8:34:be:a2:da:05:f2:20:ae:04:8f:ce:77:b3 root@spark-worker2
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
| .               |
|  o              |
|o o . S          |
|o+ + . .         |
|.o. . . +        |
|= o +.= .        |
|.oo.+E+o=.       |
+-----------------+
Copy the public key into the authorized_keys file on the master:
root@spark-worker2:~# ssh-copy-id spark-master
After all three machines have executed the steps above, copy the resulting authorized_keys file to worker1 and worker2:
scp authorized_keys spark-worker1:/root/.ssh
scp authorized_keys spark-worker2:/root/.ssh
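To verify that passwordless login works, each node should be able to run a remote command without being prompted for a password, for example:
root@spark-master:~# ssh spark-worker1 hostname
spark-worker1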
2. Install the JDK (all 3 machines)
2.1 Download jdk-8u60-linux-x64.tar.gz and upload it to the servers
root@rich:~/桌面# cd /tools/
root@rich:/tools# ls
jdk-8u60-linux-x64.tar.gz  VMware Tools
root@rich:/tools# mkdir /usr/lib/java
root@rich:/tools# tar -zxvf jdk-8u60-linux-x64.tar.gz -C /usr/lib/java/
2.2 Add environment variables
root@rich:/usr/lib/java# ls
jdk1.8.0_60
root@rich:/usr/lib/java# vi /root/.bashrc
Add the following:
export JAVA_HOME=/usr/lib/java/jdk1.8.0_60
export JRE_HOME=$JAVA_HOME/jre/
export CLASS_PATH=$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
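For the new variables to take effect in the current shell, re-source the file (or open a new terminal):
source /root/.bashrc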
Check the java version:
root@rich:~# java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
3. Install Hadoop
3.1 Upload hadoop-2.6.0.tar.gz to the master and extract it to /usr/local/hadoop/
root@spark-master:/tools# mkdir /usr/local/hadoop
root@spark-master:/tools# tar -zxvf hadoop-2.6.0.tar.gz -C /usr/local/hadoop/
3.2 Configure the Hadoop cluster
Edit core-site.xml:
root@spark-master:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vi core-site.xml
## Add the following inside <configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://spark-master</value>
  <description>Address of the HDFS filesystem</description>
</property>
Edit hdfs-site.xml:
root@spark-master:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vi hdfs-site.xml
# Add the following inside <configuration>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/usr/local/hadoop/hadoop-2.6.0/dns/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/usr/local/hadoop/hadoop-2.6.0/dns/data</value>
</property>
Edit mapred-site.xml:
root@spark-master:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vi mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Edit yarn-site.xml:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>spark-master</value>
</property>
Edit hadoop-env.sh:
# Change the value of JAVA_HOME
export JAVA_HOME=/usr/lib/java/jdk1.8.0_60
Edit the slaves file to add the worker nodes:
root@spark-master:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vi slaves
spark-worker1
spark-worker2
Edit root's .bashrc file:
# Add the following
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native/
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$HADOOP_HOME/sbin:$PATH
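Only $HADOOP_HOME/sbin is added to PATH here, so the hdfs command below is invoked from the bin directory explicitly; optionally, $HADOOP_HOME/bin could be appended as well. To apply and sanity-check the new variables:
source /root/.bashrc
$HADOOP_HOME/bin/hadoop version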
3.3 Copy the Hadoop installation directory to worker1 and worker2
root@spark-master:/usr/local# cd /usr/local/
root@spark-master:/usr/local# scp -r hadoop/ spark-worker1:/usr/local/
root@spark-master:/usr/local# scp -r hadoop/ spark-worker2:/usr/local/
4. Start the cluster
4.1 Format HDFS
Because HDFS is a filesystem, it must be formatted before first use.
root@spark-master:/usr/local/hadoop/hadoop-2.6.0/bin# ./hdfs namenode -format
4.2 Start DFS
root@spark-master:~# start-dfs.sh
16/02/17 15:38:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [spark-master]
spark-master: starting namenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-spark-master.out
spark-worker1: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-spark-worker1.out
spark-worker2: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-spark-worker2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-secondarynamenode-spark-master.out
16/02/17 15:39:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
root@spark-master:~# jps
14292 Jps
13995 NameNode
14190 SecondaryNameNode
root@spark-worker1:/usr/lib/java/jdk1.8.0_60# jps
11465 DataNode
11532 Jps
root@spark-worker2:~# jps
11282 Jps
11210 DataNode
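Optionally, the NameNode's view of the cluster can be checked from the master; with both DataNodes registered, the report should list two live datanodes:
root@spark-master:~# $HADOOP_HOME/bin/hdfs dfsadmin -report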
4.3 Open the cluster's web management UI
http://spark-master:50070
4.4 Start YARN
root@spark-master:~# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-resourcemanager-spark-master.out
spark-worker1: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-spark-worker1.out
spark-worker2: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-spark-worker2.out
root@spark-master:~# jps
14601 ResourceManager
13995 NameNode
14844 Jps
14190 SecondaryNameNode
View the YARN web management UI:
http://spark-master:8088/
5. Install Scala
5.1 Extract the package
root@spark-master:/tools# cd /tools/
root@spark-master:/tools# mkdir /usr/local/scala ; tar -zxvf scala-2.10.4.tar.gz -C /usr/local/scala/
5.2 Configure environment variables
export SCALA_HOME=/usr/local/scala/scala-2.10.4/
export PATH=$JAVA_HOME/bin:$SCALA_HOME/bin:$PATH
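After appending these lines to /root/.bashrc, re-source it; scala -version should then report version 2.10.4:
source /root/.bashrc
scala -version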
6. Install Spark 1.6.0
6.1 Upload the spark-1.6.0-bin-hadoop2.6.tgz package to the master machine
Extract it:
root@spark-master:~# cd /tools/
root@spark-master:/tools# mkdir /usr/local/spark
root@spark-master:/tools# tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz -C /usr/local/spark/
6.2 Configure spark-env.sh
root@spark-master:/tools# cd /usr/local/spark/spark-1.6.0-bin-hadoop2.6/
root@spark-master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6# ls
bin  CHANGES.txt  conf  data  ec2  examples  lib  LICENSE  licenses  NOTICE  python  R  README.md  RELEASE  sbin
root@spark-master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6# cd conf/
root@spark-master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# ls
docker.properties.template  fairscheduler.xml.template  log4j.properties.template  metrics.properties.template  slaves.template  spark-defaults.conf.template  spark-env.sh.template
root@spark-master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# cp spark-env.sh.template spark-env.sh
root@spark-master:~# cd /usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/
root@spark-master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# vi spark-env.sh
# Add the following
export JAVA_HOME=/usr/lib/java/jdk1.8.0_60
export SCALA_HOME=/usr/local/scala/scala-2.10.4/
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_IP=spark-master
export SPARK_WORKER_MEMORY=512M
export SPARK_EXECUTOR_MEMORY=512M
export SPARK_DRIVER_MEMORY=512M
export SPARK_WORKER_CORES=8
6.3 Edit the slaves file
root@spark-master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# cp slaves.template slaves
root@spark-master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# vi slaves
## Add all worker nodes
spark-worker1
spark-worker2
6.4 Edit .bashrc
# Add
export SPARK_HOME=/usr/local/spark/spark-1.6.0-bin-hadoop2.6/
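Only SPARK_HOME is exported here, and the start scripts below are invoked from the sbin directory directly. Optionally, $SPARK_HOME/bin can also be appended to PATH so that spark-shell and spark-submit can be run from anywhere:
export PATH=$SPARK_HOME/bin:$PATH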
6.5 Sync the spark directory to the worker machines
root@spark-master:/usr/local# scp -r spark spark-worker1:/usr/local
root@spark-master:/usr/local# scp -r spark spark-worker2:/usr/local
6.6 Start Spark
root@spark-master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/sbin# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.master.Master-1-spark-master.out
spark-worker2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-spark-worker2.out
spark-worker1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-spark-worker1.out
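If startup succeeded, jps on spark-master should now also show a Master process and each worker should show a Worker process, and the standalone master's web UI is normally reachable at:
http://spark-master:8080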
6.7 Test
Start spark-shell:
root@spark-master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6# cd bin/
root@spark-master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/bin# ./spark-shell
Run a simple word count program (Spark's own README.md is used here as the example input file):
scala> val text_file = sc.textFile("file:///usr/local/spark/spark-1.6.0-bin-hadoop2.6/README.md")
scala> val counts = text_file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.saveAsTextFile("file:///tmp/wordcount")
Check the output:
root@spark-master:~# cd /tmp/wordcount/
root@spark-master:/tmp/wordcount# ls
part-00000  _SUCCESS
part-00000 is the result file; _SUCCESS is the job-status marker file.
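To inspect the counts themselves, the result file can be viewed directly; each line is a (word, count) pair:
root@spark-master:/tmp/wordcount# head part-00000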