Fully Distributed Installation of Spark 2.0.1 on Ubuntu 16.04

Three machines (on the same LAN):

Name    IP
Master  192.168.1.183
Slave1  192.168.1.193
Slave2  192.168.1.184

1. Create a user named spark on every machine; remember the password you set.
2. Install SSH (all three machines)
2.1 sudo apt-get install ssh
2.2 After installation, run ssh-keygen -t rsa -P "" (just press Enter through all prompts)
2.3 Change into the ~/.ssh directory and run cat id_rsa.pub >> authorized_keys, then test with ssh Master to confirm passwordless login works
2.4 On Master, run scp ~/.ssh/authorized_keys spark@Slave1:~/.ssh/ (and likewise for Slave2) so that Master can log in to the slave nodes without a password; see the sketch below.
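
A minimal end-to-end sketch of this step, assuming the spark user already exists on all three hosts and that /etc/hosts on every machine maps Master, Slave1 and Slave2 to the IPs listed above:

# On Master, as the spark user
ssh-keygen -t rsa -P ""                              # accept the default key location
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Push the key to both slaves (you will be prompted for the spark password once per host)
scp ~/.ssh/authorized_keys spark@Slave1:~/.ssh/
scp ~/.ssh/authorized_keys spark@Slave2:~/.ssh/

# Verify passwordless login from Master
ssh Slave1 hostname
ssh Slave2 hostname
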
3. Install Java (Master)
3.1 Download the Linux JDK, create a java directory under /home/spark, and unpack the archive into it: tar -xvf jdk-8u111-linux-x64.tar.gz
3.2 Configure the environment variables: sudo gedit ~/.bashrc and append the following at the end

export JAVA_HOME=/home/spark/java/jdk1.8.0_111
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

3.3 Run source ~/.bashrc, then java -version to verify the installation
3.4 Run scp -r /home/spark/java spark@Slave1:~/ (and likewise for Slave2) to copy the JDK to the slave nodes, then configure the Java environment there the same way; a sketch follows.
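
A sketch of distributing and checking the JDK, assuming the same /home/spark layout on every node:

# On Master
scp -r /home/spark/java spark@Slave1:~/
scp -r /home/spark/java spark@Slave2:~/

# Append the same JAVA_HOME/JRE_HOME/CLASSPATH/PATH lines to ~/.bashrc on each slave,
# then check the binary directly (avoids relying on the remote shell sourcing ~/.bashrc)
ssh Slave1 '~/java/jdk1.8.0_111/bin/java -version'
ssh Slave2 '~/java/jdk1.8.0_111/bin/java -version'
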
4. Install Hadoop 2.7 (Master)
4.1 Download Hadoop and unpack it into /home/spark/hadoop
4.2 Configure the Hadoop environment variables (again in ~/.bashrc)

export JAVA_HOME=/home/spark/java/jdk1.8.0_111
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/home/spark/hadoop/hadoop-2.7.3
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
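
A quick check that the variables took effect, assuming Hadoop was unpacked to the path above:

source ~/.bashrc
hadoop version        # should report Hadoop 2.7.3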

4.3 Configure core-site.xml (under $HADOOP_CONF_DIR)

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:8020</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/spark/hadoop/hadoop-2.7.3/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
</configuration>

4.4 Configure hdfs-site.xml (the dfs/name and dfs/data directories need to be created first; see the sketch after this block)

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/spark/hadoop/hadoop-2.7.3/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/spark/hadoop/hadoop-2.7.3/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
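
A minimal sketch of creating the local directories referenced by core-site.xml and hdfs-site.xml (hadoop.tmp.dir, dfs.namenode.name.dir, dfs.datanode.data.dir), assuming the install path used in this article:

cd /home/spark/hadoop/hadoop-2.7.3
mkdir -p tmp dfs/name dfs/data
# These directories are copied to the slave nodes along with the rest of the Hadoop tree in step 4.8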

4.5 Configure yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
</configuration>

4.6 Configure mapred-site.xml (start from mapred-site.xml.template; see the cp command below)

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>

Run cp mapred-site.xml.template mapred-site.xml and put the configuration above into the copy.
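
Optionally, since mapred-site.xml points the job history service at Master:10020 and Master:19888, the MapReduce history server can be started on Master once HDFS is running; a sketch, assuming the paths above:

$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
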
4.7 Edit slaves (under $HADOOP_CONF_DIR)
Remove localhost
Add Slave1 and Slave2
4.8 Run scp -r /home/spark/hadoop spark@Slave1:~/ (and likewise for Slave2) to copy Hadoop to the slave nodes
On Master, run:
cd $HADOOP_HOME
./bin/hadoop namenode -format
./sbin/start-all.sh
If anything goes wrong, learn to read the logs under $HADOOP_HOME/logs. At this point the Hadoop cluster is up.
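
A quick way to confirm the daemons are running, assuming the cluster started cleanly:

# On Master
jps                    # expect NameNode, SecondaryNameNode and ResourceManager
# On Slave1 / Slave2
jps                    # expect DataNode and NodeManager
# HDFS-wide report, run from Master
hdfs dfsadmin -report

The HDFS web UI should also be reachable at http://Master:50070 and the YARN UI at http://Master:8088.
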
5. Install Scala 2.11
5.1 Download Scala and unpack it into /home/spark/scala
5.2 sudo gedit ~/.bashrc and add the environment variables

export JAVA_HOME=/home/spark/java/jdk1.8.0_111
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/home/spark/hadoop/hadoop-2.7.3
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SCALA_HOME=/home/spark/scala/scala-2.11.6
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:$PATH

Run scala -version to check.
6. Install Spark 2.0.1
6.1 Download Spark and unpack it into /home/spark/spark
6.2 Configure the environment variables (in ~/.bashrc)

export SPARK_MASTER_IP=Master
export SPARK_WORKER_MEMORY=1g
export JAVA_HOME=/home/spark/java/jdk1.8.0_111
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/home/spark/hadoop/hadoop-2.7.3
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SCALA_HOME=/home/spark/scala/scala-2.11.6
export SPARK_HOME=/home/spark/spark/spark-2.0.0-bin-hadoop2.7
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH

6.3 Edit spark-env.sh under $SPARK_HOME/conf (start from spark-env.sh.template; see the cp command below)

export JAVA_HOME=/home/spark/java/jdk1.8.0_111
export HADOOP_HOME=/home/spark/hadoop/hadoop-2.7.3
export SCALA_HOME=/home/spark/scala/scala-2.11.6
export SPARK_MASTER_IP=Master
export SPARK_WORKER_MEMORY=1g
export MASTER=spark://Master:7077

Run cp spark-env.sh.template spark-env.sh and put the lines above into the copy.
6.4 Edit slaves (under $SPARK_HOME/conf)
Remove localhost
Add Slave1 and Slave2
6.5 Edit spark-defaults.conf (start from spark-defaults.conf.template; see the cp command below)

spark.master                        spark://Master:7077
spark.executor.extraJavaOptions     -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled              true
spark.eventLog.dir                  hdfs://Master:8020/filename
spark.yarn.historyServer.address    Master:18080
spark.history.fs.logDirectory       hdfs://Master:8020/filename

Run cp spark-defaults.conf.template spark-defaults.conf and put the settings above into the copy.
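
The HDFS directory used for spark.eventLog.dir and spark.history.fs.logDirectory has to exist before the first job writes its event log (/filename above is just the placeholder name used in this article, and the HDFS port matches fs.defaultFS from core-site.xml). A sketch:

hdfs dfs -mkdir -p /filename
# Optionally start the Spark history server on Master to browse finished applications
$SPARK_HOME/sbin/start-history-server.sh
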
6.6 Run scp -r /home/spark/spark spark@Slave1:~/ (and likewise for Slave2) to copy Spark to the slave nodes
6.7 On Master, run:
cd $SPARK_HOME
./sbin/start-all.sh
The result is as follows. On Master, jps should now show the Spark Master process (screenshot omitted), and there is also a NameNode process; on Slave2 the worker-side processes appear (screenshot omitted).
You can also open http://Master:50070 (HDFS) and http://Master:8080 (Spark master UI) in a browser. At this point the Spark installation is complete. Let's keep learning together.
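
As a final smoke test, a small example job can be submitted to the standalone master; a minimal sketch, assuming the paths used in this article:

cd $SPARK_HOME
# Compute an approximation of pi on the cluster
./bin/run-example --master spark://Master:7077 SparkPi 10
# Or open an interactive shell against the cluster
./bin/spark-shell --master spark://Master:7077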
