Spark Distributed Environment Setup

1. Scala environment setup

1) Download the Scala package scala-2.12.10.tgz and extract it under /usr/scala:

[root@hadoop001 scala]# tar -zxvf scala-2.12.10.tgz
[root@hadoop001 scala]# ln -s scala-2.12.10 scala

2) Add the Scala environment variables to /etc/profile:

export SCALA_HOME=/usr/scala/scala
export PATH=$SCALA_HOME/bin:$PATH

3) Save the file and reload it:

[root@hadoop001 scala]# source /etc/profile

4) Verify with scala -version:

[root@hadoop001 scala]# scala -version
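A quick way to double-check the installation beyond the version string is to start the REPL and evaluate an expression (sample session; the exact output line may differ slightly):

[root@hadoop001 scala]# scala
scala> (1 to 3).map(_ * 2)
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 4, 6)
scala> :quit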

2. Spark installation

2.1 Unpack

[hadoop@hadoop001 software]$ tar -zxvf spark-2.4.6-bin-2.6.0-cdh5.16.2.tgz -C ~/app/

Create a symlink:

[hadoop@hadoop001 app]$ ln -s spark-2.4.6-bin-2.6.0-cdh5.16.2/ spark

2.2 Edit the environment configuration files

[hadoop@hadoop001 app]$ vi /home/hadoop/.bashrc

#spark

export SPARK_HOME=/home/hadoop/app/spark
export PATH=$PATH:$SPARK_HOME/bin

---------------------------------------- local deployment mode: spark-env.sh

[hadoop@hadoop001 conf]$ cp spark-env.sh.template spark-env.sh


 export JAVA_HOME=/usr/java/jdk
 export SCALA_HOME=/usr/scala/scala
 export HADOOP_HOME=/data/app/hadoop
 export HADOOP_CONF_DIR=/data/app/hadoop/etc/hadoop
 #export SPARK_MASTER_IP=192.168.1.148
 #export SPARK_MASTER_HOST=192.168.1.148
 #export SPARK_LOCAL_IP=11.24.24.112
 #export SPARK_LOCAL_IP=11.24.24.113
 #export SPARK_LOCAL_IP=0.0.0.0
 export SPARK_WORKER_MEMORY=1g
 export SPARK_WORKER_CORES=2
 export SPARK_HOME=/data/app/spark
 export SPARK_DIST_CLASSPATH=$(/data/app/hadoop/bin/hadoop classpath)
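With this local configuration in place, you can confirm Spark itself works (before touching YARN) by opening spark-shell in local mode and running a tiny job; this is only a sanity check, and the paths above are assumed to match your layout:

[hadoop@hadoop001 spark]$ bin/spark-shell --master local[2]
scala> val rdd = sc.parallelize(1 to 100)
scala> rdd.sum()    // expected result: 5050.0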

---------------------------------------- on-YARN deployment mode: spark-defaults.conf (created from spark-defaults.conf.template)

spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs:///spark/logs
spark.driver.memory 1g
spark.executor.memory 1g

#spark.shuffle.service.enabled true
#spark.shuffle.service.port 7337
#spark.dynamicAllocation.enabled true
#spark.dynamicAllocation.minExecutors 1
#spark.dynamicAllocation.maxExecutors 6
#spark.dynamicAllocation.schedulerBacklogTimeout 1s
#spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s

spark.submit.deployMode client
spark.yarn.jars hdfs:///spark/jars/*
spark.serializer org.apache.spark.serializer.KryoSerializer

Upload the Spark jars to HDFS (create the jars directory and the event-log directory referenced above first):

hdfs dfs -mkdir -p /spark/jars /spark/logs
hdfs dfs -put /data/app/spark/jars/* /spark/jars/
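Because spark.serializer is set to KryoSerializer above, applications can optionally register their own classes with Kryo for more compact serialization. This is only an illustrative sketch; the class and app name below are made up:

import org.apache.spark.SparkConf

case class Point(x: Double, y: Double)            // example class to register

val conf = new SparkConf()
  .setAppName("kryo-demo")                        // hypothetical app name
  .registerKryoClasses(Array(classOf[Point]))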

2.3 Edit slaves

[hadoop@hadoop001 conf]$ mv slaves.template slaves
[hadoop@hadoop001 conf]$ vim slaves
Remove localhost and list the worker hosts:
hadoop001
hadoop002
hadoop003

2.4 Add the same environment variables on hadoop002 and hadoop003 (in /home/hadoop/.bashrc), then reload:

#spark
export SPARK_HOME=/home/hadoop/app/spark
export PATH=$PATH:$SPARK_HOME/bin

source .bashrc

2.5 scp Spark to hadoop002 and hadoop003

[hadoop@hadoop001 ~]$ scp -r /home/hadoop/app/spark-2.4.6-bin-2.6.0-cdh5.16.2 hadoop002:/home/hadoop/app/
[hadoop@hadoop001 ~]$ scp -r /home/hadoop/app/spark-2.4.6-bin-2.6.0-cdh5.16.2 hadoop003:/home/hadoop/app/

Create the symlink on each host:

[hadoop@hadoop002 app]$ ln -s spark-2.4.6-bin-2.6.0-cdh5.16.2/ spark
[hadoop@hadoop003 app]$ ln -s spark-2.4.6-bin-2.6.0-cdh5.16.2/ spark

2.6 Adjust the Spark configuration on hadoop002 and hadoop003

[hadoop@hadoop002 conf]$ pwd
/home/hadoop/app/spark/conf
[hadoop@hadoop002 conf]$ vim spark-env.sh
Set SPARK_LOCAL_IP to each host's own IP address (one line per host):


export SPARK_LOCAL_IP=192.168.1.183
export SPARK_LOCAL_IP=192.168.1.175

3. Distribute Scala

[root@hadoop001 usr]# scp -r /usr/scala/ hadoop002:/usr/

[root@hadoop001 usr]# scp -r /usr/scala/ hadoop003:/usr/

[root@hadoop001 usr]# scp /etc/profile hadoop002:/etc/
[root@hadoop001 usr]# scp /etc/profile hadoop003:/etc/

[root@hadoop002 ~]# source /etc/profile            
[root@hadoop003 ~]# source /etc/profile  

4. Start the cluster

[hadoop@hadoop001 spark]$ sbin/start-all.sh     

You can check the cluster in the Spark web UI at hadoop001:8081.
Alternatively, start a shell on YARN:

spark-shell --master yarn

and then check the running application in the YARN web UI at hadoop001:7776.
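Once spark-shell is up on YARN, a small job confirms that executors are actually scheduled on the cluster (the numbers below are only illustrative):

scala> val rdd = sc.parallelize(1 to 1000, 4)
scala> rdd.map(_ * 2).count()    // expected result: 1000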

5. Spark IDEA configuration

Check the official site for the Scala version that matches your Spark version (Scala 2.12 for the Spark 2.4.x used here).

Create a Spark module in IDEA, then configure the pom.xml:


    
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.5</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <!-- compiles the Scala sources -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <!-- packages a jar with dependencies for spark-submit -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

After importing the Maven project, download and install Scala on your development machine:

https://www.scala-lang.org/download/

Then install the Scala plugin from IDEA's settings: open Settings > Plugins, search for Scala, and install it.

If the in-IDE installation fails, install the plugin from a local file instead; downloading it separately is faster:

https://plugins.jetbrains.com/plugin/1347-scala

In Settings > Plugins, click the gear icon in the top-right corner and choose Install Plugin from Disk.

Pick the plugin version that matches your IDEA version.

Then configure the Scala SDK:

Ctrl+Shift+Alt+S

Open Project Structure and configure the Scala SDK under Global Libraries.
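With the Scala SDK and the pom in place, a minimal application is enough to verify that the module compiles and runs. This is only a sketch; the package, object name, and data below are made up for illustration:

package com.example.demo

import org.apache.spark.{SparkConf, SparkContext}

object SparkSmokeTest {
  def main(args: Array[String]): Unit = {
    // local[2] lets the job run directly inside IDEA; when submitting to the
    // cluster with spark-submit, drop setMaster and pass --master yarn instead
    val conf = new SparkConf().setAppName("SparkSmokeTest").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // tiny word count over an in-memory collection
    val counts = sc.parallelize(Seq("spark", "scala", "spark"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()

    counts.foreach(println)
    sc.stop()
  }
}

Packaging with mvn package produces a jar-with-dependencies (via the assembly plugin above) that can then be submitted to the cluster with spark-submit.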
