Apache Spark 2.4.2: Compilation and Installation

Table of Contents

  • Download
  • Software Environment
  • Build and Configuration
    • 1. Extract the Spark source
    • 2. Pin the version numbers to avoid the build script resolving them at compile time
    • 3. Modify the pom file
      • Handling the following build error if it appears
    • 4. Build command
  • Extract and Deploy
  • Start Spark

Download

  • Baidu Cloud download: link: https://pan.baidu.com/s/1IvKxR-dx1MgGcaxtEHUVTQ access code: 8icm
  • Official download: https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
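
If you use the official tarball, it is worth verifying it against the checksum Apache publishes alongside it. A minimal sketch, assuming wget and sha512sum are available on the build host:

wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
# The .sha512 file is published next to the tarball in the same archive directory
wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz.sha512
# Compute the digest and compare it by eye against the contents of the .sha512 file
sha512sum spark-2.4.2.tgz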

Software Environment

| Software | Hadoop         | Scala   | Maven | JDK         |
|----------|----------------|---------|-------|-------------|
| Version  | 2.6.0-cdh5.7.0 | 2.11.12 | 3.6.1 | jdk1.8.0_45 |
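
Before starting the build, it helps to confirm that each tool is on the PATH and to give Maven enough memory; the MAVEN_OPTS values below are the ones recommended in Spark's build documentation:

hadoop version
scala -version
mvn -version
java -version
# Recommended by the Spark build docs to avoid OutOfMemoryError and code-cache warnings
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"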

Build and Configuration

1. Extract the Spark source

[hadoop@hadoop614 Demonstration]$ ll spark-2.4.2.tgz 
-rw-r--r--. 1 hadoop hadoop 16165557 4月  28 04:41 spark-2.4.2.tgz
[hadoop@hadoop614 Demonstration]$ tar -zxvf spark-2.4.2.tgz 
[hadoop@hadoop614 Demonstration]$ cd spark-2.4.2

2. Pin the version numbers to avoid the build script resolving them at compile time

[hadoop@hadoop614 spark-2.4.2]$ vim dev/make-distribution.sh
**Original**
VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
    | grep -v "INFO"\
    | grep -v "WARNING"\
    | tail -n 1)
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
    | grep -v "INFO"\
    | grep -v "WARNING"\
    | tail -n 1)
SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
    | grep -v "INFO"\
    | grep -v "WARNING"\
    | tail -n 1)
SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
    | grep -v "INFO"\
    | grep -v "WARNING"\
    | fgrep --count "hive";\
    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
    # because we use "set -o pipefail"
    echo -n)

**Replace with**
VERSION=2.4.2
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
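
Each of the four original assignments shells out to Maven, which is slow. To see what the script would have resolved, the same query can be run by hand (a sketch; build/mvn is the wrapper the script refers to as $MVN):

./build/mvn help:evaluate -Dexpression=project.version 2>/dev/null \
    | grep -v "INFO" | grep -v "WARNING" | tail -n 1    # should print 2.4.2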

3. Modify the pom file

  • Add the following inside the <repositories> block; the entry for the central repository must stay in the first position
[hadoop@hadoop614 spark-2.4.2]$ vim pom.xml 

<repositories>
  .......
  <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

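To confirm that the Cloudera repository is reachable from the build host before starting a long build, a quick check (assuming curl is installed):

curl -sI https://repository.cloudera.com/artifactory/cloudera-repos/ | head -n 1    # expect an HTTP 2xx or 3xx status line
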
Handling the following build error if it appears

[ERROR] Plugin org.codehaus.mojo:build-helper-maven-plugin:3.0.0 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.codehaus.mojo:build-helper-maven-plugin:jar:3.0.0: Could not transfer artifact org.codehaus.mojo:build-helper-maven-plugin:pom:3.0.0 from/to central (http://maven.aliyun.com/nexus/content/groups/public): maven.aliyun.com:80 failed to respond -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException

Add the following configuration to the pom file


<dependency>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>build-helper-maven-plugin</artifactId>
    <version>3.0.0</version>
</dependency>

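Alternatively, the artifact can be pre-fetched into the local repository without editing the pom; dependency:get is a standard goal of the Maven dependency plugin:

mvn dependency:get -Dartifact=org.codehaus.mojo:build-helper-maven-plugin:3.0.0
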
4. Build command

The build takes quite a while. I used the Aliyun Maven mirror, so the whole download-and-compile run took roughly 40 minutes.

[hadoop@hadoop614 spark-2.4.2]$ pwd
/home/hadoop/Demonstration/spark-2.4.2
[hadoop@hadoop614 spark-2.4.2]$ ./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -Phive -Phive-thriftserver -Pyarn -Pkubernetes 
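
Here --tgz packages the result as a tarball and --name sets the suffix of the output file (hence spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz). Under the hood, make-distribution.sh runs a clean package build with the profiles you pass and then stages the jars, conf, and scripts; a rough sketch of the equivalent Maven invocation (the script adds a few more flags, such as skipping javadoc):

./build/mvn clean package -DskipTests -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 \
    -Phive -Phive-thriftserver -Pyarn -Pkubernetes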

Extract and Deploy

  • Extract
[hadoop@hadoop614 spark-2.4.2]$ ll spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz 
-rw-rw-r--. 1 hadoop hadoop 231193116 4月  28 06:32 spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz
[hadoop@hadoop614 spark-2.4.2]$ pwd
/home/hadoop/Demonstration/spark-2.4.2
[hadoop@hadoop614 spark-2.4.2]$ tar -zxvf spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz -C ~/app
[hadoop@hadoop614 spark-2.4.2]$ cd ~/app
[hadoop@hadoop614 app]$ ls -ld spark-2.4.2-bin-2.6.0-cdh5.7.0/
drwxrwxr-x. 11 hadoop hadoop 4096 4月  28 06:31 spark-2.4.2-bin-2.6.0-cdh5.7.0/
  • Configure environment variables
[hadoop@hadoop614 app]$ vim ~/.bash_profile 

export SPARK_HOME=/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0
export PATH=${SPARK_HOME}/bin:$PATH

[hadoop@hadoop614 app]$ source ~/.bash_profile 
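
A quick sanity check that the variables took effect in the current shell:

echo $SPARK_HOME
which spark-shell    # should resolve to $SPARK_HOME/bin/spark-shell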

Start Spark

[hadoop@hadoop614 app]$ spark-shell 
19/04/28 06:44:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop614:4040
Spark context available as 'sc' (master = local[*], app id = local-1556405067469).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.2
      /_/
         
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
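
With the shell up, the build can be smoke-tested. The SparkPi example ships with the distribution, so running it from a regular shell (not the scala> prompt) should end with an approximation of pi:

$SPARK_HOME/bin/run-example SparkPi 10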
