[Spark] Compiling Apache and CDH Spark from Source

1. Compiling Apache Spark from Source

Software versions:
JDK: 1.7.0_67
Scala: 2.10.4
Hadoop: 2.5.0
Spark: 1.6.1
Maven: 3.3.3
Zinc: 0.3.5.3

(1) Set up the Maven environment
1) Extract the Maven package
Maven download URL: http://archive.apache.org/dist/maven/maven-3/3.3.3/binaries/
cd /opt/softwares/
softwares]$ tar -zxf apache-maven-3.3.3-bin.tar.gz -C /opt/modules/
2) Edit the configuration file
apache-maven-3.3.3]$ cd conf
conf]$ vim settings.xml


    
  <mirrors>
    <mirror>
      <id>aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>aliyun repository</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </mirror>
  </mirrors>

3) Configure the MAVEN_HOME environment variable
$ vim /etc/profile

# MAVEN_HOME
export MAVEN_HOME=/opt/modules/apache-maven-3.3.3
export PATH=$PATH:$MAVEN_HOME/bin
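
The variables added to /etc/profile only take effect in shells that re-read the file. The following is a minimal sketch for confirming that the Maven bin directory is on PATH, assuming a bash shell; the helper name path_contains is hypothetical and not part of Maven.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if directory $1 is one of the entries in $PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Typical use after editing /etc/profile (paths taken from the steps above):
#   source /etc/profile
#   path_contains "$MAVEN_HOME/bin" && mvn -version
```

If the check fails in a new terminal, the export lines were probably not saved or the profile was not re-read.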

(2) Set up the Spark environment
1) Extract the Spark source package
Spark source download URL: http://archive.apache.org/dist/spark/spark-1.6.1/
softwares]$ tar -zxf spark-1.6.1.tgz -C /opt/modules/
2) Edit /opt/modules/spark-1.6.1/make-distribution.sh and set the following values

# Figure out where the Spark framework is installed
SPARK_HOME=/opt/modules/spark-1.6.1
DISTDIR="$SPARK_HOME/dist"

VERSION=1.6.1
SCALA_VERSION=2.10.4
SPARK_HADOOP_VERSION=2.5.0
SPARK_HIVE=1

3) Edit hadoop.version and scala.version in /opt/modules/spark-1.6.1/pom.xml

  
    UTF-8
    UTF-8
    com.typesafe.akka
    2.3.11
    1.7
    3.3.3
    spark
    0.21.1
    shaded-protobuf
    1.7.10
    1.2.17
    <hadoop.version>2.5.0</hadoop.version>
    2.5.0
    ${hadoop.version}
    0.98.7-hadoop2
    hbase
    1.6.0
    3.4.5
    2.4.0
    org.spark-project.hive
    
        1.2.1.spark
    
    1.2.1
    10.10.1.1
    1.7.0
    1.6.0
    1.2.4
    8.1.14.v20131031
    3.0.0.v201112011016
    0.5.0
    2.4.0
    2.0.8
    3.1.2
    1.7.7
    hadoop2
    0.7.1
    1.4.0
    
    0.10.1
    
    4.3.2
        
    3.1
    3.4.1
    
    3.2.2
    <scala.version>2.10.4</scala.version>
    2.10
    ${scala.version}
    org.scala-lang
    1.9.13
    2.4.4
    1.1.2
    1.1.2
    1.2.0-incubating
    1.10
    
    2.6
    
    3.3.2
    3.2.10
    2.7.8
    1.9
    2.9
        3.5.2
    1.3.9
    0.9.2

    ${java.home}
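
Before starting a long build it can help to confirm that the edited values were actually saved. The sketch below reads a property back out of pom.xml, assuming the value is stored as a single-line element such as <hadoop.version>...</hadoop.version>; the helper name pom_property is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first value of <NAME>...</NAME> found in a POM.
# Assumes the opening and closing tags sit on the same line.
pom_property() {
  local pom="$1" name="$2"
  sed -n "s|.*<${name}>\(.*\)</${name}>.*|\1|p" "$pom" | head -n 1
}

# Example (path taken from the step above):
#   pom_property /opt/modules/spark-1.6.1/pom.xml hadoop.version
#   pom_property /opt/modules/spark-1.6.1/pom.xml scala.version
```

This is a plain text match rather than a real XML parse, so it is only meant as a quick sanity check.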
    

4) Extract scala-2.10.4.zip and zinc-0.3.5.3.tgz, which the build needs, into /opt/modules/spark-1.6.1/build (zinc-0.3.5.3.tgz download URL: http://downloads.typesafe.com/zinc/0.3.5.3/zinc-0.3.5.3.tgz).
build]$ unzip /opt/softwares/scala-2.10.4.zip
build]$ tar -zxf /opt/softwares/zinc-0.3.5.3.tgz
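
A quick pre-flight check can confirm that both archives were unpacked where the build will look for them. This is only a sketch: the directory names scala-2.10.4 and zinc-0.3.5.3 are assumptions based on the archive names above, and the helper name missing_build_tools is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print the name of every expected tool
# directory that is missing under the given build directory.
# The directory names are assumed from the archive names, not verified.
missing_build_tools() {
  local build_dir="$1" tool
  for tool in scala-2.10.4 zinc-0.3.5.3; do
    [ -d "$build_dir/$tool" ] || echo "$tool"
  done
}

# Example (path taken from the step above); no output means both were found:
#   missing_build_tools /opt/modules/spark-1.6.1/build
```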
(3) Compile Spark
spark-1.6.1]$ ./make-distribution.sh --tgz -Phadoop-2.4 -Dhadoop.version=2.5.0 -Pyarn -Phive -Phive-thriftserver
(4) Screenshots of the build process and of the successful build
[Figure 1]
[Figure 2]
The Apache Hadoop-2.5.0-Spark-1.6.1 build succeeded.

2. Errors Encountered While Compiling Apache Spark

[Error 1]

Using `mvn` from path: /opt/modules/apache-maven-3.3.3/bin/mvn
[ERROR] Error executing Maven.
[ERROR] 1 problem was encountered while building the effective settings
[FATAL] Non-parseable settings /opt/modules/apache-maven-3.3.3/conf/settings.xml: Duplicated tag: 'mirrors' (position: START_TAG seen ...\n\n  ... @161:12)  @ /opt/modules/apache-maven-3.3.3/conf/settings.xml, line 161, column 12

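[Solution] The error reports a duplicated tag. Remove the <mirrors> section that already exists in /opt/modules/apache-maven-3.3.3/conf/settings.xml and keep only the aliyun mirror configured above.
Original settings.xml:

  <mirrors>
  </mirrors>

  <mirrors>
    <mirror>
      <id>aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>aliyun repository</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </mirror>
  </mirrors>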

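settings.xml after the fix:

  <mirrors>
    <mirror>
      <id>aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>aliyun repository</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </mirror>
  </mirrors>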
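
The "Duplicated tag: 'mirrors'" error can be caught before running the build by counting the opening <mirrors> tags in settings.xml. The following is a rough sketch using a plain text match; the helper name count_mirrors_tags is hypothetical, and a tag inside an XML comment would be counted as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the opening <mirrors> tags in a settings file.
# A value greater than 1 suggests a duplicated <mirrors> section.
# Plain text match: occurrences inside XML comments are counted too.
count_mirrors_tags() {
  grep -o '<mirrors>' "$1" | wc -l | tr -d '[:space:]'
}

# Example (path taken from the error message above):
#   count_mirrors_tags /opt/modules/apache-maven-3.3.3/conf/settings.xml
```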
    
  

3. Compiling CDH Spark from Source

Software versions:
JDK: 1.7.0_67
Scala: 2.10.4
Hadoop: 2.5.0-cdh5.3.6
Spark: 1.6.1
Maven: 3.3.3
Zinc: 0.3.5.3

(1) Back up the Maven environment
cd /home/beifeng
~]$ mkdir m2-apache-apark-backup
~]$ cp -r ./.m2/* m2-apache-apark-backup/
cd /home/beifeng/.m2
.m2]$ rm -rf ./*
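
The copy-then-delete steps above can also be done as a single move, which avoids running rm -rf in the wrong directory. This is a sketch of that alternative, not the procedure used in this post; the helper name backup_m2 is hypothetical. Unlike the commands above, it also moves hidden files in the repository directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a local Maven repository aside and recreate it empty.
# $1 = repository directory (e.g. ~/.m2), $2 = backup directory (must not exist yet).
backup_m2() {
  local repo="$1" backup="$2"
  [ -d "$repo" ] || { echo "no such directory: $repo" >&2; return 1; }
  [ ! -e "$backup" ] || { echo "backup already exists: $backup" >&2; return 1; }
  mv "$repo" "$backup" && mkdir -p "$repo"
}

# Example (paths taken from the step above):
#   backup_m2 /home/beifeng/.m2 /home/beifeng/m2-apache-apark-backup
```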
(2) Set up the Spark environment
1) Extract the Spark source package
softwares]$ tar -zxf spark-1.6.1.tgz
softwares]$ mv spark-1.6.1 /opt/modules/spark-1.6.1-cdh5.3.6
2) Change the component versions in /opt/modules/spark-1.6.1-cdh5.3.6/pom.xml to their CDH 5.3.6 versions

  
    UTF-8
    UTF-8
    com.typesafe.akka
    2.3.11
    1.7
    3.3.3
    spark
    0.21.1
    shaded-protobuf
    1.7.10
    1.2.17
    <hadoop.version>2.5.0-cdh5.3.6</hadoop.version>
    2.5.0
    ${hadoop.version}
        0.98.7-hadoop2
    hbase
    <flume.version>1.6.0-cdh5.3.6</flume.version>
    <zookeeper.version>3.4.5-cdh5.3.6</zookeeper.version>
    2.4.0
    org.spark-project.hive
    
    1.2.1.spark
    
    1.2.1
    10.10.1.1
    1.7.0
    1.6.0
    1.2.4
    8.1.14.v20131031
    3.0.0.v201112011016
    0.5.0
    2.4.0
    2.0.8
    3.1.2
    1.7.7
    hadoop2
    0.7.1
    1.4.0
        
    0.10.1
    
    4.3.2
    
    3.1
    3.4.1
    
    3.2.2
    <scala.version>2.10.4</scala.version>
    2.10
    ${scala.version}
    org.scala-lang
    1.9.13
    2.4.4
    1.1.2
    1.1.2
    1.2.0-incubating
    1.10
    
    2.6
    
    3.3.2
        3.2.10
    2.7.8
    1.9
    2.9
    3.5.2
    1.3.9
    0.9.2

    ${java.home}
    

3) Edit /opt/modules/spark-1.6.1-cdh5.3.6/make-distribution.sh

VERSION=1.6.1
SCALA_VERSION=2.10.4
SPARK_HADOOP_VERSION=2.5.0-cdh5.3.6
SPARK_HIVE=1

4) Edit /opt/modules/apache-maven-3.3.3/conf/settings.xml

  
    
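  <mirrors>
    <mirror>
      <id>cloudera-repo</id>
      <mirrorOf>central</mirrorOf>
      <name>Cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      <releases>
          <enabled>true</enabled>
      </releases>
      <snapshots>
          <enabled>false</enabled>
      </snapshots>
    </mirror>
    <mirror>
      <id>aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>aliyun repository</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </mirror>
  </mirrors>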

5) Extract the scala and zinc packages into /opt/modules/spark-1.6.1-cdh5.3.6/build/
build]$ unzip /opt/softwares/scala-2.10.4.zip -d /opt/modules/spark-1.6.1-cdh5.3.6/build/
build]$ tar -zxf /opt/softwares/zinc-0.3.5.3.tgz -C .
(3) Compile Spark
spark-1.6.1-cdh5.3.6]$ ./make-distribution.sh --tgz -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.3.6 -Pyarn -Phive -Phive-thriftserver
(4) The build failed, and the errors could not be resolved

4. Errors Encountered While Compiling CDH Spark

[Error 1]

Using `mvn` from path: /opt/modules/apache-maven-3.3.3/bin/mvn
[ERROR] Error executing Maven.
[ERROR] 1 problem was encountered while building the effective settings
[FATAL] Non-parseable settings /opt/modules/apache-maven-3.3.3/conf/settings.xml: end tag name  must match start tag name  from line 154 (position: TEXT seen ...\n  ... @164:13)  @ /opt/modules/apache-maven-3.3.3/conf/settings.xml, line 164, column 13

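[Solution] Correct the unbalanced tags in the cloudera-repo mirror entry:

    <mirror>
      <id>cloudera-repo</id>
      <name>Cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      <releases>
          <enabled>true</enabled>
      </releases>
      <snapshots>
          <enabled>false</enabled>
      </snapshots>
    </mirror>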

[Error 2]

Using `mvn` from path: /opt/modules/apache-maven-3.3.3/bin/mvn
[ERROR] Error executing Maven.
[ERROR] 2 problems were encountered while building the effective settings
[WARNING] Unrecognised tag: 'releases' (position: START_TAG seen ...\n      ... @158:17)  @ /opt/modules/apache-maven-3.3.3/conf/settings.xml, line 158, column 17
[ERROR] 'mirrors.mirror.mirrorOf' for cloudera-repo is missing @ /opt/modules/apache-maven-3.3.3/conf/settings.xml

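[Solution] Add the missing mirrorOf element to the cloudera-repo mirror:

  <mirrors>
    <mirror>
      <id>aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>aliyun repository</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </mirror>
    <mirror>
      <id>cloudera-repo</id>
      <mirrorOf>central</mirrorOf>
      <name>Cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      <releases>
          <enabled>true</enabled>
      </releases>
      <snapshots>
          <enabled>false</enabled>
      </snapshots>
    </mirror>
  </mirrors>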

[Error 3]

[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project spark-launcher_2.10: Could not resolve dependencies for project org.apache.spark:spark-launcher_2.10:jar:1.6.1: Could not find artifact org.apache.hadoop:hadoop-client:jar:2.5.0-cdh5.3.6 in aliyun (http://maven.aliyun.com/nexus/content/groups/public/) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :spark-launcher_2.10

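[Solution] Move the cloudera-repo mirror ahead of the aliyun mirror:

  <mirrors>
    <mirror>
      <id>cloudera-repo</id>
      <mirrorOf>central</mirrorOf>
      <name>Cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      <releases>
          <enabled>true</enabled>
      </releases>
      <snapshots>
          <enabled>false</enabled>
      </snapshots>
    </mirror>
    <mirror>
      <id>aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>aliyun repository</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </mirror>
  </mirrors>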

[Error 4]

[INFO] ------------------------------------------------------------------------
[ERROR] Plugin org.apache.maven.plugins:maven-remote-resources-plugin:1.5 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-remote-resources-plugin:jar:1.5: Could not transfer artifact org.apache.maven.plugins:maven-remote-resources-plugin:pom:1.5 from/to cloudera-repo (https://repository.cloudera.com/artifactory/cloudera-repos): Remote host closed connection during handshake: SSL peer shut down incorrectly -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException

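[Solution] Remove the releases and snapshots elements from the cloudera-repo mirror and change mirrorOf for the aliyun mirror to *:

  <mirrors>
    <mirror>
      <id>cloudera-repo</id>
      <mirrorOf>central</mirrorOf>
      <name>Cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    </mirror>
    <mirror>
      <id>aliyun</id>
      <mirrorOf>*</mirrorOf>
      <name>aliyun repository</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </mirror>
  </mirrors>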

This error remains unresolved; the Hadoop-2.5.0-CDH-5.3.6-Spark-1.6.1 build failed.
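
Errors 3 and 4 both depend on which mirror Maven selects for a repository. Maven's mirror settings guide notes that a repository can have at most one mirror and that Maven picks the first match rather than combining mirrors, so when two entries both declare mirrorOf central, only the first one listed should be consulted for central. That would be consistent with the logs above, where the lookup goes to aliyun in Error 3 and to cloudera-repo in Error 4. The sketch below prints each mirror's id and mirrorOf in file order to make that ordering visible. The helper name list_mirrors is hypothetical, and the script assumes each <id> and <mirrorOf> element sits on its own line with <id> first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "id -> mirrorOf" for each mirror, in file order.
# Plain text match: it does not skip XML comments, and it pairs each
# <mirrorOf> with the most recent <id> seen before it.
list_mirrors() {
  awk '
    /<id>/       { gsub(/.*<id>|<\/id>.*/, ""); id = $0 }
    /<mirrorOf>/ { gsub(/.*<mirrorOf>|<\/mirrorOf>.*/, ""); print id " -> " $0 }
  ' "$1"
}

# Example (path taken from the error messages above):
#   list_mirrors /opt/modules/apache-maven-3.3.3/conf/settings.xml
```

This only shows the order in which mirrors are declared; it does not diagnose the SSL handshake failure in Error 4.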
