Compiling Spark 2.2.0: Pitfalls Encountered and How to Fix Them

For the preliminary preparation, see http://blog.csdn.net/wjl7813/article/details/79157148

For the official Spark build documentation, see http://spark.apache.org/docs/2.2.0/building-spark.html

 tar xf spark-2.2.0.tgz
 cd spark-2.2.0/dev/

Edit the make-distribution.sh file so that the following variables are hard-coded (a note on why this helps follows the values):
VERSION=2.2.0
SCALA_VERSION=2.11.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
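
For context: the stock make-distribution.sh derives these four values by calling Maven (help:evaluate) several times, which is slow; hard-coding them skips that step. The detection block being replaced looks roughly like this in the 2.x script (paraphrased, details may differ in your copy):

VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
# SPARK_HIVE is detected from the active profiles of the sql/hive module in a similar way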

[hadoop@node1 dev]$ pwd
/home/hadoop/source/spark-2.2.0/dev
[hadoop@node1 dev]$ ./make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0-cdh5.7.0  
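
make-distribution.sh runs the Maven build and then packages a runnable distribution. If you only want to compile without producing the tarball, the corresponding plain Maven invocation (based on the official building-spark guide, with the same profiles as above) is roughly:

./build/mvn -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests clean package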

Pitfall 1: the build may fail with an error like the following:

[ERROR] Failed to execute goal on project spark-launcher_2.11: Could not resolve dependencies for project org.apache.spark:spark-launcher_2.11:jar:2.2.0: Could not find artifact org.apache.hadoop:hadoop-client:jar:2.6.0-cdh5.7.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException


The reason is that the CDH hadoop-client artifact is not available from the Aliyun mirror. Edit the pom.xml file and add the Cloudera repository entry shown below to the <repositories> section; the rest of the file can stay at its defaults.


    <repository>
      <id>cloudera</id>
      <name>cloudera repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
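
To confirm the CDH artifact now resolves, an optional quick check is to fetch it directly with the Maven dependency plugin, pointing it at the Cloudera repository:

mvn dependency:get \
  -Dartifact=org.apache.hadoop:hadoop-client:2.6.0-cdh5.7.0 \
  -DremoteRepositories=https://repository.cloudera.com/artifactory/cloudera-repos/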

If you have the option, a VPN (one that can reach Google and similar sites) is recommended, since it makes downloading dependencies much smoother.

Pitfall 2: sometimes the build machine does not have enough memory.
Pay attention to the build machine's memory; 4 GB or more is generally recommended, and give Maven a larger heap:
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
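
To make the setting survive new shell sessions and to check how much memory is actually available before starting a long build, something like this works (plain shell, nothing Spark-specific):

free -h                          # check total/available memory on the build machine
echo 'export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"' >> ~/.bashrc
source ~/.bashrc                 # reload so the current shell picks it up
echo $MAVEN_OPTS                 # verify the options Maven will see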

Pitfall 3:
If you are building for Scala 2.10, run the following script first:
./dev/change-scala-version.sh 2.10
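
According to the official building-spark guide, switching the POMs is not enough by itself; the build also needs the scala-2.10 property. A sketch of the full Scala 2.10 sequence (same flags as before, plus -Dscala-2.10):

./dev/change-scala-version.sh 2.10
./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz \
  -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver \
  -Dhadoop.version=2.6.0-cdh5.7.0 -Dscala-2.10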

When the build finishes, it produces a distribution tarball:

spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz
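
A quick smoke test of the resulting package (paths are illustrative, adjust to wherever you unpack it):

tar xf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz
cd spark-2.2.0-bin-2.6.0-cdh5.7.0
./bin/spark-submit --version        # should report version 2.2.0
./bin/run-example SparkPi 10        # run the bundled Pi example locally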
