Building and Packaging Spark 3.0.1 Against CDH 6.1.0

0. Download the Spark source code

git clone https://github.com/apache/spark.git

cd spark

git checkout -b v3.0.1_cdh6.1.0 v3.0.1 # create a new branch from the v3.0.1 tag

1. Add the Cloudera Maven repository and a Hadoop 3.0 profile

In Spark's pom.xml, add the CDH Maven repository [1] and a Hadoop 3.0 profile:

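    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      <name>Cloudera Repositories</name>
      <!-- wrapper tag name lost in the source listing; <releases> is assumed -->
      <releases>
        <enabled>true</enabled>
      </releases>
    </repository>

    <pluginRepository>
      <id>cloudera</id>
      <name>Cloudera Repositories</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </pluginRepository>

    <profile>
      <id>hadoop-3.0</id>
      <properties>
        <hadoop.version>3.0.0-cdh6.1.0</hadoop.version>
      </properties>
    </profile>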

For the exact location of each addition, see this commit:

https://github.com/yangrong688/spark/commit/13c322ee32daaae7d4505fa676396be5254ecddf
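Before starting the hour-long build, it can help to confirm that the pom edits are actually in place. The following is a minimal sketch, not part of the original write-up: `check_pom` is a hypothetical helper name, and it only greps for the repository URL and the profile id from step 1, so it does not validate the XML itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): report whether a pom.xml
# mentions the Cloudera repository URL and the hadoop-3.0 profile id.
check_pom() {
  local pom="$1"
  local rc=0

  # -F treats the pattern as a literal string, so the dots are not wildcards.
  if grep -qF 'repository.cloudera.com/artifactory/cloudera-repos' "$pom"; then
    echo "cloudera repository: found"
  else
    echo "cloudera repository: MISSING"
    rc=1
  fi

  if grep -qF '<id>hadoop-3.0</id>' "$pom"; then
    echo "hadoop-3.0 profile: found"
  else
    echo "hadoop-3.0 profile: MISSING"
    rc=1
  fi

  # Non-zero exit status if either piece is missing.
  return "$rc"
}

# Example, run from the Spark source root:
#   check_pom pom.xml
```

As a further check, `./build/mvn help:evaluate -Dexpression=hadoop.version -Phadoop-3.0 -q -DforceStdout` should print the Hadoop version that the profile resolves to, assuming the maven-help-plugin version in use supports `forceStdout`.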

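2. Build and package with the following command [2]

The build downloads a large number of JARs, so configuring a proxy beforehand can speed up the downloads.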
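If a proxy is used, one way to hand it to Maven is a standalone settings file. The sketch below is an assumption-laden example rather than the author's setup: `write_proxy_settings` is a hypothetical helper, and the host, port, and proxy id are placeholders to replace with your own values.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): write a standalone Maven
# settings file that declares a single HTTP proxy.
write_proxy_settings() {
  local out="$1" host="$2" port="$3"
  cat > "$out" <<EOF
<settings>
  <proxies>
    <proxy>
      <id>local-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>${host}</host>
      <port>${port}</port>
    </proxy>
  </proxies>
</settings>
EOF
}

# Example with placeholder values (assumed, replace with your proxy):
#   write_proxy_settings /tmp/proxy-settings.xml 127.0.0.1 7890
```

Maven reads such a file when it is passed with `-s /tmp/proxy-settings.xml`. Appending that option after the other Maven flags of `make-distribution.sh` should forward it to Maven, since the script hands unrecognized single-dash options through, but this is worth confirming on your checkout.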

cd spark

./dev/make-distribution.sh --name hadoop-3.0.0-cdh6.1.0 --tgz -Phadoop-3.0 -Pyarn -Phive-thriftserver -DskipTests

 

 

# Output after the build completes

[INFO] ------------------------------------------------------------------------

[INFO] Reactor Summary for Spark Project Parent POM 3.0.1:

[INFO]

[INFO] Spark Project Parent POM ........................... SUCCESS [  8.397 s]

[INFO] Spark Project Tags ................................. SUCCESS [ 18.853 s]

[INFO] Spark Project Sketch ............................... SUCCESS [ 16.176 s]

[INFO] Spark Project Local DB ............................. SUCCESS [  3.727 s]

[INFO] Spark Project Networking ........................... SUCCESS [  8.747 s]

[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  2.958 s]

[INFO] Spark Project Unsafe ............................... SUCCESS [ 19.984 s]

[INFO] Spark Project Launcher ............................. SUCCESS [  4.830 s]

[INFO] Spark Project Core ................................. SUCCESS [05:56 min]

[INFO] Spark Project ML Local Library ..................... SUCCESS [01:16 min]

[INFO] Spark Project GraphX ............................... SUCCESS [01:45 min]

[INFO] Spark Project Streaming ............................ SUCCESS [03:06 min]

[INFO] Spark Project Catalyst ............................. SUCCESS [07:11 min]

[INFO] Spark Project SQL .................................. SUCCESS [10:29 min]

[INFO] Spark Project ML Library ........................... SUCCESS [07:36 min]

[INFO] Spark Project Tools ................................ SUCCESS [ 18.832 s]

[INFO] Spark Project Hive ................................. SUCCESS [04:56 min]

[INFO] Spark Project REPL ................................. SUCCESS [01:14 min]

[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 18.784 s]

[INFO] Spark Project YARN ................................. SUCCESS [02:46 min]

[INFO] Spark Project Hive Thrift Server ................... SUCCESS [02:40 min]

[INFO] Spark Project Assembly ............................. SUCCESS [  8.246 s]

[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [01:19 min]

[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:51 min]

[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [03:03 min]

[INFO] Spark Project Examples ............................. SUCCESS [02:20 min]

[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 11.549 s]

[INFO] Spark Avro ......................................... SUCCESS [02:39 min]

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------

[INFO] Total time:  01:02 h

[INFO] Finished at: 2020-09-28T17:11:13+08:00

[INFO] ------------------------------------------------------------------------
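A successful build summary does not by itself show which Hadoop JARs ended up in the package. The sketch below lists them from the generated tarball so the CDH version suffix can be checked by eye. `list_hadoop_jars` is a hypothetical helper, and the tarball name in the example is an assumption derived from the `--name` value used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): list the Hadoop JARs
# bundled under jars/ in a Spark distribution tarball.
list_hadoop_jars() {
  local tarball="$1"
  tar -tzf "$tarball" \
    | grep -E '/jars/hadoop-[^/]*\.jar$' \
    | sed 's#.*/##' \
    | sort
}

# Example, run from the Spark source root (file name assumed):
#   list_hadoop_jars spark-3.0.1-bin-hadoop-3.0.0-cdh6.1.0.tgz
```

If the profile took effect, the listed file names would be expected to carry the `3.0.0-cdh6.1.0` version rather than an upstream Apache Hadoop version.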

References

[1] Using the CDH 6 Maven Repository

https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_6_maven_repo.html

[2] Building Spark

https://spark.apache.org/docs/latest/building-spark.html
