Building Spark 1.2 with Maven on hadoop-2.6.0

1. Install Maven

(1) Set MAVEN_HOME.

(2) Add $MAVEN_HOME/bin to the PATH variable.

(3) Set the memory parameters in MAVEN_OPTS:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
If you skip this, the build fails with one of the following errors, because compiling Spark needs a lot of memory:

[INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.10/classes...

[ERROR] PermGen space -> [Help 1]



[INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.10/classes...

[ERROR] Java heap space -> [Help 1]
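
The three setup steps above can be sketched as a few lines for a shell profile. The install path /opt/apache-maven is only a placeholder assumption; substitute the directory where Maven was actually unpacked.

```shell
# Hypothetical install location; adjust to where Maven was unpacked.
export MAVEN_HOME="${MAVEN_HOME:-/opt/apache-maven}"

# Put the Maven launcher on the PATH.
export PATH="$MAVEN_HOME/bin:$PATH"

# Memory settings from step (3).
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

# Report whether a mvn launcher is now reachable (does not run Maven itself).
if command -v mvn >/dev/null 2>&1; then
  echo "mvn found at: $(command -v mvn)"
else
  echo "mvn not found on PATH; check MAVEN_HOME"
fi
```

Placing these lines in a file such as ~/.bashrc makes them apply to every new shell, so the settings are in effect whenever the build is started.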
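
2. Build Spark

(1) Download Spark

http://spark.apache.org/downloads.html

(2) Extract the downloaded file.

(3) Enter the root directory.

Modify the source file mllib\src\main\scala\org\apache\spark\mllib\optimization\Gradient.scala; left unmodified, it causes the build to fail with this error: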

[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle execution           : You have 1 Scalastyle violation(s). -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn -rf :spark-mllib_2.10

Delete the two lines containing "Our loss function"; otherwise the build fails with the Scalastyle error shown above.
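
As a rough sketch, the deletion can be done with sed instead of by hand. The helper name strip_loss_comment is made up for this example, and the path is written with forward slashes on the assumption that the command runs from the Spark root in a Unix-like shell. The original file is kept as a .bak copy.

```shell
# Delete every line containing "Our loss function" from the given file,
# keeping a .bak copy of the original next to it.
strip_loss_comment() {
  sed -i.bak '/Our loss function/d' "$1"
}

# Path relative to the Spark source root (assumed current directory).
GRADIENT=mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala

if [ -f "$GRADIENT" ]; then
  strip_loss_comment "$GRADIENT"
  # Count remaining matches; grep exits non-zero when the count is 0.
  grep -c 'Our loss function' "$GRADIENT" || true
fi
```

Comparing Gradient.scala with Gradient.scala.bak afterwards (for example with diff) is a quick way to confirm that only the two intended lines were removed.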

(4) Run the following command in the root directory to build:

mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
When the YARN and Hadoop versions differ, specify each version number separately:
mvn -Pyarn-alpha -Phadoop-2.6 -Dhadoop.version=2.6.0 -Dyarn.version=2.6.0 -DskipTests clean package
The build takes a long time, so be patient.
(5) Alternatively, skip step (4) and run ./make-distribution.sh --name hadoop2.6 --tgz -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6 -Dhadoop.version=2.6.0  -DskipTests
which builds and packages in one step.
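
To avoid retyping the version in two places, the Maven command from step (4) can be assembled from a single variable. This is only a sketch: the variable names HADOOP_VERSION, HADOOP_PROFILE and RUN_BUILD are invented here, and deriving the profile name from the version assumes that a matching hadoop-X.Y profile exists in the Spark release being built. By default the script only prints the command.

```shell
# Hadoop version to build against (assumed default matches this post).
HADOOP_VERSION="${HADOOP_VERSION:-2.6.0}"

# Derive the profile from major.minor, e.g. 2.6.0 -> hadoop-2.6.
# Assumption: the Spark pom.xml defines a profile with this name.
HADOOP_PROFILE="hadoop-${HADOOP_VERSION%.*}"

build_cmd="mvn -Pyarn -P$HADOOP_PROFILE -Dhadoop.version=$HADOOP_VERSION -DskipTests clean package"

# Dry run: show the command that would be executed.
echo "$build_cmd"

# Run only when explicitly requested and a pom.xml is present (Spark root).
if [ "${RUN_BUILD:-0}" = "1" ] && [ -f pom.xml ]; then
  $build_cmd
fi
```

With the defaults this prints the same command as in step (4); setting RUN_BUILD=1 in the Spark root directory starts the actual build.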

