The best way to learn a framework is to debug its source code.
Building Spark 0.8.1 with Hadoop 2.2.0
Local environment:
1. Eclipse Kepler
2. Maven 3.1
3. Scala 2.9.3
4. Ubuntu 12.04
Steps:
1. Download the Spark 0.8.1 source code from the web. Download link: _
2. unzip v0.8.1-incubating.zip
3. export MAVEN_OPTS="-Xmx1g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" // Set -Xmx to suit your machine; I used 1 GB because mine is old. 2 GB is recommended, but if the JVM dies, fall back to 1 GB and accept the slower build.
victor@victor-ubuntu:~/software/incubator-spark-0.8.1-incubating$ export MAVEN_OPTS="-Xmx1g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
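The export above only lasts for the current shell session. A small sketch of making it persistent (appending to ~/.bashrc is my assumption about your setup, not something from the original steps; adjust for your shell):

```shell
# Persist the Maven memory settings so new shells pick them up too
# (~/.bashrc assumed; use your shell's startup file if different).
echo 'export MAVEN_OPTS="-Xmx1g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"' >> ~/.bashrc

# Reload and confirm the variable is visible in this shell:
. ~/.bashrc
echo "$MAVEN_OPTS"
```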
4. Maven makes this easy: mvn -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0 -Pnew-yarn -DskipTests package
victor@victor-ubuntu:~/software/incubator-spark-0.8.1-incubating$ mvn -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0 -Pnew-yarn -DskipTests package
5. ... and the build eventually succeeds.
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .......................... SUCCESS [5.742s]
[INFO] Spark Project Core ................................ SUCCESS [6:55.638s]
[INFO] Spark Project Bagel ............................... SUCCESS [57.687s]
[INFO] Spark Project Streaming ........................... SUCCESS [1:59.625s]
[INFO] Spark Project ML Library .......................... SUCCESS [1:12.154s]
[INFO] Spark Project Examples ............................ SUCCESS [4:01.735s]
[INFO] Spark Project Tools ............................... SUCCESS [18.163s]
[INFO] Spark Project REPL ................................ SUCCESS [59.977s]
[INFO] Spark Project YARN Support ........................ SUCCESS [1:24.402s]
[INFO] Spark Project Assembly ............................ SUCCESS [47.046s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18:42.710s
[INFO] Finished at: Fri Mar 28 00:47:06 CST 2014
[INFO] Final Memory: 64M/560M
[INFO] ------------------------------------------------------------------------
Alternatively, you can build with sbt:
victor@victor-ubuntu:~/software/incubator-spark-0.8.1-incubating$ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true ./sbt/sbt assembly
Getting org.scala-sbt sbt 0.12.4 ...
[info] Checking every *.class/*.jar file's SHA-1.
[info] SHA-1: 040d65230771f2da5c90328a4e4ea844a489f39e
[info] Packaging /home/victor/software/incubator-spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar ...
[info] Done packaging.
[info] Done packaging.
[success] Total time: 4488 s, completed Mar 28, 2014 2:18:46 AM
victor@victor-ubuntu:~/software/incubator-spark-0.8.1-incubating/assembly/target/scala-2.9.3$ ll
total 90504
drwxrwxr-x 3 victor victor     4096 3月 28 21:43 ./
drwxrwxr-x 9 victor victor     4096 3月 28 01:27 ../
drwxrwxr-x 3 victor victor     4096 3月 28 01:27 cache/
-rw-rw-r-- 1 victor victor 92659663 3月 28 02:06 spark-assembly-0.8.1-incubating-hadoop2.2.0.jar
victor@victor-ubuntu:~/software/incubator-spark-0.8.1-incubating/examples/target/scala-2.9.3$ ll
total 179004
drwxrwxr-x 5 victor victor      4096 3月 28 01:59 ./
drwxrwxr-x 8 victor victor      4096 3月 28 01:26 ../
drwxrwxr-x 3 victor victor      4096 3月 28 01:23 cache/
drwxrwxr-x 4 victor victor      4096 3月 28 00:40 classes/
-rw-rw-r-- 1 victor victor  59982904 3月 28 00:43 spark-examples_2.9.3-assembly-0.8.1-incubating.jar
-rw-rw-r-- 1 victor victor 123286056 3月 28 02:18 spark-examples-assembly-0.8.1-incubating.jar
drwxrwxr-x 3 victor victor      4096 3月 28 00:41 test-classes/
Put the following into a folder named spark_client to serve as the client: the conf/ directory; from assembly/target/scala-2.9.3/, only the jar; from examples/target/scala-2.9.3/, only the jar; and the spark-class file.
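The copying above can be sketched as a small script. The SPARK_SRC and DEST defaults are my assumptions (based on the paths shown in the transcripts); adjust them to your own checkout:

```shell
# Sketch: assemble the spark_client folder from a finished build tree.
# SPARK_SRC and DEST are assumed paths; change them to match your setup.
SPARK_SRC=${SPARK_SRC:-$HOME/software/incubator-spark-0.8.1-incubating}
DEST=${DEST:-$HOME/spark_client}

# Recreate the directory layout the client expects.
mkdir -p "$DEST/assembly/target/scala-2.9.3" "$DEST/examples/target/scala-2.9.3"

# Copy the configuration directory and the launcher script.
cp -r "$SPARK_SRC/conf" "$DEST/"
cp "$SPARK_SRC/spark-class" "$DEST/"

# Only the assembled jars are needed from the two target directories.
cp "$SPARK_SRC"/assembly/target/scala-2.9.3/spark-assembly-*.jar \
   "$DEST/assembly/target/scala-2.9.3/"
cp "$SPARK_SRC"/examples/target/scala-2.9.3/spark-examples-assembly-*.jar \
   "$DEST/examples/target/scala-2.9.3/"
```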
Make sure you end up with the conf directory, the spark-class file, the assembly directory (with its target subdirectory), and the examples directory (with its target subdirectory). To run a Spark program you will want to write a small script; use the bundled examples as a starting point. For details see my next post, the run guide, "Spark Study Notes 2: Computing Pi".
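A minimal run-script sketch for the client folder, under a couple of assumptions not spelled out above: in Spark 0.8.x the example main classes take the master URL as their first argument ("local" runs everything in one JVM), and SPARK_CLASSPATH is one way to put the examples jar on the classpath (it is read by compute-classpath.sh in 0.8.x). Verify both against your checkout:

```shell
# Sketch of a launcher for the spark_client folder (path is an assumption).
cd "$HOME/spark_client"

# The examples jar is not on the default classpath; SPARK_CLASSPATH is
# one way to add it in Spark 0.8.x.
export SPARK_CLASSPATH="examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar"

# SparkPi takes the master URL as its first argument; "local" runs in-process.
./spark-class org.apache.spark.examples.SparkPi local
```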
Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported storage systems. Because the HDFS protocol has changed in different versions of Hadoop, you must build Spark against the same version that your cluster uses. By default, Spark links to Hadoop 1.0.4. You can change this by setting the SPARK_HADOOP_VERSION variable when compiling:
SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
In addition, if you wish to run Spark on YARN, set SPARK_YARN to true:
SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
Note that on Windows, you need to set the environment variables on separate lines, e.g., set SPARK_HADOOP_VERSION=1.2.1.
For this version of Spark (0.8.1), Hadoop 2.2.x (or newer) users will have to build Spark and publish it locally. See Launching Spark on YARN. This is needed because Hadoop 2.2 has non-backwards-compatible API changes.
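The "publish it locally" step can be done with sbt's publish-local task (task name per sbt 0.12 conventions; check the Spark 0.8.1 YARN docs for the exact invocation they recommend):

```shell
# Publish the Hadoop 2.2 build into the local Ivy cache so your own
# applications can declare a dependency on it. Env vars as in the
# build step; the task name follows sbt 0.12 conventions.
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true ./sbt/sbt publish-local
```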
<Original content; when reposting, please credit the source: http://blog.csdn.net/oopsoom/article/details/22345777>