Building Spark 1.0 with Hadoop 2.4.0 support

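Spark 1.0.0 was released shortly after 8 p.m. Beijing time on May 30. For details, see http://spark.apache.org/releases/spark-release-1-0-0.html

The officially released builds support Hadoop 2.2 by default, not the latest Hadoop 2.4.0.

As an early adopter, I decided to build Spark myself so that it supports Hadoop 2.4.0.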

 

1. Build environment

 

| Software | Version | URL |
|----------|---------|-----|
| CentOS | 6.4 x64 | http://mirrors.163.com/centos/6.4/isos/x86_64/CentOS-6.4-x86_64-bin-DVD1.iso |
| Maven | 3.2.1 | http://mirrors.hust.edu.cn/apache/maven/maven-3/3.2.1/binaries/apache-maven-3.2.1-bin.tar.gz |
| Java | 1.7 | http://www.oracle.com/technetwork/java/javase/downloads/index.html |
| Hadoop | 2.4.0 | http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.4.0/ |
| Spark | 1.0.0 | http://mirrors.cnnic.cn/apache/spark/spark-1.0.0/spark-1.0.0.tgz |

 

2. Environment settings

Add the following to /etc/profile:

 

#set java environment

JAVA_HOME=/opt/jdk1.7.0_55

PATH=$JAVA_HOME/bin:$PATH

CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export JAVA_HOME CLASSPATH PATH

 

#set hadoop

export HADOOP_HOME=/opt/hadoop-2.4.0

export PATH=$PATH:$HADOOP_HOME/bin

  

#set MAVEN

export MAVEN_HOME=/opt/apache-maven-3.2.1

export PATH=${PATH}:${MAVEN_HOME}/bin

export MAVEN_CMD=$MAVEN_HOME/bin/mvn

 

#set SCALA

export SCALA_HOME=/opt/scala-2.10.4

export PATH=$PATH:$SCALA_HOME/bin
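
After editing /etc/profile, it can be worth confirming that each variable really points at an existing directory before starting a half-hour build. The following is only a minimal sketch: check_home and check_all_homes are hypothetical helper names, and it assumes the variables are exported as in the settings above.

```shell
#!/usr/bin/env bash
# Sketch only: check_home and check_all_homes are hypothetical helpers,
# not part of Spark, Hadoop, or Maven.

# Report whether the exported variable named $1 points to an existing directory.
check_home() {
    local name value
    name="$1"
    # printenv only sees exported variables, which is what /etc/profile sets up.
    value="$(printenv "$name" || true)"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name=$value does not exist"
        return 1
    fi
    echo "$name=$value OK"
}

# Check every *_HOME variable defined in the settings above.
check_all_homes() {
    local status name
    status=0
    for name in JAVA_HOME HADOOP_HOME MAVEN_HOME SCALA_HOME; do
        check_home "$name" || status=1
    done
    return "$status"
}
```

A typical use would be to run `source /etc/profile` in the current shell and then call `check_all_homes`; a non-zero return status means at least one variable needs fixing.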

3. Build

Extract the source, then run the following command from the source root directory:

./make-distribution.sh --hadoop 2.4.0 --with-yarn --tgz --with-hive

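A few important options:

--hadoop: specifies the Hadoop version.

--with-yarn: YARN support is a must.

--with-hive: reading Hive data is also a must. I really dislike Shark anyway, and from now on developers can wrap their own SQL and HQL client on top of Spark, which is a good option too.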

#     --tgz: Additionally creates spark-$VERSION-bin.tar.gz

#     --hadoop VERSION: Builds against specified version of Hadoop.

#     --with-yarn: Enables support for Hadoop YARN.

#     --with-hive: Enable support for reading Hive tables.

#     --name: A moniker for the release target. Defaults to the Hadoop version.
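
Because a single missing space between flags is enough to break the build command, one option is to assemble it from variables. This is a sketch under stated assumptions: build_command and expected_tarball are hypothetical helpers that use only the flags described in this post, and the archive name follows the pattern of the file this build actually produced.

```shell
#!/usr/bin/env bash
# Sketch only: build_command and expected_tarball are hypothetical helpers,
# not part of Spark. They use just the flags described in this post.

# Print the make-distribution.sh command for a given Hadoop version.
# $1: Hadoop version, e.g. 2.4.0
# $2: "yes" (default) to include Hive support, anything else to leave it out
build_command() {
    local hadoop_version with_hive cmd
    hadoop_version="$1"
    with_hive="${2:-yes}"
    cmd="./make-distribution.sh --hadoop $hadoop_version --with-yarn --tgz"
    if [ "$with_hive" = "yes" ]; then
        cmd="$cmd --with-hive"
    fi
    echo "$cmd"
}

# Print the archive name expected when --name is not given, following the
# pattern of the file produced in this post.
# $1: Spark version, $2: Hadoop version
expected_tarball() {
    echo "spark-$1-bin-$2.tgz"
}
```

Calling `build_command 2.4.0` prints the full command with every flag separated by a space, and `expected_tarball 1.0.0 2.4.0` prints spark-1.0.0-bin-2.4.0.tgz, which is the file to look for once the build finishes.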

 

 

 

After a long wait, a tgz archive is generated in the source root directory.

The build succeeded:

[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM .......................... SUCCESS [  1.505 s]
[INFO] Spark Project Core ................................ SUCCESS [01:52 min]
[INFO] Spark Project Bagel ............................... SUCCESS [ 13.727 s]
[INFO] Spark Project GraphX .............................. SUCCESS [03:42 min]
[INFO] Spark Project ML Library .......................... SUCCESS [06:31 min]
[INFO] Spark Project Streaming ........................... SUCCESS [ 47.049 s]
[INFO] Spark Project Tools ............................... SUCCESS [  7.437 s]
[INFO] Spark Project Catalyst ............................ SUCCESS [ 35.608 s]
[INFO] Spark Project SQL ................................. SUCCESS [01:08 min]
[INFO] Spark Project Hive ................................ SUCCESS [04:23 min]
[INFO] Spark Project REPL ................................ SUCCESS [ 31.167 s]
[INFO] Spark Project YARN Parent POM ..................... SUCCESS [ 34.463 s]
[INFO] Spark Project YARN Stable API ..................... SUCCESS [ 18.475 s]
[INFO] Spark Project Assembly ............................ SUCCESS [01:06 min]
[INFO] Spark Project External Twitter .................... SUCCESS [ 14.859 s]
[INFO] Spark Project External Kafka ...................... SUCCESS [01:28 min]
[INFO] Spark Project External Flume ...................... SUCCESS [ 19.153 s]
[INFO] Spark Project External ZeroMQ ..................... SUCCESS [ 24.138 s]
[INFO] Spark Project External MQTT ....................... SUCCESS [ 22.316 s]
[INFO] Spark Project Examples ............................ SUCCESS [03:14 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 27:58 min
[INFO] Finished at: 2014-06-08T13:07:47+08:00
[INFO] Final Memory: 100M/914M
[INFO] ------------------------------------------------------------------------
You have new mail in /var/spool/mail/root
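
If the build runs unattended, it may help to save the Maven output to a file and check it afterwards instead of scrolling through the terminal. The sketch below assumes the output was saved to a log file, which this post does not do; build_succeeded and failed_modules are hypothetical helpers whose patterns are based on the log format shown above.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical and rely on the Maven log
# format shown in this post.

# Succeed if the saved log in $1 contains the final "BUILD SUCCESS" line.
build_succeeded() {
    grep -q '^\[INFO\] BUILD SUCCESS' "$1"
}

# Print the reactor summary lines in $1 whose status is FAILURE or SKIPPED.
# Returns non-zero when there are no such lines.
failed_modules() {
    grep -E '^\[INFO\] .+ \.+ (FAILURE|SKIPPED)' "$1"
}
```

For example, with the output saved as build.log (an assumed file name), `build_succeeded build.log || failed_modules build.log` would list the modules that did not build.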



Take a look at the result:


[root@NameNode spark-1.0.0]# ll
total 183784
drwxrwxr-x 4 1000 1000      4096 Jun  8 13:00 assembly
drwxrwxr-x 4 1000 1000      4096 Jun  8 12:41 bagel
drwxrwxr-x 2 1000 1000      4096 May 26 14:47 bin
-rw-rw-r-- 1 1000 1000    281471 May 26 14:47 CHANGES.txt
drwxrwxr-x 2 1000 1000      4096 May 26 14:47 conf
drwxrwxr-x 4 1000 1000      4096 Jun  8 12:39 core
drwxrwxr-x 3 1000 1000      4096 May 26 14:47 data
drwxrwxr-x 4 1000 1000      4096 May 26 14:47 dev
drwxr-xr-x 9 root root      4096 Jun  8 13:07 dist
drwxrwxr-x 3 1000 1000      4096 May 26 14:47 docker
drwxrwxr-x 7 1000 1000      4096 May 26 14:47 docs
drwxrwxr-x 4 1000 1000      4096 May 26 14:47 ec2
drwxrwxr-x 4 1000 1000      4096 Jun  8 13:07 examples
drwxrwxr-x 7 1000 1000      4096 May 26 14:47 external
drwxrwxr-x 4 1000 1000      4096 May 26 14:47 extras
drwxrwxr-x 5 1000 1000      4096 Jun  8 12:45 graphx
drwxr-xr-x 3 root root      4096 Jun  8 12:59 lib_managed
-rw-rw-r-- 1 1000 1000     29983 May 26 14:47 LICENSE
-rwxrwxr-x 1 1000 1000      8126 May 26 14:47 make-distribution.sh
drwxrwxr-x 5 1000 1000      4096 Jun  8 12:51 mllib
-rw-rw-r-- 1 1000 1000     22559 May 26 14:47 NOTICE
-rw-rw-r-- 1 1000 1000     35121 May 26 14:47 pom.xml
drwxrwxr-x 4 1000 1000      4096 May 26 14:47 project
drwxrwxr-x 6 1000 1000      4096 Jun  8 12:08 python
-rw-rw-r-- 1 1000 1000      4221 May 26 14:47 README.md
drwxrwxr-x 4 1000 1000      4096 Jun  8 12:59 repl
drwxrwxr-x 2 1000 1000      4096 May 26 14:47 sbin
drwxrwxr-x 2 1000 1000      4096 May 26 14:47 sbt
-rw-rw-r-- 1 1000 1000      7703 May 26 14:47 scalastyle-config.xml
-rw-r--r-- 1 root root 187677812 Jun  8 13:07 spark-1.0.0-bin-2.4.0.tgz
drwxrwxr-x 5 1000 1000      4096 May 26 14:47 sql
drwxrwxr-x 4 1000 1000      4096 Jun  8 12:52 streaming
drwxr-xr-x 5 root root      4096 Jun  8 12:39 target
drwxrwxr-x 4 1000 1000      4096 Jun  8 12:52 tools
-rw-rw-r-- 1 1000 1000       805 May 26 14:47 tox.ini
drwxrwxr-x 6 1000 1000      4096 Jun  8 13:00 yarn
You have new mail in /var/spool/mail/root
[root@NameNode spark-1.0.0]# vi /etc/profile

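Copy the package spark-1.0.0-bin-2.4.0.tgz to the directory where you want to deploy it, and extract it there.

Important: the extracted package only needs to be copied to any one machine in the YARN cluster. A single node is enough; there is no need to deploy it on every node unless you need several client nodes to submit Spark jobs.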
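
The copy-and-extract step can be sketched as a small function. This is only an illustration: deploy_spark is a hypothetical helper, the destination directory is whatever you choose, and the name of the extracted directory is read from the archive itself rather than assumed.

```shell
#!/usr/bin/env bash
# Sketch only: deploy_spark is a hypothetical helper, not part of Spark.

# Extract the archive $1 into the directory $2 and print the path of the
# extracted top-level directory.
deploy_spark() {
    local tarball dest top
    tarball="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # Take the top-level directory name from the first entry in the archive.
    top="$(tar -tzf "$tarball" | sed -n '1s|/.*||p')"
    if [ -z "$top" ]; then
        echo "cannot find a top-level directory in $tarball" >&2
        return 1
    fi
    tar -xzf "$tarball" -C "$dest" || return 1
    echo "$dest/$top"
}
```

For instance, `deploy_spark spark-1.0.0-bin-2.4.0.tgz /opt` extracts the build under /opt (an assumed location) and prints where it ended up. From that directory, jobs can then be submitted with bin/spark-submit using --master yarn-cluster, assuming HADOOP_CONF_DIR points at the cluster's configuration directory.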


For convenience, here is a prebuilt version that you can use directly:

http://pan.baidu.com/s/1dD9udET 

