Compiling the Spark Source with Hive Support and Deploying It

1. Downloading the Spark Source

The Spark website provides prebuilt binaries, but to get a Spark build with Hive support you have to download the Spark source and compile it yourself with Hive enabled.

I downloaded Spark 2.3.1, built it with Maven 3.5.4, and finally packaged the build for cluster deployment. The Hadoop version is 2.7.6, Hive is 1.2.2, and Scala is 2.11.8. The operating system is macOS; the process is the same on Ubuntu. The Spark 2.3.1 build documentation states that building Spark with Maven requires Maven 3.3.9 or newer and Java 8+.

2. Installing and Configuring Maven

QiColindeMacBook-Air:~ Colin$ mvn -version
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-18T02:33:14+08:00)
Maven home: /usr/local/Cellar/maven/3.5.4/libexec
Java version: 1.8.0_73, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk1.8.0_73.jdk/Contents/Home/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "mac os x", version: "10.13.5", arch: "x86_64", family: "mac"

Edit ~/.bash_profile (e.g. with vim) to set MAVEN_HOME, PATH, and MAVEN_OPTS, then run source ~/.bash_profile to apply the changes.

export MAVEN_HOME=/usr/local/Cellar/maven/3.5.4/libexec
export PATH=$PATH:$MAVEN_HOME/bin
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
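Since Spark 2.3.1 requires Maven 3.3.9 or newer, it is worth a quick sanity check before starting a 45-minute build. Below is a minimal pure-shell version comparison using GNU `sort -V`; the `have` value is hard-coded from the `mvn -version` output above rather than parsed live, so substitute your own (on stock macOS you may need `gsort` from coreutils for the `-V` flag):

```shell
# Minimal Maven version check. Replace `have` with the third field of the
# first line of `mvn -version` output on your machine.
required=3.3.9
have=3.5.4
# sort -V orders version strings numerically; if the required version sorts
# first (or equal), the installed Maven is new enough.
lowest=$(printf '%s\n%s\n' "$required" "$have" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
  echo "Maven $have satisfies the >= $required requirement"
else
  echo "Maven $have is too old"
fi
```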

3. Building the Spark Source

Change into the Spark 2.3.1 source directory and run ./dev/change-scala-version.sh 2.11 to switch every module's POM to Scala 2.11.

QiColindeMacBook-Air:~ Colin$ cd /Users/Colin/Downloads/spark-2.3.1 
QiColindeMacBook-Air:spark-2.3.1 Colin$ ./dev/change-scala-version.sh 2.11
./dev/../resource-managers/yarn/pom.xml
./dev/../resource-managers/mesos/pom.xml
./dev/../resource-managers/kubernetes/core/pom.xml
./dev/../launcher/pom.xml
./dev/../tools/pom.xml
./dev/../core/pom.xml
./dev/../hadoop-cloud/pom.xml
./dev/../assembly/pom.xml
./dev/../graphx/pom.xml
./dev/../mllib/pom.xml
./dev/../repl/pom.xml
./dev/../pom.xml
./dev/../streaming/pom.xml
./dev/../mllib-local/pom.xml
./dev/../common/network-yarn/pom.xml
./dev/../common/network-common/pom.xml
./dev/../common/network-shuffle/pom.xml
./dev/../common/tags/pom.xml
./dev/../common/unsafe/pom.xml
./dev/../common/kvstore/pom.xml
./dev/../common/sketch/pom.xml
./dev/../examples/pom.xml
./dev/../external/kafka-0-10-assembly/pom.xml
./dev/../external/kinesis-asl-assembly/pom.xml
./dev/../external/flume/pom.xml
./dev/../external/flume-sink/pom.xml
./dev/../external/kafka-0-10-sql/pom.xml
./dev/../external/kafka-0-10/pom.xml
./dev/../external/kinesis-asl/pom.xml
./dev/../external/flume-assembly/pom.xml
./dev/../external/docker-integration-tests/pom.xml
./dev/../external/spark-ganglia-lgpl/pom.xml
./dev/../external/kafka-0-8-assembly/pom.xml
./dev/../external/kafka-0-8/pom.xml
./dev/../sql/core/pom.xml
./dev/../sql/catalyst/pom.xml
./dev/../sql/hive/pom.xml
./dev/../sql/hive-thriftserver/pom.xml
./dev/../docs/_plugins/copy_api_dirs.rb
QiColindeMacBook-Air:spark-2.3.1 Colin$
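To confirm the switch took effect, you can grep the parent POM for the Scala binary version. The sketch below runs the grep against a simulated POM fragment so it is self-contained; in practice you would run the commented one-liner in the Spark source root:

```shell
# In the real source tree you would run:
#   grep -m1 '<scala.binary.version>' pom.xml
# Simulated POM fragment for demonstration:
cat > /tmp/pom-fragment.xml <<'EOF'
<properties>
  <scala.version>2.11.8</scala.version>
  <scala.binary.version>2.11</scala.binary.version>
</properties>
EOF
# Extract just the version value between the tags.
grep -o '<scala.binary.version>[^<]*' /tmp/pom-fragment.xml | cut -d'>' -f2
```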

Run mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6 -Phive -Phive-thriftserver -DskipTests clean package to start the build; the profiles enable YARN, Hadoop, and Hive support, and the resulting jars are written under each module's target directory in the source tree. The Hadoop version must match the version installed on your cluster. The build takes roughly half an hour and prints many warnings along the way, which can be safely ignored.

QiColindeMacBook-Air:spark-2.3.1 Colin$ mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6  -Phive -Phive-thriftserver -DskipTests clean package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] Spark Project Parent POM                                           [pom]
[INFO] Spark Project Tags                                                 [jar]
[INFO] Spark Project Sketch                                               [jar]
[INFO] Spark Project Local DB                                             [jar]
[INFO] Spark Project Networking                                           [jar]
[INFO] Spark Project Shuffle Streaming Service                            [jar]
[INFO] Spark Project Unsafe                                               [jar]
[INFO] Spark Project Launcher                                             [jar]
[INFO] Spark Project Core                                                 [jar]
[INFO] Spark Project ML Local Library                                     [jar]
[INFO] Spark Project GraphX                                               [jar]
[INFO] Spark Project Streaming                                            [jar]
[INFO] Spark Project Catalyst                                             [jar]
[INFO] Spark Project SQL                                                  [jar]
[INFO] Spark Project ML Library                                           [jar]
[INFO] Spark Project Tools                                                [jar]
[INFO] Spark Project Hive                                                 [jar]
[INFO] Spark Project REPL                                                 [jar]
[INFO] Spark Project YARN Shuffle Service                                 [jar]
[INFO] Spark Project YARN                                                 [jar]
[INFO] Spark Project Hive Thrift Server                                   [jar]
[INFO] Spark Project Assembly                                             [pom]
[INFO] Spark Integration for Kafka 0.10                                   [jar]
[INFO] Kafka 0.10 Source for Structured Streaming                         [jar]
[INFO] Spark Project Examples                                             [jar]
[INFO] Spark Integration for Kafka 0.10 Assembly                          [jar]
[INFO] 
[INFO] -----------------< org.apache.spark:spark-parent_2.11 >-----------------
[INFO] Building Spark Project Parent POM 2.3.1                           [1/26]
[INFO] --------------------------------[ pom ]---------------------------------

When the build finishes, Maven prints a success summary.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM 2.3.1 ..................... SUCCESS [ 10.405 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 18.758 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 15.612 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  7.689 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 14.848 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  7.291 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 23.968 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 15.745 s]
[INFO] Spark Project Core ................................. SUCCESS [05:25 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 46.166 s]
[INFO] Spark Project GraphX ............................... SUCCESS [01:05 min]
[INFO] Spark Project Streaming ............................ SUCCESS [01:59 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [03:51 min]
[INFO] Spark Project SQL .................................. SUCCESS [07:46 min]
[INFO] Spark Project ML Library ........................... SUCCESS [07:09 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 17.955 s]
[INFO] Spark Project Hive ................................. SUCCESS [05:05 min]
[INFO] Spark Project REPL ................................. SUCCESS [01:04 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 33.794 s]
[INFO] Spark Project YARN ................................. SUCCESS [02:29 min]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [02:16 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 12.661 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [01:41 min]
[INFO] Spark Project Examples ............................. SUCCESS [01:33 min]
[INFO] Spark Integration for Kafka 0.10 Assembly 2.3.1 .... SUCCESS [  9.186 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 46:42 min
[INFO] Finished at: 2018-07-14T09:51:39+08:00
[INFO] ------------------------------------------------------------------------
QiColindeMacBook-Air:spark-2.3.1 Colin$ 

4. Packaging and Deployment

After the Maven build succeeds, run the make-distribution.sh script from the source root: ./dev/make-distribution.sh --name 2.7.6hive --tgz -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6 -Phive -Phive-thriftserver -DskipTests. This packages the build into spark-2.3.1-bin-2.7.6hive.tgz in the source directory, which can be deployed directly. Packaging also takes about half an hour.
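The archive name is derived from the project version and the --name argument. A small sketch of how make-distribution.sh composes it, with the values from this build filled in:

```shell
# make-distribution.sh names the tarball spark-$VERSION-bin-$NAME.tgz
VERSION=2.3.1    # from: mvn help:evaluate -Dexpression=project.version
NAME=2.7.6hive   # from the --name flag
TGZ="spark-${VERSION}-bin-${NAME}.tgz"
echo "$TGZ"      # → spark-2.3.1-bin-2.7.6hive.tgz
```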

QiColindeMacBook-Air:spark-2.3.1 Colin$ ./dev/./make-distribution.sh  --name 2.7.6hive --tgz -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6 -Phive -Phive-thriftserver -DskipTests
+++ dirname ./dev/./make-distribution.sh
++ cd ./dev/./..
++ pwd
+ SPARK_HOME=/Users/Colin/Downloads/spark-2.3.1
+ DISTDIR=/Users/Colin/Downloads/spark-2.3.1/dist
+ MAKE_TGZ=false
+ MAKE_PIP=false
+ MAKE_R=false
+ NAME=none
+ MVN=/Users/Colin/Downloads/spark-2.3.1/build/mvn
+ ((  9  ))
+ case $1 in
+ NAME=2.7.6hive
+ shift
+ shift
+ ((  7  ))
+ case $1 in
+ MAKE_TGZ=true
+ shift
+ ((  6  ))
+ case $1 in
+ break
+ '[' -z /Library/Java/JavaVirtualMachines/jdk1.8.0_73.jdk/Contents/Home ']'
+ '[' -z /Library/Java/JavaVirtualMachines/jdk1.8.0_73.jdk/Contents/Home ']'
++ command -v git
+ '[' /usr/local/bin/git ']'
++ git rev-parse --short HEAD
++ :
+ GITREV=
+ '[' '!' -z '' ']'
+ unset GITREV
++ command -v /Users/Colin/Downloads/spark-2.3.1/build/mvn
+ '[' '!' /Users/Colin/Downloads/spark-2.3.1/build/mvn ']'
++ /Users/Colin/Downloads/spark-2.3.1/build/mvn help:evaluate -Dexpression=project.version -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6 -Phive -Phive-thriftserver -DskipTests
++ grep -v INFO
++ tail -n 1
+ VERSION=2.3.1
++ /Users/Colin/Downloads/spark-2.3.1/build/mvn help:evaluate -Dexpression=scala.binary.version -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6 -Phive -Phive-thriftserver -DskipTests
++ grep -v INFO
++ tail -n 1
+ SCALA_VERSION=2.11
++ /Users/Colin/Downloads/spark-2.3.1/build/mvn help:evaluate -Dexpression=hadoop.version -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6 -Phive -Phive-thriftserver -DskipTests
++ grep -v INFO
++ tail -n 1
+ SPARK_HADOOP_VERSION=2.7.6
++ /Users/Colin/Downloads/spark-2.3.1/build/mvn help:evaluate -Dexpression=project.activeProfiles -pl sql/hive -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6 -Phive -Phive-thriftserver -DskipTests
++ grep -v INFO
++ fgrep --count 'hive'
++ echo -n
+ SPARK_HIVE=1
+ '[' 2.7.6hive == none ']'
+ echo 'Spark version is 2.3.1'
Spark version is 2.3.1
+ '[' true == true ']'
+ echo 'Making spark-2.3.1-bin-2.7.6hive.tgz'
Making spark-2.3.1-bin-2.7.6hive.tgz
+ cd /Users/Colin/Downloads/spark-2.3.1
+ export 'MAVEN_OPTS=-Xmx2g -XX:ReservedCodeCacheSize=512m'
+ MAVEN_OPTS='-Xmx2g -XX:ReservedCodeCacheSize=512m'
+ BUILD_COMMAND=("$MVN" -T 1C clean package -DskipTests $@)
+ echo -e '\nBuilding with...'

Building with...
+ echo -e '$ /Users/Colin/Downloads/spark-2.3.1/build/mvn' -T 1C clean package -DskipTests -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6 -Phive -Phive-thriftserver '-DskipTests\n'
$ /Users/Colin/Downloads/spark-2.3.1/build/mvn -T 1C clean package -DskipTests -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6 -Phive -Phive-thriftserver -DskipTests

+ /Users/Colin/Downloads/spark-2.3.1/build/mvn -T 1C clean package -DskipTests -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6 -Phive -Phive-thriftserver -DskipTests
Using `mvn` from path: /usr/local/bin/mvn
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] Spark Project Parent POM                                           [pom]
[INFO] Spark Project Tags                                                 [jar]
[INFO] Spark Project Sketch                                               [jar]
[INFO] Spark Project Local DB                                             [jar]
[INFO] Spark Project Networking                                           [jar]
[INFO] Spark Project Shuffle Streaming Service                            [jar]
[INFO] Spark Project Unsafe                                               [jar]
[INFO] Spark Project Launcher                                             [jar]
[INFO] Spark Project Core                                                 [jar]
[INFO] Spark Project ML Local Library                                     [jar]
[INFO] Spark Project GraphX                                               [jar]
[INFO] Spark Project Streaming                                            [jar]
[INFO] Spark Project Catalyst                                             [jar]
[INFO] Spark Project SQL                                                  [jar]
[INFO] Spark Project ML Library                                           [jar]
[INFO] Spark Project Tools                                                [jar]
[INFO] Spark Project Hive                                                 [jar]
[INFO] Spark Project REPL                                                 [jar]
[INFO] Spark Project YARN Shuffle Service                                 [jar]
[INFO] Spark Project YARN                                                 [jar]
[INFO] Spark Project Hive Thrift Server                                   [jar]
[INFO] Spark Project Assembly                                             [pom]
[INFO] Spark Integration for Kafka 0.10                                   [jar]
[INFO] Kafka 0.10 Source for Structured Streaming                         [jar]
[INFO] Spark Project Examples                                             [jar]
[INFO] Spark Integration for Kafka 0.10 Assembly                          [jar]
[INFO] 
[INFO] Using the MultiThreadedBuilder implementation with a thread count of 4
[INFO] 
[INFO] -----------------< org.apache.spark:spark-parent_2.11 >-----------------
[INFO] Building Spark Project Parent POM 2.3.1                           [1/26]
[INFO] --------------------------------[ pom ]---------------------------------

The build and packaging complete successfully.

+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/aircompressor-0.8.jar
+ name=aircompressor-0.8.jar
+ '[' -f /Users/Colin/Downloads/spark-2.3.1/dist/jars/aircompressor-0.8.jar ']'
+ rm /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/aircompressor-0.8.jar
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/commons-codec-1.10.jar
+ name=commons-codec-1.10.jar
+ '[' -f /Users/Colin/Downloads/spark-2.3.1/dist/jars/commons-codec-1.10.jar ']'
+ rm /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/commons-codec-1.10.jar
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/commons-lang-2.6.jar
+ name=commons-lang-2.6.jar
+ '[' -f /Users/Colin/Downloads/spark-2.3.1/dist/jars/commons-lang-2.6.jar ']'
+ rm /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/commons-lang-2.6.jar
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/kryo-shaded-3.0.3.jar
+ name=kryo-shaded-3.0.3.jar
+ '[' -f /Users/Colin/Downloads/spark-2.3.1/dist/jars/kryo-shaded-3.0.3.jar ']'
+ rm /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/kryo-shaded-3.0.3.jar
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/minlog-1.3.0.jar
+ name=minlog-1.3.0.jar
+ '[' -f /Users/Colin/Downloads/spark-2.3.1/dist/jars/minlog-1.3.0.jar ']'
+ rm /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/minlog-1.3.0.jar
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/objenesis-2.1.jar
+ name=objenesis-2.1.jar
+ '[' -f /Users/Colin/Downloads/spark-2.3.1/dist/jars/objenesis-2.1.jar ']'
+ rm /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/objenesis-2.1.jar
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/orc-core-1.4.4-nohive.jar
+ name=orc-core-1.4.4-nohive.jar
+ '[' -f /Users/Colin/Downloads/spark-2.3.1/dist/jars/orc-core-1.4.4-nohive.jar ']'
+ rm /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/orc-core-1.4.4-nohive.jar
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/orc-mapreduce-1.4.4-nohive.jar
+ name=orc-mapreduce-1.4.4-nohive.jar
+ '[' -f /Users/Colin/Downloads/spark-2.3.1/dist/jars/orc-mapreduce-1.4.4-nohive.jar ']'
+ rm /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/orc-mapreduce-1.4.4-nohive.jar
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/scopt_2.11-3.7.0.jar
+ name=scopt_2.11-3.7.0.jar
+ '[' -f /Users/Colin/Downloads/spark-2.3.1/dist/jars/scopt_2.11-3.7.0.jar ']'
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /Users/Colin/Downloads/spark-2.3.1/dist/examples/jars/spark-examples_2.11-2.3.1.jar
+ name=spark-examples_2.11-2.3.1.jar
+ '[' -f /Users/Colin/Downloads/spark-2.3.1/dist/jars/spark-examples_2.11-2.3.1.jar ']'
+ mkdir -p /Users/Colin/Downloads/spark-2.3.1/dist/examples/src/main
+ cp -r /Users/Colin/Downloads/spark-2.3.1/examples/src/main /Users/Colin/Downloads/spark-2.3.1/dist/examples/src/
+ cp /Users/Colin/Downloads/spark-2.3.1/LICENSE /Users/Colin/Downloads/spark-2.3.1/dist
+ cp -r /Users/Colin/Downloads/spark-2.3.1/licenses /Users/Colin/Downloads/spark-2.3.1/dist
+ cp /Users/Colin/Downloads/spark-2.3.1/NOTICE /Users/Colin/Downloads/spark-2.3.1/dist
+ '[' -e /Users/Colin/Downloads/spark-2.3.1/CHANGES.txt ']'
+ cp -r /Users/Colin/Downloads/spark-2.3.1/data /Users/Colin/Downloads/spark-2.3.1/dist
+ '[' false == true ']'
+ echo 'Skipping building python distribution package'
Skipping building python distribution package
+ '[' false == true ']'
+ echo 'Skipping building R source package'
Skipping building R source package
+ mkdir /Users/Colin/Downloads/spark-2.3.1/dist/conf
+ cp /Users/Colin/Downloads/spark-2.3.1/conf/docker.properties.template /Users/Colin/Downloads/spark-2.3.1/conf/fairscheduler.xml.template /Users/Colin/Downloads/spark-2.3.1/conf/log4j.properties.template /Users/Colin/Downloads/spark-2.3.1/conf/metrics.properties.template /Users/Colin/Downloads/spark-2.3.1/conf/slaves.template /Users/Colin/Downloads/spark-2.3.1/conf/spark-defaults.conf.template /Users/Colin/Downloads/spark-2.3.1/conf/spark-env.sh.template /Users/Colin/Downloads/spark-2.3.1/dist/conf
+ cp /Users/Colin/Downloads/spark-2.3.1/README.md /Users/Colin/Downloads/spark-2.3.1/dist
+ cp -r /Users/Colin/Downloads/spark-2.3.1/bin /Users/Colin/Downloads/spark-2.3.1/dist
+ cp -r /Users/Colin/Downloads/spark-2.3.1/python /Users/Colin/Downloads/spark-2.3.1/dist
+ '[' false == true ']'
+ cp -r /Users/Colin/Downloads/spark-2.3.1/sbin /Users/Colin/Downloads/spark-2.3.1/dist
+ '[' -d /Users/Colin/Downloads/spark-2.3.1/R/lib/SparkR ']'
+ '[' true == true ']'
+ TARDIR_NAME=spark-2.3.1-bin-2.7.6hive
+ TARDIR=/Users/Colin/Downloads/spark-2.3.1/spark-2.3.1-bin-2.7.6hive
+ rm -rf /Users/Colin/Downloads/spark-2.3.1/spark-2.3.1-bin-2.7.6hive
+ cp -r /Users/Colin/Downloads/spark-2.3.1/dist /Users/Colin/Downloads/spark-2.3.1/spark-2.3.1-bin-2.7.6hive
+ tar czf spark-2.3.1-bin-2.7.6hive.tgz -C /Users/Colin/Downloads/spark-2.3.1 spark-2.3.1-bin-2.7.6hive
+ rm -rf /Users/Colin/Downloads/spark-2.3.1/spark-2.3.1-bin-2.7.6hive
QiColindeMacBook-Air:spark-2.3.1 Colin$ 

Install Spark from the packaged archive and configure the environment variables, then go to $SPARK_HOME/conf and edit spark-env.sh, adding (among other settings) the path to the MySQL JDBC driver jar.

export CLASSPATH=$CLASSPATH:/usr/local/Cellar/hive/lib
export HIVE_CONF_DIR=/usr/local/Cellar/hive/conf
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/local/Cellar/hive/lib/mysql-connector-java-5.1.46-bin.jar

export SPARK_DIST_CLASSPATH=$(/usr/local/Cellar/hadoop/2.7.6/bin/hadoop classpath)
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1G
export SPARK_WORKER_PORT=8888
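A wrong JDBC driver path fails only at runtime, so it is worth verifying the jar exists before starting Spark. A minimal check; the path is the one used in this setup and will differ on other machines:

```shell
# Verify the MySQL JDBC driver referenced in spark-env.sh is present.
JAR=/usr/local/Cellar/hive/lib/mysql-connector-java-5.1.46-bin.jar
if [ -f "$JAR" ]; then
  echo "driver found: $JAR"
else
  echo "driver missing: $JAR"
fi
```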

Copy Hive's configuration file hive-site.xml into $SPARK_HOME/conf. That completes the Spark configuration. Start Hadoop, Hive, and Spark, then open spark-shell; if import org.apache.spark.sql.hive.HiveContext succeeds, this Spark build is integrated with Hive.
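The copy step can be scripted. The sketch below uses stand-in /tmp directories and an empty placeholder file so it runs anywhere; on a real deployment you would substitute your actual Hive conf directory and $SPARK_HOME:

```shell
# Stand-in directories so the sketch is runnable anywhere; replace with the
# real Hive conf dir and $SPARK_HOME/conf on an actual deployment.
HIVE_CONF=/tmp/demo-hive-conf
SPARK_CONF=/tmp/demo-spark-home/conf
mkdir -p "$HIVE_CONF" "$SPARK_CONF"
: > "$HIVE_CONF/hive-site.xml"   # placeholder hive-site.xml for the demo
cp "$HIVE_CONF/hive-site.xml" "$SPARK_CONF/"
ls "$SPARK_CONF"                 # → hive-site.xml
```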

scala>  import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> 
