Deploying spark-sql and interacting with Hive

Deploying spark-sql

Versions

Hadoop-2.5.0-cdh5.3.2 

Hive-0.13.1-cdh5.3.2

Spark-1.5.1

Taking the CNSH001 node as the example:

Spark master on CNSH001: spark://CNSH001:7077

Spark HistoryServer on CNSH001: CNSH001:8032

Spark eventLog on HDFS: hdfs://testenv/spark/eventLog

Step-by-step guide

  

1. Copy $HIVE_HOME/conf/hive-site.xml and hive-log4j.properties into the $SPARK_HOME/conf/ directory.
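Step 1 as commands. The mktemp setup below only makes the sketch self-contained; on the real node HIVE_HOME is /opt/apps/hive and SPARK_HOME is /opt/apps/spark, and only the final cp is needed:

```shell
# Stand-in directories so the sketch runs anywhere; on the cluster these
# already exist as /opt/apps/hive and /opt/apps/spark.
HIVE_HOME="$(mktemp -d)"
SPARK_HOME="$(mktemp -d)"
mkdir -p "$HIVE_HOME/conf" "$SPARK_HOME/conf"
touch "$HIVE_HOME/conf/hive-site.xml" "$HIVE_HOME/conf/hive-log4j.properties"

# The actual step: copy Hive's client configuration into Spark's conf dir,
# so HiveContext can locate the metastore and use Hive's logging config.
cp "$HIVE_HOME/conf/hive-site.xml" "$HIVE_HOME/conf/hive-log4j.properties" "$SPARK_HOME/conf/"
```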

  

2. Edit spark-defaults.conf

```
spark.eventLog.enabled true
spark.eventLog.dir hdfs://testenv/spark/eventLog
spark.eventLog.compress true
spark.yarn.historyServer.address=CNSH001:8032
spark.sql.hive.metastore.version=0.13.1
spark.port.maxRetries=100

spark.sql.hive.metastore.jars=/opt/apps/hadoop/share/hadoop/mapreduce/*:/opt/apps/hadoop/share/hadoop/mapreduce/lib/*:/opt/apps/hadoop/share/hadoop/common/*:/opt/apps/hadoop/share/hadoop/common/lib/*:/opt/apps/hadoop/share/hadoop/hdfs/*:/opt/apps/hadoop/share/hadoop/hdfs/lib/*:/opt/apps/hadoop/share/hadoop/yarn/*:/opt/apps/hadoop/share/hadoop/yarn/lib/*:/opt/apps/hive/lib/*:/opt/apps/spark/lib/*
spark.driver.extraLibraryPath=/opt/apps/hadoop/share/hadoop/mapreduce/*:/opt/apps/hadoop/share/hadoop/mapreduce/lib/*:/opt/apps/hadoop/share/hadoop/common/*:/opt/apps/hadoop/share/hadoop/common/lib/*:/opt/apps/hadoop/share/hadoop/hdfs/*:/opt/apps/hadoop/share/hadoop/hdfs/lib/*:/opt/apps/hadoop/share/hadoop/yarn/*:/opt/apps/hadoop/share/hadoop/yarn/lib/*:/opt/apps/hive/lib/*:/opt/apps/spark/lib/*
spark.executor.extraLibraryPath=/opt/apps/hadoop/share/hadoop/mapreduce/*:/opt/apps/hadoop/share/hadoop/mapreduce/lib/*:/opt/apps/hadoop/share/hadoop/common/*:/opt/apps/hadoop/share/hadoop/common/lib/*:/opt/apps/hadoop/share/hadoop/hdfs/*:/opt/apps/hadoop/share/hadoop/hdfs/lib/*:/opt/apps/hadoop/share/hadoop/yarn/*:/opt/apps/hadoop/share/hadoop/yarn/lib/*:/opt/apps/hive/lib/*:/opt/apps/spark/lib/*
```
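One extra setup step worth doing before the first run: Spark does not create the event-log directory itself, so make sure the hdfs://testenv/spark/eventLog path from the config above exists, for example with:

```
hdfs dfs -mkdir -p hdfs://testenv/spark/eventLog
```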


 

3. Edit spark-env.sh

```shell
#set Hadoop path
export HDFS_YARN_LOGS_DIR=/data1/hadooplogs
export HADOOP_PREFIX=/opt/apps/hadoop
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_MAPRED_PID_DIR=$HADOOP_HOME/pids
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HDFS_YARN_LOGS_DIR/logs
export HADOOP_PID_DIR=$HADOOP_HOME/pids
export HADOOP_SECURE_DN_PID_DIR=$HADOOP_PID_DIR
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_LOG_DIR=$HDFS_YARN_LOGS_DIR/logs
export YARN_PID_DIR=$HADOOP_HOME/pids
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_CONF_DIR:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
export CLASSPATH=$HADOOP_CLASSPATH:$CLASSPATH

### sparkSQL and hive
export HIVE_HOME=/opt/apps/hive
export SPARK_CLASSPATH=$SPARK_HOME/lib:$HIVE_HOME/lib:$HADOOP_CLASSPATH
```

4. Fixing "[ERROR] Terminal initialization failed; falling back to unsupported"

This error, together with java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected, is caused by the old jline on Hadoop's classpath.

Delete the file /opt/apps/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar, add export HADOOP_USER_CLASSPATH_FIRST=true to /etc/profile, and source it.

See: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark:+Getting+Started
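The fix as commands. The mktemp stand-in below only makes the sketch runnable anywhere; on the real cluster YARN_LIB is /opt/apps/hadoop/share/hadoop/yarn/lib, and the export line goes into /etc/profile followed by `source /etc/profile`:

```shell
# Stand-in for /opt/apps/hadoop/share/hadoop/yarn/lib on the real node.
YARN_LIB="$(mktemp -d)"
touch "$YARN_LIB/jline-0.9.94.jar"        # simulates the offending jar

rm "$YARN_LIB/jline-0.9.94.jar"           # step 1: remove the old jline jar
export HADOOP_USER_CLASSPATH_FIRST=true   # step 2: on the cluster, put this in /etc/profile and source it
```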

 

5. Using spark-shell

```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select * from test limit 2").collect().foreach(println)
```

6. Running spark-sql in local mode

Start spark-sql with no --master option and it runs locally; HiveQL statements can be typed directly at its prompt and executed.
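An example local-mode session, assuming the Hive table `test` queried in the spark-shell step above:

```
$ spark-sql
spark-sql> show tables;
spark-sql> select * from test limit 2;
```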


7. Starting spark-sql against the standalone cluster

```
spark-sql --master spark://CNSH001:7077
```

(The host in the master URL must be the ALIVE Spark master node.)


8. Starting spark-sql in Spark-on-YARN mode

```
spark-sql --master yarn-client
```

yarn-cluster mode (`spark-sql --master yarn-cluster`) is not supported yet.
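For non-interactive use, the spark-sql CLI also accepts a query on the command line with -e, in the same way as the Hive CLI; the table `test` here is just the example table used earlier:

```
spark-sql --master yarn-client -e "select * from test limit 2"
```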
