To build Spark with Hive support via Maven, add the -Phive -Phive-thriftserver profiles, for example: mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package
1.1 Add Hive's configuration file hive-site.xml to the Spark application's classpath, i.e. copy hive-site.xml into ${SPARK_HOME}/conf.
1.2 Depending on whether the Hive parameter hive.metastore.uris is set, Spark SQL works in one of two modes:
# If the property is not set (the default), Spark SQL connects directly to the metastore database using Hive's javax.jdo.option.* settings and reads the metadata itself.
# If the property is set, Spark SQL obtains Hive table metadata through the metastore service that Hive provides.
1.3 Add the JDBC driver for the metastore database to the Spark application's classpath.
1.4 Start the Hive metastore service:
/opt/app/hive --service metastore >/opt/app/hive/logs/metastore.log 2>&1 &
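As a sketch of the two modes in step 1.2, a minimal hive-site.xml might look like the fragment below. The metastore URI, MySQL host, and database name are assumptions for illustration; 9083 is only the conventional metastore port.

```xml
<configuration>
  <!-- If set, Spark SQL talks to the Hive metastore service at this URI -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop-all-02:9083</value>
  </property>
  <!-- If hive.metastore.uris is absent, Spark SQL instead reads these
       javax.jdo.option.* settings and connects to the metastore DB directly -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop-all-02:3306/hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
</configuration>
```

In the direct-connection mode, the driver named in javax.jdo.option.ConnectionDriverName is exactly the one step 1.3 asks you to add to the classpath.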
ThriftServer exposes a JDBC/ODBC interface; users can connect to it over JDBC/ODBC to query data through Spark SQL. When ThriftServer starts, it launches a Spark SQL application. The ThriftServer used by Spark SQL is org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.
Start the thrift server; it binds to the address configured by ${hive.server2.thrift.bind.host}:
/opt/app/spark/sbin/start-thriftserver.sh
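The start script also accepts --hiveconf overrides, so the bind host and port can be set at launch instead of in hive-site.xml. This is a config sketch; the values shown are illustrative (10000 is the default port):

```shell
# Illustrative overrides; defaults are port 10000 and the host from hive-site.xml
/opt/app/spark/sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=10000 \
  --hiveconf hive.server2.thrift.bind.host=hadoop-all-02
```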
You can then connect with the hive beeline client:
[hadoop@hadoop-all-02 ~]$ beeline
Beeline version 1.2.2 by Apache Hive
beeline> !connect jdbc:hive2://hadoop-all-02:10000
Connecting to jdbc:hive2://hadoop-all-02:10000
Enter username for jdbc:hive2://hadoop-all-02:10000: spark
Enter password for jdbc:hive2://hadoop-all-02:10000:
Connected to: Spark SQL (version 2.1.0)
Driver: Hive JDBC (version 1.2.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop-all-02:10000>
0: jdbc:hive2://hadoop-all-02:10000> SELECT * FROM hadoop.film limit 2;
+-------+----------+-----------+----------+---------------+-------+--------+--------------+------------+---------+--+
| fid | fname | director | conutry | release_time | time | grade | comment_num | film_type | region |
+-------+----------+-----------+----------+---------------+-------+--------+--------------+------------+---------+--+
| 1001 | 海边的曼彻斯特 | 肯尼思.洛纳根 | 美国 | 2016-12 | 137 | 8.6 | 3002 | 爱情 | 欧美 |
| 1002 | 罗曼蒂克消亡史 | 程耳 | 中国 | 2016-12 | 125 | 7.8 | 40001 | 爱情 | 大陆 |
+-------+----------+-----------+----------+---------------+-------+--------+--------------+------------+---------+--+
2 rows selected (1.125 seconds)
import java.sql.DriverManager

object SparkSQLThriftServer extends App {
  // Register the Hive JDBC driver
  val driver = "org.apache.hive.jdbc.HiveDriver"
  Class.forName(driver)
  // Open the connection
  val (url, username, password) = ("jdbc:hive2://hadoop-all-02:10000", "hadoop", "")
  val connection = DriverManager.getConnection(url, username, password)
  // Switch database: this hive2 URL carries no database name, so switch manually via SQL
  connection.prepareStatement("use hadoop").execute()
  val sql = "SELECT region, fname, director FROM film"
  // Create the statement
  val statement = connection.prepareStatement(sql)
  // Fetch the result set
  val res = statement.executeQuery()
  while (res.next()) {
    println(s"${res.getString("fname")}:${res.getString("director")}:${res.getString("region")}")
  }
  // Release resources
  res.close()
  statement.close()
  connection.close()
}
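The example above closes each resource manually, so a failed query would leak the connection. A small loan-pattern helper closes the resource even when the body throws; this is a sketch, not part of the original program, and the Dummy class is a stand-in so it can be tried without a Hive server:

```scala
object LoanPatternDemo extends App {
  // Runs f with the resource, guaranteeing close() even if f throws
  def withResource[A <: AutoCloseable, B](resource: A)(f: A => B): B =
    try f(resource) finally resource.close()

  // Dummy AutoCloseable stand-in; no JDBC server is needed to run this
  class Dummy extends AutoCloseable {
    var closed = false
    def close(): Unit = closed = true
  }

  val d = new Dummy
  println(withResource(d)(_ => 42)) // prints 42
  println(d.closed)                 // prints true: close() ran
}
```

In the JDBC example, the connection, statement, and result set could each be wrapped this way (Scala 2.13+ also ships scala.util.Using for the same purpose).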