Spark Basics: Integrating Spark SQL with Hive and Configuring ThriftServer

To build Spark with Hive support using Maven, add the -Phive and -Phive-thriftserver profiles. For example: mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package


1 Integrating Spark SQL with Hive

1.1 Add Hive's configuration file hive-site.xml to the Spark application's classpath; that is, copy hive-site.xml into ${SPARK_HOME}/conf.
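
For example, with Hive installed under /opt/app/hive and Spark under /opt/app/spark (the paths used by the commands later in this post):

cp /opt/app/hive/conf/hive-site.xml /opt/app/spark/conf/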

1.2 Depending on whether the Hive configuration parameter hive.metastore.uris is set, Spark SQL works in one of two modes:

- If the property is not set (the default), Spark SQL reads Hive metadata by connecting directly to the metastore database, using the javax.jdo.option.* connection settings from the Hive configuration.

- If the property is set, Spark SQL obtains Hive table metadata through the metastore service that Hive provides.

1.3 Add the JDBC driver for the metastore database to the Spark application's classpath.
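
For the MySQL-backed metastore configured in section 2.1, that driver is the MySQL connector. One option is to pass the jar when launching the application (the jar path and name below are placeholders, not paths from the original post); on Spark 2.x, copying the jar into ${SPARK_HOME}/jars also works:

/opt/app/spark/bin/spark-shell --jars /path/to/mysql-connector-java-<version>.jar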

1.4 Start the Hive metastore service:

/opt/app/hive/bin/hive --service metastore >/opt/app/hive/logs/metastore.log 2>&1 &
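
With the steps above in place, a Spark application can query Hive tables directly. A minimal sketch, assuming Spark 2.x (matching the Spark SQL 2.1.0 shown in the beeline session below) and the hadoop.film table queried later in this post:

import org.apache.spark.sql.SparkSession

object SparkSQLHiveExample extends App {
  // enableHiveSupport() wires SparkSession to the Hive metastore
  // described by the hive-site.xml on the classpath
  val spark = SparkSession.builder()
    .appName("SparkSQLHiveExample")
    .enableHiveSupport()
    .getOrCreate()

  // Hive tables can now be queried with plain SQL
  spark.sql("SHOW DATABASES").show()
  spark.sql("SELECT fname, director FROM hadoop.film LIMIT 2").show()

  spark.stop()
}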


2 Configuring and Using ThriftServer

ThriftServer exposes a JDBC/ODBC interface: users can access Spark SQL data by connecting to ThriftServer over JDBC/ODBC. When ThriftServer starts, it launches a Spark SQL application. The ThriftServer implementation used by Spark SQL is org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.

2.1 Configure hive-site.xml:

<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>hdfs://hdfs-cluster/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>hdfs://hdfs-cluster/user/hive/scratchdir</value>
    </property>
    <property>
        <name>hive.querylog.location</name>
        <value>/spark/hive-1.1.0/logs</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop-all-01:3306/metastore?createDatabaseIfNotExist=true&amp;characterEncoding=utf-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
    </property>
    <property>
        <name>hive.server2.thrift.min.worker.threads</name>
        <value>1</value>
    </property>
    <property>
        <name>hive.server2.thrift.max.worker.threads</name>
        <value>100</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop-all-02</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop-all-02:9083</value>
    </property>
</configuration>

2.2 Start the Hive metastore service:

hive --service metastore >/opt/app/hive/logs/metastore.log 2>&1 &


2.3 Start ThriftServer

This starts the Thrift server on the address configured by hive.server2.thrift.bind.host:

/opt/app/spark/sbin/start-thriftserver.sh
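
start-thriftserver.sh accepts the same command-line options as spark-submit, so you can choose where the underlying Spark SQL application runs; for example (the master URL is an assumption, not from the original post):

/opt/app/spark/sbin/start-thriftserver.sh --master yarn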


2.4 Connect from a remote client

You can connect with the Hive beeline command:

[hadoop@hadoop-all-02 ~]$ beeline

Beeline version 1.2.2 by Apache Hive

beeline> !connect jdbc:hive2://hadoop-all-02:10000

Connecting to jdbc:hive2://hadoop-all-02:10000

Enter username for jdbc:hive2://hadoop-all-02:10000: spark

Enter password for jdbc:hive2://hadoop-all-02:10000:

Connected to: Spark SQL (version 2.1.0)

Driver: Hive JDBC (version 1.2.2)

Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://hadoop-all-02:10000>

0: jdbc:hive2://hadoop-all-02:10000> SELECT * FROM hadoop.film LIMIT 2;

+-------+----------------+----------------+----------+---------------+-------+--------+--------------+------------+---------+
|  fid  |     fname      |    director    | conutry  | release_time  | time  | grade  | comment_num  | film_type  | region  |
+-------+----------------+----------------+----------+---------------+-------+--------+--------------+------------+---------+
| 1001  | 海边的曼彻斯特 | 肯尼思.洛纳根  | 美国     | 2016-12       | 137   | 8.6    | 3002         | 爱情       | 欧美    |
| 1002  | 罗曼蒂克消亡史 | 程耳           | 中国     | 2016-12       | 125   | 7.8    | 40001        | 爱情       | 大陆    |
+-------+----------------+----------------+----------+---------------+-------+--------+--------------+------------+---------+

2 rows selected (1.125 seconds)


2.5 Connect via JDBC

import java.sql.DriverManager

object SparkSQLThriftServer extends App {
    // Register the Hive JDBC driver
    val driver = "org.apache.hive.jdbc.HiveDriver"
    Class.forName(driver)

    // Open the connection
    val (url, username, password) = ("jdbc:hive2://hadoop-all-02:10000", "hadoop", "")
    val connection = DriverManager.getConnection(url, username, password)

    // Switch databases: this hive2 URL carries no database name,
    // so we switch manually by executing SQL
    connection.prepareStatement("use hadoop").execute()

    // Prepare the statement
    val sql = "SELECT region, fname, director FROM film"
    val statement = connection.prepareStatement(sql)

    // Fetch and print the results
    val res = statement.executeQuery()
    while (res.next()) {
        println(s"${res.getString("fname")}:${res.getString("director")}:${res.getString("region")}")
    }

    // Release resources
    res.close()
    statement.close()
    connection.close()
}
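
To compile this client, the Hive JDBC driver must be on the classpath. With sbt, for example (the version is an assumption, chosen to match the Hive JDBC 1.2.2 driver reported in the beeline session above):

libraryDependencies += "org.apache.hive" % "hive-jdbc" % "1.2.2"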
