Connecting Spark to Hive and processing Hive data with Spark SQL

To connect Spark to Hive, first place three configuration files in the resources directory of your IDEA project:

core-site.xml
Pulled from the cluster environment.

hdfs-site.xml
Pulled from the cluster environment.

hive-site.xml:


<configuration>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/user/hive/tmp</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.querylog.location</name>
        <value>/user/hive/log</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://knowyou-hdp-02:9083</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://knowyou-hdp-01:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
    </property>
</configuration>

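If you would rather not ship hive-site.xml with the application, the key settings can also be passed programmatically when the session is built. A minimal sketch, assuming the same metastore host and warehouse path as in the file above:

import org.apache.spark.sql.SparkSession

// sketch: point Spark at the metastore without a hive-site.xml on the classpath
val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("hive-via-config")
  .config("hive.metastore.uris", "thrift://knowyou-hdp-02:9083")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()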
Configure the pom dependencies to match the cluster versions (here Spark 2.2.0 on Scala 2.11):



<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.2.0</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
</dependencies>

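A mismatch between these versions and the cluster is a common source of NoSuchMethodError failures at runtime. As a quick sanity check, you can print the versions the driver actually runs with (using the spark session from the test program below):

// sanity check: the versions the driver actually runs with
println(s"Spark version: ${spark.version}")                       // expect 2.2.0
println(s"Scala version: ${scala.util.Properties.versionString}") // expect version 2.11.x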
Test program (runs successfully):

import org.apache.spark.sql.SparkSession

object SparkHive {

  def main(args: Array[String]): Unit = {

    // enableHiveSupport() makes the session read hive-site.xml from the
    // classpath and use the Hive metastore configured there
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("aaa")
      .enableHiveSupport()
      .getOrCreate()

    // silence INFO/WARN output so only the query result is printed
    spark.sparkContext.setLogLevel("ERROR")

    val sql = "select * from default.sparkdemo"
    spark.sql(sql).show()

    spark.stop()
  }
}
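Beyond reading, the same session can write results back into Hive. A minimal sketch under the same setup; the output table name sparkdemo_summary is hypothetical:

import org.apache.spark.sql.SaveMode

// aggregate the Hive table and persist the result as a new Hive table
// (the table name sparkdemo_summary is hypothetical)
val summary = spark.sql("select count(*) as cnt from default.sparkdemo")

summary.write
  .mode(SaveMode.Overwrite)              // replace the table if it already exists
  .saveAsTable("default.sparkdemo_summary")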
