Spark 2.2 (Part 5): Reading and Writing Hive with SparkSQL

Reading and Writing Hive with SparkSQL in IDEA

Adding the dependencies

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "2.2.0",
                            "org.apache.spark" %% "spark-sql" % "2.2.0",
                            "com.databricks" %% "spark-csv" % "1.5.0",
                            "org.apache.spark" %% "spark-hive" % "2.2.0",
                            "mysql" % "mysql-connector-java" % "8.0.15")

Copying hive-site.xml

Copy the hive-site.xml file into the resources directory of the IDEA project; it records the address of the Hive metastore. Without this file, SparkSQL falls back to its default configuration and creates a local spark-warehouse directory inside the project to hold our metadata.
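For reference, the metastore-related part of hive-site.xml might look like the sketch below. The host, database name, and credentials are placeholders and must match your own MySQL-backed metastore, not values taken from this article:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- JDBC connection to the MySQL database backing the Hive metastore.
       Host, database name, user, and password below are placeholders. -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>
```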

Code example

package doc

import java.io.File

import org.apache.spark.sql.{SaveMode, SparkSession}

/**
  * @Author huangwei
  * @Date 19-9-20
  * @Comments Reading and writing Hive with Spark SQL
  **/
object HiveTable {

  def main(args: Array[String]): Unit = {
    val warehouseLocation = new File("spark-warehouse").getAbsolutePath
    val spark = SparkSession
      .builder()
      .appName("Spark Hive")
      .master("local")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()

    import spark.sql

    sql("SHOW DATABASES").show()
//    +------------+
//    |databaseName|
//    +------------+
//    |     default|
//    |        test|
//    +------------+
    // Select the database
    sql("USE test")
    // Run a query
    sql("SELECT * FROM score").show()
//    +----------+------------+-------+----+-------+
//    |student_id|student_name|chinese|math|english|
//    +----------+------------+-------+----+-------+
//    |         1|      ZhaoSi|     88|  98|     90|
//    |         2|    ZhangSan|     80|  89|     78|
//    |         3|    WangMaZi|     65|  70|     79|
//    |         4|     LiuNeng|     78|  80|     82|
//    |         5|        LiSi|     85|  90|     72|
//    +----------+------------+-------+----+-------+

    // Create a table; `from` is a reserved word, so it is escaped with backticks,
    // and the line terminator must be written as \\n so the SQL string contains "\n"
    sql("CREATE TABLE IF NOT EXISTS persons (name STRING, gender STRING, age INT, `from` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")
    // Load the data
    sql("LOAD DATA LOCAL INPATH '/home/huangwei/person.txt' INTO TABLE persons")
    sql("SELECT * FROM persons").show()
//    +------+------+---+---------+
//    |  name|gender|age|     from|
//    +------+------+---+---------+
//    |  马云|    男| 55|浙江-杭州|
//    |马化腾|    男| 48|广东-深圳|
//    |李彦宏|    男| 51|山西-阳泉|
//    |刘强东|    男| 46|江苏-宿迁|
//    |  雷军|    男| 50|湖北-仙桃|
//    +------+------+---+---------+
    val result = sql("SELECT * FROM persons WHERE age < 50")
    // Export the query result
    result.repartition(1).write.format("csv").mode(SaveMode.Append).save("/home/huangwei/result")
  }

}
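The example above only exports a query result as CSV. To cover the "write" half of the title, a DataFrame can also be saved back into Hive as a table via saveAsTable. A minimal sketch follows; the table name score_copy and the sample row are illustrative, not from the original code:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object WriteHiveTable {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Write Hive")
      .master("local")
      .enableHiveSupport()
      .getOrCreate()

    import spark.implicits._

    // Build a small DataFrame in memory (illustrative data,
    // same columns as the score table shown earlier)
    val df = Seq((6, "ZhouQi", 90, 85, 88))
      .toDF("student_id", "student_name", "chinese", "math", "english")

    // Append the rows into a Hive table, creating it if it does not exist;
    // SaveMode.Overwrite would replace the table contents instead
    df.write.mode(SaveMode.Append).saveAsTable("test.score_copy")

    spark.stop()
  }

}
```

Note that saveAsTable stores the data in Spark's own Parquet format by default; a table created this way is managed by Spark rather than matching an existing Hive text-file layout.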

Pitfalls encountered

Exception: File /hive/tmp does not exist

The /hive/tmp directory (as configured in hive-site.xml) cannot be found, even though it does exist on HDFS. The code running in IDEA resolves bare paths against the local file system by default, so /hive/tmp is looked up locally and not found. The fix is to change /hive/tmp in hive-site.xml to a fully qualified HDFS URI such as hdfs://master:9000/hive/tmp.
Solution: update the hive-site.xml configuration

<property>
  <name>hive.exec.scratchdir</name>
  <value>hdfs://localhost:9000/hive/tmp</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://localhost:9000/hive/warehouse</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>hdfs://localhost:9000/hive/log</value>
</property>
Exception: Hive Schema version 2.3.0 does not match metastore's schema version 1.2.0 Metastore is not upgraded or corrupt

Solution: disable schema version verification in hive-site.xml

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>
Exception: java.lang.RuntimeException: java.lang.RuntimeException: Unable to create log directory /usr/local/hive2.3.6/tmp/huangwei

The local tmp directory lives under /usr/local, where the current user has no write permission.
Solution: point the local scratch directory at a writable location in hive-site.xml

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/huangwei/tmp</value>
</property>
There is also a warning: Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.

Because the MySQL driver used here is the MySQL 8 version, hive-site.xml needs to be updated to replace com.mysql.jdbc.Driver with com.mysql.cj.jdbc.Driver:

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>