Reading Hive Data with Spark

1. Test that the Hive jars run successfully

Run the following command in Xshell:

spark-shell --deploy-mode client --queue weimi.xxx --jars /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/hive-hcatalog-core-1.1.0-cdh5.11.1.jar
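Here --jars ships hive-hcatalog-core (which provides Hive's HCatalog SerDes) to both the driver and the executors. The jar path is specific to this CDH 5.11.1 parcel installation, so adjust it to your own cluster's parcel directory.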

[Figure 1: spark-shell startup output]

Test code:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql("select item_id from user_recommend_log where dt='20190123' limit 5").show()

[Figure 2: query result]
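Note that HiveContext is the Spark 1.x entry point. If the cluster also has Spark 2 installed (as the spark2-submit step below assumes), the same smoke test can be run in spark2-shell through its built-in SparkSession, which is pre-created as spark; a minimal sketch, reusing the table and partition from this article:

// spark2-shell pre-creates a SparkSession named `spark`;
// this is the Spark 2 equivalent of the HiveContext test above.
spark.sql("select item_id from user_recommend_log where dt='20190123' limit 5").show()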

2. Spark code to read from the Hive database:

(1) Write the program in IDEA

import org.apache.spark.sql.SparkSession

object ColdStart {

  def main(args: Array[String]): Unit = {

    // Initialize the Spark session with Hive support so sql() can read Hive tables
    val sparkSession = SparkSession.builder()
      .enableHiveSupport()
      .getOrCreate()

    // Query one partition of the Hive table (note the space before "limit");
    // sql() already returns a DataFrame, so no toDF() is needed
    val df = sparkSession.sql("select * from user_recommend_log where dt='20190123' limit 5")

    // coalesce(1) collapses the result to a single partition so the CSV
    // output directory (args(0)) contains a single part file
    df.coalesce(1).write.format("csv").save(args(0))

    sparkSession.stop()
  }
}
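To compile this in IDEA, the project needs the Spark SQL and Spark Hive libraries on its classpath. The jar name used below suggests a Maven build; for an sbt project, a minimal sketch might look like the following (the versions are assumptions and should be matched to the cluster's Spark 2 and Scala build):

// build.sbt -- a minimal sketch; versions are assumptions, match them to
// the cluster. Marked "provided" because spark2-submit supplies Spark at runtime.
name := "test2"
version := "1.0-SNAPSHOT"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % "2.3.0" % "provided",
  "org.apache.spark" %% "spark-hive" % "2.3.0" % "provided"
)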

(2) Submit the job from Xshell

spark2-submit --master yarn --deploy-mode cluster --class ColdStart --name ncf_data --queue weimi.supeihuang --driver-memory 4G --executor-memory 8G --executor-cores 8 --num-executors 4 ./programs/test2-1.0-SNAPSHOT.jar
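Here --deploy-mode cluster runs the driver inside YARN, and --class must match the object name ColdStart. Note that the program reads its output path from args(0), so an HDFS path must be appended after the jar path at the end of the command (e.g. ./programs/test2-1.0-SNAPSHOT.jar /user/xxx/output, where the path is a hypothetical placeholder). The memory and core settings should be sized to the queue's limits.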

(3) Get the results
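Because of coalesce(1), the output directory passed as args(0) holds a single CSV part file plus a _SUCCESS marker. It can be listed with, for example (the path is a hypothetical placeholder):

hadoop fs -ls /user/xxx/output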

(4) Download the file from HDFS to the local machine

hadoop fs -get /hadoopFile /localFile
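If the output directory ever contains several part files (for example, when coalesce(1) is not used), hadoop fs -getmerge /hadoopFile /localFile concatenates them into a single local file.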
