Reading and Writing Elasticsearch from Spark

Preface

A new requirement came up that is fairly simple: read data from Hive with Spark, process it, and write the result into Elasticsearch. This post is a quick walkthrough of the flow.

Flow

Code

import org.apache.spark.sql.SparkSession

object Credit_User_Model_To_Es {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName(s"${this.getClass.getSimpleName}")
      // Let elasticsearch-hadoop create the target index if it does not exist yet
      .config("es.index.auto.create", "true")
      // Comma-separated list of Elasticsearch nodes to connect to
      .config("es.nodes", "172.16.0.216:9200,172.16.0.217:9200,172.16.0.218:9200")
      .enableHiveSupport()
      .getOrCreate()

    // Brings saveToEs (and esDF) into scope as implicits
    import org.elasticsearch.spark.sql._

    spark.sql("select * from ads.ads_credit_score_m")
      .saveToEs("credit/doc")
    spark.stop()
  }
}
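One caveat with the code above: each re-run writes new documents with auto-generated `_id`s, so the index accumulates duplicates. A minimal sketch of an idempotent variant, assuming `serv_number` is a unique key in the table (a guess based on the sample document further down; the two option names are standard elasticsearch-hadoop settings):

```scala
import org.elasticsearch.spark.sql._

// Sketch: key documents on a column so re-runs overwrite instead of duplicating.
val df = spark.sql("select * from ads.ads_credit_score_m")
df.saveToEs("credit/doc", Map(
  "es.mapping.id"      -> "serv_number", // use this column as the document _id
  "es.write.operation" -> "upsert"       // insert new docs, update existing ones by _id
))
```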

Things to note:

  1. The two configs:
 .config("es.index.auto.create", "true")
 .config("es.nodes", "172.16.0.216:9200,172.16.0.217:9200,172.16.0.218:9200")
  2. The import:
 import org.elasticsearch.spark.sql._
  3. The write call:
 .saveToEs("credit/doc")
  4. The read call:
 spark.esDF("credit/doc")
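For the read direction, the same implicit import adds `esDF` to `SparkSession`. A minimal read-back sketch; the optional second argument is a URI query that is pushed down and filtered server-side (`pro_code` and its value are taken from the sample document further down):

```scala
import org.elasticsearch.spark.sql._

// Load the whole index/type back as a DataFrame
val creditDF = spark.esDF("credit/doc")

// Or push a filter down to Elasticsearch with a URI query
val provinceDF = spark.esDF("credit/doc", "?q=pro_code:091")
provinceDF.show()
```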

Submitting the job

spark2-submit \
--class cn.unisk.es.Credit_User_Model_To_Es \
--master yarn \
--deploy-mode cluster \
--executor-memory 64G \
--total-executor-cores 100 \
--jars /var/lib/hadoop-hdfs/elasticsearch-hadoop-7.3.1.jar \
/var/lib/hadoop-hdfs/cm.jar

Things to note:

  1. If the elasticsearch-hadoop jar is not already present on every node of the cluster, you must pass it explicitly with --jars at submit time, otherwise the job fails with:
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.spark.sql.package$

So the submit command needs:

--jars /var/lib/hadoop-hdfs/elasticsearch-hadoop-7.3.1.jar \
  2. The first time around I downloaded an elasticsearch-hadoop-5.5.jar, and the job failed with:
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unsupported/Unknown Elasticsearch version 7.3.0

Clearly that jar version was too old: the elasticsearch-hadoop jar must match the cluster's Elasticsearch version (7.3.0 here). After switching to a 7.3.1 jar, the job finally succeeded.
  3. One more thing: --total-executor-cores only takes effect in standalone and Mesos deployments; on YARN, control parallelism with --num-executors and --executor-cores instead.
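To avoid the version mismatch in the first place, check the cluster version before picking a jar: the root endpoint of any Elasticsearch node reports it. A sketch, using the first node from the config above and an illustrative sample response:

```shell
# Ask any ES node for its version (replace the host with one of your es.nodes):
#   curl -s http://172.16.0.216:9200
# The JSON response contains a version.number field; extract it like so:
resp='{"name":"node-1","version":{"number":"7.3.0"}}'
echo "$resp" | grep -o '"number":"[^"]*"' | cut -d'"' -f4
# prints: 7.3.0
```

Pick the elasticsearch-hadoop jar whose version matches this number.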

Querying the data in Kibana

GET credit/_search
{
  "query": {
    "match_all": {}
  }
}

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "credit",
        "_type" : "doc",
        "_id" : "Sx6SGm0BLr7Gr_l6AMkt",
        "_score" : 1.0,
        "_source" : {
          "serv_number" : "18640205299",
          "product_type" : "4",
          "is_online" : "0",
          "is_tencent" : "0",
          "credit_scores" : 760,
          "statis_month" : "201907",
          "pro_code" : "091"
        }
      },
      ...
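Beyond match_all, a field-level query is handier for spot-checking the load. For example, to fetch one subscriber's document (field name and value taken from the sample hit above):

```
GET credit/_search
{
  "query": {
    "match": {
      "serv_number": "18640205299"
    }
  }
}
```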
