A recent requirement was fairly simple: read data from Hive with Spark, process it, and write the result to Elasticsearch. Here is a quick walkthrough of the flow.
The code:
import org.apache.spark.sql.SparkSession

object Credit_User_Model_To_Es {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName(s"${this.getClass.getSimpleName}")
      .config("es.index.auto.create", "true")
      .config("es.nodes", "172.16.0.216:9200,172.16.0.217:9200,172.16.0.218:9200")
      .enableHiveSupport()
      .getOrCreate()

    import spark.sql
    import org.elasticsearch.spark.sql._ // brings saveToEs into scope

    // Read from Hive and write straight to the ES index/type "credit/doc"
    sql("select * from ads.ads_credit_score_m")
      .saveToEs("credit/doc")

    spark.stop()
  }
}
The points worth noting are:
.config("es.index.auto.create", "true")
.config("es.nodes", "172.16.0.216:9200,172.16.0.217:9200,172.16.0.218:9200")
import org.elasticsearch.spark.sql._
.saveToEs("credit/doc")
spark.esDF("credit/doc")
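Conversely, `esDF` (listed above) reads an index back into a DataFrame. A minimal sketch under the same cluster settings; the object name and the optional pushed-down query string are illustrations, not part of the original job:

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._ // brings esDF into scope

// Hypothetical read-back job, assuming the same ES nodes as the write job above.
object Read_Credit_From_Es {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Read_Credit_From_Es")
      .config("es.nodes", "172.16.0.216:9200,172.16.0.217:9200,172.16.0.218:9200")
      .getOrCreate()

    // The second argument is optional: it pushes a filter down to Elasticsearch
    // instead of pulling everything into Spark first.
    val df = spark.esDF("credit/doc", "?q=statis_month:201907")
    df.printSchema()
    df.show(10)

    spark.stop()
  }
}
```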
Submitting the job:
spark2-submit \
--class cn.unisk.es.Credit_User_Model_To_Es \
--master yarn \
--deploy-mode cluster \
--executor-memory 64G \
--total-executor-cores 100 \
--jars /var/lib/hadoop-hdfs/elasticsearch-hadoop-7.3.1.jar \
/var/lib/hadoop-hdfs/cm.jar
Two pitfalls I hit:
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.spark.sql.package$
This means the elasticsearch-hadoop classes are not on the classpath, so add the jar when submitting:
--jars /var/lib/hadoop-hdfs/elasticsearch-hadoop-7.3.1.jar \
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unsupported/Unknown Elasticsearch version 7.3.0
The cause here was that the elasticsearch-hadoop connector I was originally using was too old to recognize Elasticsearch 7.3.0. After downloading the 7.3.1 connector jar (the connector version should match, or be newer than, the cluster version), the job finally succeeded.
Verifying in Kibana:
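When in doubt about which version a connector jar was built for, its manifest can be inspected before submitting. A quick sanity check against the jar path used in the submit command above (the exact manifest key may vary by build, hence the case-insensitive grep):

```shell
# Print the version fields recorded in the jar's manifest
unzip -p /var/lib/hadoop-hdfs/elasticsearch-hadoop-7.3.1.jar META-INF/MANIFEST.MF \
  | grep -i 'version'
```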
GET credit/_search
{
  "query": {
    "match_all": {}
  }
}
Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "credit",
        "_type" : "doc",
        "_id" : "Sx6SGm0BLr7Gr_l6AMkt",
        "_score" : 1.0,
        "_source" : {
          "serv_number" : "18640205299",
          "product_type" : "4",
          "is_online" : "0",
          "is_tencent" : "0",
          "credit_scores" : 760,
          "statis_month" : "201907",
          "pro_code" : "091"
        }
      },
      ...