ES and Spark versions:
elasticsearch 6.8.2
Installation guide: https://blog.csdn.net/mei501501/article/details/100866673
spark-2.4.4-bin-hadoop2.7
Installation guide: https://blog.csdn.net/mei501501/article/details/102565970
First, with Elasticsearch running, launch spark-shell with the es-hadoop jar:
Download: https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-spark-20_2.11/6.8.2
spark-shell --jars /Users/mengqingmei/Documents/elasticsearch-spark-20_2.11-6.8.2.jar
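Alternatively, spark-shell can resolve the same jar from Maven Central with --packages (the coordinates match the download link above; this assumes network access to the repository):
spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:6.8.2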
The spark-shell session looks like this:
import org.apache.spark.SparkConf
import org.elasticsearch.spark._
val conf = new SparkConf()
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "127.0.0.1")
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/doc")
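One caveat about the session above: in spark-shell the SparkContext (sc) is created before the prompt appears, so properties set on a fresh SparkConf never reach it; the write still succeeds only because es.nodes defaults to localhost:9200. A minimal sketch of passing the same settings explicitly on the write call instead:
import org.elasticsearch.spark._
// per-call configuration overrides, using standard elasticsearch-hadoop keys
val cfg = Map("es.index.auto.create" -> "true", "es.nodes" -> "127.0.0.1")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/doc", cfg)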
0. Initialize the SparkContext with the Elasticsearch-related settings:
import org.apache.spark.SparkConf
import org.elasticsearch.spark._
val conf = new SparkConf()
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "127.0.0.1")
Before writing any data, import org.elasticsearch.spark._; this implicitly enriches every RDD with a saveToEs method. Below I will walk through writing different types of data to Elasticsearch.
1. Writing Map objects to Elasticsearch
import org.elasticsearch.spark._
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/doc")
The code above builds two Map objects and writes them to Elasticsearch. In the saveToEs argument, spark is the index and doc is the type. We can then inspect the properties of the spark index through the following URL:
curl -XGET http://127.0.0.1:9200/spark
and search all of its documents with:
http://127.0.0.1:9200/spark/doc/_search
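elasticsearch-hadoop can read the documents back into Spark as well. A minimal sketch using esRDD, which the same org.elasticsearch.spark._ import adds to the SparkContext; each element is an (_id, document-as-Map) tuple:
import org.elasticsearch.spark._
// read every document in spark/doc back as an RDD of (id, Map) pairs
val docs = sc.esRDD("spark/doc")
docs.collect().foreach(println)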
2. Writing case class objects to Elasticsearch
We can also write Scala case class objects to Elasticsearch (from Java, JavaBean objects work the same way), as follows:
case class Trip(departure: String, arrival: String)
val upcomingTrip = Trip("OTP", "SFO")
val lastWeekTrip = Trip("MUC", "OTP")
val rdd = sc.makeRDD(Seq(upcomingTrip, lastWeekTrip))
rdd.saveToEs("spark/doc")
The snippet above writes upcomingTrip and lastWeekTrip to the _index named spark, with type doc. Everything so far has relied on implicit conversions to give the RDD its saveToEs method; elasticsearch-hadoop also provides an explicit API for writing RDDs to Elasticsearch, as follows:
import org.elasticsearch.spark.rdd.EsSpark
val rdd = sc.makeRDD(Seq(upcomingTrip, lastWeekTrip))
EsSpark.saveToEs(rdd, "spark/doc")
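The connector covers Spark SQL too: importing org.elasticsearch.spark.sql._ gives DataFrames their own saveToEs. A minimal sketch reusing the Trip case class (spark here is the SparkSession that spark-shell pre-builds):
import org.elasticsearch.spark.sql._
// column names of the DataFrame become field names in Elasticsearch
val tripsDF = spark.createDataFrame(Seq(upcomingTrip, lastWeekTrip))
tripsDF.saveToEs("spark/doc")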
3. Writing JSON strings to Elasticsearch
We can also write JSON strings directly to Elasticsearch, as follows:
val json1 = """{"id" : 1, "blog" : "www.iteblog.com", "weixin" : "iteblog_hadoop"}"""
val json2 = """{"id" : 2, "blog" : "books.iteblog.com", "weixin" : "iteblog_hadoop"}"""
sc.makeRDD(Seq(json1, json2)).saveJsonToEs("iteblog3/json")
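saveJsonToEs hands the strings to Elasticsearch verbatim instead of serializing Scala objects, so the JSON must already be well-formed. As in section 1, the result can be checked with a search against the target index:
curl -XGET http://127.0.0.1:9200/iteblog3/json/_search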
4. Custom document IDs
In Elasticsearch, the combination _index/_type/_id uniquely identifies a document. If we don't supply an id, Elasticsearch auto-generates a globally unique one, 20 characters long. Such a string carries no meaning and is hard to remember or work with, but we can set the id ourselves when inserting data, as follows:
val otp = Map("iata" -> "OTP", "name" -> "Otopeni")
val muc = Map("iata" -> "MUC", "name" -> "Munich")
val sfo = Map("iata" -> "SFO", "name" -> "San Fran")
val airportsRDD = sc.makeRDD(Seq((1, otp), (2, muc), (3, sfo)))
airportsRDD.saveToEsWithMeta("iteblog5/2015")
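With saveToEsWithMeta, the first element of each tuple becomes the document's _id, so the three airports above are stored under ids 1, 2 and 3. When the id already lives inside the document, an alternative is the es.mapping.id setting, which names the field to use as the _id. A minimal sketch (picking the iata field is just this example's assumption):
import org.elasticsearch.spark._
// tell the connector to take each document's "iata" field as its _id
sc.makeRDD(Seq(otp, muc, sfo)).saveToEs("iteblog5/2015", Map("es.mapping.id" -> "iata"))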