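This article collects search techniques I have run into while using Elasticsearch in practice, along with how to call the Java API. It will be extended over time.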
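Contents
Simple search
  Match All Query
  Term Query
  Match Query
    Boolean
    Phrase and Phrase_prefix
  MultiMatch Query
  Wildcard Query
  Query String Query
Compound queries
  Bool Query
Java API
  Connecting to an ES cluster
  Getting a document
  Deleting a document
  Adding or updating a document
  Bulk
  Search
  Using scrolls in Java
  Multi Search API
  Using aggregations
  Using search templates
  Delete by query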
Simple search
A search request is a JSON body of the following form:
{
  "query": {
    ...
  }
}
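Pagination is done by specifying the starting offset (from) and the number of hits to return (size):
{
  "from": 0,
  "size": 10,
  "query": {
    "match_all": {}
  }
}
If neither is specified, from defaults to 0 and size defaults to 10.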
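Callers usually think in page numbers rather than offsets. The following is a minimal sketch of that conversion, assuming 1-based page numbers; PageRequest and its methods are hypothetical helpers written for this article, not part of the Elasticsearch API.

```java
// Hypothetical helper, not an Elasticsearch class.
class PageRequest {
    // Convert a 1-based page number into the zero-based "from" offset.
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 0) {
            throw new IllegalArgumentException("page must be >= 1 and size must be >= 0");
        }
        return (page - 1) * size;
    }

    // Build a match_all request body for the given page.
    static String matchAllBody(int page, int size) {
        return "{\"from\": " + fromOffset(page, size)
                + ", \"size\": " + size
                + ", \"query\": {\"match_all\": {}}}";
    }

    public static void main(String[] args) {
        System.out.println(matchAllBody(1, 10)); // first page, from = 0
        System.out.println(matchAllBody(3, 10)); // third page, from = 20
    }
}
```

The resulting string can be sent as the request body; with the Java API the same values would go to setFrom and setSize instead.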
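You can also load only a subset of the fields:
{
  "fields": [
    "userId"
  ],
  "query": {
    "match_all": {}
  }
}
Here each hit loads only the userId field. If fields is an empty array, each hit carries only the metadata "_index", "_type", "_id" and "_score"; if fields is omitted altogether, the full _source is returned as well.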
Match All Query
{
  "query": {
    "match_all": {}
  }
}
matchAllQuery matches every document. The corresponding Java class is MatchAllQueryBuilder.
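Term Query
{
  "query": {
    "term" : { "user" : "Kimchy" }
  }
}
termQuery is an exact-match search: the query value is not analyzed. The example above finds documents whose user field contains exactly the term Kimchy. The corresponding Java class is TermQueryBuilder. It has several constructors; the first argument is the field to match and the second is the value to match.
e.g.:
QueryBuilders.termQuery("name", "你的名字。")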
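Because the term value is not analyzed, it is compared with whatever tokens were written at index time. The sketch below illustrates the consequence with a toy analyzer that lowercases and splits on non-alphanumeric characters; this is an assumption standing in for a lowercasing analyzer such as the standard one, and every name in it is hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of analyzed vs. non-analyzed lookups; not Elasticsearch code.
class TermVsMatchDemo {
    // Toy analyzer: lowercase, then split on anything that is not a letter or digit.
    static Set<String> analyze(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // term: the query value is compared with the indexed tokens exactly as given.
    static boolean termMatches(Set<String> indexedTokens, String value) {
        return indexedTokens.contains(value);
    }

    // match (default "or" behaviour): the query text is analyzed first.
    static boolean matchMatches(Set<String> indexedTokens, String text) {
        for (String t : analyze(text)) {
            if (indexedTokens.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("Kimchy");
        System.out.println(termMatches(indexed, "Kimchy"));  // false: the indexed token is "kimchy"
        System.out.println(termMatches(indexed, "kimchy"));  // true
        System.out.println(matchMatches(indexed, "Kimchy")); // true: the query text is analyzed too
    }
}
```

In other words, a term query for Kimchy only matches if the field is not analyzed or the analyzer preserves case.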
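Match Query
{
  "query": {
    "match": {
      "name": "甜心格格 第二季"
    }
  }
}
matchQuery searches a single field: the example above looks for documents whose name field matches "甜心格格 第二季". The corresponding Java class is MatchQueryBuilder.
{
  "query": {
    "match": {
      "_all": "你神"
    }
  }
}
If the field is "_all", every field is searched. matchQuery has three types: boolean, phrase, and phrase_prefix.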
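Boolean
boolean is the default type. According to the official documentation, it means that the provided text is analyzed and the analysis process builds a boolean query from it. The operator option can be set to or or and to control how the resulting clauses are combined; it defaults to or, so a document matches if any of the analyzed terms matches. minimum_should_match sets the minimum number of analyzed terms that must match.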
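The effect of operator and minimum_should_match can be sketched with plain collections. This is a simplified model under the assumption that both the query and the document have already been analyzed into tokens; MatchOperatorDemo is a hypothetical class, and scoring is ignored entirely.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the boolean match type; not Elasticsearch code.
class MatchOperatorDemo {
    // Count how many query tokens occur among the document tokens.
    static int matched(List<String> queryTokens, Set<String> docTokens) {
        int count = 0;
        for (String t : queryTokens) {
            if (docTokens.contains(t)) {
                count++;
            }
        }
        return count;
    }

    // operator = or: at least one token must match.
    static boolean matchOr(List<String> queryTokens, Set<String> docTokens) {
        return matched(queryTokens, docTokens) >= 1;
    }

    // operator = and: every token must match.
    static boolean matchAnd(List<String> queryTokens, Set<String> docTokens) {
        return matched(queryTokens, docTokens) == queryTokens.size();
    }

    // minimum_should_match given as an absolute number of tokens.
    static boolean matchAtLeast(List<String> queryTokens, Set<String> docTokens, int minimum) {
        return matched(queryTokens, docTokens) >= minimum;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("quick", "brown", "fox");
        Set<String> doc = new HashSet<>(Arrays.asList("quick", "red", "fox"));
        System.out.println(matchOr(query, doc));         // true
        System.out.println(matchAnd(query, doc));        // false: "brown" is missing
        System.out.println(matchAtLeast(query, doc, 2)); // true: two of three tokens match
    }
}
```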
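Phrase and Phrase_prefix
phrase and phrase_prefix both search for phrases. The difference is that phrase_prefix treats the last word of the query as a prefix.
e.g.:
{
  "query": {
    "match_phrase_prefix": {
      "name": "quick brown f"
    }
  }
}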
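MultiMatch Query
{
  "query": {
    "multi_match": {
      "query": "你的名字(花絮预告)",
      "fields": [
        "name",
        "awards"
      ]
    }
  }
}
multiMatchQuery matches a value against several fields. Field names may contain wildcards: *_name, for example, matches fields such as first_name and last_name. A caret boosts the importance of a field, as in name^3.
Its type attribute can be set to best_fields, most_fields, cross_fields, phrase, or phrase_prefix. I have not yet looked into how each of these is used in detail.
The corresponding Java class is MultiMatchQueryBuilder.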
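To make the field syntax concrete, here is a rough sketch of how entries such as *_name and name^3 could be expanded against a list of known field names. It only illustrates the notation; FieldSpecDemo is hypothetical and this is not how Elasticsearch resolves fields internally.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustration of the "field^boost" and wildcard notation; not Elasticsearch code.
class FieldSpecDemo {
    // Returns field name -> boost; the boost is 1.0 when no ^boost suffix is given.
    static Map<String, Float> resolve(List<String> specs, List<String> knownFields) {
        Map<String, Float> result = new LinkedHashMap<>();
        for (String spec : specs) {
            int caret = spec.lastIndexOf('^');
            String pattern = caret < 0 ? spec : spec.substring(0, caret);
            float boost = caret < 0 ? 1.0f : Float.parseFloat(spec.substring(caret + 1));
            // Quote the literal parts and let each * match any run of characters.
            StringBuilder regex = new StringBuilder();
            String[] parts = pattern.split("\\*", -1);
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) {
                    regex.append(".*");
                }
                regex.append(Pattern.quote(parts[i]));
            }
            for (String field : knownFields) {
                if (Pattern.matches(regex.toString(), field)) {
                    result.put(field, boost);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("first_name", "last_name", "title", "body");
        // prints {first_name=1.0, last_name=1.0, title=3.0}
        System.out.println(resolve(Arrays.asList("*_name", "title^3"), known));
    }
}
```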
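PS: there is one more way to do it:
{
  "query": {
    "term": {
      "all_worlds": "日本"
    }
  }
}
This finds documents in which any field contains "日本". Note that all_worlds is not a built-in field: it only works if the mapping defines a field of that name that collects the content of the other fields.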
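Wildcard Query
{
  "query": {
    "wildcard": {
      "name": "*的*"
    }
  }
}
wildcardQuery matches terms against a wildcard pattern. ? matches any single character and * matches any sequence of characters, including an empty one. The corresponding Java class is WildcardQueryBuilder.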
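The meaning of the two wildcard characters can be sketched by translating the pattern into a regular expression. This is a minimal illustration that compares a pattern with a single term; WildcardDemo is a hypothetical class, and it says nothing about how Elasticsearch evaluates the query against the index.

```java
import java.util.regex.Pattern;

// Illustration of ? and * semantics; not Elasticsearch code.
class WildcardDemo {
    // ? becomes "any single character", * becomes "any sequence, possibly empty".
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // True when the whole term matches the wildcard pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*的*", "你的名字"));   // true
        System.out.println(matches("ki?chy", "kimchy")); // true
        System.out.println(matches("ki?chy", "kichy"));  // false: ? needs exactly one character
    }
}
```

Patterns that start with * or ? force a scan over many terms, so they tend to be slow on large indices.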
Query String Query
{
  "query": {
    "query_string" : {
      "query" : "(new york city) OR (big apple)"
    }
  }
}
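Parameter | Description |
---|---|
query | The actual query to be parsed. See Query string syntax. |
default_field | The default field for query terms if no prefix field is specified. Defaults to the index.query.default_field index setting, which in turn defaults to _all. |
default_operator | The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with a default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR. |
analyzer | The analyzer name used to analyze the query string. |
allow_leading_wildcard | When set, * or ? are allowed as the first character. Defaults to true. |
lowercase_expanded_terms | Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to true. |
enable_position_increments | Set to true to enable position increments in result queries. Defaults to true. |
fuzzy_max_expansions | Controls the number of terms fuzzy queries will expand to. Defaults to 50. |
fuzziness | Set the fuzziness for fuzzy queries. Defaults to AUTO. |
fuzzy_prefix_length | Set the prefix length for fuzzy queries. Default is 0. |
phrase_slop | Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is 0. |
boost | Sets the boost value of the query. Defaults to 1.0. |
analyze_wildcard | By default, wildcard terms in a query string are not analyzed. By setting this value to true, a best effort will be made to analyze those as well. |
auto_generate_phrase_queries | Defaults to false. |
max_determinized_states | Limit on how many automaton states regexp queries are allowed to create. This protects against too-difficult (e.g. exponentially hard) regexps. Defaults to 10000. |
minimum_should_match | A value controlling how many "should" clauses in the resulting boolean query should match. It can be an absolute value (2), a percentage (30%) or a combination of both. |
lenient | If set to true will cause format based failures (like providing text to a numeric field) to be ignored. |
locale | Locale that should be used for string conversions. Defaults to ROOT. |
time_zone | Time Zone to be applied to any range query related to dates. See also JODA timezone. |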
Compound queries
Bool Query
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "releaseYear": "2014"
          }
        },
        {
          "match_phrase_prefix": {
            "name": "你的名字"
          }
        }
      ]
    }
  }
}
boolQuery is a compound query: it combines other queries into one.
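Occur | Description |
---|---|
must | The clause (query) must appear in matching documents and will contribute to the score. |
filter | The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored. |
should | The clause (query) should appear in the matching document. In a boolean query with no must clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter. |
must_not | The clause (query) must not appear in the matching documents. |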
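The matching rules in the table can be sketched with predicates. This is a simplified model that ignores scoring (so filter would behave like must) and assumes the default rule that at least one should clause must match when there is no must clause; BoolQueryDemo is a hypothetical class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simplified matching model of a bool query; not Elasticsearch code.
class BoolQueryDemo {
    static <T> boolean matches(T doc,
                               List<Predicate<T>> must,
                               List<Predicate<T>> should,
                               List<Predicate<T>> mustNot) {
        for (Predicate<T> p : mustNot) {
            if (p.test(doc)) {
                return false; // an excluded clause matched
            }
        }
        for (Predicate<T> p : must) {
            if (!p.test(doc)) {
                return false; // a required clause did not match
            }
        }
        if (must.isEmpty() && !should.isEmpty()) {
            // With no must clause, at least one should clause has to match.
            for (Predicate<T> p : should) {
                if (p.test(doc)) {
                    return true;
                }
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("releaseYear", "2014");
        doc.put("name", "another title");
        Predicate<Map<String, String>> year = d -> "2014".equals(d.get("releaseYear"));
        Predicate<Map<String, String>> prefix = d -> d.getOrDefault("name", "").startsWith("你的名字");
        List<Predicate<Map<String, String>>> none = Collections.emptyList();
        // Only should clauses, as in the JSON example: one matching clause is enough.
        System.out.println(matches(doc, none, Arrays.asList(year, prefix), none)); // true
    }
}
```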
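Java API
Connecting to an ES cluster
TransportClient
The TransportClient connects remotely to an Elasticsearch cluster through the transport module. It does not join the cluster; it simply takes one or more initial transport addresses and communicates with them in round-robin fashion.
// on startup
Client client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("host2", 9300));
// on shutdown
client.close();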
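Round-robin here simply means that successive requests go to successive addresses, wrapping around at the end of the list. A minimal sketch of that selection strategy follows; RoundRobinAddresses is a hypothetical class and does not reflect the client's actual internals.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration of round-robin selection; not the TransportClient implementation.
class RoundRobinAddresses {
    private final List<String> addresses;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinAddresses(List<String> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        this.addresses = addresses;
    }

    // Return the next address, wrapping around at the end of the list.
    // floorMod keeps the index valid even after the counter overflows.
    String next() {
        int i = Math.floorMod(counter.getAndIncrement(), addresses.size());
        return addresses.get(i);
    }

    public static void main(String[] args) {
        RoundRobinAddresses rr = new RoundRobinAddresses(Arrays.asList("host1:9300", "host2:9300"));
        System.out.println(rr.next()); // host1:9300
        System.out.println(rr.next()); // host2:9300
        System.out.println(rr.next()); // host1:9300
    }
}
```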
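Note that if your cluster name is not the default elasticsearch, you have to set the cluster name:
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
//Add transport addresses and do something with the client...
You can also set it in the elasticsearch.yml file.
The client can sniff the rest of the cluster and add the nodes it discovers to its list of machines to use. To enable this, set client.transport.sniff to true:
Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true).build();
TransportClient client = new TransportClient(settings);
The other transport client settings are:
Parameter | Description |
---|---|
client.transport.ignore_cluster_name | Set to true to skip cluster name validation of connected nodes |
client.transport.ping_timeout | How long to wait for a ping response from a node; defaults to 5s |
client.transport.nodes_sampler_interval | How often to sample/ping the nodes; defaults to 5s |
PS: close the client once you are done with it. In my tests, repeatedly obtaining connections without closing them could eventually lead to connection errors.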
Getting a document
The get API lets you fetch a typed JSON document from an index by its id, as in this example:
GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .execute()
        .actionGet();
operationThreaded defaults to true, which means the operation runs on a different thread. The following example sets it to false:
GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();
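Deleting a document
The delete API lets you delete a typed JSON document from a specific index by its id.
operationThreaded defaults to true, which means the operation runs on a different thread. The following example sets it to false:
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();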
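Adding or updating a document
You can create an UpdateRequest and send it to the client:
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
client.update(updateRequest).get();
Alternatively, you can use the prepareUpdate method:
// update with a script
client.prepareUpdate("ttl", "doc", "1")
        .setScript("ctx._source.gender = \"male\"", ScriptService.ScriptType.INLINE)
        .get();
// update by merging a partial document
client.prepareUpdate("ttl", "doc", "1")
        .setDoc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject())
        .get();
The first call updates the document with a script; the second merges a partial document into the existing one.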
The Java API also supports upsert. If the document does not exist yet, a new document is created from the content of the upsert element:
IndexRequest indexRequest = new IndexRequest("index", "type", "1")
        .source(jsonBuilder()
            .startObject()
                .field("name", "Joe Smith")
                .field("gender", "male")
            .endObject());
UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
        .doc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject())
        .upsert(indexRequest);
client.update(updateRequest).get();
If the document index/type/1 already exists (here assumed to have been stored with the name Joe Dalton), it looks like this after the update:
{
  "name" : "Joe Dalton",
  "gender": "male"
}
Otherwise the upsert document is indexed and the result is:
{
  "name" : "Joe Smith",
  "gender": "male"
}
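The two outcomes above can be reproduced with an in-memory map standing in for the index. This is only a sketch of the decision logic; UpsertDemo is hypothetical, and the merge is shallow, whereas a real partial update also merges nested objects.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory illustration of update-with-upsert; not Elasticsearch code.
class UpsertDemo {
    // store maps a document id to its source.
    static Map<String, Object> update(Map<String, Map<String, Object>> store, String id,
                                      Map<String, Object> partialDoc,
                                      Map<String, Object> upsertDoc) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            // Document missing: the upsert document is stored as-is.
            existing = new HashMap<>(upsertDoc);
            store.put(id, existing);
        } else {
            // Document present: the partial document is merged into it.
            existing.putAll(partialDoc);
        }
        return existing;
    }

    public static void main(String[] args) {
        Map<String, Object> partial = new HashMap<>();
        partial.put("gender", "male");
        Map<String, Object> upsert = new HashMap<>();
        upsert.put("name", "Joe Smith");
        upsert.put("gender", "male");

        Map<String, Map<String, Object>> store = new HashMap<>();
        Map<String, Object> joe = new HashMap<>();
        joe.put("name", "Joe Dalton");
        store.put("1", joe);

        System.out.println(update(store, "1", partial, upsert).get("name")); // Joe Dalton
        System.out.println(update(store, "2", partial, upsert).get("name")); // Joe Smith
    }
}
```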
Bulk
The bulk API lets you index and delete several documents in a single request. Here is an example:
import static org.elasticsearch.common.xcontent.XContentFactory.*;
import java.util.Date;
BulkRequestBuilder bulkRequest = client.prepareBulk();
// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
            .startObject()
                .field("user", "kimchy")
                .field("postDate", new Date())
                .field("message", "trying out Elasticsearch")
            .endObject()
        )
);
bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
        .setSource(jsonBuilder()
            .startObject()
                .field("user", "kimchy")
                .field("postDate", new Date())
                .field("message", "another post")
            .endObject()
        )
);
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}
Search
The search API executes a search query and returns the hits that match it. It can be executed across one or more indices and across one or more types. The query can be built with either the Java query API or the Java filter API. The body of the search request is built with SearchSourceBuilder.
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;
SearchResponse response = client.prepareSearch("index1", "index2")
        .setTypes("type1", "type2")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.termQuery("multi", "test"))               // Query
        .setPostFilter(FilterBuilders.rangeFilter("age").from(12).to(18)) // Filter
        .setFrom(0).setSize(60).setExplain(true)
        .execute()
        .actionGet();
Note that all parameters are optional. This is the shortest possible form:
// MatchAll on the whole cluster with all default options
SearchResponse response = client.prepareSearch().execute().actionGet();
Using scrolls in Java
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;
QueryBuilder qb = termQuery("multi", "test");
SearchResponse scrollResp = client.prepareSearch("test") // index name
        .setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
while (true) {
    for (SearchHit hit : scrollResp.getHits()) {
        //Handle the hit...
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
    //Break condition: No hits are returned
    if (scrollResp.getHits().getHits().length == 0) {
        break;
    }
}
Multi Search API
SearchRequestBuilder srb1 = node.client()
        .prepareSearch().setQuery(QueryBuilders.queryString("elasticsearch")).setSize(1);
SearchRequestBuilder srb2 = node.client()
        .prepareSearch().setQuery(QueryBuilders.matchQuery("name", "kimchy")).setSize(1);
MultiSearchResponse sr = node.client().prepareMultiSearch()
        .add(srb1)
        .add(srb2)
        .execute().actionGet();
// You will get all individual responses from MultiSearchResponse#getResponses()
long nbHits = 0;
for (MultiSearchResponse.Item item : sr.getResponses()) {
    SearchResponse response = item.getResponse();
    nbHits += response.getHits().getTotalHits();
}
Using aggregations
The following example shows how to add two aggregations to a search:
SearchResponse sr = node.client().prepareSearch()
        .setQuery(QueryBuilders.matchAllQuery())
        .addAggregation(
            AggregationBuilders.terms("agg1").field("field")
        )
        .addAggregation(
            AggregationBuilders.dateHistogram("agg2")
                .field("birth")
                .interval(DateHistogram.Interval.YEAR)
        )
        .execute().actionGet();
// Get your facet results
Terms agg1 = sr.getAggregations().get("agg1");
DateHistogram agg2 = sr.getAggregations().get("agg2");
Using search templates
Define your template parameters as a Map:
Map template_params = new HashMap<>();
template_params.put("param_gender", "male");
You can use templates stored in the config/scripts directory. For example, suppose you have the file config/scripts/template_gender.mustache with the following content:
{
  "template" : {
    "query" : {
      "match" : {
        "gender" : "{{param_gender}}"
      }
    }
  }
}
It can then be executed like this:
SearchResponse sr = client.prepareSearch()
        .setTemplateName("template_gender")
        .setTemplateType(ScriptService.ScriptType.FILE)
        .setTemplateParams(template_params)
        .get();
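Conceptually, executing the template replaces each {{name}} placeholder with the matching entry from the parameter map before the query runs. The sketch below shows only that substitution step; TemplateDemo is a hypothetical class, it handles plain variables only (no Mustache sections), and it performs no JSON escaping of the values.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal placeholder substitution; not a full Mustache implementation.
class TemplateDemo {
    private static final Pattern VAR = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    static String render(String template, Map<String, String> params) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // A parameter that is not supplied renders as an empty string.
            String value = params.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "{\"query\": {\"match\": {\"gender\": \"{{param_gender}}\"}}}";
        Map<String, String> params = Collections.singletonMap("param_gender", "male");
        System.out.println(render(template, params));
        // prints {"query": {"match": {"gender": "male"}}}
    }
}
```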
You can also store the template in a dedicated index named .scripts:
client.preparePutIndexedScript("mustache", "template_gender",
        "{\n" +
        "    \"template\" : {\n" +
        "        \"query\" : {\n" +
        "            \"match\" : {\n" +
        "                \"gender\" : \"{{param_gender}}\"\n" +
        "            }\n" +
        "        }\n" +
        "    }\n" +
        "}").get();
To use the indexed template, pass ScriptService.ScriptType.INDEXED:
SearchResponse sr = client.prepareSearch()
        .setTemplateName("template_gender")
        .setTemplateType(ScriptService.ScriptType.INDEXED)
        .setTemplateParams(template_params)
        .get();
Delete by query
The delete by query API deletes the documents that match a query from one or more indices and one or more types. Here is an example:
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;
DeleteByQueryResponse response = client.prepareDeleteByQuery("test")
        .setQuery(termQuery("_type", "type1"))
        .execute()
        .actionGet();