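Learning Elasticsearch and Using Its Java API

This article collects search techniques encountered while using Elasticsearch in practice, together with the corresponding Java API calls. It will be extended over time.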

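Contents

Simple Search

Match All Query

Term Query

Match Query

Boolean

Phrase and Phrase_prefix

MultiMatch Query

Wildcard Query

Query String Query

Compound Queries

Bool Query

Java API

Connecting to an ES Cluster

Getting a Document

Deleting a Document

Adding or Updating a Document

Bulk

Search

Using Scrolls in Java

Multi Search API

Using Aggregations

Using Search Templates

Delete by Query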


  • Simple Search

A search request is a JSON body of the following form:

{
  "query": {
    ... 
  }
}
Pagination is done by specifying a starting offset and the number of results to return:

{
    "from": 0,
    "size": 10,
    "query": {
        "match_all": {}
    }
}

If no pagination is specified, from defaults to 0 and size defaults to 10.
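The relation between a page number and the from/size pair can be sketched in plain Java. This is only an illustration: the class PageParams and its 1-based page numbering are assumptions made here and are not part of the Elasticsearch API.

```java
import java.util.Locale;

// Hypothetical helper (not an Elasticsearch class): maps a 1-based page
// number to the "from"/"size" pair used in the search request body.
class PageParams {
    // Offset of the first hit on the given page.
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // Builds a match_all request body for the given page.
    static String matchAllBody(int page, int size) {
        return String.format(Locale.ROOT,
                "{\"from\": %d, \"size\": %d, \"query\": {\"match_all\": {}}}",
                fromOffset(page, size), size);
    }

    public static void main(String[] args) {
        // Third page with 10 hits per page starts at offset 20.
        System.out.println(matchAllBody(3, 10));
    }
}
```

The resulting string can be sent as the request body; with the Java client, the same two numbers would go to setFrom and setSize.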

You can also load only a selected subset of fields:

{
  "fields": [
    "userId"
  ],
  "query": {
    "match_all": {}
  }
}

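This loads only the userId field into each hit. If fields is an empty array, each hit carries only metadata such as "_index", "_type", "_id" and "_score"; if fields is omitted altogether, the full _source is returned by default.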

  • Match All Query

{
    "query": {
        "match_all": {}
    }
}
matchAllQuery matches all documents. The corresponding Java class is MatchAllQueryBuilder.
  • Term Query

{
  "query": {
    "term" : { "user" : "Kimchy" } 
  }
}

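termQuery performs an exact-match search: the query text is not analyzed, so it must equal the term as stored in the index. The example above finds documents whose user field has the value Kimchy. The corresponding Java class is TermQueryBuilder. It has several constructors; the first argument is the field to match and the second is the value to match.

For example: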

QueryBuilders.termQuery("name", "你的名字。")
  • Match Query

{
  "query": {
    "match": {
      "name": "甜心格格 第二季"
    }
  }
}

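matchQuery searches a single field, and the query text is analyzed before matching. The example above finds documents whose name field matches "甜心格格 第二季". The corresponding Java class is MatchQueryBuilder.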

{
  "query": {
    "match": {
      "_all": "你神"
    }
  }
}

If the field is "_all", the search runs against all fields. matchQuery has three types: boolean, phrase, and phrase_prefix.

  • Boolean

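boolean is the default type. According to the official documentation, it means that the provided text is analyzed and the analysis process constructs a boolean query from it. In other words, the given value is tokenized. The operator parameter (or / and) controls how the resulting clauses are combined and defaults to or. minimum_should_match sets the minimum number of tokens that have to match.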
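How operator and minimum_should_match combine the tokens can be sketched roughly as follows. This is a simplified model, not Elasticsearch code: the class MatchSketch is hypothetical, and lowercasing plus whitespace splitting stands in for a real analyzer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the boolean match type (not an Elasticsearch class).
class MatchSketch {
    // Stand-in for an analyzer: lowercase and split on whitespace.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(
                text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // Counts how many distinct query tokens occur in the document text.
    static int matchedTokens(String query, String doc) {
        Set<String> docTokens = tokens(doc);
        int count = 0;
        for (String token : tokens(query)) {
            if (docTokens.contains(token)) {
                count++;
            }
        }
        return count;
    }

    // operator "or": at least minimumShouldMatch query tokens must match.
    static boolean matchesOr(String query, String doc, int minimumShouldMatch) {
        return matchedTokens(query, doc) >= minimumShouldMatch;
    }

    // operator "and": every query token must match.
    static boolean matchesAnd(String query, String doc) {
        return matchedTokens(query, doc) == tokens(query).size();
    }

    public static void main(String[] args) {
        String doc = "the quick dog";
        System.out.println(matchesOr("quick brown fox", doc, 1));
        System.out.println(matchesOr("quick brown fox", doc, 2));
        System.out.println(matchesAnd("quick brown fox", doc));
    }
}
```

With operator or and the default of one required token, a single shared token is enough; raising minimum_should_match or switching to and makes the query stricter.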

  • Phrase and Phrase_prefix

Both phrase and phrase_prefix search for phrases. The difference is that phrase_prefix treats the last term as a prefix.

For example:

{
  "query": {
    "match_phrase_prefix": {
        "name": "quick brown f"
    }
  }
}
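The difference between the two types can be illustrated with a small in-memory matcher. This is a sketch under simplifying assumptions: PhrasePrefixSketch is a hypothetical class, tokens are produced by lowercasing and whitespace splitting, and slop is ignored.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical model of phrase and phrase_prefix matching.
class PhrasePrefixSketch {
    // Stand-in for an analyzer: lowercase and split on whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // True if the query tokens occur consecutively in the document. When
    // prefixLast is set, the last query token only has to be a prefix of
    // the document token at that position.
    static boolean matches(String query, String doc, boolean prefixLast) {
        List<String> q = tokens(query);
        List<String> d = tokens(doc);
        for (int start = 0; start + q.size() <= d.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < q.size() && ok; i++) {
                String docToken = d.get(start + i);
                boolean last = (i == q.size() - 1);
                ok = (last && prefixLast)
                        ? docToken.startsWith(q.get(i))
                        : docToken.equals(q.get(i));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox jumps";
        System.out.println(matches("quick brown f", doc, true));  // phrase_prefix
        System.out.println(matches("quick brown f", doc, false)); // phrase
    }
}
```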
  • MultiMatch Query

{
  "query": {
    "multi_match": {
      "query": "你的名字(花絮预告)",
      "fields": [
        "name",
        "awards"
      ]
    }
  }
}

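multiMatchQuery matches one value against multiple fields. Field names in fields can contain wildcards; for example, *_name matches fields such as first_name and last_name. A ^ boosts the importance of a field, for example name^3.

Its type attribute can be set to best_fields, most_fields, cross_fields, phrase, or phrase_prefix. Their exact usage is left for later study.

The corresponding Java class is MultiMatchQueryBuilder.

PS: there is another usage: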

{
  "query": {
    "term": {
      "all_worlds": "日本"
    }
  }
}

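This finds documents in which any field contains "日本", provided that all_worlds is a field that collects the content of the other fields; it is not a built-in field name.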

  • Wildcard Query

{
  "query": {
    "wildcard": {
      "name": "*的*"
    }
  }
}

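wildcardQuery performs pattern matching. ? matches any single character and * matches any sequence of characters, including an empty one. The corresponding Java class is WildcardQueryBuilder.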
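The meaning of the two wildcard characters can be checked with a small translation to java.util.regex. This sketch only models the pattern syntax: WildcardSketch is a hypothetical class, and in Elasticsearch the pattern is applied to the terms stored in the index rather than to the raw field text.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mimics wildcard pattern semantics.
class WildcardSketch {
    // Translates a wildcard pattern into an equivalent regular expression:
    // '?' becomes "any single character", '*' becomes "any sequence".
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                // Quote everything else so it is matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True if the whole term matches the wildcard pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*的*", "你的名字"));
        System.out.println(matches("?的", "你的名字"));
    }
}
```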

  • Query String Query

{
    "query": {
        "query_string" : {
            "query" : "(new york city) OR (big apple)"
        }
    }
}

Parameter: Description
query: The actual query to be parsed. See Query string syntax.
default_field: The default field for query terms if no prefix field is specified. Defaults to the index.query.default_field index settings, which in turn defaults to _all.
default_operator: The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR.
analyzer: The analyzer name used to analyze the query string.
allow_leading_wildcard: When set, * or ? are allowed as the first character. Defaults to true.
lowercase_expanded_terms: Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to true.
enable_position_increments: Set to true to enable position increments in result queries. Defaults to true.
fuzzy_max_expansions: Controls the number of terms fuzzy queries will expand to. Defaults to 50.
fuzziness: Set the fuzziness for fuzzy queries. Defaults to AUTO. See Fuzziness for allowed settings.
fuzzy_prefix_length: Set the prefix length for fuzzy queries. Default is 0.
phrase_slop: Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is 0.
boost: Sets the boost value of the query. Defaults to 1.0.
analyze_wildcard: By default, wildcards terms in a query string are not analyzed. By setting this value to true, a best effort will be made to analyze those as well.
auto_generate_phrase_queries: Defaults to false.
max_determinized_states: Limit on how many automaton states regexp queries are allowed to create. This protects against too-difficult (e.g. exponentially hard) regexps. Defaults to 10000.
minimum_should_match: A value controlling how many "should" clauses in the resulting boolean query should match. It can be an absolute value (2), a percentage (30%) or a combination of both.
lenient: If set to true will cause format based failures (like providing text to a numeric field) to be ignored.
locale: Locale that should be used for string conversions. Defaults to ROOT.
time_zone: Time Zone to be applied to any range query related to dates. See also JODA timezone.

  • Compound Queries

  • Bool Query

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "releaseYear": "2014"
          }
        },
        {
          "match_phrase_prefix": {
            "name": "你的名字"
          }
        }
      ]
    }
  }
}

boolQuery is a compound query that combines other queries.

Occur: Description
must: The clause (query) must appear in matching documents and will contribute to the score.
filter: The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored.
should: The clause (query) should appear in the matching document. In a boolean query with no must or filter clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.
must_not: The clause (query) must not appear in the matching documents.
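The matching rules of the four occurrence types can be modelled in a few lines of plain Java. This is a sketch only: BoolSketch and its helpers are hypothetical, scoring is ignored (so must and filter behave identically here), and minimum_should_match is fixed to the default behaviour described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory model of bool query matching.
class BoolSketch {
    // must and filter clauses share this list: both are required and
    // differ only in scoring, which this sketch does not model.
    final List<Predicate<Map<String, String>>> must = new ArrayList<>();
    final List<Predicate<Map<String, String>>> should = new ArrayList<>();
    final List<Predicate<Map<String, String>>> mustNot = new ArrayList<>();

    boolean matches(Map<String, String> doc) {
        for (Predicate<Map<String, String>> clause : must) {
            if (!clause.test(doc)) return false;
        }
        for (Predicate<Map<String, String>> clause : mustNot) {
            if (clause.test(doc)) return false;
        }
        // Without must/filter clauses, at least one should clause must match.
        if (must.isEmpty() && !should.isEmpty()) {
            for (Predicate<Map<String, String>> clause : should) {
                if (clause.test(doc)) return true;
            }
            return false;
        }
        return true;
    }

    // Exact match on a field value, loosely modelled on a term query.
    static Predicate<Map<String, String>> term(String field, String value) {
        return d -> value.equals(d.get(field));
    }

    // Prefix match on a field value.
    static Predicate<Map<String, String>> prefix(String field, String value) {
        return d -> d.get(field) != null && d.get(field).startsWith(value);
    }

    // Builds a document from alternating field names and values.
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put(keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        BoolSketch query = new BoolSketch();
        query.should.add(term("releaseYear", "2014"));
        query.should.add(prefix("name", "你的名字"));
        System.out.println(query.matches(doc("releaseYear", "2016", "name", "你的名字。")));
    }
}
```

The main method mirrors the JSON example above: a document matches when either should clause holds, because the query has no must or filter clauses.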

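  • Java API

  • Connecting to an ES Cluster

TransportClient uses the transport module to connect remotely to an Elasticsearch cluster. It does not join the cluster; it simply takes one or more initial transport addresses and communicates with them in round-robin fashion.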

// on startup
Client client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("host2", 9300));

// on shutdown
client.close();
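The round-robin behaviour mentioned above can be pictured with a tiny selector. This is an illustration of the general idea only, not the TransportClient implementation; the class RoundRobinSketch and the address strings are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector over a fixed list of addresses.
class RoundRobinSketch {
    private final List<String> addresses;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(String... addresses) {
        if (addresses.length == 0) {
            throw new IllegalArgumentException("need at least one address");
        }
        this.addresses = Arrays.asList(addresses.clone());
    }

    // Returns the next address, wrapping around at the end of the list.
    // floorMod keeps the index valid even after the counter overflows.
    String next() {
        int index = Math.floorMod(counter.getAndIncrement(), addresses.size());
        return addresses.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch nodes = new RoundRobinSketch("host1:9300", "host2:9300");
        for (int i = 0; i < 4; i++) {
            System.out.println(nodes.next());
        }
    }
}
```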

Note that if your cluster name is not the default "elasticsearch", you have to set the cluster name:

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
//Add transport addresses and do something with the client...

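You can also configure this in the elasticsearch.yml file.

The client can sniff the rest of the cluster and add the nodes it discovers to its list of machines to use. To enable this feature, set client.transport.sniff to true.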

Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true).build();
TransportClient client = new TransportClient(settings);

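Other transport client settings include:

Parameter: Description
client.transport.ignore_cluster_name: Set to true to ignore cluster name validation of connected nodes.
client.transport.ping_timeout: The time to wait for a ping response from a node. Defaults to 5s.
client.transport.nodes_sampler_interval: How often to sample/ping the nodes. Defaults to 5s.

PS: close the client once you are done with it. In testing, repeatedly obtaining connections without closing them could lead to connection errors.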

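  • Getting a Document

The get API retrieves a typed JSON document from an index by its id, as in the following example:

GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .execute()
        .actionGet();

By default, operationThreaded is set to true, which means the operation is executed on a different thread. Here is an example that sets it to false: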

GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();
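  • Deleting a Document

The delete API deletes a typed JSON document from a specific index by its id.

By default, operationThreaded is set to true, which means the operation is executed on a different thread. Here is an example that sets it to false: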

DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();
  • Adding or Updating a Document

You can create an UpdateRequest and send it to the client:

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
client.update(updateRequest).get();

Alternatively, you can use the prepareUpdate method:

 client.prepareUpdate("ttl", "doc", "1")
        .setScript("ctx._source.gender = \"male\""  , ScriptService.ScriptType.INLINE)
        .get();

 client.prepareUpdate("ttl", "doc", "1")
        .setDoc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject())
        .get();

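The first call updates the document with a script; the second updates it by merging a partial doc.

The Java API also supports upsert. If the document does not exist yet, the content of the upsert element is used to index a new document.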

IndexRequest indexRequest = new IndexRequest("index", "type", "1")
        .source(jsonBuilder()
            .startObject()
                .field("name", "Joe Smith")
                .field("gender", "male")
            .endObject());
UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
        .doc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject())
        .upsert(indexRequest);
client.update(updateRequest).get();

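If the document index/type/1 already exists (here with the name "Joe Dalton"), the partial doc is merged into it, and after the update the document is: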

{
    "name"  : "Joe Dalton",
    "gender": "male"
}

Otherwise, the upsert document is indexed as-is, and the document is:

{
    "name" : "Joe Smith",
    "gender": "male"
}
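The two outcomes can be reproduced with an in-memory model of the update-with-upsert logic. This is a sketch, not the Elasticsearch implementation: UpsertSketch, its map-based store, and the fields helper are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of update with upsert.
class UpsertSketch {
    private final Map<String, Map<String, Object>> store = new HashMap<>();

    // Stores a full document under the given id.
    void index(String id, Map<String, Object> source) {
        store.put(id, new LinkedHashMap<>(source));
    }

    // If the id exists, merges the partial doc into the stored document;
    // otherwise stores the upsert document unchanged.
    Map<String, Object> update(String id, Map<String, Object> doc,
                               Map<String, Object> upsert) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            existing = new LinkedHashMap<>(upsert);
            store.put(id, existing);
        } else {
            existing.putAll(doc);
        }
        return existing;
    }

    // Builds a field map from alternating names and values.
    static Map<String, Object> fields(Object... keyValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(String.valueOf(keyValues[i]), keyValues[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        UpsertSketch es = new UpsertSketch();
        es.index("1", fields("name", "Joe Dalton"));
        System.out.println(es.update("1",
                fields("gender", "male"),
                fields("name", "Joe Smith", "gender", "male")));
    }
}
```

Note that in the missing-document case the partial doc is not applied on top of the upsert document; only the upsert content is stored.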
  • Bulk

The bulk API indexes and deletes several documents in a single request. Here is a usage example:

import static org.elasticsearch.common.xcontent.XContentFactory.*;

BulkRequestBuilder bulkRequest = client.prepareBulk();

// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "trying out Elasticsearch")
                    .endObject()
                  )
        );

bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "another post")
                    .endObject()
                  )
        );

BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}
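  • Search

The search API executes a search query and returns the hits that match it. It can be executed across one or more indices and across one or more types. The query can be provided using either the query Java API or the filter Java API. The body of the search request is built with SearchSourceBuilder.

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;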

SearchResponse response = client.prepareSearch("index1", "index2")
        .setTypes("type1", "type2")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.termQuery("multi", "test"))             // Query
        .setPostFilter(FilterBuilders.rangeFilter("age").from(12).to(18))   // Filter
        .setFrom(0).setSize(60).setExplain(true)
        .execute()
        .actionGet();

Note that all parameters are optional. Here is the shortest possible form:

// MatchAll on the whole cluster with all default options
SearchResponse response = client.prepareSearch().execute().actionGet();

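Using Scrolls in Java

import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;

QueryBuilder qb = termQuery("multi", "test");

// "test" is the name of the index to scroll over
SearchResponse scrollResp = client.prepareSearch("test")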
        .setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
while (true) {
    for (SearchHit hit : scrollResp.getHits()) {
        //Handle the hit...
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
    //Break condition: No hits are returned
    if (scrollResp.getHits().getHits().length == 0) {
        break;
    }
}

Multi Search API

SearchRequestBuilder srb1 = node.client()
    .prepareSearch().setQuery(QueryBuilders.queryString("elasticsearch")).setSize(1);
SearchRequestBuilder srb2 = node.client()
    .prepareSearch().setQuery(QueryBuilders.matchQuery("name", "kimchy")).setSize(1);

MultiSearchResponse sr = node.client().prepareMultiSearch()
        .add(srb1)
        .add(srb2)
        .execute().actionGet();

// You will get all individual responses from MultiSearchResponse#getResponses()
long nbHits = 0;
for (MultiSearchResponse.Item item : sr.getResponses()) {
    SearchResponse response = item.getResponse();
    nbHits += response.getHits().getTotalHits();
}

Using Aggregations

The following example shows how to add two aggregations to a search:

SearchResponse sr = node.client().prepareSearch()
    .setQuery(QueryBuilders.matchAllQuery())
    .addAggregation(
            AggregationBuilders.terms("agg1").field("field")
    )
    .addAggregation(
            AggregationBuilders.dateHistogram("agg2")
                    .field("birth")
                    .interval(DateHistogram.Interval.YEAR)
    )
    .execute().actionGet();

// Get your aggregation results
Terms agg1 = sr.getAggregations().get("agg1");
DateHistogram agg2 = sr.getAggregations().get("agg2");

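Using Search Templates

Define your template parameters as a Map:

Map<String, Object> template_params = new HashMap<>();
template_params.put("param_gender", "male");

You can use templates stored in the config/scripts directory. For example, suppose you have the following file, config/scripts/template_gender.mustache: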

{
    "template" : {
        "query" : {
            "match" : {
                "gender" : "{{param_gender}}"
            }
        }
    }
}

It can be executed like this:

SearchResponse sr = client.prepareSearch()
        .setTemplateName("template_gender")
        .setTemplateType(ScriptService.ScriptType.FILE)
        .setTemplateParams(template_params)
        .get();
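To make the role of template_params concrete, the following sketch fills a {{param}} placeholder in plain Java. It only imitates simple variable substitution: TemplateSketch and its helpers are hypothetical, no escaping is applied, and it is not the Mustache engine that Elasticsearch uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for simple {{name}} placeholder substitution.
class TemplateSketch {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    // Replaces every {{name}} with the matching parameter value;
    // unknown names are replaced with an empty string.
    static String render(String template, Map<String, Object> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = params.get(m.group(1));
            String replacement = (value == null) ? "" : value.toString();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Builds a parameter map from alternating names and values.
    static Map<String, Object> params(Object... keyValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(String.valueOf(keyValues[i]), keyValues[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        String template = "{\"match\": {\"gender\": \"{{param_gender}}\"}}";
        System.out.println(render(template, params("param_gender", "male")));
    }
}
```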

You can also store the template in a dedicated index named .scripts:

client.preparePutIndexedScript("mustache", "template_gender",
        "{\n" +
        "    \"template\" : {\n" +
        "        \"query\" : {\n" +
        "            \"match\" : {\n" +
        "                \"gender\" : \"{{param_gender}}\"\n" +
        "            }\n" +
        "        }\n" +
        "    }\n" +
        "}").get();

To use this indexed template, pass ScriptService.ScriptType.INDEXED:

SearchResponse sr = client.prepareSearch()
        .setTemplateName("template_gender")
        .setTemplateType(ScriptService.ScriptType.INDEXED)
        .setTemplateParams(template_params)
        .get();
  • Delete by Query

The delete by query API deletes documents from one or more indices and one or more types based on a query. Here is an example:

import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;

DeleteByQueryResponse response = client.prepareDeleteByQuery("test")
        .setQuery(termQuery("_type", "type1"))
        .execute()
        .actionGet();
