A query tends toward more accurate matching: Elasticsearch internally scores relevance and returns the content that best matches the search input, so queries are comparatively slow.
Elasticsearch ships with a default analyzer. term stands for an exact match: the search term is not passed through the analyzer, and the document must contain the whole term as stored. Before using term, you need to know whether the field is "analyzed"; string fields are analyzed by default.
For example, if you store "日志" (Chinese for "log"), the default analyzer splits the Chinese text into the separate tokens "日" and "志", so a term query for "日志" finds nothing. Likewise, "error log" contains a space, so what actually lands in ES is "error" and "log", and a term query for "error log" also finds nothing. The cause is the default mapping setting "index": "analyzed", which analyzes the field; set "index": "not_analyzed" to keep the field from being tokenized.
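A minimal sketch of creating such a mapping through the Java API, reusing the test_tx index and log type that appear in the examples below; the message field and the exact JSON are assumptions for illustration:

// Illustrative mapping: the "message" field is stored as a single, untokenized term.
// Index, type and field names here are assumptions, not taken from the original text.
client.admin().indices().prepareCreate("test_tx")
        .addMapping("log",
                "{ \"log\": { \"properties\": {"
              + "    \"message\": { \"type\": \"string\", \"index\": \"not_analyzed\" }"
              + "} } }")
        .execute().actionGet();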
With match, the query string itself is analyzed as well. If the stored string was analyzed and you still want to find it, you have to use a match query.
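For contrast, a sketch of the same search written once as a term query and once as a match query; the message field and the "error log" text follow the example above and are assumptions:

import org.elasticsearch.index.query.QueryBuilder;
import static org.elasticsearch.index.query.QueryBuilders.*;

// On an analyzed field the stored tokens are "error" and "log",
// so a term query for the whole phrase "error log" matches nothing ...
QueryBuilder exact = termQuery("message", "error log");

// ... while matchQuery analyzes the query string into the same tokens
// and therefore does find the document.
QueryBuilder analyzed = matchQuery("message", "error log");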
Constructing a query instance:
QueryBuilder qb = termQuery("_type", "log");
SearchResponse response = transportClient.prepareSearch("test_tx")
        .setQuery(qb)
        .setFrom(0).setSize(60).setExplain(true)
        .execute()
        .actionGet();
In prepareSearch(String str), str is the index name; calling prepareSearch() with no argument searches the entire cluster.
QueryBuilder qb = boolQuery()
        .must(termQuery("type", "typeValue"))
        .mustNot(termsQuery("logrank", logrank))
        .should(termsQuery("logsource", logsource));
must, mustNot, and should correspond to AND, NOT, and OR logic respectively.
SearchResponse response = client.prepareSearch()
        .setQuery(qb)
        .setFrom(0).setSize(60).setExplain(true)
        .execute()
        .actionGet();
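The hits in the SearchResponse can then be walked directly; a minimal sketch (what each document's source contains depends on your own mapping):

import java.util.Map;
import org.elasticsearch.search.SearchHit;

// Iterate over the returned documents and print id plus the original JSON source.
for (SearchHit hit : response.getHits().getHits()) {
    String id = hit.getId();
    Map<String, Object> source = hit.getSource();
    System.out.println(id + " -> " + source);
}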
Multiple filter conditions can be combined with andFilter():
FilterBuilders.andFilter(
        FilterBuilders.rangeFilter("age").from(1).to(100),
        FilterBuilders.prefixFilter("name", "Jack")
);
Similarly, multi-condition filtering can also be done with boolFilter():
FilterBuilder filterBuilder = FilterBuilders.boolFilter()
        .must(FilterBuilders.termFilter("name", "Jack"))
        .mustNot(FilterBuilders.rangeFilter("age").from(10).to(30))
        .should(FilterBuilders.termFilter("home", "hometown"));
Once the filter is built, pass it to Elasticsearch to do the filtering:
SearchResponse response = client.prepareSearch()
        .setFilter(filterBuilder)
        .execute().actionGet();
A count query only returns the number of matching documents:
QueryBuilder qb = boolQuery()
        .must(termQuery("logRank", logRank));
CountResponse response = client.prepareCount("test_tx")   // index name
        .setQuery(qb)
        .execute().actionGet();
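The matching count is then read straight off the CountResponse:

long logCount = response.getCount();   // total number of documents matching qb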
4.1 dateHistogram aggregation
DateHistogramBuilder dateAgg = AggregationBuilders.dateHistogram("dateAgg");   // aggregation name
dateAgg.field("time");                                  // field to aggregate on
dateAgg.interval(DateHistogram.Interval.YEAR);          // bucket interval
//dateAgg.interval(DateHistogram.Interval.QUARTER);
//dateAgg.interval(DateHistogram.Interval.MONTH);
//dateAgg.interval(DateHistogram.Interval.weeks(2));
//dateAgg.interval(DateHistogram.Interval.days(1));
//dateAgg.interval(DateHistogram.Interval.hours(2));
//dateAgg.interval(DateHistogram.Interval.minutes(5));
//dateAgg.interval(DateHistogram.Interval.seconds(10));
dateAgg
        .format("yyyy-MM-dd")                 // date format of the bucket keys
        .minDocCount(0)                       // keep empty buckets so dates with no documents still appear
        .extendedBounds(beginDate, endDate);  // force the full time range
Then run the aggregation in Elasticsearch:
SearchResponse response = client.prepareSearch()
        .addAggregation(dateAgg)
        .execute().actionGet();
Map<String, Long> logNumByDay = new HashMap<String, Long>();
DateHistogram aggByDay = response.getAggregations().get("dateAgg");   // get the resulting buckets
for (DateHistogram.Bucket logByDay : aggByDay.getBuckets()) {         // iterate over the buckets
    logNumByDay.put(logByDay.getKey(), logByDay.getDocCount());       // build a date -> count pair from each bucket
}
4.2 terms aggregation
public Map<String, Long> getTypesAggNum(String type, List<String> logRank, Date beginDate, Date endDate) {
    TermsBuilder typeAgg = AggregationBuilders
            .terms("typeAgg")        // aggregation name
            .field(type);            // field to aggregate on
    SearchResponse response = client.prepareSearch("testindex")
            .setQuery(boolQuery()
                    .must(termsQuery("logRank", logRank))
                    .must(rangeQuery("time").from(beginDate).to(endDate))
            )
            .addAggregation(typeAgg)
            .execute().actionGet();
    Map<String, Long> logNumByAgg = new LinkedHashMap<>();
    Terms aggByType = response.getAggregations().get("typeAgg");   // get the aggregation result
    for (Terms.Bucket logByType : aggByType.getBuckets()) {
        logNumByAgg.put(logByType.getKey(), logByType.getDocCount());
    }
    return logNumByAgg;
}
The code above queries for documents whose logRank field takes one of the values in the logRank list and whose time lies between beginDate and endDate, then aggregates the results on the field named by type. The result is a terms aggregation, which buckets documents by every distinct value of that field.
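A hypothetical call for illustration only; the logsource field is taken from the earlier bool query example, while the rank values are made up:

// Count logs per logsource for the given ranks and time window (illustrative values).
Map<String, Long> bySource = getTypesAggNum(
        "logsource",
        Arrays.asList("ERROR", "WARN"),   // assumed rank values, requires java.util.Arrays
        beginDate, endDate);
for (Map.Entry<String, Long> e : bySource.entrySet()) {
    System.out.println(e.getKey() + " : " + e.getValue());
}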