Elasticsearch常用查询及JAVA实现

介绍

Elasticsearch简称(ES) 是一个分布式的 RESTful 风格的搜索和数据分析引擎, 是一个高度可伸缩的开源全文搜索和分析引擎。它允许您快速和接近实时地存储、搜索和分析大量数据。
本文不讨论ES的搭建和实现原理,只介绍ES的常用查询和SQL的对应,以及部分JAVA代码的实现。

前提背景

假设你已经有一个搭建好的ES环境,主要工作是查询分析ES中的数据。
本文的示例是基于联通智慧客服对话记录的,联通热线客服和在线客服已经实现了智能语音人机对话,所有对话记录数据存储在ES中,为了更好的节约人工成本、提升服务质量,本文主要介绍如何查询和分析对话记录。

辅助工具

Postman,本文示例使用Postman调用,当然可以通过其他方式调用 REST API

常用查询

  • 查询所有索引

请求:

 http://127.0.0.1:9200/_cat/indices?v

响应:

health status index     uuid   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index_0   xxxx   5   0   10855755          813      8.7gb          8.7gb
  • 查询数据结构

请求:

 http://127.0.0.1:9200/index_0/_mapping

响应:(为了方便演示只列出了部分结构)

{
    "index_0": {
        "mappings": {
            "dialogue_history": {
                "_all": {
                    "enabled": false
                },
                "dynamic_templates": [
                    {
                        "nested_records": {
                            "match": "records",
                            "mapping": {
                                "type": "nested"
                            }
                        }
                    }
                ],
                "properties": {
                    "create_time": {
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd||epoch_millis"
                    },
                    "id": {
                        "type": "keyword"
                    },
                    "records": {
                        "type": "nested",
                        "properties": {
                            "intent_name": {
                                "type": "keyword"
                            },
                            "prev_intent_name": {
                                "type": "keyword"
                            },
                            "query_text": {
                                "type": "text"
                            },
                            "slot_fill_fail_num": {
                                "type": "long"
                            },
                            "slot_fill_num": {
                                "type": "long"
                            },
                            "slot_fill_success_num": {
                                "type": "long"
                            },
                            "global_variable_num": {
                                "type": "long"
                            },
                            "global_variable_success_num": {
                                "type": "long"
                            }
                        }
                    }
                }
            }
        }
    }
}

关系型数据库表示:
只用于和ES查询对比,没有实际意义

主表:dialogue_history:对话历史记录,一通电话一条记录,会有多条交互记录

描述
id 主键
create_time 创建时间

子表:records:交互记录,一句话一条记录

描述
id 主键
parent_id 父表id
intent_name 意图名称
prev_intent_name 前一意图名称
query_text 用户话述
slot_fill_num 槽位数量
slot_fill_success_num 槽位成功数量
slot_fill_fail_num 槽位失败数量
global_variable_num 全局变量数量
global_variable_success_num 全局变量成功数量

Elasticsearch常用查询及JAVA实现_第1张图片

  • 查询所有数据(分页+排序)
    说明:查询所有数据,按创建时间排序,取 0 到 10 条记录。

请求:(后续查询都使用该地址)

 http://127.0.0.1:9200/index_0/_search

DSL语句:

{
    "query": {
        "match_all": {}
    },
    "from": 0,
    "size": 10,
    "sort": { "create_time": { "order": "desc" } }
}

SQL对比:

 SELECT * FROM dialogue_history ORDER BY create_time DESC LIMIT 0,10  
  • 一级索引查询,单字段
    说明:按主键id查询对话记录

DSL语句:

{
    "query": {
        "match": {"id":"2-1612108799-5900994"}
    }
}

SQL对比:

 SELECT * FROM dialogue_history where id = '2-1612108799-5900994'

java代码(请求拼接):

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolBuilder = QueryBuilders.boolQuery();
boolBuilder.must(QueryBuilders.termsQuery("id", "2-1612108799-5900994"));
searchSourceBuilder.query(boolBuilder);
// RestHighLevelClient ES配置信息
RestHighLevelClient client;
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("index_0");
searchRequest.source(searchSourceBuilder);
SearchResponse searchHits= client.search(searchRequest);
  • 一级索引查询,按范围
    说明:按“创建时间”范围查询对话记录

DSL语句:

{
    "query": {
        "range": {
            "create_time": {
                "from": "2021-01-21 00:00:00",
                "to": "2021-01-22 00:00:00"
            }
        }
    }
}

SQL对比:

 SELECT * FROM dialogue_history 
 where create_time BETWEEN '2021-01-21 00:00:00' AND '2021-01-22 00:00:00'

java代码(请求拼接):

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolBuilder = QueryBuilders.boolQuery();
boolBuilder.must(QueryBuilders.rangeQuery("create_time").from(
                dateForm.format("2021-01-21 00:00:00")).to(dateForm.format("2021-01-22 00:00:00")));
searchSourceBuilder.query(boolBuilder);
  • 二级索引查询
    说明:按“用户话述”模糊查询

DSL语句:

{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "records",
                        "query": {
                            "bool": {
                                "must": [
                                    {
                                        "match": {
                                            "records.query_text": "欢迎语通用播报"
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            ]
        }
    }
} 

SQL对比:

 SELECT * FROM dialogue_history d 
 left outer join records r on d.id = records.parent_id 
 where records.query_text like '%欢迎语通用播报%'

java代码(请求拼接):

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolBuilder = QueryBuilders.boolQuery();
boolBuilder.must(QueryBuilders.nestedQuery("records",
                        QueryBuilders.boolQuery().must(QueryBuilders.matchPhraseQuery("records.query_text",
                                "欢迎语通用播报")), ScoreMode.Total));
searchSourceBuilder.query(boolBuilder);
  • 二级索引聚合查询
    说明:按“意图名称”聚合,对“槽位数量”、“槽位成功数量”、“槽位失败数量”求和。

DSL语句:

{
    "query": {
        "match_all": {}
    },
    "size": 0,
    "aggs": {
        "intent_group": {
            "nested": {
                "path": "records"
            },
            "aggs": {
                "intent_name": {
                    "terms": {
                        "field": "records.intent_name"
                    },
                    "aggs": {
                        "slot_fill_num": {
                            "sum": {
                                "field": "records.slot_fill_num"
                            }
                        },
                        "slot_fill_success_num": {
                            "sum": {
                                "field": "records.slot_fill_success_num"
                            }
                        },
                        "slot_fill_fail_num": {
                            "sum": {
                                "field": "records.slot_fill_fail_num"
                            }
                        }
                    }
                }
            }
        }
    }
}

SQL对比:

 SELECT records.intent_name intent_name,
	 count(records.intent_name) count,
	 sum(records.slot_fill_num) slot_fill_num,
	 sum(records.slot_fill_success_num) slot_fill_success_num,
	 sum(records.slot_fill_fail_num) slot_fill_fail_num
 FROM dialogue_history d 
 left outer join records  on d.id = records.parent_id 
 group by records.intent_name

java代码(请求拼接):

 // 按意图统计
TermsAggregationBuilder intentAggr
        = AggregationBuilders.terms("count").field("records.intent_name");
// 按槽位总数求和
SumAggregationBuilder slotFillNumAggr
        = AggregationBuilders.sum("slot_fill_num").field("records.slot_fill_num");
// 按槽位成功数求和
SumAggregationBuilder slotFillSuccessNumAggr
        = AggregationBuilders.sum("slot_fill_success_num").field("records.slot_fill_success_num");
// 按槽位失败数求和
SumAggregationBuilder slotFillFailNumAggr
        = AggregationBuilders.sum("slot_fill_fail_num").field("records.slot_fill_fail_num");

intentAggr.subAggregation(slotFillNumAggr)
        .subAggregation(slotFillSuccessNumAggr)
        .subAggregation(slotFillFailNumAggr)
        .size(Integer.MAX_VALUE);

NestedAggregationBuilder aggregationBuilder = AggregationBuilders
        .nested("records", "records").subAggregation(intentAggr);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.aggregation(aggregationBuilder);
// RestHighLevelClient ES配置信息
RestHighLevelClient client;
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("index_0");
searchRequest.source(searchSourceBuilder);
SearchResponse searchHits= client.search(searchRequest);

java代码(返回解析):

Map<String, Aggregation> detailsMap = searchHits.getAggregations().asMap();
try {
   ParsedNested details = (ParsedNested) detailsMap.get("records");
   Map<String, Aggregation> countMap = details.getAggregations().asMap();
   ParsedStringTerms count = (ParsedStringTerms) countMap.get("count");

   count.getBuckets().forEach(it -> {
       String intentName = (String) it.getKey();
       Long intentSum = it.getDocCount();
       Map<String, Aggregation> sumMap = it.getAggregations().asMap();
       ParsedSum slotFillNum = (ParsedSum) sumMap.get("slot_fill_num");
       ParsedSum slotFillSuccessNum = (ParsedSum) sumMap.get("slot_fill_success_num");
       ParsedSum slotFillFailNum = (ParsedSum) sumMap.get("slot_fill_fail_num");
       result.add(new StatisticIntentSlot(
               intentName,
               intentSum,
               slotFillNum.getValue(),
               slotFillSuccessNum.getValue(),
               slotFillFailNum.getValue()));
   });
} catch (Exception e) {
   LoggerHelper.err(getClass(), e.getMessage(), e);
}
  • 二级索引聚合查询+多列聚合+过滤
    说明:按“意图名称”、“前一意图名称”聚合,过滤掉没有“全局变量数量”的记录,对“全局变量数量”、“全局变量成功数量”求和。。

DSL语句:

{
    "query": {
        "match_all": {}
    },
    "size": 0,
    "aggregations": {
        "records": {
            "nested": {
                "path": "records"
            },
            "aggregations": {
                "global_variable": {
                    "filter": {
                        "exists": {
                            "field": "records.global_variable_num"
                        }
                    },
                    "aggregations": {
                        "prev_intent_name": {
                            "terms": {
                                "field": "records.prev_intent_name",
                                 "size": 10
                            },
                            "aggregations": {
                                "intent_name": {
                                    "terms": {
                                        "field": "records.intent_name",
                                        "size": 2147483647,
                                    },
                                    "aggregations": {
                                        "global_variable_num": {
                                            "sum": {
                                                "field": "records.global_variable_num"
                                            }
                                        },
                                        "global_variable_success_num": {
                                            "sum": {
                                                "field": "records.global_variable_success_num"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

SQL对比:

 SELECT records.prev_intent_name,
 	 records.intent_name,
	 count(records.prev_intent_name) count,
	 sum(records.global_variable_num) global_variable_num,
	 sum(records.global_variable_success_num) global_variable_success_num
 FROM dialogue_history d 
 left outer join records  on d.id = records.parent_id 
 group by
	records.prev_intent_name,
	records.intent_name
 HAVING records.global_variable_num is not null 

java代码(请求拼接):

 // 过滤聚合数据
FilterAggregationBuilder filterAggr
        = AggregationBuilders.filter("global_variable", QueryBuilders.existsQuery("records.global_variable_num"));
// 按上一意图统计
TermsAggregationBuilder prevIntentAggr
        = AggregationBuilders.terms("prev_intent_name").field("records.prev_intent_name");
// 按意图统计
TermsAggregationBuilder intentAggr
        = AggregationBuilders.terms("intent_name").field("records.intent_name");
// 按变量总数求和
SumAggregationBuilder slotFillNumAggr
        = AggregationBuilders.sum("global_variable_num").field("records.global_variable_num");
// 按变量成功数求和
SumAggregationBuilder slotFillSuccessNumAggr
        = AggregationBuilders.sum("global_variable_success_num").
        field("records.global_variable_success_num");

filterAggr.subAggregation(
        prevIntentAggr.subAggregation(
                intentAggr.subAggregation(slotFillNumAggr).subAggregation(slotFillSuccessNumAggr)
                        .size(Integer.MAX_VALUE)));

NestedAggregationBuilder aggregationBuilder = AggregationBuilders
        .nested("records", "records").subAggregation(filterAggr);

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.aggregation(aggregationBuilder);
// RestHighLevelClient ES配置信息
RestHighLevelClient client;
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("index_0");
searchRequest.source(searchSourceBuilder);
SearchResponse searchHits= client.search(searchRequest);

java代码(返回解析):

Map<String, Aggregation> detailsMap = searchHits.getAggregations().asMap();
try {
    ParsedNested records = (ParsedNested) detailsMap.get("records");
    Map<String, Aggregation> globalVariableMap = records.getAggregations().asMap();
    ParsedFilter globalVariableFilter = (ParsedFilter) globalVariableMap.get("global_variable");
    Map<String, Aggregation> prevIntentNameMap = globalVariableFilter.getAggregations().asMap();
    ParsedStringTerms prevIntentNameCount = (ParsedStringTerms) prevIntentNameMap.get("prev_intent_name");
    prevIntentNameCount.getBuckets().forEach(prevIntentNameCountIt -> {
        String prevIntentName = (String) prevIntentNameCountIt.getKey();
        Map<String, Aggregation> intentNameMap = prevIntentNameCountIt.getAggregations().asMap();
        ParsedStringTerms intentNameCount = (ParsedStringTerms) intentNameMap.get("intent_name");
        intentNameCount.getBuckets().forEach(it -> {
            String intentName = (String) it.getKey();
            Long intentSum = it.getDocCount();
            Map<String, Aggregation> sumMap = it.getAggregations().asMap();
            ParsedSum globalVariableNum = (ParsedSum) sumMap.get("global_variable_num");
            ParsedSum globalVariableSuccessNum = (ParsedSum) sumMap.get("global_variable_success_num");
            result.add(new StatisticGlobalIntentSlot(
                    intentName,
                    prevIntentName,
                    intentSum,
                    globalVariableNum.getValue(),
                    globalVariableSuccessNum.getValue()));
        });

    });
} catch (Exception e) {
    LoggerHelper.err(getClass(), e.getMessage(), e);
}

总结

  1. SQL能查的结构ES都能查,SQL查不了试试ES能不能查。
  2. 注意字段类型,节省索引空间。
  3. 一级索引、二级索引查询效率没问题,三级索引性能会下降。

后记

  1. 本文是基于业务提炼的ES查询示例,为用惯了SQL的同学提供参考,也为自己备忘。随着业务发展还会继续更新。
  2. 本文只摘录了部分查询DSL语句和JAVA代码,直接运行可能会有问题,有什么问题欢迎大家留言指正。

你可能感兴趣的:(elasticsearch,数据库,java)