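Elasticsearch (ES) is a distributed, RESTful search and analytics engine: a highly scalable, open-source full-text search and analytics engine. It lets you store, search, and analyze large volumes of data quickly and in near real time.
This article does not cover setting up ES or how it works internally. It only covers common ES queries, their SQL equivalents, and Java implementations for some of them.
It assumes you already have a working ES environment and that your main task is querying and analyzing the data stored in it.
The examples are based on dialogue records from China Unicom's intelligent customer service. Unicom's hotline and online customer service already support intelligent voice-based human-machine dialogue, and all dialogue records are stored in ES. To help reduce labor costs and improve service quality, this article focuses on how to query and analyze those records.
Postman: the examples in this article are invoked through Postman, although the REST API can of course be called in other ways.
Request: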
http://127.0.0.1:9200/_cat/indices?v
Response:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open index_0 xxxx 5 0 10855755 813 8.7gb 8.7gb
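Postman is not required for any of these calls. As a minimal sketch, assuming Java 11 or later (for `java.net.http`) and the local address used in this article, the same `_cat/indices` request could be issued from plain Java. The class name `CatIndicesExample` and its helper method are hypothetical, chosen only for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class; the name and structure are illustrative only.
class CatIndicesExample {

    // Builds a GET request for the _cat/indices endpoint; "?v" asks ES to include the header row.
    static HttpRequest catIndicesRequest(String baseUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/_cat/indices?v"))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = catIndicesRequest("http://127.0.0.1:9200");
        // Sends the request and prints the plain-text table returned by ES.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Running `main` needs a reachable ES node at the given address; building the request itself does not.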
Request:
http://127.0.0.1:9200/index_0/_mapping
Response (only part of the structure is shown, for ease of demonstration):
{
"index_0": {
"mappings": {
"dialogue_history": {
"_all": {
"enabled": false
},
"dynamic_templates": [
{
"nested_records": {
"match": "records",
"mapping": {
"type": "nested"
}
}
}
],
"properties": {
"create_time": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd||epoch_millis"
},
"id": {
"type": "keyword"
},
"records": {
"type": "nested",
"properties": {
"intent_name": {
"type": "keyword"
},
"prev_intent_name": {
"type": "keyword"
},
"query_text": {
"type": "text"
},
"slot_fill_fail_num": {
"type": "long"
},
"slot_fill_num": {
"type": "long"
},
"slot_fill_success_num": {
"type": "long"
},
"global_variable_num": {
"type": "long"
},
"global_variable_success_num": {
"type": "long"
}
}
}
}
}
}
}
}
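Relational database representation:
These tables are only for comparison with the ES queries; they do not actually exist.
Main table: dialogue_history: dialogue history, one record per call, each with multiple interaction records
Column | Description |
---|---|
id | Primary key |
create_time | Creation time |
Child table: records: interaction records, one record per utterance
Column | Description |
---|---|
id | Primary key |
parent_id | ID of the parent table row |
intent_name | Intent name |
prev_intent_name | Previous intent name |
query_text | User utterance |
slot_fill_num | Number of slots |
slot_fill_success_num | Number of successfully filled slots |
slot_fill_fail_num | Number of slots that failed to fill |
global_variable_num | Number of global variables |
global_variable_success_num | Number of successful global variables |
Request (all subsequent queries use this address):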
http://127.0.0.1:9200/index_0/_search
DSL:
{
"query": {
"match_all": {}
},
"from": 0,
"size": 10,
"sort": { "create_time": { "order": "desc" } }
}
SQL equivalent:
SELECT * FROM dialogue_history ORDER BY create_time DESC LIMIT 0,10
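The `from` and `size` fields play the role of the offset and row count in `LIMIT 0,10`. The sketch below shows one way to derive `from` from a 1-based page number and render the corresponding `match_all` body. `PageOffsets` and its methods are hypothetical helpers written for this article, not part of any Elasticsearch client.

```java
// Hypothetical helper; not part of any Elasticsearch client library.
class PageOffsets {

    // Converts a 1-based page number into the zero-based "from" offset.
    static int from(int page, int size) {
        if (page < 1 || size < 0) {
            throw new IllegalArgumentException("page must be >= 1 and size >= 0");
        }
        return Math.multiplyExact(page - 1, size);
    }

    // Renders the match_all DSL for the requested page, newest records first.
    static String matchAllDsl(int page, int size) {
        return String.format(
                "{\"query\":{\"match_all\":{}},\"from\":%d,\"size\":%d,"
                        + "\"sort\":{\"create_time\":{\"order\":\"desc\"}}}",
                from(page, size), size);
    }

    public static void main(String[] args) {
        System.out.println(matchAllDsl(1, 10)); // first page, same as the DSL above
        System.out.println(matchAllDsl(3, 10)); // third page
    }
}
```

Note that, by default, Elasticsearch rejects requests where `from + size` exceeds 10,000 (the `index.max_result_window` setting), so this style of paging suits shallow pages only.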
DSL:
{
"query": {
"match": {"id":"2-1612108799-5900994"}
}
}
SQL equivalent:
SELECT * FROM dialogue_history where id = '2-1612108799-5900994'
Java code (building the request):
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolBuilder = QueryBuilders.boolQuery();
// id is a keyword field and only one value is matched, so a single-value term query is enough
boolBuilder.must(QueryBuilders.termQuery("id", "2-1612108799-5900994"));
searchSourceBuilder.query(boolBuilder);
// RestHighLevelClient holding the ES connection settings (initialization omitted)
RestHighLevelClient client;
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("index_0");
searchRequest.source(searchSourceBuilder);
// Newer client versions (7.x) require client.search(searchRequest, RequestOptions.DEFAULT)
SearchResponse searchHits = client.search(searchRequest);
DSL:
{
"query": {
"range": {
"create_time": {
"from": "2021-01-21 00:00:00",
"to": "2021-01-22 00:00:00"
}
}
}
}
SQL equivalent:
SELECT * FROM dialogue_history
where create_time BETWEEN '2021-01-21 00:00:00' AND '2021-01-22 00:00:00'
Java code (building the request):
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolBuilder = QueryBuilders.boolQuery();
// The bounds are passed as strings in the "yyyy-MM-dd HH:mm:ss" format declared in the mapping
boolBuilder.must(QueryBuilders.rangeQuery("create_time")
.from("2021-01-21 00:00:00").to("2021-01-22 00:00:00"));
searchSourceBuilder.query(boolBuilder);
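The range bounds have to be strings in one of the formats listed for `create_time` in the mapping. When the bounds come from a date rather than a literal, they can be produced with `java.time`. The following is a small sketch under that assumption; `DayRange` and `bounds` are hypothetical names used only here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for producing range bounds; not part of the ES client.
class DayRange {

    // Matches the first pattern declared for create_time in the mapping.
    static final DateTimeFormatter ES_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns {start of the given day, start of the following day} as formatted strings.
    static String[] bounds(LocalDate day) {
        return new String[] {
                day.atStartOfDay().format(ES_FORMAT),
                day.plusDays(1).atStartOfDay().format(ES_FORMAT)
        };
    }

    public static void main(String[] args) {
        String[] b = bounds(LocalDate.of(2021, 1, 21));
        System.out.println(b[0] + " .. " + b[1]);
    }
}
```

The two strings can then be passed to `rangeQuery("create_time").from(...).to(...)` in place of the literals.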
DSL:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"match": {
"records.query_text": "欢迎语通用播报"
}
}
]
}
}
}
}
]
}
}
}
SQL equivalent:
SELECT * FROM dialogue_history d
left outer join records r on d.id = r.parent_id
where r.query_text like '%欢迎语通用播报%'
Java code (building the request):
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolBuilder = QueryBuilders.boolQuery();
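// matchPhraseQuery requires the terms to appear in order, so it is stricter than the match query in the DSL above
boolBuilder.must(QueryBuilders.nestedQuery("records",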
QueryBuilders.boolQuery().must(QueryBuilders.matchPhraseQuery("records.query_text",
"欢迎语通用播报")), ScoreMode.Total));
searchSourceBuilder.query(boolBuilder);
DSL:
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"intent_group": {
"nested": {
"path": "records"
},
"aggs": {
"intent_name": {
"terms": {
"field": "records.intent_name"
},
"aggs": {
"slot_fill_num": {
"sum": {
"field": "records.slot_fill_num"
}
},
"slot_fill_success_num": {
"sum": {
"field": "records.slot_fill_success_num"
}
},
"slot_fill_fail_num": {
"sum": {
"field": "records.slot_fill_fail_num"
}
}
}
}
}
}
}
}
SQL equivalent:
SELECT records.intent_name intent_name,
count(records.intent_name) count,
sum(records.slot_fill_num) slot_fill_num,
sum(records.slot_fill_success_num) slot_fill_success_num,
sum(records.slot_fill_fail_num) slot_fill_fail_num
FROM dialogue_history d
left outer join records on d.id = records.parent_id
group by records.intent_name
Java code (building the request):
// Bucket by intent name
TermsAggregationBuilder intentAggr
= AggregationBuilders.terms("count").field("records.intent_name");
// Sum of the slot counts
SumAggregationBuilder slotFillNumAggr
= AggregationBuilders.sum("slot_fill_num").field("records.slot_fill_num");
// Sum of the successfully filled slot counts
SumAggregationBuilder slotFillSuccessNumAggr
= AggregationBuilders.sum("slot_fill_success_num").field("records.slot_fill_success_num");
// Sum of the failed slot counts
SumAggregationBuilder slotFillFailNumAggr
= AggregationBuilders.sum("slot_fill_fail_num").field("records.slot_fill_fail_num");
intentAggr.subAggregation(slotFillNumAggr)
.subAggregation(slotFillSuccessNumAggr)
.subAggregation(slotFillFailNumAggr)
.size(Integer.MAX_VALUE);
NestedAggregationBuilder aggregationBuilder = AggregationBuilders
.nested("records", "records").subAggregation(intentAggr);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.aggregation(aggregationBuilder);
// RestHighLevelClient holding the ES connection settings (initialization omitted)
RestHighLevelClient client;
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("index_0");
searchRequest.source(searchSourceBuilder);
SearchResponse searchHits= client.search(searchRequest);
Java code (parsing the response):
Map<String, Aggregation> detailsMap = searchHits.getAggregations().asMap();
try {
ParsedNested details = (ParsedNested) detailsMap.get("records");
Map<String, Aggregation> countMap = details.getAggregations().asMap();
ParsedStringTerms count = (ParsedStringTerms) countMap.get("count");
count.getBuckets().forEach(it -> {
String intentName = (String) it.getKey();
Long intentSum = it.getDocCount();
Map<String, Aggregation> sumMap = it.getAggregations().asMap();
ParsedSum slotFillNum = (ParsedSum) sumMap.get("slot_fill_num");
ParsedSum slotFillSuccessNum = (ParsedSum) sumMap.get("slot_fill_success_num");
ParsedSum slotFillFailNum = (ParsedSum) sumMap.get("slot_fill_fail_num");
result.add(new StatisticIntentSlot(
intentName,
intentSum,
slotFillNum.getValue(),
slotFillSuccessNum.getValue(),
slotFillFailNum.getValue()));
});
} catch (Exception e) {
LoggerHelper.err(getClass(), e.getMessage(), e);
}
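To make the semantics of the nested `terms` plus `sum` aggregation concrete, the sketch below computes the same per-intent count and sums over an in-memory list, which is what the `GROUP BY` in the SQL equivalent expresses. `IntentSlotStats`, `Rec`, and the sample intent names are hypothetical stand-ins, not types or data from the project.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the aggregation; for illustration only.
class IntentSlotStats {

    // Stand-in for one nested "records" entry.
    static final class Rec {
        final String intentName;
        final long slotFillNum;
        final long slotFillSuccessNum;

        Rec(String intentName, long slotFillNum, long slotFillSuccessNum) {
            this.intentName = intentName;
            this.slotFillNum = slotFillNum;
            this.slotFillSuccessNum = slotFillSuccessNum;
        }
    }

    // Returns intent name -> {doc count, sum(slot_fill_num), sum(slot_fill_success_num)}.
    // Keys are sorted by name here; ES orders terms buckets by doc count by default.
    static Map<String, long[]> aggregate(List<Rec> records) {
        Map<String, long[]> out = new TreeMap<>();
        for (Rec r : records) {
            long[] acc = out.computeIfAbsent(r.intentName, k -> new long[3]);
            acc[0] += 1;                    // doc_count of the terms bucket
            acc[1] += r.slotFillNum;        // sum sub-aggregation
            acc[2] += r.slotFillSuccessNum; // sum sub-aggregation
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> sample = List.of(
                new Rec("query_balance", 2, 1),
                new Rec("query_balance", 3, 3),
                new Rec("query_bill", 1, 0));
        aggregate(sample).forEach((intent, v) ->
                System.out.println(intent + " count=" + v[0] + " slots=" + v[1] + " ok=" + v[2]));
    }
}
```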
DSL:
{
"query": {
"match_all": {}
},
"size": 0,
"aggregations": {
"records": {
"nested": {
"path": "records"
},
"aggregations": {
"global_variable": {
"filter": {
"exists": {
"field": "records.global_variable_num"
}
},
"aggregations": {
"prev_intent_name": {
"terms": {
"field": "records.prev_intent_name",
"size": 10
},
"aggregations": {
"intent_name": {
"terms": {
"field": "records.intent_name",
"size": 2147483647
},
"aggregations": {
"global_variable_num": {
"sum": {
"field": "records.global_variable_num"
}
},
"global_variable_success_num": {
"sum": {
"field": "records.global_variable_success_num"
}
}
}
}
}
}
}
}
}
}
}
}
SQL equivalent:
SELECT records.prev_intent_name,
records.intent_name,
count(records.prev_intent_name) count,
sum(records.global_variable_num) global_variable_num,
sum(records.global_variable_success_num) global_variable_success_num
FROM dialogue_history d
left outer join records on d.id = records.parent_id
WHERE records.global_variable_num is not null
group by
records.prev_intent_name,
records.intent_name
Java code (building the request):
// Filter the data to aggregate
FilterAggregationBuilder filterAggr
= AggregationBuilders.filter("global_variable", QueryBuilders.existsQuery("records.global_variable_num"));
// Bucket by previous intent name
TermsAggregationBuilder prevIntentAggr
= AggregationBuilders.terms("prev_intent_name").field("records.prev_intent_name");
// Bucket by intent name
TermsAggregationBuilder intentAggr
= AggregationBuilders.terms("intent_name").field("records.intent_name");
// Sum of the global variable counts
SumAggregationBuilder slotFillNumAggr
= AggregationBuilders.sum("global_variable_num").field("records.global_variable_num");
// Sum of the successful global variable counts
SumAggregationBuilder slotFillSuccessNumAggr
= AggregationBuilders.sum("global_variable_success_num").
field("records.global_variable_success_num");
filterAggr.subAggregation(
prevIntentAggr.subAggregation(
intentAggr.subAggregation(slotFillNumAggr).subAggregation(slotFillSuccessNumAggr)
.size(Integer.MAX_VALUE)));
NestedAggregationBuilder aggregationBuilder = AggregationBuilders
.nested("records", "records").subAggregation(filterAggr);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.aggregation(aggregationBuilder);
// RestHighLevelClient holding the ES connection settings (initialization omitted)
RestHighLevelClient client;
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("index_0");
searchRequest.source(searchSourceBuilder);
SearchResponse searchHits= client.search(searchRequest);
Java code (parsing the response):
Map<String, Aggregation> detailsMap = searchHits.getAggregations().asMap();
try {
ParsedNested records = (ParsedNested) detailsMap.get("records");
Map<String, Aggregation> globalVariableMap = records.getAggregations().asMap();
ParsedFilter globalVariableFilter = (ParsedFilter) globalVariableMap.get("global_variable");
Map<String, Aggregation> prevIntentNameMap = globalVariableFilter.getAggregations().asMap();
ParsedStringTerms prevIntentNameCount = (ParsedStringTerms) prevIntentNameMap.get("prev_intent_name");
prevIntentNameCount.getBuckets().forEach(prevIntentNameCountIt -> {
String prevIntentName = (String) prevIntentNameCountIt.getKey();
Map<String, Aggregation> intentNameMap = prevIntentNameCountIt.getAggregations().asMap();
ParsedStringTerms intentNameCount = (ParsedStringTerms) intentNameMap.get("intent_name");
intentNameCount.getBuckets().forEach(it -> {
String intentName = (String) it.getKey();
Long intentSum = it.getDocCount();
Map<String, Aggregation> sumMap = it.getAggregations().asMap();
ParsedSum globalVariableNum = (ParsedSum) sumMap.get("global_variable_num");
ParsedSum globalVariableSuccessNum = (ParsedSum) sumMap.get("global_variable_success_num");
result.add(new StatisticGlobalIntentSlot(
intentName,
prevIntentName,
intentSum,
globalVariableNum.getValue(),
globalVariableSuccessNum.getValue()));
});
});
} catch (Exception e) {
LoggerHelper.err(getClass(), e.getMessage(), e);
}
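The `exists` filter in this aggregation is applied to each nested record before any bucketing happens, which corresponds to a row-level `IS NOT NULL` condition rather than a condition on the groups. The sketch below mirrors that order of operations in memory: filter first, then group by previous intent and intent. `GlobalVariableStats`, `Rec`, and the sample values are hypothetical and exist only for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the filter + two-level terms aggregation.
class GlobalVariableStats {

    // Stand-in for one nested "records" entry; a null count models a missing field.
    static final class Rec {
        final String prevIntentName;
        final String intentName;
        final Long globalVariableNum;

        Rec(String prevIntentName, String intentName, Long globalVariableNum) {
            this.prevIntentName = prevIntentName;
            this.intentName = intentName;
            this.globalVariableNum = globalVariableNum;
        }
    }

    // Returns prev intent -> intent -> sum(global_variable_num), skipping records without the field.
    static Map<String, Map<String, Long>> sumByIntentPair(List<Rec> records) {
        Map<String, Map<String, Long>> out = new TreeMap<>();
        for (Rec r : records) {
            if (r.globalVariableNum == null) {
                continue; // the "exists" filter: drop the record before grouping
            }
            out.computeIfAbsent(r.prevIntentName, k -> new TreeMap<>())
               .merge(r.intentName, r.globalVariableNum, Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> sample = List.of(
                new Rec("welcome", "query_balance", 2L),
                new Rec("welcome", "query_balance", 3L),
                new Rec("welcome", "query_bill", null));
        System.out.println(sumByIntentPair(sample));
    }
}
```

Because the record with the missing field is dropped before grouping, its intent pair never produces a bucket at all, just as it would not appear in the filtered aggregation result.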