When we aggregate on a text-type field in Elasticsearch using aggs, Elasticsearch tokenizes the field and aggregates the resulting terms, returning for each term the number of documents it appears in. Note: doc_count is a count of documents, not of occurrences. Even if a term appears several times within one document, that document contributes only 1 to the term's doc_count.
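To make the doc_count semantics concrete, here is a minimal Python sketch (with made-up tokenized documents) of counting documents per term rather than occurrences per term:

```python
from collections import Counter

# Hypothetical tokenized documents (assumed data, not from a real index).
docs = [
    ["一下", "一下", "ok"],  # "一下" occurs twice in this document
    ["一下", "一个"],
    ["ok"],
]

# doc_count = number of documents containing the term,
# not the total number of occurrences across documents.
doc_count = Counter()
for tokens in docs:
    doc_count.update(set(tokens))  # dedupe within each document

print(doc_count["一下"])  # → 2, even though it occurs 3 times in total
```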
Mapping template for a tokenizable text field (both the analyzer and fielddata properties must be set):
"allContent": {
"type": "text",
"analyzer": "ik_smart",
"fielddata": true
}
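Embedded in a full mapping request, this might look like the following sketch. The index name voice-1 is an assumption; the transData parent field matches the query below; a typeless (7.x-style) mapping is assumed, and ik_smart requires the IK analysis plugin to be installed:

```json
PUT voice-1
{
  "mappings": {
    "properties": {
      "transData": {
        "properties": {
          "allContent": {
            "type": "text",
            "analyzer": "ik_smart",
            "fielddata": true
          }
        }
      }
    }
  }
}
```

Note that "fielddata": true loads the field's terms into heap memory when the field is first aggregated on, which can be expensive for large indices, so enable it deliberately.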
Example query:
GET voice*/_search
{
  "_source": "transData.allContent",
  "query": {
    "match_all": {}
  },
  "aggs": {
    "hotword": {
      "terms": {
        "field": "transData.allContent",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  },
  "size": 0
}
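From application code, the hot words come back under aggregations.hotword.buckets. A small Python sketch of pulling them out, using a stubbed response dict in place of a live client call:

```python
# Stubbed response (abridged); a real one would come from a search call.
response = {
    "aggregations": {
        "hotword": {
            "buckets": [
                {"key": "ok", "doc_count": 119},
                {"key": "一", "doc_count": 123},
            ]
        }
    }
}

# Each bucket pairs a term with the number of documents containing it.
hotwords = [
    (b["key"], b["doc_count"])
    for b in response["aggregations"]["hotword"]["buckets"]
]
print(hotwords)  # → [('ok', 119), ('一', 123)]
```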
Example response:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "hotword": {
      "doc_count_error_upper_bound": 1,
      "sum_other_doc_count": 314,
      "buckets": [
        {
          "key": "ok",
          "doc_count": 119
        },
        {
          "key": "一",
          "doc_count": 123
        },
        {
          "key": "一下",
          "doc_count": 114
        },
        {
          "key": "一个",
          "doc_count": 91
        },
        {
          "key": "一个月",
          "doc_count": 52
        },
        {
          "key": "一些",
          "doc_count": 23
        },
        {
          "key": "一包",
          "doc_count": 13
        },
        {
          "key": "一块",
          "doc_count": 11
        },
        {
          "key": "一天",
          "doc_count": 4
        },
        {
          "key": "一定",
          "doc_count": 2
        }
      ]
    }
  }
}
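The doc_count_error_upper_bound field above signals that the counts are approximate: each shard returns only its local top terms, and the coordinating node merges those partial lists, so a term that narrowly misses one shard's top list is undercounted in the merged result. That is how "ok" (119) can end up ranked above "一" (123) in the response. A toy Python sketch of this merge behavior, with made-up per-shard counts (this simplifies what Elasticsearch actually does, e.g. it ignores shard_size):

```python
from collections import Counter

# Hypothetical per-shard term -> doc_count tables (assumed data).
shard1 = Counter({"一": 70, "ok": 60, "一下": 55})
shard2 = Counter({"ok": 59, "一下": 59, "一": 53})

size = 2  # each shard reports only its local top-2 terms
merged = Counter()
for shard in (shard1, shard2):
    merged.update(dict(shard.most_common(size)))

# "一" is in shard1's top-2 but not shard2's, so its merged count (70)
# undercounts the true total (70 + 53 = 123), and "ok" (119) wrongly
# ranks first even though its count happens to be exact.
print(merged.most_common())
```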