Elasticsearch-sql针对groupby 聚合查询limit问题

在使用Elasticsearch-sql插件查询ES中,我们经常遇到多个字段group by聚合查询,例如:

select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num    
from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28,
ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31    
where appCode = '5f05acfc9a084d9f9a07e165a2516c18' and logTime>= '2018-07-07T09:57:15.436Z' and logTime<= '2018-08-07T09:57:15.436Z'    
group by pageRef,province,city,ip limit 100

解析后:

{
    "from": 0,
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "bool": {
                        "must": [
                            {
                                "bool": {
                                    "must": [
                                        {
                                            "match_phrase": {
                                                "appCode": {
                                                    "query": "5f05acfc9a084d9f9a07e165a2516c18",
                                                    "slop": 0,
                                                    "boost": 1
                                                }
                                            }
                                        },
                                        {
                                            "range": {
                                                "logTime": {
                                                    "from": "2018-07-07T09:57:15.436Z",
                                                    "to": null,
                                                    "include_lower": true,
                                                    "include_upper": true,
                                                    "boost": 1
                                                }
                                            }
                                        },
                                        {
                                            "range": {
                                                "logTime": {
                                                    "from": null,
                                                    "to": "2018-08-07T09:57:15.436Z",
                                                    "include_lower": true,
                                                    "include_upper": true,
                                                    "boost": 1
                                                }
                                            }
                                        }
                                    ],
                                    "disable_coord": false,
                                    "adjust_pure_negative": true,
                                    "boost": 1
                                }
                            }
                        ],
                        "disable_coord": false,
                        "adjust_pure_negative": true,
                        "boost": 1
                    }
                }
            ],
            "disable_coord": false,
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "_source": {
        "includes": [
            "SUM"
        ],
        "excludes": []
    },
    "aggregations": {
        "pageRef": {
            "terms": {
                "field": "pageRef",
                "size": 100,
                "shard_size": 2000,
                "min_doc_count": 1,
                "shard_min_doc_count": 0,
                "show_term_doc_count_error": false,
                "order": [
                    {
                        "_count": "desc"
                    },
                    {
                        "_term": "asc"
                    }
                ]
            },
            "aggregations": {
                "province": {
                    "terms": {
                        "field": "province",
                        "size": 10,
                        "min_doc_count": 1,
                        "shard_min_doc_count": 0,
                        "show_term_doc_count_error": false,
                        "order": [
                            {
                                "_count": "desc"
                            },
                            {
                                "_term": "asc"
                            }
                        ]
                    },
                    "aggregations": {
                        "city": {
                            "terms": {
                                "field": "city",
                                "size": 10,
                                "min_doc_count": 1,
                                "shard_min_doc_count": 0,
                                "show_term_doc_count_error": false,
                                "order": [
                                    {
                                        "_count": "desc"
                                    },
                                    {
                                        "_term": "asc"
                                    }
                                ]
                            },
                            "aggregations": {
                                "ip": {
                                    "terms": {
                                        "field": "ip",
                                        "size": 10,
                                        "min_doc_count": 1,
                                        "shard_min_doc_count": 0,
                                        "show_term_doc_count_error": false,
                                        "order": [
                                            {
                                                "_count": "desc"
                                            },
                                            {
                                                "_term": "asc"
                                            }
                                        ]
                                    },
                                    "aggregations": {
                                        "num": {
                                            "sum": {
                                                "field": "errorCount"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

我们看到解析后的json看到:limit 15中的15只对group by 后面的第一个字段起作用,其他的字段size其实都是10,limit并没起作用,这就是Elasticsearch-sql针对group by存在的问题。

解决方式为:使用terms(field='correspond_brand_name',size='10',alias='correspond_brand_name',include='\".*sport.*\"',exclude='\"water_.*\"')")

注意:这种方式不再添加limit关键词,另外还要注意group by后面字段的顺序不一样,因为数据的情况,查询结果条数不一样,但是整体是没有问题的。

select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num    
from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28,
ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31    
where appCode = '5f05acfc9a084d9f9a07e165a2516c18' and logTime>= '2018-07-07T09:57:15.436Z' and logTime<= '2018-08-07T09:57:15.436Z'    
group by terms(field='pageRef',size='15',alias='pageRef'),
terms(field='province',size='15',alias='province'),
terms(field='city',size='15',alias='city'),
terms(field='ip',size='15',alias='ip')

解析后:

{
    "from": 0,
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "bool": {
                        "must": [
                            {
                                "bool": {
                                    "must": [
                                        {
                                            "match_phrase": {
                                                "appCode": {
                                                    "query": "5f05acfc9a084d9f9a07e165a2516c18",
                                                    "slop": 0,
                                                    "boost": 1
                                                }
                                            }
                                        },
                                        {
                                            "range": {
                                                "logTime": {
                                                    "from": "2018-07-07T09:57:15.436Z",
                                                    "to": null,
                                                    "include_lower": true,
                                                    "include_upper": true,
                                                    "boost": 1
                                                }
                                            }
                                        },
                                        {
                                            "range": {
                                                "logTime": {
                                                    "from": null,
                                                    "to": "2018-08-07T09:57:15.436Z",
                                                    "include_lower": true,
                                                    "include_upper": true,
                                                    "boost": 1
                                                }
                                            }
                                        }
                                    ],
                                    "disable_coord": false,
                                    "adjust_pure_negative": true,
                                    "boost": 1
                                }
                            }
                        ],
                        "disable_coord": false,
                        "adjust_pure_negative": true,
                        "boost": 1
                    }
                }
            ],
            "disable_coord": false,
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "_source": {
        "includes": [
            "SUM"
        ],
        "excludes": []
    },
    "aggregations": {
        "pageRef": {
            "terms": {
                "field": "pageRef",
                "size": 15,
                "min_doc_count": 1,
                "shard_min_doc_count": 0,
                "show_term_doc_count_error": false,
                "order": [
                    {
                        "_count": "desc"
                    },
                    {
                        "_term": "asc"
                    }
                ]
            },
            "aggregations": {
                "province": {
                    "terms": {
                        "field": "province",
                        "size": 15,
                        "min_doc_count": 1,
                        "shard_min_doc_count": 0,
                        "show_term_doc_count_error": false,
                        "order": [
                            {
                                "_count": "desc"
                            },
                            {
                                "_term": "asc"
                            }
                        ]
                    },
                    "aggregations": {
                        "city": {
                            "terms": {
                                "field": "city",
                                "size": 15,
                                "min_doc_count": 1,
                                "shard_min_doc_count": 0,
                                "show_term_doc_count_error": false,
                                "order": [
                                    {
                                        "_count": "desc"
                                    },
                                    {
                                        "_term": "asc"
                                    }
                                ]
                            },
                            "aggregations": {
                                "ip": {
                                    "terms": {
                                        "field": "ip",
                                        "size": 15,
                                        "min_doc_count": 1,
                                        "shard_min_doc_count": 0,
                                        "show_term_doc_count_error": false,
                                        "order": [
                                            {
                                                "_count": "desc"
                                            },
                                            {
                                                "_term": "asc"
                                            }
                                        ]
                                    },
                                    "aggregations": {
                                        "num": {
                                            "sum": {
                                                "field": "errorCount"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

从解析后的内容看出:四个字段的size都是15了,可以使用postman查询看看,结果是正确的。

Elasticsearch-sql针对groupby 聚合查询limit问题_第1张图片

Elasticsearch-sql针对groupby 聚合查询limit问题_第2张图片

 语法来自;https://github.com/NLPchina/elasticsearch-sql中的terms用法。

补充:如果是nested(嵌套查询),比如:

Elasticsearch-sql针对groupby 聚合查询limit问题_第3张图片

select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num   
from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28,ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31   
where appCode = '5f05acfc9a084d9f9a07e165a2516c18'        
and logTime>= '2018-07-08T06:20:13.144Z'    
and logTime<= '2018-08-08T06:20:13.144Z'
group by pageRef,province,city,ip,nested(errors.message) limit 10

那么需要这么来查:

select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num   
from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28,ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31   
where appCode = '5f05acfc9a084d9f9a07e165a2516c18'        
and logTime>= '2018-07-08T06:20:13.144Z'    
and logTime<= '2018-08-08T06:20:13.144Z'
group by terms(field='pageRef',size='15',alias='pageRef'),
terms(field='province',size='1',alias='province'),
terms(field='city',size='2',alias='city'),
terms(field='ip',size='3',alias='ip'),
terms(field='errors.message',size='4',alias='errors.message',nested="errors")

对于聚合查询可以参考官方文档:https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations.html

本文版权归作者所有,欢迎转载,请在文章页面明显位置给出原文连接:https://www.cnblogs.com/wynjauu/articles/9439089.html

作者:敲完代码好睡觉

转载于:https://www.cnblogs.com/wynjauu/articles/9439089.html

你可能感兴趣的:(大数据,c/c++,postman)