1. Use term instead of match_phrase whenever you can
The Lucene nightly benchmarks show that a simple term query is about 10 times as fast as a phrase query, and about 20 times as fast as a proximity query (a phrase query with slop).
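In short, a term query is roughly 10 times faster than match_phrase, and roughly 20 times faster than match_phrase with slop. Keep in mind that term does not analyze the search text, so the swap is only safe when the text is a single token that matches the indexed term exactly.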
GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": { "title": "quick" }
  }
}

becomes

GET /my_index/my_type/_search
{
  "query": {
    "term": { "title": "quick" }
  }
}
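As a rough illustration of when the swap applies, the sketch below picks term for single-token input and falls back to match_phrase otherwise. The helper name build_title_query is hypothetical, and the code assumes the field's analyzer lowercases text and splits on whitespace, which may not hold for your mapping.

```python
def build_title_query(text, field="title"):
    """Return a term query for single-token input, match_phrase otherwise."""
    # Assumption: the field's analyzer lowercases and splits on whitespace.
    tokens = text.lower().split()
    if len(tokens) == 1:
        # term is not analyzed, so pass the already-normalized token.
        return {"query": {"term": {field: tokens[0]}}}
    # Multi-word input still needs a phrase query.
    return {"query": {"match_phrase": {field: text}}}
```

The returned dict can be sent as the body of a _search request with whatever client you already use.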
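2. If a condition has nothing to do with how documents are ranked, always put it in a filter: it skips score computation, and its result can be cached to speed up later queries.
For example, to find yellow Ford cars whose name contains dev, a typical query looks like this: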
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "type": "ford" } },
        { "term": { "color": "yellow" } },
        { "term": { "name": "dev" } }
      ]
    }
  }
}
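In the query above, type and color also take part in computing the relevance score. But they only act as filter conditions; the score should depend solely on how well name matches. The query above is therefore both poorly structured and inefficient. Moving the two conditions into filter gives: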
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": { "term": { "name": "dev" } },
      "filter": [
        { "term": { "type": "ford" } },
        { "term": { "color": "yellow" } }
      ]
    }
  }
}
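If requests are assembled in application code, the same split can be made explicit there. The following is a minimal sketch; build_bool_query is a hypothetical helper, not part of any Elasticsearch client, and it only handles exact term conditions.

```python
def build_bool_query(scoring, filtering):
    """Build a bool query from two dicts of field -> exact term.

    scoring:   conditions that should influence the relevance score
    filtering: yes/no conditions that should not influence the score
    """
    must = [{"term": {field: value}} for field, value in scoring.items()]
    filt = [{"term": {field: value}} for field, value in filtering.items()]
    return {"query": {"bool": {"must": must, "filter": filt}}}


# Same request as above: name is scored, type and color only filter.
body = build_bool_query({"name": "dev"}, {"type": "ford", "color": "yellow"})
```

Keeping the two kinds of condition in separate arguments makes it harder to put a pure filter into must by accident.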
3. If you have no requirement on the order of the results, sort by _doc; documents are then returned in index order.
_doc has no real use-case besides being the most efficient sort order. So if you don’t care about the order in which documents are returned, then you should sort by _doc. This especially helps when scrolling. Use _doc to sort by index order.
GET /my_index/my_type/_search
{
  "query": {
    "term": { "name": "dev" }
  },
  "sort": [ "_doc" ]
}
4. Fetching n random documents (n >= 10000)
1) Use the random_score function that ships with ES. Drawbacks: it is slow and uses a lot of memory.
GET /my_index/my_type/_search
{
  "size": 10000,
  "query": {
    "function_score": {
      "query": {
        "term": { "name": "dev" }
      },
      "random_score": { }
    }
  }
}
2) Use a script-based sort. It uses somewhat less memory than random_score, but it is slower.
GET /my_index/my_type/_search
{
  "query": {
    "term": { "name": "dev" }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "inline": "Math.random()"
      },
      "order": "asc"
    }
  }
}
3) When indexing, add one extra field, mark, whose value is generated randomly. At query time, simply sort on that field.
GET /my_index/my_type/_search
{
  "query": {
    "term": { "name": "dev" }
  },
  "sort": [ "mark" ]
}
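The indexing side of option 3) could look roughly like the sketch below. The helper with_random_mark is hypothetical, and storing a float in [0, 1) is an assumption; any randomly distributed sortable value would serve the same purpose.

```python
import random


def with_random_mark(doc, rng=random):
    """Return a copy of doc with an extra, randomly valued 'mark' field."""
    marked = dict(doc)            # leave the caller's dict untouched
    marked["mark"] = rng.random() # float in [0.0, 1.0)
    return marked


# A seeded generator is used here only to make the example repeatable.
doc = with_random_mark({"name": "dev"}, random.Random(42))
```

Because mark is fixed at index time, sorting on it returns the same order on every request. That is what makes it cheap, but it also means the sample only changes if the values are regenerated.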
5. Range aggregations take too long
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 10, "to": 50 },
          { "from": 50, "to": 70 },
          { "from": 70, "to": 100 }
        ]
      }
    }
  }
}
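As the example shows, we aggregate over three intervals: [10,50), [50,70) and [70,100). Because every document's value has to be compared against the range bounds, this can be slow on large data sets.
Solution: at index time, write the interval each document falls into to the index as a keyword field; at query time, run a terms aggregation on that field.
Suppose every price is below 100, the extra field is called mark, and its value is one of 10-50, 50-70 or 70-100.

{
  "aggs": {
    "genres": {
      "terms": { "field": "mark" }
    }
  }
}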
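Computing the label at index time might be done along these lines. This is a sketch only: BUCKETS and price_bucket are illustrative names, and returning None for prices outside every interval is an assumption about how such documents should be handled.

```python
# Half-open intervals [lo, hi), matching the range aggregation above.
BUCKETS = [(10, 50), (50, 70), (70, 100)]


def price_bucket(price):
    """Return the keyword label for price, or None if no interval matches."""
    for lo, hi in BUCKETS:
        if lo <= price < hi:
            return f"{lo}-{hi}"
    return None


# Attach the label before the document is sent to the index.
doc = {"price": 55}
doc["mark"] = price_bucket(doc["price"])
```

The trade-off is that the intervals are baked into the data: changing the bucket boundaries later means reindexing or updating the mark field.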
6. Querying for empty strings
If you only need to know whether a field exists or is missing, the Exists Query is enough (exists, or must_not exists).
GET /_search
{
  "query": {
    "exists": { "field": "user" }
  }
}

GET /_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": { "field": "user" }
      }
    }
  }
}
The case meant here is different: the field exists, but its value is the empty string "".
curl -H 'Content-Type: application/json' 'localhost:9200/customer/_search?pretty' -d '{
  "size": 5,
  "query": {
    "bool": {
      "must": {
        "script": {
          "script": {
            "inline": "doc['\''strnickname'\''].length()<1",
            "lang": "painless"
          }
        }
      }
    }
  }
}'