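Querying massive data sets in a traditional relational DBMS is slow, fuzzy matching in particular. The usual approach is LIKE '%value%', but a pattern with a leading wildcard cannot use an ordinary index, so the database falls back to an inefficient full table scan. Even exact-match lookups on an indexed column become relatively slow once the data volume is large enough. For this reason, Internet companies and other large organizations typically query massive data sets through a search engine, and the most popular search engine framework today is Elasticsearch. This post is part one of my Elasticsearch essentials series: CRUD on indexed documents. More parts will follow, so stay tuned.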
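The full-table-scan claim can be checked on a small scale. The following is a minimal sketch that uses SQLite (from the Python standard library) as a stand-in for a relational database; the table, index, and sample values are made up for illustration, and other databases word their query plans differently.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # column 3 is the human-readable detail


# Hypothetical table and index, used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, role TEXT)")
conn.execute("CREATE INDEX idx_users_username ON users (username)")

# Leading-wildcard pattern: the index on username cannot be used.
like_plan = query_plan(conn, "SELECT * FROM users WHERE username LIKE ?", ("%san%",))
# Exact match: the index on username is used.
exact_plan = query_plan(conn, "SELECT * FROM users WHERE username = ?", ("zhangsan",))

print(like_plan)
print(exact_plan)
```

The first plan reports a scan of the whole table, while the second reports a search through idx_users_username. The exact wording of the plan text depends on the SQLite version.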
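CRUD on ES index documents (6.x and 7.x differ. An index created in 6.x may contain only one type, although you choose its name; only indices created in 5.x could hold several types. From 7.x onward types are deprecated and the only type is the fixed _doc. The API usage also differs slightly):
Create: create a document [POST index/type/id, where id is optional. If a document with the same index, type, and id already exists, it is reindexed (deleted and then created again, the same as PUT index/type/id). To prevent this silent overwrite when you specify an id, use index/type/id?op_type=create or index/type/id/_create to mark the operation as create-only; the request then fails with a 409 version conflict if the document already exists. In 7.x the typeless form is index/_create/id.]
POST demo_users/_doc or POST demo_users/_doc/2vJKsm8BriJODA6s9GbQ/_create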
Request Body:
{ "userId":1, "username":"张三", "role":"administrator", "enabled":true, "createdDate":"2020-01-01T12:00:00" }
Response Body:
{ "_index": "demo_users", "_type": "_doc", "_id": "2vJKsm8BriJODA6s9GbQ", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 0, "_primary_term": 1 }
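The two create variants can be sketched in client code. This minimal example only builds the HTTP requests with the Python standard library and does not send them; the base URL http://localhost:9200 and the helper name build_index_request are assumptions made for illustration, not part of any ES client.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local test cluster


def build_index_request(index, doc, doc_id=None, create_only=False):
    """Build (but do not send) a request that indexes `doc`.

    Without an id, POST index/_doc lets ES generate one.
    With an id, PUT index/_doc/id overwrites unless create_only is set,
    in which case ?op_type=create makes ES reject an existing document.
    """
    if doc_id is None:
        method, path = "POST", f"/{index}/_doc"
    else:
        method = "PUT"
        path = f"/{index}/_doc/{urllib.parse.quote(str(doc_id), safe='')}"
        if create_only:
            path += "?op_type=create"
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(doc, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


user = {"userId": 1, "username": "张三", "role": "administrator", "enabled": True}
auto_id_request = build_index_request("demo_users", user)
create_only_request = build_index_request("demo_users", user, doc_id="2", create_only=True)
print(auto_id_request.get_method(), auto_id_request.full_url)
print(create_only_request.get_method(), create_only_request.full_url)
```

Passing either request to urllib.request.urlopen would send it. For the create-only request, an existing document makes ES answer 409, which urlopen raises as urllib.error.HTTPError.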
Get: retrieve a document [GET index/type/id]
GET demo_users/_doc/123
Response Body:
{ "_index": "demo_users", "_type": "_doc", "_id": "123", "_version": 1, "found": true, "_source": { "userId": 1, "username": "张三", "role": "administrator", "enabled": true, "createdDate": "2020-01-01T12:00:00" } }
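Index (PUT): reindex a document [PUT index/type/id or index/type/id?op_type=index. The id is required. If no document with that id exists, one is created; otherwise the existing document is deleted and recreated, and _version increases by 1.]
PUT/POST demo_users/_doc/123 or demo_users/_doc/123?op_type=index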
Request Body:
{ "userId":1, "username":"张三", "role":"administrator", "enabled":true, "createdDate":"2020-01-01T12:00:00", "remark":"仅演示" }
Response Body:
{ "_index": "demo_users", "_type": "_doc", "_id": "123", "_version": 4, "result": "updated", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "_seq_no": 10, "_primary_term": 1 }
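Update: partially update a document [POST index/type/id/_update. For a partial update the request body must be {"doc":{the fields to change as JSON}}. Fields that already exist are overwritten, and fields that do not exist are added; nested objects are merged level by level. When the same request is repeated, _version increases by 1 only if a field value actually changes; otherwise the response reports "result": "noop". In 7.x the typeless form is POST index/_update/id.]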
POST demo_users/_doc/123/_update
Request Body:
{ "doc": { "userId": 1, "username": "张三", "role": "administrator", "enabled": true, "createdDate": "2020-01-01T12:00:00", "remark": "仅演示POST更新5", "updatedDate": "2020-01-17T15:30:00" } }
Response Body:
{ "_index": "demo_users", "_type": "_doc", "_id": "123", "_version": 26, "result": "updated", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "_seq_no": 35, "_primary_term": 1 }
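The merge rules for a partial update can be modelled locally. The sketch below is a plain-Python approximation of the behaviour described above, not Elasticsearch's own implementation; the function name merge_partial and the nested profile field are hypothetical.

```python
def merge_partial(source, partial):
    """Recursively merge `partial` into a copy of `source`.

    Returns (merged, changed). `changed` is False when every value in
    `partial` already equals the stored value, which corresponds to the
    no-op case where the version is not incremented.
    """
    merged = dict(source)
    changed = False
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged level by level.
            merged[key], sub_changed = merge_partial(merged[key], value)
            changed = changed or sub_changed
        elif key not in merged or merged[key] != value:
            # New keys are added, differing values are overwritten.
            merged[key] = value
            changed = True
    return merged, changed


stored = {"userId": 1, "username": "张三", "profile": {"city": "shenzhen"}}
partial = {"remark": "demo", "profile": {"age": 30}}

updated, first_changed = merge_partial(stored, partial)
_, second_changed = merge_partial(updated, partial)  # same request repeated
print(updated, first_changed, second_changed)
```

The first call adds remark and profile.age while keeping profile.city; repeating the same partial document changes nothing, so the second call reports no change.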
Delete: delete a document [DELETE index/type/id]
DELETE demo_users/_doc/123
Response Body:
{ "_index": "demo_users", "_type": "_doc", "_id": "123", "_version": 2, "result": "deleted", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "_seq_no": 39, "_primary_term": 1 }
Bulk: batch document operations [POST _bulk, index/_bulk, or index/type/_bulk. A single request can carry different CRUD operations against several indices and types. A failure in one operation does not affect the others.]
POST _bulk
Request Body: (The body is not standard JSON: each line is a separate JSON object, so at most the whole can be viewed as a list of JSON objects separated by \n. ES uses the newline character to tell the commands apart, so the last line must also end with a newline; without it the request fails.)
{ "index" : { "_index" : "demo_users_test", "_type" : "_doc", "_id" : "1" } }
{ "bulk_field1" : "测试创建index" }
{ "delete" : { "_index" : "demo_users", "_type" : "_doc", "_id" : "123" } }
{ "create" : { "_index" : "demo_users", "_type" : "_doc", "_id" : "2" } }
{ "bulk_field2" : "测试创建index2" }
{ "update" : { "_index" : "demo_users_test","_type" : "_doc","_id" : "1" } }
{ "doc": {"bulk_field1" : "测试创建index1","bulk_field2" : "测试创建index2"} }
Response Body:
{ "took": 162, "errors": true, "items": [ { "index": { "_index": "demo_users_test", "_type": "_doc", "_id": "1", "_version": 8, "result": "updated", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "_seq_no": 7, "_primary_term": 1, "status": 200 } }, { "delete": { "_index": "demo_users", "_type": "_doc", "_id": "123", "_version": 2, "result": "not_found", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "_seq_no": 44, "_primary_term": 1, "status": 404 } }, { "create": { "_index": "demo_users", "_type": "_doc", "_id": "2", "status": 409, "error": { "type": "version_conflict_engine_exception", "reason": "[_doc][2]: version conflict, document already exists (current version [1])", "index_uuid": "u7WE286CQnGqhHeuwW7oyw", "shard": "2", "index": "demo_users" } } }, { "update": { "_index": "demo_users_test", "_type": "_doc", "_id": "1", "_version": 9, "result": "updated", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "_seq_no": 8, "_primary_term": 1, "status": 200 } } ] }
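Because the newline rules are easy to get wrong by hand, the body is usually generated. The following is a minimal sketch under that assumption: build_bulk_body and failed_items are hypothetical helper names, and the sample response is a trimmed copy of the one above.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) triples into a _bulk body.

    Each JSON object goes on its own line, and the body ends with a
    newline, which the bulk endpoint requires.
    """
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if action != "delete":  # a delete action has no source line
            lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"


def failed_items(bulk_response):
    """Return (action, _id, status) for every item that carries an error."""
    failures = []
    for item in bulk_response.get("items", []):
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result.get("status")))
    return failures


operations = [
    ("index", {"_index": "demo_users_test", "_type": "_doc", "_id": "1"},
     {"bulk_field1": "测试创建index"}),
    ("delete", {"_index": "demo_users", "_type": "_doc", "_id": "123"}, None),
    ("create", {"_index": "demo_users", "_type": "_doc", "_id": "2"},
     {"bulk_field2": "测试创建index2"}),
    ("update", {"_index": "demo_users_test", "_type": "_doc", "_id": "1"},
     {"doc": {"bulk_field1": "测试创建index1", "bulk_field2": "测试创建index2"}}),
]
body = build_bulk_body(operations)

# Trimmed copy of the response shown above.
sample_response = {"errors": True, "items": [
    {"index": {"_id": "1", "result": "updated", "status": 200}},
    {"delete": {"_id": "123", "result": "not_found", "status": 404}},
    {"create": {"_id": "2", "status": 409,
                "error": {"type": "version_conflict_engine_exception"}}},
    {"update": {"_id": "1", "result": "updated", "status": 200}},
]}
failures = failed_items(sample_response)
print(body)
print(failures)
```

When sending the body, set the Content-Type header to application/x-ndjson. A top-level "errors": true only says that at least one item failed, so the items array still has to be inspected one entry at a time, as failed_items does.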
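mGet: fetch several documents in one request [POST _mget, index/_mget, or index/type/_mget. If the index or type is given in the URL, the request body does not need to repeat it. Use _source to choose which fields to include and which to exclude.]
POST _mget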
Request Body:
{ "docs": [ { "_index": "demo_users", "_type": "_doc", "_id": "12345" }, { "_index": "demo_users", "_type": "_doc", "_id": "1234567", "_source": [ "userId", "username", "role" ] }, { "_index": "demo_users", "_type": "_doc", "_id": "1234", "_source": { "include": [ "userId", "username" ], "exclude": [ "role" ] } } ] }
Response Body:
{ "docs":[ { "_index":"demo_users", "_type":"_doc", "_id":"12345", "_version":1, "found":true, "_source":{ "userId":1, "username":"张三", "role":"administrator", "enabled":true, "createdDate":"2020-01-01T12:00:00" } }, { "_index":"demo_users", "_type":"_doc", "_id":"1234567", "_version":7, "found":true, "_source":{ "role":"administrator", "userId":1, "username":"张三" } }, { "_index":"demo_users", "_type":"_doc", "_id":"1234", "_version":1, "found":true, "_source":{ "userId":1, "username":"张三" } } ] }
POST demo_users/_doc/_mget
Request Body:
{ "ids": [ "1234", "12345", "123457" ] }
Response Body:
{ "docs":[ { "_index":"demo_users", "_type":"_doc", "_id":"1234", "_version":1, "found":true, "_source":{ "userId":1, "username":"张三", "role":"administrator", "enabled":true, "createdDate":"2020-01-01T12:00:00", "remark":"仅演示" } }, { "_index":"demo_users", "_type":"_doc", "_id":"12345", "_version":1, "found":true, "_source":{ "userId":1, "username":"张三", "role":"administrator", "enabled":true, "createdDate":"2020-01-01T12:00:00" } }, { "_index":"demo_users", "_type":"_doc", "_id":"123457", "found":false } ] }
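An _mget response returns one entry per requested id, including the ids that were not found, so client code typically separates the two cases. Here is a minimal sketch; the helper name split_mget_response is hypothetical and the sample is a shortened version of the response above.

```python
def split_mget_response(response):
    """Split an _mget response into found sources and missing ids.

    Returns ({_id: _source}, [missing ids]) in the order ES returned them.
    """
    found, missing = {}, []
    for doc in response.get("docs", []):
        if doc.get("found"):
            found[doc["_id"]] = doc.get("_source", {})
        else:
            missing.append(doc["_id"])
    return found, missing


# Request body for POST demo_users/_doc/_mget, as in the example above.
request_body = {"ids": ["1234", "12345", "123457"]}

# Shortened copy of the response shown above.
sample_response = {"docs": [
    {"_index": "demo_users", "_id": "1234", "found": True,
     "_source": {"userId": 1, "username": "张三"}},
    {"_index": "demo_users", "_id": "12345", "found": True,
     "_source": {"userId": 1, "username": "张三"}},
    {"_index": "demo_users", "_id": "123457", "found": False},
]}

found, missing = split_mget_response(sample_response)
print(sorted(found), missing)
```

A missing id does not make the whole request fail: the entry simply has "found": false and no _source.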
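_update_by_query: update the specified fields of every document that matches a query [POST index/_update_by_query. The request body holds the query and the fields to update; here a Painless script is used so that the update logic stays flexible.]
POST demo_users/_update_by_query
Request Body: (Finds the documents where role = administrator, then sets role to poweruser and appends the text in params.remark to remark. The query targets role.keyword because role is a text field: a term query is not analyzed, so an exact match should go through the role.keyword sub-field. A brief explanation follows later for anyone unfamiliar with this.)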
{ "script":{ "source":"ctx._source.role=params.role;ctx._source.remark=ctx._source.remark+params.remark", "lang":"painless", "params":{ "role":"poweruser", "remark":"采用_update_by_query更新" } }, "query":{ "term":{ "role.keyword":"administrator" } } }
For details on writing Painless scripts, see: Painless syntax tutorial
Response Body:
{ "took": 114, "timed_out": false, "total": 6, "updated": 6, "deleted": 0, "batches": 1, "version_conflicts": 0, "noops": 0, "retries": { "bulk": 0, "search": 0 }, "throttled_millis": 0, "requests_per_second": -1, "throttled_until_millis": 0, "failures": [ ] }
_delete_by_query: delete every document that matches a query [POST index/_delete_by_query. The request body holds the query.]
POST demo_users/_delete_by_query
Request Body: (matches the documents where enabled = false)
{ "query": { "match": { "enabled": false } } }
Response Body:
{ "took":29, "timed_out":false, "total":3, "deleted":3, "batches":1, "version_conflicts":0, "noops":0, "retries":{ "bulk":0, "search":0 }, "throttled_millis":0, "requests_per_second":-1, "throttled_until_millis":0, "failures":[ ] }
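Both _update_by_query and _delete_by_query answer with the same kind of summary, so one check can cover both. The sketch below is an illustration only: by_query_problems is a hypothetical helper that inspects the fields shown in the responses above.

```python
def by_query_problems(summary):
    """Return human-readable problems found in a *_by_query summary."""
    problems = []
    if summary.get("timed_out"):
        problems.append("the search phase timed out")
    conflicts = summary.get("version_conflicts", 0)
    if conflicts:
        problems.append(f"{conflicts} document(s) hit a version conflict")
    failures = summary.get("failures") or []
    if failures:
        problems.append(f"{len(failures)} failure(s) reported")
    return problems


# Copy of the _delete_by_query response shown above.
delete_summary = {
    "took": 29, "timed_out": False, "total": 3, "deleted": 3, "batches": 1,
    "version_conflicts": 0, "noops": 0, "retries": {"bulk": 0, "search": 0},
    "throttled_millis": 0, "requests_per_second": -1,
    "throttled_until_millis": 0, "failures": [],
}
clean = by_query_problems(delete_summary)
conflicted = by_query_problems({**delete_summary, "version_conflicts": 2})
print(clean)
print(conflicted)
```

By default these operations stop when a document changes between the search and the write. Adding conflicts=proceed to the URL makes them count the conflict and continue, which is when the version_conflicts field becomes worth checking.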
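Search
URL queries via GET (GET index/_search?q= followed by query_string syntax. Note that the default analyzer splits Chinese text into one term per character.)
A. Term query [matches individual analyzed terms; "contains" below means that a term matches, not a substring]
GET /demo_users/_search?q=role:poweruser //field query: the role field contains the given value
GET /demo_users/_search?q=poweruser //no field given: every field is searched, and a document is returned as soon as any one field contains poweruser
B. Grouping and phrase queries
Operators: AND / OR / NOT, also written && / || / !
+ means must and - means must_not. For example, field:(+a -b) means field must contain a and must not contain b.
GET /demo_users/_search?q=remark:(POST test) //grouping: remark contains POST or test (OR is the default operator)
GET /demo_users/_search?q=remark:(POST OR test) //the same query with OR written out
GET /demo_users/_search?q=remark:"POST test" //phrase query: remark contains POST immediately followed by test
GET /demo_users/_search?q=remark:(test AND POST) //remark contains both test and POST
GET /demo_users/_search?q=remark:(test NOT POST) //remark contains test but not POST
C. Range queries
Intervals: [] is inclusive and {} is exclusive, e.g. year:[2019 TO 2020], year:{2019 TO 2020}, year:{2019 TO 2020], or year:[* TO 2020]
Comparison operators: year:>2019, year:(>2012 && <=2020), or year:(+>=2012 +<=2020)
GET /demo_users/_search?q=userId:>123 //userId is greater than 123
D. Wildcard queries
? matches exactly one character and * matches zero or more characters, e.g. role:power* or role:use?
GET /demo_users/_search?q=role:power* //role starts with power, followed by zero or more other characters
Regular expressions are supported when wrapped in forward slashes, e.g. username:/张三[0-9]+/ (the pattern is matched against individual terms)
Appending ~N enables approximate matching. On a single term, N is the maximum edit distance (fuzziness); on a quoted phrase, N is the slop, i.e. how many term positions the words may be moved.
GET /demo_users/_search?q=remark:tett~1 //the intended word is test but it was typed as tett; with ~1 the documents containing test are still found
GET /demo_users/_search?q=remark:"i like shenzhen"~2 //the stored remark is "i like hubei and shenzhen", which has the two extra terms hubei and and; ~2 allows a gap of 2 terms, so the document is still found
DSL queries via POST (POST index/_search)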
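The URLs above are written the way they are typed into the Kibana console. When the same query is sent from code, characters such as spaces, quotes, +, and > have to be percent-encoded. The following is a minimal sketch using the Python standard library; the helper name search_url is made up for illustration.

```python
from urllib.parse import quote, urlencode


def search_url(index, query, **params):
    """Build the path and query string of a URI search request.

    quote_via=quote encodes a space as %20 and a literal + as %2B, so the
    + (must) operator of query_string syntax cannot be mistaken for a space.
    """
    query_string = urlencode({"q": query, **params}, quote_via=quote)
    return "/" + quote(index, safe="") + "/_search?" + query_string


range_url = search_url("demo_users", "userId:>123")
phrase_url = search_url("demo_users", 'remark:"POST test"')
must_url = search_url("demo_users", "remark:(+a -b)")
paged_url = search_url("demo_users", "role:power*", size=5)

for url in (range_url, phrase_url, must_url, paged_url):
    print(url)
```

The helper returns only the path and query string, so the host and port of the cluster still need to be prepended before the request is sent.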
POST demo_users/_search
Request Body:
{ "query":{ "bool":{ "must":[ { "term":{ "enabled":"true" } }, { "term":{ "role.keyword":"poweruser" } }, { "query_string":{ "default_field":"username.keyword", "query":"张三" } } ], "must_not":[ ], "should":[ ] } }, "from":0, "size":1000, "sort":[ { "createdDate":"desc" } ], "_source":{ "includes":[ "role", "username", "userId", "remark" ], "excludes":[ ] } }
(The three must clauses require enabled = true, role = poweruser, and username matching 张三. The results are sorted by createdDate in descending order. _source selects the returned fields: includes lists the fields to return and excludes lists the fields to leave out.)
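In application code a body like this is usually assembled from parameters instead of being written by hand. The following is a minimal sketch under that assumption; build_user_search and its page and page_size parameters are hypothetical, and the field names are the ones used in this post.

```python
import json


def build_user_search(role, username, enabled=True, page=0, page_size=20):
    """Build a bool query for enabled users with a given role and username.

    `page` is zero-based; ES expects the offset in "from".
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"enabled": enabled}},
                    {"term": {"role.keyword": role}},
                    {"query_string": {"default_field": "username.keyword",
                                      "query": username}},
                ]
            }
        },
        "from": page * page_size,
        "size": page_size,
        "sort": [{"createdDate": "desc"}],
        "_source": {"includes": ["role", "username", "userId", "remark"]},
    }


search_body = build_user_search("poweruser", "张三", page=2, page_size=20)
print(json.dumps(search_body, ensure_ascii=False, indent=2))
```

With the default index settings, from plus size may not exceed 10000 (index.max_result_window), so this style of paging suits the first pages of a result set and not a full export.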
For details on usage, see:
[Elasticsearch] The various uses of query_string
The differences between match, match_phrase, query_string, and term in Elasticsearch
Elasticsearch Query DSL summary
Bool Query
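Finally, a guide to the official ES API references:
Indices APIs: create, delete, get, and check the existence of an index.
Document APIs: create (index), delete, and get documents.
Search APIs: search documents. The Document APIs look documents up by doc_id, while the Search APIs query by condition.
Aggregations: aggregate the documents of an index along various dimensions.
cat APIs: query various kinds of information about indices.
Cluster APIs: query various kinds of information about the cluster.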