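Installation. Kernel 3.5 or later is required, which in practice means Red Hat 7 or CentOS 7.
Some basic concepts: https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html
Near real time (NRT): search is not strictly real time. There is a short delay between indexing a document and it becoming searchable, but it is on the order of seconds, usually one second.
Cluster: a group of nodes, usually several, that together provide high availability.
Node: a single server running one ES instance. Multiple nodes form a cluster.
index: a collection of documents that usually share similar characteristics.
type: an index can define multiple types. A type is a logical category of data. In a blog system, for example, user data is one type, blog posts are another, and user comments are a third.
document: the most basic unit of data. For example, a user record is one kind of document and an order record is another.
shards: the data in an index can be very large, so an index may be split into multiple shards. This removes the capacity limit of a single node and also allows requests to be processed in parallel.
replicas: a shard usually exists in more than one copy (for fault tolerance and high availability). A replica is a copy of a shard, and it is never placed on the same node as the shard it copies.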
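Personal understanding:
By analogy with a relational database, an index is roughly a database, a type is roughly a table, a document is roughly a row, and a field is roughly a column.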
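The way an index is split into shards can be sketched in a few lines. Elasticsearch picks the shard for a document by hashing its routing value (the document ID by default) and taking the result modulo the number of primary shards. The sketch below only illustrates that idea: the function name pick_shard is made up, and zlib.crc32 is an assumed stand-in for the hash function Elasticsearch uses internally, so the shard numbers will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (the document ID by default) to a shard number."""
    # Assumption: crc32 is a stand-in hash, not the one Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Five primary shards, matching the "pri 5" default shown later in these notes.
shards = [pick_shard(str(doc_id), 5) for doc_id in range(1, 11)]
print(shards)
```

Because the shard number depends on the number of primary shards, that number is fixed when the index is created.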
Download the archive and unpack it. It is then ready to use.
[appadmin@hadoop4 bin]$ ./elasticsearch
[2016-03-17 17:49:50,177][INFO ][node ] [Living Lightning] version[2.2.1], pid[21524], build[d045fc2/2016-03-09T09:38:54Z]
[2016-03-17 17:49:50,178][INFO ][node ] [Living Lightning] initializing ...
[2016-03-17 17:49:51,006][INFO ][plugins ] [Living Lightning] modules [lang-groovy, lang-expression], plugins [], sites []
[2016-03-17 17:49:51,053][INFO ][env ] [Living Lightning] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [6.1gb], net total_space [8.4gb], spins? [unknown], types [rootfs]
[2016-03-17 17:49:51,053][INFO ][env ] [Living Lightning] heap size [1007.3mb], compressed ordinary object pointers [true]
[2016-03-17 17:49:51,054][WARN ][env ] [Living Lightning] max file descriptors [4096] for elasticsearch process likely too low, consider increasing to at least [65536]
[2016-03-17 17:49:54,134][INFO ][node ] [Living Lightning] initialized
[2016-03-17 17:49:54,135][INFO ][node ] [Living Lightning] starting ...
[2016-03-17 17:49:54,238][INFO ][transport ] [Living Lightning] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2016-03-17 17:49:54,254][INFO ][discovery ] [Living Lightning] elasticsearch/xlgtVqIGT_eqqdDuJMvTug
[2016-03-17 17:49:57,312][INFO ][cluster.service ] [Living Lightning] new_master {Living Lightning}{xlgtVqIGT_eqqdDuJMvTug}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-03-17 17:49:57,480][INFO ][http ] [Living Lightning] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}, {[::1]:9200}
[2016-03-17 17:49:57,480][INFO ][node ] [Living Lightning] started
[2016-03-17 17:49:57,504][INFO ][gateway ] [Living Lightning] recovered [0] indices into cluster_state
Health check after startup:
[appadmin@hadoop4 ~]$ curl 'localhost:9200/_cat/health?v'
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1458208581 17:56:21  elasticsearch green           1         1      0   0    0    0        0             0                  -                100.0%
[appadmin@hadoop4 ~]$
List all nodes in the cluster:
[appadmin@hadoop4 ~]$ curl 'localhost:9200/_cat/nodes?v'
host      ip        heap.percent ram.percent load node.role master name
127.0.0.1 127.0.0.1            3          32 0.01 d         *      Living Lightning
[appadmin@hadoop4 ~]$
List all indices:
[appadmin@hadoop4 ~]$ curl 'localhost:9200/_cat/indices?v'
health status index pri rep docs.count docs.deleted store.size pri.store.size
[appadmin@hadoop4 ~]$
This is a fresh environment, so there are no indices yet.
Create an index named customer:
[appadmin@hadoop4 ~]$ curl -XPUT 'localhost:9200/customer?pretty'
{
  "acknowledged" : true
}
[appadmin@hadoop4 ~]$
[appadmin@hadoop4 ~]$ curl 'localhost:9200/_cat/indices?v'
health status index    pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer   5   1          0            0       650b           650b
[appadmin@hadoop4 ~]$
Create a document of type external, with ID 1, in the customer index:
[appadmin@hadoop4 ~]$ curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
> {
>   "name":"John Doe"
> }'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "created" : true
}
[appadmin@hadoop4 ~]$
Retrieve the document that was just indexed:
[appadmin@hadoop4 ~]$ curl -XGET 'localhost:9200/customer/external/1?pretty'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}
[appadmin@hadoop4 ~]$
Delete the index:
[appadmin@hadoop4 ~]$ curl -XDELETE 'localhost:9200/customer?pretty'
{
  "acknowledged" : true
}
[appadmin@hadoop4 ~]$ curl 'localhost:9200/_cat/indices?v'
health status index pri rep docs.count docs.deleted store.size pri.store.size
[appadmin@hadoop4 ~]$
To summarize, the commands follow a REST-style pattern:
curl -X<REST Verb> <Node>:<Port>/<Index>/<Type>/<ID>
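The pattern above can be turned into a small helper that assembles the request URL from its parts. This is only an illustrative sketch: the function name es_url and its parameters are made up for this example, and it builds a string without sending any request.

```python
def es_url(node, port, index=None, doc_type=None, doc_id=None, pretty=False):
    """Build <Node>:<Port>/<Index>/<Type>/<ID>, leaving out the parts not given."""
    parts = [str(p) for p in (index, doc_type, doc_id) if p is not None]
    url = "http://{}:{}/{}".format(node, port, "/".join(parts))
    return url + "?pretty" if pretty else url


# The index-level and document-level URLs used in the examples of these notes.
print(es_url("localhost", 9200, "customer", pretty=True))
print(es_url("localhost", 9200, "customer", "external", 1, pretty=True))
```

The REST verb (PUT, GET, POST, DELETE) is passed separately, as curl does with -X.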
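Because of the refresh interval, a change may not be visible to searches immediately; the delay is typically about one second. This is a clear difference from SQL database platforms.
Creating and replacing a document: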
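The effect of the refresh interval can be pictured with a toy model: newly indexed documents sit in a buffer and only become visible to searches after a refresh. The class ToyIndex below is purely hypothetical and says nothing about how Elasticsearch or Lucene implement this internally.

```python
class ToyIndex:
    """Toy model of near-real-time search: documents appear after a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to search
        self._searchable = []  # visible to search

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # In Elasticsearch a refresh runs periodically (about once per second by default).
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self):
        return list(self._searchable)


idx = ToyIndex()
idx.index({"name": "John Doe"})
before = idx.search()   # nothing visible yet
idx.refresh()
after = idx.search()    # the document is now visible
print(before, after)
```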
[appadmin@hadoop4 ~]$ curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
> {
>   "name":"John doe"
> }'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "created" : true
}
[appadmin@hadoop4 ~]$
[appadmin@hadoop4 ~]$ curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
> {
>   "name":"Jane Doee"
> }'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 2,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "created" : false
}
[appadmin@hadoop4 ~]$
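Both requests use ID 1, so the second request replaces the document and changes name from "John doe" to "Jane Doee". The other differences are _version, which goes from 1 to 2, and created, which is now false.
If no ID is specified, a random ID is generated (note that the request uses POST instead of PUT), for example: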
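The behaviour seen in the two responses (same ID, rising _version, created switching from true to false) can be modelled with a dictionary. This is a simplified sketch with made-up names (store, put_document); it mimics only the response fields shown above.

```python
store = {}  # hypothetical in-memory stand-in for one index/type


def put_document(doc_id, source):
    """Index a document under doc_id, replacing any existing one."""
    previous = store.get(doc_id)
    version = 1 if previous is None else previous["_version"] + 1
    store[doc_id] = {"_version": version, "_source": dict(source)}
    return {"_id": doc_id, "_version": version, "created": previous is None}


first = put_document("1", {"name": "John doe"})
second = put_document("1", {"name": "Jane Doee"})
print(first)
print(second)
```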
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/customer/external?pretty' -d '
{
  "name":"Jack London"
}'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "AVOHw3NsnhAb0PWbDpXt",
  "_version" : 1,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "created" : true
}
[appadmin@hadoop4 ~]$
Updating a document does not change it in place. Elasticsearch indexes a new document with the update applied and deletes the old one.
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
> {
>   "doc":{"name":"Linda Doe"}
> }'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 3,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }
}
[appadmin@hadoop4 ~]$
Note that _version has changed again.
The next request updates name and adds a new field, age. Note that _version changes once more.
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
> {
>   "doc":{"name":"Lucy Doe","age":20}
> }'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 4,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }
}
[appadmin@hadoop4 ~]$
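What a partial update does to the stored document can be sketched as a merge followed by a version bump. The function apply_partial_update is hypothetical, and the shallow dict merge is an assumption that is adequate only for flat documents like the ones in these notes.

```python
def apply_partial_update(stored, partial_doc):
    """Return a new stored document with partial_doc merged into _source."""
    merged = dict(stored["_source"])
    merged.update(partial_doc)  # assumption: flat documents, so a shallow merge is enough
    # A new document is produced; the old one is left untouched (and later discarded).
    return {"_version": stored["_version"] + 1, "_source": merged}


stored = {"_version": 3, "_source": {"name": "Linda Doe"}}
updated = apply_partial_update(stored, {"name": "Lucy Doe", "age": 20})
print(updated)
```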
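An update can also be performed with a script. For example, the following adds 5 to the age field of the document with ID 1 (the one from the examples above).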
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
{
"script" : "ctx._source.age += 5"
}'
However, scripting has been disabled by default since version 1.4.3. How to enable it is covered later.
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
> {
>   "script":"ctx._source.age+=5"
> }'
{
  "error" : {
    "root_cause" : [ {
      "type" : "remote_transport_exception",
      "reason" : "[Catiana][127.0.0.1:9300][indices:data/write/update[s]]"
    } ],
    "type" : "illegal_argument_exception",
    "reason" : "failed to execute script",
    "caused_by" : {
      "type" : "script_exception",
      "reason" : "scripts of type [inline], operation [update] and lang [groovy] are disabled"
    }
  },
  "status" : 400
}
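For details on scripting, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html
These updates modify a single document. There are also ways to modify multiple documents at once, similar to UPDATE ... WHERE in SQL; they come later.
Delete a document: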
[appadmin@hadoop4 ~]$ curl -XDELETE 'localhost:9200/customer/external/2?pretty'
{
  "found" : false,
  "_index" : "customer",
  "_type" : "external",
  "_id" : "2",
  "_version" : 1,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }
}
[appadmin@hadoop4 ~]$
Bulk operations:
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
> {"index":{"_id":"1"}}
> {"name": "John Doe" }
> {"index":{"_id":"2"}}
> {"name": "Jane Doe" }
> '
{
  "took" : 101,
  "errors" : false,
  "items" : [ {
    "index" : {
      "_index" : "customer",
      "_type" : "external",
      "_id" : "1",
      "_version" : 5,
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
      "status" : 200
    }
  }, {
    "index" : {
      "_index" : "customer",
      "_type" : "external",
      "_id" : "2",
      "_version" : 1,
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
      "status" : 201
    }
  } ]
}
[appadmin@hadoop4 ~]$
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
> {"update":{"_id":"1"}}
> {"doc":{"name":"John Doe becomes Jane Doe"}}
> {"delete":{"_id":"2"}}
> '
{
  "took" : 85,
  "errors" : false,
  "items" : [ {
    "update" : {
      "_index" : "customer",
      "_type" : "external",
      "_id" : "1",
      "_version" : 6,
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
      "status" : 200
    }
  }, {
    "delete" : {
      "_index" : "customer",
      "_type" : "external",
      "_id" : "2",
      "_version" : 2,
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
      "status" : 200,
      "found" : true
    }
  } ]
}
[appadmin@hadoop4 ~]$
A bulk request executes its actions one by one, in order. If one action fails, the remaining actions are still executed.
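The body of a bulk request is newline-delimited JSON: one line describing the action, followed (except for delete) by one line with the document or partial document, and a final newline at the end. A minimal sketch of a builder for that format follows; the function name build_bulk_body and the tuple layout are assumptions made for this example.

```python
import json


def build_bulk_body(actions):
    """Turn (action, metadata, source) tuples into a newline-delimited bulk body."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:  # a delete action has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = build_bulk_body([
    ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```

The resulting string could be saved to a file and sent with curl --data-binary, which preserves the newlines.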
Next, load some test data. Source: https://github.com/bly2k/files/blob/master/accounts.zip?raw=true . The data goes into an index named bank and contains 1000 documents.
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
... (most of the output omitted) ...
  }, {
    "index" : {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "990",
      "_version" : 1,
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
      "status" : 201
    }
  }, {
    "index" : {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "995",
      "_version" : 1,
      "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
      "status" : 201
    }
  } ]
}
[appadmin@hadoop4 ~]$ curl 'localhost:9200/_cat/indices?v'
health status index    pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer   5   1          2            0      6.7kb          6.7kb
yellow open   bank       5   1       1000            0      447kb          447kb
[appadmin@hadoop4 ~]$
Next, the search API. There are two ways to run a search: through the REST request URI or through the REST request body. Both go through the _search endpoint. For example:
[appadmin@hadoop4 ~]$ curl 'localhost:9200/bank/_search?q=*&pretty'
{
  "took" : 43,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "25", "_score" : 1.0,
      "_source" : { "account_number" : 25, "balance" : 40540, "firstname" : "Virginia", "lastname" : "Ayala", "age" : 39, "gender" : "F", "address" : "171 Putnam Avenue", "employer" : "Filodyne", "email" : "[email protected]", "city" : "Nicholson", "state" : "PA" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "44", "_score" : 1.0,
      "_source" : { "account_number" : 44, "balance" : 34487, "firstname" : "Aurelia", "lastname" : "Harding", "age" : 37, "gender" : "M", "address" : "502 Baycliff Terrace", "employer" : "Orbalix", "email" : "[email protected]", "city" : "Yardville", "state" : "DE" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "99", "_score" : 1.0,
      "_source" : { "account_number" : 99, "balance" : 47159, "firstname" : "Ratliff", "lastname" : "Heath", "age" : 39, "gender" : "F", "address" : "806 Rockwell Place", "employer" : "Zappix", "email" : "[email protected]", "city" : "Shaft", "state" : "ND" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "119", "_score" : 1.0,
      "_source" : { "account_number" : 119, "balance" : 49222, "firstname" : "Laverne", "lastname" : "Johnson", "age" : 28, "gender" : "F", "address" : "302 Howard Place", "employer" : "Senmei", "email" : "[email protected]", "city" : "Herlong", "state" : "DC" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "126", "_score" : 1.0,
      "_source" : { "account_number" : 126, "balance" : 3607, "firstname" : "Effie", "lastname" : "Gates", "age" : 39, "gender" : "F", "address" : "620 National Drive", "employer" : "Digitalus", "email" : "[email protected]", "city" : "Blodgett", "state" : "MD" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "145", "_score" : 1.0,
      "_source" : { "account_number" : 145, "balance" : 47406, "firstname" : "Rowena", "lastname" : "Wilkinson", "age" : 32, "gender" : "M", "address" : "891 Elton Street", "employer" : "Asimiline", "email" : "[email protected]", "city" : "Ripley", "state" : "NH" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "183", "_score" : 1.0,
      "_source" : { "account_number" : 183, "balance" : 14223, "firstname" : "Hudson", "lastname" : "English", "age" : 26, "gender" : "F", "address" : "823 Herkimer Place", "employer" : "Xinware", "email" : "[email protected]", "city" : "Robbins", "state" : "ND" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "190", "_score" : 1.0,
      "_source" : { "account_number" : 190, "balance" : 3150, "firstname" : "Blake", "lastname" : "Davidson", "age" : 30, "gender" : "F", "address" : "636 Diamond Street", "employer" : "Quantasis", "email" : "[email protected]", "city" : "Crumpler", "state" : "KY" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "208", "_score" : 1.0,
      "_source" : { "account_number" : 208, "balance" : 40760, "firstname" : "Garcia", "lastname" : "Hess", "age" : 26, "gender" : "F", "address" : "810 Nostrand Avenue", "employer" : "Quiltigen", "email" : "[email protected]", "city" : "Brooktrails", "state" : "GA" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "222", "_score" : 1.0,
      "_source" : { "account_number" : 222, "balance" : 14764, "firstname" : "Rachelle", "lastname" : "Rice", "age" : 36, "gender" : "M", "address" : "333 Narrows Avenue", "employer" : "Enaut", "email" : "[email protected]", "city" : "Wright", "state" : "AZ" }
    } ]
  }
}
[appadmin@hadoop4 ~]$
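Although 1000 documents match, only 10 are returned, because 10 is the default page size. Some of the search results below are abbreviated.
The same search, expressed the other way, through the request body: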
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{"match_all":{}}
> }'
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "25", "_score" : 1.0,
      "_source" : { "account_number" : 25, "balance" : 40540, "firstname" : "Virginia", "lastname" : "Ayala", "age" : 39, "gender" : "F", "address" : "171 Putnam Avenue", "employer" : "Filodyne", "email" : "[email protected]", "city" : "Nicholson", "state" : "PA" }
... (intermediate results omitted) ...
      "_index" : "bank", "_type" : "account", "_id" : "222", "_score" : 1.0,
      "_source" : { "account_number" : 222, "balance" : 14764, "firstname" : "Rachelle", "lastname" : "Rice", "age" : 36, "gender" : "M", "address" : "333 Narrows Avenue", "employer" : "Enaut", "email" : "[email protected]", "city" : "Wright", "state" : "AZ" }
    } ]
  }
}
[appadmin@hadoop4 ~]$
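Once the search finishes, ES has returned the complete result and its work is done. This differs from SQL platforms, which may keep server-side resources such as a cursor open; ES does not.
Some common searches follow.
Return only one result (size defaults to 10):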
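The two request styles can be put side by side in a short sketch. The helper names uri_search and body_search are made up for this example, and nothing is sent over the network; the functions only show what goes into the URI and what goes into the body.

```python
import json
from urllib.parse import urlencode


def uri_search(index, query_string):
    """Request-URI style: the query travels in the q parameter."""
    return "/{}/_search?{}".format(index, urlencode({"q": query_string}))


def body_search(index, query):
    """Request-body style: the query travels as a JSON document."""
    return "/{}/_search".format(index), json.dumps({"query": query})


uri_form = uri_search("bank", "*")  # urlencode percent-encodes "*" as %2A
path, body = body_search("bank", {"match_all": {}})
print(uri_form)
print(path, body)
```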
[appadmin@hadoop4 ~]$ curl 'localhost:9200/bank/_search?pretty' -d '
{
  "query":{"match_all":{}},
  "size":1
}'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "25", "_score" : 1.0,
      "_source" : { "account_number" : 25, "balance" : 40540, "firstname" : "Virginia", "lastname" : "Ayala", "age" : 39, "gender" : "F", "address" : "171 Putnam Avenue", "employer" : "Filodyne", "email" : "[email protected]", "city" : "Nicholson", "state" : "PA" }
    } ]
  }
}
Skip the first 10 results and return the next 10 (from is zero-based and defaults to 0):
[appadmin@hadoop4 ~]$ curl 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{"match_all":{}},
>   "from":10,
>   "size":10
> }'
{
  "took" : 12,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "227", "_score" : 1.0,
      "_source" : { "account_number" : 227, "balance" : 19780, "firstname" : "Coleman", "lastname" : "Berg", "age" : 22, "gender" : "M", "address" : "776 Little Street", "employer" : "Exoteric", "email" : "[email protected]", "city" : "Eagleville", "state" : "WV" }
... (intermediate results omitted) ...
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "450", "_score" : 1.0,
      "_source" : { "account_number" : 450, "balance" : 2643, "firstname" : "Bradford", "lastname" : "Nielsen", "age" : 25, "gender" : "M", "address" : "487 Keen Court", "employer" : "Exovent", "email" : "[email protected]", "city" : "Hamilton", "state" : "DE" }
    } ]
  }
}
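The from and size parameters behave like a slice over the ordered result list. A small sketch, with hypothetical helper names, that converts a one-based page number into from/size and shows the equivalent slice on a local list:

```python
def page_to_from_size(page, per_page=10):
    """Convert a one-based page number into from/size search parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"from": (page - 1) * per_page, "size": per_page}


def paginate_locally(hits, from_, size):
    """What from/size select, expressed as a list slice (from is zero-based)."""
    return hits[from_:from_ + size]


params = page_to_from_size(2)                     # second page of ten
second_page = paginate_locally(list(range(1000)), params["from"], params["size"])
print(params, second_page)
```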
Search and sort (by balance, descending):
[appadmin@hadoop4 ~]$ curl 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{"match_all":{}},
>   "sort":{"balance":{"order":"desc"}}
> }'
{
  "took" : 68,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1000,
    "max_score" : null,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "248", "_score" : null,
      "_source" : { "account_number" : 248, "balance" : 49989, "firstname" : "West", "lastname" : "England", "age" : 36, "gender" : "M", "address" : "717 Hendrickson Place", "employer" : "Obliq", "email" : "[email protected]", "city" : "Maury", "state" : "WA" },
      "sort" : [ 49989 ]
... (intermediate results omitted) ...
      "_index" : "bank", "_type" : "account", "_id" : "572", "_score" : null,
      "_source" : { "account_number" : 572, "balance" : 49355, "firstname" : "Therese", "lastname" : "Espinoza", "age" : 20, "gender" : "M", "address" : "994 Chester Court", "employer" : "Gonkle", "email" : "[email protected]", "city" : "Hayes", "state" : "UT" },
      "sort" : [ 49355 ]
    } ]
  }
}
[appadmin@hadoop4 ~]$
A closer look at searching.
Return only specific fields:
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{"match_all":{}},
>   "_source":["account_number","balance"],
>   "size":2
> }'
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "25", "_score" : 1.0,
      "_source" : { "balance" : 40540, "account_number" : 25 }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "44", "_score" : 1.0,
      "_source" : { "balance" : 34487, "account_number" : 44 }
    } ]
  }
}
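The effect of the _source list is comparable to SELECT with a column list: only the named fields are kept in each hit. A toy equivalent on a local dictionary, where filter_source is a made-up helper and the sample record is an excerpt of account 25 from the output above:

```python
def filter_source(source, fields):
    """Keep only the requested fields of a document."""
    return {key: value for key, value in source.items() if key in fields}


account_25 = {"account_number": 25, "balance": 40540, "firstname": "Virginia", "state": "PA"}
trimmed = filter_source(account_25, ["account_number", "balance"])
print(trimmed)
```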
Return the document whose account_number is 20:
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{"match":{"account_number":20}}
> }'
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 5.6587105,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "20", "_score" : 5.6587105,
      "_source" : { "account_number" : 20, "balance" : 16418, "firstname" : "Elinor", "lastname" : "Ratliff", "age" : 36, "gender" : "M", "address" : "282 Kings Place", "employer" : "Scentric", "email" : "[email protected]", "city" : "Ribera", "state" : "WA" }
    } ]
  }
}
[appadmin@hadoop4 ~]$
Return documents whose address field contains the term mill (case-insensitive):
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{"match":{"address":"mill"}}
> }'
{
  "took" : 12,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 4,
    "max_score" : 2.8025851,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "472", "_score" : 2.8025851,
      "_source" : { "account_number" : 472, "balance" : 25571, "firstname" : "Lee", "lastname" : "Long", "age" : 32, "gender" : "F", "address" : "288 Mill Street", "employer" : "Comverges", "email" : "[email protected]", "city" : "Movico", "state" : "MT" }
... (2 documents omitted) ...
      "_index" : "bank", "_type" : "account", "_id" : "345", "_score" : 2.6023464,
      "_source" : { "account_number" : 345, "balance" : 9812, "firstname" : "Parker", "lastname" : "Hines", "age" : 38, "gender" : "M", "address" : "715 Mill Avenue", "employer" : "Baluba", "email" : "[email protected]", "city" : "Blackgum", "state" : "KY" }
    } ]
  }
}
Return documents whose address field contains the term mill or the term lane:
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{"match":{"address":"mill lane"}}
> }'
{
  "took" : 27,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 19,
    "max_score" : 3.5637083,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "136", "_score" : 3.5637083,
      "_source" : { "account_number" : 136, "balance" : 45801, "firstname" : "Winnie", "lastname" : "Holland", "age" : 38, "gender" : "M", "address" : "198 Mill Lane", "employer" : "Neteria", "email" : "[email protected]", "city" : "Urie", "state" : "IL" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "472", "_score" : 1.0920914,
      "_source" : { "account_number" : 472, "balance" : 25571, "firstname" : "Lee", "lastname" : "Long", "age" : 32, "gender" : "F", "address" : "288 Mill Street", "employer" : "Comverges", "email" : "[email protected]", "city" : "Movico", "state" : "MT" }
... (intermediate results omitted) ...
      "_index" : "bank", "_type" : "account", "_id" : "742", "_score" : 0.82493615,
      "_source" : { "account_number" : 742, "balance" : 24765, "firstname" : "Merle", "lastname" : "Wooten", "age" : 26, "gender" : "M", "address" : "317 Pooles Lane", "employer" : "Tropolis", "email" : "[email protected]", "city" : "Bentley", "state" : "ND" }
    } ]
  }
}
Return documents whose address contains the phrase mill lane:
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{"match_phrase":{"address":"mill lane"}}
> }'
{
  "took" : 18,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 5.00982,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "136", "_score" : 5.00982,
      "_source" : { "account_number" : 136, "balance" : 45801, "firstname" : "Winnie", "lastname" : "Holland", "age" : 38, "gender" : "M", "address" : "198 Mill Lane", "employer" : "Neteria", "email" : "[email protected]", "city" : "Urie", "state" : "IL" }
    } ]
  }
}
[appadmin@hadoop4 ~]$
Return documents whose address contains both mill and lane (in any position), this time using a bool query:
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{
>     "bool":{
>       "must":[
>         {"match":{"address":"mill"}},
>         {"match":{"address":"lane"}}
>       ]
>     }
>   }
> }'
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 3.5637083,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "136", "_score" : 3.5637083,
      "_source" : { "account_number" : 136, "balance" : 45801, "firstname" : "Winnie", "lastname" : "Holland", "age" : 38, "gender" : "M", "address" : "198 Mill Lane", "employer" : "Neteria", "email" : "[email protected]", "city" : "Urie", "state" : "IL" }
    } ]
  }
}
[appadmin@hadoop4 ~]$
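The difference between the three address searches (match with several terms, match_phrase, and bool must) can be imitated with a toy matcher. This is only a rough model: it lowercases and splits on whitespace as an assumed stand-in for the real analyzer, does no scoring, and the function names are made up. The address "9 Lane Mill Road" is a hypothetical record added to show a case where bool must matches but the phrase does not.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match_any(text, query):
    """Like match with several terms: at least one term must occur."""
    return any(term in tokens(text) for term in tokens(query))


def match_all_terms(text, query):
    """Like bool must with one match per term: every term must occur."""
    return all(term in tokens(text) for term in tokens(query))


def match_phrase(text, query):
    """Like match_phrase: the terms must occur next to each other, in order."""
    doc, q = tokens(text), tokens(query)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


addresses = ["198 Mill Lane", "288 Mill Street", "317 Pooles Lane",
             "171 Putnam Avenue", "9 Lane Mill Road"]  # the last one is hypothetical

any_hits = [a for a in addresses if match_any(a, "mill lane")]
all_hits = [a for a in addresses if match_all_terms(a, "mill lane")]
phrase_hits = [a for a in addresses if match_phrase(a, "mill lane")]
print(any_hits, all_hits, phrase_hits, sep="\n")
```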
Return documents whose address contains neither mill nor lane:
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{
>     "bool":{
>       "must_not":[
>         {"match":{"address":"MILL"}},
>         {"match":{"address":"lane"}}
>       ]
>     }
>   }
> }'
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 981,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "25", "_score" : 1.0,
      "_source" : { "account_number" : 25, "balance" : 40540, "firstname" : "Virginia", "lastname" : "Ayala", "age" : 39, "gender" : "F", "address" : "171 Putnam Avenue", "employer" : "Filodyne", "email" : "[email protected]", "city" : "Nicholson", "state" : "PA" }
... (intermediate results omitted) ...
      "_index" : "bank", "_type" : "account", "_id" : "222", "_score" : 1.0,
      "_source" : { "account_number" : 222, "balance" : 14764, "firstname" : "Rachelle", "lastname" : "Rice", "age" : 36, "gender" : "M", "address" : "333 Narrows Avenue", "employer" : "Enaut", "email" : "[email protected]", "city" : "Wright", "state" : "AZ" }
    } ]
  }
}
Different clauses can be combined. The following returns accounts whose age is 40 and whose state is not ID:
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "query":{
>     "bool":{
>       "must":[
>         {"match":{"age":"40"}}
>       ]
>       ,
>       "must_not":[
>         {"match":{"state":"ID"}}
>       ]
>     }
>   }
> }'
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 43,
    "max_score" : 4.2724166,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "549", "_score" : 4.2724166,
      "_source" : { "account_number" : 549, "balance" : 1932, "firstname" : "Jacqueline", "lastname" : "Maxwell", "age" : 40, "gender" : "M", "address" : "444 Schenck Place", "employer" : "Fuelworks", "email" : "[email protected]", "city" : "Oretta", "state" : "OR" }
    }, {
      "_index" : "bank", "_type" : "account", "_id" : "477", "_score" : 4.2724166,
      "_source" : { "account_number" : 477, "balance" : 25892, "firstname" : "Holcomb", "lastname" : "Cobb", "age" : 40, "gender" : "M", "address" : "369 Marconi Place", "employer" : "Steeltab", "email" : "[email protected]", "city" : "Byrnedale", "state" : "CA" }
... (intermediate results omitted) ...
      "_index" : "bank", "_type" : "account", "_id" : "468", "_score" : 4.085979,
      "_source" : { "account_number" : 468, "balance" : 18400, "firstname" : "Foreman", "lastname" : "Fowler", "age" : 40, "gender" : "M", "address" : "443 Jackson Court", "employer" : "Zillactic", "email" : "[email protected]", "city" : "Wakarusa", "state" : "WA" }
    } ]
  }
}
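When bool queries are generated by a program rather than typed by hand, the request body can be assembled from its clauses. A minimal sketch follows; the helper name bool_query and its parameter names are assumptions made for this example, and it only builds the JSON body without sending it.

```python
import json


def bool_query(must=None, must_not=None, filter_=None):
    """Assemble a bool query body from optional lists of clauses."""
    clauses = {}
    if must:
        clauses["must"] = must
    if must_not:
        clauses["must_not"] = must_not
    if filter_:
        clauses["filter"] = filter_
    return {"query": {"bool": clauses}}


# Same query as the last example: age is 40 and state is not ID.
body = bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
)
print(json.dumps(body, indent=2))
```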
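Each hit in the search results carries a set of fields, including:
_index: the index the document belongs to
_type: the type of the document
_id: the ID of the document
_score: how well the document matches the query; the higher the value, the more relevant the document. It is not always needed; for example, it is null when the results are sorted by a field.
_source: the full content of the document
The bool query also supports a filter clause. The following returns accounts with a balance between 20000 and 30000, inclusive: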
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query":{
    "bool":{
      "must":{"match_all":{}},
      "filter":{
        "range":{
          "balance":{
            "gte":20000,
            "lte":30000
          }
        }
      }
    }
  }
}'
{
  "took" : 19,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 217,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank", "_type" : "account", "_id" : "253", "_score" : 1.0,
      "_source" : { "account_number" : 253, "balance" : 20240, "firstname" : "Melissa", "lastname" : "Gould", "age" : 31, "gender" : "M", "address" : "440 Fuller Place", "employer" : "Buzzopia", "email" : "[email protected]", "city" : "Lumberton", "state" : "MD" }
... (intermediate results omitted) ...
      "_index" : "bank", "_type" : "account", "_id" : "204", "_score" : 1.0,
      "_source" : { "account_number" : 204, "balance" : 27714, "firstname" : "Mavis", "lastname" : "Deleon", "age" : 39, "gender" : "F", "address" : "400 Waldane Court", "employer" : "Lotron", "email" : "[email protected]", "city" : "Stollings", "state" : "LA" }
    } ]
  }
}
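The range filter keeps a document or drops it; it does not influence the score, which is why every hit above still has the score 1.0 from match_all. The inclusive bounds gte and lte can be mirrored by a small predicate. The function in_range is a made-up helper, and the sample balances mix values from the outputs in these notes with the two boundary values.

```python
def in_range(value, gte=None, lte=None):
    """Mirror of a range filter with inclusive bounds (gte and lte)."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True


balances = [20240, 27714, 40540, 3607, 20000, 30000]
kept = [b for b in balances if in_range(b, gte=20000, lte=30000)]
print(kept)
```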
Next, aggregations, which are roughly equivalent to GROUP BY in SQL.
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "size":0,
>   "aggs":{
>     "group_by_state":{
>       "terms":{
>         "field":"state"
>       }
>     }
>   }
> }'
{
  "took" : 113,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 4,
      "sum_other_doc_count" : 743,
      "buckets" : [
        { "key" : "tx", "doc_count" : 30 },
        { "key" : "md", "doc_count" : 28 },
        { "key" : "id", "doc_count" : 27 },
        { "key" : "al", "doc_count" : 25 },
        { "key" : "me", "doc_count" : 25 },
        { "key" : "tn", "doc_count" : 25 },
        { "key" : "wy", "doc_count" : 25 },
        { "key" : "dc", "doc_count" : 24 },
        { "key" : "ma", "doc_count" : 24 },
        { "key" : "nd", "doc_count" : 24 }
      ]
    }
  }
}
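The query above is equivalent to the following SQL, except that only the top 10 states are returned by default: SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
size is set to 0 so that the hits section of the response does not list the matching documents (the query matches everything here; try removing size to see the difference).
A few more aggregation examples follow. The first one computes the average balance per state: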
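The terms aggregation can be reproduced locally with a counter: group by the field value, count, and keep the largest groups. In the sketch below, terms_aggregation is a made-up function, the sample documents are hypothetical, and the lowercasing mimics the lowercase keys ("tx", "md") seen in the output, which come from the field being analyzed.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Count documents per field value and return the largest buckets first."""
    counts = Counter(doc[field].lower() for doc in docs)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


# Hypothetical sample; the real bank index holds 1000 documents.
sample = [{"state": s} for s in ["TX", "MD", "TX", "ID", "MD", "TX"]]
buckets = terms_aggregation(sample, "state")
print(buckets)
```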
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "size":0,
>   "aggs":{
>     "group_by_state":{
>       "terms":{
>         "field":"state"
>       },
>       "aggs":{
>         "average_balance":{
>           "avg":{
>             "field":"balance"
>           }
>         }
>       }
>     }
>   }
> }'
{
  "took" : 37,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 4,
      "sum_other_doc_count" : 743,
      "buckets" : [
        { "key" : "tx", "doc_count" : 30, "average_balance" : { "value" : 26073.3 } },
        { "key" : "md", "doc_count" : 28, "average_balance" : { "value" : 26161.535714285714 } },
        { "key" : "id", "doc_count" : 27, "average_balance" : { "value" : 24368.777777777777 } },
        { "key" : "al", "doc_count" : 25, "average_balance" : { "value" : 25739.56 } },
        { "key" : "me", "doc_count" : 25, "average_balance" : { "value" : 21663.0 } },
        { "key" : "tn", "doc_count" : 25, "average_balance" : { "value" : 28365.4 } },
        { "key" : "wy", "doc_count" : 25, "average_balance" : { "value" : 21731.52 } },
        { "key" : "dc", "doc_count" : 24, "average_balance" : { "value" : 23180.583333333332 } },
        { "key" : "ma", "doc_count" : 24, "average_balance" : { "value" : 29600.333333333332 } },
        { "key" : "nd", "doc_count" : 24, "average_balance" : { "value" : 26577.333333333332 } }
      ]
    }
  }
}
The same aggregation, with the buckets sorted by average balance in descending order:
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "size":0,
>   "aggs":{
>     "group_by_state":{
>       "terms":{
>         "field":"state",
>         "order":{
>           "average_balance":"desc"
>         }
>       },
>       "aggs":{
>         "average_balance":{
>           "avg":{
>             "field":"balance"
>           }
>         }
>       }
>     }
>   }
> }'
{
  "took" : 38,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 827,
      "buckets" : [
        { "key" : "co", "doc_count" : 14, "average_balance" : { "value" : 32460.35714285714 } },
        { "key" : "ne", "doc_count" : 16, "average_balance" : { "value" : 32041.5625 } },
        { "key" : "az", "doc_count" : 14, "average_balance" : { "value" : 31634.785714285714 } },
        { "key" : "mt", "doc_count" : 17, "average_balance" : { "value" : 31147.41176470588 } },
        { "key" : "va", "doc_count" : 16, "average_balance" : { "value" : 30600.0625 } },
        { "key" : "ga", "doc_count" : 19, "average_balance" : { "value" : 30089.0 } },
        { "key" : "ma", "doc_count" : 24, "average_balance" : { "value" : 29600.333333333332 } },
        { "key" : "il", "doc_count" : 22, "average_balance" : { "value" : 29489.727272727272 } },
        { "key" : "nm", "doc_count" : 14, "average_balance" : { "value" : 28792.64285714286 } },
        { "key" : "la", "doc_count" : 17, "average_balance" : { "value" : 28791.823529411766 } }
      ]
    }
  }
}
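A nested avg aggregation ordered by its own value corresponds to grouping, averaging, and sorting. A local sketch of that logic follows; average_balance_by_state is a made-up function, and the five sample records are excerpts of accounts 25, 44, 450, 99 and 183 from the outputs in these notes, so the averages differ from those of the full index.

```python
from collections import defaultdict


def average_balance_by_state(docs):
    """Group by state, average the balance, sort by the average (descending)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["state"].lower()].append(doc["balance"])
    buckets = [
        {"key": state, "doc_count": len(values), "average_balance": sum(values) / len(values)}
        for state, values in groups.items()
    ]
    return sorted(buckets, key=lambda bucket: bucket["average_balance"], reverse=True)


sample = [
    {"state": "PA", "balance": 40540},
    {"state": "DE", "balance": 34487},
    {"state": "DE", "balance": 2643},
    {"state": "ND", "balance": 47159},
    {"state": "ND", "balance": 14223},
]
ranked = average_balance_by_state(sample)
print(ranked)
```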
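Another example, which groups by age range, then by gender, and computes the average balance of each group: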
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
> {
>   "size": 0,
>   "aggs": {
>     "group_by_age": {
>       "range": {
>         "field": "age",
>         "ranges": [
>           {
>             "from": 20,
>             "to": 30
>           },
>           {
>             "from": 30,
>             "to": 40
>           },
>           {
>             "from": 40,
>             "to": 50
>           }
>         ]
>       },
>       "aggs": {
>         "group_by_gender": {
>           "terms": {
>             "field": "gender"
>           },
>           "aggs": {
>             "average_balance": {
>               "avg": {
>                 "field": "balance"
>               }
>             }
>           }
>         }
>       }
>     }
>   }
> }'
{
  "took" : 34,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_age" : {
      "buckets" : [ {
        "key" : "20.0-30.0",
        "from" : 20.0,
        "from_as_string" : "20.0",
        "to" : 30.0,
        "to_as_string" : "30.0",
        "doc_count" : 451,
        "group_by_gender" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            { "key" : "m", "doc_count" : 232, "average_balance" : { "value" : 27374.05172413793 } },
            { "key" : "f", "doc_count" : 219, "average_balance" : { "value" : 25341.260273972603 } }
          ]
        }
      }, {
        "key" : "30.0-40.0",
        "from" : 30.0,
        "from_as_string" : "30.0",
        "to" : 40.0,
        "to_as_string" : "40.0",
        "doc_count" : 504,
        "group_by_gender" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            { "key" : "f", "doc_count" : 253, "average_balance" : { "value" : 25670.869565217392 } },
            { "key" : "m", "doc_count" : 251, "average_balance" : { "value" : 24288.239043824702 } }
          ]
        }
      }, {
        "key" : "40.0-50.0",
        "from" : 40.0,
        "from_as_string" : "40.0",
        "to" : 50.0,
        "to_as_string" : "50.0",
        "doc_count" : 45,
        "group_by_gender" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            { "key" : "m", "doc_count" : 24, "average_balance" : { "value" : 26474.958333333332 } },
            { "key" : "f", "doc_count" : 21, "average_balance" : { "value" : 27992.571428571428 } }
          ]
        }
      } ]
    }
  }
}
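The nesting in this last request (range buckets, then gender buckets, then an average) can be followed more easily in a local sketch. Each range includes its from value and excludes its to value, which is why the 1000 accounts split cleanly into 451 + 504 + 45. The function age_range_buckets is made up, and the sample records are excerpts of accounts 25, 44, 119, 183 and 549 from the outputs in these notes.

```python
def age_range_buckets(docs, ranges):
    """Bucket by age range [low, high), then by gender, and average the balance."""
    result = []
    for low, high in ranges:
        in_bucket = [d for d in docs if low <= d["age"] < high]
        by_gender = {}
        for d in in_bucket:
            by_gender.setdefault(d["gender"].lower(), []).append(d["balance"])
        result.append({
            "key": "{}-{}".format(low, high),
            "doc_count": len(in_bucket),
            "average_balance_by_gender": {g: sum(b) / len(b) for g, b in by_gender.items()},
        })
    return result


sample = [
    {"age": 39, "gender": "F", "balance": 40540},
    {"age": 37, "gender": "M", "balance": 34487},
    {"age": 28, "gender": "F", "balance": 49222},
    {"age": 26, "gender": "F", "balance": 14223},
    {"age": 40, "gender": "M", "balance": 1932},
]
age_buckets = age_range_buckets(sample, [(20, 30), (30, 40), (40, 50)])
print(age_buckets)
```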