This guide uses Ubuntu 16.04 and Elasticsearch 6.5 as the example environment. See the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html
Install Java
Reference article: https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04
$ sudo apt-get update
$ sudo apt-get install -y default-jre
$ sudo add-apt-repository ppa:webupd8team/java && sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
$ java -version # verify the Java installation
$ echo $JAVA_HOME # verify JAVA_HOME
Elasticsearch
Install (6.5.4)
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.zip
$ unzip elasticsearch-6.5.4.zip
Start
$ cd elasticsearch-6.5.4/bin
$ ./elasticsearch
If startup fails with the error "vm.max_map_count [65530] is too low", run the following command:
$ sudo sysctl -w vm.max_map_count=262144
Test with curl. A response like the following means Elasticsearch started successfully and the installation works.
$ curl 127.0.0.1:9200
{
"name" : "c5skAub",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "bdkUuVtQSvWOiY_vXEFnvw",
"version" : {
"number" : "6.5.4",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "d2ef93d",
"build_date" : "2018-12-17T21:17:40.758843Z",
"build_snapshot" : false,
"lucene_version" : "7.5.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
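Basic concepts
Elasticsearch is currently one of the most popular full-text search engines. It is essentially a non-relational data store; the table below maps some of its concepts to their MySQL counterparts.
MySQL | Elasticsearch |
---|---|
database | index |
table | type (deprecated in 7.x) |
row | document |
column | field |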
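Basic operations
Elasticsearch is operated through its REST API. To keep the examples short, every request below omits the following boilerplate:
curl -XMETHOD "http://localhost:9200" -H 'Content-Type: application/json' [-d 'request body']
To allow remote access, set network.host: 0.0.0.0 in /path-to-elastic/config/elasticsearch.yml and restart Elasticsearch.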
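To show how the shorthand used in this guide expands into a full HTTP request, here is a minimal Python sketch that uses only the standard library. The helper name build_request and its base_url parameter are hypothetical, introduced only for this example; the code builds the request object and sends nothing.

```python
import json
import urllib.request


def build_request(method, path, body=None, base_url="http://localhost:9200"):
    """Expand shorthand such as `PUT /customer?pretty` into a full HTTP request.

    `build_request` and `base_url` are illustrative names, not part of any
    Elasticsearch client library.
    """
    data = None if body is None else json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Shorthand: PUT /customer/_doc/1?pretty with the body {"name": "luke"}
req = build_request("PUT", "/customer/_doc/1?pretty", {"name": "luke"})
print(req.get_method(), req.full_url)
# To actually send it (needs a running node): urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the method, URL and body before anything reaches the cluster.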
Index operations
Create an index named customer. The ?pretty parameter asks for pretty-printed JSON in the response.
$ PUT /customer?pretty
List all indices
$ GET /_cat/indices?v
Delete the index
$ DELETE /customer
Document operations
Create a document with id 1. Because types are being removed, each index should contain a single type, named _doc by convention.
$ PUT /customer/_doc/1?pretty
{
"name": "luke"
}
If you use POST and omit the id, Elasticsearch generates a random id.
$ POST /customer/_doc?pretty {"name": "php"}
{
"_index": "customer",
"_type": "_doc",
"_id": "hIkkLGgBFVhvdLuiNNGD", ## the generated id
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 3
}
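To update a document, send the same request used to create it with the changed data (this replaces the whole document), or use the _update endpoint to change only some fields: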
$ POST /customer/_doc/1/_update?pretty
{
"doc": { "name": "luke44", "age": 24 }
}
Update with a simple script. Here ctx._source refers to the source of the document being updated.
$ POST /customer/_doc/1/_update?pretty
{
"script" : "ctx._source.age += 5"
}
Retrieve the document with id 1
$ GET /customer/_doc/1?pretty
{
"_index": "customer",
"_type": "_doc",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"name": "luke"
}
}
Delete a document
$ DELETE /customer/_doc/2?pretty
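Bulk operations: index (create or replace) the documents with ids 1 and 2 in a single request. Note that the body must end with a newline; in Postman, leave an empty line at the end of the body.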
$ POST /customer/_doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "luke" }
{"index":{"_id":"2"}}
{"name": "php", "age": "20" }
Update the document with id 1, then delete the document with id 2
$ POST /customer/_doc/_bulk?pretty
{"update":{"_id":"1"}}
{"doc":{"name":"php best"}}
{"delete":{"_id":"2"}}
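If one action in a bulk request fails, the remaining actions are still executed. The response reports the status of each action in the order the actions were submitted.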
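The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and a final trailing newline. The following Python sketch assembles such a body for the update and delete example above; the function name build_bulk_body is a hypothetical helper for illustration, not an Elasticsearch API.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, source) pairs into a newline-delimited bulk body.

    `source` is None for actions such as delete that have no second line.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ({"update": {"_id": "1"}}, {"doc": {"name": "php best"}}),
    ({"delete": {"_id": "2"}}, None),
])
print(body, end="")
```

Generating the body in code avoids the missing trailing newline that is easy to overlook when pasting it by hand.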
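Exploring the data
First prepare a fictitious data set of bank customer accounts, in which each document has the format shown below. Download the data set and save it as accounts.json.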
{
"account_number": 0,
"balance": 16623,
"firstname": "Bradshaw",
"lastname": "Mckenzie",
"age": 29,
"gender": "F",
"address": "244 Columbus Place",
"employer": "Euron",
"email": "bradshawmckenzie@euron.com",
"city": "Hobucken",
"state": "CO"
}
Import the data set
$ POST /bank/_doc/_bulk?pretty&refresh --data-binary "@accounts.json"
$ GET /_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank 3inMmuQzRqaTpMkzfh07_A 5 1 1000 0 95.9kb 95.9kb
yellow open customer gSRgPG9cScKHcuycJE2drw 5 1 2 0 7.7kb 7.7kb
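match_all query
Search through the request URI: q=* matches all documents, and sort=account_number:asc sorts the results by account_number in ascending order.
$ GET /bank/_search?q=*&sort=account_number:asc&pretty
{
"took" : 63, // time taken, in milliseconds
"timed_out" : false, // whether the search timed out
"_shards" : { // shards searched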
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : { // search results
"total" : 1000,
"max_score" : null,
"hits" : [ {
"_index" : "bank",
"_type" : "_doc",
"_id" : "0",
"sort": [0],
"_score" : null,
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
}, {
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"sort": [1],
"_score" : null,
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, ...
]
}
}
The same search expressed as a JSON request body:
$ GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
Use size and from to page through the results, similar to LIMIT and OFFSET in MySQL. Use _source to return only the specified fields.
$ GET /bank/_search
{
"query": { "match_all": {} },
"sort": { "balance": { "order": "desc" } },
"from": 10,
"size": 15, // defaults to 10
"_source": ["account_number", "balance"]
}
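Since from is an offset rather than a page number, callers usually have to convert between the two. The sketch below does that conversion in Python; page_query and its parameters are hypothetical names for illustration, and the sort on balance simply mirrors the request above.

```python
def page_query(page, page_size=10, fields=None):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    body = {
        "query": {"match_all": {}},
        "sort": {"balance": {"order": "desc"}},
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,
    }
    if fields is not None:
        body["_source"] = list(fields)  # return only these fields
    return body


# Third page of 15 hits each: skips the first 30 hits.
body = page_query(page=3, page_size=15, fields=["account_number", "balance"])
print(body["from"], body["size"])
```

The resulting dictionary can be serialized with json.dumps and sent as the body of GET /bank/_search.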
match query
Find all accounts whose account_number is 20
$ GET /bank/_search
{
"query": { "match": { "account_number": 20 } }
}
Find all accounts whose address contains the term mill
$ GET /bank/_search
{
"query": { "match": { "address": "mill" } }
}
Find all accounts whose address contains the term mill or the term lane
$ GET /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}
The match_phrase query is a variant of match. Find all accounts whose address contains the phrase mill lane
$ GET /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
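The difference between match and match_phrase can be illustrated with a deliberately simplified local model in Python. This is only a sketch: tokenize is a rough stand-in for a real analyzer (it just lowercases and splits on whitespace), the function names are hypothetical, the sample addresses are made up for the example, and relevance scoring is ignored entirely.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match(query, field_value):
    """True if the field contains at least one of the query terms."""
    terms = set(tokenize(field_value))
    return any(term in terms for term in tokenize(query))


def match_phrase(query, field_value):
    """True if the query terms appear consecutively and in order."""
    wanted = tokenize(query)
    tokens = tokenize(field_value)
    return any(
        tokens[i:i + len(wanted)] == wanted
        for i in range(len(tokens) - len(wanted) + 1)
    )


# Made-up sample addresses, used only for this illustration.
addresses = ["198 Mill Lane", "990 Mill Road", "171 Putnam Avenue"]
any_term = [a for a in addresses if match("mill lane", a)]
as_phrase = [a for a in addresses if match_phrase("mill lane", a)]
print(any_term)   # ['198 Mill Lane', '990 Mill Road']
print(as_phrase)  # ['198 Mill Lane']
```

In this model, match behaves like an OR over the individual terms, while match_phrase also requires the terms to be adjacent and in the same order.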
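bool query
Find all accounts whose address contains both the term mill and the term lane. The bool must clause lists the queries that must all be true for a document to count as a match.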
$ GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
//"should": [...] matches if any of the listed queries is true (OR)
//"must_not": [...] matches only if none of the listed queries is true
}
}
}
Combined query: find the accounts of customers who are 40 years old and do not live in the state ID
$ GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
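bool filter
Find the accounts with a balance between 20000 and 30000 (inclusive). A filter clause restricts which documents match without affecting their relevance scores.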
$ GET /bank/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
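The bool clauses shown in this section can be combined freely in one request. As a sketch, the following Python helper assembles a bool query from optional clause lists and combines the age, state and balance conditions used above; bool_query and its parameter names are hypothetical, chosen only for this example.

```python
import json


def bool_query(must=None, must_not=None, should=None, filters=None):
    """Assemble a bool query, keeping only the clauses that were given."""
    clauses = {
        "must": must,
        "must_not": must_not,
        "should": should,
        "filter": filters,  # named `filters` to avoid shadowing the builtin
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


# Customers aged 40, not living in ID, with a balance of 20000 to 30000.
body = bool_query(
    must=[{"match": {"age": 40}}],
    must_not=[{"match": {"state": "ID"}}],
    filters=[{"range": {"balance": {"gte": 20000, "lte": 30000}}}],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET /bank/_search, in the same way as the hand-written requests above.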