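Log in to Ubuntu Server 18.04 as a regular (non-root) user; create one if none exists.
Elasticsearch and Kibana run on the machine with IP 192.168.205.20, referred to below as the "test machine".
The browser used to view Kibana runs on the machine with IP 192.168.205.10, referred to below as the "user machine".
This article mainly follows the official manual, Elasticsearch Reference [7.3] | Elastic, and the Chinese manual, Elasticsearch: 权威指南 | Elastic. The Chinese manual is badly outdated, so prefer the original reference.
Everything covered here is done in the Console of Kibana Dev Tools: loading test data, single-document lookups, complex queries, aggregations, and sorting.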
sudo apt install openjdk-8-jdk
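Download the package elasticsearch-7.3.1-linux-x86_64.tar.gz
Extracting the archive is the installation: tar -zxvf elasticsearch-7.3.1-linux-x86_64.tar.gz
Run: start one master node and two data nodes on a single machine. Make sure the machine has plenty of memory so that ES does not keep falling back on disk swap as virtual memory, which degrades performance.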
cd elasticsearch-7.3.1/bin
./elasticsearch
./elasticsearch -Epath.data=data2 -Epath.logs=log2
./elasticsearch -Epath.data=data3 -Epath.logs=log3
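Configuration (when learning on a single machine you can skip this, run with the defaults, and reach ES through Kibana's externally accessible port): go to the config directory of elasticsearch and edit the configuration file elasticsearch.yml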
#vi elasticsearch.yml
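In network.host: 127.0.0.1, replace the IP with 0.0.0.0
To avoid the startup error about vm.max_map_count being too low (too few memory map areas), edit the following configuration file: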
# vi /etc/sysctl.conf
Append at the end: vm.max_map_count=262144
The following two settings are optional:
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
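Download the package from the page Download Kibana Free • Get Started Now | Elastic
Install (Elasticsearch must already be running) and run: after extracting, go to kibana_home/bin and run ./kibana
Configuration: open $KIBANA_HOME/config/kibana.yml, find server.host, and change it to server.host: "192.168.205.20" so that Kibana is reachable from other machines.
On the user machine, open a browser and visit 192.168.205.20:5601 to check that Kibana loads.
Open Dev Tools, the third button from the bottom on the left side of the Kibana page, and go to the Console tab. Enter the following requests, then click the green run button on the right or press Ctrl+Enter to run them and load the test data.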
PUT /megacorp/_doc/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
PUT /megacorp/_doc/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
PUT /megacorp/_doc/3
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
To fetch a single document, use GET /megacorp/_doc/1, where megacorp is the index name, _doc is the type name, and 1 is the document ID. The output is:
{
"_index" : "megacorp",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
}
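To fetch multiple documents, use GET /megacorp/_search. The output is shown below; hits is the array that holds the matching results. A query string can be appended after _search. For example, GET /megacorp/_search?q=last_name:Smith finds the megacorp employees whose last name is Smith.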
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about" : "I like to build cabinets",
"interests" : [
"forestry"
]
}
}
]
}
}
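More complex searches should use the query DSL. The request below uses the query keywords bool, must, match, filter, range, and gt. Its output has the same structure as the output of the search above that had no request body.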
GET /megacorp/_search
{
"query" : {
"bool": {
"must": {
"match" : {
"last_name" : "smith"
}
},
"filter": {
"range" : {
"age" : { "gt" : 30 }
}
}
}
}
}
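To make the roles of must and filter concrete, here is a small local model of the request above, run over the three sample employees. It is only a sketch: the whitespace tokenizer and the function names are assumptions, and it ignores scoring entirely. The logic is the same, though: a document is returned only if it satisfies the must clause and also passes the filter clause.

```python
EMPLOYEES = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def matches_term(text, term):
    """Rough stand-in for match on a text field: lowercase, split on whitespace."""
    return term.lower() in text.lower().split()


def bool_search(docs, last_name, min_age_exclusive):
    """Keep documents that satisfy the must clause and pass the filter clause."""
    return [
        doc for doc in docs
        if matches_term(doc["last_name"], last_name)  # must: match last_name
        and doc["age"] > min_age_exclusive            # filter: range age gt
    ]


if __name__ == "__main__":
    for doc in bool_search(EMPLOYEES, "smith", 30):
        print(doc["first_name"], doc["last_name"], doc["age"])
```

With the sample data, only Jane Smith (age 32) remains: John Smith fails the age filter and Douglas Fir fails the last-name match.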
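Below is a full-text search example. The output contains _score and max_score, which are ES's relevance scores for each result (a document, the unit of data in ES) against the search terms; higher-scoring results come first. The second result matches only part of the search text, so its score is lower. For exact phrase matching, use the keyword match_phrase instead of match.
The request is: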
GET /megacorp/_search
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
The output is:
{
"took" : 34,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.4167402,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.4167402,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.45895916,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
}
]
}
}
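The difference between match and match_phrase can be modelled locally. In the sketch below, match keeps a document if it contains at least one of the search terms, while match_phrase requires the terms to appear consecutively and in order. This is a simplified illustration with an assumed whitespace tokenizer and assumed function names; it does not reproduce Elasticsearch's analysis or relevance scoring.

```python
ABOUT = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
}


def tokenize(text):
    """Lowercase and split on whitespace (a crude stand-in for an analyzer)."""
    return text.lower().split()


def match(docs, query):
    """Return ids of documents containing at least one query term."""
    terms = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items() if terms & set(tokenize(text))]


def match_phrase(docs, query):
    """Return ids of documents containing the query terms consecutively, in order."""
    phrase = tokenize(query)
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        windows = range(len(tokens) - len(phrase) + 1)
        if any(tokens[k:k + len(phrase)] == phrase for k in windows):
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    print(match(ABOUT, "rock climbing"))
    print(match_phrase(ABOUT, "rock climbing"))
```

Here match selects documents 1 and 2, the same two that appear in the output above, and match_phrase selects only document 1.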
Elasticsearch can return search results with highlight markup, as shown below:
GET /megacorp/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
The response contains a highlight field, in which the parts that match the search terms are wrapped in <em> tags:
{
"took" : 356,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.4167402,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.4167402,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
},
"highlight" : {
"about" : [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
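What the highlighter does to the about field can be imitated with a few lines of string processing. The sketch below wraps each whole-word occurrence of a search term in a pair of tags, using <em> and </em> as the defaults. It is an illustration under assumptions (the function name and the regex-based word matching are not how Elasticsearch implements highlighting).

```python
import re


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap each whole-word, case-insensitive occurrence of a term in pre/post tags."""
    if not terms:
        return text
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


if __name__ == "__main__":
    print(highlight("I love to go rock climbing", ["rock", "climbing"]))
```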
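This section uses the official accounts.json sample data set; download it to the ES bin directory. Bulk indexing is significantly faster than indexing documents one at a time because it eliminates a large number of network round trips. The best batch size depends on several factors, such as document size and complexity, the indexing and search load, and the resources available to the cluster. As a rule of thumb, start with batches of 1,000 to 5,000 documents and a total payload of 5 MB to 15 MB, then experiment until you find the best value for your environment. Run the following commands in a terminal on the test machine to load accounts.json into ES and index it in a single bulk request: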
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"
After the first command runs, the ES log shows the following:
[bank] creating index, cause [auto(bulk api)], templates [], shards [1]/[1], mappings []
After the second command runs, the console prints the following (your uuid will probably differ):
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank aY2jy79TT9WVCI8qH0S1VQ 1 1 1000 0 414.2kb 414.2kb
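The body sent to _bulk is newline-delimited JSON: an action line followed by the document source, repeated for every document, with a trailing newline. The sketch below builds such a body and splits a document list into batches so that batch size can be tuned as described above. It is a minimal illustration, not the tooling used in this article: the function names, the sample rows, and the choice of account_number as the _id are assumptions.

```python
import json


def bulk_body(docs, id_field="account_number"):
    """Build a newline-delimited _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        # Action line: index this document under the given _id (assumed field).
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def batches(docs, batch_size):
    """Yield consecutive slices of at most batch_size documents."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


if __name__ == "__main__":
    # Hypothetical rows shaped like the accounts data (not the real file contents).
    sample = [
        {"account_number": 1, "balance": 100, "state": "TX"},
        {"account_number": 2, "balance": 250, "state": "MD"},
        {"account_number": 3, "balance": 975, "state": "ID"},
    ]
    for batch in batches(sample, 2):
        print(bulk_body(batch), end="")
```

Each batch's body could then be posted to the _bulk endpoint the same way the curl command above posts accounts.json.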
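Searching these documents works the same way as for documents indexed one at a time, and a request body can be used for complex searches. By default, the hits section of the response contains the first 10 documents that match the query. The from and size parameters in the request body are for pagination and may be omitted.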
GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
],
"from": 10,
"size": 10
}
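The from value is simply the number of documents to skip, so it can be derived from a page number. The helper below builds the request body for a given 1-based page; the function name and the 1-based page convention are assumptions made for this sketch.

```python
def page_request(page, page_size=10):
    """Build a match_all search body for the given 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
        "from": (page - 1) * page_size,  # number of documents to skip
        "size": page_size,               # number of documents to return
    }


if __name__ == "__main__":
    print(page_request(2))
```

Page 2 with the default page size gives "from": 10 and "size": 10, which is the request shown above.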
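Example 1: the most popular interests among employees whose last name is Smith. Note that unless fielddata has been enabled on a text field, you should aggregate on fieldname.keyword, as with interests.keyword in this example.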
GET /megacorp/_search
{
"query": {
"match": {
"last_name": "smith"
}
},
"aggs": {
"all_interests": {
"terms": {
"field": "interests.keyword",
"size": 10
}
}
}
}
The output is as follows (the search hits are omitted; only the aggregations part is shown):
...
"aggregations" : {
"all_interests" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "music",
"doc_count" : 2
},
{
"key" : "sports",
"doc_count" : 1
}
]
}
}
}
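A terms aggregation amounts to counting, per distinct value, how many documents contain it. The sketch below reproduces the buckets above from the sample employees whose last name is Smith. It is a local model only: the function name is an assumption, and ties are broken alphabetically by key here.

```python
from collections import Counter

EMPLOYEES = [
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
    {"last_name": "Fir", "interests": ["forestry"]},
]


def terms_aggregation(docs, field, size=10):
    """Count documents per distinct field value, largest buckets first."""
    counts = Counter()
    for doc in docs:
        # Count each distinct value once per document, as doc_count does.
        counts.update(set(doc[field]))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


if __name__ == "__main__":
    smiths = [doc for doc in EMPLOYEES if doc["last_name"].lower() == "smith"]
    for bucket in terms_aggregation(smiths, "interests"):
        print(bucket)
```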
Example 2: count the number of accounts in each state.
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
The output is:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 743,
"buckets" : [
{
"key" : "TX",
"doc_count" : 30
},
{
"key" : "MD",
"doc_count" : 28
},
{
"key" : "ID",
"doc_count" : 27
},
{
"key" : "AL",
"doc_count" : 25
},
{
"key" : "ME",
"doc_count" : 25
},
{
"key" : "TN",
"doc_count" : 25
},
{
"key" : "WY",
"doc_count" : 25
},
{
"key" : "DC",
"doc_count" : 24
},
{
"key" : "MA",
"doc_count" : 24
},
{
"key" : "ND",
"doc_count" : 24
}
]
}
}
}
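Example 3: nested aggregation, averages, sorting, and so on. The Console in Kibana Dev Tools offers autocompletion as you type requests. This example extends Example 2 with a nested "average balance per state" aggregation and sorts the states by that value.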
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword",
"order": {
"average_balance": "desc"
}
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
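The nested request above does three things: it groups accounts by state, computes the average balance inside each group, and orders the groups by that average in descending order. The sketch below performs the same three steps locally. The account rows are hypothetical values invented for illustration, not taken from accounts.json, and the function name is an assumption.

```python
from collections import defaultdict

# Hypothetical rows for illustration only.
ACCOUNTS = [
    {"state": "TX", "balance": 1000},
    {"state": "TX", "balance": 3000},
    {"state": "MD", "balance": 5000},
    {"state": "ID", "balance": 1500},
]


def average_balance_by_state(accounts):
    """Group by state, average the balances, and sort by the average, descending."""
    groups = defaultdict(list)
    for account in accounts:
        groups[account["state"]].append(account["balance"])
    buckets = [
        {
            "key": state,
            "doc_count": len(balances),
            "average_balance": {"value": sum(balances) / len(balances)},
        }
        for state, balances in groups.items()
    ]
    buckets.sort(key=lambda bucket: bucket["average_balance"]["value"], reverse=True)
    return buckets


if __name__ == "__main__":
    for bucket in average_balance_by_state(ACCOUNTS):
        print(bucket)
```

With these made-up rows the order is MD (5000.0), TX (2000.0), then ID (1500.0).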
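Elasticsearch provides specialized aggregations for operating on multiple fields and for analyzing particular types of data such as dates, IP addresses, and geographic data. You can also feed the results of an individual aggregation into a pipeline aggregation for deeper analysis.
The core analysis capabilities that aggregations provide enable advanced features such as using machine learning to detect anomalies.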