Installing, Configuring, and Getting Started with Elasticsearch and Kibana on Ubuntu Server 18.04

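Contents

    • 0. Environment and Notes
    • 1. Installing the JDK
    • 2. Installing Elasticsearch
    • 3. Installing Kibana
    • 4. Testing the Interaction Between Kibana and Elasticsearch
    • 5. Bulk Indexing Documents
    • 6. Aggregations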

0. Environment and Notes

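Log in to Ubuntu Server 18.04 as a regular non-root user; create one if none exists.
The machine running Elasticsearch and Kibana has the IP 192.168.205.20 and is referred to below as the "test machine".
The machine running the browser used to view Kibana has the IP 192.168.205.10 and is referred to below as the "user machine".
This article mainly follows the official manual Elasticsearch Reference [7.3] | Elastic and the Chinese manual Elasticsearch: 权威指南 | Elastic (Elasticsearch: The Definitive Guide). The Chinese manual is badly out of date, so treat the original English reference as the primary source.
The usage covered here is all done in the Kibana Dev Tools Console: loading test data, single-document queries, complex queries, aggregation queries, and sorting.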

1. Installing the JDK

sudo apt install openjdk-8-jdk

2. Installing Elasticsearch

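Download the package elasticsearch-7.3.1-linux-x86_64.tar.gz
Extracting the archive is the whole installation: tar -zxvf elasticsearch-7.3.1-linux-x86_64.tar.gz
Run: start one master and two data nodes on a single machine, each command in its own terminal. Give the machine plenty of memory so that ES does not keep falling back on disk swap as virtual memory, which degrades performance.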

cd elasticsearch-7.3.1/bin
./elasticsearch
./elasticsearch -Epath.data=data2 -Epath.logs=log2
./elasticsearch -Epath.data=data3 -Epath.logs=log3

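Configuration (when learning on a single machine you can run everything unchanged at first and reach it through Kibana's externally accessible port): go to the config directory of elasticsearch and edit the configuration file elasticsearch.yml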

vi elasticsearch.yml

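Replace the IP in network.host: 127.0.0.1 with 0.0.0.0
To avoid the startup error about vm.max_map_count (the limit on memory map areas) being too low, edit the following configuration file: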

sudo vi /etc/sysctl.conf

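Append at the end: vm.max_map_count=262144, then run sudo sysctl -p to apply it without rebooting.
The following two elasticsearch.yml settings are optional: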
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
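
As a quick sanity check on the sysctl change above, the following minimal sketch parses sysctl.conf-style text and reports whether vm.max_map_count meets the 262144 value used in this article. The function names are hypothetical helpers, and the parsing is deliberately simplified (it only handles key=value lines and # comments).

```python
REQUIRED_MAX_MAP_COUNT = 262144  # the value this article appends to /etc/sysctl.conf


def read_max_map_count(sysctl_text):
    """Return the last vm.max_map_count set in sysctl.conf-style text, or None."""
    value = None
    for raw_line in sysctl_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "vm.max_map_count":
            value = int(val.strip())  # later lines override earlier ones
    return value


def is_sufficient(sysctl_text, required=REQUIRED_MAX_MAP_COUNT):
    """True when the setting is present and at least the required value."""
    value = read_max_map_count(sysctl_text)
    return value is not None and value >= required


sample = "# kernel tuning\nvm.swappiness = 10\nvm.max_map_count=262144\n"
print(read_max_map_count(sample))  # 262144
print(is_sufficient(sample))       # True
```

On the test machine, the same functions could be fed the contents of /etc/sysctl.conf read with open().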

3. Installing Kibana

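Download the package from Download Kibana Free • Get Started Now | Elastic
Install and run (Elasticsearch must already be running): extract the archive, go to kibana_home/bin, and run ./kibana
Configuration: open $KIBANA_HOME/config/kibana.yml, find server.host, and change it to server.host: "192.168.205.20" so that Kibana can be reached from other machines.
On the user machine, open a browser and visit 192.168.205.20:5601 to check that Kibana loads.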

4. Testing the Interaction Between Kibana and Elasticsearch

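Open Dev Tools, the third button from the bottom on the left side of the Kibana page, and go to the Console tab. Enter the following, then click the green run button on the right or press Ctrl+Enter to run it and load the test data.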

PUT /megacorp/_doc/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}
PUT /megacorp/_doc/2
{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}
PUT /megacorp/_doc/3
{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}

To query a single document, use GET /megacorp/_doc/1, where megacorp is the index name, _doc is the type name, and 1 is the document ID. The output is:

{
  "_index" : "megacorp",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [
      "sports",
      "music"
    ]
  }
}

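To query multiple documents, use GET /megacorp/_search. The output is shown below; hits is the array holding the search results. A query string can be appended to _search, for example GET /megacorp/_search?q=last_name:Smith finds the megacorp employees whose last name is Smith.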

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      }
    ]
  }
}
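
To make the response layout concrete, here is a minimal sketch that walks a _search response already parsed into a Python dict. The function name summarize_hits is a hypothetical helper, and the sample response is a trimmed copy of the output above rather than a live result.

```python
def summarize_hits(response):
    """Return (total, rows) where rows is a list of (id, score, source) tuples."""
    hits = response["hits"]
    total = hits["total"]["value"]  # in 7.x, total is an object with value and relation
    rows = [(h["_id"], h["_score"], h["_source"]) for h in hits["hits"]]
    return total, rows


# Trimmed copy of the response shown above (one hit kept for brevity).
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {
                "_index": "megacorp",
                "_id": "1",
                "_score": 1.0,
                "_source": {"first_name": "John", "last_name": "Smith"},
            }
        ],
    }
}

total, rows = summarize_hits(sample_response)
for doc_id, score, source in rows:
    print(doc_id, score, source["first_name"], source["last_name"])
```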

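More complex searches should use the query DSL. The request below uses the query keywords bool, must, match, filter, range, and gt; the structure of its output is similar to that of the body-less search above.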

GET /megacorp/_search
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "smith" 
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            }
        }
    }
}
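
To show what the bool query above selects, the following sketch emulates it locally on the three sample employees. This is a simplified stand-in, not Elasticsearch's implementation: it assumes match behaves like a lowercase, whitespace-tokenized comparison, and the helper names are hypothetical.

```python
EMPLOYEES = [
    {"_id": "1", "first_name": "John", "last_name": "Smith", "age": 25},
    {"_id": "2", "first_name": "Jane", "last_name": "Smith", "age": 32},
    {"_id": "3", "first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def tokens(text):
    """Very rough stand-in for text analysis: lowercase and split on whitespace."""
    return text.lower().split()


def matches_bool_query(doc, last_name, min_age_exclusive):
    """must: match on last_name; filter: range on age with gt."""
    must_ok = any(t in tokens(doc["last_name"]) for t in tokens(last_name))
    filter_ok = doc["age"] > min_age_exclusive
    return must_ok and filter_ok


matching_ids = [d["_id"] for d in EMPLOYEES if matches_bool_query(d, "smith", 30)]
print(matching_ids)  # ['2']
```

Only Jane Smith satisfies both clauses: John Smith is 25, and Douglas Fir has a different last name.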

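Below is a full-text search example. The output contains _score and max_score, which are the relevance scores ES assigns to each result (that is, each document, the unit of data in ES) against the search terms; higher scores come first. The second result matches only part of the search terms (rock but not climbing), so its score is lower. For exact phrase matching, use the keyword match_phrase instead of match.
The request is: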

GET /megacorp/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

The output is:

{
  "took" : 34,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.4167402,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.4167402,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.45895916,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      }
    ]
  }
}
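
The difference between match and match_phrase can be sketched locally as follows. This is a simplified emulation under the assumption of lowercase, whitespace tokenization; it ignores scoring entirely, and the function names are hypothetical.

```python
ABOUT = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
}


def tokens(text):
    """Rough stand-in for text analysis: lowercase and split on whitespace."""
    return text.lower().split()


def match(query, text):
    """match: a document qualifies if it contains ANY of the query terms."""
    doc_tokens = set(tokens(text))
    return any(t in doc_tokens for t in tokens(query))


def match_phrase(query, text):
    """match_phrase: all query terms must appear consecutively, in order."""
    q, d = tokens(query), tokens(text)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


match_ids = [i for i, t in ABOUT.items() if match("rock climbing", t)]
phrase_ids = [i for i, t in ABOUT.items() if match_phrase("rock climbing", t)]
print(match_ids)   # ['1', '2']
print(phrase_ids)  # ['1']
```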

Elasticsearch can return search results with highlighting markup, as shown here:

GET /megacorp/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

The response contains a highlight field in which the parts matching the search terms are wrapped in <em></em> tags:

{
  "took" : 356,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.4167402,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.4167402,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        },
        "highlight" : {
          "about" : [
            "I love to go <em>rock</em> <em>climbing</em>"
          ]
        }
      }
    ]
  }
}

5. Bulk Indexing Documents

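This section uses the official accounts.json sample data set; download it into the ES bin directory. Bulk indexing is significantly faster than indexing documents one at a time because it removes a large number of network round trips. The optimal batch size depends on several factors, such as document size and complexity, the indexing and search load, and the resources available to the cluster. As a rule of thumb, start with batches of 1,000 to 5,000 documents and a total payload of 5 to 15 MB, then experiment until you find the best value for your environment. On the test machine's console, run the following commands to load accounts.json into ES in one request and bulk index it: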

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"

After the first command finishes, the ES log shows the following:

[bank] creating index, cause [auto(bulk api)], templates [], shards [1]/[1], mappings []

After the second command finishes, the console prints the following (your uuid will likely differ):

health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank  aY2jy79TT9WVCI8qH0S1VQ   1   1       1000            0    414.2kb        414.2kb
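
The body sent to _bulk is newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The sketch below builds such bodies in batches, which is one way to experiment with the batch sizes discussed above. The helper names and the three sample accounts are hypothetical, and using account_number as the _id is an assumption made for illustration.

```python
import json


def to_bulk_body(docs, id_field="account_number"):
    """Build an NDJSON _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline


def chunked(docs, batch_size):
    """Yield consecutive slices of at most batch_size documents."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


# Hypothetical sample documents, not taken from accounts.json.
accounts = [
    {"account_number": 1, "balance": 1000, "state": "TX"},
    {"account_number": 2, "balance": 2500, "state": "MD"},
    {"account_number": 3, "balance": 400, "state": "ID"},
]

bodies = [to_bulk_body(batch) for batch in chunked(accounts, 2)]
print(len(bodies))  # 2
print(bodies[0], end="")
```

Each body could then be posted to localhost:9200/bank/_bulk with the Content-Type: application/json header, the same way the curl command above sends the file.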

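Searching these documents works the same way as for the individually indexed documents above, and a request body can be used for complex searches. By default, the hits section of the response returns the first 10 documents matching the query. The from and size fields in the request body are for pagination and are optional.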

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}
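
The relation between a page number and from/size can be sketched as follows. The function names are hypothetical, and the 1-based page numbering is an assumption of this example, not something Elasticsearch defines.

```python
def page_to_from_size(page, page_size=10):
    """Translate a 1-based page number into the from/size pair used by _search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


def search_body(page, page_size=10):
    """Build a request body like the one above for the given page."""
    body = {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
    }
    body.update(page_to_from_size(page, page_size))
    return body


print(page_to_from_size(1))  # {'from': 0, 'size': 10}
print(page_to_from_size(2))  # {'from': 10, 'size': 10}
```

The request above, with from 10 and size 10, therefore corresponds to the second page of ten results.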

6. Aggregations

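Example 1: the most popular interests among employees whose last name is Smith. Note that unless fielddata has been enabled on a text field, you should aggregate on fieldname.keyword instead, as with interests.keyword in this example.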

GET /megacorp/_search
{
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests.keyword",
        "size": 10
      }
    }
  }
}

The output is as follows (the query hits are omitted; only the aggregation part is shown):

...
"aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "music",
          "doc_count" : 2
        },
        {
          "key" : "sports",
          "doc_count" : 1
        }
      ]
    }
  }
}
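
The bucket counts above can be reproduced locally with a small sketch of what a terms aggregation computes: the number of documents containing each distinct value. This is an illustrative emulation with hypothetical helper names, restricted to employees named Smith as the example describes.

```python
from collections import Counter

EMPLOYEES = [
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
    {"last_name": "Fir", "interests": ["forestry"]},
]


def terms_aggregation(docs, field, size=10):
    """Count documents per distinct value; order by count desc, then key asc."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc[field]))  # a document counts once per distinct value
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


smiths = [d for d in EMPLOYEES if d["last_name"].lower() == "smith"]
buckets = terms_aggregation(smiths, "interests")
print(buckets)
# [{'key': 'music', 'doc_count': 2}, {'key': 'sports', 'doc_count': 1}]
```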

Example 2: count the number of accounts in each state.

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

The output is:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 743,
      "buckets" : [
        {
          "key" : "TX",
          "doc_count" : 30
        },
        {
          "key" : "MD",
          "doc_count" : 28
        },
        {
          "key" : "ID",
          "doc_count" : 27
        },
        {
          "key" : "AL",
          "doc_count" : 25
        },
        {
          "key" : "ME",
          "doc_count" : 25
        },
        {
          "key" : "TN",
          "doc_count" : 25
        },
        {
          "key" : "WY",
          "doc_count" : 25
        },
        {
          "key" : "DC",
          "doc_count" : 24
        },
        {
          "key" : "MA",
          "doc_count" : 24
        },
        {
          "key" : "ND",
          "doc_count" : 24
        }
      ]
    }
  }
}

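Example 3: nested aggregations, averages, sorting, and so on. The Console in Kibana Dev Tools offers autocompletion as you type queries. This example builds on Example 2 by adding a nested "average balance per state" aggregation and sorting the states by that value.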

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
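
To illustrate what the nested aggregation computes, here is a minimal local emulation: group accounts by state, average the balances in each group, and order the groups by that average in descending order. The sample accounts are hypothetical (not taken from accounts.json), and the function name is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical sample data for illustration only.
ACCOUNTS = [
    {"state": "TX", "balance": 1000},
    {"state": "TX", "balance": 3000},
    {"state": "MD", "balance": 5000},
    {"state": "ID", "balance": 500},
    {"state": "ID", "balance": 1500},
]


def average_balance_by_state(accounts, size=10):
    """Emulate terms on state with a nested avg on balance, ordered by avg desc."""
    balances = defaultdict(list)
    for account in accounts:
        balances[account["state"]].append(account["balance"])
    buckets = [
        {
            "key": state,
            "doc_count": len(values),
            "average_balance": {"value": sum(values) / len(values)},
        }
        for state, values in balances.items()
    ]
    buckets.sort(key=lambda b: b["average_balance"]["value"], reverse=True)
    return buckets[:size]


for bucket in average_balance_by_state(ACCOUNTS):
    print(bucket["key"], bucket["doc_count"], bucket["average_balance"]["value"])
# MD 1 5000.0
# TX 2 2000.0
# ID 2 1000.0
```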

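For analyzing particular types of data, such as dates, IP addresses, and geo data, and for operating on multiple fields, Elasticsearch provides specialized aggregations. In addition, the results of individual aggregations can be fed into pipeline aggregations for deeper analysis.
The core analysis capabilities provided by aggregations enable advanced features such as using machine learning to detect anomalies.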