Elasticsearch Curator Tutorial

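In day-to-day operations, keeping an Elasticsearch cluster running stably usually means doing a number of things on a schedule: purging old data, merging segments, taking and restoring backups, and so on. Anyone who can program can do all of this by calling the Elasticsearch API from whatever language they like. But before reinventing the wheel, we should first make sure that nobody else has hit the same problem and that no general-purpose tool already covers our needs; only then should we build our own. The Elasticsearch ecosystem is mature, and Curator, a tool written in Python and provided by elastic.co, already offers solid solutions for most operational scenarios. In most cases Curator alone is enough for routine maintenance.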

Installing Curator

For installation details, see the official site. If pip is already installed on the server, installation is a single pip install:

pip install elasticsearch-curator

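However, many production environments do not have pip installed, and because of firewalls they cannot reach https://packages.elastic.co directly either. As a result, most of the installation methods described on the official site are not actually applicable.
The workaround is to download the complete binary package (DEB or RPM) and install it directly on the server.
Downloads: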
Elasticsearch Curator 5.2.0 Binary Package (DEB)
Elasticsearch Curator 5.2.0 Binary Package for newer Debian 9 based systems (DEB)
Elasticsearch Curator 5.2.0 RHEL/CentOS 6 Binary Package (RPM)
Elasticsearch Curator 5.2.0 RHEL/CentOS 7 Binary Package (RPM)

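Curator's interfaces

Curator provides two interfaces: curator and curator_cli.

The curator_cli interface

I cover this interface first because it is well suited to debugging, but for real operational work I still recommend curator.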

$ curator_cli --help
Usage: curator_cli [OPTIONS] COMMAND [ARGS]...

Options:
  --config PATH       Path to configuration file. Default:
                      ~/.curator/curator.yml
  --host TEXT         Elasticsearch host.
  --url_prefix TEXT   Elasticsearch http url prefix.
  --port TEXT         Elasticsearch port.
  --use_ssl           Connect to Elasticsearch through SSL.
  --certificate TEXT  Path to certificate to use for SSL validation.
  --client-cert TEXT  Path to file containing SSL certificate for client auth.
  --client-key TEXT   Path to file containing SSL key for client auth.
  --ssl-no-validate   Do not validate SSL certificate
  --http_auth TEXT    Use Basic Authentication ex: user:pass
  --timeout INTEGER   Connection timeout in seconds.
  --master-only       Only operate on elected master node.
  --dry-run           Do not perform any changes.
  --loglevel TEXT     Log level
  --logfile TEXT      log file
  --logformat TEXT    Log output format [default|logstash|json].
  --version           Show the version and exit.
  --help              Show this message and exit.

Commands:
  allocation        Shard Routing Allocation
  close             Close indices
  delete_indices    Delete indices
  delete_snapshots  Delete snapshots
  forcemerge        forceMerge index/shard segments
  open              Open indices
  replicas          Change replica count
  show_indices      Show indices
  show_snapshots    Show snapshots
  snapshot          Snapshot indices

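Those are the basic commands and options. The reason I do not recommend curator_cli for routine operations is that it can run only one action per invocation, and writing a complex filter on the command line is painful. In practice, curator_cli is mostly used as an aid while writing the action.yml for curator, or for simple tests.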

Examples

List all indices

curator_cli --host 10.33.4.160 --port 9200 show_indices --verbose

Output:

.kibana                            open    54.9KB        6   1   1 2017-09-06T02:13:00Z
.monitoring-alerts-6               open     6.5KB        1   1   1 2017-09-06T02:14:01Z
.monitoring-es-6-2017.10.12        open   376.1MB   556576   1   1 2017-10-12T00:00:06Z
.monitoring-es-6-2017.10.13        open    76.8MB    96220   1   1 2017-10-13T00:00:08Z
.monitoring-kibana-6-2017.10.12    open     3.3MB     8638   1   1 2017-10-12T00:00:08Z
.monitoring-kibana-6-2017.10.13    open     1.3MB     3390   1   1 2017-10-13T00:00:09Z
.monitoring-logstash-6-2017.10.12  open     2.4MB     8211   1   1 2017-10-12T01:09:48Z
.monitoring-logstash-6-2017.10.13  open     1.1MB     3390   1   1 2017-10-13T00:00:08Z
.reporting-2017.09.17              open   376.9KB        2   5   1 2017-09-21T09:58:01Z
.triggered_watches                 open     9.2MB       19   1   1 2017-09-06T02:14:01Z
.watcher-history-3-2017.10.12      open     6.0MB     7200   1   1 2017-10-12T00:00:03Z
.watcher-history-3-2017.10.13      open     2.4MB     2830   1   1 2017-10-13T00:00:03Z
.watches                           open    23.6KB        4   1   1 2017-09-06T02:13:00Z
syslog-network-2017.10.11          open    26.1MB   109195   5   1 2017-10-13T02:20:58Z
syslog-network-2017.10.12          open    11.5KB        1   5   1 2017-10-12T20:11:28Z
syslog-platform-2017.10.11         open  1019.5MB  4004662   5   1 2017-10-13T02:36:11Z
syslog-platform-2017.10.12         open    16.0MB    61915   5   1 2017-10-12T03:17:38Z
syslog-platform-2017.10.13         open    20.8MB    90628   5   1 2017-10-12T23:52:10Z
watcher                            open    69.0KB        5   5   1 2017-09-21T02:23:10Z
watcher_alarms-2017.10.11          open   365.5KB        1   5   1 2017-10-11T08:00:06Z

close index

curator_cli --host 10.33.4.160 --port 9200 close --filter_list '[{"filtertype":"age","source":"creation_date","direction":"older","unit":"days","unit_count":1},{"filtertype":"pattern","kind":"prefix","value":"syslog-"}]'
2017-10-13 17:30:21,573 INFO      Closing selected indices: ['syslog-platform-2017.10.12']
2017-10-13 17:30:21,713 INFO      Singleton "close" action completed.

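The command above uses --filter_list to select every index that was created more than 1 day ago and whose name starts with syslog-, and then closes them. As the example shows, curator_cli commands are hard to read.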
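One way to keep the --filter_list argument readable is to write the filters as ordinary data and let a script serialize and quote them. The following is a minimal sketch in Python using only the standard library; the helper name build_close_command is hypothetical and is not part of Curator. It only assembles the command string from the example above and does not run it.

```python
import json
import shlex

# The same two filters as in the close example, written as Python data.
filters = [
    {
        "filtertype": "age",
        "source": "creation_date",
        "direction": "older",
        "unit": "days",
        "unit_count": 1,
    },
    {"filtertype": "pattern", "kind": "prefix", "value": "syslog-"},
]


def build_close_command(host, port, filter_list):
    """Return a shell-ready curator_cli close command (hypothetical helper)."""
    filter_json = json.dumps(filter_list, separators=(",", ":"))
    args = [
        "curator_cli", "--host", host, "--port", str(port),
        "close", "--filter_list", filter_json,
    ]
    # shlex.quote single-quotes the JSON so the shell passes it through unchanged.
    return " ".join(shlex.quote(a) for a in args)


command = build_close_command("10.33.4.160", 9200, filters)
print(command)
```

Keeping the filters as data also makes it easy to copy them into an action file later, since the keys are the same.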

The curator interface

Invoking this interface is simple:

curator [--config CONFIG.YML] [--dry-run] ACTION_FILE.YML

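--config is followed by the configuration file, and then comes the action file. An action file can contain a whole series of actions, so all of our operations can live in one place. Compared with curator_cli, the centralized config and action files of the curator interface make it easy to reuse settings and are easier to read and maintain.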
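If curator is launched from a script rather than typed by hand, the invocation can be assembled as an argument list. This is a small sketch, assuming the /opt/curator paths used in the crontab entry later in this tutorial; build_curator_command is a hypothetical helper, not something Curator provides. It uses only the --config and --dry-run flags shown in the usage line above.

```python
import shlex


def build_curator_command(action_file, config=None, dry_run=False):
    """Build the argument list for one curator run (hypothetical helper)."""
    args = ["curator"]
    if config:
        args += ["--config", config]
    if dry_run:
        # With --dry-run, curator reports what it would do without changing anything.
        args.append("--dry-run")
    args.append(action_file)
    return args


args = build_curator_command(
    "/opt/curator/action.yml",
    config="/opt/curator/curator.yml",
    dry_run=True,
)
print(" ".join(shlex.quote(a) for a in args))
```

The resulting list can be handed to subprocess.run(args, check=True) on a machine where curator is installed. Running with dry_run=True first is a cheap way to check a new action file before it touches any index.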

configuration

The configuration file is usually named curator.yml, but any name works as long as you point to it with --config.

---
# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - 10.33.4.160
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile: /var/log/curator.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

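The configuration is straightforward, and the meaning of each parameter is clear. The one thing to point out is that a parameter with no value should simply be left empty; do not write None, which would be read as a string.

Also, if logfile is left empty, logs go to stdout by default. Writing them to a file, as in the example above, is recommended.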

action

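Each action consists of three parts:
- action: the operation to perform
- options: the optional settings for that operation
- filters: the conditions that select which indices the action applies to

Available actions:

Compared with curator_cli, there are additional actions such as alias, restore, and shrink: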
- alias
- allocation
- close
- cluster_routing
- create_index
- delete_indices
- delete_snapshots
- forcemerge
- index_settings
- open
- reindex
- replicas
- restore
- rollover
- shrink
- snapshot

options:

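There are many options, so they are not all covered here. Read the example further down to understand the most important ones, and look up the rest in the official documentation: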
- allocation_type
- continue_if_exception
- count
- delay
- delete_after
- delete_aliases
- disable_action
- extra_settings
- ignore_empty_list
- ignore_unavailable
- include_aliases
- include_global_state
- indices
- key
- max_age
- max_docs
- max_num_segments
- max_wait
- migration_prefix
- migration_suffix
- name
- node_filters
- number_of_replicas
- number_of_shards
- partial
- post_allocation
- preserve_existing
- refresh
- remote_aws_key
- remote_aws_region
- remote_aws_secret_key
- remote_certificate
- remote_client_cert
- remote_client_key
- remote_filters
- remote_ssl_no_validate
- remote_url_prefix
- rename_pattern
- rename_replacement
- repository
- requests_per_second
- request_body
- retry_count
- retry_interval
- routing_type
- setting
- shrink_node
- shrink_prefix
- shrink_suffix
- slices
- skip_repo_fs_check
- timeout
- timeout_override
- value
- wait_for_active_shards
- wait_for_completion
- wait_interval
- warn_if_no_indices

filters

The most commonly used filtertypes are pattern and age:
- age
- alias
- allocated
- closed
- count
- forcemerged
- kibana
- none
- opened
- pattern
- period
- space
- state
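To make the two most common filtertypes concrete, here is a rough sketch of what a pattern filter (kind: prefix) combined with an age filter (source: name, direction: older) selects. It is an illustration written with the Python standard library, not Curator's actual implementation; the function names are hypothetical, only the %Y, %m and %d tokens are handled, and "older" is approximated as "the date in the name is earlier than now minus unit_count days".

```python
import re
from datetime import datetime, timedelta


def timestring_to_regex(timestring):
    """Turn a strftime pattern such as '%Y.%m.%d' into a regex (%Y, %m, %d only)."""
    widths = {"%Y": r"\d{4}", "%m": r"\d{2}", "%d": r"\d{2}"}
    pattern = re.escape(timestring)
    for token, digits in widths.items():
        pattern = pattern.replace(re.escape(token), digits)
    return re.compile(pattern)


def select_indices(names, prefix, timestring, unit_count, now):
    """Keep names that start with prefix and whose embedded date is old enough."""
    date_re = timestring_to_regex(timestring)
    cutoff = now - timedelta(days=unit_count)
    selected = []
    for name in names:
        if not name.startswith(prefix):   # pattern filter, kind: prefix
            continue
        match = date_re.search(name)
        if match is None:                 # no date in the name, so age cannot apply
            continue
        index_date = datetime.strptime(match.group(0), timestring)
        if index_date < cutoff:           # age filter, direction: older
            selected.append(name)
    return selected


names = [
    ".monitoring-es-6-2017.10.12",
    "syslog-network-2017.10.11",
    "syslog-platform-2017.10.12",
    "syslog-platform-2017.10.13",
    "watcher",
]
now = datetime(2017, 10, 13, 17, 30)
selected = select_indices(names, "syslog-", "%Y.%m.%d", 1, now)
print(selected)
```

Filters in a list are combined, so an index has to pass every one of them to be selected. That is why the .monitoring index and the undated watcher index are dropped here even though one of them is old enough.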

Example:

---
# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
#
# Also remember that all examples have 'disable_action' set to True.  If you
# want to use this action as a template, be sure to set this to False after
# copying it.
actions:
  1:
    action: delete_indices
    description: >-
      Delete metric indices older than 3 days (based on index name), for
      .monitoring-es-6-
      .monitoring-kibana-6-
      .monitoring-logstash-6-
      .watcher-history-3-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
 #     disable_action: True
    filters:
    - filtertype: pattern
      kind: regex
      value: '^(\.monitoring-(es|kibana|logstash)-6-|\.watcher-history-3-).*$'
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3

  2:
    action: close
    description: >-
      Close indices older than 30 days (based on index name), for syslog-
      prefixed indices.
    options:
      ignore_empty_list: True
      delete_aliases: False
#      disable_action: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: syslog-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30

  3:
    action: forcemerge
    description: >-
      forceMerge syslog- prefixed indices older than 2 days (based on index
      name) to 2 segments per shard.  Delay 120 seconds between each
      forceMerge operation to allow the cluster to quiesce. Skip indices that
      have already been forcemerged to the minimum number of segments to avoid
      reprocessing.
    options:
      ignore_empty_list: True
      max_num_segments: 2
      delay: 120
      timeout_override:
      continue_if_exception: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: syslog-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 2
    - filtertype: forcemerged
      max_num_segments: 2
      exclude:

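Actions are defined in a YAML file, with structure expressed through indentation. The example defines 3 actions, which are executed in order. These three tasks (1, 2, 3) do not depend on one another; when there are dependencies, make sure the action that others depend on is written first.

The three tasks are: delete indices, close expired indices, and merge index segments.

Pay particular attention to the options. When a file contains multiple actions that do not depend on each other, be sure to set ignore_empty_list: True. It means that if the filters find no index matching the conditions, the action is skipped. If it is set to False and the first action finds no matching index, the whole curator run is aborted.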
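The effect of ignore_empty_list on a multi-action run can be pictured with a toy model. The sketch below is not Curator code: the EmptyIndexList exception, the run_actions function, and the simplified prefix-only matching are all assumptions made for illustration. It only mirrors the behavior described above, where an empty result either skips the action or stops the whole run.

```python
class EmptyIndexList(Exception):
    """Raised in this toy model when an action's filters select no indices."""


def run_actions(actions, indices):
    """Run actions in key order; return the names of the actions that completed."""
    completed = []
    for key in sorted(actions):
        action = actions[key]
        matched = [i for i in indices if i.startswith(action["prefix"])]
        if not matched:
            if action.get("ignore_empty_list", False):
                continue  # skip this action and carry on with the next one
            raise EmptyIndexList(
                f"action {key} ({action['action']}) matched nothing"
            )
        completed.append(action["action"])
    return completed


indices = ["syslog-platform-2017.10.12"]
actions = {
    1: {"action": "delete_indices", "prefix": ".monitoring-", "ignore_empty_list": True},
    2: {"action": "close", "prefix": "syslog-", "ignore_empty_list": True},
}
print(run_actions(actions, indices))  # action 1 is skipped, action 2 runs

# With ignore_empty_list turned off, the empty first action stops the run
# before action 2 is ever reached.
actions[1]["ignore_empty_list"] = False
try:
    run_actions(actions, indices)
except EmptyIndexList as err:
    print("aborted:", err)
```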

The official site has examples for every kind of action.

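Running curator periodically with crontab

Curator is a command-line tool, and what we need is automated, periodic maintenance, so we need something like crontab. Most Linux distributions ship with it. Edit /etc/crontab: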

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed

0 0 * * * root curator --config /opt/curator/curator.yml /opt/curator/action.yml

This runs once a day at midnight and performs the delete index, close index, and merge segment actions.
