Elasticsearch Cluster Operations

I. Index Management

1. Create an index

PUT test-2019-03
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 1,
      "routing": {
        "allocation": {
          "include": {
            "type": "hot"
          }
        }
      }
    }
  }
}
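
When one index is created per month, the request above can be generated from a script. The following is a minimal sketch, not a definitive implementation: the helper name `create_index_request` and its defaults are hypothetical, and it assumes the nodes carry a custom `type` attribute such as `hot`. It only builds the request and does not contact a cluster.

```python
import json
from datetime import date


def create_index_request(prefix, day, shards=10, replicas=1, node_type="hot"):
    """Return (method, path, body) for creating a monthly index.

    Hypothetical helper: the name and the defaults are illustrative only.
    """
    name = f"{prefix}-{day:%Y-%m}"
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                # Keep shards on nodes whose custom "type" attribute matches.
                "routing": {"allocation": {"include": {"type": node_type}}},
            }
        }
    }
    return "PUT", f"/{name}", body


method, path, body = create_index_request("test", date(2019, 3, 1))
print(method, path)
print(json.dumps(body, indent=2))
```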

2. Delete an index

DELETE test-2019-03

DELETE test*

The index name accepts the * wildcard, so check the pattern carefully before running a delete.

3. Modify index settings

Change the number of replicas:

PUT test-2019-03/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

4. Rebuild an index (Reindex)

POST _reindex
{
  "source": {
    "index": ["test-2018-07-*"]
  },
  "dest": {
    "index": "test-2018-07"
  }
}

Check running reindex tasks:

GET _tasks?detailed=true&actions=*reindex
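
The destination of a reindex is a concrete index name, so it has to follow the index naming rules; a stray space or an uppercase letter makes the request fail. The sketch below checks a subset of the documented rules before a request is sent. The function name `index_name_problems` is hypothetical, and the check is an illustration rather than a full copy of the server-side validation.

```python
# Characters that may not appear in an index name: \ / * ? " < > | space , #
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def index_name_problems(name):
    """Return the reasons why `name` is not a valid concrete index name."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(FORBIDDEN_CHARS & set(name))
    if bad:
        problems.append(f"contains forbidden characters: {bad}")
    if name.startswith(("-", "_", "+")):
        problems.append("must not start with '-', '_' or '+'")
    if name in (".", ".."):
        problems.append("must not be '.' or '..'")
    if len(name.encode("utf-8")) > 255:
        problems.append("must not exceed 255 bytes")
    return problems


print(index_name_problems("test-2018-07"))
print(index_name_problems("test -2018-07"))
```

An empty list means none of the checked rules is violated.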

5. Delete documents with delete_by_query

POST indexApple-2019-02/_delete_by_query?conflicts=proceed
{
  "query": {
    "bool": {
      "must": {
        "term": { "appIndex": "apple" }
      },
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2019-02-23 08:00:00",
            "lte": "2019-02-23 22:00:00",
            "time_zone": "+08:00"
          }
        }
      }
    }
  }
}

Check running delete_by_query tasks:

GET _tasks?detailed=true&actions=*/delete/byquery
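
The time_zone parameter in the range filter tells Elasticsearch to read the gte and lte strings as local times at that offset and convert them to UTC before comparing, since dates are stored in UTC. The arithmetic can be checked with a short Python sketch; the helper name `to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_text, offset_hours):
    """Interpret `local_text` at a fixed UTC offset and return the UTC datetime."""
    tz = timezone(timedelta(hours=offset_hours))
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


start = to_utc("2019-02-23 08:00:00", 8)
end = to_utc("2019-02-23 22:00:00", 8)
print(start.isoformat())  # 2019-02-23T00:00:00+00:00
print(end.isoformat())    # 2019-02-23T14:00:00+00:00
```

With the +08:00 offset, the query above therefore deletes documents whose timestamp falls between 00:00 and 14:00 UTC on 2019-02-23.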

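II. Cluster Settings

Cluster-wide settings are changed through the _cluster/settings API. Each JSON body in this section is sent to it with -d:

curl -XPUT -H 'Content-Type: application/json' http://{host}:{port}/_cluster/settings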

1. Shard Allocation Settings

{"persistent":{"cluster.routing.allocation.enable": "all"}}

Controls which kinds of shards may be allocated. There are four options:

all - (default) Allows shard allocation for all kinds of shards.

primaries - Allows shard allocation only for primary shards.

new_primaries - Allows shard allocation only for primary shards for new indices.

none - No shard allocations of any kind are allowed for any indices.

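{"persistent":{"cluster.routing.allocation.node_concurrent_recoveries": 8}}
Sets how many shard recoveries may run concurrently on one node (incoming and outgoing).

{"persistent":{"cluster.routing.allocation.node_initial_primaries_recoveries": 16}}
Sets how many unassigned primary shards a node recovers concurrently from local disk after it restarts.

{"persistent":{"indices.recovery.max_bytes_per_sec": "500mb"}}
Limits recovery throughput, in bytes per second.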

2. Shard Rebalancing Settings

{"persistent":{"cluster.routing.rebalance.enable": "all"}}

Controls which kinds of shards may be rebalanced. There are four options:

all - (default) Allows shard balancing for all kinds of shards.

primaries - Allows shard balancing only for primary shards.

replicas - Allows shard balancing only for replica shards.

none - No shard balancing of any kind is allowed for any indices.

{"persistent":{"cluster.routing.allocation.allow_rebalance": "indices_all_active"}}

Controls when rebalancing is allowed. There are three options:

always - Always allow rebalancing.

indices_primaries_active - Only when all primaries in the cluster are allocated.

indices_all_active - (default) Only when all shards (primaries and replicas) in the cluster are allocated.

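{"transient":{"cluster.routing.allocation.cluster_concurrent_rebalance": 8}}
Sets how many shard rebalancing operations may run concurrently across the cluster. It only limits rebalancing; it has no effect on the concurrency of recovery or of other shard movements.

{"transient":{"cluster.routing.allocation.cluster_concurrent_rebalance": 0}}
Disables cluster rebalancing.

{"transient":{"cluster.routing.allocation.cluster_concurrent_rebalance": null}}
Re-enables cluster rebalancing by resetting the setting to its default.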
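
Sending null for a setting removes the override, so the default applies again. In Python the same effect comes from serializing None. The following is a minimal sketch that only builds the request bodies; the helper name `cluster_settings_body` is hypothetical.

```python
import json

KEY = "cluster.routing.allocation.cluster_concurrent_rebalance"


def cluster_settings_body(scope, settings):
    """Build a _cluster/settings body; None values are serialized as JSON null."""
    if scope not in ("persistent", "transient"):
        raise ValueError("scope must be 'persistent' or 'transient'")
    return json.dumps({scope: settings})


disable = cluster_settings_body("transient", {KEY: 0})     # stop rebalancing
restore = cluster_settings_body("transient", {KEY: None})  # back to the default
print(disable)
print(restore)
```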

3. Disk-based Shard Allocation

# Set the low disk watermark for data nodes to 80%
{"transient":{"cluster.routing.allocation.disk.watermark.low":"80%"}}
# Set the high disk watermark for data nodes to 90%
{"transient":{"cluster.routing.allocation.disk.watermark.high":"90%"}}
# Remove the custom values so the cluster falls back to the defaults
{"transient":{"cluster.routing.allocation.disk.watermark.low": null}}
{"transient":{"cluster.routing.allocation.disk.watermark.high": null}}
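
The two watermarks act as thresholds on a node's disk usage: above the low watermark no further shards are allocated to the node, and above the high watermark Elasticsearch tries to move shards off it. The sketch below illustrates that decision; the function name and the returned labels are hypothetical, and the default thresholds of 85% and 90% are the documented defaults.

```python
def disk_watermark_state(used_percent, low=85.0, high=90.0):
    """Classify a data node by disk usage relative to the two watermarks."""
    if used_percent > high:
        # Elasticsearch attempts to relocate shards away from this node.
        return "above_high"
    if used_percent > low:
        # No new shards are allocated to this node.
        return "above_low"
    return "ok"


for used in (70, 82, 93):
    print(used, disk_watermark_state(used, low=80, high=90))
```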

4. Allocation Policy

These settings state explicitly whether shards may be allocated to particular nodes. They exist at the index level and at the cluster level:

  • index.routing.allocation.require.{attribute}
  • index.routing.allocation.include.{attribute}
  • index.routing.allocation.exclude.{attribute}
  • cluster.routing.allocation.require.{attribute}
  • cluster.routing.allocation.include.{attribute}
  • cluster.routing.allocation.exclude.{attribute}

require means shards must be placed on the matching nodes, include means shards may be placed on the matching nodes, and exclude means shards must not be placed on the matching nodes. Cluster-level settings take precedence over index-level ones: if an index includes a node but the cluster excludes it, the node ends up excluded.

# Exclude one node from the cluster by IP (10.100.0.11)
{"transient":{"cluster.routing.allocation.exclude._ip":"10.100.0.11"}}
# Exclude several nodes from the cluster by IP (10.100.0.11 and 10.100.0.12)
{"transient":{"cluster.routing.allocation.exclude._ip":"10.100.0.11,10.100.0.12"}}
# Remove the node exclusion
{"transient":{"cluster.routing.allocation.exclude._ip": null}}
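
The interaction between index-level and cluster-level filters can be modeled in a few lines. This is a simplified sketch, not the server's implementation: it assumes a shard may go to a node only when both levels allow it, and it uses Python's `fnmatchcase` to approximate wildcard matching. The node dictionary and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _matches(node, attr, patterns, combine=any):
    """Match a node attribute against comma-separated patterns."""
    value = node.get(attr)
    if value is None:
        return False
    return combine(fnmatchcase(value, p.strip()) for p in patterns.split(","))


def filter_allows(node, require=None, include=None, exclude=None):
    """Apply one level (index or cluster) of require/include/exclude filters."""
    require, include, exclude = require or {}, include or {}, exclude or {}
    # require: every listed value must match.
    if not all(_matches(node, a, p, combine=all) for a, p in require.items()):
        return False
    # include: at least one listed value must match, if any is configured.
    if include and not any(_matches(node, a, p) for a, p in include.items()):
        return False
    # exclude: no listed value may match.
    if any(_matches(node, a, p) for a, p in exclude.items()):
        return False
    return True


def can_allocate(node, index_filters, cluster_filters):
    """Assumption: a shard may go to the node only if both levels allow it."""
    return filter_allows(node, **index_filters) and filter_allows(node, **cluster_filters)


node = {"_ip": "10.100.0.11", "type": "hot"}
index_filters = {"include": {"type": "hot"}}
cluster_filters = {"exclude": {"_ip": "10.100.0.11,10.100.0.12"}}
print(can_allocate(node, index_filters, cluster_filters))  # False
print(can_allocate(node, index_filters, {}))               # True
```

The first call reproduces the case described above: the index includes the node, the cluster excludes it, and the node ends up excluded.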

Keep an index off nodes with certain IPs:

PUT test/_settings
{
  "index.routing.allocation.exclude._ip": "192.168.2.*"
}

Built-in attributes:

_name        Match nodes by node name
_host_ip     Match nodes by host IP address (IP associated with hostname)
_publish_ip  Match nodes by publish IP address
_ip          Match either _host_ip or _publish_ip
_host        Match nodes by hostname

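5. Shard Allocation Problems

1. Find out why shards in the cluster are unassigned
GET _cluster/allocation/explain?pretty

2. Check the recovery status of an index, using the index user as an example
GET user/_recovery?active_only=true

3. Retry allocations that failed earlier with reroute. The cluster gives up on a shard after index.allocation.max_retries (default 5) failed attempts.
POST /_cluster/reroute?retry_failed=true

4. List the indices whose health is red
GET _cat/indices?health=red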

 

III. Rolling Cluster Restart

1. Preparation
## Open the following in advance. Some of these APIs expose the indicators to watch (stop the restart if something goes wrong); the others support troubleshooting:
## Why shards are unassigned
curl "http://0.0.0.0:9200/_cluster/allocation/explain?pretty"

### Cluster settings
curl "http://0.0.0.0:9200/_cluster/settings?pretty"

### Pending tasks
curl "http://0.0.0.0:9200/_cluster/pending_tasks?pretty"

### Cluster health
curl "http://0.0.0.0:9200/_cluster/health?pretty"
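
The cluster health response is the main signal for deciding whether the restart can move on to the next node. The following sketch turns a parsed response into a list of reasons to wait. The function name `ready_for_next_node` and the sample values are hypothetical; the field names are those returned by _cluster/health.

```python
def ready_for_next_node(health, expected_nodes):
    """Return reasons to wait; an empty list means it is safe to continue."""
    reasons = []
    if health.get("status") != "green":
        reasons.append(f"status is {health.get('status')}")
    if health.get("number_of_nodes", 0) < expected_nodes:
        reasons.append("not all nodes have joined")
    for key in ("unassigned_shards", "initializing_shards", "relocating_shards"):
        if health.get(key, 0) > 0:
            reasons.append(f"{key} = {health[key]}")
    return reasons


# Illustrative sample, not output captured from a real cluster.
sample = {
    "status": "yellow",
    "number_of_nodes": 5,
    "unassigned_shards": 12,
    "initializing_shards": 2,
    "relocating_shards": 0,
}
print(ready_for_next_node(sample, expected_nodes=5))
```
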
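2. Restart the client nodes
#start
Step 1: Stop one client node
Step 2: Start the node again
Step 3: Check that the node has rejoined the cluster
Step 4: Repeat steps 1-3 for the other client nodes
#end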

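3. Restart the master nodes
#start
Step 1: Identify the IP of the currently elected master
Step 2: Stop one master-eligible node that is not the elected master
Step 3: Start the node again
Step 4: Check that the node has rejoined the cluster (make sure it really has)
Step 5: Repeat steps 2-4 for the other master-eligible node that is not the elected master
Step 6: Stop the elected master
Step 7: Start the elected master again
## While a new master is being elected, the cluster is unavailable (indexing, search, and API calls all block). A master is not elected instantly: the default election time is 3s, and network problems often make it longer.
Step 8: Check the cluster state and confirm that the node has rejoined the cluster.
## Once a master has been elected, all cluster functions return to normal.
#end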
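
The ordering rule in this procedure is that the elected master is restarted last, so only one election is forced. A minimal sketch of that ordering follows; the function name and the node names are hypothetical. The currently elected master can be looked up with GET _cat/master.

```python
def master_restart_order(master_eligible, elected):
    """Return the restart order with the elected master placed last."""
    if elected not in master_eligible:
        raise ValueError("elected master must be one of the master-eligible nodes")
    others = [node for node in master_eligible if node != elected]
    return others + [elected]


print(master_restart_order(["master-1", "master-2", "master-3"], "master-2"))
# ['master-1', 'master-3', 'master-2']
```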

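4. Restart the data nodes
#start
Step 1: Restrict shard allocation
curl -X PUT "http://0.0.0.0:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.enable": "new_primaries"}}'
## While allocation is restricted, replica shards of newly created indices cannot be allocated; primary shards of new indices still can.
Step 2: Run a synced flush
curl -XPOST "http://0.0.0.0:9200/_flush/synced?pretty"
## For indices that are no longer being written to, the sync marker lets the cluster confirm that primary and replica copies are identical, which shortens the time shards need to rejoin. For shards that are still changing at this moment, this step does not speed up recovery.
Step 3: Stop one data node
Step 4: Start the node again
Step 5: Check that the node has rejoined the cluster
Step 6: Re-enable shard allocation
curl -X PUT "http://0.0.0.0:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'
Step 7: Check that the cluster status is green
## Once allocation is re-enabled, the number of unassigned shards drops sharply right away (not to 0, because in a large cluster every node holds shards of indices that are still being written). Some initializing shards then appear and take a while to drop to 0 while they sync.
Step 8: Repeat steps 1-7 for the other data nodes
Step 9: After all nodes have been restarted, check the cluster settings to make sure shard allocation is no longer restricted
#end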
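
The data-node procedure can be written down as an ordered plan, which makes it easier to review before running it by hand or from a script. The following is a sketch under stated assumptions: allocation is restricted again before each node is stopped, the 60s health timeout is an arbitrary illustrative value, and the function and node names are hypothetical. It produces the plan only and does not call a cluster.

```python
ENABLE_KEY = "cluster.routing.allocation.enable"


def data_node_restart_plan(nodes):
    """Yield (method, target, body) actions for restarting data nodes one by one."""
    for node in nodes:
        yield ("PUT", "/_cluster/settings", {"transient": {ENABLE_KEY: "new_primaries"}})
        yield ("POST", "/_flush/synced", None)
        # Manual step: stop the node, start it, and confirm it rejoined.
        yield ("MANUAL", f"restart {node}", None)
        yield ("PUT", "/_cluster/settings", {"transient": {ENABLE_KEY: "all"}})
        # Block until the cluster is green again (timeout value is illustrative).
        yield ("GET", "/_cluster/health?wait_for_status=green&timeout=60s", None)


for step in data_node_restart_plan(["data-1", "data-2"]):
    print(step)
```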
References:

Official Elasticsearch restart guide: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart

Elasticsearch Reference 6.2: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index.html

 
