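Chapter 1: ES Cluster Setup
Chapter 1 covered how to set up an ES cluster and the basic ES concepts. This chapter walks through the basic ES command-line operations, collected from day-to-day work.
1. List the nodes of an ES cluster that has no password configured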
curl -XGET http://ip:9200/_cat/nodes?v
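2. List the nodes of a password-protected ES cluster (configuring authentication for an ES cluster is covered in the next chapter)
curl -XGET http://ip:9200/_cat/nodes?v -u username:password
3. List the installed plugins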
curl -XGET http://ip:9200/_cat/plugins
4. Check cluster health
curl http://ip:9200/_cat/health?v
curl http://ip:9200/_cluster/health
{
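"cluster_name" : "my_cluster", cluster name
"status" : "yellow", cluster health status
"timed_out" : false, whether the request timed out
"number_of_nodes" : 3, number of nodes in the cluster
"number_of_data_nodes" : 3, number of data nodes
"active_primary_shards" : 37, number of active primary shards
"active_shards" : 65, number of active shards (primaries and replicas)
"relocating_shards" : 0, number of shards being relocated
"initializing_shards" : 0, number of shards being initialized
"unassigned_shards" : 3, number of unassigned shards
"delayed_unassigned_shards" : 0, number of shards whose allocation is delayed
"number_of_pending_tasks" : 0, number of cluster-level changes not yet executed
"number_of_in_flight_fetch" : 0, number of unfinished shard fetches
"task_max_waiting_in_queue_millis" : 0, how long (in ms) the earliest queued task has been waiting
"active_shards_percent_as_number" : 95.58823529411765 percentage of shards in the cluster that are active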
}
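As a rough sketch of how the `status` field above might be used in a script, the helpers below pull `status` out of a `_cluster/health` response and map it to an exit code. The function names `es_health_status` and `es_health_exit_code` are made up for this example, and the `sed` extraction assumes the field looks like `"status" : "<value>"`; a real script would more likely use a JSON parser.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ES): read a _cluster/health JSON
# response on stdin and report the cluster status.

# Print the value of the "status" field (green, yellow, or red).
es_health_status() {
  # Put each field on its own line, then match the "status" field.
  tr ',{}' '\n\n\n' |
    sed -n 's/^[[:space:]]*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*$/\1/p' |
    head -n 1
}

# Map a status to an exit code: 0 = green, 1 = yellow, 2 = red, 3 = unknown.
es_health_exit_code() {
  case "$1" in
    green)  return 0 ;;
    yellow) return 1 ;;
    red)    return 2 ;;
    *)      return 3 ;;
  esac
}
```

One possible use: `es_health_exit_code "$(curl -s http://ip:9200/_cluster/health | es_health_status)"`, which lets a monitoring job alert on any non-zero exit code.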
5. Find the cluster's current master node
curl -XGET 'http://localhost:9200/_cat/master?v' -uadmin:xxxx
6. List the tasks running on each node of the cluster
# List the tasks running on the nodes, with their start time, running time, and so on
curl -XGET 'http://localhost:9200/_cat/tasks?v&detailed&s=start_time'
# Show task details
curl -XGET 'http://localhost:9200/_tasks?detailed'
curl -XGET 'http://localhost:9200/_tasks?detailed=true&actions=*/update/byquery'
7. Show disk usage and shard allocation for each node
curl -XGET 'http://localhost:9200/_cat/allocation?v&s=disk.avail:asc'
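8. Show hot threads on the nodes
curl -XGET 'http://localhost:9200/_nodes/hot_threads'
9. Kill (cancel) a task running in the cluster
If a long-running task is hurting cluster performance, you can try cancelling it. Without a task ID or filter, this request asks every cancellable task to cancel; to cancel a single task, use _tasks/<task_id>/_cancel.
curl -XPOST 'localhost:9200/_tasks/_cancel'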
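To avoid cancelling more than intended, a small guard can build the URL for one specific task. This is only a sketch: `es_cancel_task_url` is a hypothetical helper, and it assumes task IDs of the form `<node_id>:<task_number>`, as shown in the `task_id` column of `_cat/tasks`.

```shell
#!/bin/sh
# Hypothetical helper: build the cancel URL for one specific task, so that
# a bare _tasks/_cancel is never sent by accident.
es_cancel_task_url() {
  base_url="$1"   # e.g. http://localhost:9200
  task_id="$2"    # e.g. oTUltX4IQMOUUVeiohTt8A:12345
  case "$task_id" in
    ?*:*[0-9]) ;;   # looks like node_id:number
    *)
      echo "usage: es_cancel_task_url BASE_URL NODE_ID:TASK_NUMBER" >&2
      return 1
      ;;
  esac
  printf '%s/_tasks/%s/_cancel\n' "$base_url" "$task_id"
}
```

It could then be called as `curl -XPOST "$(es_cancel_task_url http://localhost:9200 "$task_id")"`, with `$task_id` copied from the task list.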
10. View a single index
curl -XGET http://ip:9200/test?pretty    # test is the index name
11. List all indices
curl http://ip:9200/_cat/indices?v
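12. Create an index
curl -X PUT http://ip:9200/test    # test is the name of the index to create
13. Delete an index
curl -X DELETE http://ip:9200/test    # test is the name of the index to delete
14. Delete the document with ID 2 and type external from an index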
curl -XDELETE http://ip:9200/customer/external/2?pretty
15. Clear an index's read-only state
curl -XPUT 'http://es_ip:es_port/<index_name>/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": false}'
16. Open an index whose state is close
curl -X POST http://ip:9200/$item/_open?pretty    # $item is the name of the closed index
17. Close an index
curl -X POST http://ip:9200/$item/_close?pretty
18. List unassigned (UNASSIGNED) shards, excluding indices whose names start with a dot
curl -s "http://ip:9200/_cat/shards"|grep UNASSIGNED|egrep -v "^\."
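When many shards are unassigned, a per-index summary is easier to read than the raw list. The sketch below counts UNASSIGNED shards per index from `_cat/shards` output; `es_unassigned_by_index` is a hypothetical name, and the helper assumes the default column order (index, shard, prirep, state, ...).

```shell
#!/bin/sh
# Hypothetical helper: read `_cat/shards` output on stdin and print
# "<index> <count>" for every index that has UNASSIGNED shards.
# Assumes the state is the fourth column, as in the default output.
es_unassigned_by_index() {
  awk '$4 == "UNASSIGNED" { n[$1]++ } END { for (i in n) print i, n[i] }' | sort
}
```

For example: `curl -s "http://ip:9200/_cat/shards" | es_unassigned_by_index`.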
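19. Rename an index with _reindex. Create the dest index first: _reindex copies the documents but not the source index's settings or mappings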
curl -H "Content-Type:application/json" -XPOST 'http://ip:9200/_reindex' -d '
{
"source": {
"index": "source_index"
},
"dest": {
"index": "dest_index"
}
}'
20. Write a document of type _doc to the test index
curl -s -H "Content-Type:application/json" -XPOST 'http://ip:9200/test/_doc' -d '{"name":"tom","age":32}'
21. Count the documents in the product index
curl -XGET http://ip:9200/_cat/count/product?v
// Response
epoch timestamp count
1654141931 03:52:11 13
22. Show shard allocation across the cluster
curl -XGET 'http://ip:9200/_cat/shards?v'
23. Count the documents in the cluster (or in an index)
curl -XGET http://ip:9200/_cat/count?v
// Response
epoch timestamp count
1654148743 04:51:11 1314
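If only the number is needed in a script, the count column can be picked out of the output above. A minimal sketch, assuming the three-column layout shown (epoch, timestamp, count); `es_cat_count` is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: read `_cat/count` output on stdin (with or without
# the ?v header line) and print only the document count.
es_cat_count() {
  awk '$1 != "epoch" { c = $3 } END { print c }'
}
```

For example: `curl -s http://ip:9200/_cat/count/product?v | es_cat_count`.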
24. List the cluster's snapshots
curl -XGET http://ip:9200/_cat/snapshots?v
25. List the snapshots in a specific repository
curl -XGET 'http://ip:9200/_cat/snapshots/<repository>'
26. Check index recovery progress
curl -XGET 'http://localhost:9200/_cat/recovery/indexname?v&active_only'
curl -XGET 'http://localhost:9200/_cat/recovery?v&active_only&h=index,shard,time,source_node,target_node,files_percent,bytes_percent,translog_ops_percent'
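27. Speed up shard recovery
When nodes rebalance data, when a node restarts, or when a hardware failure causes a node failover, shards have to migrate, reload, or backfill data. If the shards are large, initialization and rebalancing can be slow, because ES limits the number of parallel recoveries and the recovery rate by default.
The following settings can be changed to speed up recovery:
// Check the progress of the shard recoveries currently running
curl -XGET 'http://localhost:9200/_recovery?detailed=true&active_only=true'
// Check how many shards are still unassigned. In the response,
"relocating_shards" is the number of shards being rebalanced,
"initializing_shards" is the number of shards being initialized,
"unassigned_shards" is the number of shards not yet assigned
curl -XGET http://localhost:9200/_cluster/health?pretty
// Response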
{
"cluster_name": "test",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 43,
"number_of_data_nodes": 32,
"active_primary_shards": 891,
"active_shards": 1708,
"relocating_shards": 5,
"initializing_shards": 17,
"unassigned_shards": 57,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 0
}
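// The four settings that affect shard recovery speed:
cluster.routing.allocation.cluster_concurrent_rebalance controls how many shard rebalancing tasks may run concurrently across the cluster
cluster.routing.allocation.node_initial_primaries_recoveries controls how many primary shards may recover concurrently on a node after it restarts
cluster.routing.allocation.node_concurrent_recoveries controls how many recovery tasks may run concurrently on a node in all other cases (anything except primary recovery after a restart)
indices.recovery.max_bytes_per_sec controls the recovery rate of a node
// Default values of these four settings: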
"cluster.routing.allocation.node_concurrent_recoveries": 2,
"cluster.routing.allocation.cluster_concurrent_rebalance": 2,
"cluster.routing.allocation.node_initial_primaries_recoveries": 4,
"indices.recovery.max_bytes_per_sec": "40mb"
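// View the cluster settings
curl -XGET http://localhost:9200/_cluster/settings?pretty
// Note: raise these values only as far as the cluster can tolerate. The higher they are, the more internal network bandwidth recovery consumes and the more the cluster's read and write performance suffers
// Speed up shard recovery (request and body)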
curl -XPUT http://localhost:9200/_cluster/settings
{
"transient" : {
"cluster.routing.allocation.node_concurrent_recoveries": 5,
"cluster.routing.allocation.cluster_concurrent_rebalance": 5,
"cluster.routing.allocation.node_initial_primaries_recoveries": 5,
"indices.recovery.max_bytes_per_sec": "1000mb"
}
}
// The same request as a command line
curl -XPUT -H 'content-type: application/json' 'http://localhost:9200/_cluster/settings' -d '
{
"transient" : {
"cluster.routing.allocation.node_concurrent_recoveries": 5,
"cluster.routing.allocation.cluster_concurrent_rebalance": 5,
"cluster.routing.allocation.node_initial_primaries_recoveries": 5,
"indices.recovery.max_bytes_per_sec": "1000mb"
}
}'
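Because the same four settings get raised and lowered repeatedly, the body can be generated instead of retyped. The sketch below is one way to do it; `es_recovery_settings_body` is a hypothetical helper, and using a single value for all three concurrency limits is a simplification made for this example.

```shell
#!/bin/sh
# Hypothetical helper: print a transient _cluster/settings body that sets
# the three concurrency limits to one value and the recovery rate to another.
es_recovery_settings_body() {
  concurrency="$1"   # e.g. 5
  rate="$2"          # e.g. 200mb
  case "$concurrency" in
    ''|*[!0-9]*)
      echo "concurrency must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"transient":{"cluster.routing.allocation.node_concurrent_recoveries":%s,"cluster.routing.allocation.cluster_concurrent_rebalance":%s,"cluster.routing.allocation.node_initial_primaries_recoveries":%s,"indices.recovery.max_bytes_per_sec":"%s"}}\n' \
    "$concurrency" "$concurrency" "$concurrency" "$rate"
}
```

A possible call: `curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d "$(es_recovery_settings_body 5 200mb)"`.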
// Restore the defaults. 2.x clusters do not accept null; use "" instead
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d '
{
"transient" : {
"cluster.routing.allocation.node_concurrent_recoveries": null,
"cluster.routing.allocation.cluster_concurrent_rebalance": null,
"cluster.routing.allocation.node_initial_primaries_recoveries": null,
"indices.recovery.max_bytes_per_sec": null
}
}'
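28. Analysing why a cluster is red or yellow
Red status: the primary shard of at least one index is unassigned, for example because a node is offline.
Yellow status: replica shards of at least one index are unassigned.
Use the following command to find out why a shard has not been assigned:
curl -XPOST http://localhost:9200/_cluster/allocation/explain
29. Explain why a specific shard of a specific index cannot be assigned
curl -XGET -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/allocation/explain' -d '
{
"index": "my-index-000001",
"shard": 0,
"primary": true
}'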
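30. Shard recovery
First try to trigger another automatic allocation attempt manually:
curl -XPOST http://localhost:9200/_cluster/reroute?retry_failed
If the shards still do not recover automatically, start by checking where the copies of each shard are stored:
curl -XGET 'http://localhost:9200/<index_name>/_shard_stores'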
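31. Manually allocate a replica
# Try to allocate a replica
# index: index name; shard: shard number; node: node name
curl -XPOST -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/reroute' -d '
{
"commands": [
{
"allocate_replica": {
"index": "index1",
"shard": 3,
"node": "nodes-9"
}
}
]
}'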
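When several replicas need the same treatment, generating the reroute body saves retyping. A minimal sketch, assuming index and node names that need no JSON escaping; `es_allocate_replica_body` is a hypothetical helper, not an ES tool.

```shell
#!/bin/sh
# Hypothetical helper: print a _cluster/reroute body with a single
# allocate_replica command. The names are inserted as-is, so they must not
# contain quotes or backslashes.
es_allocate_replica_body() {
  index="$1"   # index name
  shard="$2"   # shard number
  node="$3"    # node name
  case "$shard" in
    ''|*[!0-9]*)
      echo "shard must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%s,"node":"%s"}}]}\n' \
    "$index" "$shard" "$node"
}
```

A possible call: `curl -XPOST -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/reroute' -d "$(es_allocate_replica_body index1 3 nodes-9)"`.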
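32. Recover a primary shard
# Recover a primary shard with one of these commands:
# allocate_empty_primary
# allocate_stale_primary
# index: index name; shard: shard number; node: node ID or node name
# accept_data_loss: whether losing data (for example part of the translog) is acceptable; ES rejects the command unless this is set to true
curl -XPOST -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/reroute' -d '
{
"commands" : [
{
"allocate_stale_primary" : {
"index" : "index42",
"shard" : 0,
"node" : "II47uXW2QvqzHBnMcl2o_Q",
"accept_data_loss" : false
}
}
]
}'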
33. Tune shard recovery speed
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d '
{
"transient" : {
"cluster.routing.allocation.node_concurrent_recoveries": 5,
"cluster.routing.allocation.cluster_concurrent_rebalance": 5,
"cluster.routing.allocation.node_initial_primaries_recoveries": 2,
"indices.recovery.max_bytes_per_sec": "1000mb"
}
}'
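34. Set a cluster-wide default search timeout
If the client side produces many slow queries and cannot degrade them, the cluster side can lower the cluster-wide default search timeout to protect cluster stability.
# "search.default_search_timeout" : "-1", the default value is -1
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d '
{
"persistent": {
"search.default_search_timeout" : "800ms"
}
}'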
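35. Block searches on an index
Some clusters are shared by several indices. When searches against less important or temporarily degradable indices destabilise the cluster and the business side cannot ship an emergency fix, the ES side can block searches on those indices.
# "index.blocks.read": "false", false: searches allowed; true: searches blocked
# "index.blocks.read_only": "false", false: reads and writes allowed; true: writes blocked
# "index.blocks.read_only_allow_delete": "false", false: reads and writes allowed; true: writes blocked, but deleting the index is still allowed
# "index.blocks.write": "false", false: writes allowed; true: writes blocked; useful when writes to the index are causing problems
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/<index_name>/_settings' -d '
{
"index.blocks.read": "true"
}'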
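36. Prevent a single search from hitting too many shards
"action.destructive_requires_name" : "false"
Set this according to business needs. It typically matters for wildcard searches such as indexName*/_search.
## The default value is "9223372036854775807"
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d '
{
"persistent": {
"action.search.shard_count.limit" : "2000"
}
}'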
37. Circuit breaker settings
"indices.breaker.fielddata.limit" : "60%", the default value is 60%
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d '
{
"persistent": {
"indices.breaker.fielddata.limit": "40%",
"indices.breaker.request.limit": "40%",
"indices.breaker.total.limit": "50%"
}
}'
38. Shard balancing limits: cluster-level limits
# "cluster.routing.allocation.enable": "all",
# "cluster.routing.rebalance.enable": "none",
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d '
{
"persistent": {
"cluster.routing.allocation.enable": "all",
"cluster.routing.rebalance.enable": "none"
}
}'
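39. Shard balancing limits: moving a shard manually
# Annotated request; the # notes are explanatory and must be removed before sending
curl -XPOST http://localhost:9200/_cluster/reroute
{
"commands": [
{
"move": {
"index": "index1", # index name
"shard": 37, # shard number
"from_node": "nodes-71", # source node
"to_node": "nodes-35" # target node
}
}
]
}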
curl -XPOST "http://localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'{ "commands": [ { "move": { "index": "index1", "shard": 37, "from_node": "nodes-71", "to_node": "nodes-35" } } ]}'
40. Index-level balancing limits
index.routing.rebalance.enable
Enables shard rebalancing for this index. It can be set to:
all (default) - Allows shard rebalancing for all shards.
primaries - Allows shard rebalancing only for primary shards.
replicas - Allows shard rebalancing only for replica shards.
none - No shard rebalancing is allowed.
# Use _all for every index, an index-name wildcard pattern, or a specific index name
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_all/_settings' -d '
{
"index.routing.allocation.enable": "primaries",
"index.routing.rebalance.enable": "none"
}'
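41. Index-level forced balance limit
# Choose the number based on the node count and the number of shards in the index
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/<index_name>/_settings' -d '
{
"index.routing.allocation.total_shards_per_node": 5
}'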
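One way to pick the number: the limit cannot go below the total shard count divided by the number of data nodes, rounded up, or some shards will stay unassigned. The sketch below computes that lower bound; `es_min_shards_per_node` is a hypothetical helper, and in practice some headroom above the bound is usually wanted so that shards can still be allocated when a node is lost.

```shell
#!/bin/sh
# Hypothetical helper: smallest total_shards_per_node value that still lets
# every shard of an index be allocated.
#   total shards = primaries * (1 + replicas)
#   lower bound  = ceil(total shards / data nodes)
es_min_shards_per_node() {
  primaries="$1"
  replicas="$2"
  nodes="$3"
  if [ "$nodes" -le 0 ]; then
    echo "nodes must be greater than 0" >&2
    return 1
  fi
  total=$(( primaries * (1 + replicas) ))
  echo $(( (total + nodes - 1) / nodes ))
}
```

For an index with 5 primaries and 1 replica on 3 data nodes, `es_min_shards_per_node 5 1 3` gives the smallest usable value.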
42. Check shard relocation progress
curl -XGET 'http://localhost:9200/_cat/recovery?active_only&v'
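43. Speed up shard relocation
The default rate is 40mb and the default concurrency is 2. Raise them moderately when needed, and lower them if the cluster is under heavy load.
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d '
{
"persistent": {
"cluster.routing.allocation.node_concurrent_recoveries": 2,
"indices.recovery.max_bytes_per_sec": "40mb"
}
}'
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d '
{
"transient" : {
"indices.recovery.max_bytes_per_sec" : "200mb"
}
}'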
44. Check whether a node still holds data
curl -XGET http://localhost:9200/_cat/allocation?v
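45. Emergency configuration changes
Ask users to stop writing. Once writes have stopped, run the following command to flush all data to disk, which avoids having to replay the translog.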
curl -XPOST http://localhost:9200/_flush/synced
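46. Snapshot backup and restore: S3 remote snapshot repository
1) List all snapshot repositories in the cluster
curl -XGET http://localhost:9200/_snapshot
2) Get the details of a specific repository
curl -XGET http://localhost:9200/_snapshot/auto_snapshot
3) Create a snapshot repository
# type: repository type; bucket: bucket name; base_path: path inside the bucket; endpoint: S3 endpoint address
# access_key / secret_key: S3 credentials
# max_restore_bytes_per_sec: snapshot restore rate; max_snapshot_bytes_per_sec: snapshot creation rate
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_snapshot/test_snapshot' -d '
{
"type" : "s3",
"settings" : {
"bucket" : "elasticsearch-snapshot-cn-north-1",
"base_path" : "es-xxxxx/xxxxx",
"endpoint" : "s3-internal.cn-north-1.xxxxx-oss.com",
"protocol" : "http",
"compress" : "true",
"access_key": "xxxxxxxxxxxxx",
"secret_key": "xxxxxxxxxxxx",
"max_restore_bytes_per_sec" : "200mb",
"max_snapshot_bytes_per_sec" : "100mb"
}
}'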
4) Create a snapshot
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_snapshot/test_snapshot/snapshot_2' -d '
{
"indices": "index_1,index_2",
"ignore_unavailable": true,
"include_global_state": false,
"metadata": {
"taken_by": "user123",
"taken_because": "backup before upgrading"
}
}'
curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_snapshot/auto_snapshot/snapshot_2' -d '
{
"indices": "*",
"ignore_unavailable": true,
"include_global_state": false
}'
5) Check snapshot progress
curl -XGET http://localhost:9200/_snapshot/_status
curl -XGET 'http://localhost:9200/_snapshot/<repository>/_status'
curl -XGET 'http://localhost:9200/_snapshot/<repository>/<snapshot>/_status'
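6) Restore a snapshot
# auto_snapshot is the repository name
# Replace snapshot_name with the snapshot name shown in the user interface, for example auto_snapshot_20240509190034
# "indices" also accepts wildcards, for example "index_*"
curl -XPOST -H 'Content-Type: application/json' 'http://localhost:9200/_snapshot/auto_snapshot/snapshot_name/_restore' -d '
{
"indices": "index_1,index_2",
"ignore_unavailable": true,
"include_global_state": false,
"include_aliases": false
}'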
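These are the commands I have used most often in real-world operations. I will keep adding to the list; working through them is a good way to get familiar with ES. Additions in the comments are welcome, so we can learn and improve together!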