Kibana fails to start with "all shards failed" and cannot connect to Elasticsearch

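Symptoms:

I started a local cluster with three nodes. All Elasticsearch nodes came up normally and search-head could connect to each of them, but the log contained a warning: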

[2019-12-31T08:54:46,320][WARN ][o.e.c.r.a.DiskThresholdMonitor] [node1] high disk watermark [90%] exceeded on [wYsY5n5QRduREAAZvA5Biw][vipnode2][/node-2/data/nodes/0] free: 17.8gb[7.6%], shards will be relocated away from this node
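To make the warning easier to read: 7.6% free means about 92.4% of the disk is used, which is above the 90% high watermark. A minimal Python sketch that pulls those numbers out of such a line is shown here. The regular expression is inferred from this single log line only, and the function name parse_watermark_warning is hypothetical.

```python
import re

# Pattern inferred from the one DiskThresholdMonitor line quoted above;
# other Elasticsearch versions may word the message differently.
WATERMARK_RE = re.compile(
    r"high disk watermark \[(?P<limit>[\d.]+)%\] exceeded on "
    r"\[(?P<node_id>[^\]]+)\]\[(?P<node_name>[^\]]+)\]\[(?P<path>[^\]]+)\] "
    r"free: (?P<free>[^\[\s]+)\[(?P<free_pct>[\d.]+)%\]"
)


def parse_watermark_warning(line):
    """Return node name, data path and disk usage from a watermark warning, or None."""
    match = WATERMARK_RE.search(line)
    if match is None:
        return None
    free_pct = float(match.group("free_pct"))
    return {
        "node": match.group("node_name"),
        "path": match.group("path"),
        "limit_pct": float(match.group("limit")),
        "free": match.group("free"),
        # Used space is simply the complement of the reported free percentage.
        "used_pct": round(100.0 - free_pct, 1),
    }
```

Feeding it the warning above yields node vipnode2 with a used percentage above the reported limit, which is why Elasticsearch wants to move shards off that node.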

I then started Kibana. It printed a wall of red error output at startup and the console would not open. The key error log:

elasticsearch - SearchPhaseExecutionException[Failed to execute phase [query], all shards failed]

{ statusCode: 503,
    payload:
      { statusCode: 503,
        error: 'Service Unavailable',
        message: 'Request Timeout after 30000ms' },
    headers: {} },
  reformat: [Function],
  [Symbol(SavedObjectsClientErrorCode)]: 'SavedObjectsClient/esUnavailable' }
  log  [00:44:10.647] [info][plugins-system] Stopping all plugins.
  log  [00:44:10.648] [info][plugins][translations] Stopping plugin


Solution:

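I investigated with the help of https://www.jianshu.com/p/443cf6ce87d5 and https://www.elastic.co/guide/en/elasticsearch/reference/5.5/cluster-allocation-explain.html, and eventually narrowed the problem down to one key setting: cluster.routing.allocation.disk.threshold_enabled.

Elasticsearch uses disk usage to decide whether it should keep allocating shards to a node. This check is enabled by default.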
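The cluster allocation explain API from the second link can be queried with nothing but the Python standard library. The following is a sketch under assumptions: the address http://localhost:9200 is assumed (it is not stated in this post), and the helper names explain_request and fetch_json are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local node


def explain_request(base_url=ES_URL):
    """Build a GET request for the cluster allocation explain API."""
    url = base_url.rstrip("/") + "/_cluster/allocation/explain"
    return urllib.request.Request(url, method="GET")


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_json(explain_request()) against a running cluster should return an explanation for an unassigned shard. When no shard is unassigned, the API is expected to answer with an HTTP error, which urllib raises as an exception.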

I was testing on a single local machine and my disk had very little free space left, so I edited elasticsearch.yml and set cluster.routing.allocation.disk.threshold_enabled: false.
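As an alternative to editing elasticsearch.yml (not what was done here), the same setting can usually be changed at runtime through the cluster settings API. The sketch below only builds the request. The address http://localhost:9200 and the function name disable_disk_threshold_request are assumptions, and a transient setting is lost after a full cluster restart.

```python
import json
import urllib.request


def disable_disk_threshold_request(base_url="http://localhost:9200"):
    """Build a PUT /_cluster/settings request that turns off the disk threshold check."""
    body = {
        "transient": {
            "cluster.routing.allocation.disk.threshold_enabled": False
        }
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned object to urllib.request.urlopen sends it. As with the yml change, this is only sensible on a test machine, because it lets a disk fill up completely.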

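I then deleted the files under data and logs (this discards all existing index data, so it is only suitable for a test setup like this one) and restarted Elasticsearch and Kibana. Everything worked, and the cluster status went from red to green.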
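The red-to-green transition can also be checked from a script through the cluster health API. This is a rough sketch: the address http://localhost:9200 and the helper names health_request and summarize_health are assumptions made for illustration.

```python
import urllib.request


def health_request(base_url="http://localhost:9200", wait_for="green", timeout="30s"):
    """Build a GET /_cluster/health request that waits for the given status."""
    url = (
        base_url.rstrip("/")
        + "/_cluster/health?wait_for_status=" + wait_for
        + "&timeout=" + timeout
    )
    return urllib.request.Request(url, method="GET")


def summarize_health(health):
    """Condense a decoded cluster health response into one line."""
    return "{} (unassigned shards: {})".format(
        health.get("status"), health.get("unassigned_shards")
    )
```

The response of that request, once decoded from JSON, can be passed to summarize_health to see at a glance whether shards are still unassigned.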


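Summary:

1. WARN-level logs at startup matter too. Paying attention to every detail helps you locate a problem quickly.

2. These are the key settings in this incident; their exact meaning is documented on the official site: cluster.routing.allocation.disk.threshold_enabled, cluster.routing.allocation.disk.watermark.low, cluster.routing.allocation.disk.watermark.high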
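How the two watermarks interact can be summarized in a few lines of Python. This is a simplified model, not Elasticsearch's actual decider: the 90% high watermark comes from the log above, the 85% low watermark is the documented default, and the function name disk_allocation_state is hypothetical.

```python
def disk_allocation_state(used_pct, low=85.0, high=90.0):
    """Classify a node's disk usage against the low and high watermarks (simplified)."""
    if used_pct > high:
        # Above the high watermark: shards get relocated away from the node.
        return "relocate shards away"
    if used_pct > low:
        # Between the watermarks: existing shards stay, no new ones are allocated.
        return "no new shards allocated"
    return "ok"
```

With the 92.4% usage implied by the warning, the node falls in the first branch, which matches the "shards will be relocated away from this node" message.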
