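Incident report: an embedded Elasticsearch service went down because GC could not free memory

Scenario

Our e-commerce service runs Elasticsearch as an embedded service. After a faulty code commit, Elasticsearch began retrieving huge result sets whose memory could not be released. The service eventually hit stop-the-world pauses and went down.

Root cause analysis

A search online suggested that heavy GC activity in Elasticsearch could be the cause, so we started by analyzing the logs with this command: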

cat xxx.log |grep "INFO elasticsearch\[estore\]\[scheduler\]\[T#1\]"

This extracts the GC log entries of the Elasticsearch service. It returned:

16:06:47 1.6-2018-02-07 16:06:47,609 INFO elasticsearch[estore][scheduler][T#1] - [estore] [gc][young][319][334] duration [832ms], collections [1]/[1.2s], total [832ms]/[7.1s], memory [530.6mb]->[653.8mb]/[7.9gb], all_pools {[young] [4.1mb]->[4mb]/[123.5mb]}{[survivor] [0b]->[47.4mb]/[47.5mb]}{[old] [526.5mb]->[602.2mb]/[7.7gb]}
16:07:50 1.6-2018-02-07 16:07:50,065 INFO elasticsearch[estore][scheduler][T#1] - [estore] [gc][old][363][10] duration [6.2s], collections [1]/[6.9s], total [6.2s]/[23.9s], memory [5.4gb]->[5.3gb]/[7.9gb], all_pools {[young] [2.9mb]->[3.7mb]/[99mb]}{[survivor] [39.6mb]->[0b]/[80mb]}{[old] [5.3gb]->[5.3gb]/[7.7gb]}
16:08:02 1.6-2018-02-07 16:08:02,274 INFO elasticsearch[estore][scheduler][T#1] - [estore] [gc][old][368][11] duration [7.8s], collections [1]/[8s], total [7.8s]/[31.8s], memory [6.3gb]->[6.3gb]/[7.9gb], all_pools {[young] [32.1mb]->[2.2mb]/[121.5mb]}{[survivor] [55.9mb]->[0b]/[67.5mb]}{[old] [6.2gb]->[6.3gb]/[7.7gb]}
16:08:15 1.6-2018-02-07 16:08:15,852 INFO elasticsearch[estore][scheduler][T#1] - [estore] [gc][old][373][12] duration [8.2s], collections [1]/[9.1s], total [8.2s]/[40s], memory [7.2gb]->[7.4gb]/[7.9gb], all_pools {[young] [2.2mb]->[2.3mb]/[118mb]}{[survivor] [54.4mb]->[0b]/[67mb]}{[old] [7.2gb]->[7.4gb]/[7.7gb]}
16:08:50 1.6-2018-02-07 16:08:50,028 INFO elasticsearch[estore][scheduler][T#1] - [estore] [gc][old][377][15] duration [8.4s], collections [1]/[8.4s], total [8.4s]/[1.2m], memory [7.8gb]->[7.7gb]/[7.9gb], all_pools {[young] [57.9mb]->[49.6mb]/[124.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [7.7gb]->[7.7gb]/[7.7gb]}

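Look at the last entry: {[old] [7.7gb]->[7.7gb]/[7.7gb]}. The old generation was completely full both before and after the collection, which means GC could no longer reclaim any memory. The heap stayed exhausted and the service went down.

That explains the crash, but what drove the heap usage up in the first place?

Going back over the output, the first entry ends with {[old] [526.5mb]->[602.2mb]/[7.7gb]}, while the last one ends with {[old] [7.7gb]->[7.7gb]/[7.7gb]}.

The old generation filled up in roughly two minutes (16:06:47 to 16:08:50), which suggests an inappropriate search against the index. So we kept analyzing the logs: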
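The old-generation comparison above can also be done mechanically. The following is a minimal Python sketch (Python is used only for illustration; the original investigation used grep). It assumes the GC line format shown in the output above; the function names and the 95% threshold are assumptions made for this example, not part of Elasticsearch.

```python
import re

# Matches the old-generation pool of a GC monitor log line, for example:
#   {[old] [7.7gb]->[7.7gb]/[7.7gb]}
OLD_POOL = re.compile(
    r"\{\[old\] \[([\d.]+)(b|kb|mb|gb)\]->\[([\d.]+)(b|kb|mb|gb)\]"
    r"/\[([\d.]+)(b|kb|mb|gb)\]\}"
)

UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def to_bytes(value, unit):
    """Convert a logged size such as ('526.5', 'mb') to bytes."""
    return float(value) * UNIT_BYTES[unit]


def old_gen_usage(line):
    """Return (before, after, capacity) in bytes, or None if no old pool is logged."""
    m = OLD_POOL.search(line)
    if m is None:
        return None
    before = to_bytes(m.group(1), m.group(2))
    after = to_bytes(m.group(3), m.group(4))
    capacity = to_bytes(m.group(5), m.group(6))
    return before, after, capacity


def is_exhausted(line, threshold=0.95):
    """True when the collection reclaimed nothing and the old pool is nearly full.

    The threshold is an arbitrary value chosen for this sketch.
    """
    usage = old_gen_usage(line)
    if usage is None:
        return False
    before, after, capacity = usage
    return after >= before and after / capacity >= threshold
```

Feeding every line of the GC output through `is_exhausted` would flag only the entries where the old generation is both full and not shrinking, which is the pattern that the last log entry shows.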

cat xxx.log | grep "elasticsearch\[estore\]\[search\]"

Output:

16:03:04 1.6-2018-02-07 16:03:04,331 TRACE elasticsearch[estore][search][T#3] - [estore] [blanksimple][2] took[575.1ms], took_millis[575], types[goodsType], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{"must":[{"term":{"bill.GOODSSTATUS":1}},{"term":{"bill.GOODSTYPE":1}},{"term":{"bill.STOREID":6635387}},{"bool":{"should":[{"term":{"bill.STATUS":"0"}},{"term":{"bill.STATUS":"1"}}]}},{"bool":{"should":{"term":{"bill.STOREGOODSTYPEID":"6636222"}}}},{"range":{"bill.RELEASEFROM":{"from":null,"to":"2018-02-07T16:03:03","include_lower":true,"include_upper":false}}},{"range":{"bill.RELEASETO":{"from":"2018-02-07T16:03:03","to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":0.0,"to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":null,"to":1.7976931348623157E308,"include_lower":true,"include_upper":false}}}]}},"explain":false,"_source":{"includes":["id"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"bill.MAINSCORE":{"order":"desc"}},{"bill.UPDATETIME":{"order":"desc"}},{"_score":{"order":"desc"}}]}], extra_source[],
16:03:04 1.6-2018-02-07 16:03:04,407 TRACE elasticsearch[estore][search][T#4] - [estore] [blanksimple][3] took[652.8ms], took_millis[652], types[goodsType], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{"must":[{"term":{"bill.GOODSSTATUS":1}},{"term":{"bill.GOODSTYPE":1}},{"term":{"bill.STOREID":6635387}},{"bool":{"should":[{"term":{"bill.STATUS":"0"}},{"term":{"bill.STATUS":"1"}}]}},{"bool":{"should":{"term":{"bill.STOREGOODSTYPEID":"6636222"}}}},{"range":{"bill.RELEASEFROM":{"from":null,"to":"2018-02-07T16:03:03","include_lower":true,"include_upper":false}}},{"range":{"bill.RELEASETO":{"from":"2018-02-07T16:03:03","to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":0.0,"to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":null,"to":1.7976931348623157E308,"include_lower":true,"include_upper":false}}}]}},"explain":false,"_source":{"includes":["id"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"bill.MAINSCORE":{"order":"desc"}},{"bill.UPDATETIME":{"order":"desc"}},{"_score":{"order":"desc"}}]}], extra_source[],
16:03:04 1.6-2018-02-07 16:03:04,424 TRACE elasticsearch[estore][search][T#2] - [estore] [blanksimple][1] took[669.4ms], took_millis[669], types[goodsType], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{"must":[{"term":{"bill.GOODSSTATUS":1}},{"term":{"bill.GOODSTYPE":1}},{"term":{"bill.STOREID":6635387}},{"bool":{"should":[{"term":{"bill.STATUS":"0"}},{"term":{"bill.STATUS":"1"}}]}},{"bool":{"should":{"term":{"bill.STOREGOODSTYPEID":"6636222"}}}},{"range":{"bill.RELEASEFROM":{"from":null,"to":"2018-02-07T16:03:03","include_lower":true,"include_upper":false}}},{"range":{"bill.RELEASETO":{"from":"2018-02-07T16:03:03","to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":0.0,"to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":null,"to":1.7976931348623157E308,"include_lower":true,"include_upper":false}}}]}},"explain":false,"_source":{"includes":["id"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"bill.MAINSCORE":{"order":"desc"}},{"bill.UPDATETIME":{"order":"desc"}},{"_score":{"order":"desc"}}]}], extra_source[],
16:03:04 1.6-2018-02-07 16:03:04,439 TRACE elasticsearch[estore][search][T#5] - [estore] [blanksimple][4] took[684.4ms], took_millis[684], types[goodsType], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{"must":[{"term":{"bill.GOODSSTATUS":1}},{"term":{"bill.GOODSTYPE":1}},{"term":{"bill.STOREID":6635387}},{"bool":{"should":[{"term":{"bill.STATUS":"0"}},{"term":{"bill.STATUS":"1"}}]}},{"bool":{"should":{"term":{"bill.STOREGOODSTYPEID":"6636222"}}}},{"range":{"bill.RELEASEFROM":{"from":null,"to":"2018-02-07T16:03:03","include_lower":true,"include_upper":false}}},{"range":{"bill.RELEASETO":{"from":"2018-02-07T16:03:03","to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":0.0,"to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":null,"to":1.7976931348623157E308,"include_lower":true,"include_upper":false}}}]}},"explain":false,"_source":{"includes":["id"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"bill.MAINSCORE":{"order":"desc"}},{"bill.UPDATETIME":{"order":"desc"}},{"_score":{"order":"desc"}}]}], extra_source[],
16:03:04 1.6-2018-02-07 16:03:04,441 TRACE elasticsearch[estore][search][T#1] - [estore] [blanksimple][0] took[687.1ms], took_millis[687], types[goodsType], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{"must":[{"term":{"bill.GOODSSTATUS":1}},{"term":{"bill.GOODSTYPE":1}},{"term":{"bill.STOREID":6635387}},{"bool":{"should":[{"term":{"bill.STATUS":"0"}},{"term":{"bill.STATUS":"1"}}]}},{"bool":{"should":{"term":{"bill.STOREGOODSTYPEID":"6636222"}}}},{"range":{"bill.RELEASEFROM":{"from":null,"to":"2018-02-07T16:03:03","include_lower":true,"include_upper":false}}},{"range":{"bill.RELEASETO":{"from":"2018-02-07T16:03:03","to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":0.0,"to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":null,"to":1.7976931348623157E308,"include_lower":true,"include_upper":false}}}]}},"explain":false,"_source":{"includes":["id"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"bill.MAINSCORE":{"order":"desc"}},{"bill.UPDATETIME":{"order":"desc"}},{"_score":{"order":"desc"}}]}], extra_source[],
16:06:48 1.6-2018-02-07 16:06:48,327 DEBUG elasticsearch[estore][search][T#14] - [estore] [blanksimple][0] took[4.2s], took_millis[4288], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:06:48 1.6-2018-02-07 16:06:48,331 DEBUG elasticsearch[estore][search][T#20] - [estore] [blanksimple][1] took[4.2s], took_millis[4292], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:06:48 1.6-2018-02-07 16:06:48,332 DEBUG elasticsearch[estore][search][T#21] - [estore] [blanksimple][2] took[4.2s], took_millis[4294], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:06:48 1.6-2018-02-07 16:06:48,343 DEBUG elasticsearch[estore][search][T#19] - [estore] [blanksimple][3] took[4.3s], took_millis[4304], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:06:48 1.6-2018-02-07 16:06:48,347 DEBUG elasticsearch[estore][search][T#22] - [estore] [blanksimple][4] took[4.3s], took_millis[4308], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
...

There are many entries, but the key observation is that some of them are logged at WARN level, so we narrowed the command:
cat xxx.log |grep WARN | grep "elasticsearch\[estore\]\[search\]" 

Result:

16:07:38 1.6-2018-02-07 16:07:38,981 WARN elasticsearch[estore][search][T#2] - [estore] [blanksimple][2] took[49.8s], took_millis[49814], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:07:39 1.6-2018-02-07 16:07:39,090 WARN elasticsearch[estore][search][T#24] - [estore] [blanksimple][3] took[49.9s], took_millis[49924], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:07:39 1.6-2018-02-07 16:07:39,101 WARN elasticsearch[estore][search][T#18] - [estore] [blanksimple][0] took[49.9s], took_millis[49934], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:07:39 1.6-2018-02-07 16:07:39,201 WARN elasticsearch[estore][search][T#25] - [estore] [blanksimple][4] took[50s], took_millis[50035], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:07:39 1.6-2018-02-07 16:07:39,469 WARN elasticsearch[estore][search][T#23] - [estore] [xblanksimple][1] took[50.3s], took_millis[50303], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
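Instead of reading each slow-log line by eye, the relevant fields can be pulled out programmatically. This is a rough Python sketch that assumes the line layout shown above (`took_millis[...]`, `types[...]`, `source[...], extra_source[`). The helper names and the `max_size` cutoff are hypothetical and chosen only for this example.

```python
import json
import re

# Captures the log level, took_millis, types and the JSON request body
# of a search slow-log line in the format shown above.
SLOW_LOG = re.compile(
    r"(?P<level>TRACE|DEBUG|INFO|WARN) elasticsearch\[[^\]]+\]\[search\].*?"
    r"took_millis\[(?P<took>\d+)\], types\[(?P<types>[^\]]*)\].*?"
    r", source\[(?P<source>.*)\], extra_source\["
)

# Query bodies treated as "no filter at all" in this sketch.
EMPTY_QUERIES = (None, {}, {"bool": {}}, {"match_all": {}})


def parse_slow_log(line):
    """Return a dict with level, took_millis, types, size and query, or None."""
    m = SLOW_LOG.search(line)
    if m is None:
        return None
    source = json.loads(m.group("source"))
    return {
        "level": m.group("level"),
        "took_millis": int(m.group("took")),
        "types": m.group("types"),
        "size": source.get("size"),
        "query": source.get("query"),
    }


def is_unbounded(entry, max_size=10_000):
    """True for a query with no filter and a size above max_size (assumed cutoff)."""
    size = entry["size"]
    return size is not None and size > max_size and entry["query"] in EMPTY_QUERIES
```

Applied to the WARN lines above, `parse_slow_log` yields `types` of `errotype`, a `took_millis` of about 50000 and a `size` of 2147483647, and `is_unbounded` returns true for them because the `bool` clause is empty.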


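This query is the problem: it has no filter conditions at all, yet its size is set to the Integer maximum (2147483647). The type it targets holds a large amount of data, so the query drove the heavy GC load. This is a bug in the business code.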
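One common way to guard against this kind of request is to cap the page size and fetch results in bounded pages. The sketch below is in Python for illustration only and is not the fix the original team applied. `search` is a hypothetical callable standing in for whatever client call the business code uses, and the limits are assumed values.

```python
MAX_PAGE_SIZE = 1000   # assumed upper bound for a single request
MAX_TOTAL = 10_000     # assumed upper bound for one logical query


def clamp_size(requested, cap=MAX_PAGE_SIZE):
    """Never let a caller-supplied size exceed the cap."""
    if requested is None or requested < 0:
        return cap
    return min(requested, cap)


def fetch_in_pages(search, query, page_size=MAX_PAGE_SIZE, max_total=MAX_TOTAL):
    """Yield hits page by page instead of requesting everything at once.

    `search` is a hypothetical callable: search(query, from_, size) -> list of hits.
    """
    offset = 0
    while offset < max_total:
        size = min(page_size, max_total - offset)
        hits = search(query, offset, size)
        if not hits:
            return
        yield from hits
        if len(hits) < size:
            # A short page means there is nothing left to read.
            return
        offset += len(hits)
```

Deep from/size paging has its own cost on the Elasticsearch side, so for a genuine full export the scroll API is usually the better fit. The point of the sketch is only that no single request should ask for 2147483647 hits.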

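The second factor: the index data had grown to 12 GB, while the JVM of the embedded elasticsearch-web service had only 8 GB. For the relationship between index size and JVM heap, please read up on sharding strategy and related topics yourself. In short, the JVM should have more memory than the total data of all the shards the service currently uses.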

Summary

These two problems occurring together left GC unable to free memory, which brought the service down.

Reference: http://blog.csdn.net/quicknet/article/details/45148447

