All data nodes in the cluster kept crashing with a StackOverflowError, and crashed again after every restart. The StackOverflowError stack trace is as follows:
[2023-12-22T16:03:44,057][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [xr-data-hdp-dn-rtyarn0725] fatal error in thread [elasticsearch[xr-data-hdp-dn-rtyarn0725][write][T#6]], exiting
java.lang.StackOverflowError: null
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:283) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
...
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
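The stack trace shows that the overflow occurred in the write thread pool [write], most likely while a mapping was being parsed, and the ObjectMapper class suggests it was triggered by writing object-type data. We therefore pulled the mappings of every index in the cluster and looked for one with an object-type field, but found none.
In the end, because this cluster has only a few indices, we fell back on a simple brute-force approach: stop the ingestion jobs by bisection and watch the cluster state until the problem index was identified.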
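The bisection over jobs can be sketched roughly as follows. This is only an illustration of the procedure, not tooling we actually ran: the class name JobBisect, the method findCulprit and the predicate stableWhenStopped are hypothetical, and the sketch assumes a non-empty job list with exactly one job responsible for the crashes. In practice the predicate is a human stopping a group of jobs and watching whether the data nodes stay up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class JobBisect {
    /**
     * Narrows a non-empty list of jobs down to the single culprit.
     * stableWhenStopped answers: "does the cluster stay up when exactly
     * these jobs are stopped?" (hypothetical callback, assumed reliable).
     */
    static <T> T findCulprit(List<T> jobs, Predicate<List<T>> stableWhenStopped) {
        int lo = 0;
        int hi = jobs.size();
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (stableWhenStopped.test(jobs.subList(lo, mid))) {
                hi = mid; // stopping the first half fixed it: culprit is in [lo, mid)
            } else {
                lo = mid; // still crashing: culprit is in [mid, hi)
            }
        }
        return jobs.get(lo);
    }

    public static void main(String[] args) {
        List<String> jobs = Arrays.asList("job-a", "job-b", "job-c", "job-d", "job-e");
        // Simulated observation: the cluster is stable only if "job-c" is among the stopped jobs.
        String culprit = findCulprit(jobs, stopped -> stopped.contains("job-c"));
        System.out.println(culprit);
    }
}
```

With n jobs this needs about log2(n) stop-and-observe rounds, which is why it was practical on a cluster with few indices.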
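Why did the stack overflow occur?
The overflowing stack arises while the ES server is handling a client write request. With dynamic mapping enabled, if the incoming data contains new fields, the server has to parse their field configuration, and that parsing logic walks the corresponding JSON recursively. When a field has a nested type (object/nested), the recursion depth depends on how many levels of nesting the user data has. The problem index's data was nested too deeply, so the recursion went too deep and overflowed the stack.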
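The relationship between nesting depth and recursion depth can be illustrated with a small stand-alone sketch. It is not the Elasticsearch parser: RecursiveDepthDemo, parse and nested are hypothetical names, and a plain Map stands in for the parsed JSON. The only point is that one stack frame is consumed per object level, so a deep enough document exhausts the thread stack regardless of how small each level is.

```java
import java.util.HashMap;
import java.util.Map;

final class RecursiveDepthDemo {
    // Simplified stand-in for a recursive mapping parser:
    // one nested call per object level, returns the number of levels seen.
    static int parse(Map<String, Object> node) {
        int deepest = 0;
        for (Object value : node.values()) {
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                deepest = Math.max(deepest, parse(child));
            }
        }
        return deepest + 1;
    }

    // Builds an object nested `levels` deep. Built iteratively, from the leaf
    // outwards, so that constructing the test data cannot itself overflow.
    static Map<String, Object> nested(int levels) {
        Map<String, Object> current = new HashMap<>();
        current.put("leaf", "ddd");
        for (int i = 1; i < levels; i++) {
            Map<String, Object> parent = new HashMap<>();
            parent.put("f" + i, current);
            current = parent;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println("levels = " + parse(nested(10)));
        try {
            // Whether this overflows depends on the JVM's thread stack size (-Xss).
            System.out.println("levels = " + parse(nested(200_000)));
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError while parsing 200000 levels");
        }
    }
}
```

In this toy program the error is caught; in our incident it surfaced in a write thread and was treated as fatal by ElasticsearchUncaughtExceptionHandler, which is why the node exited.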
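Verification:
We test-wrote a document with multiple levels of nesting. The resulting code stack matches the StackOverflowError stack from the incident and shows the same repeated recursion.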
{
"o1":{
"a":{
"b":{
"c":{
"d":{
"e":{
"f":{
"g":{
"h":{
"j":"ddd"
}
}
}
}
}
}
}
}
}
}
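To try nesting depths other than the hand-written document above, the test data can be generated. The sketch below only builds the JSON text; the class name NestedDocGenerator and the field names f0, f1, ... and leaf are made up for illustration, and how the resulting string is sent to a test index is left out.

```java
final class NestedDocGenerator {
    // Returns a JSON object wrapped in `depth` additional object levels,
    // e.g. depth 2 -> {"f0":{"f1":{"leaf":"ddd"}}}
    static String build(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("{\"f").append(i).append("\":");
        }
        sb.append("{\"leaf\":\"ddd\"}");
        for (int i = 0; i < depth; i++) {
            sb.append('}');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Depth is taken from the first argument, defaulting to 10.
        int depth = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.println(build(depth));
    }
}
```

Only run a large depth against a disposable test cluster: on an affected version, a sufficiently deep document is expected to take the data node down, exactly as in the incident.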
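Code stack:
Checking the problem index confirmed that dynamic mapping was enabled, and the raw logs did contain data with a large number of nested structures.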
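Why does the problem index's mapping contain no object-type fields?
The exception is triggered while the mapping is being parsed during a write, before the new mapping has been applied to the index. Because the stack overflow during parsing crashed the ES process, the index mapping was never updated, so naturally the problem index's mapping contains no object-type fields.
ES has a depth limit for nested fields (index.mapping.depth.limit). Why didn't it block this document?
That check runs after the field configuration has been parsed, and the stack overflow happened during parsing itself. See the code below for details.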
private synchronized Map internalMerge(Map mappings, MergeReason reason) {
// ...irrelevant code omitted...
try {
documentMapper =
documentParser.parse(type, entry.getValue(), applyDefault ? defaultMappingSourceOrLastStored : null); // parse the mapping for the incoming data
} catch (Exception e) {
throw new MapperParsingException("Failed to parse mapping [{}]: {}", e, entry.getKey(), e.getMessage());
}
}
return internalMerge(defaultMapper, defaultMappingSource, documentMapper, reason); // the mapping checks happen in here
}
private synchronized Map internalMerge(@Nullable DocumentMapper defaultMapper,
@Nullable String defaultMappingSource, DocumentMapper mapper,
MergeReason reason) {
// ...irrelevant code omitted...
boolean hasNested = this.hasNested;
Map fullPathObjectMappers = this.fullPathObjectMappers;
Map results = new LinkedHashMap<>(2);
if (defaultMapper != null) {
if (indexSettings.getIndexVersionCreated().onOrAfter(Version.V_7_0_0)) {
throw new IllegalArgumentException(DEFAULT_MAPPING_ERROR_MESSAGE);
} else if (reason == MergeReason.MAPPING_UPDATE) { // only log in case of explicit mapping updates
deprecationLogger.deprecatedAndMaybeLog("default_mapping_not_allowed", DEFAULT_MAPPING_ERROR_MESSAGE);
}
assert defaultMapper.type().equals(DEFAULT_MAPPING);
results.put(DEFAULT_MAPPING, defaultMapper);
}
for (ObjectMapper objectMapper : objectMappers) {
if (reason != MergeReason.MAPPING_RECOVERY) {
checkTotalFieldsLimit(objectMappers.size() + fieldMappers.size() - metadataMappers.length
+ fieldAliasMappers.size());
checkFieldNameSoftLimit(objectMappers, fieldMappers, fieldAliasMappers);
checkNestedFieldsLimit(fullPathObjectMappers);
checkDepthLimit(fullPathObjectMappers.keySet()); // check whether the mapping's maximum depth exceeds the limit; if so, throw IllegalArgumentException
}
results.put(newMapper.type(), newMapper);
}
return results;
}
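The ordering problem can be made concrete with a small sketch of the alternative: enforcing the limit while descending, so that an over-deep document is rejected with an ordinary exception long before the stack is exhausted. This is our own illustration, not the Elasticsearch code and not necessarily how the upstream fix is implemented; DepthCheckOrdering, parseWithLimit and nested are hypothetical names, and a plain Map stands in for the parsed mapping source.

```java
import java.util.HashMap;
import java.util.Map;

final class DepthCheckOrdering {
    // Fail-fast variant: the limit is tested on entry to every level, so the
    // recursion can never go more than one frame past `limit`.
    static int parseWithLimit(Map<String, Object> node, int depth, int limit) {
        if (depth > limit) {
            throw new IllegalArgumentException("mapping depth limit [" + limit + "] exceeded");
        }
        int deepest = depth;
        for (Object value : node.values()) {
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                deepest = Math.max(deepest, parseWithLimit(child, depth + 1, limit));
            }
        }
        return deepest;
    }

    // Builds an object nested `levels` deep, iteratively from the leaf outwards.
    static Map<String, Object> nested(int levels) {
        Map<String, Object> current = new HashMap<>();
        current.put("leaf", "ddd");
        for (int i = 1; i < levels; i++) {
            Map<String, Object> parent = new HashMap<>();
            parent.put("f" + i, current);
            current = parent;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(parseWithLimit(nested(5), 1, 20));
        try {
            parseWithLimit(nested(200_000), 1, 20);
        } catch (IllegalArgumentException e) {
            // A normal, catchable rejection instead of a fatal StackOverflowError.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the excerpt above, by contrast, checkDepthLimit only runs inside the second internalMerge, after documentParser.parse has already recursed through the whole structure.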
The upstream community fixed this issue in v8.6 (https://github.com/elastic/elasticsearch/issues/52098). We are running ES 7, so resolving it requires either an upgrade or a patch.