An astute reader might have noticed a problem with the fielddata size settings: the fielddata size is checked only after the data has been loaded. What happens if a query arrives that tries to load more into fielddata than available memory? The answer is ugly: you would get an OutOfMemoryException.
Elasticsearch includes a fielddata circuit breaker that is designed to deal with this situation. The circuit breaker estimates the memory requirements of a query by introspecting the fields involved (their type, cardinality, size, and so forth). It then checks to see whether loading the required fielddata would push the total fielddata size over the configured percentage of the heap. Checking whether the limit would be exceeded before loading the data is a neat approach.
If the estimated query size is larger than the limit, the circuit breaker is tripped and the query will be aborted and return an exception. This happens before data is loaded, which means that you won’t hit an OutOfMemoryException.
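The estimate-then-check pattern can be sketched in a few lines. This is an illustrative model only, not Elasticsearch's actual implementation; the class and method names (`FielddataBreaker`, `check_and_add`) are invented for the example.

```python
# Illustrative sketch of a circuit breaker that checks an estimated
# allocation BEFORE loading data, so memory use never overshoots.
# Not Elasticsearch's real code; names and structure are assumptions.

class CircuitBreakerError(Exception):
    pass

class FielddataBreaker:
    def __init__(self, heap_bytes, limit_fraction=0.60):
        # 60% of the heap, mirroring the default fielddata limit.
        self.limit = heap_bytes * limit_fraction
        self.used = 0

    def check_and_add(self, estimated_bytes):
        # Reject the load if the estimate would push us over the limit.
        if self.used + estimated_bytes > self.limit:
            raise CircuitBreakerError(
                "estimated fielddata of %d bytes would exceed limit of %d"
                % (self.used + estimated_bytes, self.limit))
        self.used += estimated_bytes

breaker = FielddataBreaker(heap_bytes=1_000_000_000)  # 1 GB heap
breaker.check_and_add(500_000_000)       # fine: 500 MB <= 600 MB limit
try:
    breaker.check_and_add(200_000_000)   # trips: 700 MB > 600 MB limit
except CircuitBreakerError as e:
    print("tripped:", e)
```

Because the check happens before any allocation, a tripped breaker costs nothing but a rejected query; an OutOfMemoryException, by contrast, can destabilize the whole node.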
Available Circuit Breakers
Elasticsearch has a family of circuit breakers, all of which work to ensure that memory limits are not exceeded:
indices.breaker.fielddata.limit
The fielddata circuit breaker limits the size of fielddata to 60% of the heap, by default.
indices.breaker.request.limit
The request circuit breaker estimates the size of structures required to complete other parts of a request, such as creating aggregation buckets, and limits them to 40% of the heap, by default.
indices.breaker.total.limit
The total circuit breaker wraps the request and fielddata circuit breakers to ensure that the combination of the two doesn’t use more than 70% of the heap, by default.
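The wrapping relationship among the three breakers can be modeled as a parent breaker that child breakers report into. Again, this is a sketch under assumed names (`ParentBreaker`, `ChildBreaker`), not the real implementation:

```python
# Illustrative model of a "total" breaker wrapping child breakers:
# an allocation must pass both the child's own limit and the shared
# parent limit. Names and structure are assumptions for illustration.

class BreakerTripped(Exception):
    pass

class ParentBreaker:
    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.children = []

    def total_used(self):
        return sum(c.used for c in self.children)

class ChildBreaker:
    def __init__(self, name, limit_bytes, parent):
        self.name, self.limit, self.used = name, limit_bytes, 0
        self.parent = parent
        parent.children.append(self)

    def add(self, nbytes):
        # Check this breaker's limit AND the combined parent limit.
        if self.used + nbytes > self.limit:
            raise BreakerTripped("%s limit exceeded" % self.name)
        if self.parent.total_used() + nbytes > self.parent.limit:
            raise BreakerTripped("total limit exceeded")
        self.used += nbytes

heap = 1_000_000_000                                  # 1 GB heap
total = ParentBreaker(int(0.70 * heap))               # 70% total
fielddata = ChildBreaker("fielddata", int(0.60 * heap), total)
request = ChildBreaker("request", int(0.40 * heap), total)

fielddata.add(550_000_000)    # under its own 600 MB limit
try:
    request.add(200_000_000)  # 550 + 200 = 750 MB > 700 MB total
except BreakerTripped as e:
    print(e)                  # the total breaker trips first
```

Note how the request breaker's allocation is within its own 400 MB limit but still trips, because the combined usage would exceed the 70% total limit.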
The circuit breaker limits can be specified in the config/elasticsearch.yml file, or can be updated dynamically on a live cluster:
PUT /_cluster/settings
{
  "persistent" : {
    "indices.breaker.fielddata.limit" : "40%"
  }
}
The limit is a percentage of the heap.
It is important to note that the circuit breaker compares the estimated query size against the total heap size, not against the amount of heap memory actually in use. This is done for a variety of technical reasons (for example, the heap may look full but much of it may be garbage waiting to be collected, which is hard to estimate properly). As the end user, this means the setting needs to be conservative, since it is comparing against total heap, not free heap.
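A small arithmetic example (with invented numbers) makes the consequence concrete: because the comparison is against total heap, the breaker's decision is independent of how much heap is actually free at the moment.

```python
# Hypothetical numbers showing why breaker limits compare against
# TOTAL heap rather than free heap. All values are invented.

heap_total = 1_000_000_000           # 1 GB heap
fielddata_limit = 0.40 * heap_total  # 40% limit -> 400 MB
estimated_query = 450_000_000        # 450 MB estimated fielddata

# Trips no matter how much of the heap is free or collectible garbage:
trips = estimated_query > fielddata_limit
print(trips)  # True
```

With a 40% limit on a 1 GB heap, any query estimated above 400 MB trips the breaker, even if 900 MB of heap happens to be reclaimable garbage at that instant; this is exactly why the setting must be chosen conservatively.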