When building the mapping for an Elasticsearch index, I used a Map type for one of the fields:
private Map specs;
Checking the dynamically generated mapping in Kibana showed:
"specs": {
  "properties": {
    "CPU品牌": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
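rather than the usual:
"price": {
  "type": "long"
},
"skus": {
  "type": "keyword",
  "index": false
},
that is, simple types like these. I was not familiar with the structure above or with the fields parameter, so I went looking for material and eventually found the answer in the official documentation: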
fields
It is often useful to index the same field in different ways for different purposes. This is the purpose of multi-fields. For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "city": "New York"
}

PUT my_index/_doc/2
{
  "city": "York"
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york"
    }
  },
  "sort": {
    "city.raw": "asc"
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw"
      }
    }
  }
}
The city.raw field is a keyword version of the city field.
The city field can be used for full text search.
The city.raw field can be used for sorting and aggregations.
Multi-fields do not change the original _source field.
The fields setting is allowed to have different settings for fields of the same name in the same index. New multi-fields can be added to existing fields using the PUT mapping API.
Multi-fields with multiple analyzers
Another use case of multi-fields is to analyze the same field in different ways for better relevance. For instance we could index a field with the standard analyzer which breaks text up into words, and again with the english analyzer which stems words into their root form:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "text": "quick brown fox"
}

PUT my_index/_doc/2
{
  "text": "quick brown foxes"
}

GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "quick brown foxes",
      "fields": [
        "text",
        "text.english"
      ],
      "type": "most_fields"
    }
  }
}
The text field uses the standard analyzer.
The text.english field uses the english analyzer.
Index two documents, one with fox and the other with foxes.
Query both the text and text.english fields and combine the scores.
The text field contains the term fox in the first document and foxes in the second document. The text.english field contains fox for both documents, because foxes is stemmed to fox.
The query string is also analyzed by the standard analyzer for the text field, and by the english analyzer for the text.english field. The stemmed field allows a query for foxes to also match the document containing just fox. This allows us to match as many documents as possible. By also querying the unstemmed text field, we improve the relevance score of the document which matches foxes exactly.
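The text-plus-keyword mapping that Kibana reported for specs follows from how dynamic mapping treats string values. Below is a rough Python sketch of that rule. It is not Elasticsearch code: the function name dynamic_mapping and the sample document are made up for illustration, only booleans, integers, floats, strings and nested objects are handled, and date or numeric detection inside strings is ignored.

```python
def dynamic_mapping(value):
    """Approximate the default dynamic mapping chosen for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become a text field plus a keyword multi-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        # Objects (such as a Java Map) get one sub-mapping per key.
        return {"properties": {k: dynamic_mapping(v) for k, v in value.items()}}
    raise TypeError(f"unsupported value in this sketch: {value!r}")


# Hypothetical document; the values are placeholders, not real data.
sample_doc = {"price": 1999, "specs": {"CPU品牌": "example-brand", "内存": "4GB"}}
mapping = {field: dynamic_mapping(value) for field, value in sample_doc.items()}
```

Every key of the Map ends up as a text field with a keyword sub-field, which is why specs.内存.keyword exists even though it was never declared.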
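Problem solved: fields indexes the same field in different ways for different purposes. For example, a string field can be mapped as a text field for full-text search and also as a keyword field for sorting or aggregations. Here, the text version of CPU品牌 can be used for analyzed (tokenized) search, while its keyword sub-field can be used for aggregations. A test: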
GET /goods/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "specs.内存",
        "size": 10
      }
    }
  }
}
The search returned:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [specs.内存] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "goods",
        "node": "sjredvFNT729Jrv0wvucVA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [specs.内存] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ]
  },
  "status": 400
}
Why does this error occur?
I found the explanation in the official documentation:
fielddata
Most fields are indexed by default, which makes them searchable. Sorting, aggregations, and accessing field values in scripts, however, requires a different access pattern from search.
Search needs to answer the question "Which documents contain this term?", while sorting and aggregations need to answer a different question: "What is the value of this field for this document?".
Most fields can use index-time, on-disk doc_values for this data access pattern, but text fields do not support doc_values.
Instead, text fields use a query-time in-memory data structure called fielddata. This data structure is built on demand the first time that a field is used for aggregations, sorting, or in a script. It is built by reading the entire inverted index for each segment from disk, inverting the term ↔ document relationship, and storing the result in memory, in the JVM heap.
Fielddata is disabled on text fields by default
Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default.
If you try to sort, aggregate, or access values from a script on a text field, you will see this exception:
Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
Before enabling fielddata
Before you enable fielddata, consider why you are using a text field for aggregations, sorting, or in a script. It usually doesn’t make sense to do so.
A text field is analyzed before indexing so that a value like New York can be found by searching for new or for york. A terms aggregation on this field will return a new bucket and a york bucket, when you probably want a single bucket called New York.
Instead, you should have a text field for full text searches, and an unanalyzed keyword field with doc_values enabled for aggregations, as follows:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_field": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
Use the my_field field for searches.
Use the my_field.keyword field for aggregations, sorting, or in scripts.
Enabling fielddata on text fields
You can enable fielddata on an existing text field using the PUT mapping API as follows:
PUT my_index/_mapping/_doc
{
  "properties": {
    "my_field": {
      "type": "text",
      "fielddata": true
    }
  }
}
The mapping that you specify for my_field should consist of the existing mapping for that field, plus the fielddata parameter.
fielddata_frequency_filter
Fielddata filtering can be used to reduce the number of terms loaded into memory, and thus reduce memory usage. Terms can be filtered by frequency:
The frequency filter allows you to only load terms whose document frequency falls between a min and max value, which can be expressed as an absolute number (when the number is bigger than 1.0) or as a percentage (eg 0.01 is 1% and 1.0 is 100%). Frequency is calculated per segment. Percentages are based on the number of docs which have a value for the field, as opposed to all docs in the segment.
Small segments can be excluded completely by specifying the minimum number of docs that the segment should contain with min_segment_size:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "tag": {
          "type": "text",
          "fielddata": true,
          "fielddata_frequency_filter": {
            "min": 0.001,
            "max": 0.1,
            "min_segment_size": 500
          }
        }
      }
    }
  }
}
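To see how the min and max values in the quoted passage might be interpreted, here is a small Python sketch for a single segment. It is only a reading of the documentation text, not the Elasticsearch implementation: the function names and the term counts are invented, both bounds are assumed to be inclusive, and min_segment_size is left out on purpose.

```python
def frequency_threshold(value, docs_with_field):
    """Turn a min/max setting into an absolute document-frequency threshold."""
    # Per the quoted docs: values bigger than 1.0 are absolute counts,
    # values up to 1.0 are a fraction of the docs that have the field.
    if value > 1.0:
        return float(value)
    return value * docs_with_field


def filter_terms(doc_freqs, docs_with_field, min_value, max_value):
    """Keep only the terms whose document frequency lies within the bounds."""
    low = frequency_threshold(min_value, docs_with_field)
    high = frequency_threshold(max_value, docs_with_field)
    # Assumption: both bounds are inclusive.
    return {term: freq for term, freq in doc_freqs.items() if low <= freq <= high}


# Hypothetical document frequencies for one segment with 1000 docs that have the field.
segment_doc_freqs = {"rare": 3, "common": 120, "stopword": 900}

by_percentage = filter_terms(segment_doc_freqs, 1000, 0.01, 0.5)  # 1% to 50%
by_absolute = filter_terms(segment_doc_freqs, 1000, 2, 200)       # 2 to 200 docs
```

With the percentage bounds only "common" survives, so very rare and very frequent terms never reach the heap; the absolute bounds keep "rare" as well.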
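In short: fielddata is disabled on text fields by default, and Elasticsearch's dynamic mapping adds a keyword sub-field for aggregations and similar operations. Why is it disabled on text fields? There are two reasons. First, fielddata consumes a lot of heap space when loading high-cardinality text fields. Second, aggregating on a text field usually makes no sense. For example: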
A text field is analyzed before indexing so that a value like New York can be found by searching for new or for york. A terms aggregation on this field will return a new bucket and a york bucket, when you probably want a single bucket called New York.
Instead, you should have a text field for full text searches, and an unanalyzed keyword field with doc_values enabled for aggregations, as follows:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_field": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
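You wanted to aggregate on "New York", but the value was tokenized at index time, so the aggregation yields two buckets, new and york. To get a single bucket, you need an unanalyzed keyword field with doc_values enabled for aggregations, while the text field handles full-text search. That gives you the best of both worlds.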
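The difference between the two kinds of buckets can be imitated in a few lines of Python. This is only a toy model: analyze is a crude stand-in for the standard analyzer (lowercase, split on whitespace), the helper names are invented, and the city values are sample data in the spirit of the documentation example.

```python
from collections import Counter


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def terms_buckets_text(values):
    """Buckets a terms aggregation would see on an analyzed (text) field."""
    counts = Counter()
    for value in values:
        # A document counts once per distinct term it contains.
        counts.update(set(analyze(value)))
    return dict(counts)


def terms_buckets_keyword(values):
    """Buckets on a keyword field: each whole value is a single term."""
    return dict(Counter(values))


cities = ["New York", "New York", "York"]
text_buckets = terms_buckets_text(cities)
keyword_buckets = terms_buckets_keyword(cities)
```

On the analyzed side the value "New York" has disappeared and only the tokens new and york remain as buckets, while the keyword side keeps "New York" and "York" as separate, intact buckets.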
Next, run the aggregation against the keyword sub-field. This time it succeeds:
GET /goods/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "specs.内存.keyword",
        "size": 10
      }
    }
  }
}
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 182,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "NAME": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "4GB",
          "doc_count": 75
        },
        {
          "key": "3GB",
          "doc_count": 49
        },
        {
          "key": "6GB",
          "doc_count": 48
        },
        {
          "key": "2GB",
          "doc_count": 23
        },
        {
          "key": "8GB",
          "doc_count": 2
        }
      ]
    }
  }
}
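If the same aggregation is issued from application code, the request body and the bucket extraction could look roughly like the Python sketch below. No HTTP call is made here; the helper names terms_agg_body and bucket_counts are hypothetical, and the response literal is the one shown above, trimmed to the part that is actually read.

```python
def terms_agg_body(field, agg_name="NAME", size=10):
    """Build the search body for a terms aggregation that returns no hits."""
    return {"size": 0, "aggs": {agg_name: {"terms": {"field": field, "size": size}}}}


def bucket_counts(response, agg_name="NAME"):
    """Map each bucket key of a terms aggregation to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Aggregate on the keyword sub-field, not on the analyzed text field.
body = terms_agg_body("specs.内存.keyword")

# Response copied from the output above, trimmed to the aggregation part.
response = {
    "aggregations": {
        "NAME": {
            "buckets": [
                {"key": "4GB", "doc_count": 75},
                {"key": "3GB", "doc_count": 49},
                {"key": "6GB", "doc_count": 48},
                {"key": "2GB", "doc_count": 23},
                {"key": "8GB", "doc_count": 2},
            ]
        }
    }
}

memory_counts = bucket_counts(response)
```

Keeping the field name as a parameter makes it harder to query specs.内存 directly by accident, which is the mistake that triggered the fielddata error.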
Finally, thanks to Google and Baidu: the problem is solved.