Create two fields:
zuMaker (family creator), keyword type
zuName (family name), text type
Now store a value in each field: zuMaker gets "张三李四" and zuName gets "墙体钢结构".
PUT test_index
{
  "mappings": {
    "app": {
      "properties": {
        "zumaker": {
          "type": "keyword",
          "index": true
        },
        "zuname": {
          "type": "text",
          "index": true,
          "analyzer": "standard",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
POST test_index/app/1
{
  "zumaker": "张三李四",
  "zuname": "墙体钢结构"
}
GET test_index/app/_search
{
  "query": {
    "term": {
      "zuname": {
        "value": "墙体钢结构"
      }
    }
  }
}
GET test_index/app/_search
{
  "query": {
    "term": {
      "zumaker": {
        "value": "张三李四"
      }
    }
  }
}
During indexing, zuMaker was not analyzed at all; it was stored as the single term 张三李四:
GET /test_index/app/1/_termvectors?fields=zumaker
{
  "_index": "test_index",
  "_type": "app",
  "_id": "1",
  "_version": 4,
  "found": true,
  "took": 1,
  "term_vectors": {
    "zumaker": {
      "field_statistics": {
        "sum_doc_freq": 1,
        "doc_count": 1,
        "sum_ttf": -1
      },
      "terms": {
        "张三李四": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 4
            }
          ]
        }
      }
    }
  }
}
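The term-vectors output above can be mimicked with a minimal sketch: for a keyword field the entire value becomes one term in the inverted index, with no analysis step. The helper name `index_keyword` is hypothetical, for illustration only.

```python
def index_keyword(value):
    # A keyword field is indexed verbatim: the whole value becomes
    # exactly one term, no tokenization or normalization.
    # (Conceptual sketch only, not Elasticsearch's implementation.)
    return [value]

terms = index_keyword("张三李四")
print(terms)  # ['张三李四'] — a single four-character term
```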
The zuName field, on the other hand, was analyzed when its inverted index was built:
GET /test_index/app/1/_termvectors?fields=zuname
{
  "_index": "test_index",
  "_type": "app",
  "_id": "1",
  "_version": 4,
  "found": true,
  "took": 1,
  "term_vectors": {
    "zuname": {
      "field_statistics": {
        "sum_doc_freq": 5,
        "doc_count": 1,
        "sum_ttf": 5
      },
      "terms": {
        "体": { "term_freq": 1, "tokens": [ { "position": 1, "start_offset": 1, "end_offset": 2 } ] },
        "墙": { "term_freq": 1, "tokens": [ { "position": 0, "start_offset": 0, "end_offset": 1 } ] },
        "构": { "term_freq": 1, "tokens": [ { "position": 4, "start_offset": 4, "end_offset": 5 } ] },
        "结": { "term_freq": 1, "tokens": [ { "position": 3, "start_offset": 3, "end_offset": 4 } ] },
        "钢": { "term_freq": 1, "tokens": [ { "position": 2, "start_offset": 2, "end_offset": 3 } ] }
      }
    }
  }
}
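The single-character terms above come from how the standard analyzer handles CJK text: each ideograph becomes its own token, carrying a position and character offsets. A rough sketch of that behavior (the function name is hypothetical; the real analyzer handles many more cases than this):

```python
def standard_analyze_cjk(text):
    # Rough sketch of the standard analyzer's treatment of CJK input:
    # every ideograph becomes its own token, with a position and
    # character offsets, matching the _termvectors output above.
    # (Conceptual only; not the real Lucene analyzer.)
    return [
        {"token": ch, "position": i, "start_offset": i, "end_offset": i + 1}
        for i, ch in enumerate(text)
    ]

tokens = standard_analyze_cjk("墙体钢结构")
print([t["token"] for t in tokens])  # ['墙', '体', '钢', '结', '构']
print(tokens[0])  # {'token': '墙', 'position': 0, 'start_offset': 0, 'end_offset': 1}
```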
This is where the difference between the two fields shows up at query time.
An exact (term) search for the full value on the zuName field returns no hits,
because 墙体钢结构 was tokenized at index time: the inverted index only contains '墙', '体', '钢', '结', '构'.
Searching for the single character '墙', however, does return a result:
GET test_index/app/_search
{
  "query": {
    "term": {
      "zuname": {
        "value": "墙"
      }
    }
  }
}
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2824934,
    "hits": [
      {
        "_index": "test_index",
        "_type": "app",
        "_id": "1",
        "_score": 0.2824934,
        "_source": {
          "zumaker": "张三李四",
          "zuname": "墙体钢结构"
        }
      }
    ]
  }
}
An exact search on the zuMaker field:
GET test_index/app/_search
{
  "query": {
    "term": {
      "zumaker": {
        "value": "张三李四"
      }
    }
  }
}
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test_index",
        "_type": "app",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "zumaker": "张三李四",
          "zuname": "墙体钢结构"
        }
      }
    ]
  }
}
This time the document is found: keyword fields are not analyzed, so the term query matches the stored value exactly.
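The contrast between the two queries can be summed up in a small sketch, assuming the single-character tokenization shown in the term vectors above: a term query looks the query string up verbatim against the indexed terms (the query string itself is not analyzed either). The helper `term_query` is hypothetical.

```python
# Terms each field would hold, per the _termvectors output above.
keyword_terms = {"张三李四"}    # zumaker: whole value kept as one term
text_terms = set("墙体钢结构")  # zuname: one term per character

def term_query(indexed_terms, value):
    # A term query matches only when the exact value
    # exists as an indexed term; no analysis is applied.
    return value in indexed_terms

print(term_query(text_terms, "墙体钢结构"))   # False — the full string was never indexed
print(term_query(text_terms, "墙"))           # True — single characters are the terms
print(term_query(keyword_terms, "张三李四"))  # True — keyword kept the whole value
```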
When defining the mapping through the Java API (Spring Data Elasticsearch annotations), the difference looks like this:
@Field(type = FieldType.Text, analyzer = "standard", searchAnalyzer = "standard")
private List<String> category = new ArrayList<>();
@Field(type = FieldType.Keyword)
private String logoUrl = "";
A text field takes an analyzer and a searchAnalyzer (both default to standard);
they should normally be the same analyzer, or search terms may not match the indexed terms.
A keyword field is never analyzed, so no analyzer needs to be specified.