The difference between text and keyword in Elasticsearch 5.x and later

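keyword vs. text:

In ES 2.x neither of these field types existed; there was only the string type.

Starting with 5.x, string was deprecated and replaced by two new types, text and keyword.

Both types store strings, but they are indexed and searched differently.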
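As a rough illustration of how the old type relates to the two new ones, the sketch below converts a 2.x-style string property into its 5.x counterpart. The helper name upgrade_string_field is hypothetical and only the two common cases (analyzed and not_analyzed) are handled; it is not an official migration tool.

```python
def upgrade_string_field(old):
    """Map a 2.x-style `string` property to a 5.x-style one (simplified sketch)."""
    if old.get("type") != "string":
        return dict(old)  # not a string field: return an unchanged copy
    # In 2.x, string fields were analyzed unless told otherwise.
    index = old.get("index", "analyzed")
    if index == "not_analyzed":
        return {"type": "keyword"}  # exact-value field
    if index == "analyzed":
        new = {"type": "text"}  # full-text field
        if "analyzer" in old:
            new["analyzer"] = old["analyzer"]
        return new
    # Anything else is outside the scope of this sketch.
    raise ValueError("unsupported index option: %r" % index)


print(upgrade_string_field({"type": "string", "index": "not_analyzed"}))
print(upgrade_string_field({"type": "string", "analyzer": "standard"}))
```

The first call yields a keyword property and the second a text property that keeps its analyzer.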

 

 

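keyword: the value is not analyzed when it is stored; the whole string is indexed as a single term.

text: the value is analyzed (split into terms) automatically and every term is indexed. This is convenient for full-text search, but for fields that never need it the extra terms are useless, so using text there just wastes space.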
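To make the contrast concrete, here is a minimal sketch of the terms each type would produce for the values used in the test case in this post. The function text_terms is only a crude stand-in for the standard analyzer: it assumes the input consists of Chinese characters, which end up as one term per character, and it does not handle Latin words or punctuation.

```python
def keyword_terms(value):
    # keyword: no analysis, the whole value is a single term
    return [value]


def text_terms(value):
    # text (assumption: Chinese-only input): one term per character,
    # roughly what the term vectors in the test case show
    return [ch for ch in value if not ch.isspace()]


print(keyword_terms("张三李四"))
print(text_terms("墙体钢结构"))
```

The keyword value stays in one piece, while the text value is broken into five single-character terms.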

 

 

The test case is as follows (Elasticsearch 5.4):

Create two fields:

    zumaker  family creator  keyword type

    zuname   family name     text type

Then store one value in each field: "张三李四" in zumaker and "墙体钢结构" in zuname.

 

PUT test_index
{
  "mappings": {
    "app": {
      "properties": {
        "zumaker": {
          "type": "keyword",
          "index": true
        },
        "zuname": {
          "type": "text",
          "index": true,
          "analyzer": "standard",
          "search_analyzer": "standard"
        }
      }
    }
  }
}


POST test_index/app/1
{
    "zumaker":"张三李四",
    "zuname":"墙体钢结构"
}



GET test_index/app/_search
{
  "query": {
    "term": {
      "zuname": {
        "value": "墙"
      }
    }
  }
}

GET test_index/app/_search
{
  "query": {
    "term": {
      "zumaker": {
        "value": "张三李四"
      }
    }
  }
}

During indexing, zumaker was not analyzed; only the single term 张三李四 was stored:

GET /test_index/app/1/_termvectors?fields=zumaker



{
  "_index": "test_index",
  "_type": "app",
  "_id": "1",
  "_version": 4,
  "found": true,
  "took": 1,
  "term_vectors": {
    "zumaker": {
      "field_statistics": {
        "sum_doc_freq": 1,
        "doc_count": 1,
        "sum_ttf": -1
      },
      "terms": {
        "张三李四": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 4
            }
          ]
        }
      }
    }
  }
}

 

The zuname field, on the other hand, was analyzed when its inverted index was built:

GET /test_index/app/1/_termvectors?fields=zuname

{
  "_index": "test_index",
  "_type": "app",
  "_id": "1",
  "_version": 4,
  "found": true,
  "took": 1,
  "term_vectors": {
    "zuname": {
      "field_statistics": {
        "sum_doc_freq": 5,
        "doc_count": 1,
        "sum_ttf": 5
      },
      "terms": {
        "体": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 1,
              "end_offset": 2
            }
          ]
        },
        "墙": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 1
            }
          ]
        },
        "构": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 4,
              "start_offset": 4,
              "end_offset": 5
            }
          ]
        },
        "结": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 3,
              "start_offset": 3,
              "end_offset": 4
            }
          ]
        },
        "钢": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 2,
              "end_offset": 3
            }
          ]
        }
      }
    }
  }
}

 

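The difference between the two fields shows up at query time.

An exact (term) query for the full value 墙体钢结构 on the zuname field

returns an empty result, meaning no document was found.

This is because the value 墙体钢结构 was analyzed when it was stored, so the inverted index contains only the terms '墙', '体', '钢', '结', '构'.

A term query for '墙' alone, however, does return the document: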

 

GET test_index/app/_search
{
  "query": {
    "term": {
      "zuname": {
        "value": "墙"
      }
    }
  }
}


{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2824934,
    "hits": [
      {
        "_index": "test_index",
        "_type": "app",
        "_id": "1",
        "_score": 0.2824934,
        "_source": {
          "zumaker": "张三李四",
          "zuname": "墙体钢结构"
        }
      }
    ]
  }
}

   

 

An exact (term) query on the zumaker field:

GET test_index/app/_search
{
  "query": {
    "term": {
      "zumaker": {
        "value": "张三李四"
      }
    }
  }
}


{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test_index",
        "_type": "app",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "zumaker": "张三李四",
          "zuname": "墙体钢结构"
        }
      }
    ]
  }
}

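This time the document is found, because a keyword field is not analyzed and the whole value is indexed as one term.

These are the results of exact (term) queries.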
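The query results above can be reproduced with a toy inverted index. This is only a sketch of the idea, not how Elasticsearch is implemented: the names analyze, build_index and term_query are made up for illustration, and the per-character split is an assumption that mirrors the term vectors shown earlier.

```python
from collections import defaultdict

# Field types taken from the mapping in the test case.
MAPPING = {"zumaker": "keyword", "zuname": "text"}


def analyze(field_type, value):
    if field_type == "keyword":
        return [value]      # one term: the whole value
    return list(value)      # assumption: one term per Chinese character


def build_index(docs):
    index = defaultdict(set)  # (field, term) -> ids of documents containing it
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            for term in analyze(MAPPING[field], value):
                index[(field, term)].add(doc_id)
    return index


def term_query(index, field, value):
    # A term query looks the value up as is, without analyzing it.
    return sorted(index.get((field, value), set()))


docs = {"1": {"zumaker": "张三李四", "zuname": "墙体钢结构"}}
index = build_index(docs)

print(term_query(index, "zuname", "墙体钢结构"))  # no such term in the text field
print(term_query(index, "zuname", "墙"))          # single character is a term
print(term_query(index, "zumaker", "张三李四"))    # whole value is the term
print(term_query(index, "zumaker", "张三"))        # partial value is not a term
```

Only the second and third lookups find document 1, which matches what the two real queries returned.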

 

When the fields are declared through the Java API, the difference looks like this:

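	// text field: analyzed with the standard analyzer at index time and at search time
	@Field(type = FieldType.text, analyzer = "standard", searchAnalyzer = "standard")
	private List category = new ArrayList<>();

	// keyword field: indexed as a single term, no analyzer involved
	@Field(type = FieldType.keyword)
	private String logoUrl = "";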

 

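A text field can specify analyzer and searchAnalyzer; if they are omitted, the standard analyzer is used.

The analyzer and searchAnalyzer are usually set to the same value, so that query text is split into the same terms as the indexed text and can be matched.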
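The reason for keeping the two analyzers in sync can be sketched in a few lines. The two analyzers below, per_char and whole, are hypothetical stand-ins (one term per character versus no splitting at all), and match is a simplified model of an analyzed query that hits when any query term is present in the index.

```python
def per_char(text):
    # stand-in analyzer: one term per character
    return list(text)


def whole(text):
    # stand-in analyzer: the entire text as one term
    return [text]


def index_terms(value, analyzer):
    # terms written to the index at index time
    return set(analyzer(value))


def match(indexed, query, search_analyzer):
    # analyze the query text, then hit if any query term is in the index
    return any(term in indexed for term in search_analyzer(query))


indexed = index_terms("墙体钢结构", per_char)

print(match(indexed, "钢结构", per_char))  # same analyzer on both sides
print(match(indexed, "钢结构", whole))     # mismatched search analyzer
```

With the same analyzer on both sides the query terms 钢, 结 and 构 are found; with the mismatched one the single query term 钢结构 does not exist in the index, so nothing matches.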

 

A keyword field is not analyzed, so no analyzer needs to be specified for it.

 

 

 

 
