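Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java, is released as open source under the terms of the Apache License, and is a popular enterprise search engine. It is often used in cloud environments and offers near-real-time search, stability, reliability, speed, and easy installation and use.
Elasticsearch is document oriented, which means it can store entire objects or documents. It does more than store them, though: it also indexes the content of every document so that it can be searched quickly. In Elasticsearch you index, search, sort, filter, aggregate, and analyze documents rather than rows and columns of data.
Elasticsearch uses JSON as its document serialization format. JSON is supported by most programming languages and has become a standard format in the NoSQL world.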
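An Elasticsearch document contains not only its data but also metadata, that is, information about the document. The three main metadata elements are:
_index: the index, similar to a "database" in a relational database. It is where related data is stored and indexed.
_type: the type, similar to a table in a relational database. It can be uppercase or lowercase, but it must not begin with an underscore or contain commas.
_id: combined with _index and _type, it uniquely identifies a document in Elasticsearch (similar to a primary key). When you create a document you can supply your own _id or let Elasticsearch generate one.
The metadata also includes the following:
_uid: the unique document identifier (_type#_id)
_source: the original document data
_all: a single string that concatenates the values of all fields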
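To make the addressing scheme concrete, here is a minimal Python sketch of how the three metadata elements combine into a document path and a _uid. The helper names doc_path and doc_uid are hypothetical and exist only for illustration; they are not part of any Elasticsearch client.

```python
def doc_path(index: str, doc_type: str, doc_id: str) -> str:
    """Build the REST path that addresses one document: /index/type/id."""
    return f"/{index}/{doc_type}/{doc_id}"


def doc_uid(doc_type: str, doc_id: str) -> str:
    """Build the _uid value in the _type#_id form described above."""
    return f"{doc_type}#{doc_id}"


if __name__ == "__main__":
    # Values taken from the examples used later in this article.
    print(doc_path("user_index", "user_type", "1"))
    print(doc_uid("user_type", "1"))
```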
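Commonly used Elasticsearch service URLs are listed in the following table:
| Category | URL | Method | Description |
| --- | --- | --- | --- |
| Cluster | /_cat/health?v | GET | Show cluster health |
| Cluster | /_cat/nodes?v | GET | Show node status |
| Cluster | /_cat/indices?v | GET | List all indices in the cluster |
| Cluster | /_cluster/nodes | GET | Get all nodes in the cluster and their information |
| Cluster | /_cluster/health | GET | Show cluster health |
| Cluster | /_cluster/state | GET | Get the full cluster state (cluster, node, and mapping information) |
| Node | /_nodes/process | GET | Show process information such as file descriptors |
| Node | /_nodes/process/stats | GET | Node resource statistics (memory, CPU, and so on) |
| Node | /_nodes/jvm | GET | JVM statistics and configuration for each node |
| Node | /_nodes/jvm/stats | GET | More detailed JVM information |
| Node | /_nodes/http | GET | HTTP information for each node (such as IP address) |
| Node | /_nodes/http/stats | GET | Statistics on HTTP requests handled by each node |
| Node | /_nodes/thread_pool | GET | Thread pools of each type |
| Node | /_nodes/thread_pool/stats | GET | Statistics for each type of thread pool |
| Index | /index/_search | GET,POST | Search the index |
| Index | /index | PUT,DELETE | Create or delete an index |
| Index | /_aliases | GET,POST | Get or modify index aliases |
| Index | /index/_settings | PUT | Update index settings (number_of_shards cannot be changed after the index is created) |
| Index | /index/_mapping | PUT | Create or update a mapping |
| Index | /index/_open | POST | Open a closed index |
| Index | /index/_close | POST | Close an index |
| Index | /index/_refresh | POST | Refresh the index (make newly added content visible to search) |
| Index | /index/_flush | POST | Flush the index: commit changes to the Lucene index files and clear the Elasticsearch transaction log |
| Index | /index/_optimize | POST | Optimize segments, mainly by merging the index's segments (renamed _forcemerge in later Elasticsearch versions) |
| Index | /index/_status | GET | Get index status information |
| Index | /index/_segments | GET | Get status information about the index's segments |
| Index | /index/type/id | GET,PUT,POST,DELETE | Operate on a specific document (create, read, update, delete) |
| Index | /index/type/id/_create | PUT | Create a document; fails if the document already exists |
| Index | /index/type/id/_update | POST | Update a document; fails if the document does not exist |
| Index | /index/type/_bulk | POST,PUT | Submit data changes in bulk |
| Index | /index/type/_mget | GET | Fetch multiple documents by _id |
| Index | /index/_explain | GET | Return explanation information without running the actual search |
| Index | /index/_analyze | GET | Analyze text according to the given parameters without running a search |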
GET _cat/health?v
Response:
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1565253576 08:39:36 my-es.cluster green 1 1 2 2 0 0 0 0 - 100.0%
GET _cat/nodes?v
Response:
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.1.199 36 63 1 0.00 0.14 0.12 mdi * node-1
GET _cat/indices?v
Response:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_task_manager aP_xUt7lQD2RdQDuT5ynbw 1 0 2 0 12.5kb 12.5kb
green open .kibana_1 -axbsiTwRPmlIVniX-0hOA 1 0 4 1 19.8kb 19.8kb
GET _cluster/health
Response:
{
"cluster_name" : "my-es.cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 2,
"active_shards" : 2,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
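As a rough illustration of how a monitoring script might consume the _cluster/health response, the Python sketch below parses a trimmed copy of the JSON shown above. The function name summarize_health and the rule "green with no unassigned shards means OK" are assumptions made for this example, not an official health policy.

```python
import json

# A trimmed copy of the _cluster/health response shown above.
HEALTH_JSON = """
{
  "cluster_name": "my-es.cluster",
  "status": "green",
  "number_of_nodes": 1,
  "unassigned_shards": 0,
  "active_shards_percent_as_number": 100.0
}
"""


def summarize_health(body: str) -> dict:
    """Reduce a _cluster/health response to the fields a simple check needs."""
    health = json.loads(body)
    return {
        # Assumed rule for this sketch: green and nothing unassigned.
        "ok": health["status"] == "green" and health["unassigned_shards"] == 0,
        "status": health["status"],
        "nodes": health["number_of_nodes"],
    }


if __name__ == "__main__":
    print(summarize_health(HEALTH_JSON))
```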
GET _nodes/process
Response:
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "my-es.cluster",
"nodes" : {
"SQYgJvIZR7yqA3TzkURejA" : {
"name" : "node-1",
"transport_address" : "192.168.1.199:9300",
"host" : "192.168.1.199",
"ip" : "192.168.1.199",
"version" : "6.8.2",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "b506955",
"roles" : [
"master",
"data",
"ingest"
],
"attributes" : {
"ml.machine_memory" : "3954188288",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
},
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 9496,
"mlockall" : false
}
}
}
}
GET _nodes/http
Response:
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "my-es.cluster",
"nodes" : {
"SQYgJvIZR7yqA3TzkURejA" : {
"name" : "node-1",
"transport_address" : "192.168.1.199:9300",
"host" : "192.168.1.199",
"ip" : "192.168.1.199",
"version" : "6.8.2",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "b506955",
"roles" : [
"master",
"data",
"ingest"
],
"attributes" : {
"ml.machine_memory" : "3954188288",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
},
"http" : {
"bound_address" : [
"192.168.1.199:9200"
],
"publish_address" : "192.168.1.199:9200",
"max_content_length_in_bytes" : 104857600
}
}
}
}
PUT user_index
{
"settings": {
"number_of_replicas": 1,
"number_of_shards": 1
}
}
Response:
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "user_index"
}
PUT user_index/_settings
{
"settings": {
"number_of_replicas": 2
}
}
Response:
{
"acknowledged" : true
}
DELETE user_index
Response:
{
"acknowledged" : true
}
POST _aliases
{
"actions":[{
"add":{"index":"user_index","alias":"user_alias"}
}]
}
Alternatively, add the alias through the index's own alias endpoint:
PUT user_index/_alias/user_alias
POST _aliases
{
"actions":[{
"remove":{"index":"user_index","alias":"user_alias"}
}]
}
Alternatively, remove the alias through the index's own alias endpoint:
DELETE user_index/_alias/user_alias
GET _aliases
Response:
{
".kibana_1" : {
"aliases" : {
".kibana" : { }
}
},
".kibana_task_manager" : {
"aliases" : { }
},
"user_index" : {
"aliases" : {
"user_alias" : { }
}
}
}
PUT user_index/_mapping/user_type
{
"dynamic":false,
"properties": {
"name":{
"type": "text",
"analyzer": "standard"
},
"age": {
"type": "integer"
},
"join_date":{
"type": "date"
},
"phone":{
"type": "keyword"
},
"country":{
"type": "keyword"
},
"province":{
"type": "keyword"
},
"city":{
"type": "keyword"
},
"remark":{
"type": "text",
"analyzer": "whitespace"
}
}
}
If you do not specify an _id (by sending POST user_index/user_type instead), Elasticsearch generates an _id value automatically.
PUT user_index/user_type/1
{
"name":"chen zhuangyuan",
"age":27,
"join_date":"2018-01-01",
"phone":"18823450001",
"country":"CN",
"province":"guangdong",
"city":"guangzhou",
"remark":"I'm zhuangyuan,I like elasticsearch"
}
Response:
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "2",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 20,
"_primary_term" : 3
}
This is the same request used to add a doc: if the doc exists it is replaced entirely; if it does not exist it is created.
PUT user_index/user_type/1
{
"name":"chen zhuangyuan",
"age":28
}
After the update, the doc with _id=1 looks like this. The other fields are gone because the whole document was replaced.
GET user_index/user_type/1
Response:
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "1",
"_version" : 8,
"_seq_no" : 23,
"_primary_term" : 3,
"found" : true,
"_source" : {
"name" : "chen zhuangyuan",
"age" : 28
}
}
PUT user_index/user_type/3/_create
{
"name":"zhang fulai",
"age":28,
"join_date":"2018-03-01",
"phone":"18823450003",
"country":"CN",
"province":"guangdong",
"city":"shenzhen",
"remark":"I'm liaiguo,I like hadoop"
}
Response:
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "3",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 25,
"_primary_term" : 3
}
Running the same request again returns an error, and the create fails.
{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[user_type][3]: version conflict, document already exists (current version [1])",
"index_uuid": "FT3HUBPESD6Yih2o_EddLw",
"shard": "2",
"index": "user_index"
}
],
"type": "version_conflict_engine_exception",
"reason": "[user_type][3]: version conflict, document already exists (current version [1])",
"index_uuid": "FT3HUBPESD6Yih2o_EddLw",
"shard": "2",
"index": "user_index"
},
"status": 409
}
For example, change the age of the user with _id=3 to 29.
POST user_index/user_type/3/_update
{
"doc": {
"age":29
}
}
After the update, the doc with _id=3 looks like this:
GET user_index/user_type/3
Response:
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "3",
"_version" : 2,
"_seq_no" : 26,
"_primary_term" : 3,
"found" : true,
"_source" : {
"name" : "zhang fulai",
"age" : 29,
"join_date" : "2018-03-01",
"phone" : "18823450003",
"country" : "CN",
"province" : "guangdong",
"city" : "shenzhen",
"remark" : "I'm liaiguo,I like hadoop"
}
}
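The difference between the full replacement (PUT index/type/id) and the partial update (POST .../_update with a doc body) can be modeled locally. The Python sketch below is only a simulation of the observable results shown above; replace_document and partial_update are hypothetical helpers, not Elasticsearch code.

```python
import copy


def replace_document(existing: dict, new_source: dict) -> dict:
    """Model PUT index/type/id: the stored _source becomes exactly the new body."""
    return copy.deepcopy(new_source)


def partial_update(existing: dict, partial_doc: dict) -> dict:
    """Model POST .../_update with {"doc": ...}: merge fields into the existing _source."""
    merged = copy.deepcopy(existing)
    for key, value in partial_doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged key by key.
            merged[key] = partial_update(merged[key], value)
        else:
            # Scalars and arrays overwrite the old value; new fields are added.
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    user = {"name": "zhang fulai", "age": 28, "city": "shenzhen"}
    print(replace_document(user, {"name": "zhang fulai", "age": 29}))
    print(partial_update(user, {"age": 29}))
```

In this model the replacement drops city, while the partial update keeps it and changes only age, which matches the two responses shown in this section.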
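Submit index, delete, and update operations for multiple documents in a single request. This reduces the number of network round trips to the server and improves throughput.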
PUT user_index/user_type/_bulk
{"index":{"_id":"4"}}
{"name":"guo daming","age":26,"phone":"18823450004","country":"CN","province":"beijing","city":"beijingshi","remark":"I.m from beijing,I like java"}
{"index":{"_id":"5"}}
{"name":"zhao mingming","age":26,"phone":"18823450005","country":"CN","province":"shanghai","city":"shanghaishi","remark":"I.m from shanghai,I like spark"}
{"delete":{"_id":"1"}}
{"update":{"_id":"2"}}
{"doc":{"age":"25"}}
Response:
{
"took" : 19,
"errors" : false,
"items" : [
{
"index" : {
"_index" : "user_index",
"_type" : "user_type",
"_id" : "4",
"_version" : 7,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 33,
"_primary_term" : 3,
"status" : 200
}
},
{
"index" : {
"_index" : "user_index",
"_type" : "user_type",
"_id" : "5",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 34,
"_primary_term" : 3,
"status" : 201
}
},
{
"delete" : {
"_index" : "user_index",
"_type" : "user_type",
"_id" : "1",
"_version" : 10,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 27,
"_primary_term" : 3,
"status" : 200
}
},
{
"update" : {
"_index" : "user_index",
"_type" : "user_type",
"_id" : "2",
"_version" : 8,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 35,
"_primary_term" : 3,
"status" : 200
}
}
]
}
If the operations in one _bulk request do not all target the same index, specify the index and type in each action line.
PUT _bulk
{"create":{"_index":"user_index","_type":"user_type","_id":"4"}}
{"name":"guo daming","age":26,"phone":"18823450004","country":"CN","province":"beijing","city":"beijingshi","remark":"I.m from beijing,I like java"}
{"create":{"_index":"user_index","_type":"user_type","_id":"5"}}
{"name":"zhao mingming","age":26,"phone":"18823450005","country":"CN","province":"shanghai","city":"shanghaishi","remark":"I.m from shanghai,I like spark"}
{"delete":{"_index":"user_index","_type":"user_type","_id":"1"}}
{"update":{"_index":"user_index","_type":"user_type","_id":"2"}}
{"doc":{"age":"25"}}
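The _bulk body is newline-delimited JSON: one action line, followed by a source line for every action except delete. A minimal Python sketch of assembling such a body is shown below; build_bulk_body is a hypothetical helper, and sending the body over HTTP is left out on purpose.

```python
import json


def build_bulk_body(operations) -> str:
    """Build a newline-delimited _bulk body.

    operations is a list of (action, metadata, source) tuples, where action is
    "index", "create", "update" or "delete", and source is None for delete.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if action != "delete":
            # index/create carry the document; update carries {"doc": {...}}.
            lines.append(json.dumps(source))
    # The body must end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ops = [
        ("index", {"_id": "4"}, {"name": "guo daming", "age": 26}),
        ("delete", {"_id": "1"}, None),
        ("update", {"_id": "2"}, {"doc": {"age": 25}}),
    ]
    print(build_bulk_body(ops), end="")
```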
URL format: index/type/_id
GET user_index/user_type/1
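URL format: index/type/_mget, or simply _mget when each entry carries its own _index and _type. The request body is a docs array in which each entry specifies the _index, _type, and _id metadata of one document. If you only want to retrieve one or a few specific fields, you can also specify _source.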
GET _mget
{
"docs":[{
"_index":"user_index",
"_type":"user_type",
"_id":"1",
"_source":["name","phone"]
},
{
"_index":"user_index",
"_type":"user_type",
"_id":"1",
"_source":["name","phone"]
}]
}
Response:
{
"docs" : [
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "1",
"_version" : 1,
"_seq_no" : 29,
"_primary_term" : 3,
"found" : true,
"_source" : {
"phone" : "18823450001",
"name" : "chen zhuangyuan"
}
},
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "1",
"_version" : 1,
"_seq_no" : 29,
"_primary_term" : 3,
"found" : true,
"_source" : {
"phone" : "18823450001",
"name" : "chen zhuangyuan"
}
}
]
}
You can also use a simpler form that passes the document _id values as an array.
GET user_index/user_type/_mget
{
"ids":["1","2","3","4"]
}
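For readers who generate _mget requests from code, the following Python sketch builds the docs array form of the request body. The helper build_mget_body is hypothetical and only constructs the JSON structure described above.

```python
import json


def build_mget_body(index: str, doc_type: str, ids, source_fields=None) -> dict:
    """Build the docs-array form of an _mget request body."""
    docs = []
    for doc_id in ids:
        entry = {"_index": index, "_type": doc_type, "_id": doc_id}
        if source_fields is not None:
            # Restrict the returned _source to the listed fields.
            entry["_source"] = list(source_fields)
        docs.append(entry)
    return {"docs": docs}


if __name__ == "__main__":
    body = build_mget_body("user_index", "user_type", ["1", "2"], ["name", "phone"])
    print(json.dumps(body, indent=2))
```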
If no query is specified, the search matches all documents in the index.
GET user_index/user_type/_search
GET user_index/_search
GET _search
GET user_index/user_type/_search
{
"query": {
"match_all": {}
}
}
For example, search the index for all documents that contain elasticsearch.
GET user_index/user_type/_search?q=elasticsearch
Because remark is the only field in the user data that contains elasticsearch, this query is equivalent to:
GET user_index/user_type/_search?q=remark:elasticsearch
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "1",
"_score" : 0.6931472,
"_source" : {
"name" : "chen zhuangyuan",
"age" : 27,
"join_date" : "2018-01-01",
"phone" : "18823450001",
"country" : "CN",
"province" : "guangdong",
"city" : "guangzhou",
"remark" : "I'm zhuangyuan,I like elasticsearch"
}
}
]
}
}
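When the q parameter is built from user input, the query string should be URL-encoded. The Python sketch below shows one way to assemble the search path with the standard library; uri_search_path is a hypothetical helper name.

```python
from urllib.parse import urlencode


def uri_search_path(index: str, doc_type: str, term: str, field: str = None) -> str:
    """Build an /index/type/_search?q=... path, optionally scoped to one field."""
    query = f"{field}:{term}" if field else term
    # urlencode percent-encodes reserved characters such as ':' and spaces.
    return f"/{index}/{doc_type}/_search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(uri_search_path("user_index", "user_type", "elasticsearch"))
    print(uri_search_path("user_index", "user_type", "elasticsearch", field="remark"))
```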
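For example, use a term query to find all documents whose remark field contains elasticsearch. A terms query does the same for any of several values.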
GET user_index/user_type/_search
{
"query": {
"term": {
"remark": "elasticsearch"
}
}
}
GET user_index/user_type/_search
{
"query": {
"terms": {
"remark": [
"hadoop",
"spark"
]
}
}
}
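Use the from and size parameters to page through results. from is the offset of the first hit to return, and size is the maximum number of hits to return. from defaults to 0 and size defaults to 10.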
GET user_index/user_type/_search
{
"query": {
"match": {
"remark":"spark"
}
},
"from": 0,
"size": 1
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.49917623,
"hits" : [
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "2",
"_score" : 0.49917623,
"_source" : {
"name" : "li aiguo",
"age" : "25",
"join_date" : "2018-02-01",
"phone" : "18823450002",
"country" : "CN",
"province" : "guangdong",
"city" : "shenzhen",
"remark" : "I'm liaiguo,I like spark"
}
}
]
}
}
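Applications usually think in page numbers rather than offsets. A minimal Python sketch of the conversion to from and size follows; page_params is a hypothetical helper, and the 1-based page numbering is an assumption of this example.

```python
def page_params(page: int, page_size: int = 10) -> dict:
    """Convert a 1-based page number into Elasticsearch from/size parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


if __name__ == "__main__":
    # Page 1 with one hit per page corresponds to the request shown above.
    search_body = {"query": {"match": {"remark": "spark"}}}
    search_body.update(page_params(1, page_size=1))
    print(search_body)
```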
Sort results by one or more fields. By default, search results are sorted by _score.
GET user_index/user_type/_search
{
"query": {
"match": {
"remark":"spark"
}
},
"sort": [
{
"age": {
"order": "asc"
}
},
{
"province": {
"order": "asc"
}
}
]
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : null,
"hits" : [
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "li aiguo",
"age" : "25",
"join_date" : "2018-02-01",
"phone" : "18823450002",
"country" : "CN",
"province" : "guangdong",
"city" : "shenzhen",
"remark" : "I'm liaiguo,I like spark"
},
"sort" : [
25,
"guangdong"
]
},
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "5",
"_score" : null,
"_source" : {
"name" : "zhao mingming",
"age" : 26,
"phone" : "18823450005",
"country" : "CN",
"province" : "shanghai",
"city" : "shanghaishi",
"remark" : "I.m from shanghai,I like spark"
},
"sort" : [
26,
"shanghai"
]
}
]
}
}
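For example, search the user index for all users whose age is greater than or equal to 27 and less than or equal to 30, with results sorted by age in ascending order.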
GET user_index/user_type/_search
{
"query": {
"range": {
"age": {
"gte": 27,
"lte": 30
}
}
},
"sort": [
{
"age": {
"order": "asc"
}
}
]
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : null,
"hits" : [
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "chen zhuangyuan",
"age" : 27,
"join_date" : "2018-01-01",
"phone" : "18823450001",
"country" : "CN",
"province" : "guangdong",
"city" : "guangzhou",
"remark" : "I'm zhuangyuan,I like elasticsearch"
},
"sort" : [
27
]
},
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "zhang fulai",
"age" : 29,
"join_date" : "2018-03-01",
"phone" : "18823450003",
"country" : "CN",
"province" : "guangdong",
"city" : "shenzhen",
"remark" : "I'm liaiguo,I like hadoop"
},
"sort" : [
29
]
}
]
}
}
GET user_index/user_type/_count
Response:
{
"count" : 5,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
}
}
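In real projects, queries almost always combine several conditions. Elasticsearch provides the bool query for this. Its main parameters are:
must: a document must match these clauses to be included.
must_not: a document must not match these clauses to be included.
should: if a document matches any of these clauses its _score increases; otherwise they have no effect. They are mainly used to adjust each document's relevance score.
filter: the document must match, but the clauses run in non-scoring filter mode. They do not contribute to the score and only include or exclude documents according to the filter criteria.
For example: find users whose remark must contain elasticsearch and must not contain spark; the should clause raises the score of users whose age is 27.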
GET user_index/user_type/_search
{
"query": {
"bool": {
"must": {
"match": {
"remark": "elasticsearch"
}
},
"must_not": {
"match":{
"remark":"spark"
}
},
"should": {
"match":{
"age":27
}
}
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.6931472,
"hits" : [
{
"_index" : "user_index",
"_type" : "user_type",
"_id" : "1",
"_score" : 1.6931472,
"_source" : {
"name" : "chen zhuangyuan",
"age" : 27,
"join_date" : "2018-01-01",
"phone" : "18823450001",
"country" : "CN",
"province" : "guangdong",
"city" : "guangzhou",
"remark" : "I'm zhuangyuan,I like elasticsearch"
}
}
]
}
}
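When the conditions come from application code, the bool query body can be assembled programmatically. The following Python sketch builds the same structure as the request above; bool_query is a hypothetical helper that only produces the JSON body and omits empty clause lists.

```python
import json


def bool_query(must=None, must_not=None, should=None, filter_=None) -> dict:
    """Assemble a bool query body, leaving out clause types that are empty."""
    clauses = {}
    for name, value in (
        ("must", must),
        ("must_not", must_not),
        ("should", should),
        ("filter", filter_),
    ):
        if value:
            clauses[name] = list(value)
    return {"query": {"bool": clauses}}


if __name__ == "__main__":
    body = bool_query(
        must=[{"match": {"remark": "elasticsearch"}}],
        must_not=[{"match": {"remark": "spark"}}],
        should=[{"match": {"age": 27}}],
    )
    print(json.dumps(body, indent=2))
```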
Elasticsearch returns search results as a set sorted by score from highest to lowest. The explain option helps analyze how a document's score was computed.
GET user_index/user_type/_search
{
"query": {
"match": {
"remark": "elasticsearch"
}
},
"explain": true
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0594962,
"hits" : [
{
"_shard" : "[user_index][0]",
"_node" : "SQYgJvIZR7yqA3TzkURejA",
"_index" : "user_index",
"_type" : "user_type",
"_id" : "6",
"_score" : 1.0594962,
"_source" : {
"name" : "liu haoqiang",
"age" : 27,
"join_date" : "2018-06-01",
"phone" : "18823450006",
"country" : "CN",
"province" : "guangdong",
"city" : "guangzhou",
"remark" : "I'm from guangzhou,I like spark and elasticsearch"
},
"_explanation" : {
"value" : 1.059496,
"description" : "weight(remark:elasticsearch in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 1.059496,
"description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [
{
"value" : 1.2039728,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [
{
"value" : 1.0,
"description" : "docFreq",
"details" : [ ]
},
{
"value" : 4.0,
"description" : "docCount",
"details" : [ ]
}
]
},
{
"value" : 0.88,
"description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details" : [
{
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "parameter k1",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "parameter b",
"details" : [ ]
},
{
"value" : 5.25,
"description" : "avgFieldLength",
"details" : [ ]
},
{
"value" : 7.0,
"description" : "fieldLength",
"details" : [ ]
}
]
}
]
}
]
}
},
{
"_shard" : "[user_index][2]",
"_node" : "SQYgJvIZR7yqA3TzkURejA",
"_index" : "user_index",
"_type" : "user_type",
"_id" : "1",
"_score" : 0.6931472,
"_source" : {
"name" : "chen zhuangyuan",
"age" : 27,
"join_date" : "2018-01-01",
"phone" : "18823450001",
"country" : "CN",
"province" : "guangdong",
"city" : "guangzhou",
"remark" : "I'm zhuangyuan,I like elasticsearch"
},
"_explanation" : {
"value" : 0.6931472,
"description" : "weight(remark:elasticsearch in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.6931472,
"description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [
{
"value" : 0.6931472,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [
{
"value" : 1.0,
"description" : "docFreq",
"details" : [ ]
},
{
"value" : 2.0,
"description" : "docCount",
"details" : [ ]
}
]
},
{
"value" : 1.0,
"description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details" : [
{
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "parameter k1",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "parameter b",
"details" : [ ]
},
{
"value" : 4.0,
"description" : "avgFieldLength",
"details" : [ ]
},
{
"value" : 4.0,
"description" : "fieldLength",
"details" : [ ]
}
]
}
]
}
]
}
}
]
}
}
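The idf and tfNorm formulas printed in the _explanation output can be checked by hand. The Python sketch below plugs in the values reported for the two hits above; the function names are hypothetical, and the formulas are copied from the description strings in the response. Note that the two hits come from different shards, which is why their docCount and avgFieldLength values differ.

```python
import math


def bm25_idf(doc_count: float, doc_freq: float) -> float:
    """idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75) -> float:
    """tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))."""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


if __name__ == "__main__":
    # Hit _id=6: docFreq=1, docCount=4, fieldLength=7, avgFieldLength=5.25
    score_6 = bm25_idf(4, 1) * bm25_tf_norm(1.0, 7.0, 5.25)
    # Hit _id=1: docFreq=1, docCount=2, fieldLength=4, avgFieldLength=4
    score_1 = bm25_idf(2, 1) * bm25_tf_norm(1.0, 4.0, 4.0)
    print(round(score_6, 6), round(score_1, 6))
```

The products agree with the _score values in the response (about 1.059496 and 0.693147) up to rounding.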
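_analyze is a very useful Elasticsearch API. It shows how a given field, analyzer, or tokenizer analyzes and indexes a piece of text. The fields in the result mean the following:
token: a term that is actually stored in the index
position: the ordinal position at which the term appears in the original text
start_offset, end_offset: the character range the term occupies in the original text.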
GET user_index/_analyze
{
"analyzer": "standard",
"text": "I'm from shenzhen,I like elasticsearch,spark and hbase"
}
Response:
{
"tokens" : [
{
"token" : "i'm",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "from",
"start_offset" : 4,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "shenzhen",
"start_offset" : 9,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "i",
"start_offset" : 18,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "like",
"start_offset" : 20,
"end_offset" : 24,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "elasticsearch",
"start_offset" : 25,
"end_offset" : 38,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "spark",
"start_offset" : 39,
"end_offset" : 44,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "and",
"start_offset" : 45,
"end_offset" : 48,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "hbase",
"start_offset" : 49,
"end_offset" : 54,
"type" : "<ALPHANUM>",
"position" : 8
}
]
}
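To see how start_offset and end_offset relate to the input, the Python sketch below slices the original text with the offsets from the response above and compares each slice with its token. The tuple list is copied from that response; offsets_match is a hypothetical helper, and the lowercase comparison reflects that the standard analyzer lowercases tokens.

```python
TEXT = "I'm from shenzhen,I like elasticsearch,spark and hbase"

# (token, start_offset, end_offset) copied from the _analyze response above.
TOKENS = [
    ("i'm", 0, 3),
    ("from", 4, 8),
    ("shenzhen", 9, 17),
    ("i", 18, 19),
    ("like", 20, 24),
    ("elasticsearch", 25, 38),
    ("spark", 39, 44),
    ("and", 45, 48),
    ("hbase", 49, 54),
]


def offsets_match(text: str, tokens) -> bool:
    """Check that text[start:end], lowercased, equals each reported token."""
    return all(text[start:end].lower() == token for token, start, end in tokens)


if __name__ == "__main__":
    for token, start, end in TOKENS:
        print(f"{token!r:18} <- text[{start}:{end}] = {TEXT[start:end]!r}")
    print(offsets_match(TEXT, TOKENS))
```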