Elasticsearch Document Representation and Service API Operations

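Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java, was released as open source under the Apache license, and is a popular enterprise search engine. It is widely used in cloud environments, where it offers near-real-time search and is stable, reliable, fast, and easy to install and use.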

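1. Document representation in Elasticsearch

Elasticsearch is document oriented, which means it can store entire objects or documents. It does more than store them: it also indexes the content of each document so that it can be searched quickly. In Elasticsearch you index, search, sort, filter, aggregate, and analyze documents rather than rows and columns of data.

Elasticsearch uses JSON as its document serialization format. JSON is supported by most programming languages and has become the standard format in the NoSQL world.

A document in Elasticsearch contains not only its data but also metadata, that is, information about the document. The three main metadata elements are:

_index: the index, similar to a "database" in a relational database. It is where related data is stored and indexed.

_type: the type, similar to a table in a relational database. The name may be uppercase or lowercase, but it must not begin with an underscore or contain commas.

_id: combined with _index and _type, it uniquely identifies a document in Elasticsearch (similar to a primary key). When you create a document you can supply your own _id or let Elasticsearch generate one.

The metadata also includes the following:

_uid: the unique identifier of the document (_type#_id)

_source: the original document data

_all: a string concatenating the values of all fields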
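To make the role of the three main metadata elements concrete, here is a minimal Python sketch of how they combine into the REST path of a single document and into the _uid value. The helper names doc_path and doc_uid are made up for this illustration and are not part of any Elasticsearch client.

```python
from urllib.parse import quote


def doc_path(index: str, doc_type: str, doc_id: str) -> str:
    """Build the REST path /index/type/id that addresses one document."""
    # Percent-encode each part so an id containing '/' stays a single path segment.
    parts = (index, doc_type, doc_id)
    return "/" + "/".join(quote(part, safe="") for part in parts)


def doc_uid(doc_type: str, doc_id: str) -> str:
    """Compose the _uid value in the _type#_id form described above."""
    return f"{doc_type}#{doc_id}"


print(doc_path("user_index", "user_type", "1"))
print(doc_uid("user_type", "1"))
```

The same index/type/id path is what the document requests in section 3.3 use.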

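2. Service URLs in Elasticsearch

The URLs of commonly used Elasticsearch services are listed in the table below. The list mixes endpoints from several Elasticsearch releases, so check each path against the documentation for the version you run; for example, _optimize was replaced by _forcemerge in later releases.

Category  URL                        Method                  Description
Cluster   /_cat/health?v             GET                     View cluster health
Cluster   /_cat/nodes?v              GET                     View node health
Cluster   /_cat/indices?v            GET                     List all indices in the cluster
Cluster   /_cluster/nodes            GET                     Get all nodes in the cluster and their information
Cluster   /_cluster/health           GET                     View cluster health
Cluster   /_cluster/state            GET                     Get the full cluster state (cluster information, node information, mappings, and so on)
Node      /_nodes/process            GET                     View file descriptor related information
Node      /_nodes/process/stats      GET                     Node resource statistics (memory, CPU, and so on)
Node      /_nodes/jvm                GET                     JVM statistics and configuration of each node
Node      /_nodes/jvm/stats          GET                     More detailed JVM information
Node      /_nodes/http               GET                     HTTP information of each node (such as the IP address)
Node      /_nodes/http/stats         GET                     Statistics on the HTTP requests handled by each node
Node      /_nodes/thread_pool        GET                     The thread pools of each type
Node      /_nodes/thread_pool/stats  GET                     Statistics for the thread pools of each type
Index     /index/_search             GET, POST               Search the index
Index     /index                     PUT, DELETE             Create or delete an index
Index     /_aliases                  GET, POST               Get or modify index aliases
Index     /index/_settings           PUT                     Create or modify settings (number_of_shards cannot be changed)
Index     /index/_mapping            PUT                     Create or modify the mapping
Index     /index/_open               POST                    Open a closed index
Index     /index/_close              POST                    Close an index
Index     /index/_refresh            POST                    Refresh the index (make newly added content visible to search)
Index     /index/_flush              POST                    Flush the index: commit changes to the Lucene index files and clear the Elasticsearch transaction log
Index     /index/_optimize           POST                    Optimize segments, mainly by merging the segments of the index
Index     /index/_status             GET                     Get the status of the index
Index     /index/_segments           GET                     Get the status of the segments of the index
Index     /index/type/id             GET, PUT, POST, DELETE  Operate on the specified document (create, read, update, delete)
Index     /index/type/id/_create     PUT                     Create a document; fails if the document already exists
Index     /index/type/id/_update     POST                    Update a document; fails if the document does not exist
Index     /index/type/_bulk          PUT                     Submit a batch of data changes
Index     /index/type/_mget          GET                     Fetch a batch of documents by _id
Index     /index/_explain            GET                     Return explanation information without running the actual search
Index     /index/_analyze            GET                     Analyze text according to the input parameters without running a search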

3. URL operations in Elasticsearch

3.1 Viewing cluster information

3.1.1 View cluster health

GET _cat/health?v

Response:

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1565253576 08:39:36  my-es.cluster green           1         1      2   2    0    0        0             0                  -                100.0%
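The ?v output above is a whitespace-aligned table, which is convenient to read but awkward to script against. The sketch below shows one way to turn it into dictionaries in Python. It assumes that no cell is empty and no cell contains spaces, which holds for the sample above but is not guaranteed for every _cat endpoint; parse_cat_table is a made-up helper name.

```python
from typing import Dict, List

SAMPLE = """\
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1565253576 08:39:36  my-es.cluster green           1         1      2   2    0    0        0             0                  -                100.0%
"""


def parse_cat_table(text: str) -> List[Dict[str, str]]:
    """Split a `_cat/...?v` response into one dict per row, keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Assumption: every cell is a single non-empty token, so split() lines up with the header.
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_cat_table(SAMPLE)
print(rows[0]["cluster"], rows[0]["status"])
```

For anything beyond a quick script, the JSON form shown in 3.1.4 is the more robust input.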

3.1.2 View node health

GET _cat/nodes?v

Response:

ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.1.199           36          63   1    0.00    0.14     0.12 mdi       *      node-1

3.1.3 List all indices in the cluster

GET _cat/indices?v

Response:

health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_task_manager aP_xUt7lQD2RdQDuT5ynbw   1   0          2            0     12.5kb         12.5kb
green  open   .kibana_1            -axbsiTwRPmlIVniX-0hOA   1   0          4            1     19.8kb         19.8kb

3.1.4 View cluster health (JSON output)

GET _cluster/health

Response:

{
  "cluster_name" : "my-es.cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 2,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

3.2 Viewing node information

3.2.1 View process information (file descriptor related)

GET _nodes/process

Response:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "my-es.cluster",
  "nodes" : {
    "SQYgJvIZR7yqA3TzkURejA" : {
      "name" : "node-1",
      "transport_address" : "192.168.1.199:9300",
      "host" : "192.168.1.199",
      "ip" : "192.168.1.199",
      "version" : "6.8.2",
      "build_flavor" : "default",
      "build_type" : "tar",
      "build_hash" : "b506955",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "attributes" : {
        "ml.machine_memory" : "3954188288",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 9496,
        "mlockall" : false
      }
    }
  }
}

3.2.2 Get the HTTP information of each node

GET _nodes/http

Response:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "my-es.cluster",
  "nodes" : {
    "SQYgJvIZR7yqA3TzkURejA" : {
      "name" : "node-1",
      "transport_address" : "192.168.1.199:9300",
      "host" : "192.168.1.199",
      "ip" : "192.168.1.199",
      "version" : "6.8.2",
      "build_flavor" : "default",
      "build_type" : "tar",
      "build_hash" : "b506955",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "attributes" : {
        "ml.machine_memory" : "3954188288",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "http" : {
        "bound_address" : [
          "192.168.1.199:9200"
        ],
        "publish_address" : "192.168.1.199:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}

3.3 Index operations

3.3.1 Create an index and set the number of shards and replicas

PUT user_index
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 1
  }
}

Response:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "user_index"
}

3.3.2 Change the number of replicas of an index (the number of shards cannot be changed)

PUT user_index/_settings
{
  "settings": {
    "number_of_replicas": 2
  }
}

Response:

{
  "acknowledged" : true
}

3.3.3 Delete an index

DELETE user_index

Response:

{
  "acknowledged" : true
}

3.3.4 Add an alias to an index

POST _aliases
{
  "actions":[{
      "add":{"index":"user_index","alias":"user_alias"}
  }]
}

Alternatively, for a single index:

PUT user_index/_alias/user_alias

3.3.5 Remove an alias from an index

POST _aliases
{
  "actions":[{
      "remove":{"index":"user_index","alias":"user_alias"}
  }]
}

Alternatively, for a single index:

DELETE user_index/_alias/user_alias
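The actions array of POST _aliases can hold several add and remove entries in one request. As a rough sketch, the body could be assembled in Python like this; alias_actions is a hypothetical helper written for this article, and it only builds the JSON without sending anything.

```python
import json


def alias_actions(add=(), remove=()):
    """Build the body for POST _aliases from (index, alias) pairs."""
    actions = [{"remove": {"index": index, "alias": alias}} for index, alias in remove]
    actions += [{"add": {"index": index, "alias": alias}} for index, alias in add]
    return {"actions": actions}


# Same body as the "add" request in 3.3.4.
body = alias_actions(add=[("user_index", "user_alias")])
print(json.dumps(body))
```

Passing both add and remove pairs produces one request that detaches an alias from one index and attaches it to another.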

3.3.6 View index alias information

GET _aliases

Response:

{
  ".kibana_1" : {
    "aliases" : {
      ".kibana" : { }
    }
  },
  ".kibana_task_manager" : {
    "aliases" : { }
  },
  "user_index" : {
    "aliases" : {
      "user_alias" : { }
    }
  }
}

3.3.7 Create an index mapping

PUT user_index/_mapping/user_type
{
  "dynamic":false,
  "properties": {
    "name":{
      "type": "text",
      "analyzer": "standard"
    },
    "age": {
      "type": "integer"
    },
    "join_date":{
      "type": "date"
    },
    "phone":{
      "type": "keyword"
    },
    "country":{
      "type": "keyword"
    },
    "province":{
      "type": "keyword"
    },
    "city":{
      "type": "keyword"
    },
    "remark":{
      "type": "text",
      "analyzer": "whitespace"
    }
  }
}

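3.3.8 Add a document with a specified _id

If no _id is specified, Elasticsearch generates one automatically (in that case, send the request with POST and leave the id out of the URL).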

PUT user_index/user_type/1
{
  "name":"chen zhuangyuan",
  "age":27,
  "join_date":"2018-01-01",
  "phone":"18823450001",
  "country":"CN",
  "province":"guangdong",
  "city":"guangzhou",
  "remark":"I'm zhuangyuan,I like elasticsearch"
}

Response:

{
  "_index" : "user_index",
  "_type" : "user_type",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 20,
  "_primary_term" : 3
}

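3.3.9 Update a document by replacing it entirely

This is the same request as adding a document: if the document exists it is replaced in full, and if it does not exist it is created.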

PUT user_index/user_type/1
{
  "name":"chen zhuangyuan",
  "age":28
}

After the update, the document with _id=1 looks as follows; the fields that were not sent are gone.

GET user_index/user_type/1

Response:

{
  "_index" : "user_index",
  "_type" : "user_type",
  "_id" : "1",
  "_version" : 8,
  "_seq_no" : 23,
  "_primary_term" : 3,
  "found" : true,
  "_source" : {
    "name" : "chen zhuangyuan",
    "age" : 28
  }
}

3.3.10 Create a document only if it does not already exist; return an error if it does

PUT user_index/user_type/3/_create
{
  "name":"zhang fulai",
  "age":28,
  "join_date":"2018-03-01",
  "phone":"18823450003",
  "country":"CN",
  "province":"guangdong",
  "city":"shenzhen",
  "remark":"I'm liaiguo,I like hadoop"
}

Response:

{
  "_index" : "user_index",
  "_type" : "user_type",
  "_id" : "3",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 25,
  "_primary_term" : 3
}

Running the same request again returns an error and the creation fails.

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[user_type][3]: version conflict, document already exists (current version [1])",
        "index_uuid": "FT3HUBPESD6Yih2o_EddLw",
        "shard": "2",
        "index": "user_index"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[user_type][3]: version conflict, document already exists (current version [1])",
    "index_uuid": "FT3HUBPESD6Yih2o_EddLw",
    "shard": "2",
    "index": "user_index"
  },
  "status": 409
}
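Client code usually wants to tell this "already exists" case apart from other failures. The following Python sketch checks the HTTP status and the error type shown above. The function name is made up, and the sample body is an abridged copy of the response in this section.

```python
def is_already_exists_conflict(status: int, body: dict) -> bool:
    """Return True when a _create request failed because the document exists."""
    error = body.get("error")
    if status != 409 or not isinstance(error, dict):
        return False
    return error.get("type") == "version_conflict_engine_exception"


# Abridged copy of the error response shown above.
sample = {
    "error": {
        "type": "version_conflict_engine_exception",
        "reason": "[user_type][3]: version conflict, document already exists (current version [1])",
        "index": "user_index",
    },
    "status": 409,
}

print(is_already_exists_conflict(sample["status"], sample))
```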

3.3.11 Update specific fields of a document

For example, change the age of the user with _id=3 to 29.

POST user_index/user_type/3/_update
{
  "doc": {
     "age":29
  }
}

After the update, the document with _id=3 looks as follows:

GET user_index/user_type/3

Response:

{
  "_index" : "user_index",
  "_type" : "user_type",
  "_id" : "3",
  "_version" : 2,
  "_seq_no" : 26,
  "_primary_term" : 3,
  "found" : true,
  "_source" : {
    "name" : "zhang fulai",
    "age" : 29,
    "join_date" : "2018-03-01",
    "phone" : "18823450003",
    "country" : "CN",
    "province" : "guangdong",
    "city" : "shenzhen",
    "remark" : "I'm liaiguo,I like hadoop"
  }
}
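The difference between the full replacement in 3.3.9 and the partial update in 3.3.11 can be imitated with plain Python dictionaries. This is only a local model of the observable result for flat fields, not how Elasticsearch implements updates, and the function names are invented for the illustration.

```python
import copy


def full_replace(stored: dict, new_source: dict) -> dict:
    """PUT index/type/id: the new body becomes the whole _source."""
    return copy.deepcopy(new_source)


def partial_update(stored: dict, doc: dict) -> dict:
    """POST index/type/id/_update with {"doc": ...}: listed fields are overwritten, the rest kept."""
    merged = copy.deepcopy(stored)
    merged.update(doc)  # flat fields only in this sketch
    return merged


stored = {"name": "zhang fulai", "age": 28, "city": "shenzhen"}
print(full_replace(stored, {"name": "zhang fulai", "age": 29}))
print(partial_update(stored, {"age": 29}))
```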

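3.3.12 Batch submission with _bulk

A single request can carry multiple index, delete, and update operations. This reduces the number of network round trips to the server and improves throughput.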

PUT user_index/user_type/_bulk
{"index":{"_id":"4"}}
{"name":"guo daming","age":26,"phone":"18823450004","country":"CN","province":"beijing","city":"beijingshi","remark":"I.m from beijing,I like java"}
{"index":{"_id":"5"}}
{"name":"zhao mingming","age":26,"phone":"18823450005","country":"CN","province":"shanghai","city":"shanghaishi","remark":"I.m from shanghai,I like spark"}
{"delete":{"_id":"1"}}
{"update":{"_id":"2"}}
{"doc":{"age":"25"}}

Response:

{
  "took" : 19,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "4",
        "_version" : 7,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 33,
        "_primary_term" : 3,
        "status" : 200
      }
    },
    {
      "index" : {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "5",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 34,
        "_primary_term" : 3,
        "status" : 201
      }
    },
    {
      "delete" : {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "1",
        "_version" : 10,
        "result" : "deleted",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 27,
        "_primary_term" : 3,
        "status" : 200
      }
    },
    {
      "update" : {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "2",
        "_version" : 8,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 35,
        "_primary_term" : 3,
        "status" : 200
      }
    }
  ]
}

If the operations in a _bulk request do not all target the same index, specify the index and type in the metadata of each operation.

PUT _bulk
{"create":{"_index":"user_index","_type":"user_type","_id":"4"}}
{"name":"guo daming","age":26,"phone":"18823450004","country":"CN","province":"beijing","city":"beijingshi","remark":"I.m from beijing,I like java"}
{"create":{"_index":"user_index","_type":"user_type","_id":"5"}}
{"name":"zhao mingming","age":26,"phone":"18823450005","country":"CN","province":"shanghai","city":"shanghaishi","remark":"I.m from shanghai,I like spark"}
{"delete":{"_index":"user_index","_type":"user_type","_id":"1"}}
{"update":{"_index":"user_index","_type":"user_type","_id":"2"}}
{"doc":{"age":"25"}}
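The _bulk body is newline-delimited JSON: one metadata line per operation, followed by a source line for every operation except delete, and the body must end with a newline. A minimal Python sketch of assembling such a body is shown below; bulk_body and its tuple format are assumptions made for this example, not a client API.

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a _bulk request body.

    action is "index", "create", "update" or "delete"; source is None for delete
    and {"doc": {...}} for update.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if action != "delete":
            if source is None:
                raise ValueError(f"{action} needs a source line")
            lines.append(json.dumps(source))
    # The final newline is required by the bulk format.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("index", {"_id": "4"}, {"name": "guo daming", "age": 26}),
    ("delete", {"_id": "1"}, None),
    ("update", {"_id": "2"}, {"doc": {"age": 25}}),
])
print(body, end="")
```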

3.3.13 Querying documents in Elasticsearch

3.3.13.1 Get a document by _id

URL format: index/type/_id

GET user_index/user_type/1

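3.3.13.2 Batch retrieval with _mget

URL format: index/type/_mget. The request body contains a docs array in which each element specifies the _index, _type, and _id metadata of one document. If you only want to retrieve one or a few specific fields, you can also specify _source.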

GET _mget
{
  "docs":[{
   "_index":"user_index",
   "_type":"user_type",
   "_id":"1",
   "_source":["name","phone"]
  },
  {
  "_index":"user_index",
  "_type":"user_type",
  "_id":"1",
  "_source":["name","phone"]
  }]
}

Response:

{
  "docs" : [
    {
      "_index" : "user_index",
      "_type" : "user_type",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 29,
      "_primary_term" : 3,
      "found" : true,
      "_source" : {
        "phone" : "18823450001",
        "name" : "chen zhuangyuan"
      }
    },
    {
      "_index" : "user_index",
      "_type" : "user_type",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 29,
      "_primary_term" : 3,
      "found" : true,
      "_source" : {
        "phone" : "18823450001",
        "name" : "chen zhuangyuan"
      }
    }
  ]
}

Alternatively, use the simpler form and pass the document _id values as an array.

GET user_index/user_type/_mget
{
  "ids":["1","2","3","4"]
}
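Both _mget body shapes are easy to generate from a list of ids. The Python sketch below builds them as dictionaries; the two helper names are hypothetical and nothing is sent to a server.

```python
def mget_ids_body(ids):
    """Body for GET index/type/_mget: only the ids are needed."""
    return {"ids": [str(doc_id) for doc_id in ids]}


def mget_docs_body(index, doc_type, ids, source=None):
    """Body for GET _mget: each entry names its own _index, _type and _id."""
    docs = []
    for doc_id in ids:
        entry = {"_index": index, "_type": doc_type, "_id": str(doc_id)}
        if source:
            entry["_source"] = list(source)  # restrict the returned fields
        docs.append(entry)
    return {"docs": docs}


print(mget_ids_body([1, 2, 3, 4]))
print(mget_docs_body("user_index", "user_type", [1], source=["name", "phone"]))
```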

3.3.13.3 Empty query (match all)

If no query parameters are specified, the search matches all documents in the index.

GET user_index/user_type/_search
GET user_index/_search
GET _search
GET user_index/user_type/_search
{
  "query": {
    "match_all": {}
  }
}

3.3.13.4 Query-string search

For example, search for all documents in the index that contain elasticsearch.

GET user_index/user_type/_search?q=elasticsearch

Because remark is the only field of the user documents that contains elasticsearch, this query is equivalent to:

GET user_index/user_type/_search?q=remark:elasticsearch

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "chen zhuangyuan",
          "age" : 27,
          "join_date" : "2018-01-01",
          "phone" : "18823450001",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "guangzhou",
          "remark" : "I'm zhuangyuan,I like elasticsearch"
        }
      }
    ]
  }
}
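When the q parameter is built in code rather than typed into the Kibana console, it has to be URL-encoded. A small Python sketch using only the standard library follows; query_string_search_path is a made-up helper name.

```python
from urllib.parse import urlencode


def query_string_search_path(index, doc_type, query, field=None):
    """Build the path and query string for a q= search, optionally limited to one field."""
    q = f"{field}:{query}" if field else query
    # urlencode escapes characters such as ':' and spaces that are not safe in a URL.
    return f"/{index}/{doc_type}/_search?" + urlencode({"q": q})


print(query_string_search_path("user_index", "user_type", "elasticsearch"))
print(query_string_search_path("user_index", "user_type", "elasticsearch", field="remark"))
```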

3.3.13.5 Request-body search

For example, search for all documents in the index that contain elasticsearch.

GET user_index/user_type/_search
{
  "query": {
    "term": {
      "remark": "elasticsearch"
    }
  }
}

GET user_index/user_type/_search
{
 "query": {
   "terms": {
     "remark": [
       "hadoop",
       "spark"
     ]
   }
 }
}

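3.3.13.6 Pagination with from/size

The from and size parameters implement pagination. from is the offset of the first hit to return, and size is the maximum number of hits to return. from defaults to 0 and size defaults to 10.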

GET user_index/user_type/_search
{
 "query": {
   "match": {
     "remark":"spark"
   }
 },
 "from": 0,
 "size": 1
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.49917623,
    "hits" : [
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "2",
        "_score" : 0.49917623,
        "_source" : {
          "name" : "li aiguo",
          "age" : "25",
          "join_date" : "2018-02-01",
          "phone" : "18823450002",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "shenzhen",
          "remark" : "I'm liaiguo,I like spark"
        }
      }
    ]
  }
}
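User interfaces usually think in page numbers rather than offsets. The conversion to from and size is a one-liner; the Python sketch below assumes 1-based page numbers, and page_params is a hypothetical helper.

```python
def page_params(page: int, size: int = 10) -> dict:
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


# Merge the paging parameters into the search body used above.
search_body = {"query": {"match": {"remark": "spark"}}, **page_params(page=1, size=1)}
print(search_body)
```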

3.3.13.7 Sorting with sort

Sorts the results by one or more specified fields. By default, search results are sorted by _score.

GET user_index/user_type/_search
{
 "query": {
   "match": {
     "remark":"spark"
   }
 },
 "sort": [
   {
     "age": {
       "order": "asc"
     }
   },
   {
     "province": {
       "order": "asc"
     }
   }
 ]
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "li aiguo",
          "age" : "25",
          "join_date" : "2018-02-01",
          "phone" : "18823450002",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "shenzhen",
          "remark" : "I'm liaiguo,I like spark"
        },
        "sort" : [
          25,
          "guangdong"
        ]
      },
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "name" : "zhao mingming",
          "age" : 26,
          "phone" : "18823450005",
          "country" : "CN",
          "province" : "shanghai",
          "city" : "shanghaishi",
          "remark" : "I.m from shanghai,I like spark"
        },
        "sort" : [
          26,
          "shanghai"
        ]
      }
    ]
  }
}
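Sorting on several fields means the first field decides the order and the next field only breaks ties. The same ordering can be reproduced locally in Python with a tuple key, as in the sketch below. The three records are a cut-down sample based on the users in this article, and sort_hits is a made-up name.

```python
# Cut-down sample records; only the fields needed for sorting are kept.
docs = [
    {"_id": "5", "age": 26, "province": "shanghai"},
    {"_id": "2", "age": 25, "province": "guangdong"},
    {"_id": "4", "age": 26, "province": "beijing"},
]


def sort_hits(hits):
    """Order by age ascending, then by province ascending for equal ages."""
    return sorted(hits, key=lambda hit: (hit["age"], hit["province"]))


for hit in sort_hits(docs):
    # The tuple plays the role of the "sort" array in each hit of the response.
    print(hit["_id"], [hit["age"], hit["province"]])
```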

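3.3.13.8 Range query

For example, search the user index for all users whose age is greater than or equal to 27 and less than or equal to 30, with the results sorted by age in ascending order.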

GET user_index/user_type/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 27,
        "lte": 30
      }
    }
  },
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "chen zhuangyuan",
          "age" : 27,
          "join_date" : "2018-01-01",
          "phone" : "18823450001",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "guangzhou",
          "remark" : "I'm zhuangyuan,I like elasticsearch"
        },
        "sort" : [
          27
        ]
      },
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "zhang fulai",
          "age" : 29,
          "join_date" : "2018-03-01",
          "phone" : "18823450003",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "shenzhen",
          "remark" : "I'm liaiguo,I like hadoop"
        },
        "sort" : [
          29
        ]
      }
    ]
  }
}

3.3.13.9 Count all documents in the index

GET user_index/user_type/_count

Response:

{
  "count" : 5,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  }
}

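3.3.13.10 Combining multiple conditions

In real projects, most queries combine several conditions. Elasticsearch provides the bool query for this. Its main clauses are:

must: a document must match these conditions to be included.

must_not: a document must not match these conditions to be included.

should: when the query also has a must or filter clause, matching any of these clauses increases the _score and otherwise has no effect. They are mainly used to adjust the relevance score of each document.

filter: must match, but runs in non-scoring filter mode. These clauses do not contribute to the score; they only include or exclude documents according to the filter criteria.

Example: find users whose remark contains elasticsearch and does not contain spark.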

GET user_index/user_type/_search
{
  "query": {
    "bool": {
      "must":  {
        "match": {
          "remark": "elasticsearch"
        }
      },
      "must_not": {
         "match":{
           "remark":"spark"
         }
      },
      "should": {
        "match":{
          "age":27
        }
      }
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.6931472,
    "hits" : [
      {
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "1",
        "_score" : 1.6931472,
        "_source" : {
          "name" : "chen zhuangyuan",
          "age" : 27,
          "join_date" : "2018-01-01",
          "phone" : "18823450001",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "guangzhou",
          "remark" : "I'm zhuangyuan,I like elasticsearch"
        }
      }
    ]
  }
}
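Application code often assembles a bool query from optional conditions. The Python sketch below builds the request body and leaves out clauses that were not supplied; bool_query is a hypothetical helper, and the parameter filter_ carries a trailing underscore only to avoid shadowing Python's built-in filter.

```python
def bool_query(must=None, must_not=None, should=None, filter_=None):
    """Build a search body with a bool query, skipping clauses that are not given."""
    clauses = {"must": must, "must_not": must_not, "should": should, "filter": filter_}
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


# Same query as the example above.
body = bool_query(
    must={"match": {"remark": "elasticsearch"}},
    must_not={"match": {"remark": "spark"}},
    should={"match": {"age": 27}},
)
print(body)
```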

3.3.13.11 Score analysis with explain

Elasticsearch returns search results sorted by _score from highest to lowest. The explain option helps analyze how the _score of each document was calculated.

GET user_index/user_type/_search
{
  "query": {
    "match": {
      "remark": "elasticsearch"
    }
  },
  "explain": true
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0594962,
    "hits" : [
      {
        "_shard" : "[user_index][0]",
        "_node" : "SQYgJvIZR7yqA3TzkURejA",
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "6",
        "_score" : 1.0594962,
        "_source" : {
          "name" : "liu haoqiang",
          "age" : 27,
          "join_date" : "2018-06-01",
          "phone" : "18823450006",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "guangzhou",
          "remark" : "I'm from guangzhou,I like spark and elasticsearch"
        },
        "_explanation" : {
          "value" : 1.059496,
          "description" : "weight(remark:elasticsearch in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 1.059496,
              "description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
              "details" : [
                {
                  "value" : 1.2039728,
                  "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "docFreq",
                      "details" : [ ]
                    },
                    {
                      "value" : 4.0,
                      "description" : "docCount",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.88,
                  "description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "termFreq=1.0",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "parameter k1",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "parameter b",
                      "details" : [ ]
                    },
                    {
                      "value" : 5.25,
                      "description" : "avgFieldLength",
                      "details" : [ ]
                    },
                    {
                      "value" : 7.0,
                      "description" : "fieldLength",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[user_index][2]",
        "_node" : "SQYgJvIZR7yqA3TzkURejA",
        "_index" : "user_index",
        "_type" : "user_type",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "chen zhuangyuan",
          "age" : 27,
          "join_date" : "2018-01-01",
          "phone" : "18823450001",
          "country" : "CN",
          "province" : "guangdong",
          "city" : "guangzhou",
          "remark" : "I'm zhuangyuan,I like elasticsearch"
        },
        "_explanation" : {
          "value" : 0.6931472,
          "description" : "weight(remark:elasticsearch in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.6931472,
              "description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
              "details" : [
                {
                  "value" : 0.6931472,
                  "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "docFreq",
                      "details" : [ ]
                    },
                    {
                      "value" : 2.0,
                      "description" : "docCount",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 1.0,
                  "description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "termFreq=1.0",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "parameter k1",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "parameter b",
                      "details" : [ ]
                    },
                    {
                      "value" : 4.0,
                      "description" : "avgFieldLength",
                      "details" : [ ]
                    },
                    {
                      "value" : 4.0,
                      "description" : "fieldLength",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
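The numbers in the _explanation blocks can be checked by hand. The Python sketch below recomputes them from the two formulas quoted in the description fields (idf and tfNorm), using the docFreq, docCount, fieldLength, and avgFieldLength values of the two hits. The function names are invented for this check.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    """idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))"""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))"""
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))


def bm25_score(doc_freq, doc_count, freq, field_length, avg_field_length):
    """The score of a single term is the product of idf and tfNorm."""
    return bm25_idf(doc_freq, doc_count) * bm25_tf_norm(freq, field_length, avg_field_length)


# Hit _id=6 on shard 0: docFreq=1, docCount=4, fieldLength=7, avgFieldLength=5.25
print(round(bm25_score(1, 4, 1, 7, 5.25), 6))
# Hit _id=1 on shard 2: docFreq=1, docCount=2, fieldLength=4, avgFieldLength=4
print(round(bm25_score(1, 2, 1, 4, 4), 6))
```

The two hits get different idf values because docCount is counted per shard, which is why a remark of the same style scores differently depending on the shard it landed on.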

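3.3.14 Analyzing text with _analyze

_analyze is a very useful Elasticsearch API. It shows how a field, or a given analyzer or tokenizer, analyzes and indexes a piece of text. The fields in the response mean:

token: a term that is actually stored in the index

position: the ordinal position of the term in the original text

start_offset, end_offset: the character offsets that the term occupies in the original text.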

GET user_index/_analyze
{
  "analyzer": "standard",
  "text": "I'm from shenzhen,I like elasticsearch,spark and hbase"
}

Response:

{
  "tokens" : [
    {
      "token" : "i'm",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "from",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "shenzhen",
      "start_offset" : 9,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "i",
      "start_offset" : 18,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "like",
      "start_offset" : 20,
      "end_offset" : 24,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "elasticsearch",
      "start_offset" : 25,
      "end_offset" : 38,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "spark",
      "start_offset" : 39,
      "end_offset" : 44,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "and",
      "start_offset" : 45,
      "end_offset" : 48,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "hbase",
      "start_offset" : 49,
      "end_offset" : 54,
      "type" : "<ALPHANUM>",
      "position" : 8
    }
  ]
}
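To see where token, position, and the offsets come from, the response above can be imitated with a few lines of Python. This is a rough stand-in that only handles plain ASCII words and apostrophes; the real standard analyzer follows Unicode text segmentation rules and is far more general. rough_tokens and TOKEN_RE are names made up for this sketch.

```python
import re

# Letters or digits, optionally joined by apostrophes (so "I'm" stays one token).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def rough_tokens(text: str):
    """Approximate the token, offset and position fields for simple ASCII text."""
    return [
        {
            "token": match.group().lower(),  # lowercased, as in the response above
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        }
        for position, match in enumerate(TOKEN_RE.finditer(text))
    ]


for token in rough_tokens("I'm from shenzhen,I like elasticsearch,spark and hbase"):
    print(token)
```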

 
