Elasticsearch Study Notes

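Elasticsearch (ES for short below) is an open-source search engine built on Apache Lucene(TM). Lucene can be considered the most advanced, best-performing, and most full-featured search engine library to date, in both the open-source and proprietary worlds. Elasticsearch is written in Java and uses Lucene at its core for all indexing and search functionality, but its goal is to hide Lucene's complexity behind a simple RESTful API so that full-text search becomes easy.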

1. Installation and Startup (Windows)

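First, download the zip package from the official site (https://www.elastic.co/downloads/elasticsearch#ga-release), unzip it, and run elasticsearch.bat in the bin directory to start Elasticsearch. Then open http://localhost:9200/?pretty in a browser. It returns a JSON document like the one below, showing the ES version and other information.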

{
    "name": "x62D3ht",
    "cluster_name": "elasticsearch",
    "cluster_uuid": "yDPE_WTBQE6Hp5ZBydgjSw",
    "version": {
        "number": "5.6.2",
        "build_hash": "57e20f3",
        "build_date": "2017-09-23T13:16:45.703Z",
        "build_snapshot": false,
        "lucene_version": "6.6.1"
    },
    "tagline": "You Know, for Search"
}
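
The same check can be scripted instead of done in a browser. The sketch below is only an illustration: it assumes a node is listening on the default http://localhost:9200, uses nothing beyond the Python standard library, and the helper names fetch_node_info and describe_node are made up for this example.

```python
import json
from urllib.request import urlopen

ES_URL = "http://localhost:9200"  # assumed default address of a local node


def fetch_node_info(base_url=ES_URL):
    """GET the root endpoint and return the decoded JSON body."""
    with urlopen(base_url + "/", timeout=5) as resp:
        return json.load(resp)


def describe_node(info):
    """Build a one-line summary from the root-endpoint response."""
    version = info["version"]
    return "{} (cluster {}): Elasticsearch {}, Lucene {}".format(
        info["name"], info["cluster_name"],
        version["number"], version["lucene_version"],
    )


# With a node running, this prints the summary for the live node:
# print(describe_node(fetch_node_info()))
```

Keeping the network call separate from the formatting makes the formatting easy to try against a saved response such as the JSON above.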

2. Indexing and Querying

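In Elasticsearch, the act of storing data is called indexing. Before indexing anything, though, we need to decide where the data should be stored. In Elasticsearch a document belongs to a type, and types live inside an index. We can compare ES with a traditional relational database: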

Relational database	ES	Meaning
Databases	Indices	Database
Tables	Types	Table
Rows	Documents	Record
Columns	Fields	Field

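An Elasticsearch cluster can contain multiple indices (databases); each index can contain multiple types (tables); each type holds multiple documents (rows); and each document has multiple fields (columns). This is the model in ES 5.x, the version used in these notes; mapping types were deprecated in 6.x and removed in later releases.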

The word "index" has several different meanings in ES, which deserve a special note.

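Index (noun): As described above, an index is like a database in a traditional relational database. It is where related documents are stored. The plural of index is indices or indexes.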

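Index (verb): To "index a document" is to store it in an index (noun) so that it can be retrieved and queried. This is much like the INSERT keyword in SQL, except that if the document already exists, the new document overwrites the old one.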

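Inverted index: A relational database speeds up retrieval by adding an index, such as a B-tree index, to specific columns. Elasticsearch and Lucene use a data structure called an inverted index for the same purpose.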
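
To make the idea of an inverted index concrete, here is a toy sketch in Python. It is not how Lucene is implemented: it simply lowercases the text and splits on whitespace, whereas Lucene runs text through configurable analyzers and stores far more than document IDs. The function name build_inverted_index is invented for this example, and the sample sentences are the "about" fields of the employee documents used in these notes.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}
index = build_inverted_index(docs)
print(index["rock"])  # [1, 2]
print(index["like"])  # [2, 3]
```

Looking up a term is a single dictionary access that returns the matching documents directly, which is why this structure suits full-text search better than scanning every document.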

Indexing

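Next, we will build an employee directory and then index and search it (Postman can be used to send the requests). Creating the employee directory involves roughly the following: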

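Index a document for each employee; each document contains all the information about that employee.

Each document has the type employee.

The employee type belongs to the index megacorp.

The megacorp index is stored in the Elasticsearch cluster.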

A single request accomplishes all of this:

In Postman, send a PUT request to localhost:9200/megacorp/employee/1
with the following JSON as the request body:

{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}
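
For readers who prefer a script to Postman, the request above can be sketched with Python's standard library. This is an illustration under assumptions: the node address http://localhost:9200 and the helper name build_index_request are not from the original notes. The helper only builds the request; sending it requires a running node.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumed local node


def build_index_request(index, doc_type, doc_id, document, base_url=ES_URL):
    """Build (but do not send) a PUT request for /{index}/{type}/{id}."""
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}
req = build_index_request("megacorp", "employee", 1, employee)
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```

The URL path carries the index, type, and ID, while the body carries only the document itself.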


Sending this request adds one employee record to ES. A GET request to localhost:9200/megacorp/employee/1 in Postman then retrieves the record. The response is:

{
    "_index": "megacorp",
    "_type": "employee",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "first_name": "John",
        "last_name": "Smith",
        "age": 25,
        "about": "I love to go rock climbing",
        "interests": [
            "sports",
            "music"
        ]
    }
}
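
The response wraps the stored document in metadata: the original JSON sits under _source, and found reports whether the ID exists. A minimal sketch of unpacking it in Python follows; the helper name extract_source is hypothetical, and the response text is an abbreviated copy of the one above.

```python
import json

# The GET response shown above, abbreviated to the fields used here.
raw = '''{"_index": "megacorp", "_type": "employee", "_id": "1",
          "_version": 1, "found": true,
          "_source": {"first_name": "John", "last_name": "Smith", "age": 25}}'''


def extract_source(response):
    """Return the stored document, or None if the ID was not found."""
    if not response.get("found"):
        return None
    return response["_source"]


doc = extract_source(json.loads(raw))
print(doc["first_name"], doc["age"])  # John 25
```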

Next, let's add more employees to the directory.
Send a PUT request to localhost:9200/megacorp/employee/2 with the following body to index the second employee document.

{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}

Send a PUT request to localhost:9200/megacorp/employee/3 with the following body to index the third employee document.

{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}

Searching

We have now indexed three employees. The following request searches for all of them.
Send a GET request to localhost:9200/megacorp/employee/_search
The response is:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "first_name": "Douglas",
                    "last_name": "Fir",
                    "age": 35,
                    "about": "I like to build cabinets",
                    "interests": [
                        "forestry"
                    ]
                }
            }
        ]
    }
}

Here we use _search in place of the document ID. The hits array in the response contains all three documents. By default, a search returns the first 10 results.
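
As a rough illustration of how a client might read this response, the sketch below pulls the total and one line per hit out of the nested hits object. The helper name summarize_hits is invented for this example, and the response dictionary is a trimmed copy of the 5.6 response above, where hits.total is a plain number.

```python
def summarize_hits(response):
    """Return (total, [(id, full name), ...]) from a _search response."""
    hits = response["hits"]
    rows = [
        (h["_id"], "{} {}".format(h["_source"]["first_name"],
                                  h["_source"]["last_name"]))
        for h in hits["hits"]
    ]
    return hits["total"], rows


# Trimmed copy of the response above (only the fields used here).
response = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "2", "_source": {"first_name": "Jane", "last_name": "Smith"}},
            {"_id": "1", "_source": {"first_name": "John", "last_name": "Smith"}},
            {"_id": "3", "_source": {"first_name": "Douglas", "last_name": "Fir"}},
        ],
    }
}
total, rows = summarize_hits(response)
print(total)  # 3
print(rows)   # [('2', 'Jane Smith'), ('1', 'John Smith'), ('3', 'Douglas Fir')]
```

Note that total counts every matching document, while the inner hits list holds only the page that was returned, so the two can differ once more than 10 documents match.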

Query String

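A query-string search passes the query the same way you would pass URL parameters. For example, to find documents whose last_name is "Smith", send a GET request to localhost:9200/megacorp/employee/_search?q=last_name:Smith
The result is: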

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 0.2876821,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [
                        "sports",
                        "music"
                    ]
                }
            }
        ]
    }
}
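
When the search URL is built in code rather than typed by hand, the q parameter should be URL-encoded. A small sketch using Python's standard library is shown below; the node address and the helper name build_query_string_url are assumptions made for this example.

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumed local node


def build_query_string_url(index, doc_type, field, value, base_url=ES_URL):
    """Return the _search URL with a URL-encoded q=field:value parameter."""
    query = urlencode({"q": "{}:{}".format(field, value)})
    return "{}/{}/{}/_search?{}".format(base_url, index, doc_type, query)


url = build_query_string_url("megacorp", "employee", "last_name", "Smith")
print(url)
# http://localhost:9200/megacorp/employee/_search?q=last_name%3ASmith
```

The colon is percent-encoded as %3A, which the server decodes back to last_name:Smith. Encoding matters more once values contain spaces or other reserved characters.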

Querying with the DSL

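Query strings are convenient for quick, ad hoc queries, but they have limitations. ES provides a more powerful query language, the query DSL, which takes a JSON request body. The previous query can be written as follows:
Send a POST request to localhost:9200/megacorp/employee/_search with this body: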

{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}

The result is the same as the one returned by the query-string search.
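
Because the DSL body is plain JSON, it maps naturally onto nested dictionaries in code. The following sketch builds the match query and the POST request with Python's standard library. The helper names match_query and build_search_request and the node address are assumptions for illustration, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumed local node


def match_query(field, value):
    """Return a query DSL body with a single match clause."""
    return {"query": {"match": {field: value}}}


def build_search_request(index, doc_type, body, base_url=ES_URL):
    """Build (but do not send) a POST request to the _search endpoint."""
    url = "{}/{}/{}/_search".format(base_url, index, doc_type)
    return Request(url, data=json.dumps(body).encode("utf-8"), method="POST",
                   headers={"Content-Type": "application/json"})


body = match_query("last_name", "Smith")
req = build_search_request("megacorp", "employee", body)
print(req.get_method(), req.full_url)
print(json.dumps(body))  # {"query": {"match": {"last_name": "Smith"}}}
```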

More Complex Searches

For example, to find employees whose last_name is "Smith" and who are older than 30, send a POST request to localhost:9200/megacorp/employee/_search with the following body:

{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {"gt": 30}
        }
      },
      "must": {
        "match": {"last_name": "Smith"}
      }
    }
  }
}

The response is:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 0.2876821,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [
                        "music"
                    ]
                }
            }
        ]
    }
}
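
The bool query combines two kinds of clause: must contributes to the relevance score, while filter only includes or excludes documents. That is consistent with the _score above being the same 0.2876821 as in the match-only search. The sketch below builds the same body in Python and applies a rough local equivalent to the three employees. Both function names are invented here, and the local check is only an approximation: it compares the whole field case-insensitively as a stand-in for the text analysis ES performs.

```python
def bool_query(match_field, match_value, range_field, greater_than):
    """Combine a scored match clause with an unscored range filter."""
    return {
        "query": {
            "bool": {
                "must": {"match": {match_field: match_value}},
                "filter": {"range": {range_field: {"gt": greater_than}}},
            }
        }
    }


def matches_locally(doc, match_field, match_value, range_field, greater_than):
    """Approximate the query in plain Python (not how ES evaluates it)."""
    return (doc[match_field].lower() == match_value.lower()
            and doc[range_field] > greater_than)


employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]
body = bool_query("last_name", "Smith", "age", 30)
kept = [e["first_name"] for e in employees
        if matches_locally(e, "last_name", "Smith", "age", 30)]
print(kept)  # ['Jane']
```

John is excluded by the age filter and Douglas by the last_name match, leaving only Jane, as in the response above.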
