ElasticSearch_笔记

索引:

在Elasticsearch中存储数据的行为就叫做索引(indexing),不过在索引之前,我们需要明确数据应该存储在哪里。文档归属于一种类型(type),而这些类型存在于索引(index)中
数据库与ElasticSearch对比图:
DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields

Elasticsearch集群可以包含多个索引(indices)(数据库),每一个索引可以包含多个类型(types)(表),每一个类型包含多个文档(documents)(行),然后每个文档包含多个字段(Fields)(列)。

索引含义的区分:

你可能已经注意到索引(index)这个词在Elasticsearch中有着不同的含义,所以有必要在此做一下区分:
索引(名词) 如上文所述,一个索引(index)就像是传统关系数据库中的数据库,它是相关文档存储的地方,index的复数是indices 或indexes。
索引(动词) 「索引一个文档」表示把一个文档存储到索引(名词)里,以便它可以被检索或者查询。这很像SQL中的INSERT关键字,差别是,如果文档已经存在,新的文档将覆盖旧的文档。
倒排索引 传统数据库为特定列增加一个索引,例如B-Tree索引来加速检索。Elasticsearch和Lucene使用一种叫做倒排索引(inverted index)的数据结构来达到相同目的。

默认情况下,文档中的所有字段都会被索引(拥有一个倒排索引),只有这样他们才是可被搜索的。

为每个员工的文档(document)建立索引,每个文档包含了相应员工的所有信息。
每个文档的类型为employee。
employee类型归属于索引megacorp。
megacorp索引存储在Elasticsearch集群中。
我们能通过一个命令执行完成的操作:

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

我们看到path:/megacorp/employee/1包含三部分信息:
megacorp 索引名
employee 类型名
1 这个员工的ID
请求实体(JSON文档),包含了这个员工的所有信息。他的名字叫“John Smith”,25岁,喜欢攀岩。

注意:

如果让ID自动增长,可以像下面这样:
(原来是把文档存储到某个ID对应的空间,现在是把这个文档添加到某个_type下)

POST /website/blog/
{
  "title": "My second blog entry",
  "text":  "Still trying this out...",
  "date":  "2014/01/01"
}

检索:

GET /website/blog/123?pretty 

pretty的意思是对输出的格式进行美化。
返回:

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01"
  }
}
GET /website/blog/123?_source=title,text 

对_source字段里的属性进行过滤。
返回:

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 2,
  "found": true,
  "_source": {
    "text": "Just trying this out...",
    "title": "My first blog entry"
  }
}
GET /website/blog/123/_source 

只查询_source字段。
返回:

{
  "title": "My first blog entry",
  "text": "Just trying this out...",
  "date": "2014/01/01"
}

=========================================================================================

GET /website/blog/_search 

检索全部信息
返回:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "_score": 1,
        "_source": {
          "title": "My first blog entry",
          "text": "Just trying this out...",
          "date": "2014/01/01"
        }
      }
    ]
  }
}
GET /megacorp/employee/_search?q=first_name:Jane 

检索全部信息并且first_name是Jane。
返回:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      }
    ]
  }
}

使用DSL语句查询

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}

返回:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}

全文检索,会检索about属性里包含love字符的员工。

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "love"
        }
    }
}

返回:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.26742277,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.26742277,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}

短语搜索

目前我们可以在字段中搜索单独的一个词,这挺好的,但是有时候你想要确切的匹配若干个单词或者短语(phrases)。例如我们想要查询同时包含"rock"和"climbing"(并且是相邻的)的员工记录。
要做到这个,我们只要将match查询变更为match_phrase查询即可:

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}

返回:

{
   ...
   "hits": {
      "total":      1,
      "max_score":  0.23013961,
      "hits": [
         {
            ...
            "_score":         0.23013961,
            "_source": {
               "first_name":  "John",
               "last_name":   "Smith",
               "age":         25,
               "about":       "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         }
      ]
   }
}

高亮我们的搜索

很多应用喜欢从每个搜索结果中高亮(highlight)匹配到的关键字,这样用户可以知道为什么这些文档和查询相匹配。在Elasticsearch中高亮片段是非常容易的。
让我们在之前的语句上增加highlight参数:

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

返回:

{
   ...
   "hits": {
      "total":      1,
      "max_score":  0.23013961,
      "hits": [
         {
            ...
            "_score":         0.23013961,
            "_source": {
               "first_name":  "John",
               "last_name":   "Smith",
               "age":         25,
               "about":       "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            },
            "highlight": {
               "about": [
                  "I love to go rock climbing" <1>
               ]
            }
         }
      ]
   }
}

你可能感兴趣的:(ElasticSearch_笔记)