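Elasticsearch Notes: Basic Operations

Elasticsearch is a full-text search engine built on Lucene. At its core it also stores data, and many of its concepts have close analogues in MySQL.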

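Basic concepts

Indices

An index is analogous to a database in MySQL. "indices" is the plural of "index" and refers to many indexes.

Type

A type mimics the table concept in MySQL. Originally one index could hold several types, for example a goods type and an orders type with different data formats. Because this tended to make an index messy, indices created in 6.x are limited to a single type, and later versions remove the concept altogether.

Document

The raw data stored in an index. For example, each product record is one document.

Field

An attribute of a document.

Mappings (mappings)

The data type and other properties of each field, such as whether it is indexed and whether it is stored.

Shard

One of the parts that an index's data is split into.

Replica

A copy of a shard.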

Operations

Elasticsearch exposes a REST-style API, so every API call is simply an HTTP request, and any tool that can send HTTP requests can be used.
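
To make the "every call is an HTTP request" point concrete, here is a minimal Python sketch that builds, but does not send, the request behind a console command such as PUT ohome. The base URL assumes a local node on the default port 9200, and build_request is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port; adjust for your cluster.
BASE_URL = "http://localhost:9200"

def build_request(method, path, body=None):
    """Build (but do not send) the HTTP request behind a console command."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )

settings = {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}
req = build_request("PUT", "ohome", settings)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

The console snippets in the rest of this article map onto requests of this shape: the first line gives the HTTP method and path, and the JSON below it is the request body.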

Creating an index

Run the following in the Dev Tools console of Kibana:

PUT ohome
{
  "settings":{
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

number_of_shards: the number of shards; number_of_replicas: the number of replicas.
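
The two settings interact: number_of_replicas counts copies per primary shard, not copies in total. A small Python sketch of the arithmetic (total_shard_copies is a hypothetical helper used only for illustration):

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus replicas: each primary shard gets number_of_replicas copies."""
    return number_of_shards * (1 + number_of_replicas)

print(total_shard_copies(1, 0))  # settings used for ohome: 1 shard copy in total
print(total_shard_copies(5, 1))  # 5 primaries + 5 replicas = 10
```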

Viewing an index

GET ohome
// Response
{
  "ohome": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "creation_date": "1575645819256",
        "number_of_shards": "1",
        "number_of_replicas": "0",
        "uuid": "Y8zm3itmQdqX4pN83VNPWg",
        "version": {
          "created": "6020499"
        },
        "provided_name": "ohome"
      }
    }
  }
}

You can also use GET * to list all indices.

Deleting an index

DELETE ohome1
// Response
{
  "acknowledged": true
}

Creating mapping fields

Creating a mapping is comparable to defining a table schema.

PUT ohome/_mapping/goods
{
  "properties": {
    "title":{
      "type": "text",
      "analyzer": "ik_max_word",
      "index": true,
      "store": true
    },
    "images":{
      "type": "keyword",
      "index": false
    },
    "price":{
      "type": "float"
    }
  }
}

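Type name (goods in the request path): the type concept introduced earlier, similar to different tables in a database.
properties: holds the field definitions.
type: the data type, which can be text, keyword, long, short, date, integer, object, and so on. Both text and keyword represent strings; text is analyzed into terms, keyword is not.
index: whether the field is indexed, that is, whether it can be searched. Defaults to true.
store: whether the field is stored separately. Defaults to false. Even with store set to false the field still shows up in search results, because when Elasticsearch indexes a document it keeps a copy of the original data in a property called _source. We can filter _source to choose which fields are returned.
analyzer: the analyzer to use. ik_max_word here comes from the IK analysis plugin.

Viewing a mapping

GET ohome/_mapping/goods
// Response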
{
  "ohome": {
    "mappings": {
      "goods": {
        "properties": {
          "images": {
            "type": "keyword",
            "index": false
          },
          "price": {
            "type": "float"
          },
          "title": {
            "type": "text",
            "store": true,
            "analyzer": "ik_max_word"
          }
        }
      }
    }
  }
}

Adding data

POST ohome/goods
{
  "title":"小米",
  "images":"http://xxxxxxx",
  "price":2000.00
}

Viewing data

GET _search
{
    "query":{
        "match_all":{}
    }
}

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "ohome",
        "_type": "goods",
        "_id": "eeEh3G4Bz21prctaJOm1",
        "_score": 1,
        "_source": {
          "title": "小米",
          "images": "http://xxxxxxx",
          "price": 2000
        }
      },
      {
        "_index": "ohome",
        "_type": "goods",
        "_id": "euEj3G4Bz21prctaGunA",
        "_score": 1,
        "_source": {
          "title": "小米手机",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2699
        }
      }
    ]
  }
}

_source: the source document; all of the data is in here.
_id: the unique identifier of this document.

Updating data

PUT /ohome/goods/eeEh3G4Bz21prctaJOm1
{
    "title":"超大米手机",
    "images":"http://image.leyou.com/12479122.jpg",
    "price":3899.00
}

eeEh3G4Bz21prctaJOm1 is the document id.

Deleting data

DELETE /ohome/goods/eeEh3G4Bz21prctaJOm1

Basic query syntax

GET /<index name>/_search
{
    "query":{
        "<query type>":{
            "<field>":"<value>"
        }
    }
}

Query types include match_all, match, term, range, and so on.
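
Since every query follows the same nesting, the request body can be generated mechanically. The sketch below shows that shape in Python; build_query is a hypothetical helper, and it only covers the simple "one field, one value" form used in the template.

```python
import json

def build_query(query_type, field=None, value=None):
    """Return a search body of the form {"query": {query_type: {field: value}}}."""
    clause = {} if field is None else {field: value}
    return {"query": {query_type: clause}}

print(json.dumps(build_query("match_all")))
print(json.dumps(build_query("match", "title", "小米"), ensure_ascii=False))
```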

Match all (match_all)

GET /ohome/_search
{
    "query":{
        "match_all": {}
    }
}

Match query (match)

A match query analyzes the query text into terms before searching. Multiple terms are combined with OR.

GET /ohome/_search
{
    "query":{
        "match": {
          "title":"小米"
        }
    }
}

In some cases we need a more precise search and want the terms combined with AND instead:

GET /ohome/_search
{
    "query":{
        "match": {
          "title":{
            "query": "小米",
            "operator": "and"
          }
        }
    }
}
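
The difference between the two operators can be illustrated with a toy Python check over sets of terms. This is only a sketch: the term lists are hypothetical analyzer output (real tokens depend on the analyzer in use), and matches is not an Elasticsearch function.

```python
def matches(query_terms, doc_terms, operator="or"):
    """Toy check: does a document's term set satisfy the query terms?"""
    hits = [term in doc_terms for term in query_terms]
    return all(hits) if operator == "and" else any(hits)

# Hypothetical analyzer output for the query text "小米手机".
query_terms = ["小米", "手机"]
doc_a = {"小米"}           # terms of a document titled "小米"
doc_b = {"小米", "手机"}   # terms of a document titled "小米手机"

print(matches(query_terms, doc_a, "or"))   # True: one term is enough
print(matches(query_terms, doc_a, "and"))  # False: "手机" is missing
print(matches(query_terms, doc_b, "and"))  # True: every term is present
```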

The match query supports the minimum_should_match parameter, which specifies how many terms must match for a document to be considered relevant.

GET /ohome/_search
{
    "query":{
        "match": {
          "title":{
            "query": "小米手机hh",
            "minimum_should_match": "75%"
          }
        }
    }
}
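
For a positive percentage such as "75%", the number of required terms is, to my understanding, computed from the number of query terms and rounded down. The sketch below shows only that arithmetic; required_matches is a hypothetical name, and how many terms a query text produces depends on the analyzer.

```python
def required_matches(term_count, percent):
    """Terms that must match for a positive percentage, rounded down."""
    return term_count * percent // 100

print(required_matches(3, 75))  # 2 (2.25 rounded down)
print(required_matches(4, 75))  # 3
```

So if the query text were analyzed into three terms, a document matching any two of them would still qualify.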

Multi-field query (multi_match)

GET /ohome/_search
{
    "query":{
        "multi_match": {
          "query": "小米",
          "fields": ["title","subTitle"]
        }
    }
}

Term query (term)

The term query is used for exact-value matching. The exact values may be numbers, dates, booleans, or strings that are not analyzed.

GET /ohome/_search
{
    "query":{
        "term": {
         "price": 2699
          // to match any of several values, use the terms query instead: "terms": { "price": [2699.00, 2899.00, 3899.00] }
        }
    }
}
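
A single exact value goes in a term query, while a list of values needs the terms query. A minimal Python sketch of picking the right one (exact_query is a hypothetical helper):

```python
def exact_query(field, value):
    """Use term for one value and terms for a list of values."""
    if isinstance(value, (list, tuple)):
        return {"query": {"terms": {field: list(value)}}}
    return {"query": {"term": {field: value}}}

print(exact_query("price", 2699))
print(exact_query("price", [2699.00, 2899.00, 3899.00]))
```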

Filtering results

By default, Elasticsearch returns every field saved in _source for each hit. To get only some of the fields, add a _source filter.

GET /ohome/_search
{
  "_source": ["title"], 
    "query":{
        "term": {
         "price": 2699
        }
    }
}

You can also specify includes and excludes:

includes: the fields to return
excludes: the fields to leave out

GET /ohome/_search
{
  "_source": {
    "excludes": ["images"]
  }, 
    "query":{
        "term": {
         "price": 2699
        }
    }
}
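
The effect of includes and excludes on a hit can be sketched in a few lines of Python. This toy version handles only top-level field names and no wildcard patterns; filter_source is a hypothetical name, and the filtering itself happens on the server in practice.

```python
def filter_source(source, includes=None, excludes=None):
    """Toy _source filtering on top-level field names (no wildcards)."""
    kept = {k: v for k, v in source.items() if includes is None or k in includes}
    return {k: v for k, v in kept.items() if not excludes or k not in excludes}

doc = {"title": "小米手机", "images": "http://image.leyou.com/12479122.jpg", "price": 2699}
print(filter_source(doc, includes=["title"]))
print(filter_source(doc, excludes=["images"]))
```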

Boolean combination (bool)

bool combines other queries through must (AND), must_not (NOT), and should (OR).

GET /ohome/_search
{
    "query":{
        "bool":{
            "must":     { "match": { "title": "大米" }},
            "must_not": { "match": { "title":  "电视" }},
            "should":   { "match": { "title": "手机" }}
        }
    }
}
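
The following toy evaluation may help show how the three clauses interact. It uses substring checks instead of real text analysis and a made-up score that just counts matched clauses, so treat it as an illustration only; bool_match is a hypothetical name. It also reflects the behavior that, when a must clause is present, should clauses are optional and only raise the score.

```python
def bool_match(title, must=(), must_not=(), should=()):
    """Toy bool evaluation using substring checks instead of real analysis.

    Returns None when the document is excluded, otherwise a crude score that
    counts the must and should clauses that matched.
    """
    if not all(word in title for word in must):
        return None
    if any(word in title for word in must_not):
        return None
    return len(must) + sum(word in title for word in should)

for title in ["超大米手机", "大米电视", "大米"]:
    print(title, bool_match(title, must=["大米"], must_not=["电视"], should=["手机"]))
```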

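Range query (range)

The range query finds numbers or dates that fall within a given interval.

gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to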

GET /ohome/_search
{
    "query":{
        "range": {
            "price": {
                "gte":  1000.0,
                "lt":   2800.00
            }
        }
    }
}
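
The four keywords map directly onto comparison operators. A small Python sketch of the same check applied locally (in_range and RANGE_OPS are hypothetical names):

```python
import operator

# Map each range keyword to the comparison it stands for.
RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}

def in_range(value, **bounds):
    """Return True when value satisfies every bound, e.g. gte=1000.0, lt=2800.0."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())

print(in_range(2699, gte=1000.0, lt=2800.00))  # True
print(in_range(2800, gte=1000.0, lt=2800.00))  # False: lt excludes the upper bound
```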

Fuzzy query (fuzzy)

GET /ohome/_search
{
  "query": {
    "fuzzy": {
      "title": "xiaomi"
    }
  }
}
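
Fuzzy matching is based on edit distance: a term matches if it can be turned into the query term with a small number of single-character edits. The sketch below computes plain Levenshtein distance in Python as an illustration. It is a simplification, since Elasticsearch by default also treats swapping two adjacent characters as a single edit, which this version does not.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]

print(edit_distance("xiaomi", "xiaomo"))  # 1: one substitution
print(edit_distance("xiaomi", "xiomi"))   # 1: one deletion
```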

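Filter (filter)

Every query clause affects the score and ranking of documents. If we need to restrict the results by some condition without letting that condition influence the score, it should not be written as a query clause. Use filter instead: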

GET /ohome/_search
{
    "query":{
        "bool":{
            "must":{ "match": { "title": "小米手机" }},
            "filter":{
                "range":{"price":{"gt":2000.00,"lt":3800.00}}
            }
        }
    }
}

Sorting

Sorting on multiple fields is supported.

GET /ohome/_search
{
  "query": {
    "match": {
      "title": "小米手机"
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}
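
With several entries in the sort array, later entries act as tie-breakers for earlier ones. The Python sketch below shows the same ordering logic on local data; the documents and the sales field are hypothetical and only serve as a second numeric sort key.

```python
# Hypothetical hits; "sales" is an assumed second numeric field.
docs = [
    {"id": 1, "price": 2699, "sales": 10},
    {"id": 2, "price": 3899, "sales": 5},
    {"id": 3, "price": 2699, "sales": 30},
]

# price descending first, then sales descending as the tie-breaker
ordered = sorted(docs, key=lambda d: (-d["price"], -d["sales"]))
print([d["id"] for d in ordered])  # [2, 3, 1]
```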

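Aggregations (aggregations)

Aggregations make it very convenient to compute statistics over data and analyze it. They are often much more convenient than the equivalent SQL in a database, and they run fast enough to give near-real-time results.

Elasticsearch has several kinds of aggregations. The two most commonly used are buckets and metrics:

A bucket groups data in some way; each group is called a bucket in Elasticsearch. For example, grouping people by nationality gives a China bucket, a UK bucket, a Japan bucket, and so on.

Elasticsearch provides many ways to define buckets:

Date Histogram Aggregation: groups by date interval; for example, with an interval of one week, each week becomes one group
Histogram Aggregation: groups by numeric interval, similar to the date version
Terms Aggregation: groups by term; documents with exactly the same term fall into one group
Range Aggregation: groups numbers or dates by ranges, each with a given start and end
...

Bucket aggregations only group the data; they do not perform any calculation.

Metrics: once the data is grouped, we usually run a calculation over each group, such as the average, maximum, minimum, or sum. These calculations are called metrics in Elasticsearch.

Avg Aggregation: computes the average
Max Aggregation: computes the maximum
Min Aggregation: computes the minimum
Percentiles Aggregation: computes percentiles
Stats Aggregation: returns avg, max, min, sum, and count together
Sum Aggregation: computes the sum
Top hits Aggregation: returns the top matching documents
Value Count Aggregation: counts the values
...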

GET /cars/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            },
            "aggs":{
                "avg_price": { 
                   "avg": {
                      "field": "price" 
                   }
                }
            }
        }
    }
}

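size: the number of hits to return. It is set to 0 here because we only care about the aggregation results, not the matched documents, which is more efficient.
aggs: declares an aggregation; short for aggregations. Here a new aggs is nested inside the outer aggregation (popular_colors), which shows that a metric is also an aggregation.
popular_colors: a name for this aggregation; any name will do.
terms: how the buckets are defined; here, by term.
field: the field used to define the buckets; in a metric, the field the calculation runs on.
avg: the type of metric; here, the average.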
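
The two-step structure, bucket first and metric second, can be mimicked locally in Python. This is only a sketch of the logic on hypothetical documents, not how Elasticsearch computes it internally.

```python
from collections import defaultdict

# Hypothetical documents from the cars index.
cars = [
    {"color": "red", "price": 10000},
    {"color": "red", "price": 20000},
    {"color": "blue", "price": 15000},
]

# Bucket step: group prices by color, like the terms aggregation on "color".
buckets = defaultdict(list)
for car in cars:
    buckets[car["color"]].append(car["price"])

# Metric step: compute the average inside each bucket, like the nested avg.
result = {
    color: {"doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
    for color, prices in buckets.items()
}
print(result)
```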

Histogram bucketing

We group the cars by price, with interval set to 5000:

GET /cars/_search
{
  "size":0,
  "aggs":{
    "price":{
      "histogram": {
        "field": "price",
        "interval": 5000
      }
    }
  }
}
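
Each value is assigned to the bucket whose key is the value rounded down to a multiple of the interval. The Python sketch below shows that rule on hypothetical prices, assuming no offset is configured. Note that a real histogram response may also list empty buckets between the populated ones, depending on min_doc_count.

```python
import math
from collections import Counter

def bucket_key(value, interval):
    """Lower bound of the histogram bucket that value falls into (no offset)."""
    return math.floor(value / interval) * interval

prices = [10000, 12000, 20000, 25000, 30000, 80000]  # hypothetical prices
counts = Counter(bucket_key(p, 5000) for p in prices)
print(sorted(counts.items()))
```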