Elastic Stack
- Beats: data collection
- Logstash: data transformation
- Elasticsearch: storage / indexing / aggregation
- Kibana: data visualization
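Node roles
Node role | Setting and default |
---|---|
Master-eligible | node.master=true |
Data | node.data=true |
Machine Learning | node.ml=true && xpack.ml.enabled |
Ingest (pre-processing) | node.ingest=true |
Coordinating only | No dedicated setting: set every role above to false (xpack settings aside). Every node already acts as a coordinator, and that cannot be disabled; a coordinating-only node just routes requests and gathers the results. |
- Every node knows about all the other nodes in the cluster and can forward a request to the appropriate node.
- By default, every node handles both HTTP traffic (queries) and transport traffic (node-to-node).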
Data Node holds:
- Shard data
- Cluster metadata + index metadata
Master Eligible Node holds:
- Cluster metadata + index metadata
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/modules-node.html
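Shards
Primary Shard | Replica Shard |
---|---|
Purpose: horizontal scaling. An index's primary shards are spread across multiple machines. | Purpose: high availability. Each replica is a copy of a primary shard. |
The number of primary shards is fixed when the index is created and cannot be changed afterwards (short of reindexing). | The number of replicas can be raised or lowered at any time to tune availability and read throughput. |
- Each shard is one Lucene instance.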
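One way to see why the primary shard count is fixed: a document's shard is derived from its routing value (the _id by default) modulo the number of primary shards, so changing that number would send existing documents to different shards. The sketch below is a toy model only. `pick_shard` is a hypothetical helper, and `zlib.crc32` stands in for the hash Elasticsearch really uses (Murmur3, plus an extra routing factor in 7.x).

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Toy version of: shard = hash(routing) % number_of_primary_shards."""
    # crc32 is only a stand-in for the hash Elasticsearch really uses.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1000)]
with_3 = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
with_4 = {doc_id: pick_shard(doc_id, 4) for doc_id in doc_ids}

# Documents whose shard would change if the index went from 3 to 4 primaries.
moved = [doc_id for doc_id in doc_ids if with_3[doc_id] != with_4[doc_id]]
print(f"{len(moved)} of {len(doc_ids)} documents would land on a different shard")
```

Replicas have no such constraint, because a replica is just a copy of a primary that already exists.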
Status APIs
# cluster health
GET /_cluster/health
# list nodes
GET /_cat/nodes?v
# list shards
GET /_cat/shards?v
# list indices with summary info
GET /_cat/indices?v
# summary info for one index
GET /_cat/indices/movies?v
# Settings and Mapping of that index
GET /movies
CRUD
- To have ES generate the document ID automatically, create the document with POST (POST /users/_doc); PUT always requires an explicit ID.
# get: read by ID
GET /users/_doc/123
GET /users/_doc/234
# create: create only (fails if the ID already exists)
POST /users/_create/234
{
"firstName": "BD",
"lastName": "C",
"tags": ["boy", "engineer"]
}
# index: create, or fully overwrite an existing document
PUT /users/_doc/123
{
"firstName": "BD",
"lastName": "C",
"tags": ["boy", "engineer"]
}
# update: partial update (the ID must already exist)
POST /users/_update/123
{
"doc": {
"firstName": "Focus"
}
}
# delete: delete by ID
DELETE /users/_doc/123
DELETE /users/_doc/234
# bulk: several actions in one request (a failed action does not stop the rest)
POST /_bulk
{"delete": {"_index": "users", "_id": 123}}
{"create": {"_index": "users", "_id": 123}}
{"tags": ["boy", "engineer"]}
{"index": {"_index": "users", "_id": 123}}
{"firstName": "bulk FN", "lastName": "bulk LN"}
{"update": {"_index": "users", "_id": 123}}
{"doc": {"tags": ["PHP", "Ruby", "Java"]}}
# multi-get: read several documents in one request
GET /_mget
{
"docs": [
{
"_index": "users",
"_id": 123
},
{
"_index": "users",
"_id": 234
}
]
}
Inverted index
Part | Forward index | Inverted index |
---|---|---|
key | id: 123 | term: Ming |
value | name: Xiao Ming Ming, age: 19 | id: 123, term_frequency: 2, position: [1, 2], offset: [[5, 9], [10, 14]] |
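To make the posting entry concrete, here is a minimal sketch of a toy inverted index. It assumes whitespace tokenization and no normalization; `build_inverted_index` is a hypothetical helper, not an Elasticsearch or Lucene API. A position is the token's ordinal in the field (starting at 0), and an offset is the token's character range.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: {"term_frequency", "position", "offset"}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # \S+ splits on whitespace; the match order gives the token position.
        for position, match in enumerate(re.finditer(r"\S+", text)):
            posting = index[match.group()].setdefault(
                doc_id, {"term_frequency": 0, "position": [], "offset": []}
            )
            posting["term_frequency"] += 1
            posting["position"].append(position)
            posting["offset"].append([match.start(), match.end()])
    return index


index = build_inverted_index({123: "Xiao Ming Ming"})
print(index["Ming"][123])
# {'term_frequency': 2, 'position': [1, 2], 'offset': [[5, 9], [10, 14]]}
```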
Analyzer
Character Filter => Tokenizer => Token Filter
Search
Path | Scope |
---|---|
/_search | all indices |
/index1/_search | index1 |
/index1,index2/_search | index1 and index2 |
/index*/_search | indices whose names start with "index" |
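URI search
# search all indices, any field, for Jack
GET /_search?q=Jack
# search all indices, title field, for Jack
GET /_search?q=title:Jack
# search the users index, any field, for Jack
GET /users/_search?q=Jack
# search the users index, firstName field, for Jack
GET /users/_search?q=firstName:Jack
# fuzzy search across all indices (edit distance 1)
GET /_search?q=ExhalX~1
# match documents containing Waiting OR Exhale
GET /_search?q=Waiting Exhale
# phrase search: match documents containing both Waiting and Exhale, with Exhale immediately after Waiting
# positions must be adjacent; the amount of whitespace in between does not matter
GET /_search?q="Waiting Exhale"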
Request Body search
# match everything
GET /movies/_search
{
"query": {
"match_all": {}
}
}
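# `_source`: choose which fields to return
# `from` / `size`: pagination (offset of the first hit, and number of hits)
# `script_fields`: fields computed by a script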
GET /movies/_search
{
"_source": [
"id",
"title"
],
"from": 0,
"size": 5,
"query": {
"match_all": {}
},
"script_fields": {
"beautiful_title": {
"script": {
"lang": "painless",
"source": "'' + doc['title.keyword'].value + ''"
}
}
}
}
# phrase match
# the text is first analyzed into terms (keeping their positions)
# every term must match, in the same relative positions (whitespace is irrelevant)
GET /movies/_search
{
"query": {
"match_phrase": {
"title": "Waiting to Exhale"
}
}
}
# slop relaxes the strict term order
# slop 2: two adjacent terms may be swapped
# slop 3: the swapped terms may also have one word between them
GET /movies/_search
{
"query": {
"match_phrase": {
"title": {
"query": "Exhale Waiting",
"slop": 3,
"analyzer": "simple"
}
}
}
}
Information Retrieval
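- precision: return as few irrelevant documents as possible (relevant documents returned / all documents returned)
- recall: return as many of the relevant documents as possible (relevant documents returned / all relevant documents)
- ranking: order the results by relevance
true or false, positive or negative
- true positive: relevant, and returned
- false positive: not relevant, but returned
- true negative: not relevant, and not returned
- false negative: relevant, but not returned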
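The two ratios above can be computed directly from the set of returned documents and the set of relevant documents. A minimal sketch with made-up document IDs; `precision_recall` is a hypothetical helper.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)  # relevant and returned
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


returned = ["d1", "d2", "d3", "d4"]  # d3 and d4 are false positives
relevant = ["d1", "d2", "d5"]        # d5 is a false negative
print(precision_recall(returned, relevant))  # (0.5, 0.6666666666666666)
```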
Mapping
-- Simple types:
- Text / Keyword
- Date
- Integer / Floating point
- Boolean
- IPv4 / IPv6
-- Complex types:
- Object
- Nested
-- Special types:
- Geo point
- Geo shape
- Percolator
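dynamic setting | Document with a new field is accepted | New field is indexed | Mapping is updated |
---|---|---|---|
true (default) | √ | √ | √ |
false | √ | × | × |
strict | × | × | × |
- The mapping of an existing field cannot be changed (short of a Reindex).
- With dynamic set to false, a new field is kept in _source, but the Mapping is not updated, so the field cannot be searched.
- Whether a field can be searched depends on whether the Mapping was updated for it.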
DELETE /demo
PUT /demo/_doc/1
{
"name": "Xiao Ming"
}
GET /demo/_doc/1
GET /demo/_mapping
# change the dynamic setting of the Mapping
POST /demo/_mapping
{
"dynamic": "strict"
}
GET /demo/_mapping
# try to add a new field (rejected with an error)
POST /demo/_update/1
{
"doc": {
"formats": [
"json",
"xml"
]
}
}
# change the dynamic setting of the Mapping
POST /demo/_mapping
{
"dynamic": "false"
}
GET /demo/_mapping
# try to add a new field
# the new field is stored in _source; the mapping is unchanged
POST /demo/_update/1
{
"doc": {
"name": "Xiao Gang",
"tags": [
"Ruby",
"Java"
]
}
}
GET /demo/_doc/1
# a field that exists in the mapping can be searched
GET /demo/_search
{
"query": {
"match": {
"name": "xiao"
}
}
}
# the new field cannot be searched (it is not in the Mapping)
GET /demo/_search
{
"query": {
"match": {
"tags": "ruby"
}
}
}
Mapping Definition
"index": false: no inverted index is built for the field, so it cannot be searched.
To define the mapping while creating an index, use PUT /index_name {"mappings": {}}.
DELETE /demo
PUT /demo
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"mobile": {
"type": "text",
"index": false
}
}
}
}
GET /demo/_mapping
POST /demo/_doc/1
{
"name": "Xiao Ming",
"mobile": "021-1234567"
}
GET /demo/_search
{
"query": {
"match": {
"name": "Xiao"
}
}
}
# Cannot search on field [mobile] since it is not indexed.
GET /demo/_search
{
"query": {
"match": {
"mobile": "021"
}
}
}
Index Options levels
Level | Contents |
---|---|
docs | doc id |
freqs | doc id, term frequencies |
positions | doc id, term frequencies, term positions |
offsets | doc id, term frequencies, term positions, character offsets |
- text fields default to positions; every other field type defaults to docs.
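NULL Value
Elasticsearch cannot index a null value; by default, null and [] are both treated as "no value".
null_value replaces an explicit null with a chosen stand-in value at index time (an empty array [] contains no explicit null, so it is not replaced).
_source still shows the original value; the stand-in only affects what is indexed.
The stand-in must have the same data type as the field.
text fields do not support null_value; keyword fields do.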
DELETE /demo
PUT /demo
{
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"null_value": "NULLVALUE"
}
}
},
"age": {
"type": "integer",
"null_value": 18
}
}
}
}
GET /demo/_mapping
POST /demo/_doc/1
{
"name": null,
"age": null
}
GET /demo/_doc/1
GET /demo/_search
{
"query": {
"match": {
"name.raw": "NULLVALUE"
}
}
}
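copy_to
_all is deprecated in recent versions; copy_to provides similar functionality.
The copied values do not appear in _source.
The copy_to target field can also hold a value of its own.
When a source value is updated, what is indexed in the target field follows.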
DELETE /demo
PUT /demo
{
"mappings": {
"properties": {
"first": {
"type": "text",
"copy_to": "full"
},
"second": {
"type": "text",
"copy_to": "full"
},
"full": {
"type": "text"
}
}
}
}
GET /demo/_mapping
POST /demo/_doc/1
{
"full": "hi",
"first": "HELLO",
"second": "WORLD"
}
GET /demo/_doc/1
GET /demo/_search
{
"query": {
"match": {
"full": "hello"
}
}
}
POST /demo/_update/1
{
"doc": {
"first": "ruby",
"second": "java"
}
}
GET /demo/_doc/1
GET /demo/_search
{
"query": {
"match": {
"full": "hello"
}
}
}
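fields
With dynamic mapping, a string is automatically mapped as a text field with a keyword sub-field. This uses the multi-fields feature, which indexes one property as several types. The equivalent explicit mapping:
PUT /demo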
{
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
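Analyzer
Processing order: Character Filter => Tokenizer => Token Filter
Character Filter:
Pre-processes the source text. Several character filters can be configured; they run in order.
For example, strip HTML tags first, then apply custom replacement rules: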
GET /_analyze
{
"char_filter": [
"html_strip",
{
"type": "mapping",
"mappings": [
"_ => -",
":( => __unhappy__",
":) => __happy__"
]
}
],
"text": [
"Hello_world :)<br>"
]
}
Tokenizer:
Splits the text into terms. Exactly one tokenizer is allowed.
It also records each term's order, position, and offset.
GET /_analyze
{
"tokenizer": "whitespace",
"text": "A big Apple."
}
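Token Filter:
Filters and otherwise transforms the tokens the tokenizer produced (for example, synonyms); in other words, token normalization. Several can be configured.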
GET /_analyze
{
"tokenizer": "whitespace",
"filter": [
"lowercase",
"stop"
],
"text": "A big Apple."
}
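The three stages can be mimicked outside Elasticsearch to show how they chain. This is a toy model only: the function names are hypothetical, the stop-word set is an illustrative subset, and real analyzers also track positions and offsets.

```python
STOP_WORDS = {"a", "an", "the"}  # illustrative subset, not Elasticsearch's list


def char_filter(text):
    # Character filter: rewrite the raw text before tokenizing.
    return text.replace(":)", "__happy__").replace(":(", "__unhappy__")


def tokenizer(text):
    # Tokenizer: split on whitespace, like the whitespace tokenizer.
    return text.split()


def token_filters(tokens):
    # Token filters, applied in order: lowercase, then stop-word removal.
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOP_WORDS]


def analyze(text):
    return token_filters(tokenizer(char_filter(text)))


print(analyze("A big Apple. :)"))  # ['big', 'apple.', '__happy__']
```

The order matters: lowercasing runs before stop-word removal, which is why the capitalized "A" is dropped.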
Custom analyzer
DELETE /demo
PUT /demo
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": [
"my_char_filter"
],
"tokenizer": "my_tokenizer",
"filter": [
"lowercase",
"my_token_filter"
]
}
},
"char_filter": {
"my_char_filter": {
"type": "mapping",
"mappings": [
":) => __happy__",
":( => __unhappy__"
]
}
},
"tokenizer": {
"my_tokenizer": {
"type": "pattern",
"pattern": "[,.!?@-]"
}
},
"filter": {
"my_token_filter": {
"type": "stop",
"stopwords": [
"hi",
"hello"
]
}
}
}
}
}
GET /demo/_analyze
{
"analyzer": "my_analyzer",
"text": "Hello, @Big-Apple! :)"
}
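Custom synonym analyzer:
- Synonyms are substituted at index time or at search time, depending on which analyzer uses the filter.
- expand defaults to true: the synonyms in a rule are interchangeable.
- expand set to false: every term in a rule is mapped to the first term.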
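The difference between the two expand modes can be sketched as a lookup table built from one rule. This is a simplification that assumes single-word terms; `expand_rule` is a hypothetical helper, not how the synonym filter is implemented.

```python
def expand_rule(rule, expand=True):
    """Turn one comma-separated synonym rule into {input term: output terms}."""
    terms = [term.strip().lower() for term in rule.split(",")]
    # expand=True: every term maps to all terms; expand=False: to the first only.
    targets = terms if expand else terms[:1]
    return {term: list(targets) for term in terms}


print(expand_rule("IT, Internet", expand=True))
# {'it': ['it', 'internet'], 'internet': ['it', 'internet']}
print(expand_rule("IT, Internet", expand=False))
# {'it': ['it'], 'internet': ['it']}
```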
DELETE /demo
PUT /demo
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"my_synonym"
]
}
},
"filter": {
"my_synonym": {
"type": "synonym",
"expand": true,
"synonyms": [
"IT, Internet",
"IT, Internet Technology",
"IT, Integration Testing"
]
}
}
}
},
"mappings": {
"properties": {
"job": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "standard"
}
}
}
}
GET /demo/_analyze
{
"analyzer": "my_analyzer",
"text": "Integration Testing"
}
POST /_bulk
{"index": {"_index": "demo", "_id": 1}}
{"job": "Testing"}
{"index": {"_index": "demo", "_id": 2}}
{"job": "IT"}
{"index": {"_index": "demo", "_id": 3}}
{"job": "Internet"}
{"index": {"_index": "demo", "_id": 4}}
{"job": "IT"}
{"index": {"_index": "demo", "_id": 5}}
{"job": "Internet Technology"}
{"index": {"_index": "demo", "_id": 6}}
{"job": "Integration Testing"}
GET /demo/_search
{
"query": {
"match": {
"job": "Testing"
}
}
}
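An analyzer runs at two stages:
- index time: analyzer
- search time: search_analyzer
How the index analyzer is chosen, in order:
1. The analyzer mapping parameter of the field
2. analysis.analyzer.default in the index settings
3. The default standard analyzer
How the search analyzer is chosen, in order:
1. The analyzer specified in the search query
2. The search_analyzer mapping parameter of the field
3. analysis.analyzer.default_search in the index settings
4. The analyzer mapping parameter of the field
5. The default standard analyzer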
https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html#specify-index-time-analyzer
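The search-time lookup order above amounts to "first candidate that is set wins". A minimal sketch, assuming the field mapping and index settings are passed as plain dictionaries shaped like the request bodies in these notes; `pick_search_analyzer` is a hypothetical helper.

```python
def pick_search_analyzer(query_analyzer=None, field_mapping=None, index_settings=None):
    """Return the name of the analyzer used at search time."""
    field_mapping = field_mapping or {}
    index_analyzers = (index_settings or {}).get("analysis", {}).get("analyzer", {})
    candidates = [
        query_analyzer,                                            # 1. set in the query
        field_mapping.get("search_analyzer"),                      # 2. field search_analyzer
        "default_search" if "default_search" in index_analyzers else None,  # 3. index default
        field_mapping.get("analyzer"),                             # 4. field analyzer
        "standard",                                                # 5. built-in fallback
    ]
    return next(name for name in candidates if name is not None)


job = {"type": "text", "analyzer": "my_analyzer", "search_analyzer": "standard"}
print(pick_search_analyzer(field_mapping=job))                        # standard
print(pick_search_analyzer(field_mapping={"analyzer": "my_analyzer"}))  # my_analyzer
print(pick_search_analyzer(query_analyzer="simple", field_mapping=job))  # simple
```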
Index Template
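- Works like global configuration: it is applied to every index whose name matches the template's patterns.
- It only takes effect when an index is created (deleting a template does not affect existing indices, and adding one does not change them either).
Order in which settings are applied at index creation:
1. The default settings / mappings;
2. Index templates in ascending order, starting from 0, each overriding the previous ones;
3. The settings / mappings given by the user, overriding everything above.
Low-order templates supply the base settings; high-order templates supply the specific ones.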
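The layering described above can be sketched as a chain of dictionary merges. The helper names and the sample settings are made up for illustration; how Elasticsearch merges each individual section may differ in detail.

```python
def deep_merge(base, override):
    """Return base updated with override, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_settings(defaults, templates, user_settings):
    result = defaults
    # Lower order is applied first, so higher order overrides it.
    for template in sorted(templates, key=lambda t: t["order"]):
        result = deep_merge(result, template["settings"])
    # Settings given in the create-index request win over every template.
    return deep_merge(result, user_settings)


defaults = {"number_of_shards": 1, "number_of_replicas": 1}
templates = [
    {"order": 1, "settings": {"number_of_replicas": 2}},
    {"order": 0, "settings": {"number_of_shards": 3, "number_of_replicas": 0}},
]
print(effective_settings(defaults, templates, {"number_of_shards": 5}))
# {'number_of_shards': 5, 'number_of_replicas': 2}
```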
# list all templates
GET /_template
GET /_template/*
# show specific templates
GET /_template/my_template
GET /_template/my*
DELETE /_template/my_template
POST /_template/my_template
{
"order": 1,
"index_patterns": [
"*"
],
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"my_synonym"
]
}
},
"filter": {
"my_synonym": {
"type": "synonym",
"synonyms": [
"IT, Internet",
"IT, Internet Technology",
"IT, Integration Testing"
]
}
}
}
},
"mappings": {}
}
DELETE /demo
PUT /demo
{
"mappings": {
"properties": {
"job": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "simple"
}
}
}
}
GET /demo/_mapping
GET /demo/_analyze
{
"field": "job",
"text": "Internet Technology"
}
Dynamic Template
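Defined in the mapping of a specific index; it offers more convenient ways of matching dynamically added fields (for example by field name pattern or by detected data type).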
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html
Aggregations (aggs)
GET /kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"my_aggs_dest": {
"terms": {
"field": "DestCountry",
"size": 3
},
"aggs": {
"my_price": {
"stats": {
"field": "AvgTicketPrice"
}
}
}
}
}
}