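Use cases
An index generally needs to be rebuilt in the following situations:
- The index Mappings change: a field type is changed, or an analyzer or its dictionary is updated
- The index Settings change: the number of primary shards changes
- Data has to be migrated within a cluster or between clusters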
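Elasticsearch provides two built-in APIs for this:
- Update By Query: rebuilds the index in place, on the existing index
- Reindex: rebuilds the index into a different index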
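Case 1: Adding a subfield to an index
Change the Mapping to add a subfield that uses the English analyzer
Then try to query the subfield
Although matching data already exists, no results are returned, because documents written before the Mapping change were never indexed into the new subfield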
# Index a document
PUT blogs/_doc/1
{
"content":"Hadoop is cool",
"keyword":"hadoop"
}
# View the Mapping
GET blogs/_mapping
# Update the Mapping: add an english subfield to content
PUT blogs/_mapping
{
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"english" : {
"type" : "text",
"analyzer":"english"
}
}
}
}
}
# Query the document written before the Mapping change: no results are returned
POST blogs/_search
{
"query": {
"match": {
"content.english": "Hadoop"
}
}
}
Update By Query
# Update all documents so they are re-indexed under the current Mapping
POST blogs/_update_by_query
{
}
# Query the document written earlier
POST blogs/_search
{
"query": {
"match": {
"content.english": "Hadoop"
}
}
}
Run Update By Query
Query the Multi-Fields subfield again
The document is now returned
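On an index that is still receiving writes, a plain Update By Query can abort when it hits a version conflict. A hedged variant of the request above, using the conflicts and refresh URL parameters of the Update By Query API; whether both are needed depends on the workload:

```
# conflicts=proceed: count version conflicts instead of aborting the run
# refresh: refresh the affected shards when the request finishes,
#          so a search issued right afterwards sees the updated documents
POST blogs/_update_by_query?refresh&conflicts=proceed
```

The response reports the number of updated documents and the number of version conflicts, which is worth checking before relying on the subfield.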
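Case 2: Changing the type of an existing field in the Mappings
Elasticsearch does not allow the type of an existing field to be changed in its Mapping
The only option is to create a new index with the correct field types and then re-import the data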
# Try to change the type of the keyword field in place: Elasticsearch rejects this request
PUT blogs/_mapping
{
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"english" : {
"type" : "text",
"analyzer" : "english"
}
}
},
"keyword" : {
"type" : "keyword"
}
}
}
Reindex API
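The Reindex API copies documents from one index to another
Typical scenarios for the Reindex API:
- Changing the number of primary shards of an index
- Changing a field type in the Mapping
- Migrating data within a cluster or across clusters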
# Reindex API
POST _reindex
{
"source": {
"index": "blogs"
},
"dest": {
"index": "blogs_fix"
}
}
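For the primary-shard scenario listed above, the destination index has to exist with the desired settings before the copy runs, because Reindex does not copy settings from the source. A minimal sketch, assuming a hypothetical destination index named blogs_resharded and an arbitrary shard count of 3:

```
# Create the destination with the new primary shard count
# (blogs_resharded and the value 3 are illustrative assumptions)
PUT blogs_resharded
{
  "settings": {
    "number_of_shards": 3
  }
}

# Copy the documents across
POST _reindex
{
  "source": {
    "index": "blogs"
  },
  "dest": {
    "index": "blogs_resharded"
  }
}
```

The Mapping is not copied either, so a "mappings" section belongs in the same PUT request whenever the dynamic mapping defaults are not what is wanted.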
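Points to note
OP Type
- With op_type set to create, _reindex only creates documents that do not already exist in the destination index
- Documents that already exist cause a version conflict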
# Reindex API, op_type create
POST _reindex
{
"source": {
"index": "blogs"
},
"dest": {
"index": "blogs_fix",
"op_type": "create"
}
}
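By default a version conflict aborts the Reindex run. When the destination is expected to hold some of the documents already, the request can be told to keep going. A sketch of that combination, reusing the index names from the example above:

```
# op_type create: only add documents missing from blogs_fix
# conflicts proceed: count conflicting documents instead of aborting
POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "blogs"
  },
  "dest": {
    "index": "blogs_fix",
    "op_type": "create"
  }
}
```

The response still lists how many version conflicts occurred, so skipped documents do not go unnoticed.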
Cross-cluster Reindex
- Requires a change to elasticsearch.yml and a node restart
reindex.remote.whitelist: "otherhost:9200, another:9200"
POST _reindex
{
"source": {
"remote": {
"host": "http://otherhost:9200"
},
"index": "source",
"size": 100,
"query": {
"match": {
"test": "data"
}
}
},
"dest": {
"index": "dest"
}
}
Checking progress with the Task API
GET _tasks?detailed=true&actions=*reindex
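The Reindex API supports asynchronous execution: with wait_for_completion=false, the request returns immediately with only a Task ID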
POST _reindex?wait_for_completion=false
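Put together, an asynchronous run might look like the sketch below. The index names are the ones used earlier in these notes, and <task_id> is a placeholder for whatever value the first request returns:

```
# Start the reindex in the background; the response body contains a "task" value
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "blogs"
  },
  "dest": {
    "index": "blogs_fix"
  }
}

# Check the status of that specific task
GET _tasks/<task_id>

# Cancel it if necessary
POST _tasks/<task_id>/_cancel
```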
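Section recap
Update By Query use cases: adding a subfield to a field; changing a field's analyzer or updating the analyzer's dictionary
Reindex API use case: changing a field type
- Set the Mapping on the new index first; index settings and mappings are not copied over
Use the Task API to check the status of a Reindex
Remote Reindex requires a change to the elasticsearch.yml configuration and a restart
Read and write data through an Index Alias wherever possible, so that maintenance can be done with zero downtime even when a Reindex is needed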
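The alias recommendation can be sketched as follows. The alias name blogs_alias is a hypothetical choice for illustration, and the sketch assumes clients only ever address the alias, never the concrete index names:

```
# One-time setup: point the alias at the original index
POST _aliases
{
  "actions": [
    { "add": { "index": "blogs", "alias": "blogs_alias" } }
  ]
}

# After reindexing into blogs_fix, switch the alias in a single atomic request
POST _aliases
{
  "actions": [
    { "remove": { "index": "blogs", "alias": "blogs_alias" } },
    { "add": { "index": "blogs_fix", "alias": "blogs_alias" } }
  ]
}
```

Because both actions are applied together, searches against blogs_alias never see a moment where the alias points at no index. Writes that reach the old index while the Reindex is running are not handled by this sketch and need separate treatment.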
Course demo
DELETE blogs/
# Index a document
PUT blogs/_doc/1
{
"content":"Hadoop is cool",
"keyword":"hadoop"
}
# View the Mapping
GET blogs/_mapping
# Change the Mapping: add a subfield that uses the English analyzer
PUT blogs/_mapping
{
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"english" : {
"type" : "text",
"analyzer":"english"
}
}
}
}
}
# Index another document
PUT blogs/_doc/2
{
"content":"Elasticsearch rocks",
"keyword":"elasticsearch"
}
# Query the newly written document
POST blogs/_search
{
"query": {
"match": {
"content.english": "Elasticsearch"
}
}
}
# Query the document written before the Mapping change
POST blogs/_search
{
"query": {
"match": {
"content.english": "Hadoop"
}
}
}
# Update all documents
POST blogs/_update_by_query
{
}
# Query the document written earlier
POST blogs/_search
{
"query": {
"match": {
"content.english": "Hadoop"
}
}
}
# View the current Mapping
GET blogs/_mapping
# Try to change the type of the keyword field in place: Elasticsearch rejects this request
PUT blogs/_mapping
{
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"english" : {
"type" : "text",
"analyzer" : "english"
}
}
},
"keyword" : {
"type" : "keyword"
}
}
}
DELETE blogs_fix
# Create a new index with the new Mapping
PUT blogs_fix/
{
"mappings": {
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"english" : {
"type" : "text",
"analyzer" : "english"
}
}
},
"keyword" : {
"type" : "keyword"
}
}
}
}
# Reindex API
POST _reindex
{
"source": {
"index": "blogs"
},
"dest": {
"index": "blogs_fix"
}
}
GET blogs_fix/_doc/1
# Test a Terms Aggregation on the keyword field
POST blogs_fix/_search
{
"size": 0,
"aggs": {
"blog_keyword": {
"terms": {
"field": "keyword",
"size": 10
}
}
}
}
# Reindex API, version_type internal
POST _reindex
{
"source": {
"index": "blogs"
},
"dest": {
"index": "blogs_fix",
"version_type": "internal"
}
}
# The document version number has increased
GET blogs_fix/_doc/1
# Reindex API, version_type external
POST _reindex
{
"source": {
"index": "blogs"
},
"dest": {
"index": "blogs_fix",
"version_type": "external"
}
}
# Reindex API, version_type external, continuing past version conflicts
POST _reindex
{
"source": {
"index": "blogs"
},
"dest": {
"index": "blogs_fix",
"version_type": "external"
},
"conflicts": "proceed"
}
# Reindex API, op_type create: only creates documents missing from the destination
POST _reindex
{
"source": {
"index": "blogs"
},
"dest": {
"index": "blogs_fix",
"op_type": "create"
}
}
GET _tasks?detailed=true&actions=*reindex
Further reading
- https://www.elastic.co/guide/en/elasticsearch/reference/7.1/docs-reindex.html
- https://www.elastic.co/guide/en/elasticsearch/reference/7.1/docs-update-by-query.html