Detailed documentation: https://yuque.antfin-inc.com/aligamesmw/es/bkaie0
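reindex is the data-copy tool built into Elasticsearch. It can migrate data across clusters and can also sync data between indices ("tables") within the same cluster. Configuration is simple.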
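1. reindex copies from a snapshot of the source index taken when the reindex starts, so it cannot migrate data in real time: changes made after that point are not picked up. Do a full migration first and then an incremental one. The business must stop writing during the migration; reads are unaffected.
2. Typically, 1 GB of data takes about 10 minutes. In terms of documents, about 500,000 records take about 10 minutes.
3. Migration works only index to index: each request copies into a single destination index.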
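Point 1 implies two passes: a full reindex, then an incremental pass for documents changed after the full pass started. Below is a minimal Python sketch of the incremental pass's request body. It assumes the source documents carry a last-modified timestamp; the field name `update_time`, the checkpoint value, and the helper name are hypothetical. It only builds the JSON, and deletions in the source are not propagated.

```python
import json


def incremental_reindex_body(source_index, dest_index, ts_field, since, remote_host=None):
    """Body for an incremental pass: copy only documents with ts_field >= since.

    ts_field and since are assumptions: use the time at which the full pass
    started, so nothing changed during the full pass is skipped.
    """
    source = {
        "index": source_index,
        "query": {"range": {ts_field: {"gte": since}}},
    }
    if remote_host is not None:
        # Cross-cluster only; the host must be whitelisted on the target cluster.
        source["remote"] = {"host": remote_host}
    # The default op_type ("index") overwrites existing copies in dest,
    # so documents updated after the full pass are refreshed as well.
    return {"source": source, "dest": {"index": dest_index}}


if __name__ == "__main__":
    body = incremental_reindex_body(
        "redis_stat_es", "ngrade-data_index_9", "update_time", "2019-06-27T00:00:00"
    )
    print("POST _reindex?wait_for_completion=false")
    print(json.dumps(body, indent=2))
```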
Migration command (cross-cluster; run it on the target cluster, and set host to a coordinating node of the source cluster):
# Run on the target cluster
POST _reindex?wait_for_completion=false
{
  "source": {
    "remote": { "host": "http://10.26.33.57:9200" },
    "index": "redis_stat_es"
  },
  "dest": { "index": "ngrade-data_index_9" }
}
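Cross-cluster reindex requires a whitelist: add the "IP:PORT" of a coordinating node in the source cluster to the reindex.remote.whitelist setting in elasticsearch.yml on the target cluster's coordinating nodes, then restart those processes: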
# 2019-06-27: added reindex whitelist
reindex.remote.whitelist: ["11.5.23.14:9200", "11.156.9.171:9200"]
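A remote reindex is rejected when its source host is not in reindex.remote.whitelist, so it can help to check a host before submitting. A small Python sketch (hypothetical helper names) that converts a `source.remote.host` URL into the `host:port` form the whitelist uses. It does exact matching only and ignores the wildcard patterns Elasticsearch also accepts in this setting.

```python
from urllib.parse import urlsplit


def whitelist_entry(remote_host):
    """Map a remote URL such as "http://11.5.23.14:9200" to "11.5.23.14:9200"."""
    parts = urlsplit(remote_host)
    if not parts.hostname or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {remote_host!r}")
    return f"{parts.hostname}:{parts.port}"


def is_whitelisted(remote_host, whitelist):
    """Exact-match check; wildcard entries such as "10.0.0.*:9200" are not handled here."""
    return whitelist_entry(remote_host) in whitelist


if __name__ == "__main__":
    whitelist = ["11.5.23.14:9200", "11.156.9.171:9200"]  # values from the config above
    for host in ("http://11.5.23.14:9200", "http://10.26.33.57:9200"):
        print(host, is_whitelisted(host, whitelist))
```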
If you get the following error:
{
  "error": {
    "root_cause": [
      { "type": "illegal_state_exception", "reason": "scripts of type [inline], operation [update] and lang [painless] are disabled" }
    ],
    "type": "illegal_state_exception",
    "reason": "scripts of type [inline], operation [update] and lang [painless] are disabled"
  },
  "status": 500
}
the cluster has inline painless scripts disabled, so the scripting configuration must be changed. Reference:
---------------
Within the same cluster, omit remote; only source and dest are needed:
POST _reindex?wait_for_completion=false
{
  "source": { "index": "redis_stat_es" },
  "dest": { "index": "ngrade-data_index_9" }
}
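URL parameters
wait_for_completion=false: the request returns as soon as it is accepted and the task runs in the background; the response carries the task id. reindex usually takes a long time, so running it in the background is recommended.
Body fields
source: information about the source cluster
  remote: HTTP address of the source cluster, e.g. the source coordinating node added to the whitelist above
  index: name of the index in the source cluster
dest: information about the target cluster
  index: name of the index in the target cluster
For more parameters, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.3/docs-reindex.html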
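The same-cluster and cross-cluster request bodies differ only in `source.remote`. A minimal Python sketch (a hypothetical helper, not part of any Elasticsearch client) that builds either form from the fields listed above:

```python
import json


def reindex_body(source_index, dest_index, remote_host=None):
    """Build a _reindex body from the source/dest fields.

    remote_host=None gives the same-cluster form (source + dest only);
    a URL such as "http://10.26.33.57:9200" gives the cross-cluster form.
    """
    source = {"index": source_index}
    if remote_host:
        source["remote"] = {"host": remote_host}
    return {"source": source, "dest": {"index": dest_index}}


if __name__ == "__main__":
    # Same-cluster copy, then a cross-cluster copy run on the target cluster.
    print(json.dumps(reindex_body("redis_stat_es", "ngrade-data_index_9")))
    print(json.dumps(reindex_body("redis_stat_es", "ngrade-data_index_9",
                                  remote_host="http://10.26.33.57:9200")))
```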
------
Example of submitting with curl:
curl -XPOST 'http://localhost:9200/_reindex?wait_for_completion=false&pretty' -H 'Content-Type: application/json' -d '{"source": {"index": "cyapp-song-import_usercustomclip_perf"},"dest": {"index": "cyapp-song-import_usercustomclip_readonly"}}'
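The curl submission above can also be prepared from Python with only the standard library. This sketch assumes the cluster is reachable at `http://localhost:9200` without authentication; `build_reindex_request` and `submit` are hypothetical names. With `wait_for_completion=false`, Elasticsearch replies with a JSON object whose `task` field holds the id used by the `_tasks` API.

```python
import json
import urllib.request


def build_reindex_request(base_url, body, background=True):
    """Prepare (without sending) the POST /_reindex request from the curl example."""
    url = base_url.rstrip("/") + "/_reindex"
    if background:
        url += "?wait_for_completion=false"  # return at once, run as a background task
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request, timeout=30):
    """Send the request and return the task id from the {"task": "<node>:<id>"} reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["task"]


if __name__ == "__main__":
    req = build_reindex_request(
        "http://localhost:9200",
        {"source": {"index": "cyapp-song-import_usercustomclip_perf"},
         "dest": {"index": "cyapp-song-import_usercustomclip_readonly"}},
    )
    print(req.get_method(), req.full_url)
    # print(submit(req))  # uncomment when running against a live cluster
```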
Checking progress:
curl -XGET 'http://localhost:9200/_tasks?detailed=true&actions=*reindex'
curl -XGET 'http://localhost:9200/_tasks/2iF4iwH9Qp-EDiRMGJZtuQ:1781186699'
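The JSON returned by `GET _tasks/<task_id>` can be reduced to a single progress figure. A minimal Python sketch with a hypothetical helper name, assuming a reply where `task.status` carries counters such as `total`, `created`, `updated` and `version_conflicts`; treat the field list as an assumption and compare it with your cluster's actual output. The sample reply is made up for illustration.

```python
def reindex_progress(task_response):
    """Return (completed, fraction_done) for a GET _tasks/<task_id> reply.

    Assumed reply shape: {"completed": bool, "task": {"status": {...}}}, where
    status.total is the number of documents the reindex expects to process.
    """
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    processed = sum(
        status.get(key, 0)
        for key in ("created", "updated", "deleted", "version_conflicts", "noops")
    )
    fraction = processed / total if total else 0.0
    return task_response.get("completed", False), fraction


if __name__ == "__main__":
    sample = {  # trimmed, made-up reply for illustration
        "completed": False,
        "task": {"status": {"total": 1000, "created": 400, "updated": 100}},
    }
    done, frac = reindex_progress(sample)
    print(f"completed={done} progress={frac:.0%}")
```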
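Note that "size" at the top level is not a rate limit: it caps the total number of documents copied, so together with "sort" the request below copies only the 10,000 most recent documents of twitter. To throttle, use the requests_per_second URL parameter instead (e.g. POST _reindex?requests_per_second=500); "size" inside "source" only sets the batch size of each scroll request (1000 by default).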
POST _reindex
{
  "size": 10000,
  "source": { "index": "twitter", "sort": { "date": "desc" } },
  "dest": { "index": "new_twitter" }
}
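Throttling in `_reindex` is controlled by the `requests_per_second` URL parameter. The Elasticsearch reindex documentation describes it as padding between batches: the target time is batch size / requests_per_second, and the wait is that target minus the time spent writing (its example: a batch of 1000 at 500 per second with 0.5 s of writing waits 1.5 s). The arithmetic as a small Python sketch, with hypothetical helper names:

```python
def padding_seconds(batch_size, requests_per_second, write_seconds):
    """Wait added after one batch: batch_size / requests_per_second minus write time."""
    target = batch_size / requests_per_second
    return max(0.0, target - write_seconds)


def min_throttled_minutes(total_docs, requests_per_second):
    """Lower bound on a throttled run: documents go out at most requests_per_second per second."""
    return total_docs / requests_per_second / 60


if __name__ == "__main__":
    # The default batch size (source.size) is 1000.
    print(padding_seconds(1000, 500, 0.5))      # the documentation's example
    print(min_throttled_minutes(500_000, 500))  # 500,000 docs at 500 per second
```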
-----------
Helper commands for migration:
GET _tasks?detailed=true&actions=*reindex # list all running reindex tasks
GET .tasks/task/_search # with wait_for_completion=false, ES stores task results in the .tasks index, which can be searched here
GET /_tasks/BASGS3wUReOITmwfZtbmag:440973 # view a task's details by task id
POST _tasks/BASGS3wUReOITmwfZtbmag:440973/_cancel # cancel a task
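When several reindex jobs are running, the ids needed for `GET /_tasks/<id>` and `_cancel` can be pulled out of the `_tasks` listing. A Python sketch with hypothetical helper names, assuming the default reply layout that groups tasks by node; the sample values are made up.

```python
def reindex_task_ids(tasks_response):
    """Collect "node:id" ids from a GET _tasks?detailed=true&actions=*reindex reply.

    Assumes the default grouping by node: {"nodes": {node_id: {"tasks": {task_id: {...}}}}}.
    """
    ids = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if task.get("action", "").endswith("/reindex"):
                ids.append(task_id)
    return sorted(ids)


def cancel_request(task_id):
    """Console line that cancels one task, matching the helper commands above."""
    return f"POST _tasks/{task_id}/_cancel"


if __name__ == "__main__":
    sample = {"nodes": {"BASGS3wUReOITmwfZtbmag": {"tasks": {
        "BASGS3wUReOITmwfZtbmag:440973": {"action": "indices:data/write/reindex"},
    }}}}
    for tid in reindex_task_ids(sample):
        print(cancel_request(tid))
```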
------------
A DSL example for a more complex business migration:
POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "remote": { "host": "http://11.5.23.14:9200" },
    "size": 5000,
    "index": "wpk2-server_gymbo",
    "query": {
      "bool": {
        "filter": [
          { "range": { "uploadDate": { "from": "20190101", "to": "20190201", "include_upper": false, "include_lower": true } } }
        ]
      }
    }
  },
  "dest": { "index": "wpk-server_app_wbr0_201901", "op_type": "create" },
  "script": { "inline": "ctx._source.wpk_appid = 'gymbo'", "lang": "painless" }
}
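The DSL above migrates one month (uploadDate from 20190101 inclusive to 20190201 exclusive) into a per-month index. With `op_type: create` only documents missing from the destination are written, and `conflicts: proceed` counts the resulting version conflicts instead of aborting, so a month can be rerun safely. When several months need moving, the body can be generated per month. A Python sketch under assumptions inferred from this single example: uploadDate is a yyyyMMdd string, the target index is named `<prefix>_<YYYYMM>`, and `monthly_reindex_body` is a hypothetical helper. Unlike the original, it passes the app id through script `params` rather than splicing it into the script source, which avoids quoting problems and keeps one script text for every app.

```python
import json


def monthly_reindex_body(remote_host, source_index, dest_prefix, app_id, year, month):
    """One month of the migration above: uploadDate in [yyyyMM01, next month's 01)."""
    start = f"{year}{month:02d}01"
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    end = f"{next_year}{next_month:02d}01"
    return {
        "conflicts": "proceed",  # count version conflicts instead of aborting
        "source": {
            "remote": {"host": remote_host},
            "size": 5000,  # documents per scroll batch
            "index": source_index,
            "query": {"bool": {"filter": [{"range": {"uploadDate": {
                "from": start, "to": end,
                "include_lower": True, "include_upper": False,
            }}}]}},
        },
        # create-only: documents already in dest are skipped as version conflicts
        "dest": {"index": f"{dest_prefix}_{year}{month:02d}", "op_type": "create"},
        "script": {
            "inline": "ctx._source.wpk_appid = params.appid",  # 5.x key; later versions use "source"
            "lang": "painless",
            "params": {"appid": app_id},
        },
    }


if __name__ == "__main__":
    for m in (1, 2, 3):
        body = monthly_reindex_body("http://11.5.23.14:9200", "wpk2-server_gymbo",
                                    "wpk-server_app_wbr0", "gymbo", 2019, m)
        print("POST _reindex?wait_for_completion=false")
        print(json.dumps(body))
```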