Why
Elasticsearch already achieves high availability through replicas, so why back up at all?
In real production environments, the final result data is generally backed up anyway; the point is to be able to restore and recover the data as quickly as possible.
Elasticsearch offers many choices for snapshot storage: a shared local filesystem, Amazon S3, HDFS, Microsoft Azure, and Google Cloud Storage all work. I chose HDFS here because an environment was already at hand, and because HDFS is itself a distributed storage system with highly available data: as long as the cluster stays up, the data remains intact.
Steps
Step 1: register a snapshot repository.
Step 2: create a snapshot against that repository.
Preparation
Installing ES
Download the Linux binary tarball from the official site: https://www.elastic.co/downloads/elasticsearch
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz
tar -xzvf elasticsearch-7.9.2-linux-x86_64.tar.gz
cd elasticsearch-7.9.2
./bin/elasticsearch
Install the HDFS plugin
sudo bin/elasticsearch-plugin install repository-hdfs
Restart ES so the plugin is loaded.
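If ES was started in the foreground as above, a minimal restart sketch looks like this (the -d and -p flags are standard elasticsearch options for daemonizing and recording the PID):
# Stop the foreground instance with Ctrl-C (or kill its process),
# then start ES as a daemon, writing its PID to a file for later shutdown.
./bin/elasticsearch -d -p pid
# Later, to stop the daemon:
pkill -F pid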
Create an index (customer) and a few documents to back up; a minimal sketch follows.
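For reference, this is one way to do that step (the document field here is hypothetical, not from the original):
$ curl -X PUT 'localhost:9200/customer?pretty'
$ curl -X POST 'localhost:9200/customer/_doc?pretty' -H 'Content-Type: application/json' -d'
{
  "name": "John Doe"
}'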
HDFS preparation
Create the backup directory:
$ hdfs dfs -mkdir /es_bak
$ hdfs dfs -chmod 777 /es_bak
Confirm the active NameNode:
$ hdfs haadmin -getAllServiceState
cd-lab-hdp-master-0:8020 active
cd-lab-hdp-master-2:8020 standby
Register the snapshot repository
Official docs: https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-hdfs-config.html
PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "elasticsearch/repositories/my_hdfs_repository",
    "conf.dfs.client.read.shortcircuit": "true"
  }
}
Example
$ curl -H "Content-Type: application/json" -XPUT localhost:9200/_snapshot/my_snapshot?pretty -d '
> {
>   "type": "hdfs",
>   "settings": {
>     "uri": "hdfs://cd-lab-hdp-master-0:8020/",
>     "path": "/es_bak",
>     "conf.dfs.client.read.shortcircuit": "true",
>     "conf.dfs.domain.socket.path": "/var/run/hdfs-sockets/dn"
>   }
> }'
{
  "acknowledged" : true
}
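To double-check the registration, you can list every repository the cluster knows about (a standard _snapshot API call):
$ curl 'localhost:9200/_snapshot/_all?pretty'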
Configuration for HDFS HA:
The uri must use the HDFS nameservice name rather than a single NameNode address; in the example below it is mycluster.
curl -H "Content-Type: application/json" -XPUT localhost:9200/_snapshot/roy_snapshot?pretty -d '
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://mycluster/",
    "path": "/es_bak",
    "conf.dfs.nameservices": "mycluster",
    "conf.dfs.ha.namenodes.mycluster": "nn1,nn2",
    "conf.dfs.namenode.rpc-address.mycluster.nn1": "cd-lab-hdp-master-0:8020",
    "conf.dfs.namenode.rpc-address.mycluster.nn2": "cd-lab-hdp-master-2:8020",
    "conf.dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "conf.dfs.client.read.shortcircuit": "true",
    "conf.dfs.domain.socket.path": "/var/run/hdfs-sockets/dn"
  }
}'
Check
$ curl -H "Content-Type: application/json" localhost:9200/_snapshot/roy_snapshot?pretty
{
  "roy_snapshot" : {
    "type" : "hdfs",
    "settings" : {
      "path" : "/es_bak",
      "uri" : "hdfs://mycluster/",
      "conf" : {
        "dfs" : {
          "client" : {
            "failover" : {
              "proxy" : {
                "provider" : {
                  "mycluster" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
                }
              }
            },
            "read" : {
              "shortcircuit" : "true"
            }
          },
          "ha" : {
            "namenodes" : {
              "mycluster" : "nn1,nn2"
            }
          },
          "namenode" : {
            "rpc-address" : {
              "mycluster" : {
                "nn1" : "cd-lab-hdp-master-0:8020",
                "nn2" : "cd-lab-hdp-master-2:8020"
              }
            }
          },
          "domain" : {
            "socket" : {
              "path" : "/var/run/hdfs-sockets/dn"
            }
          },
          "nameservices" : "mycluster"
        }
      }
    }
  }
}
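Beyond reading the settings back, ES can actively check that the nodes can reach the repository; the _verify endpoint is part of the standard snapshot API:
$ curl -X POST 'localhost:9200/_snapshot/roy_snapshot/_verify?pretty'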
Create a snapshot
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshots-take-snapshot.html
Syntax:
PUT /_snapshot/<repository>/<snapshot>
POST /_snapshot/<repository>/<snapshot>
Example
$ curl -X PUT "localhost:9200/_snapshot/my_snapshot/snapshot_2?wait_for_completion=true&pretty" -H 'Content-Type: application/json' -d'
> {
>   "indices": "customer",
>   "ignore_unavailable": true,
>   "include_global_state": false,
>   "metadata": {
>     "taken_by": "Roy",
>     "taken_because": "backup before upgrading"
>   }
> }
> '
{
  "snapshot" : {
    "snapshot" : "snapshot_2",
    "uuid" : "fzjfDMzlTFu1ztAuNqPQlw",
    "version_id" : 7100199,
    "version" : "7.10.1",
    "indices" : [
      "customer"
    ],
    "data_streams" : [ ],
    "include_global_state" : false,
    "metadata" : {
      "taken_by" : "Roy",
      "taken_because" : "backup before upgrading"
    },
    "state" : "SUCCESS",
    "start_time" : "2020-12-11T03:08:43.903Z",
    "start_time_in_millis" : 1607656123903,
    "end_time" : "2020-12-11T03:08:44.304Z",
    "end_time_in_millis" : 1607656124304,
    "duration_in_millis" : 401,
    "failures" : [ ],
    "shards" : {
      "total" : 1,
      "failed" : 0,
      "successful" : 1
    }
  }
}
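If a snapshot is taken without wait_for_completion=true it runs in the background; these standard endpoints list snapshots in a repository and show per-shard progress:
# List all snapshots in the repository
$ curl 'localhost:9200/_snapshot/my_snapshot/_all?pretty'
# Detailed shard-level status of a single snapshot
$ curl 'localhost:9200/_snapshot/my_snapshot/snapshot_2/_status?pretty'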
Verify on HDFS
$ hdfs dfs -ls /es_bak
Found 5 items
-rw-r--r-- 3 ansible hdfs 438 2020-12-11 03:08 /es_bak/index-0
-rw-r--r-- 3 ansible hdfs 8 2020-12-11 03:08 /es_bak/index.latest
drwxr-xr-x - ansible hdfs 0 2020-12-11 03:08 /es_bak/indices
-rw-r--r-- 3 ansible hdfs 234 2020-12-11 03:08 /es_bak/meta-fzjfDMzlTFu1ztAuNqPQlw.dat
-rw-r--r-- 3 ansible hdfs 322 2020-12-11 03:08 /es_bak/snap-fzjfDMzlTFu1ztAuNqPQlw.dat
$ hadoop fs -ls /es_bak/indices
Found 1 items
drwxr-xr-x - ansible hdfs 0 2020-12-11 03:08 /es_bak/indices/QuQ5jAsBQm6JXsy4vnFUsg
Restore data
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshots-restore-snapshot.html
Restore syntax:
POST /_snapshot/my_backup/snapshot_1/_restore
First delete the current index (by default a restore fails if an open index with the same name already exists):
curl -XDELETE 'localhost:9200/customer'
Confirm:
curl 'localhost:9200/_cat/indices?v'
Example
$ curl -X POST "localhost:9200/_snapshot/my_snapshot/snapshot_2/_restore?pretty" -H 'Content-Type: application/json' -d'
> {
>   "indices": "customer",
>   "ignore_unavailable": true,
>   "include_global_state": false,
>   "rename_pattern": "index_(.+)",
>   "rename_replacement": "restored_index_$1",
>   "include_aliases": false
> }
> '
{
  "accepted" : true
}
Check again: the index has been restored.
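A quick confirmation, reusing the earlier _cat check plus a document count:
$ curl 'localhost:9200/_cat/indices/customer?v'
$ curl 'localhost:9200/customer/_count?pretty'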