Notes from one run of backing up Elasticsearch cluster data with elasticdump
My environment is a single-node Elasticsearch cluster started with Docker. The docker-compose file is below:
version: '2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.2
    container_name: dockerelk_new_es
    #restart: always
    environment:
      - node.name=es01
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /services/elasticsearch/data:/usr/share/elasticsearch/data
      - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro
    ports:
      - 9201:9200
      - 9301:9300
Note: remember to set permissions on the data directory. Since Elasticsearch runs inside Docker, there is no elastic user on the host; as this is a test environment I simply used 777.
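If 777 feels too broad, a narrower option is to hand the directory to the in-container Elasticsearch user; the official 7.x image runs Elasticsearch as uid/gid 1000, which this one-liner assumes:
[root@localhost ~]# chown -R 1000:1000 /services/elasticsearch/data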
elasticsearch.yml
---
# Default Elasticsearch configuration from elasticsearch-docker.
## from https://github.com/elastic/elasticsearch-docker/blob/master/build/elasticsearch/elasticsearch.yml
#
cluster.name: "docker-cluster"
network.host: 0.0.0.0
# minimum_master_nodes need to be explicitly set when bound on a public IP
# set to 1 to allow single node clusters
# Details: https://github.com/elastic/elasticsearch/pull/17288
discovery.zen.minimum_master_nodes: 1
## Use single node discovery in order to disable production mode and avoid bootstrap checks
## see https://www.elastic.co/guide/en/elasticsearch/reference/current/bootstrap-checks.html
#
discovery.type: single-node
path.repo: /usr/share/elasticsearch/data/backup
path.data: /usr/share/elasticsearch/data
http.cors.enabled: true
http.cors.allow-origin: "*"
reindex.remote.whitelist: ["192.168.159.128:9200","192.168.159.128:9201"]
Note: path.data is where the index data lives, while path.repo (the backup directory) is the location of the snapshot repository.
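Because path.repo points inside the mounted data volume, the backup directory should exist on the host before any repository is registered against it; this assumes the host-side path from the volume mapping above:
[root@localhost ~]# mkdir -p /services/elasticsearch/data/backup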
Now let's check that the single-node Elasticsearch instance is running and that its ports are open:
[root@localhost ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9e4b87a2c962 docker.elastic.co/elasticsearch/elasticsearch:7.5.2 "/usr/local/bin/dock…" 10 days ago Up 23 hours 0.0.0.0:9202->9200/tcp, 0.0.0.0:9302->9300/tcp test-es
d750d5c11475 docker.elastic.co/elasticsearch/elasticsearch:7.5.2 "/usr/local/bin/dock…" 10 days ago Up 23 hours 0.0.0.0:9201->9200/tcp, 0.0.0.0:9301->9300/tcp d750d5c11475_dockerelk_new_es
[root@localhost ~]# netstat -tnlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:9201 0.0.0.0:* LISTEN 1711/docker-proxy
tcp 0 0 0.0.0.0:9202 0.0.0.0:* LISTEN 1603/docker-proxy
tcp 0 0 0.0.0.0:9301 0.0.0.0:* LISTEN 1699/docker-proxy
tcp 0 0 0.0.0.0:9302 0.0.0.0:* LISTEN 1592/docker-proxy
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 896/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1022/master
tcp6 0 0 :::22 :::* LISTEN 896/sshd
tcp6 0 0 ::1:25 :::* LISTEN 1022/master
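Beyond the ports, you can also ask the cluster itself how it is doing; the health API is a quick sanity check (a single-node setup typically reports yellow or green):
[root@localhost ~]# curl -X GET 'http://192.168.159.128:9201/_cluster/health?pretty'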
If all of the above looks right, we can start backing up the Elasticsearch data.
There are two ways to back up ES data; I'll briefly introduce both.
Method 1: snapshots
Pros: you take snapshots and can define snapshot lifecycle policies, so backups run and are stored automatically, and the policies can be tailored to different needs.
Cons: restores are not very flexible. Taking a snapshot is fast, but you cannot restore just any way you like, much like virtual machine snapshots.
For more detail see https://www.elastic.co/guide/en/elasticsearch/reference/7.x/snapshot-restore.html
Register a snapshot repository of the same type on both clusters, which makes restoring later straightforward:
[root@localhost ~]# curl -X PUT "192.168.159.128:9201/_snapshot/back?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/usr/share/elasticsearch/data/backup",
    "compress": "true"
  }
}
'
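Before trusting the repository, it can be checked with the standard verify endpoint, which confirms the node can actually write to the location:
[root@localhost ~]# curl -X POST '192.168.159.128:9201/_snapshot/back/_verify?pretty'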
List the registered snapshot repositories:
[root@localhost ~]# curl -XGET "http://192.168.159.128:9201/_snapshot/_all?pretty"   # all registered repositories
[root@localhost ~]# curl -X GET "http://192.168.159.128:9201/_snapshot/back"   # a single repository by name
Take a snapshot:
[root@localhost ~]# curl -XPUT http://192.168.159.128:9201/_snapshot/back/snapshot_test20180312
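That call returns as soon as the snapshot is initiated. If you would rather block until it finishes, the wait_for_completion flag does that (the snapshot name below is just an example; names must be unique within the repository):
[root@localhost ~]# curl -XPUT 'http://192.168.159.128:9201/_snapshot/back/snapshot_test20180313?wait_for_completion=true&pretty'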
List the snapshots that have been taken:
[root@localhost ~]# curl -XGET 'http://192.168.159.128:9201/_snapshot/back/_all?pretty'
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_test20180312",
      "uuid" : "5htORqn4S3WQ8XoEz4y2PA",
      "version_id" : 7050299,
      "version" : "7.5.2",
      "indices" : [
        "undefined"
      ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2021-02-24T06:37:25.239Z",
      "start_time_in_millis" : 1614148645239,
      "end_time" : "2021-02-24T06:37:26.047Z",
      "end_time_in_millis" : 1614148646047,
      "duration_in_millis" : 808,
      "failures" : [ ],
      "shards" : {
        "total" : 6,
        "failed" : 0,
        "successful" : 6
      }
    }
  ]
}
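The restore side isn't shown above, so here is a minimal sketch. Note that a snapshot cannot be restored over an open index of the same name, so close or delete the target first; the index name here is illustrative:
[root@localhost ~]# curl -X POST '192.168.159.128:9201/_snapshot/back/snapshot_test20180312/_restore?pretty' -H 'Content-Type: application/json' -d'
{
  "indices": "first",
  "include_global_state": false
}
'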
The above is manual snapshotting. Elasticsearch also provides SLM (snapshot lifecycle management), an automated way to take and store snapshots.
First, create a snapshot policy:
[root@localhost ~]# curl -X PUT "192.168.159.128:9201/_slm/policy/backup-snapshots?pretty" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 0 ? * 2",
  "name": "",
  "repository": "back",
  "config": {
    "indices": ["*"]
  },
  "retention": {
    "expire_after": "14d",
    "min_count": 3,
    "max_count": 5
  }
}
'
Field meanings (JSON allows no comments, so they are spelled out here): schedule is when snapshots are taken; name is the snapshot name format (date math such as "<backup-{now/d}>" is the usual pattern); repository is the target repository; config.indices selects the indices to back up; retention keeps each snapshot for expire_after, retaining at least min_count and at most max_count snapshots.
Note: this schedule is a cron expression of its own and is laid out differently from crontab's minute-hour-day-month-weekday. The six fields are, in order: second, minute, hour, day-of-month, month, day-of-week. In "0 0 0 ? * 2", the ? means no specific day-of-month, * means every month, and 2 is the second day of the week (in this syntax Sunday is day 1, so 2 is Monday). Times are in UTC, which is 8 hours behind Beijing time.
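For example, a policy that fires every day at 01:30 UTC would use the same six-field syntax:
"0 30 1 * * ?"   # second minute hour day-of-month month day-of-week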
Now that the policy is created, let's take a look at it:
[root@localhost ~]# curl -X GET "192.168.159.128:9201/_slm/policy?pretty"
{
  "backup-snapshots" : {
    "version" : 1,
    "modified_date_millis" : 1613628181765,
    "policy" : {
      "name" : "",
      "schedule" : "0 0 0 ? * 2",
      "repository" : "back",
      "config" : {
        "indices" : [
          "*"
        ]
      },
      "retention" : {
        "expire_after" : "14d",
        "min_count" : 3,
        "max_count" : 5
      }
    },
    "next_execution_millis" : 1615161600000,
    "stats" : {
      "policy" : "backup-snapshots",
      "snapshots_taken" : 0,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 0,
      "snapshot_deletion_failures" : 0
    }
  }
}
Once the policy is in place, snapshots will be taken on schedule. If you are impatient to see the effect, you can shorten the schedule, or trigger the policy by hand as shown below.
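SLM also has an execute endpoint that fires a policy immediately, without touching its schedule:
[root@localhost ~]# curl -X POST '192.168.159.128:9201/_slm/policy/backup-snapshots/_execute?pretty'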
Method 2: elasticdump
Pros: more flexible than snapshots; you can back up or restore individual indices, and it also works for moving data between two clusters.
Cons: there are a lot of details to watch during both backup and restore; it is flexible, but you need to be careful with it.
First, install elasticdump. I didn't install it from yum, because the yum packages are all 2.x; npm installs version 6.65.3.
Install elasticdump:
[root@localhost ~]# yum install epel-release
[root@localhost ~]# yum install nodejs
[root@localhost ~]# yum install npm
[root@localhost ~]# npm install elasticdump
If running the elasticdump command directly fails, look under npm's default install directory, /usr/local.
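Alternatively, a global install puts the elasticdump binary on the PATH and sidesteps the problem:
[root@localhost ~]# npm install -g elasticdump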
Before backing up with elasticdump, a little preparation is needed.
Create an index:
[root@localhost ~]# curl -XPUT '192.168.159.128:9201/first'
Insert a document:
[root@localhost ~]# curl -XPOST '192.168.159.128:9201/first/es/1?pretty' -H 'Content-Type: application/json' -d '
{
"name": "John Doe"
}'
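A quick search confirms the document is there:
[root@localhost ~]# curl -XGET '192.168.159.128:9201/first/_search?pretty'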
Usage is simple: just supply --input and --output, and optionally --type (omitting --type is usually fine, since the data is reindexed and the mapping comes across with it; but if you rely on custom analyzers, do a full transfer of analyzer, mapping, and data as shown further down). Here are a few common elasticdump scenarios.
Export ES data to a local .json file (to load a .json file back into ES, just swap input and output):
[root@localhost ~]# elasticdump --input=http://localhost:9201/first --output=/services/elasticsearch/test3.json   # first is the index name
Import data into ES from a local .json file:
[root@localhost ~]# elasticdump --input=/services/elasticsearch/test3.json --output=http://localhost:9201/first
Copy an index between two clusters:
[root@localhost ~]# elasticdump --input=http://ip:9200/first --output=http://127.0.0.1:9200/first
Below is the complete transfer flow for an index:
Transfer the analyzer:
[root@localhost ~]# elasticdump --input=http://ip:9200/my_index --output=http://127.0.0.1:9200/my_index --type=analyzer
Transfer the mapping:
[root@localhost ~]# elasticdump --input=http://ip:9200/ --output=http://127.0.0.1:9200/ --all=true --type=mapping
Transfer all the data:
[root@localhost ~]# elasticdump --input=http://ip:9200/ --output=http://127.0.0.1:9200/ --all=true --type=data
If the cluster is protected by x-pack authentication:
[root@localhost ~]# elasticdump --input=http://user:password@ip:9200/ --output=http://user:[email protected]:9200/ --all=true --type=data
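The elasticdump package also ships a multielasticdump wrapper for dumping every index at once; a sketch, assuming your installed version supports these flags (check multielasticdump --help):
[root@localhost ~]# multielasticdump --direction=dump --match='^.*$' --input=http://192.168.159.128:9201 --output=/services/elasticsearch/dump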
When I first ran the data export, it failed with the following error:
[root@localhost ~]# /usr/local/elasticdump --input=http://192.168.159.128:9201/first/ --output=/services/elasticsearch/test3.json --type=data
Fri, 19 Feb 2021 11:05:28 GMT | starting dump
Fri, 19 Feb 2021 11:05:28 GMT | Error Emitted => {"error":"Content-Type header [] is not supported","status":406}
Fri, 19 Feb 2021 11:05:28 GMT | Total Writes: 0
Fri, 19 Feb 2021 11:05:28 GMT | dump ended with error (get phase) => Error: {"error":"Content-Type header [] is not supported","status":406}
This error only shows up when exporting data; exporting the analyzer or mapping works fine. The cause is that the system's bundled Node.js is too old (it reported 3.10.10); upgrading Node.js fixes it:
[root@localhost ~]# npm install -g n
[root@localhost ~]# n latest
Below are three ES backup scripts I wrote, each with its own strengths; the second and third were split out of the first, but carry a little more functionality.
es-bak-case.sh
#!/usr/bin/bash
input="http://10.106.107.143:9202"
output="http://10.106.107.143:9201"

menu() {
cat <<-EOF
+-------------------------------------------------------------+
+                        1.sync report                        +
+                        2.sync service                       +
+                        3.sync all                           +
+                        4.exit                               +
+-------------------------------------------------------------+
EOF
}

menu
while :
do
    echo "Please enter index option for backup: "
    read num
    if [ "$num" -gt 4 ] 2>/dev/null; then
        echo "please enter again!!!"
        continue
    fi
    case $num in
    1)  echo "Backing up report, wait for a moment"
        # drop the target index first so the copy starts clean
        curl -X DELETE "$output/report" > /dev/null
        elasticdump --input=$input/report --output=$output/report --limit=1000 > /dev/null
        echo "Backup complete!"
        ;;
    2)  echo "Backing up service, wait for a moment"
        elasticdump --input=$input/service --output=$output/service --limit=1000 > /dev/null
        ;;
    3)  echo "Backing up all indices, wait for a moment"
        # list every index on the source, then sync them one by one
        curl -X GET "$input/_cat/indices?v" | awk '{print $3}' | awk 'NR >= 2 {print}' > ./index.txt
        cat ./index.txt | while read line
        do
            curl -X DELETE "$output/$line" > /dev/null
            elasticdump --input=$input/$line --output=$output/$line --limit=1000 > /dev/null
        done
        ;;
    4)  echo "Thanks for using this script; happy backups!"
        exit
        ;;
    esac
done
es-bak.sh (run from /services/es-bak)
#!/usr/bin/bash
date=`date +%Y%m%d`
input="http://10.106.107.143:9202"
output=/services/es-bak/$date/index
tar_path=/services/es-bak/tar

cd /services/es-bak || exit 1   # tar below packs the dated directory by relative path
mkdir -p $output $tar_path

# list every index on the source cluster
curl -X GET "$input/_cat/indices?v" | awk '{print $3}' | awk 'NR >= 2 {print}' > /services/es-bak/$date/index.txt

# dump each index into its own json file
cat /services/es-bak/$date/index.txt | while read line
do
    elasticdump --input=$input/$line --output=$output/es-bak-$date-$line.json --limit=1000 > /dev/null
done

# pack today's dumps into a tarball
tar -czf $tar_path/es-bak-$date.tar.gz $date > /dev/null

# retention: drop dump files older than 14 days
# (search the whole es-bak tree; today's directory alone would never match)
find /services/es-bak -mtime +14 -name "*.json" -exec rm -f {} \;
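To run the dump unattended, add an entry like this to root's crontab (crontab -e); the 02:00 daily schedule is only an example:
0 2 * * * /usr/bin/bash /services/es-bak/es-bak.sh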
es-restore.sh (run from /services/es-bak)
#!/usr/bin/bash
echo "Enter the date of the backup to restore (format '20210301'): "
read date
input=/services/es-bak/$date/index
output="http://10.106.107.143:9201"
tar_path=/services/es-bak/tar

# unpack that day's dumps
tar -xzf $tar_path/es-bak-$date.tar.gz -C /services/es-bak/

cat /services/es-bak/$date/index.txt | while read line
do
    # if the index already exists on the target, drop it before importing
    get=`curl -X GET "$output/_cat/indices?v" | grep -w $line | wc -l`
    if [ $get -eq 0 ]; then
        elasticdump --input=$input/es-bak-$date-$line.json --output=$output/$line --limit=1000 > /dev/null
    else
        curl -X DELETE "$output/$line" > /dev/null
        elasticdump --input=$input/es-bak-$date-$line.json --output=$output/$line --limit=1000 > /dev/null
    fi
done

# clean up the extracted directory
rm -rf /services/es-bak/$date
If you want this fully automated, just replace the interactive prompt with a command-line argument, as sketched below.
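A minimal version of that change for es-restore.sh, taking the date from $1 and falling back to the prompt when no argument is given:
date=$1
if [ -z "$date" ]; then
    echo "Enter the date of the backup to restore (format '20210301'): "
    read date
fi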
References
https://www.cnblogs.com/JimShi/p/11244126.html
https://github.com/elasticsearch-dump/elasticsearch-dump