ETCD数据库异常:mvcc: database space exceeded解决

ETCD数据库异常:mvcc: database space exceeded解决

  • 问题来源:在k8s集群中给node打标签发现报错

    [root@master1]# kubectl label node  30.4.228.20 env=prod
    Error from server: etcdserver: mvcc: database space exceeded
    
  • 环境信息

    etcd集群:30.4.228.19,30.4.228.20,30.4.228.22 (配置了安全加密)
    

原因分析:

  • etcd服务未设置自动压缩参数(auto-compact)
  • etcd 默认不会自动 compact,需要设置启动参数,或者通过命令进行compact,如果变更频繁建议设置,否则会导致空间和内存的浪费以及错误。Etcd v3 的默认的 backend quota 2GB,如果不 compact,boltdb 文件大小超过这个限制后,就会报错:”Error: etcdserver: mvcc: database space exceeded”,导致数据无法写入。

处理过程:

  • 1、 获取旧版本号 :
[root@etcd1]# export ETCDCTL_API=3   #使用 api version 3
[root@etcd1]# rev=$(/usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem \
--cert=/etc/etcd/ssl/etcd.pem \ 
--key=/etc/etcd/ssl/etcd-key.pem \ 
--endpoints="https://127.0.0.1:2379" \
endpoint status --write-out="json" \ 
| egrep -o '"revision":[0-9]*' \ 
| egrep -o '[0-9].*') 

[root@etcd1]# echo $rev 
  • 2 、压缩旧版本
[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.20:2379" compact $rev

[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.20:2379" defrag 

[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.20:2379" alarm list 

[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.19:2379" compact $rev

[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.19:2379" defrag 

[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.22:2379" compact $rev

[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.22:2379" defrag 

[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.22:2379" alarm list 

[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.22:2379" alarm disarm
  • 3、查看告警
[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.22:2379" alarm list

[root@etcd1]#  /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.19:2379" alarm list 

[root@etcd1]# /usr/local/bin/etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://30.4.228.20:2379" alarm list

etcd相关命令:

1、设置etcd配额:

# 设置16MB的配额
etcd --quota-backend-bytes=$((16*1024*1024))

2、触发配额耗尽:

while [1];do dd if=dev/urandom bs=1024 count=1024 \
 | ETCDCTL_API=3 etcdctl put key || break; done
...
Error: rpc error: code = 8 desc = etcdserver: mvcc:database space exceeded

3、确认数据空间超出配额:

ETCDCTL_API=3 etcdctl --write-out=table endpoint status
----------------+------------------+-----------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        |  VERSION  | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+-----------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | bf9071f4639c75cc | 2.3.0+git | 18 MB   | true      |         2 |       3332 |
+----------------+------------------+-----------+---------+-----------+-----------+------------+

4、查看告警:

ETCDCTL_API=3 etcdctl alarm list

5、整合压缩、碎片整理:

1.获取当前etcd数据的修订版本(revision)

rev=$(ETCDCTL_API=3 etcdctl --endpoints=:2379 endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')

2.整合压缩旧版本数据

ETCDCTL_API=3 etcdctl compact $rev

3.执行碎片整理

ETCDCTL_API=3 etcdctl defrag

4.解除告警

ETCDCTL_API=3 etcdctl alarm disarm

5.备份以及查看备份数据信息

ETCDCTL_API=3 etcdctl snapshot save backup.db
ETCDCTL_API=3 etcdctl snapshot status backup.db

你可能感兴趣的:(etcd)