Kuboard suddenly becomes inaccessible with: Failed to connect to the database

1. Background

Without any special operation on my side, Kuboard suddenly began returning the following message when accessed:

{
  "message": "Failed to connect to the database.",
  "type": "Internal Server Error"
}

2. Troubleshooting

Kuboard here is deployed with Docker. Checking the container status shows Up 6 months, so the container itself is still running:

docker ps | grep kuboard

Check the Kuboard container logs:

docker logs -f  --tail=10  <container ID>
[root@nb003 ~]# docker logs -f  --tail=10  a2caf8010e75
time="2023-09-23T05:15:08Z" level=error msg="failed to rotate keys: etcdserver: mvcc: database space exceeded"
{"level":"warn","ts":"2023-09-23T13:15:12.504+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-8f19b170-257f-4a30-942f-1be1122e3be0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
time="2023-09-23T05:15:12Z" level=error msg="Storage health check failed: create auth request: etcdserver: mvcc: database space exceeded"

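The logs above report ResourceExhausted desc = etcdserver: mvcc: database space exceeded. This means the etcd instance embedded in Kuboard has exceeded its storage quota, which is not the same as the host disk being full. The default quota is 2 GB; once it is exceeded, etcd raises a NOSPACE alarm and accepts only key reads and deletes, so the service stays broken until old revisions are compacted and the space is reclaimed. This is why periodic cleanup is needed.
Next, check the size of the mapped data directory: locate your own kuboard-data and look at how much space the etcd db file occupies. The db file was last modified at 11:57 on September 23 and is already 2.0G, which is the default 2 GB quota limit.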

[root@nb003 snap]# cd /data/kuboard-data/etcd-data/member/snap
[root@nb003 snap]# pwd
/data/kuboard-data/etcd-data/member/snap
[root@nb003 snap]# ls -lrth
total 2.1G
-rw-r--r-- 1 root root 8.0K Sep 19 11:49 0000000000000005-00000000005c4542.snap
-rw-r--r-- 1 root root 8.0K Sep 20 08:41 0000000000000005-00000000005c6c53.snap
-rw-r--r-- 1 root root 8.0K Sep 21 05:33 0000000000000005-00000000005c9364.snap
-rw-r--r-- 1 root root 8.0K Sep 22 02:26 0000000000000005-00000000005cba75.snap
-rw-r--r-- 1 root root 8.0K Sep 23 07:13 0000000000000005-00000000005ce186.snap
-rw------- 1 root root 2.0G Sep 23 11:57 db
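
To make the "db size versus quota" comparison repeatable, here is a minimal sketch that turns a db file size into a percentage of the quota. The function name quota_pct and the variable DEFAULT_QUOTA_BYTES are made up for illustration. The 2 GiB value assumes the etcd default quota is in effect, which may not hold if your deployment overrides it.

```shell
# Sketch only: report how full the etcd backend is relative to a quota.
# DEFAULT_QUOTA_BYTES assumes etcd's default 2 GiB quota (an assumption;
# change it if your deployment sets a different quota).
DEFAULT_QUOTA_BYTES=$((2 * 1024 * 1024 * 1024))

# quota_pct SIZE_BYTES [QUOTA_BYTES]
# Prints the integer percentage of the quota that SIZE_BYTES represents.
quota_pct() {
  size=$1
  quota=${2:-$DEFAULT_QUOTA_BYTES}
  echo $(( size * 100 / quota ))
}

# Example usage on the host (path from this article, adjust to your mapping;
# `stat -c %s` is the GNU coreutils form):
#   quota_pct "$(stat -c %s /data/kuboard-data/etcd-data/member/snap/db)"
```

A value close to 100 suggests that a compaction and defragmentation are due before etcd starts rejecting writes.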

Enter the Kuboard container and check the etcd status. The ERRORS column shows the same problem: an alarm:NOSPACE alarm, meaning the space quota is exhausted.

[root@nb003 snap]# docker exec -it a2caf8010e75 bash
root@a2caf8010e75:/# ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" --write-out=table endpoint status
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
|       ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX |             ERRORS             |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| http://127.0.0.1:2379 | 59a9c584ea2c3f35 |  3.4.14 |  2.1 GB |      true |      false |         6 |    6089300 |            6089300 |   memberID:6460912315094810421 |
|                       |                  |         |         |           |            |           |            |                    |                 alarm:NOSPACE  |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
root@a2caf8010e75:/# 
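
If you would rather check for the alarm from a script than read the table by eye, the sketch below may help. It assumes that etcdctl alarm list prints lines in the same memberID:... alarm:NOSPACE form that appears in the ERRORS column above; the helper name has_nospace is hypothetical.

```shell
# Sketch only: detect an active NOSPACE alarm from `etcdctl alarm list` output.
# has_nospace reads the alarm listing on stdin and exits 0 if NOSPACE is present.
has_nospace() {
  grep -q 'alarm:NOSPACE'
}

# Example usage inside the Kuboard container:
#   ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 alarm list | has_nospace \
#     && echo "NOSPACE alarm is active"
```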

3. Solution

Run the following steps in order inside the Kuboard container:

# Back up the db
etcdctl snapshot save backup.db
# Get the current revision
rev=$(ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
# Compact away the old revisions
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 compact $rev
# Defragment to release the freed space
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 defrag
# Clear the alarm (the NOSPACE alarm raised earlier)
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 alarm disarm
# Check the etcd status again (the ERRORS column should now be empty)
ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" --write-out=table endpoint status
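
The revision-extraction step above is the easiest one to get subtly wrong, so here it is isolated as a small helper that can be tried on sample input first. This is a sketch: the name extract_revision is hypothetical, and the JSON in the usage comment is a trimmed, illustrative fragment rather than the full endpoint status output. It uses grep -E in place of egrep and takes only the first match, which matters if several endpoints are queried.

```shell
# Sketch only: pull the numeric revision out of
# `etcdctl endpoint status --write-out=json` output supplied on stdin.
extract_revision() {
  grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# Example usage inside the Kuboard container:
#   rev=$(ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 \
#           endpoint status --write-out=json | extract_revision)
#   [ -n "$rev" ] && ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 compact "$rev"
#
# Trying it on an illustrative JSON fragment:
#   echo '{"header":{"revision":6077603,"raft_term":6}}' | extract_revision
```

Checking that rev is non-empty before calling compact avoids running the command with a missing argument when the status call fails.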

The full session and its output:

root@a2caf8010e75:/# etcdctl snapshot save backup.db
{"level":"info","ts":1695447648.315712,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"backup.db.part"}
{"level":"info","ts":"2023-09-23T13:40:48.317+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1695447648.3172774,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":"2023-09-23T13:41:03.646+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1695447663.8131642,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"2.1 GB","took":15.497392681}
{"level":"info","ts":1695447663.8132935,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"backup.db"}
Snapshot saved at backup.db
root@a2caf8010e75:/# rev=$(ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
root@a2caf8010e75:/#  ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 compact $rev
compacted revision 6077603
root@a2caf8010e75:/#  ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 defrag
Finished defragmenting etcd member[http://127.0.0.1:2379]
root@a2caf8010e75:/# ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 alarm disarm
memberID:6460912315094810421 alarm:NOSPACE 
root@a2caf8010e75:/# ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" --write-out=table endpoint status
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|       ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://127.0.0.1:2379 | 59a9c584ea2c3f35 |  3.4.14 |  127 kB |      true |      false |         6 |    6089454 |            6089454 |        |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
root@a2caf8010e75:/# 

4. Verifying access

Open Kuboard in a browser (ip:30080) to verify: it loads normally.
Check the space used by the etcd db again: the size has dropped to 156K.

END
