Fixing a MongoDB node stuck in the RECOVERING state

MongoDB version: 5.0.4

The mongod log on the affected node keeps repeating entries like these:

{"t":{"$date":"2023-06-19T15:24:50.156+08:00"},"s":"I",  "c":"REPL",     "id":5579708, "ctx":"ReplCoordExtern-0","msg":"We are too stale to use candidate as a sync source. Denylisting this sync source because our last fetched timestamp is before their earliest timestamp","attr":{"candidate":"192.168.xx.xx2:27017","lastOpTimeFetchedTimestamp":{"$timestamp":{"t":1679574780,"i":1}},"remoteEarliestOpTimeTimestamp":{"$timestamp":{"t":1685543933,"i":49}},"denylistDurationMinutes":1,"denylistUntil":{"$date":"2023-06-19T07:25:50.156Z"}}}
{"t":{"$date":"2023-06-19T15:24:50.157+08:00"},"s":"I",  "c":"REPL",     "id":21799,   "ctx":"ReplCoordExtern-0","msg":"Sync source candidate chosen","attr":{"syncSource":"192.168.xx.xx1:27017"}}
{"t":{"$date":"2023-06-19T15:24:50.157+08:00"},"s":"I",  "c":"REPL",     "id":5579708, "ctx":"ReplCoordExtern-0","msg":"We are too stale to use candidate as a sync source. Denylisting this sync source because our last fetched timestamp is before their earliest timestamp","attr":{"candidate":"192.168.xx.xx1:27017","lastOpTimeFetchedTimestamp":{"$timestamp":{"t":1679574780,"i":1}},"remoteEarliestOpTimeTimestamp":{"$timestamp":{"t":1685543941,"i":9}},"denylistDurationMinutes":1,"denylistUntil":{"$date":"2023-06-19T07:25:50.157Z"}}}
{"t":{"$date":"2023-06-19T15:24:50.157+08:00"},"s":"I",  "c":"REPL",     "id":21798,   "ctx":"ReplCoordExtern-0","msg":"Could not find member to sync from"}
{"t":{"$date":"2023-06-19T15:25:27.576+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":"[1687159527:576881][80855:0x7f73a7094700], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 8008, snapshot max: 8008 snapshot count: 0, oldest timestamp: (1679574480, 1) , meta checkpoint timestamp: (1679574780, 1) base write gen: 83012912"}}
{"t":{"$date":"2023-06-19T15:25:50.165+08:00"},"s":"I",  "c":"REPL",     "id":21799,   "ctx":"BackgroundSync","msg":"Sync source candidate chosen","attr":{"syncSource":"192.168.xx.xx2:27017"}}
{"t":{"$date":"2023-06-19T15:25:50.166+08:00"},"s":"I",  "c":"REPL",     "id":5579708, "ctx":"ReplCoordExtern-0","msg":"We are too stale to use candidate as a sync source. Denylisting this sync source because our last fetched timestamp is before their earliest timestamp","attr":{"candidate":"192.168.xx.xx2:27017","lastOpTimeFetchedTimestamp":{"$timestamp":{"t":1679574780,"i":1}},"remoteEarliestOpTimeTimestamp":{"$timestamp":{"t":1685543933,"i":49}},"denylistDurationMinutes":1,"denylistUntil":{"$date":"2023-06-19T07:26:50.166Z"}}}
{"t":{"$date":"2023-06-19T15:25:50.166+08:00"},"s":"I",  "c":"REPL",     "id":21799,   "ctx":"ReplCoordExtern-0","msg":"Sync source candidate chosen","attr":{"syncSource":"192.168.xx.xx1:27017"}}
{"t":{"$date":"2023-06-19T15:25:50.167+08:00"},"s":"I",  "c":"REPL",     "id":5579708, "ctx":"ReplCoordExtern-0","msg":"We are too stale to use candidate as a sync source. Denylisting this sync source because our last fetched timestamp is before their earliest timestamp","attr":{"candidate":"192.168.xx.xx1:27017","lastOpTimeFetchedTimestamp":{"$timestamp":{"t":1679574780,"i":1}},"remoteEarliestOpTimeTimestamp":{"$timestamp":{"t":1685543941,"i":9}},"denylistDurationMinutes":1,"denylistUntil":{"$date":"2023-06-19T07:26:50.167Z"}}}
{"t":{"$date":"2023-06-19T15:25:50.167+08:00"},"s":"I",  "c":"REPL",     "id":21798,   "ctx":"ReplCoordExtern-0","msg":"Could not find member to sync from"}

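The key message is this one:

We are too stale to use candidate as a sync source. Denylisting this sync source because our last fetched timestamp is before their earliest timestamp

It means the last oplog entry this node fetched is older than the earliest entry still kept in each candidate's oplog, so the node can no longer catch up through normal replication.

In other words, its data has fallen too far behind.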
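To get a feel for how far behind the node is, the two "t" values in the log line (seconds since the Unix epoch) can simply be subtracted. This is a minimal sketch; the function name `oplog_gap_seconds` is made up for illustration, and the numbers are copied from the first log entry above.

```shell
# Seconds between the last op this node fetched and the oldest op
# the candidate still has in its oplog.
# $1 = lastOpTimeFetchedTimestamp.t, $2 = remoteEarliestOpTimeTimestamp.t
oplog_gap_seconds() {
  last_fetched="$1"
  remote_earliest="$2"
  echo $(( remote_earliest - last_fetched ))
}

gap=$(oplog_gap_seconds 1679574780 1685543933)
echo "gap: ${gap} s (~$(( gap / 86400 )) days)"
# prints: gap: 5969153 s (~69 days)
```

So the candidate's oldest oplog entry is roughly 69 days newer than anything this node has, which is why every sync source gets denylisted.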

The fix is to back up the cluster data first and then rebuild this node. The steps are:

  1. Back up the data
    mongodump --host=192.168.x.x --port=20000 --authenticationDatabase admin --username=root --password="xxxxx" -d databaseName --out=./databaseName

  2. Stop the problem node
    systemctl stop mongod-shard1.service

  3. Delete the problem node's data
    For example, my node belongs to shard1, so I delete everything under the shard1 data directory:
    rm -rf /data/mongodb/shard1/data/*

  4. Start the node again
    systemctl start mongod-shard1.service

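Right after it starts, the node sits in the STARTUP2 state, which indicates that it is running an initial sync and copying the data set from a sync source (the primary, in my case). If you look at the node monitoring at this point, you will see high inbound bandwidth on this node and correspondingly high outbound bandwidth on the primary.
Once the data has finished syncing, the node returns to normal.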
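Instead of re-running rs.status() by hand, the wait can be scripted. The following is only a sketch under some assumptions: mongosh is installed on the machine, the host, port and credentials are placeholders, and the function names `member_state` and `wait_for_secondary` are my own. It relies on the `self` and `stateStr` fields that rs.status() reports for each member.

```shell
# Print the replica-set state of the member we are connected to
# (e.g. STARTUP2, RECOVERING, SECONDARY).
# Host, port and credentials below are placeholders, not real values.
member_state() {
  mongosh --quiet --host 192.168.x.x --port 27017 \
    --username root --password "xxxxx" --authenticationDatabase admin \
    --eval 'print(rs.status().members.find(m => m.self).stateStr)'
}

# Poll until the member reports SECONDARY.
# $1 = seconds between polls (default 30), $2 = max polls (default 120).
# Returns 0 once SECONDARY is seen, 1 if the poll limit is reached.
wait_for_secondary() {
  interval="${1:-30}"
  max_tries="${2:-120}"
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    state=$(member_state)
    echo "$(date '+%F %T') state=${state}"
    [ "$state" = "SECONDARY" ] && return 0
    tries=$(( tries + 1 ))
    sleep "$interval"
  done
  return 1
}

# Example call: poll every 60 s, give up after 600 polls.
# wait_for_secondary 60 600
```

An initial sync of a large data set can take hours, so the poll limit should be sized generously.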

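Follow-up work:

  1. Add replica set state monitoring that raises an alert whenever a member is in any state other than PRIMARY or SECONDARY; mongodb_exporter is the recommended tool.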
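Until proper monitoring is in place, the alert rule itself is easy to express as a small script, for example run from cron. This is a rough sketch rather than a replacement for mongodb_exporter: the function names `needs_alert` and `check_states` are hypothetical, and the input is assumed to be one "host state" pair per line.

```shell
# Succeeds (returns 0) when the state should trigger an alert,
# i.e. it is anything other than PRIMARY or SECONDARY.
# A replica set with arbiters would also need ARBITER in the healthy list.
needs_alert() {
  case "$1" in
    PRIMARY|SECONDARY) return 1 ;;
    *) return 0 ;;
  esac
}

# Reads "host state" lines on stdin, prints one ALERT line per unhealthy
# member, and returns 1 if at least one alert was printed.
check_states() {
  rc=0
  while read -r host state; do
    if needs_alert "$state"; then
      echo "ALERT: ${host} is ${state}"
      rc=1
    fi
  done
  return "$rc"
}

# Possible way to feed it (assumes mongosh and valid connection options):
# mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name + " " + m.stateStr))' | check_states
```

The non-zero return code makes it easy to hook the check into whatever notification mechanism is already available.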

Some commands that may come in handy:

rs.status();

mongorestore --host mongodb1.example.net --port 27017 --username user --password "pass" /opt/backup/mongodump-2011-10-24

Reference:

  • https://stackoverflow.com/questions/14371239/why-a-member-of-mongodb-keep-recovering
  • https://dba.stackexchange.com/questions/77881/mongo-db-replica-set-stuck-at-recovering-state
