MongoDB: resolving "error RS102 too stale to catch up"

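While running MongoDB verification tests today, the log reported an error and the secondary stopped syncing with the primary. rs.status() showed: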
PRIMARY> rs.status()
{
        "set" : "shard1",
        "date" : ISODate("2012-07-26T02:26:03Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.30.31:27017",
                        "health" : 1,
                        "state" : 3,
                        "stateStr" : "RECOVERING",
                        "uptime" : 46826,
                        "optime" : {
                                "t" : 1342791618000,
                                "i" : 562
                        },
                        "optimeDate" : ISODate("2012-07-20T13:40:18Z"),
                        "lastHeartbeat" : ISODate("2012-07-26T02:26:02Z"),
                        "pingMs" : 0,
                        "errmsg" : "error RS102 too stale to catch up"
                },
                {
                        "_id" : 1,
                        "name" : "192.168.30.103:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "optime" : {
                                "t" : 1343208110000,
                                "i" : 549
                        },
                        "optimeDate" : ISODate("2012-07-25T09:21:50Z"),
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "192.168.30.33:27017",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 46804,
                        "optime" : {
                                "t" : 0,
                                "i" : 0
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2012-07-26T02:26:02Z"),
                        "pingMs" : 0
                }
        ],
        "ok" : 1
}

Log output:
turn:1 reslen:155 0ms
Thu Jul 26 09:39:54 [conn2940] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:39:54 [conn2940] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:55 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }
Thu Jul 26 09:39:55 [conn2941] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:56 [conn2940] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:39:56 [conn2940] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:57 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }
Thu Jul 26 09:39:57 [conn2941] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:58 [rsSync] replSet syncing to: 192.168.30.103:27017
Thu Jul 26 09:39:58 BackgroundJob starting: ConnectBG
Thu Jul 26 09:39:58 [rsSync] replHandshake res not: 1 res: { ok: 1.0 }
Thu Jul 26 09:39:58 [rsSync] replSet error RS102 too stale to catch up, at least from 192.168.30.103:27017
Thu Jul 26 09:39:58 [rsSync] replSet our last optime : Jul 20 21:40:18 50095fc2:232
Thu Jul 26 09:39:58 [rsSync] replSet oldest at 192.168.30.103:27017 : Jul 25 15:28:41 500fa029:262a
Thu Jul 26 09:39:58 [rsSync] replSet See http://www.mongodb.org/display/D ... +Replica+Set+Member
Thu Jul 26 09:39:58 [rsSync] replSet error RS102 too stale to catch up
Thu Jul 26 09:39:58 [journal] lsn set 44019576
Thu Jul 26 09:39:58 [conn2940] end connection 192.168.30.33:59026
Thu Jul 26 09:39:58 [initandlisten] connection accepted from 192.168.30.33:59037 #2942
Thu Jul 26 09:39:58 [conn2942] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:39:58 [conn2942] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:59 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }

I searched around and did not find a really good fix.

So how should this be handled?

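Fortunately, the official document Resyncing a Very Stale Replica Set Member points to the cause: the oplog (short for operation log). The oplog is the system collection used to sync data between the PRIMARY and the SECONDARY members of a replica set. Its size is capped. In the example below, taken from a 64-bit machine, it is about 19 GB (19616.9029296875MB), as reported by db.printReplicationInfo(). My own test machine reported a different value (configured oplog size: 11230.146875MB), which is consistent with the default depending on free disk space rather than being a fixed number.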

configured oplog size: 19616.9029296875MB (the size of the oplog)

log length start to end: 15375secs (4.27hrs) (the time between the earliest and the latest operation in the oplog)

oplog first event time: Thu Jul 07 2011 21:03:29 GMT+0800 (CST)

oplog last event time: Fri Jul 08 2011 01:19:44 GMT+0800 (CST)

now: Thu Jul 07 2011 17:20:16 GMT+0800 (CST)

For more detail on what the fields above mean, see the mongo_vstudio.cpp source. Despite the file name, the relevant code is JavaScript.

https://github.com/mongodb/mongo/blob/master/shell/mongo_vstudio.cpp
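The "log length start to end" line in the output above is just the gap between the first and last oplog event. A minimal sketch of that arithmetic follows; oplogWindow is a hypothetical helper written for illustration, not a mongo shell function.

```javascript
// Hypothetical helper (not part of the mongo shell): derives the
// "log length start to end" figure from the first and last oplog event times.
function oplogWindow(firstEvent, lastEvent) {
  const secs = Math.round((lastEvent.getTime() - firstEvent.getTime()) / 1000);
  const hrs = Math.round((secs / 3600) * 100) / 100; // two decimal places
  return { secs, hrs };
}

// Event times copied from the printReplicationInfo() output above (GMT+0800).
const w = oplogWindow(
  new Date("2011-07-07T21:03:29+08:00"),
  new Date("2011-07-08T01:19:44+08:00")
);
console.log(`log length start to end: ${w.secs}secs (${w.hrs}hrs)`);
```

The smaller this window, the less time a secondary can spend offline or lagging before it goes stale.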

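When the PRIMARY handles a large number of operations, a correspondingly large number of documents is inserted into the oplog. Each document is one operation: an insert (i), an update (u), or a delete (d).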

test:PRIMARY> db.oplog.rs.find()

{ "ts" : { "t" : 1310044124000, "i" : 11035 }, "h" : NumberLong("-2807175333144039203"), "op" : "i", "ns" : "cas_v2.cas_user_flat", "o" : { "_id" : ObjectId("4e15afdb1d6988397e0c6612"), … } }

{ "ts" : { "t" : 1310044124000, "i" : 11036 }, "h" : NumberLong("5285197078463590243"), "op" : "i", "ns" : "cas_v2.cas_user_flat", "o" : { "_id" : ObjectId("4e15afdb1d6988397e0c6613"), … } }

ts: the time this operation occurred.

h: a unique ID for this operation. Each operation will have a different value in this field.

op: the write operation that should be applied to the secondary: i (insert), u (update), or d (delete). n indicates a no-op, which is just an informational message.

ns: the database and collection affected by this operation. For a no-op, this field is left blank.

o: the actual document representing the op. For a no-op, this field carries no useful data.

Because the oplog has a fixed size, a SECONDARY may be unable to keep up with the rate of inserts on the PRIMARY. Once it falls far enough behind, rs.status() reports "error RS102 too stale to catch up".

If this occurs, the slave will start giving error messages about needing to be resynced. It can’t catch up to the master from the oplog anymore: it might miss operations between the last oplog entry it has and the master’s oldest oplog entry. It needs a full resync at this point.
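The condition can be checked against the two optimes in the log above ("our last optime" and "oldest at ..."). The sketch below is only an illustration of the comparison, not the server's actual implementation; parseOptime and isTooStale are hypothetical names, and the "hex seconds:hex counter" format is inferred from the log lines.

```javascript
// Parse an optime as printed in the log: "<hex seconds>:<hex counter>".
function parseOptime(text) {
  const [secs, inc] = text.split(":");
  return { t: parseInt(secs, 16), i: parseInt(inc, 16) };
}

// Assumed rule: a member is too stale when its last applied optime is older
// than the oldest entry still present in the sync source's oplog.
function isTooStale(ourLast, sourceOldest) {
  if (ourLast.t !== sourceOldest.t) return ourLast.t < sourceOldest.t;
  return ourLast.i < sourceOldest.i;
}

const ourLast = parseOptime("50095fc2:232");       // Jul 20 21:40:18
const sourceOldest = parseOptime("500fa029:262a"); // Jul 25 15:28:41
console.log(isTooStale(ourLast, sourceOldest));
```

The parsed value of "50095fc2:232" matches the optime that rs.status() reports for the RECOVERING member (t 1342791618000 in milliseconds, i 562).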

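Solution:

Resyncing a Very Stale Replica Set Member describes what to do when you hit error RS102. You can also follow Increasing the OpLog Size in Halted Replication and set the oplog to a more suitable size. In my test I set the oplog size to 20000 (the oplogSize option is specified in megabytes).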

This indicates that you’re adding data to the database at a rate of 524MB/hr. If an initial clone takes 10 hours, then the oplog should be at least 5240MB, so something closer to 8GB would make for a safe bet.
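The sizing rule in that quote is a simple product, sketched below. minOplogMB is a hypothetical helper, and the headroom factor of 1.5 is my own assumption chosen to land near the "closer to 8GB" suggestion; it is not a value from the MongoDB documentation.

```javascript
// Hypothetical helper: the oplog must hold at least everything written
// while an initial clone is running.
// headroom is an assumed safety multiplier, not an official figure.
function minOplogMB(writeRateMBPerHour, initialCloneHours, headroom = 1) {
  return writeRateMBPerHour * initialCloneHours * headroom;
}

console.log(minOplogMB(524, 10));      // bare minimum from the quote
console.log(minOplogMB(524, 10, 1.5)); // with the assumed headroom
```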

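Finally, with data still being inserted, I removed the 2 SECONDARY members with rs.remove(), and inserts returned to their original speed. What remains is to resync the SECONDARY members once the inserts have finished.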

>mongo insert in 0.62605094909668 Secs. memory 164.25 MB

>mongo insert in 0.63488984107971 Secs. memory 164 MB

>mongo insert in 0.64394617080688 Secs. memory 164.25 MB

>mongo insert in 0.61102414131165 Secs. memory 164 MB

>mongo insert in 0.64304113388062 Secs. memory 164.25 MB

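Finally, I found the method quoted at the end of this post (a full resync of the secondary) from someone more experienced. In practice it works without problems, but depending on the data volume it takes time. Since it runs on the standby, it fortunately does not affect production performance. It is also fairly resource-hungry, for example: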

 

top - 11:15:04 up 149 days, 23:15,  8 users,  load average: 12.37, 8.09, 2.77

Tasks: 390 total,   1 running, 386 sleeping,   2 stopped,   1 zombie

Cpu0  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 99.3%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu3  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu4  : 60.6%us,  3.6%sy,  0.0%ni, 10.9%id, 24.5%wa,  0.0%hi,  0.3%si,  0.0%st

Cpu5  :  3.3%us,  0.3%sy,  0.0%ni, 91.7%id,  4.6%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu6  :  1.0%us,  0.0%sy,  0.0%ni, 94.7%id,  4.3%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu7  :  2.6%us,  0.3%sy,  0.0%ni, 91.7%id,  4.0%wa,  0.3%hi,  1.0%si,  0.0%st

Mem:  16410952k total, 16155884k used,   255068k free,    49356k buffers

Swap:  2096440k total,   283840k used,  1812600k free, 13972792k cached

 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                        

13818 root      15   0 56.4g 5.7g 5.7g S 68.0 36.7   6:31.75 mongod 

CPU usage is also fairly high. If there is a better way to handle this failure, discussion is welcome.

You don't need to repair, simply perform a full resync.

On the secondary, you can:

stop the failed mongod

delete all data in the dbpath (including subdirectories)

restart it and it will automatically resynchronize itself

Follow the instructions here.

What's happened in your case is that your secondaries have become stale, i.e. there is no common point between their oplog and the oplog on the primary. Look at this document, which details the various statuses. The writes to the primary member have to be replicated to the secondaries, and your secondaries couldn't keep up until they eventually went stale. You will need to consider resizing your oplog.

Regarding oplog size, it depends on how much data you insert/update over time. I would choose a size which allows you many hours or even days of oplog.

Additionally, I'm not sure which O/S you are running. However, for 64-bit Linux, Solaris, and FreeBSD systems, MongoDB will allocate 5% of the available free disk space to the oplog. If this amount is smaller than a gigabyte, then MongoDB will allocate 1 gigabyte of space. For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog and for 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.
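The default allocation described in that paragraph can be written down as a small function. This is a sketch of the rule exactly as quoted, not of what any particular MongoDB version does; defaultOplogMB and the platform labels are hypothetical names.

```javascript
// Hypothetical helper encoding only the defaults quoted above.
// platform: "unix64" (64-bit Linux, Solaris, FreeBSD), "osx64", or "32bit".
function defaultOplogMB(platform, freeDiskMB) {
  if (platform === "osx64") return 183;
  if (platform === "32bit") return 48;
  const fivePercent = freeDiskMB / 20; // 5% of available free disk space
  return Math.max(fivePercent, 1024);  // never below 1 gigabyte
}

console.log(defaultOplogMB("unix64", 400000)); // large disk: 5% rule applies
console.log(defaultOplogMB("unix64", 10000));  // small disk: 1 GB floor applies
```

This also explains why two 64-bit machines can report different configured oplog sizes: the default follows free disk space at the time the oplog is created.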

How big are records and how many do you want? It depends on whether this data insertion is something typical or something abnormal that you were merely testing.

For example, at 2000 documents per second for documents of 1KB, that would net you 120MB per minute and your 5GB oplog would last about 40 minutes. This means if the secondary ever goes offline for 40 minutes or falls behind by more than that, then you are stale and have to do a full resync.
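The same estimate as a sketch, assuming decimal units (1 MB = 1000 KB, 5 GB = 5000 MB) so that the numbers line up with the rounded figures in the quote. Both function names are hypothetical.

```javascript
// Write volume in MB per minute for a steady insert rate.
function writeRateMBPerMinute(docsPerSecond, docKB) {
  return (docsPerSecond * docKB * 60) / 1000;
}

// How many minutes of writes fit in the oplog before the oldest
// entries start being overwritten.
function oplogWindowMinutes(oplogMB, docsPerSecond, docKB) {
  return oplogMB / writeRateMBPerMinute(docsPerSecond, docKB);
}

console.log(writeRateMBPerMinute(2000, 1));             // MB written per minute
console.log(oplogWindowMinutes(5000, 2000, 1).toFixed(1)); // minutes of oplog
```

This ignores per-entry overhead in the oplog, so the real window would be somewhat shorter.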

I recommend reading the Replica Set Internals document here. You have 4 members in your replica set, which is not recommended. You should have an odd number for the voting election (of primary) process, so you either need to add an arbiter, add another secondary, or remove one of your secondaries.

Finally, here's a detailed document on RS administration.

 

Some explanations

Replica set member state codes:

0 Starting up, phase 1

1 Primary

2 Secondary

3 Recovering

4 Fatal error

5 Starting up, phase 2

6 Unknown state

7 Arbiter

8 Down

health values:

0 Server is down

1 Server is up
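The two tables above can be applied to rs.status() output to pick out members that need attention. A minimal sketch, assuming the document shape shown at the top of this post; problemMembers is a hypothetical helper, and treating only Primary, Secondary and Arbiter as normal states is my own assumption.

```javascript
// State codes as listed above.
const MEMBER_STATES = {
  0: "Starting up, phase 1",
  1: "Primary",
  2: "Secondary",
  3: "Recovering",
  4: "Fatal error",
  5: "Starting up, phase 2",
  6: "Unknown state",
  7: "Arbiter",
  8: "Down",
};

// Assumption: states 1, 2 and 7 are the normal steady states.
const NORMAL_STATES = [1, 2, 7];

// Hypothetical helper: list members that are down, report an errmsg,
// or sit in a state outside the normal set.
function problemMembers(status) {
  return status.members
    .filter((m) => m.health !== 1 || m.errmsg !== undefined ||
                   !NORMAL_STATES.includes(m.state))
    .map((m) => ({
      name: m.name,
      state: MEMBER_STATES[m.state] || "Unknown state",
      errmsg: m.errmsg || null,
    }));
}
```

Called with the rs.status() document from the top of this post, it would return only 192.168.30.31:27017, in state Recovering with the RS102 message.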

 

 
