Handling a Failed Shard Node in a MongoDB Sharded Cluster

Approach: replace the failed shard with a new shard

1. Create and start a new shard node (assume the new shard listens on port 27028)
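
As a rough illustration of this step, and assuming the new shard is deployed as a replica set like the existing shards, the new mongod has to be initiated before it can be added to the cluster. The sketch below only builds the configuration document that would be passed to `rs.initiate()`. The helper name `buildReplSetConfig` and the set name `shardsvrNew` are placeholders, not values from the original setup.

```javascript
// Build the configuration document for rs.initiate().
// setName and hosts are supplied by the caller.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName,
    members: hosts.map((host, index) => ({ _id: index, host: host })),
  };
}

// "shardsvrNew" is a hypothetical set name; the host and port follow
// the assumption made in this step (port 27028).
const newShardConfig = buildReplSetConfig("shardsvrNew", ["10.201.81.105:27028"]);

// In a mongo shell connected to the new node, one would then run:
//   rs.initiate(newShardConfig)
```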

2. On mongos, remove the configuration of the original shard node

    mongos> use admin
    mongos> db.runCommand({removeShard:"10.201.81.105:27018"})

If the command returns the following message, the shard cannot finish draining yet:

    mongos> db.runCommand({removeShard:"10.201.81.105:27018"})
    {
        "msg" : "draining ongoing",
        "state" : "ongoing",
        "remaining" : {
            "chunks" : NumberLong(0),
            "dbs" : NumberLong(1)
        },
        "note" : "you need to drop or movePrimary these databases",
        "dbsToMove" : [
            "mgotest2"
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1560822195, 1),
        "$clusterTime" : {
            "clusterTime" : Timestamp(1560822195, 1),
            "signature" : {
                "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                "keyId" : NumberLong(0)
            }
        }
    }
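
The fields that matter in a response like this are `state` and `dbsToMove`. As a minimal sketch of how a script might decide what to do next, the function below (the name `nextStepForRemoveShard` and its return strings are invented for illustration) inspects a removeShard response document:

```javascript
// Decide the next action from a removeShard response document.
function nextStepForRemoveShard(res) {
  if (res.ok !== 1) return "failed";
  if (res.state === "completed") return "done";
  const dbs = res.dbsToMove || [];
  // Databases listed here still have this shard as their primary.
  if (dbs.length > 0) return "move-or-drop: " + dbs.join(", ");
  return "wait"; // state is "started" or "ongoing": draining continues
}

// Trimmed copy of the response shown above.
const sampleResponse = {
  msg: "draining ongoing",
  state: "ongoing",
  dbsToMove: ["mgotest2"],
  ok: 1,
};
const nextStep = nextStepForRemoveShard(sampleResponse); // "move-or-drop: mgotest2"
```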

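Note: if the shard being removed is the primary shard of a database, you must move that database with the movePrimary command or drop it manually. In the removeShard output above, the message indicates that 10.201.81.105:27018 is the primary shard of the mgotest2 database. You can confirm this by looking at config.databases: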

    mongos> use config
    switched to db config
    mongos> db.databases.find()
    { "_id" : "mgotest2", "primary" : "shardsvr2", "partitioned" : false, "version" : { "uuid" : UUID("360e1a26-b6f9-44a5-90ea-a148bf854e59"), "lastMod" : 1 } }
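
When the cluster holds many databases, scanning this output by eye gets tedious. The following sketch filters documents shaped like those in config.databases down to the ones whose primary is a given shard. The function name `databasesWithPrimary` is hypothetical, and the sample document is a trimmed copy of the one above.

```javascript
// Return the names of databases whose primary shard is shardId.
// docs is an array of documents shaped like those in config.databases.
function databasesWithPrimary(docs, shardId) {
  return docs.filter((d) => d.primary === shardId).map((d) => d._id);
}

const sampleDocs = [
  { _id: "mgotest2", primary: "shardsvr2", partitioned: false },
];
const affected = databasesWithPrimary(sampleDocs, "shardsvr2"); // ["mgotest2"]

// In the mongo shell the input could come from:
//   db.getSiblingDB("config").databases.find().toArray()
```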

The movePrimary operation requires the primary node 10.201.81.105:27018 to be running normally; otherwise it fails with the following error:

    mongos> use admin
    switched to db admin
    mongos> db.runCommand({"moveprimary":"mgotest2","to":"10.201.81.218:27018"})
    {
        "ok" : 0,
        "errmsg" : "Could not find host matching read preference { mode: \"primary\" } for set shardsvr2",
        "code" : 133,
        "codeName" : "FailedToSatisfyReadPreference",
        "operationTime" : Timestamp(1560823671, 1),
        "$clusterTime" : {
            "clusterTime" : Timestamp(1560823671, 1),
            "signature" : {
                "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                "keyId" : NumberLong(0)
            }
        }
    }
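
Error code 133 (FailedToSatisfyReadPreference) in this output is what signals that mongos could not reach a primary for the shard's replica set. A script could branch on it roughly as sketched here; the function name `isPrimaryUnreachable` is made up for illustration.

```javascript
// True when a command result reports that no primary could be found
// for the targeted replica set.
function isPrimaryUnreachable(res) {
  return (
    res.ok === 0 &&
    (res.code === 133 || res.codeName === "FailedToSatisfyReadPreference")
  );
}

// Trimmed copy of the movePrimary error shown above.
const moveResult = {
  ok: 0,
  code: 133,
  codeName: "FailedToSatisfyReadPreference",
};
const shardIsDown = isPrimaryUnreachable(moveResult); // true
```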

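If the primary node 10.201.81.105:27018 cannot be brought back up, delete the database's entry from config.databases directly. This discards the cluster's metadata for mgotest2, so treat it as a last resort: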

    mongos> use config
    switched to db config
    mongos> db.shards.find()
    { "_id" : "shardsvr1", "host" : "shardsvr1/10.201.81.218:27018", "state" : 1 }
    { "_id" : "shardsvr2", "host" : "shardsvr2/10.201.81.105:27018", "state" : 1, "draining" : true }
    mongos> db.databases.find()
    { "_id" : "mgotest2", "primary" : "shardsvr2", "partitioned" : false, "version" : { "uuid" : UUID("360e1a26-b6f9-44a5-90ea-a148bf854e59"), "lastMod" : 1 } }
    mongos> db.databases.remove({"_id" : "mgotest2"})
    WriteResult({ "nRemoved" : 1 })

After these steps, rerun the removeShard command from step 2; the original shard's configuration can now be removed normally.
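
Because removeShard only reports progress and has to be reissued until it finishes, the rerun can be wrapped in a small loop. This is a sketch under assumptions: `removeUntilCompleted` is a hypothetical helper, and the command itself is passed in as a function so the loop can be exercised without a cluster.

```javascript
// Call runRemove() until it reports state "completed", the command
// fails, or maxAttempts is reached. runRemove is supplied by the caller
// and must return a removeShard response document.
function removeUntilCompleted(runRemove, maxAttempts) {
  let last = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = runRemove();
    if (last.ok !== 1) return { completed: false, attempts: attempt, last: last };
    if (last.state === "completed") return { completed: true, attempts: attempt, last: last };
    // A real loop would pause here before asking again.
  }
  return { completed: false, attempts: maxAttempts, last: last };
}

// In the mongo shell, with the same argument as in step 2:
//   removeUntilCompleted(
//     () => db.adminCommand({ removeShard: "10.201.81.105:27018" }), 10)
```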

3. Add the configuration for the new shard node

    mongos> sh.addShard("shardsvr1/10.201.81.105:27028")
    {
        "shardAdded" : "shardsvr1",
        "ok" : 1,
        "operationTime" : Timestamp(1560825183, 1),
        "$clusterTime" : {
            "clusterTime" : Timestamp(1560825183, 1),
            "signature" : {
                "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                "keyId" : NumberLong(0)
            }
        }
    }
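
To double-check that the new node is registered, the result of the listShards command can be searched for its address. The sketch below assumes the `host` field has the form `setName/host1:port,host2:port`, as in the config.shards output earlier; `shardContainingHost` is a hypothetical helper and the sample result is hand-written to match the addShard call above.

```javascript
// Return the _id of the shard whose host string contains hostPort,
// or null when no shard lists that address.
function shardContainingHost(listShardsResult, hostPort) {
  for (const shard of listShardsResult.shards) {
    // host looks like "setName/host1:port,host2:port"
    const members = shard.host.split("/").pop().split(",");
    if (members.indexOf(hostPort) !== -1) return shard._id;
  }
  return null;
}

// Hand-written sample shaped like a listShards result.
const sampleResult = {
  shards: [
    { _id: "shardsvr1", host: "shardsvr1/10.201.81.105:27028", state: 1 },
  ],
  ok: 1,
};
const owner = shardContainingHost(sampleResult, "10.201.81.105:27028"); // "shardsvr1"

// In the mongo shell the input could come from:
//   db.adminCommand({ listShards: 1 })
```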
