A persistent-connection problem with the mongo PHP driver

Problem: while load testing with the PHP driver, we occasionally saw exceptions with the error "No candidate servers found".

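Environment: the server side is a three-machine replicaSet, and the mongod members are identified by hostname. The PHP driver calls new MongoClient with the options array('readPreference' => MongoClient::RP_NEAREST, 'replicaSet' => 'test001'), and the servers it is given are identified by IP address.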

After investigation, the cause turned out to be the replicaSet mechanism.

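"No candidate servers found" is raised when the driver cannot establish a connection to the server. Why could it not connect? netstat -na showed a large number of sockets in TIME_WAIT. They used up the available local ports on the web machine, so no new connections could be opened, which produced the error.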
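To put a number on the TIME_WAIT build-up, the netstat output can be counted rather than eyeballed. The following is a minimal sketch, not part of the driver: count_state is a hypothetical helper that counts the lines of a netstat-style stream mentioning a given TCP state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines of a netstat-style text stream that
 * contain the given TCP state name, e.g. "TIME_WAIT".
 * Assumes each line fits in the buffer; netstat lines are normally short. */
static long count_state(FILE *in, const char *state)
{
    char line[512];
    long n = 0;

    assert(in != NULL && state != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, state) != NULL) {
            n++;
        }
    }
    return n;
}
```

On a POSIX system the stream could come from popen("netstat -na", "r"). Watching the count grow during the load test is a quick way to confirm that connections are being opened and closed instead of reused.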

We know that the mongo PHP driver uses persistent connections, so why were so many connections being created?

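Reading the PHP driver's code shows the following. When new MongoClient is given the replicaSet option, connecting goes through mongo_get_read_write_connection_replicaset. That function calls mongo_discover_topology to discover new nodes in the set, which in turn calls mongo_connection_ismaster to probe each node and determine whether it is the master, a secondary, and so on.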

static void mongo_discover_topology(mongo_con_manager *manager, mongo_servers *servers)
{
        int i, j;
        char *hash;
        mongo_connection *con;
        char *error_message;
        char *repl_set_name = servers->options.repl_set_name ? strdup(servers->options.repl_set_name) : NULL;
        int nr_hosts;
        char **found_hosts = NULL;
        char *tmp_hash;
        int   res;

        for (i = 0; i < servers->count; i++) {
                hash = mongo_server_create_hash(servers->server[i]);
                mongo_manager_log(manager, MLOG_CON, MLOG_FINE, "discover_topology: checking ismaster for %s", hash);
                con = mongo_manager_connection_find_by_hash(manager, hash);

                if (!con) {
                        mongo_manager_log(manager, MLOG_CON, MLOG_WARN, "discover_topology: couldn't create a connection for %s", hash);
                        free(hash);
                        continue;
                }

                res = mongo_connection_ismaster(manager, con, &servers->options, (char**) &repl_set_name, (int*) &nr_hosts, (char***) &found_hosts, (char**) &error_message, servers->server[i]);
                switch (res) {
                        case 0:
                                /* Something is wrong with the connection, we need to remove
                                 * this from our list */
                                mongo_manager_log(manager, MLOG_CON, MLOG_WARN, "discover_topology: ismaster return with an error for %s:%d: [%s]", servers->server[i]->host, servers->server[i]->port, error_message);
                                free(error_message);
                                mongo_manager_connection_deregister(manager, con);
                                break;

                        case 3:
                                mongo_manager_log(manager, MLOG_CON, MLOG_WARN, "discover_topology: ismaster worked, but we need to remove the seed host's connection");
                                mongo_manager_connection_deregister(manager, con);
                                /* Break intentionally missing */

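Inside mongo_connection_ismaster, the driver sends a query to the server, and the server replies with the other hosts it knows about in the set as well as its own hostname. The driver then checks whether the hostname returned by the server is the same as the host it connected to. If they differ (which they do here, because the driver was given IPs while mongod uses hostnames), the driver treats the connected host as not belonging to the set and closes the connection.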

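So every new MongoClient reconnects to the IP-based servers, and each of those connections is then closed. This leaves many sockets in TIME_WAIT and eventually makes it impossible to connect to the server.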

        we_think_we_are = mongo_server_hash_to_server(con->hash);
        if (strcmp(connected_name, we_think_we_are) == 0) {
                mongo_manager_log(manager, MLOG_CON, MLOG_FINE, "ismaster: the server name matches what we thought it'd be (%s).", we_think_we_are);
        } else {
                mongo_manager_log(manager, MLOG_CON, MLOG_WARN, "ismaster: the server name (%s) did not match with what we thought it'd be (%s).", connected_name, we_think_we_are);
                /* We reset the name as the server responded with a different name than
                 * what we thought it was */
                free(server->host);
                server->host = mcon_strndup(connected_name, strchr(connected_name, ':') - connected_name);
                server->port = atoi(strchr(connected_name, ':') + 1);
                retval = 3;
        }
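The effect of this comparison can be reproduced outside the driver. The sketch below is a simplified stand-in, not the driver's code: check_reported_name and split_host_port are hypothetical names, the addresses in the usage note are made up, and the return value 1 for a match is an assumption of this sketch (only the value 3 for a mismatch comes from the excerpt). It also guards against a name without ':', which the excerpt does not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the name check: compare the "host:port" the
 * server reports for itself with the "host:port" we dialed.
 * Returns 3 on a mismatch (as in the excerpt); 1 on a match is an
 * assumed "ok" value for this sketch. */
static int check_reported_name(const char *dialed, const char *reported)
{
    assert(dialed != NULL && reported != NULL);
    return strcmp(reported, dialed) == 0 ? 1 : 3;
}

/* Split "host:port" into its parts, as the excerpt does with strchr/atoi.
 * On success returns 0 and stores a malloc'd host in *host (caller frees).
 * Returns -1 if there is no ':' or if allocation fails. */
static int split_host_port(const char *name, char **host, int *port)
{
    const char *colon;
    size_t len;

    assert(name != NULL && host != NULL && port != NULL);
    colon = strchr(name, ':');
    if (colon == NULL) {
        return -1;
    }
    len = (size_t)(colon - name);
    *host = malloc(len + 1);
    if (*host == NULL) {
        return -1;
    }
    memcpy(*host, name, len);
    (*host)[len] = '\0';
    *port = atoi(colon + 1);
    return 0;
}
```

With a dialed address such as "10.0.0.1:27017" and a reported name such as "mongo1:27017", check_reported_name returns 3 even though both refer to the same machine. The comparison is purely textual, so an IP never matches a hostname.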


Solutions:

1 Modify the driver so that it does not check the hostname.

2 Or identify the servers by hostname consistently, both in the driver and in mongod.
