通过历史监控我们可以发现用户在频繁使用短连接的时候Redis的cpu使用率有显著的上升
通过扁鹊查看但是Redis的cpu运行情况如下
从扁鹊我们可以看到Redis在freeClient的时候会频繁调用listSearchKey,并且该函数占用了百分30左右的调用量,如果我们可以优化降低该调用,短连接性能将得到具体提升。
通过以上分析我们可以知道Redis在释放链接的时候频繁调用了listSearchKey,通过查看Redis关闭客户端源码如下:
void freeClient(redisClient *c) {
listNode *ln;
/* If this is marked as current client unset it */
if (server.current_client == c) server.current_client = NULL;
/* If it is our master that's beging disconnected we should make sure
* to cache the state to try a partial resynchronization later.
*
* Note that before doing this we make sure that the client is not in
* some unexpected state, by checking its flags. */
if (server.master && c->flags & REDIS_MASTER) {
redisLog(REDIS_WARNING,"Connection with master lost.");
if (!(c->flags & (REDIS_CLOSE_AFTER_REPLY|
REDIS_CLOSE_ASAP|
REDIS_BLOCKED| REDIS_UNBLOCKED)))
{
replicationCacheMaster(c);
return;
}
}
/* Log link disconnection with slave */
if ((c->flags & REDIS_SLAVE) && !(c->flags & REDIS_MONITOR)) {
redisLog(REDIS_WARNING,"Connection with slave %s lost.",
replicationGetSlaveName(c));
}
/* Free the query buffer */
sdsfree(c->querybuf);
c->querybuf = NULL;
/* Deallocate structures used to block on blocking ops. */
if (c->flags & REDIS_BLOCKED)
unblockClientWaitingData(c);
dictRelease(c->bpop.keys);
freeClientArgv(c);
/* Remove from the list of clients */
if (c->fd != -1) {
ln = listSearchKey(server.clients,c);
redisAssert(ln != NULL);
listDelNode(server.clients,ln);
}
/* When client was just unblocked because of a blocking operation,
* remove it from the list of unblocked clients. */
if (c->flags & REDIS_UNBLOCKED) {
ln = listSearchKey(server.unblocked_clients,c);
redisAssert(ln != NULL);
listDelNode(server.unblocked_clients,ln);
}
...
...
...
/* Release other dynamically allocated client structure fields,
* and finally release the client structure itself. */
if (c->name) decrRefCount(c->name);
zfree(c->argv);
freeClientMultiState(c);
sdsfree(c->peerid);
if (c->pause_event > 0) aeDeleteTimeEvent(server.el, c->pause_event);
zfree(c);
}
从源码我们可以看到Redis在释放链接的时候遍历server.clients查找到对应的redisClient对象然后调用listDelNode把该redisClient对象从server.clients删除,代码如下:
/* Remove from the list of clients */
if (c->fd != -1) {
ln = listSearchKey(server.clients,c);
redisAssert(ln != NULL);
listDelNode(server.clients,ln);
}
查看server.clients为List结构,而redis定义的List为双端链表,我们可以在createClient的时候将redisClient的指针地址保留再freeClient的时候直接删除对应的listNode即可,无需再次遍历server.clients,代码优化如下:
redisClient *createClient(int fd) {
redisClient *c = zmalloc(sizeof(redisClient));
/* passing -1 as fd it is possible to create a non connected client.
* This is useful since all the Redis commands needs to be executed
* in the context of a client. When commands are executed in other
* contexts (for instance a Lua script) we need a non connected client. */
if (fd != -1) {
anetNonBlock(NULL,fd);
anetEnableTcpNoDelay(NULL,fd);
if (server.tcpkeepalive)
anetKeepAlive(NULL,fd,server.tcpkeepalive);
if (aeCreateFileEvent(server.el,fd,AE_READABLE,
readQueryFromClient, c) == AE_ERR)
{
close(fd);
zfree(c);
return NULL;
}
}
...
if (fd != -1) {
c->client_list_node = listAddNodeTailReturnNode(server.clients,c);
}
return c;
}
freeClient修改如下:
/* Remove from the list of clients */
if (c->fd != -1) {
if (c->client_list_node != NULL) listDelNode(server.clients,c->client_list_node);
}
在同一台物理机上启动优化前后的Redis,分别进行压测,压测命令如下:
redis-benchmark -h host -p port -k 0 -t get -n 100000 -c 8000
其中-k 代表使用短连接进行测试
原生Redis-server结果:
99.74% <= 963 milliseconds
99.78% <= 964 milliseconds
99.84% <= 965 milliseconds
99.90% <= 966 milliseconds
99.92% <= 967 milliseconds
99.94% <= 968 milliseconds
99.99% <= 969 milliseconds
100.00% <= 969 milliseconds
10065.42 requests per second
优化后Redis-server结果
99.69% <= 422 milliseconds
99.72% <= 423 milliseconds
99.80% <= 424 milliseconds
99.82% <= 425 milliseconds
99.86% <= 426 milliseconds
99.89% <= 427 milliseconds
99.94% <= 428 milliseconds
99.96% <= 429 milliseconds
99.97% <= 430 milliseconds
100.00% <= 431 milliseconds
13823.61 requests per second
我们可以看到优化之后的Redis-server性能在短连接多的场景下提升了百分30%以上。