Limiting Factors for One Million Concurrent Long-Lived Connections

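Contents
  • 1. Default memory footprint of one TCP connection
  • 2. How to tune it
  • Appendix 1: Error messages
  • Appendix 2: Code for querying the send and receive buffer (sliding window) sizes (the gtop tool used above)
  • Appendix 3: Reference links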

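Limiting factors
(1) CPU: run top and press 1 to show per-CPU usage. If any logical CPU is pinned at 100%, CPU is the bottleneck. Using multiple threads or binding threads to specific CPUs both help.
(2) Memory: this article focuses on the memory limit.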

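1. Default memory footprint of one TCP connection

For long-lived connections, listening and connect consume almost no memory. Memory goes mainly to the read and write buffers behind the sliding window.
The code in Appendix 2 shows how much memory one TCP connection takes by default: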

[root@localhost test]# ./gtop 
recv_buf = 85k
send_buf = 16k

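So one TCP connection takes 85 KB + 16 KB = 101 KB by default. On a machine with 32 GB of RAM, assuming 70% of memory can be used, the number of stable long-lived connections it should support is
32 GB × 70% / 101 KB = 232,555 ≈ 230,000
Note: connect itself costs almost no memory, and memory may be shared, so a TCP connection is not handed the full 101 KB the moment it is created; it grows to that amount over time. The more frequent and the larger the socket reads and writes, the faster free memory drops.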

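2. How to tune it

The buffer sizes can be adjusted to fit the workload.
For example, suppose the scenario is establishing long-lived connections to keep sessions alive, with each message around 1 KB and one message sent per minute. Send and receive buffers this large are clearly unnecessary.
They can be set in the program (recommended):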

// C++ code
int nRecvBuf=16*1024; 
setsockopt(s,SOL_SOCKET,SO_RCVBUF,(const char*)&nRecvBuf,sizeof(int)); 
int nSendBuf=32*1024;
setsockopt(s,SOL_SOCKET,SO_SNDBUF,(const char*)&nSendBuf,sizeof(int));

Or set them system-wide in /etc/sysctl.conf (less recommended):

#/etc/sysctl.conf
net.ipv4.tcp_rmem = 4096        87380   2063281
net.ipv4.tcp_wmem = 4096        16384   2063281
net.core.wmem_default = 388608
net.core.rmem_default = 388608
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608

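The values that actually take effect are the ones in the positions of 87380 and 16384, that is, the middle (default) field of tcp_rmem and tcp_wmem. After editing, apply the settings with sysctl -p: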

[root@localhost test]# sysctl -p

After applying, run gtop again to check whether the new values have taken effect.

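Appendix 1: Error messages

Before the buffers were tuned, a stress test with 300,000 connections showed free memory shrinking steadily, yet ps, top and similar tools showed no process or thread using a large amount of memory.
When free system memory fell below 600 MB, the system went down (not a true hard hang, but neither ssh nor the USB serial console could connect), with the following errors: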

CentOS Linux ( )
Kernel 3.10.0-514.el7.x86_64 on an x86_64
loadtest3 login: [16221.310569] Out of memory: Kill process 1183 (tuned) score or sacrifice child
[16221.340603] Killed process 1183 (tuned) total-vm:562728kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[16221.357102] Out of memory: Kill process 926 (polkitd) score or sacrifice child
[16221.357130] Killed process 926 (polkitd) total-vm:538128kB, anon-rss:0kB, file-rss:0kB, shmem-rss:
[16221.359984] Out of memory: Kill process 932 (gmain) score or sacrifice child
[16221.360826] Killed process 932 (gmain) total-vm:538428kB, anon-rss:0kB, file-rss:0kB, shmem-rss:
[16221.362739] Out of memory: Kill process 33234 (dstat) score 0 or sacrifice child
[16221.362766] Killed process 33234 (dstat) total-vm:150232kB, anon-rss:0kB, file-rss:0kB, shmem-rss
[16230.318655] Out of memory: Kill process 937 (NetworkManager) score or sacrifice child
[16230.310689] Killed process 937 (NetworkManager) total-vm: , anon-rss:0kB, file-rss: , shm
[16230.311222] Out of memory: Kill process 948 (gdbus) score or sacrifice child
[16230.311250] Killed process 948 (gdbus) total-vm:450644kB, anon-rss: , file-rss:4kB, shmem-rss:0kB
[16448.675291] INFO: task /::31 blocked for more than 120 seconds
[16448.675316] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[16448.675401] INFO: task fsnotify_mark:189 blocked for more than 120 seconds
[16448.675426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[16448.675494] INFO: task kworker/:1:209 blocked for more than 120 seconds
[16448.675518] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[16448.675588] INFO: task gdbus:948 blocked for more than 120 seconds
[16448.675611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[16448.675789] INFO: task kworker/:2: blocked for more than 120 seconds
[16448.675734] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[16568.679395] INFO: task kworker/4:0: blocked for more than 120 seconds
[16568.679424] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[16568.679522] INFO: task fsnotify_mark:189 blocked for more than 120 seconds
[16568.679553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[16568.679639] INFO: task kworker/u96:1:209 blocked for more than 120 seconds
[16568.679669] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[16568.679760] INFO: task gdbus:948 blocked for more than 120 seconds
[16568.679788] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[16568.679893] INFO: task kworker/4:2:16027 blocked for more than 120 seconds
[16568.679923] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

Appendix 2: Code for querying the send and receive buffer (sliding window) sizes (the gtop tool used above)

Compile with: g++ main.cpp -o gtop

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main()
{
    // A newly created TCP socket already carries the system default buffer
    // sizes, so no bind, listen or connect is needed to read them.
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0) {
        perror("socket");
        return 1;
    }

    int opt_rc_val = 0;
    socklen_t opt_rc_len = sizeof(opt_rc_val);
    int opt_sd_val = 0;
    socklen_t opt_sd_len = sizeof(opt_sd_val);

    // SO_RCVBUF and SO_SNDBUF report the receive and send buffer sizes in bytes.
    if (getsockopt(listenfd, SOL_SOCKET, SO_RCVBUF, &opt_rc_val, &opt_rc_len) < 0 ||
        getsockopt(listenfd, SOL_SOCKET, SO_SNDBUF, &opt_sd_val, &opt_sd_len) < 0) {
        perror("getsockopt");
        close(listenfd);
        return 1;
    }

    printf("recv_buf = %dk\n", opt_rc_val / 1024);
    printf("send_buf = %dk\n", opt_sd_val / 1024);

    close(listenfd);
    return 0;
}

Appendix 3: Reference links

http://www.net-add.com/devops/sre/cdn/28.html
