Socket TCP programming notes: how heartbeat packets are implemented

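The original version of our project used a socket client built on Boost to talk to the server. At first all I knew was that it relied on heartbeat packets to keep the connection alive. As for how the heartbeat actually worked, the first thing I understood was that the client sends the server a heartbeat packet every 10 seconds, and the server likewise sends one to the client every 10 seconds. But then a question came up: do the server and the client also need a counting mechanism? For example, should the client treat the server as disconnected after missing 3 heartbeats from it, and should the server treat the client as disconnected after missing 3 heartbeats from the client? Is that kind of counting required?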
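To make the question concrete, here is a minimal sketch of what such a counter could look like. It is not code from the project: the class name HeartbeatMonitor and its methods are hypothetical, and the threshold of 3 is simply the number from the question above.

```cpp
#include <cstdint>

// Hypothetical helper (not from the project): counts consecutive heartbeat
// intervals in which nothing was received from the peer.
class HeartbeatMonitor {
public:
    explicit HeartbeatMonitor(std::uint32_t max_missed) : max_missed_(max_missed) {}

    // Call whenever anything (heartbeat or payload) arrives from the peer.
    void on_data_received() { missed_ = 0; }

    // Call once per heartbeat interval (every 10 seconds in this post).
    // Returns true once max_missed consecutive intervals passed without data.
    bool on_interval_elapsed() {
        ++missed_;
        return missed_ >= max_missed_;
    }

    std::uint32_t missed() const { return missed_; }

private:
    std::uint32_t max_missed_;
    std::uint32_t missed_ = 0;
};
```

With HeartbeatMonitor monitor(3), the 10-second timer would call on_interval_elapsed() and the read handler would call on_data_received(); the connection is dropped when the timer call returns true.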

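I read through the program up and down and found no such counter. All I found was a check on whether there is data waiting to be sent or a send already in progress; a heartbeat packet is sent only when neither is the case. At first I assumed I simply did not understand the internals of Boost.Asio well enough, and guessed that Boost.Asio might be clever enough to implement the counting itself. So I set breakpoints and stepped into the Boost.Asio source. It turned out that internally Boost.Asio is just built on IOCP, and I found no counting code there either. So what was going on? After being stuck for a while, I pulled the network cable to see where the disconnect-detection code would hit a breakpoint, and found that it was in the completion handler of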

socket_.async_read_some(boost::asio::buffer(buffer_),
    boost::bind(&CTcpClient::handle_read, this,
        boost::asio::placeholders::error,
        boost::asio::placeholders::bytes_transferred));

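The const boost::system::error_code& error passed to handle_read carried an error, and only then was the disconnect-handling logic called. Does async_read_some have a timeout mechanism, then? I searched for async_read_some timeouts and found nothing useful relating to heartbeats. Reading its source, I also found no default timeout for detecting a dropped network; you have to add a deadline_timer yourself. So what was happening? I still did not really understand.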
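To illustrate the point that a read has no timeout unless you add one, here is a rough sketch using plain POSIX calls rather than Boost's deadline_timer. This swaps in a different technique: it assumes a POSIX system (the project itself apparently runs on Windows with IOCP), and the names ReadStatus and read_with_timeout are hypothetical.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <cstddef>

// Hypothetical result type for the sketch below.
enum class ReadStatus { Data, PeerClosed, TimedOut, Failed };

// Hypothetical helper: waits up to timeout_ms for fd to become readable,
// then performs one recv. On ReadStatus::Data, *out_n holds the byte count.
ReadStatus read_with_timeout(int fd, char* buf, std::size_t len,
                             int timeout_ms, ssize_t* out_n) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = ::poll(&pfd, 1, timeout_ms);
    if (rc == 0) return ReadStatus::TimedOut;   // nothing arrived in time
    if (rc < 0) return ReadStatus::Failed;      // poll itself failed (including EINTR)

    ssize_t n = ::recv(fd, buf, len, 0);
    if (n > 0) {
        *out_n = n;
        return ReadStatus::Data;
    }
    if (n == 0) return ReadStatus::PeerClosed;  // orderly close by the peer
    return ReadStatus::Failed;                  // errno holds the reason
}
```

A TimedOut result only says that nothing arrived within timeout_ms. Whether that means the peer is gone is a policy decision for the caller, which is exactly where a heartbeat interval comes in.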

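After being stuck a while longer, it occurred to me: since there is logic that sends a heartbeat every 10 seconds, what happens if I stop sending it? Would async_read_some still return with an error? So I commented out the 10-second heartbeat and debugged the program. After pulling the cable, the program no longer detected the disconnection and kept reporting the connection as online. Now I started to suspect that async_read_some only learns about a dropped network if data is written at regular intervals. On Monday, back at work, I checked this against a different, simpler socket library, and it behaved the same way. So the conclusion is as follows: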


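You have to keep writing data before recv will report the broken connection (a return value of 0 for an orderly close by the peer, or an error once unacknowledged writes time out); otherwise recv just keeps waiting and never returns. This is presumably just how sockets work, and my confusion came from not yet being fully familiar with socket programming. I plan to catch up by reading material or books on sockets and TCP/IP. For the record, every socket in our project is non-blocking.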


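Later, from further reading, I learned that relying on recv returning 0 to detect that the peer's network has dropped has holes: in some situations you still cannot tell whether the connection is really gone. The safest approach is still a counting or timing mechanism. The material I found follows: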

TCP is meant to be robust in the face of a harsh network; 


even though TCP provides what looks like a persistent end-to-end connection, it's all just a lie, each packet is really just a unique, unreliable datagram.


The connections are really just virtual conduits created with a little state tracked at each end of the connection (Source and destination ports and addresses, and local socket). The network stack uses this state to know which process to give each incoming packet to and what state to put in the header of each outgoing packet.


Because of the underlying — inherently connectionless and unreliable — nature of the network, the stack will only report a severed connection when the remote end sends a FIN packet to close the connection, or if it doesn't receive an ACK response to a sent packet (after a timeout and a couple retries).


Because of the asynchronous nature of asio, the easiest way to be notified of a graceful disconnection is to have an outstanding async_read which will return error::eof immediately when the connection is closed. But this alone still leaves the possibility of other issues like half-open connections and network issues going undetected.

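The precondition is that you async_write_some a heartbeat packet every so often. Otherwise, after the network cable is pulled, async_read_some never completes: it just keeps waiting for data and its handler is never called.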
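The "write something every so often, but only when the connection is otherwise idle" rule can be sketched as a small decision helper. This is a sketch under assumptions, not the project's code: HeartbeatScheduler is a hypothetical name, and measuring the interval from the last write (rather than from a fixed 10-second timer tick) is my own simplification.

```cpp
#include <chrono>

// Hypothetical helper: decides whether an idle connection owes the peer a heartbeat.
class HeartbeatScheduler {
public:
    using Clock = std::chrono::steady_clock;

    HeartbeatScheduler(Clock::duration interval, Clock::time_point start)
        : interval_(interval), last_write_(start) {}

    // Record every completed write, heartbeat or payload alike.
    void on_write(Clock::time_point now) { last_write_ = now; }

    // True when nothing is queued or in flight and at least one full
    // interval has passed since the last write.
    bool should_send(Clock::time_point now, bool write_pending) const {
        return !write_pending && (now - last_write_) >= interval_;
    }

private:
    Clock::duration interval_;
    Clock::time_point last_write_;
};
```

Taking the current time as a parameter instead of calling Clock::now() inside keeps the logic easy to exercise with made-up time points.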


The most effective way to work around unexpected connection interruption is to use some sort of keep-alive or ping. This occasional attempt to transfer data over the connection will allow expedient detection of an unintentionally severed connection.


The TCP protocol actually has a built-in keep-alive mechanism which can be configured in asio using asio::ip::tcp::socket::keep_alive. The nice thing about TCP keep-alive is that it's transparent to the user-mode application, and only the peers interested in keep-alive need configure it. The downside is that you need OS level access/knowledge to configure the timeout parameters, they're unfortunately not exposed via a simple socket option and usually have default timeout values that are quite large (7200 seconds on Linux).
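On Linux, at least, the keep-alive timing can also be set per socket through the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options. A minimal sketch, assuming Linux and a raw socket descriptor; the function name enable_tcp_keepalive is hypothetical, and other platforms use different option names or mechanisms.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper (Linux-specific option names).
// idle_secs:     idle time before the first keep-alive probe
// interval_secs: time between unanswered probes
// probe_count:   unanswered probes before the connection is reported dead
// Returns true only if every option was applied.
bool enable_tcp_keepalive(int fd, int idle_secs, int interval_secs, int probe_count) {
    int on = 1;
    return ::setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0
        && ::setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) == 0
        && ::setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_secs, sizeof(interval_secs)) == 0
        && ::setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probe_count, sizeof(probe_count)) == 0;
}
```

With Boost.Asio, the descriptor would presumably come from the socket's native_handle(). For example, enable_tcp_keepalive(fd, 30, 10, 3) asks the kernel to start probing after 30 idle seconds and give up after 3 unanswered probes sent 10 seconds apart.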
Probably the most common method of keep-alive is to implement it at the application layer, where the application has a special noop or ping message and does nothing but respond when tickled. This method gives you the most flexibility in implementing a keep-alive strategy.
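An application-layer ping of the kind described here can be very small. The sketch below assumes a hypothetical text protocol in which the literal messages PING and PONG are reserved for keep-alive; the function name handle_keepalive and the message names are illustrative, not taken from the project.

```cpp
#include <string>

// Hypothetical wire format: the messages "PING" and "PONG" are keep-alive traffic.
// Returns true if msg was keep-alive traffic and should not reach the application.
// reply is non-empty only when something should be written back to the peer.
bool handle_keepalive(const std::string& msg, std::string& reply) {
    reply.clear();
    if (msg == "PING") {
        reply = "PONG";  // respond when tickled, do nothing else
        return true;
    }
    if (msg == "PONG") {
        return true;     // peer is alive; nothing to send back
    }
    return false;        // ordinary message; the caller processes it
}
```

Either kind of message also counts as received data, so the read handler would reset whatever miss counter or idle timer the application keeps before calling this.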

