With these questions in mind, I first looked at how the man page defines SSL_read.
SSL_read(3SSL) OpenSSL SSL_read(3SSL)
NAME
SSL_read - read bytes from a TLS/SSL connection.
SYNOPSIS
#include <openssl/ssl.h>
int SSL_read(SSL *ssl, void *buf, int num);
DESCRIPTION
SSL_read() tries to read num bytes from the specified ssl into the buffer buf.
NOTES
If necessary, SSL_read() will negotiate a TLS/SSL session, if not already explicitly performed by SSL_connect(3) or SSL_accept(3). If the peer requests a re-negotiation, it will be
performed transparently during the SSL_read() operation. The behaviour of SSL_read() depends on the underlying BIO.
For the transparent negotiation to succeed, the ssl must have been initialized to client or server mode. This is being done by calling SSL_set_connect_state(3) or SSL_set_accept_state()
before the first call to an SSL_read() or SSL_write(3) function.
SSL_read() works based on the SSL/TLS records. The data are received in records (with a maximum record size of 16kB for SSLv3/TLSv1). Only when a record has been completely received, it can
be processed (decryption and check of integrity). Therefore data that was not retrieved at the last call of SSL_read() can still be buffered inside the SSL layer and will be retrieved on
the next call to SSL_read(). If num is higher than the number of bytes buffered, SSL_read() will return with the bytes buffered. If no more bytes are in the buffer, SSL_read() will trigger
the processing of the next record. Only when the record has been received and processed completely, SSL_read() will return reporting success. At most the contents of the record will be
returned. As the size of an SSL/TLS record may exceed the maximum packet size of the underlying transport (e.g. TCP), it may be necessary to read several packets from the transport layer
before the record is complete and SSL_read() can succeed.
If the underlying BIO is blocking, SSL_read() will only return, once the read operation has been finished or an error occurred, except when a renegotiation take place, in which case a
SSL_ERROR_WANT_READ may occur. This behaviour can be controlled with the SSL_MODE_AUTO_RETRY flag of the SSL_CTX_set_mode(3) call.
If the underlying BIO is non-blocking, SSL_read() will also return when the underlying BIO could not satisfy the needs of SSL_read() to continue the operation. In this case a call to
SSL_get_error(3) with the return value of SSL_read() will yield SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE. As at any time a re-negotiation is possible, a call to SSL_read() can also cause
write operations! The calling process then must repeat the call after taking appropriate action to satisfy the needs of SSL_read(). The action depends on the underlying BIO. When using a
non-blocking socket, nothing is to be done, but select() can be used to check for the required condition. When using a buffering BIO, like a BIO pair, data must be written into or retrieved
out of the BIO before being able to continue.
SSL_pending(3) can be used to find out whether there are buffered bytes available for immediate retrieval. In this case SSL_read() can be called without blocking or actually receiving new
data from the underlying socket.
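A careful reading of the third paragraph of the NOTES explains how SSL_read works. SSL_read() operates on SSL/TLS records: data arrives in records, and in SSLv3 and TLSv1 a single record is at most 16 kB. A record can be processed (decrypted and integrity-checked) only after it has been received in full, so the SSL layer may hold buffered data. A single SSL_read() call may return fewer bytes than the record contains, but SSL_read always takes data from the socket in units of whole records. Whatever part of the record the current call does not consume stays buffered in the SSL layer until the next SSL_read() call retrieves it.
If num (the number of bytes requested) is larger than the number of bytes already buffered, SSL_read() returns only the buffered bytes. If the buffer is empty, SSL_read() reads from the TCP stream and processes the next record. SSL_read() reports success only after a record has been completely received, decrypted, and integrity-checked. Because a single SSL/TLS record can be larger than the maximum packet size of the underlying transport (TCP, for example), several packets may have to be read from the transport layer before the record is complete and SSL_read() can return successfully.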
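If the underlying BIO is a blocking socket, SSL_read returns only when the read operation has completed or an error has occurred. The exception is a renegotiation, in which case SSL_ERROR_WANT_READ may be reported. This behaviour can be controlled by setting the SSL_MODE_AUTO_RETRY flag through SSL_CTX_set_mode(3).
If the underlying BIO is a non-blocking socket, SSL_read() also returns when the BIO cannot supply what SSL_read() needs in order to continue. In that case, passing the return value of SSL_read() to SSL_get_error(3) yields SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE. A renegotiation can happen at any time, so a call to SSL_read() can also cause write operations, because handshake messages have to be sent. The caller must repeat the call after taking whatever action satisfies SSL_read(), and that action depends on the underlying BIO. With a non-blocking socket nothing extra is required, although select() can be used to wait until the required condition holds (the socket is readable for SSL_ERROR_WANT_READ, writable for SSL_ERROR_WANT_WRITE). With a buffering BIO such as a BIO pair, data must be written into or retrieved from the BIO before the operation can continue.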
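From the description above, the data that the SSL layer hands back is in principle organized in records rather than as a raw byte stream: a message that the client sends through the SSL interface of its SDK, provided it fits in one record, is delivered on the receiving side as one complete record. The upper layer can nevertheless read it as a stream.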
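The stream-style reading mentioned above can be sketched as a loop that keeps calling a read function until the requested number of bytes has arrived. This is a minimal sketch, not OpenSSL code: read_fn, read_full, toy_source and toy_read are hypothetical names, and toy_read only imitates the "at most the rest of the current record" behaviour using fixed-size toy records.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: behaves like a record-limited read and returns
 * the number of bytes copied, or <= 0 when nothing more can be read. */
typedef int (*read_fn)(void *ctx, void *buf, int num);

/* Keep calling `rd` until `want` bytes arrive or the source runs dry.
 * Returns the total number of bytes placed in `out`. */
static int read_full(read_fn rd, void *ctx, unsigned char *out, int want)
{
    int got = 0;

    while (got < want) {
        int n = rd(ctx, out + got, want - got);
        if (n <= 0)
            break;
        got += n;
    }
    return got;
}

/* Toy source (not OpenSSL): a message split into fixed-size "records". */
struct toy_source {
    const unsigned char *data;
    int total;       /* bytes in the whole message */
    int pos;         /* bytes already handed out */
    int record_size; /* size of each toy record */
};

/* Returns at most the unread remainder of the current toy record. */
static int toy_read(void *ctx, void *buf, int num)
{
    struct toy_source *src = ctx;
    int left_in_record, left_total, n;

    if (src->pos >= src->total || num <= 0)
        return 0;
    left_in_record = src->record_size - (src->pos % src->record_size);
    left_total = src->total - src->pos;
    n = num;
    if (n > left_in_record)
        n = left_in_record;
    if (n > left_total)
        n = left_total;
    memcpy(buf, src->data + src->pos, (size_t)n);
    src->pos += n;
    return n;
}
```

With a 10-byte message and 4-byte toy records, a single toy_read call asking for 16 bytes returns only 4, while read_full keeps looping across record boundaries until the caller has everything it asked for.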
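The figure below shows the data of a single SSL record:
This SSL record has content type 23 (application data), uses TLS 1.2, and carries 8274 bytes of content. It was reassembled from 7 TCP packets. Packet No. 2942 in the capture is 1506 bytes long, yet only 669 of those bytes belong to this SSL record, so that TCP packet holds parts of more than one SSL record (the end of one and the start of the next). TCP really does carry the data purely as a stream.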
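The fields read off the capture (content type, protocol version, length) sit in the 5-byte header that starts every TLS record. The sketch below decodes such a header. The struct and function names are hypothetical, and the sample bytes in the usage note are reconstructed from the values described above, not copied from the capture.

```c
#include <stdint.h>

/* Fields of the 5-byte TLS record header. */
struct tls_record_header {
    uint8_t  content_type; /* 23 = application data */
    uint16_t version;      /* 0x0303 = TLS 1.2 */
    uint16_t length;       /* bytes of record payload that follow the header */
};

/* Decode the header from the first 5 bytes of a record (big-endian fields). */
static struct tls_record_header parse_record_header(const uint8_t hdr[5])
{
    struct tls_record_header h;

    h.content_type = hdr[0];
    h.version = (uint16_t)((hdr[1] << 8) | hdr[2]);
    h.length  = (uint16_t)((hdr[3] << 8) | hdr[4]);
    return h;
}
```

Feeding it the bytes 0x17 0x03 0x03 0x20 0x52 gives content type 23, version 0x0303 and length 8274, which matches the record described above.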
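In general, then, what a single SSL_read call returns is usually one complete record (provided the caller's buffer is large enough), because the data has already been processed by the SSL protocol layer. It is not a pure byte stream like the data recv returns on a TCP socket.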
Based on the openssl-1.0.2l source code, the call flow of the SSL_read interface is as follows:
SSL_read is implemented as follows:
int SSL_read(SSL *s, void *buf, int num)
{
    if (s->handshake_func == 0) {
        SSLerr(SSL_F_SSL_READ, SSL_R_UNINITIALIZED);
        return -1;
    }
    if (s->shutdown & SSL_RECEIVED_SHUTDOWN) {
        s->rwstate = SSL_NOTHING;
        return (0);
    }
    return (s->method->ssl_read(s, buf, num));
}
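So ssl_read is a function pointer (callback) member of the method structure. The SSL object is an argument initialized by the caller through the SSL_new interface, which is declared as follows:
SSL *SSL_new(SSL_CTX *ctx);
SSL_new is implemented in ssl_lib.c, and it initializes the method member as follows: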
SSL *SSL_new(SSL_CTX *ctx)
{
    SSL *s;
    if (ctx == NULL) {
        SSLerr(SSL_F_SSL_NEW, SSL_R_NULL_SSL_CTX);
        return (NULL);
    }
    if (ctx->method == NULL) {
        SSLerr(SSL_F_SSL_NEW, SSL_R_SSL_CTX_HAS_NO_DEFAULT_SSL_VERSION);
        return (NULL);
    }
    ...
    s->method = ctx->method;
    ...
}
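So the method comes from the SSL_CTX. An SSL_CTX is usually created by the caller during initialization, in a form like this:
pSslCtx = SSL_CTX_new(TLSv1_2_client_method());
SSL_CTX_new is also defined in ssl_lib.c: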
SSL_CTX *SSL_CTX_new(const SSL_METHOD *meth)
{
    SSL_CTX *ret = NULL;
    if (meth == NULL) {
        SSLerr(SSL_F_SSL_CTX_NEW, SSL_R_NULL_SSL_METHOD_PASSED);
        return (NULL);
    }
#ifdef OPENSSL_FIPS
    if (FIPS_mode() && (meth->version < TLS1_VERSION)) {
        SSLerr(SSL_F_SSL_CTX_NEW, SSL_R_ONLY_TLS_ALLOWED_IN_FIPS_MODE);
        return NULL;
    }
#endif
    if (SSL_get_ex_data_X509_STORE_CTX_idx() < 0) {
        SSLerr(SSL_F_SSL_CTX_NEW, SSL_R_X509_VERIFICATION_SETUP_PROBLEMS);
        goto err;
    }
    ret = (SSL_CTX *)OPENSSL_malloc(sizeof(SSL_CTX));
    if (ret == NULL)
        goto err;
    memset(ret, 0, sizeof(SSL_CTX));
    ret->method = meth;
    ...
}
So the method comes from TLSv1_2_client_method(), which means it is determined by the specific protocol version.
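The chain traced so far (method accessor, then context, then connection, then dispatch) can be condensed into a toy model. This is a sketch under stated assumptions: every toy_ name is hypothetical and none of it is OpenSSL's actual code. It only mirrors the pointer hand-off seen in SSL_CTX_new, SSL_new and SSL_read.

```c
#include <stddef.h>

struct toy_ssl;  /* forward declaration */

/* Hypothetical method table: one callback per operation. */
struct toy_method {
    int version;
    int (*ssl_read)(struct toy_ssl *s, void *buf, int num);
};

struct toy_ctx { const struct toy_method *method; };
struct toy_ssl { const struct toy_method *method; };

/* Protocol-specific read routine, standing in for ssl3_read. */
static int toy_v12_read(struct toy_ssl *s, void *buf, int num)
{
    (void)s;
    (void)buf;
    return num;   /* pretend every requested byte was available */
}

/* Accessor standing in for TLSv1_2_client_method(). */
static const struct toy_method *toy_v12_client_method(void)
{
    static const struct toy_method m = { 0x0303, toy_v12_read };
    return &m;
}

/* Like SSL_CTX_new: the context remembers the method it was created with. */
static int toy_ctx_init(struct toy_ctx *ctx, const struct toy_method *meth)
{
    if (meth == NULL)
        return -1;
    ctx->method = meth;
    return 0;
}

/* Like SSL_new: the connection inherits the method from its context. */
static int toy_ssl_init(struct toy_ssl *s, const struct toy_ctx *ctx)
{
    if (ctx == NULL || ctx->method == NULL)
        return -1;
    s->method = ctx->method;
    return 0;
}

/* Like SSL_read: dispatch through the method table. */
static int toy_ssl_read(struct toy_ssl *s, void *buf, int num)
{
    return s->method->ssl_read(s, buf, num);
}
```

After toy_ctx_init(&ctx, toy_v12_client_method()) and toy_ssl_init(&s, &ctx), a call to toy_ssl_read(&s, buf, 5) lands in toy_v12_read, in the same way that SSL_read ends up in whichever read routine the chosen method installed.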
Source Insight cannot jump straight to the definition of TLSv1_2_client_method. A global search shows that the function is generated by the IMPLEMENT_tls_meth_func macro, in t1_clnt.c:
static const SSL_METHOD *tls1_get_client_method(int ver);
static const SSL_METHOD *tls1_get_client_method(int ver)
{
    if (ver == TLS1_2_VERSION)
        return TLSv1_2_client_method();
    if (ver == TLS1_1_VERSION)
        return TLSv1_1_client_method();
    if (ver == TLS1_VERSION)
        return TLSv1_client_method();
    return NULL;
}
IMPLEMENT_tls_meth_func(TLS1_2_VERSION, TLSv1_2_client_method,
                        ssl_undefined_function,
                        ssl3_connect,
                        tls1_get_client_method, TLSv1_2_enc_data)
Looking at the definition of IMPLEMENT_tls_meth_func and matching its initializers against the fields of SSL_METHOD shows that the s->method->ssl_read function called in SSL_read is ssl3_read:
# define IMPLEMENT_tls_meth_func(version, func_name, s_accept, s_connect, \
                                s_get_meth, enc_data) \
const SSL_METHOD *func_name(void)  \
        { \
        static const SSL_METHOD func_name##_data= { \
                version, \
                tls1_new, \
                tls1_clear, \
                tls1_free, \
                s_accept, \
                s_connect, \
                ssl3_read, \
                ssl3_peek, \
                ssl3_write, \
                ssl3_shutdown, \
                ssl3_renegotiate, \
                ssl3_renegotiate_check, \
                ssl3_get_message, \
                ssl3_read_bytes, \
                ssl3_write_bytes, \
                ssl3_dispatch_alert, \
                ssl3_ctrl, \
                ssl3_ctx_ctrl, \
                ssl3_get_cipher_by_char, \
                ssl3_put_cipher_by_char, \
                ssl3_pending, \
                ssl3_num_ciphers, \
                ssl3_get_cipher, \
                s_get_meth, \
                tls1_default_timeout, \
                &enc_data, \
                ssl_undefined_void_function, \
                ssl3_callback_ctrl, \
                ssl3_ctx_callback_ctrl, \
        }; \
        return &func_name##_data; \
        }
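The macro technique can be tried in isolation. The sketch below is a cut-down imitation, not the OpenSSL macro: IMPLEMENT_toy_meth_func, struct toy_method and the toy_ functions are hypothetical. It shows how token pasting (func_name##_data) gives each generated accessor its own static table while all tables share the same read routine, which is also why a plain symbol search finds no hand-written definition of the accessor.

```c
#include <stddef.h>

/* Hypothetical method table with a single callback. */
struct toy_method {
    int version;
    int (*read)(void *buf, int num);
};

/* Shared read routine, standing in for ssl3_read. */
static int toy_common_read(void *buf, int num)
{
    (void)buf;
    return num;
}

/* Stamp out an accessor `func_name` that returns a static, constant table.
 * `func_name##_data` pastes the tokens into a unique variable name. */
#define IMPLEMENT_toy_meth_func(ver, func_name)              \
    static const struct toy_method *func_name(void)          \
    {                                                        \
        static const struct toy_method func_name##_data = {  \
            ver,                                             \
            toy_common_read,                                 \
        };                                                   \
        return &func_name##_data;                            \
    }

/* Two protocol versions, two accessors, one shared read implementation. */
IMPLEMENT_toy_meth_func(0x0302, toy_v11_method)
IMPLEMENT_toy_meth_func(0x0303, toy_v12_method)
```

Here toy_v11_method() and toy_v12_method() return different tables with different version fields, yet both tables point at toy_common_read.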
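The call flow of ssl3_read is as follows:
The core of this flow lies in two steps.
The first is ssl3_read_bytes. Only when the current record holds no data does it call ssl3_get_record to fetch a new record. Here rr is the most recent record stored in SSL *s, with its contents and read offset: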
int ssl3_read_bytes(SSL *s, int type, unsigned char *buf, int len, int peek)
{
    int al, i, j, ret;
    unsigned int n;
    SSL3_RECORD *rr;
    void (*cb) (const SSL *ssl, int type2, int val) = NULL;
    ...
 start:
    s->rwstate = SSL_NOTHING;
    /*-
     * s->s3->rrec.type     - is the type of record
     * s->s3->rrec.data,    - data
     * s->s3->rrec.off,     - offset into 'data' for next read
     * s->s3->rrec.length,  - number of bytes.
     */
    rr = &(s->s3->rrec);
    /* get new packet if necessary */
    if ((rr->length == 0) || (s->rstate == SSL_ST_READ_BODY)) {
        ret = ssl3_get_record(s);
        if (ret <= 0)
            return (ret);
    }
    ...
}
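Data is then copied out of the record. If the requested length len is larger than what the record still holds, only the remaining bytes of the record are copied. If len is smaller, len bytes are copied and the record's off is advanced, so the next read starts from off.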
    if (type == rr->type) {     /* SSL3_RT_APPLICATION_DATA or
                                 * SSL3_RT_HANDSHAKE */
        /*
         * make sure that we are not getting application data when we are
         * doing a handshake for the first time
         */
        if (SSL_in_init(s) && (type == SSL3_RT_APPLICATION_DATA) &&
            (s->enc_read_ctx == NULL)) {
            al = SSL_AD_UNEXPECTED_MESSAGE;
            SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_APP_DATA_IN_HANDSHAKE);
            goto f_err;
        }
        if (len <= 0)
            return (len);
        if ((unsigned int)len > rr->length)
            n = rr->length;
        else
            n = (unsigned int)len;
        memcpy(buf, &(rr->data[rr->off]), n);
        if (!peek) {
            rr->length -= n;
            rr->off += n;
            if (rr->length == 0) {
                s->rstate = SSL_ST_READ_HEADER;
                rr->off = 0;
                if (s->mode & SSL_MODE_RELEASE_BUFFERS
                    && s->s3->rbuf.left == 0)
                    ssl3_release_read_buffer(s);
            }
        }
        return (n);
    }
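The off/length bookkeeping in this excerpt can be exercised on its own. The following is a simplified sketch rather than the OpenSSL implementation: struct toy_record and toy_copy_from_record are hypothetical names, and the handshake checks and buffer release are left out so that only the copy and the offset update remain.

```c
#include <string.h>

/* Toy stand-in for the fields of SSL3_RECORD used when copying out data. */
struct toy_record {
    const unsigned char *data; /* decrypted record payload */
    unsigned int off;          /* offset of the next unread byte */
    unsigned int length;       /* unread bytes remaining */
};

/* Copy up to `len` unread bytes into `buf`. With peek != 0 the record is left
 * untouched, so the same bytes are returned again on the next call. */
static int toy_copy_from_record(struct toy_record *rr, unsigned char *buf,
                                int len, int peek)
{
    unsigned int n;

    if (len <= 0)
        return len;
    n = ((unsigned int)len > rr->length) ? rr->length : (unsigned int)len;
    memcpy(buf, rr->data + rr->off, n);
    if (!peek) {
        rr->length -= n;
        rr->off += n;
        if (rr->length == 0)
            rr->off = 0;   /* record fully consumed; a new one must be fetched */
    }
    return (int)n;
}
```

For a 5-byte record, asking for 3 bytes returns 3 and leaves off = 3, length = 2. A following request for 8 bytes returns only the remaining 2 and resets the record to empty, which is the point where the real code would go on to fetch the next record.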
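The second is ssl3_get_record. It reads the actual data from the socket, decrypts it and checks its integrity, and produces one complete record. The code shows that a single call reads at most one record.
The analysis of the SSL_read source in OpenSSL above shows that the implementation matches the behaviour we observed. If the current record still holds data, an SSL_read call returns only bytes from what remains in that record and does not read a new one. Only when a later read finds the record buffer empty (length 0) is the next SSL record parsed.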