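Redis Source Code Analysis 1 --- Data Structures --- Simple Dynamic String (SDS)
Redis's underlying data structures are mainly the simple dynamic string, linked list, dictionary, skip list, integer set, ziplist (compressed list), and object.
How these structures are implemented directly affects how Redis performs, so in this first part I plan to go through each of them
and analyze the concrete implementation at the source level.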
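1 SDS definition
Redis represents a large share of its data as strings, and it uses its own string structure for this. Let's first get an overall picture of how this part is implemented.
First, the declaration of sds is as follows
The SDS header declares the string length, the remaining free space, and the array that holds the string itself;
For example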
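A minimal sketch of the header, assuming the Redis 3.0 layout that the rest of the code in this article matches (struct sdshdr with len, free, and buf). The helpers demo_new and demo_hdr are hypothetical names used only for illustration, and plain malloc stands in for zmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;

/* Header layout as declared in Redis 3.0's sds.h. */
struct sdshdr {
    unsigned int len;   /* bytes used in buf, not counting the trailing '\0' */
    unsigned int free;  /* unused bytes still available in buf */
    char buf[];         /* the string bytes, followed by '\0' */
};

/* Hypothetical helper: build an sds holding `init` plus `spare` unused bytes. */
static sds demo_new(const char *init, unsigned int spare) {
    size_t len = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + len + spare + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)len;
    sh->free = spare;
    memcpy(sh->buf, init, len + 1);   /* copies the '\0' as well */
    return sh->buf;                   /* callers only ever see buf */
}

/* Hypothetical helper: step back from buf to the header in front of it. */
static struct sdshdr *demo_hdr(sds s) {
    return (void *)(s - sizeof(struct sdshdr));
}
```

With demo_new("Redis", 5), the header would record len = 5 and free = 5, and buf would hold 'R' 'e' 'd' 'i' 's' '\0' followed by five unused bytes.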
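2 Differences between SDS and C strings
Since the SDS struct adds two fields (len and free) on top of the character array, the differences follow directly
Difference 1) Getting the string length in constant time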
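As a rough illustration of this difference, the sketch below reads the length from the header in O(1), the way sdslen does in Redis 3.0's sds.h, and contrasts it with strlen, which has to scan the bytes. The names demo_new, demo_len, and c_len are hypothetical, and malloc stands in for zmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;
struct sdshdr { unsigned int len; unsigned int free; char buf[]; };

/* Hypothetical constructor so the example is self-contained. */
static sds demo_new(const char *init) {
    size_t len = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);
    return sh->buf;
}

/* O(1): the length is simply read from the header in front of buf. */
static size_t demo_len(const sds s) {
    const struct sdshdr *sh = (const void *)(s - sizeof(struct sdshdr));
    return sh->len;
}

/* O(N): a plain C string must be scanned up to its terminator. */
static size_t c_len(const char *s) {
    return strlen(s);
}
```

Both functions return the same number for ordinary text, but only the second one costs time proportional to the string length.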
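Difference 2) No buffer overflows
Before modifying data, SDS first checks whether there is enough space, and allocates more only if there is not;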
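The check-then-grow idea can be sketched as follows. This is a simplified illustration, not the Redis implementation: demo_new and demo_cat are hypothetical names, realloc stands in for zrealloc, and the buffer grows to an exact fit, whereas Redis's sdscatlen goes through sdsMakeRoomFor and over-allocates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;
struct sdshdr { unsigned int len; unsigned int free; char buf[]; };

/* Hypothetical constructor: no spare space is reserved. */
static sds demo_new(const char *init) {
    size_t len = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);
    return sh->buf;
}

/* Append t to s, growing the buffer first when free space is too small. */
static sds demo_cat(sds s, const char *t) {
    struct sdshdr *sh = (void *)(s - sizeof(struct sdshdr));
    size_t addlen = strlen(t);

    if (sh->free < addlen) {
        /* Not enough room: reallocate before writing a single byte. */
        struct sdshdr *nh = realloc(sh, sizeof(struct sdshdr) + sh->len + addlen + 1);
        if (nh == NULL) return NULL;      /* the old block is still valid */
        sh = nh;
        sh->free = (unsigned int)addlen;  /* exact fit in this sketch */
    }
    memcpy(sh->buf + sh->len, t, addlen);
    sh->len += (unsigned int)addlen;
    sh->free -= (unsigned int)addlen;
    sh->buf[sh->len] = '\0';
    return sh->buf;                       /* may differ from the s passed in */
}
```

Unlike strcat, the caller never has to guess whether the destination is large enough; the price is that the returned pointer must replace the old one.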
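Difference 3) Fewer memory reallocations when modifying strings
This is something we usually do not handle well by hand: whenever memory allocation is involved, problems like this tend to appear. Why do we run into them?
Because:
② Lazy space release
As the name suggests, the release happens later; concretely, when an SDS API needs to shorten the string held by an SDS,
the program does not immediately reallocate memory to reclaim the bytes left over after shortening; instead it records the number of those bytes in the free field
and keeps them for future use;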
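A minimal sketch of lazy release, assuming the same three-field header. demo_new and demo_shrink are hypothetical names; the bookkeeping follows the pattern sdstrim and sdsrange use later in this article (free grows by exactly the number of bytes len shrinks).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;
struct sdshdr { unsigned int len; unsigned int free; char buf[]; };

/* Hypothetical constructor: no spare space is reserved. */
static sds demo_new(const char *init) {
    size_t len = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);
    return sh->buf;
}

/* Shorten s to newlen bytes without touching the allocator. */
static void demo_shrink(sds s, size_t newlen) {
    struct sdshdr *sh = (void *)(s - sizeof(struct sdshdr));
    if (newlen >= sh->len) return;                    /* nothing to shorten */
    sh->free += sh->len - (unsigned int)newlen;       /* remember the spare bytes */
    sh->len = (unsigned int)newlen;
    sh->buf[newlen] = '\0';
}
```

After shrinking "hello world" to 5 bytes, the six spare bytes stay attached to the string, so a later append of up to six bytes needs no reallocation.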
Difference 4) Binary safety
That is, it can hold data other than text, such as images, audio, and video.
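Binary safety follows from tracking the length in the header instead of relying on the first '\0'. The sketch below stores seven bytes that contain an embedded zero; demo_newlen and demo_len are hypothetical names modeled on sdsnewlen and sdslen, with malloc standing in for zmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;
struct sdshdr { unsigned int len; unsigned int free; char buf[]; };

/* Copy exactly initlen bytes, whatever they contain. */
static sds demo_newlen(const void *init, size_t initlen) {
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)initlen;
    sh->free = 0;
    memcpy(sh->buf, init, initlen);
    sh->buf[initlen] = '\0';          /* terminator kept for C compatibility */
    return sh->buf;
}

/* The stored length, independent of any zero bytes inside buf. */
static size_t demo_len(const sds s) {
    const struct sdshdr *sh = (const void *)(s - sizeof(struct sdshdr));
    return sh->len;
}
```

For the input "foo\0bar" with length 7, demo_len reports 7 while strlen stops at 3, which is why C string functions cannot be trusted with such data.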
In summary
4.2 Initialization
    /*
     * Create a new sds from the given initialization string init
     * and the string length initlen
     *
     * Parameters
     *  init: pointer to the initialization string
     *  initlen: length of the initialization string
     *
     * Return value
     *  sds: on success, the sds that belongs to the new sdshdr
     *       on failure, NULL
     *
     * Complexity
     *  T = O(N)
     */
    sds sdsnewlen(const void *init, size_t initlen) {

        struct sdshdr *sh;

        // Pick the allocation function depending on whether initial content was given
        // T = O(N)
        if (init) {
            // zmalloc does not initialize the memory it allocates
            sh = zmalloc(sizeof(struct sdshdr)+initlen+1);
        } else {
            // zcalloc initializes all of the allocated memory to 0
            sh = zcalloc(sizeof(struct sdshdr)+initlen+1);
        }

        // Allocation failed, return
        if (sh == NULL) return NULL;

        // Set the initial length
        sh->len = initlen;
        // A new sds reserves no spare space
        sh->free = 0;
        // If initial content was given, copy it into the buf of the sdshdr
        // T = O(N)
        if (initlen && init)
            memcpy(sh->buf, init, initlen);
        // Terminate with \0
        sh->buf[initlen] = '\0';

        // Return the buf part, not the whole sdshdr
        return (char*)sh->buf;
    }
It can then be called like this: mystring = sdsnewlen("abc",3);
There is also an initializer that does not take a length
    sds sdsnew(const char *init) {
        size_t initlen = (init == NULL) ? 0 : strlen(init);
        return sdsnewlen(init, initlen);
    }
Duplicating a string and freeing a string's memory look like this
    sds sdsdup(const sds s) {
        return sdsnewlen(s, sdslen(s));
    }

    /*
     * Free the given sds
     *
     * Complexity
     *  T = O(N)
     */
    /* Free an sds string. No operation is performed if 's' is NULL. */
    void sdsfree(sds s) {
        if (s == NULL) return;
        zfree(s-sizeof(struct sdshdr));
    }
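Note that this code uses the memory allocation functions zfree, zmalloc, and zrealloc. I will explain them in detail later; for now it is enough to know that they play roles similar to free, malloc, and so on in C.
Updating the length mainly means adjusting the recorded string length (len) and the available space (free).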
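The function body for this step is not shown here, so the following is only a sketch of what such an update could look like, modeled on my recollection of sdsupdatelen in Redis 3.0: after the buffer has been edited by hand, the length is recomputed with strlen and the difference is moved into free. The names demo_new and demo_update_len are hypothetical, and malloc stands in for zmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;
struct sdshdr { unsigned int len; unsigned int free; char buf[]; };

/* Hypothetical constructor: no spare space is reserved. */
static sds demo_new(const char *init) {
    size_t len = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);
    return sh->buf;
}

/* Resynchronize len and free with the first '\0' actually found in buf. */
static void demo_update_len(sds s) {
    struct sdshdr *sh = (void *)(s - sizeof(struct sdshdr));
    unsigned int reallen = (unsigned int)strlen(s);
    sh->free += sh->len - reallen;   /* bytes past the new end become free */
    sh->len = reallen;
}
```

For instance, after creating "foobar" and writing s[2] = '\0' by hand, the header still says len = 6; calling demo_update_len brings it to len = 2 and free = 4.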
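4.3 Space allocation and reclamation
     * Extend the length of buf in the sds so that, after the function runs,
     * buf has at least addlen + 1 bytes of spare space
     * (the extra 1 byte is reserved for \0)
     *
     * Return value
     *  sds: on success, the extended sds
     *       on failure, NULL
     *
     * Complexity
     *  T = O(N)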
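The comment above describes sdsMakeRoomFor(), whose body is not reproduced here. The sketch below follows the growth policy as I understand it from Redis 3.0: when the required length is below SDS_MAX_PREALLOC (1 MB) the buffer is doubled, otherwise 1 MB is added. Treat it as an approximation: demo_new, demo_hdr, and demo_make_room_for are hypothetical names, and realloc stands in for zrealloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SDS_MAX_PREALLOC (1024*1024)

typedef char *sds;
struct sdshdr { unsigned int len; unsigned int free; char buf[]; };

static struct sdshdr *demo_hdr(sds s) {
    return (void *)(s - sizeof(struct sdshdr));
}

/* Hypothetical constructor: no spare space is reserved. */
static sds demo_new(const char *init) {
    size_t len = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);
    return sh->buf;
}

/* Make sure at least addlen bytes can be appended without reallocating. */
static sds demo_make_room_for(sds s, size_t addlen) {
    struct sdshdr *sh = demo_hdr(s), *newsh;
    size_t len = sh->len, newlen;

    if (sh->free >= addlen) return s;   /* already enough room */

    newlen = len + addlen;
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;                    /* small strings: double the space */
    else
        newlen += SDS_MAX_PREALLOC;     /* large strings: add a fixed 1 MB */

    newsh = realloc(sh, sizeof(struct sdshdr) + newlen + 1);
    if (newsh == NULL) return NULL;
    newsh->free = (unsigned int)(newlen - len);   /* len itself is unchanged */
    return newsh->buf;
}
```

Starting from "abc" and asking for 2 more bytes gives a buffer sized for 10 bytes, so len stays 3 and free becomes 7; a following request for 5 bytes returns the same pointer without calling the allocator.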
    /*
     * Reclaim the free space in the sds;
     * reclaiming does not modify the string content stored in the sds.
     *
     * Return value
     *  sds: the sds after its memory has been adjusted
     *
     * Complexity
     *  T = O(N)
     */
    /* Reallocate the sds string so that it has no free space at the end. The
     * contained string remains not altered, but next concatenation operations
     * will require a reallocation.
     *
     * After the call, the passed sds string is no longer valid and all the
     * references must be substituted with the new pointer returned by the call. */
    sds sdsRemoveFreeSpace(sds s) {
        struct sdshdr *sh;

        sh = (void*) (s-(sizeof(struct sdshdr)));

        // Reallocate so that buf is just long enough to hold the string content
        // T = O(N)
        sh = zrealloc(sh, sizeof(struct sdshdr)+sh->len+1);

        // No spare space left
        sh->free = 0;

        return sh->buf;
    }
As I said earlier, space allocation differs from C-style strings; the concrete implementation is as follows
     * Increase the length of the sds by incr, reduce the spare space accordingly,
     * and place \0 at the end of the new string
     *
     * This function is used in order to fix the string length after the
     * user calls sdsMakeRoomFor(), writes something after the end of
     * the current string, and finally needs to set the new length.
     *
     * Note: it is possible to use a negative increment in order to
     * right-trim the string.
     *
     * Usage example:
     *
     * Using sdsIncrLen() and sdsMakeRoomFor() it is possible to mount the
     * following schema, to cat bytes coming from the kernel to the end of an
     * sds string without copying into an intermediate buffer:
     *
     * oldlen = sdslen(s);
     * s = sdsMakeRoomFor(s, BUFFER_SIZE);
     * nread = read(fd, s+oldlen, BUFFER_SIZE);
     * ... check for nread <= 0 and handle it ...
     * sdsIncrLen(s, nread);
     *
     * Complexity
     *  T = O(1)
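The body of sdsIncrLen() is not shown above, so here is a sketch of the bookkeeping its comment describes: bytes are written directly past the current end, and then len and free are adjusted in O(1). The names demo_new_spare and demo_incr_len are hypothetical, malloc stands in for zmalloc, and the explicit handling of negative values is my own simplification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;
struct sdshdr { unsigned int len; unsigned int free; char buf[]; };

/* Hypothetical constructor that reserves `spare` unused bytes after init. */
static sds demo_new_spare(const char *init, unsigned int spare) {
    size_t len = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + len + spare + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)len;
    sh->free = spare;
    memcpy(sh->buf, init, len + 1);
    return sh->buf;
}

/* Move incr bytes from free to len (or back again when incr is negative). */
static void demo_incr_len(sds s, int incr) {
    struct sdshdr *sh = (void *)(s - sizeof(struct sdshdr));
    if (incr >= 0) {
        assert(sh->free >= (unsigned int)incr);   /* caller wrote within bounds */
        sh->len += (unsigned int)incr;
        sh->free -= (unsigned int)incr;
    } else {
        unsigned int dec = (unsigned int)(-incr);
        assert(sh->len >= dec);                   /* cannot trim below empty */
        sh->len -= dec;
        sh->free += dec;
    }
    s[sh->len] = '\0';
}
```

A caller would write into s + len first (for example with read or memcpy) and only then call demo_incr_len with the number of bytes actually written.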
4.4 String conversion operations
Values of different types are converted into strings
    #define SDS_LLSTR_SIZE 21
    int sdsll2str(char *s, long long value) {
        char *p, aux;
        unsigned long long v;
        size_t l;

        /* Generate the string representation, this method produces
         * a reversed string. */
        v = (value < 0) ? -value : value;
        p = s;
        do {
            *p++ = '0'+(v%10);
            v /= 10;
        } while(v);
        if (value < 0) *p++ = '-';

        /* Compute length and add null term. */
        l = p-s;
        *p = '\0';

        /* Reverse the string. */
        p--;
        while(s < p) {
            aux = *s;
            *s = *p;
            *p = aux;
            s++;
            p--;
        }
        return l;
    }

    /* Identical sdsll2str(), but for unsigned long long type. */
    int sdsull2str(char *s, unsigned long long v) {
        char *p, aux;
        size_t l;

        /* Generate the string representation, this method produces
         * a reversed string. */
        p = s;
        do {
            *p++ = '0'+(v%10);
            v /= 10;
        } while(v);

        /* Compute length and add null term. */
        l = p-s;
        *p = '\0';

        /* Reverse the string. */
        p--;
        while(s < p) {
            aux = *s;
            *s = *p;
            *p = aux;
            s++;
            p--;
        }
        return l;
    }

    /* Create an sds string from a long long value. It is much faster than:
     *
     * sdscatprintf(sdsempty(),"%lld\n", value);
     */
    sds sdsfromlonglong(long long value) {
        char buf[SDS_LLSTR_SIZE];
        int len = sdsll2str(buf,value);

        return sdsnewlen(buf,len);
    }

    /*
     * Print function, called by sdscatprintf
     *
     * T = O(N^2)
     */
    /* Like sdscatprintf() but gets va_list instead of being variadic. */
    sds sdscatvprintf(sds s, const char *fmt, va_list ap) {
        va_list cpy;
        char staticbuf[1024], *buf = staticbuf, *t;
        size_t buflen = strlen(fmt)*2;

        /* We try to start using a static buffer for speed.
         * If not possible we revert to heap allocation. */
        if (buflen > sizeof(staticbuf)) {
            buf = zmalloc(buflen);
            if (buf == NULL) return NULL;
        } else {
            buflen = sizeof(staticbuf);
        }

        /* Try with buffers two times bigger every time we fail to
         * fit the string in the current buffer size. */
        while(1) {
            buf[buflen-2] = '\0';
            va_copy(cpy,ap);
            // T = O(N)
            vsnprintf(buf, buflen, fmt, cpy);
            if (buf[buflen-2] != '\0') {
                if (buf != staticbuf) zfree(buf);
                buflen *= 2;
                buf = zmalloc(buflen);
                if (buf == NULL) return NULL;
                continue;
            }
            break;
        }

        /* Finally concat the obtained string to the SDS string and return it. */
        t = sdscat(s, buf);
        if (buf != staticbuf) zfree(buf);
        return t;
    }

    /*
     * Format any number of arguments and append the result to the end of the given sds
     *
     * T = O(N^2)
     */
    /* Append to the sds string 's' a string obtained using printf-alike format
     * specifier.
     *
     * After the call, the modified sds string is no longer valid and all the
     * references must be substituted with the new pointer returned by the call.
     *
     * Example:
     *
     * s = sdsnew("Sum is: ");
     * s = sdscatprintf(s,"%d+%d = %d",a,b,a+b).
     *
     * Often you need to create a string from scratch with the printf-alike
     * format. When this is the need, just use sdsempty() as the target string:
     *
     * s = sdscatprintf(sdsempty(), "... your format ...", args);
     */
    sds sdscatprintf(sds s, const char *fmt, ...) {
        va_list ap;
        char *t;
        va_start(ap, fmt);
        // T = O(N^2)
        t = sdscatvprintf(s,fmt,ap);
        va_end(ap);
        return t;
    }

    /* This function is similar to sdscatprintf, but much faster as it does
     * not rely on sprintf() family functions implemented by the libc that
     * are often very slow. Moreover directly handling the sds string as
     * new data is concatenated provides a performance improvement.
     *
     * However this function only handles an incompatible subset of printf-alike
     * format specifiers:
     *
     * %s - C String
     * %S - SDS string
     * %i - signed int
     * %I - 64 bit signed integer (long long, int64_t)
     * %u - unsigned int
     * %U - 64 bit unsigned integer (unsigned long long, uint64_t)
     * %% - Verbatim "%" character.
     */
    sds sdscatfmt(sds s, char const *fmt, ...) {
        struct sdshdr *sh = (void*) (s-(sizeof(struct sdshdr)));
        size_t initlen = sdslen(s);
        const char *f = fmt;
        int i;
        va_list ap;

        va_start(ap,fmt);
        f = fmt;    /* Next format specifier byte to process. */
        i = initlen; /* Position of the next byte to write to dest str. */
        while(*f) {
            char next, *str;
            size_t l;
            long long num;
            unsigned long long unum;

            /* Make sure there is always space for at least 1 char. */
            if (sh->free == 0) {
                s = sdsMakeRoomFor(s,1);
                sh = (void*) (s-(sizeof(struct sdshdr)));
            }

            switch(*f) {
            case '%':
                next = *(f+1);
                f++;
                switch(next) {
                case 's':
                case 'S':
                    str = va_arg(ap,char*);
                    l = (next == 's') ? strlen(str) : sdslen(str);
                    if (sh->free < l) {
                        s = sdsMakeRoomFor(s,l);
                        sh = (void*) (s-(sizeof(struct sdshdr)));
                    }
                    memcpy(s+i,str,l);
                    sh->len += l;
                    sh->free -= l;
                    i += l;
                    break;
                case 'i':
                case 'I':
                    if (next == 'i')
                        num = va_arg(ap,int);
                    else
                        num = va_arg(ap,long long);
                    {
                        char buf[SDS_LLSTR_SIZE];
                        l = sdsll2str(buf,num);
                        if (sh->free < l) {
                            s = sdsMakeRoomFor(s,l);
                            sh = (void*) (s-(sizeof(struct sdshdr)));
                        }
                        memcpy(s+i,buf,l);
                        sh->len += l;
                        sh->free -= l;
                        i += l;
                    }
                    break;
                case 'u':
                case 'U':
                    if (next == 'u')
                        unum = va_arg(ap,unsigned int);
                    else
                        unum = va_arg(ap,unsigned long long);
                    {
                        char buf[SDS_LLSTR_SIZE];
                        l = sdsull2str(buf,unum);
                        if (sh->free < l) {
                            s = sdsMakeRoomFor(s,l);
                            sh = (void*) (s-(sizeof(struct sdshdr)));
                        }
                        memcpy(s+i,buf,l);
                        sh->len += l;
                        sh->free -= l;
                        i += l;
                    }
                    break;
                default: /* Handle %% and generally %<unknown>. */
                    s[i++] = next;
                    sh->len += 1;
                    sh->free -= 1;
                    break;
                }
                break;
            default:
                s[i++] = *f;
                sh->len += 1;
                sh->free -= 1;
                break;
            }
            f++;
        }
        va_end(ap);

        /* Add null-term */
        s[i] = '\0';
        return s;
    }

    /*
     * Trim both ends of the sds, removing every character listed in cset
     *
     * For example, sdstrim("xxyyabcyyxy", "xy") returns "abc"
     *
     * Complexity:
     *  T = O(M*N), where M is the SDS length and N is the length of cset.
     */
    /* Remove the part of the string from left and from right composed just of
     * contiguous characters found in 'cset', that is a null terminated C string.
     *
     * After the call, the modified sds string is no longer valid and all the
     * references must be substituted with the new pointer returned by the call.
     *
     * Example:
     *
     * s = sdsnew("AA...AA.a.aa.aHelloWorld :::");
     * s = sdstrim(s,"Aa. :");
     * printf("%s\n", s);
     *
     * Output will be just "HelloWorld".
     */
    sds sdstrim(sds s, const char *cset) {
        struct sdshdr *sh = (void*) (s-(sizeof(struct sdshdr)));
        char *start, *end, *sp, *ep;
        size_t len;

        // Set up and record the pointers
        sp = start = s;
        ep = end = s+sdslen(s)-1;

        // Trim, T = O(N^2)
        while(sp <= end && strchr(cset, *sp)) sp++;
        while(ep > start && strchr(cset, *ep)) ep--;

        // Compute the length of the string left after trimming
        len = (sp > ep) ? 0 : ((ep-sp)+1);

        // Move the string content forward if needed
        // T = O(N)
        if (sh->buf != sp) memmove(sh->buf, sp, len);

        // Add the terminator
        sh->buf[len] = '\0';

        // Update the fields
        sh->free = sh->free+(sh->len-len);
        sh->len = len;

        // Return the trimmed sds
        return s;
    }

    /*
     * Keep only the part of the sds string selected by a pair of indexes
     * start and end are both inclusive
     *
     * Indexes start at 0 and go up to sdslen(s) - 1
     * Indexes may be negative; -1 refers to position sdslen(s) - 1
     *
     * Complexity
     *  T = O(N)
     */
    /* Turn the string into a smaller (or equal) string containing only the
     * substring specified by the 'start' and 'end' indexes.
     *
     * start and end can be negative, where -1 means the last character of the
     * string, -2 the penultimate character, and so forth.
     *
     * The interval is inclusive, so the start and end characters will be part
     * of the resulting string.
     *
     * The string is modified in-place.
     *
     * Example:
     *
     * s = sdsnew("Hello World");
     * sdsrange(s,1,-1); => "ello World"
     */
    void sdsrange(sds s, int start, int end) {
        struct sdshdr *sh = (void*) (s-(sizeof(struct sdshdr)));
        size_t newlen, len = sdslen(s);

        if (len == 0) return;
        if (start < 0) {
            start = len+start;
            if (start < 0) start = 0;
        }
        if (end < 0) {
            end = len+end;
            if (end < 0) end = 0;
        }
        newlen = (start > end) ? 0 : (end-start)+1;
        if (newlen != 0) {
            if (start >= (signed)len) {
                newlen = 0;
            } else if (end >= (signed)len) {
                end = len-1;
                newlen = (start > end) ? 0 : (end-start)+1;
            }
        } else {
            start = 0;
        }

        // Move the string if needed
        // T = O(N)
        if (start && newlen) memmove(sh->buf, sh->buf+start, newlen);

        // Add the terminator
        sh->buf[newlen] = 0;

        // Update the fields
        sh->free = sh->free+(sh->len-newlen);
        sh->len = newlen;
    }

    /*
     * Convert every character of the sds string to lowercase
     *
     * T = O(N)
     */
    /* Apply tolower() to every character of the sds string 's'. */
    void sdstolower(sds s) {
        int len = sdslen(s), j;

        for (j = 0; j < len; j++) s[j] = tolower(s[j]);
    }

    /*
     * Convert every character of the sds string to uppercase
     *
     * T = O(N)
     */
    /* Apply toupper() to every character of the sds string 's'. */
    void sdstoupper(sds s) {
        int len = sdslen(s), j;

        for (j = 0; j < len; j++) s[j] = toupper(s[j]);
    }
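The index handling in sdsrange() is the least obvious part of this listing, so here it is pulled out into a standalone function for experimentation. This is a sketch under assumptions: demo_range is a hypothetical helper that works on a bare length instead of an sds, and it resets the start index to 0 whenever the result is empty, which is slightly more uniform than the original.

```c
#include <assert.h>
#include <stddef.h>

/* Normalize inclusive, possibly negative start/end indexes for a string of
 * length len. Stores the normalized start in *out_start and returns the
 * number of bytes the resulting substring would contain. */
static size_t demo_range(size_t len, int start, int end, int *out_start) {
    size_t newlen;

    assert(out_start != NULL);
    *out_start = 0;
    if (len == 0) return 0;

    /* Negative indexes count from the end: -1 is the last byte. */
    if (start < 0) { start = (int)len + start; if (start < 0) start = 0; }
    if (end < 0)   { end = (int)len + end;     if (end < 0) end = 0; }

    newlen = (start > end) ? 0 : (size_t)(end - start) + 1;
    if (newlen != 0) {
        if (start >= (int)len) {
            newlen = 0;                 /* range starts past the end */
        } else if (end >= (int)len) {
            end = (int)len - 1;         /* clamp end to the last byte */
            newlen = (start > end) ? 0 : (size_t)(end - start) + 1;
        }
    }
    if (newlen == 0) start = 0;
    *out_start = start;
    return newlen;
}
```

For the 11-byte string "Hello World", the pair (1, -1) normalizes to start 1 with 10 bytes kept ("ello World"), while (-5, -1) normalizes to start 6 with 5 bytes kept ("World").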
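Of course, there is a small part I have not included here, such as the sds test function at the end of the file; you can read the source code and analyze it yourself.
The overall style of the source code is also well worth studying. I am still learning too, so let's keep at it together!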