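## 1. Thinking About Strings
Developers constantly work with strings. Compared with C/C++, many languages ship with complete string-handling libraries, and that is one reason many developers prefer those languages.
Most of the time the string functions in the C library are enough, and you can build nearly everything you need on top of them:
strcat strcmp strcasecmp strtok strdup strcpy strlen strchr strtod ...
Once binary data or custom behaviour is involved, however, the standard string functions fall short.
Many C/C++ projects therefore implement their own string module. In nginx, for example, ngx_str_t is the string type: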
typedef struct {
size_t len;
u_char *data;
} ngx_str_t;
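Here data is the address where the string starts and len is its length. Standard C string handling mostly relies on a terminating '\0', whereas ngx_str_t does not, so it can hold data containing any number of '\0' bytes. When nginx parses an HTTP request, ngx_str_t avoids almost all string copying: it only needs to store the start address and the length of each element in an ngx_str_t variable.
coreclr, the open-source .NET runtime, likewise wraps its own string type, StringBuffer: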
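To make the zero-copy idea concrete, here is a minimal sketch. The `str_view` type and the `view_until` helper are hypothetical names used only for illustration and are not nginx APIs; the struct simply mirrors the len + data layout of ngx_str_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view type mirroring the len + data layout of ngx_str_t. */
typedef struct {
  size_t len;
  const unsigned char *data;
} str_view;

/*
 * Return a view of the bytes in [buf, buf + n) up to, but not including,
 * the first occurrence of delim. No bytes are copied; the view only points
 * into the caller's buffer. If delim is absent, the view covers the whole
 * buffer.
 */
static str_view view_until(const unsigned char *buf, size_t n,
                           unsigned char delim) {
  str_view v;
  const unsigned char *p;
  assert(buf != NULL);
  p = (const unsigned char *)memchr(buf, delim, n);
  v.data = buf;
  v.len = (p != NULL) ? (size_t)(p - buf) : n;
  return v;
}
```

Calling `view_until` on a request line with `' '` as the delimiter yields the method token without allocating or copying anything; the view stays valid only as long as the underlying buffer does.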
class StringBuffer {
wchar_t* m_buffer;
size_t m_capacity;
size_t m_length;
StringBuffer(const StringBuffer&);
StringBuffer& operator =(const StringBuffer&);
public:
StringBuffer() : m_capacity(0), m_buffer(0), m_length(0) {
}
~StringBuffer() {
delete[] m_buffer;
}
const wchar_t* CStr() const {
return m_buffer;
}
void Append(const wchar_t* str, size_t strLen) {
if (!m_buffer) {
m_buffer = new wchar_t[4096];
m_capacity = 4096;
}
if (m_length + strLen + 1 > m_capacity) {
size_t newCapacity = m_capacity * 2;
wchar_t* newBuffer = new wchar_t[newCapacity];
wcsncpy_s(newBuffer, newCapacity, m_buffer, m_length);
delete[] m_buffer;
m_buffer = newBuffer;
m_capacity = newCapacity;
}
wcsncpy_s(m_buffer + m_length, m_capacity - m_length, str, strLen);
m_length += strLen;
}
};
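StringBuffer adds text with Append. When the preallocated space is insufficient, it grows: it allocates a new block, copies the string into it, frees the old block, points the data pointer at the new memory, and updates the capacity. Each growth doubles the capacity; many std::string implementations use a similar geometric growth policy, although the exact factor is implementation-defined. The contents are copied with a string copy function from the strncpy (wcsncpy) family. nginx's dynamic array type ngx_array_t grows in the same way, but it copies with memcpy. If the data may contain '\0', memcpy is the function to use.
A rough implementation of memcpy is shown below. Unlike strncpy, memcpy does not care whether the data contains '\0', and it does not zero-fill the memory after the copied bytes.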
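The doubling policy can be sketched on its own. In the Append listing as shown, the capacity is doubled only once, which may not be enough when a single append is larger than the current capacity; the hypothetical `grow_capacity` helper below keeps doubling until the request fits. It is an illustration under that assumption, not code from coreclr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest capacity obtained by repeatedly
 * doubling `capacity` (starting from `initial` when capacity is 0) that can
 * hold `needed` elements. Returns 0 if doubling would overflow size_t.
 */
static size_t grow_capacity(size_t capacity, size_t needed, size_t initial) {
  size_t cap = (capacity != 0) ? capacity : initial;
  assert(initial != 0);
  while (cap < needed) {
    if (cap > SIZE_MAX / 2)
      return 0; /* doubling would overflow */
    cap *= 2;
  }
  return cap;
}
```

Geometric growth keeps the number of reallocations logarithmic in the final size, at the cost of some unused space.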
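void *memcpy(void *_Dest,const void *src,size_t count){
  char *left=(char*) _Dest;
  const char *right=(const char*)src;
  size_t i=0;
  /* Copy exactly count bytes; '\0' has no special meaning here. */
  for(;i<count;i++)
    left[i]=right[i];
  return _Dest;
}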
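The difference between the two functions shows up as soon as the data contains an embedded '\0'. The sketch below follows the standard strncpy semantics (stop at the first '\0', then pad with '\0'); `sketch_strncpy` is a hypothetical name and not the source of any particular C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of strncpy semantics: copy at most n chars, stop at the first
 * '\0' in src, then pad the rest of dest with '\0'.
 */
static char *sketch_strncpy(char *dest, const char *src, size_t n) {
  size_t i = 0;
  assert(dest != NULL && src != NULL);
  for (; i < n && src[i] != '\0'; i++)
    dest[i] = src[i];
  for (; i < n; i++)
    dest[i] = '\0'; /* bytes after the first '\0' in src are never copied */
  return dest;
}
```

Copying the six bytes `a b \0 c d e` with `sketch_strncpy` leaves `a b \0 \0 \0 \0` in the destination, while memcpy reproduces all six bytes.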
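Early on, while working on a distributed NGINX setup, I wrote a dynamic array implementation that ran fine, and that gave me the idea of implementing a StringBuilder in C.
So I wrote StringBuilder, and later StringBuffer; both are shown below.
## 2. StringBuilder in Pure C
StringBuilder.h: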
/*
*/
#ifndef SMART_STRING_BUILDER_H
#define SMART_STRING_BUILDER_H
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#define STRING_BUILDER_RESIZE_L1 64
#define STRING_BUILDER_RESIZE_L2 128
#define STRING_BUILDER_RESIZE_L3 256
#define STRING_BUILDER_RESIZE_L4 512
#define STRING_BUILDER_RESIZE_L5 1024
#define STRING_BUILDER_RESIZE_L6 2048
#define STRING_BUILDER_RESIZE_L7 4096
#define STRING_BUILDER_RESIZE_L8 8192
#define RESIZE_DEFAULT STRING_BUILDER_RESIZE_L7
#ifdef _WIN32
#define BASECALL __cdecl
#else
#define BASECALL
#endif
#ifdef __cplusplus
extern "C" {
#endif
typedef uint32_t uint_t;
typedef struct StringBuilder {
char *data;
size_t size;
size_t msize;
uint_t resize; /// Resize
} StringBuilder;
/*
Default
StringBuilder stb={NULL,0,0,STRING_BUILDER_RESIZE_L7};
*/
// Allocation functions.
// Full allocation: the StringBuilder object itself is also malloc'ed.
StringBuilder *StringBuilderNew(size_t resize);
// The StringBuilder object already exists; only its buffer is allocated.
bool StringBuilderAlloc(StringBuilder *stb, size_t size);
void StringBuilderRelease(StringBuilder *stb, bool channel);
bool SrringBuilderClean(StringBuilder *stb);
int BASECALL StringBuilderFormat(StringBuilder *stb, const char *format, ...);
/////StringBuilder First insert data;
bool StringBuilderCreate(StringBuilder *stb, char *data, size_t len);
bool StringBuilderCreateConst(StringBuilder *stb, const char *data);
/////StringBuilder Append Data
bool StringBuilderAppend(StringBuilder *stb, char *ap, size_t len);
bool StringBuilderAppendConst(StringBuilder *stb, const char *ap);
/////StringBuilder Put char to stb;
bool StringBuilderPutc(StringBuilder *stb, unsigned char ch);
/*
* StringBuilderDup returns a new StringBuilder pointer, which must be
* freed with StringBuilderRelease(stb, true).
*/
StringBuilder *StringBuilderDup(const StringBuilder *str);
/*
* StringBuilderMove: move origin's data into accpet. The buffer previously
* owned by accpet is released, and origin is left empty.
*/
bool StringBuilderMove(StringBuilder *origin, StringBuilder *accpet);
bool StringBuilderSwitch(StringBuilder *left, StringBuilder *right);
/*
* Two StringBuilders are equal when their sizes match and their data is
* byte-for-byte identical.
*/
bool StringBuilderEqual(StringBuilder *left, StringBuilder *right);
const char *StringBuilderConst(StringBuilder *stb);
/*
* You normally should not need to call this function directly.
*/
size_t StringBuilderResize(StringBuilder *stb, size_t resize);
#ifdef __cplusplus
}
#endif
#endif
StringBuilder.c:
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "StringBuilder.h"
StringBuilder *StringBuilderNew(size_t resize) {
  StringBuilder *stb = (StringBuilder *)malloc(sizeof(StringBuilder));
  if (stb == NULL)
    return NULL;
  stb->size = 0;
  /* Fall back to the default step when the caller passes 0. */
  stb->resize = (resize == 0) ? RESIZE_DEFAULT : (uint_t)resize;
  /* Allocate with the effective step, not the raw argument (which may be 0). */
  stb->data = (char *)malloc(sizeof(char) * stb->resize);
  if (stb->data == NULL) {
    free(stb);
    return NULL;
  }
  stb->msize = stb->resize;
  return stb;
}
bool StringBuilderAlloc(StringBuilder *stb, size_t size) {
  assert(stb);
  /* Refuse to allocate over a builder that already owns a buffer. */
  if (stb->data || stb->size > 0 || stb->msize > 0) {
    return false;
  }
  if (size == 0) {
    size = stb->resize > 0 ? stb->resize : RESIZE_DEFAULT;
  }
  stb->data = (char *)malloc(sizeof(char) * size);
  if (stb->data == NULL) {
    return false; /* msize stays 0, so the builder is still unallocated */
  }
  stb->msize = size;
  return true;
}
void StringBuilderRelease(StringBuilder *stb, bool channel) {
if (stb) {
if (stb->data) {
free(stb->data);
}
if (channel) {
free(stb);
}
}
}
bool SrringBuilderClean(StringBuilder *stb) {
  assert(stb);
  if (stb->data) {
    free(stb->data);
    stb->data = NULL;
    /* Reset the bookkeeping so the builder can be allocated again. */
    stb->size = 0;
    stb->msize = 0;
    return true;
  }
  return false;
}
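size_t StringBuilderResize(StringBuilder *stb, size_t resize) {
  assert(stb);
  /* Grow by `resize` bytes, or by the configured step when resize == 0. */
  size_t mresize = (resize == 0) ? stb->resize : resize;
  if (mresize == 0) {
    mresize = RESIZE_DEFAULT;
  }
  /* The new block must hold the old capacity plus the growth step. */
  size_t newsize = stb->msize + mresize;
  char *data = (char *)malloc(sizeof(char) * newsize);
  if (data == NULL) {
    return 0; /* the old buffer is left untouched */
  }
  if (stb->size > 0) {
    memcpy(data, stb->data, stb->size);
  }
  /* Always switch to the new block, even when the builder is empty. */
  free(stb->data);
  stb->data = data;
  stb->msize = newsize;
  return stb->msize;
}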
static bool StringBuilderInit(StringBuilder *stb, char *data,
size_t len) {
if (memcpy(stb->data, (const char *)data, len)) {
stb->size = len;
return true;
}
return false;
}
bool StringBuilderCreate(StringBuilder *stb, char *data, size_t len) {
  assert(stb);
  assert(data);
  if (len >= stb->msize) {
    /* Grow by at least one step, and by enough to hold len bytes plus '\0'. */
    size_t need = len - stb->msize + 1;
    size_t step = stb->resize > 0 ? stb->resize : RESIZE_DEFAULT;
    if (StringBuilderResize(stb, need > step ? need : step) == 0)
      return false;
  }
  return StringBuilderInit(stb, data, len);
}
bool StringBuilderCreateConst(StringBuilder *stb, const char *data) {
assert(stb);
assert(data);
size_t len = strlen(data);
return StringBuilderCreate(stb, (char *)data, len);
}
bool StringBuilderAppend(StringBuilder *stb, char *ap, size_t len) {
size_t space = stb->resize + stb->msize - stb->size;
if (len >= space) {
if (StringBuilderResize(stb, len % 2 ? (len + 1) : len) == 0) {
return false;
}
} else if (len < space && len >= stb->msize - stb->size) {
if (StringBuilderResize(stb, 0) == 0) {
return false;
}
}
char *index = &(stb->data[stb->size]);
memcpy(index, (const char *)ap, len);
stb->size+=len;
return true;
}
bool StringBuilderAppendConst(StringBuilder *stb, const char *ap) {
if (!ap)
return false;
size_t len = strlen(ap);
return StringBuilderAppend(stb, (char *)ap, len);
}
bool StringBuilderPutc(StringBuilder *stb, unsigned char ch) {
assert(stb);
if (stb->size + 1 >= stb->msize) {
if (StringBuilderResize(stb, 0) == 0) {
return false;
}
}
stb->data[stb->size] = (char)ch;
stb->size++;
return true;
}
StringBuilder *StringBuilderDup(const StringBuilder *str) {
  assert(str);
  StringBuilder *stb2 = (StringBuilder *)malloc(sizeof(StringBuilder));
  if (stb2 == NULL) {
    return NULL;
  }
  stb2->size = str->size;
  stb2->msize = str->msize;
  stb2->resize = str->resize;
  stb2->data = (char *)malloc(sizeof(char) * str->msize);
  if (stb2->data == NULL) {
    free(stb2);
    return NULL;
  }
  if (str->size > 0) {
    memcpy(stb2->data, str->data, str->size);
  }
  return stb2;
}
/*
* StringBuilderMove: move origin's data into accpet. The buffer previously
* owned by accpet is released, and origin is left empty.
*/
bool StringBuilderMove(StringBuilder *origin, StringBuilder *accpet) {
assert(origin);
assert(accpet);
StringBuilderRelease(accpet, false);
accpet->data = origin->data;
accpet->size = origin->size;
accpet->msize = origin->msize;
accpet->resize = origin->resize;
memset(origin, 0, sizeof(StringBuilder));
return accpet->data != NULL;
}
bool StringBuilderSwitch(StringBuilder *left, StringBuilder *right) {
assert(left);
assert(right);
if (left == right) {
return false;
}
StringBuilder stmp = {NULL, 0, 0, 0};
memcpy(&stmp, left, sizeof(StringBuilder));
memcpy(left, right, sizeof(StringBuilder));
memcpy(right, &stmp, sizeof(StringBuilder));
return true;
}
bool StringBuilderEqual(StringBuilder *left, StringBuilder *right) {
if (left == right) {
return true;
}
if (left == NULL || right == NULL) {
return false;
}
if (left->size != right->size) {
return false;
}
const char *l = (const char *)left->data;
const char *r = (const char *)right->data;
size_t n = left->size;
  for (; n && *l == *r; n--, l++, r++)
    ;
  /* Equal only when every byte matched, i.e. the loop ran to completion. */
  return n == 0;
}
const char *StringBuilderConst(StringBuilder *stb) {
assert(stb);
if (stb->msize > stb->size) {
stb->data[stb->size] = '\0';
return (const char *)stb->data;
}
if (StringBuilderResize(stb, 0) == 0) {
return NULL;
}
stb->data[stb->size] = '\0';
return (const char *)stb->data;
}
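static int BASECALL StringBuilderVlPrintf(StringBuilder *stb, const char *format,
                                          va_list ap) {
  int ret;
  va_list cp;
  char *index = stb->data + stb->size;
  size_t sz = stb->msize - stb->size;
  /* Format from a copy: a va_list must not be reused after a v*printf call. */
  va_copy(cp, ap);
#ifdef _WIN32
  ret = vsnprintf_s(index, sz, _TRUNCATE, format, cp);
#else
  ret = vsnprintf(index, sz, format, cp);
#endif
  va_end(cp);
  /* On truncation vsnprintf_s returns -1; C99 vsnprintf returns the full length. */
  if (ret < 0 || (size_t)ret >= sz) {
    /* Grow once: by the exact shortfall when known, else by the default step. */
    if (StringBuilderResize(stb, ret > 0 ? (size_t)ret + 1 : 0) == 0)
      return -1;
    index = stb->data + stb->size;
    sz = stb->msize - stb->size;
    va_copy(cp, ap);
#ifdef _WIN32
    ret = vsnprintf_s(index, sz, _TRUNCATE, format, cp);
#else
    ret = vsnprintf(index, sz, format, cp);
#endif
    va_end(cp);
    if (ret < 0 || (size_t)ret >= sz)
      return -1; /* still does not fit after one resize */
  }
  stb->size += (size_t)ret;
  return ret;
}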
/*
* Format to StringBuilder buffer
*/
int BASECALL StringBuilderFormat(StringBuilder *stb, const char *format, ...) {
int ret;
va_list ap;
va_start(ap, format);
ret = StringBuilderVlPrintf(stb, format, ap);
va_end(ap);
return ret;
}
A StringBuilder can be initialized like an ordinary struct, for example:
StringBuilder stb={NULL,0,0,1024};
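No memory has been allocated at this point, so StringBuilderAlloc must be called to allocate the buffer.
You can also call StringBuilderNew to construct a StringBuilder on the heap. Every StringBuilder must be released with StringBuilderRelease when it is no longer needed. Pass true as the second argument for a pointer returned by StringBuilderNew, and false for the address of an ordinary StringBuilder object.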
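The two ownership modes can be illustrated with a stripped-down stand-in. The `mini_builder` type and the `mb_*` functions below are hypothetical and exist only for this sketch; they mirror the Alloc / New / Release protocol described above rather than reproduce StringBuilder itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for StringBuilder. */
typedef struct {
  char *data;
  size_t size;  /* bytes in use */
  size_t msize; /* bytes allocated */
} mini_builder;

/* Mode 1: the struct already exists (stack or static); allocate its buffer. */
static bool mb_alloc(mini_builder *mb, size_t n) {
  assert(mb != NULL && n > 0);
  mb->data = (char *)malloc(n);
  mb->size = 0;
  mb->msize = (mb->data != NULL) ? n : 0;
  return mb->data != NULL;
}

/* Mode 2: allocate both the struct and its buffer on the heap. */
static mini_builder *mb_new(size_t n) {
  mini_builder *mb = (mini_builder *)malloc(sizeof(*mb));
  if (mb == NULL)
    return NULL;
  if (!mb_alloc(mb, n)) {
    free(mb);
    return NULL;
  }
  return mb;
}

/* free_self must be true only for objects that came from mb_new. */
static void mb_release(mini_builder *mb, bool free_self) {
  if (mb == NULL)
    return;
  free(mb->data);
  mb->data = NULL;
  mb->size = mb->msize = 0;
  if (free_self)
    free(mb);
}
```

Passing `true` for a stack object would call free on memory that malloc never returned, which is why the flag has to match how the object was created.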
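StringBuilderAppend appends a byte range of a given length, and StringBuilderAppendConst is the variant for NUL-terminated string constants. Both copy with memcpy, so '\0' bytes inside the data passed to StringBuilderAppend get no special treatment.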
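As a minimal sketch of this append path, the hypothetical `step_builder` below grows in whole multiples of a fixed step and copies with memcpy. It uses realloc for brevity, which is an assumption of this example and differs from the malloc-and-copy approach in the listing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical builder that grows in multiples of a fixed step. */
typedef struct {
  char *data;
  size_t size;  /* bytes in use */
  size_t msize; /* bytes allocated */
  size_t step;  /* growth granularity, must be > 0 */
} step_builder;

/* Append len raw bytes; embedded '\0' bytes are copied like any other. */
static bool sb_append(step_builder *sb, const void *bytes, size_t len) {
  assert(sb != NULL && sb->step > 0);
  if (len == 0)
    return true;
  assert(bytes != NULL);
  if (sb->size + len > sb->msize) {
    /* Round the shortfall up to a whole number of steps. */
    size_t shortfall = sb->size + len - sb->msize;
    size_t grow = ((shortfall + sb->step - 1) / sb->step) * sb->step;
    char *p = (char *)realloc(sb->data, sb->msize + grow);
    if (p == NULL)
      return false; /* the old block is still valid */
    sb->data = p;
    sb->msize += grow;
  }
  memcpy(sb->data + sb->size, bytes, len);
  sb->size += len;
  return true;
}
```

Because the length is passed explicitly, the builder never needs to scan for a terminator, and binary payloads survive unchanged.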
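StringBuilderPutc appends a single character.
StringBuilderMove and StringBuilderSwitch transfer or swap contents; both take two StringBuilder pointers.
StringBuilderEqual checks whether the valid contents of two builders are equal; differences in capacity caused by resizing do not affect the comparison.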
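The comparison rule (same length, same bytes, capacity ignored) can be expressed compactly with memcmp. `bytes_equal` is a hypothetical helper for illustration, not part of the StringBuilder API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Two length-delimited byte ranges are equal when their lengths match and
 * all bytes match. Capacity plays no role, and embedded '\0' bytes are
 * compared like any other byte.
 */
static bool bytes_equal(const char *a, size_t alen,
                        const char *b, size_t blen) {
  if (alen != blen)
    return false;
  if (alen == 0)
    return true; /* avoid calling memcmp with possibly NULL pointers */
  return memcmp(a, b, alen) == 0;
}
```

Note that strcmp would report `"ab\0cd"` and `"ab\0ce"` as equal because it stops at the first '\0', whereas a length-based comparison sees the difference.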
StringBuilderFormat is a printf-style formatting function that appends formatted output to the StringBuilder. It is implemented with vsnprintf and grows the buffer at most once.
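A common way to size the buffer exactly is to measure first and format second. The sketch below assumes a C99-conforming vsnprintf, which returns the length the complete output would have, and uses va_copy because a va_list must not be reused after it has been consumed. `append_format` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Append formatted text to the heap buffer *buf, which holds *len bytes in
 * a block of *cap bytes, growing the block when needed. Returns the number
 * of bytes appended, or -1 on error.
 */
static int append_format(char **buf, size_t *len, size_t *cap,
                         const char *format, ...) {
  va_list ap, measure;
  int need, written;

  va_start(ap, format);
  va_copy(measure, ap); /* first pass consumes the copy, not ap */
  need = vsnprintf(NULL, 0, format, measure);
  va_end(measure);
  if (need < 0) {
    va_end(ap);
    return -1;
  }
  if (*len + (size_t)need + 1 > *cap) {
    size_t newcap = *len + (size_t)need + 1; /* +1 for the trailing '\0' */
    char *p = (char *)realloc(*buf, newcap);
    if (p == NULL) {
      va_end(ap);
      return -1; /* *buf is still valid */
    }
    *buf = p;
    *cap = newcap;
  }
  written = vsnprintf(*buf + *len, *cap - *len, format, ap);
  va_end(ap);
  if (written > 0)
    *len += (size_t)written;
  return written;
}
```

The price of exact sizing is that the format string is processed twice; growing by a fixed step and retrying, as StringBuilderFormat does, trades that for a possible failure when one step is not enough.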
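StringBuilderConst returns a const char * after setting stb->data[stb->size] = '\0'. If the contents contain embedded '\0' bytes, standard C string functions will see the string truncated at the first one.
## 3. StringBuffer in C++
StringBuilder requires a manual call to StringBuilderRelease, which makes it somewhat tedious to use. In C++, RAII can do better, so I wrote a first version of a StringBuffer class. Why not std::string? For my purposes std::string is not simple enough, whereas StringBuffer is something I can customize deeply and extend freely.
StringBuffer.cpp (the implementation; the class is declared in StringBuffer.h):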
/*
*
*/
#include "StringBuffer.h"
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <initializer_list>
#define WriteLine(...) printf(__VA_ARGS__) + (putchar('\n') != EOF ? 1: 0);
#if defined(_MSC_VER) && _MSC_VER >= 1600
#define InlinePrint(s,z,f,v) vsnprintf_s(s,z,_TRUNCATE,f,v)
#define VscPrintf(x,y) _vscprintf(x,y)
#else
#define InlinePrint(s,z,f,v) vsnprintf(s,z,f,v)
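inline int vscprintf(const char *format, va_list argptr)
{
    // Measure on a copy so the caller can still use argptr afterwards.
    va_list cp;
    va_copy(cp, argptr);
    int n = vsnprintf(nullptr, 0, format, cp);
    va_end(cp);
    return n;
}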
#define VscPrintf(x,y) vscprintf(x,y)
#endif
bool StringFastCopy(uchar *_Dest, uchar *_Src, size_t count)
{
if (!_Dest || !_Src) return false;
auto n = count / sizeof(size_t);
    auto l = reinterpret_cast<size_t *>(_Dest);
    auto r = reinterpret_cast<size_t *>(_Src);
for (size_t i = 0; i < n; i++) {
l[i] = r[i];
}
auto k = count%sizeof(size_t);
_Dest += sizeof(size_t)*n;
_Src += sizeof(size_t)*n;
for (size_t m = 0; m < k; m++) {
_Dest[m] = _Src[m];
}
return true;
}
static bool StringBufferCopy(uchar *_Dest, uchar * _Src, size_t count)
{
if (!_Dest || !_Src) return false;
for (size_t i = 0; i < count; i++) {
_Dest[i] = _Src[i];
}
return true;
}
char UpperChar(char ch)
{
if (ch <= 'z'&&ch >= 'a') {
return ch ^ 0x20;
}
return ch;
}
static int StringBufferCompare(uchar *r, uchar *l, size_t n)
{
    // Byte-wise compare of exactly n bytes. This is safe for n == 0 and for
    // unaligned buffers, and a difference in any byte is never lost.
    for (size_t i = 0; i < n; i++) {
        if (r[i] != l[i]) {
            return r[i] - l[i];
        }
    }
    return 0;
}
static size_t CStringSize(const uchar *str)
{
const uchar *eos = str;
while (*eos++);
return eos - str - 1;
}
StringBuffer::StringBuffer(rsize_t resize) :size(0), capacity(0), resize(resize)
{
if (resize == 0) {
data = nullptr;
return;
}
this->data = (uchar *)malloc(resize);
assert(this->data);
this->size = 0;
this->capacity = resize;
}
StringBuffer::~StringBuffer()
{
if (this->data) {
free(this->data);
}
}
bool StringBuffer::Resize(rsize_t rsize)
{
size_t rs = (rsize == 0 ? this->resize : rsize);
rs += this->capacity;
uchar *p = (uchar*)malloc(sizeof(uchar)*rs);
if (p == nullptr) return false;
if (memcpy(p, this->data, this->size) == nullptr) {
free(p);
return false;
}
free(this->data);
this->data = p;
this->capacity = rs;
return true;
}
bool StringBuffer::Empty()
{
return this->size == 0;
}
rsize_t StringBuffer::ResizeSet(rsize_t rs)
{
this->resize = rs;
return this->resize;
}
size_t StringBuffer::Size()
{
return this->size;
}
size_t StringBuffer::Capactiy()
{
return this->capacity;
}
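bool StringBuffer::Realloc(size_t newsize)
{
    auto p = realloc(this->data, newsize);
    if (p) {
        this->data = reinterpret_cast<uchar *>(p);
        // Keep the bookkeeping in sync with the new block size.
        this->capacity = newsize;
        if (this->size > newsize) this->size = newsize;
        return true;
    }
    return false;
}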
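int StringBuffer::PrintVa(const char *format, va_list ap)
{
    int w = -1;
    // Measure on a copy: the measuring pass may consume its va_list.
    va_list measure;
    va_copy(measure, ap);
    auto needBytes = VscPrintf(format, measure);
    va_end(measure);
    if (needBytes < 0) return -1;
    auto newsize = this->size + needBytes;
    if (newsize >= this->capacity) {
        auto requireSize = newsize - this->capacity;
        auto rs = (requireSize / 4 + 1) * 4;
        if (this->Resize(rs) != true)return -1;
    }
    auto begin = reinterpret_cast<char *>(this->data + this->size);
    w = InlinePrint(begin, this->capacity - this->size, format, ap);
    // Advance the logical size past the bytes just written.
    if (w > 0) this->size += w;
    return w;
}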
int StringBuffer::Format(const char *format, ...)
{
int ret;
va_list ap;
va_start(ap, format);
ret = this->PrintVa(format, ap);
va_end(ap);
return ret;
}
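int StringBuffer::Format2(size_t maxBytes, const char *format, ...)
{
    if (maxBytes == 0) return -1;
    auto max = maxBytes + this->size;
    if (max >= this->capacity) {
        auto rs = ((max - this->capacity) / 4 + 1) * 4;
        if (!this->Resize(rs)) return -1;
    }
    auto c = reinterpret_cast<char *>(this->data + this->size);
    int ret;
    va_list ap;
    va_start(ap, format);
    ret = InlinePrint(c, maxBytes, format, ap);
    va_end(ap);
    // Plain vsnprintf reports the untruncated length; count only what was stored.
    if (ret >= 0 && static_cast<size_t>(ret) >= maxBytes) {
        ret = static_cast<int>(maxBytes - 1);
    }
    if (ret > 0) {
        this->size += ret;
    }
    return ret;
}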
size_t StringBuffer::Print(FILE *file)
{
    auto len = fwrite(this->data, sizeof(char), this->size, file);
fflush(file);
return len;
}
bool StringBuffer::Putc(char ch)
{
if (this->size + 1 >= this->capacity) {
this->Resize();
}
this->data[this->size] = ch;
this->size++;
return true;
}
bool StringBuffer::Append(char *s, size_t len)
{
if (!s || len == 0)return false;
if (this->size + len + 1 > this->capacity) {
rsize_t sz = (len / this->resize + 1)*this->resize;
this->Resize(sz);
}
auto index = this->data + this->size;
if (memmove(index, s, len) == nullptr)
return false;
this->size += len;
return true;
}
bool StringBuffer::Append(const char *s)
{
    if (!s) return false;
    auto len = CStringSize(reinterpret_cast<const uchar *>(s));
    char *data = const_cast<char *>(s);
return this->Append(data, len);
}
bool StringBuffer::Append(std::initializer_list<const char *> il)
{
bool br = false;
for (auto &i : il) {
if ((br = this->Append(i)) != true)
return false;
}
return br;
}
const char * StringBuffer::Get()
{
if (this->size + 1 >= this->capacity) {
this->Resize();
}
this->data[this->size] = '\0';
    return reinterpret_cast<const char *>(this->data);
}
bool StringBuffer::Move(StringBuffer &other)
{
    if (this == &other) return false;
    // Release the receiver's old block first, otherwise it would leak.
    free(other.data);
    other.data = this->data;
    other.capacity = this->capacity;
    other.size = this->size;
    other.resize = this->resize;
    this->data = nullptr;
    this->capacity = 0;
    this->size = 0;
    this->resize = 0;
    return true;
}
bool StringBuffer::Switch(StringBuffer &right)
{
StringBuffer tmp;
this->Move(tmp);
right.Move(*this);
tmp.Move(right);
return true;
}
bool StringBuffer::Transfer(uchar **receive, size_t *osize)
{
if (!receive || !osize) return false;
if (!this->data || this->size == 0) return false;
*receive = this->data;
*osize = this->size;
this->capacity = 0;
this->size = 0;
this->data = nullptr;
return true;
}
bool StringBuffer::operator == (const StringBuffer &right)
{
if (this == &right)return true;
    if (this->size != right.size) return false;
    if (this->data == right.data) return true; // same block, or both empty
return StringBufferCompare(this->data, right.data, this->size) == 0;
}
StringBuffer &StringBuffer::operator += (const char *s)
{
this->Append(s);
return *this;
}
### Member Functions
The constructor has a default argument, so it can be used like this:
StringBuffer b1;
StringBuffer b2(2048);
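Format works like StringBuilderFormat: it first computes the length of the formatted output, grows the buffer if necessary, and then formats. Because the format string is processed twice, this is not the most efficient approach.
Format2 requires the caller to supply the maximum number of bytes the formatted output may take; in most cases it is faster than Format.
Resize copies with memcpy. Most CRTs optimize memcpy heavily, so it is usually faster than a character-by-character string copy.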
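Print takes a FILE * that defaults to stdout and writes with fwrite, so the contents can go to a file, standard output, or any other stream.
Transfer hands over ownership of the data, which is very useful in some situations. After building a payload that mixes binary data and text, you can pass the output to the module that handles the next step; for example, a git service builds a POST body and then lets the HTTP module send it to the client.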
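The ownership hand-over can be sketched in plain C. `byte_buffer` and `buffer_transfer` are hypothetical names standing in for StringBuffer's fields and its Transfer member; the sketch assumes the block was obtained from malloc, so the receiver releases it with free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal buffer, standing in for StringBuffer's fields. */
typedef struct {
  unsigned char *data;
  size_t size;
  size_t capacity;
} byte_buffer;

/*
 * Hand the heap block over to the caller. Afterwards the buffer is empty
 * and no longer owns the memory, so the receiver must free() it.
 */
static bool buffer_transfer(byte_buffer *b, unsigned char **receive,
                            size_t *osize) {
  if (b == NULL || receive == NULL || osize == NULL)
    return false;
  if (b->data == NULL || b->size == 0)
    return false; /* nothing to hand over */
  *receive = b->data;
  *osize = b->size;
  b->data = NULL;
  b->size = 0;
  b->capacity = 0;
  return true;
}
```

Since only a pointer and a length change hands, the payload is never copied, no matter how large it is or how many '\0' bytes it contains.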
Some other useful features are still being implemented.
The next step is integrating a memory pool.
## Closing Remarks
None.