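STL Source Code Analysis #string#
Lately I've decided to take a proper look at how the STL is implemented...
All those definitions I couldn't track down were giving me a headache.
First you need a helper. If ctags isn't up to the job, use global (figure out what that is yourself, I won't introduce it here).
Under the STL library path, in bits/stringfwd.h, you can find the following definition.
You'll find that the standard library class string we use all the time is really a typedef for basic_string<char>.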
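The definition of the string class runs to 2000+ lines. I won't go through every detail here, only the parts I personally find important or interesting.
See, the class designer declares all the types related to string up front: value_type, size_type ...
_Rep is what represents the string.
The string class always uses _M_length + 1 characters and makes sure the last one is the null character '\0'.
When it allocates memory it always asks for a bit extra (_M_capacity); _M_capacity is greater than or equal to _M_length.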
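The figure above is the definition of struct _Rep_base. Note that it uses struct rather than class; since we are already under a private label, the whole struct is private.
The character data itself is stored at the location that _M_p points to.
Here are the interfaces for accessing the memory-related data members:
The figure below is the memory layout of the string class, as given in the author's comments.
"Wake up, kid! Hey!"
There's something really fun here.
You'll notice that struct _Alloc_hider derives from the allocator type _Alloc, and the _Alloc(__a) in its constructor is just the base-class initializer, not a heap allocation. Deriving from the (usually empty) allocator lets the empty base optimization apply, so the allocator costs no extra space and the only real data member is the pointer _M_p. That pointer lives inside the string object itself (on the stack for a local variable), while the _Rep header and the characters live on the heap. Thanks to @凯旋冲锋 here, couldn't be cooler... a very calm analysis of what the author intended.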
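This memory layout enables an interesting trick: starting from the member pointer _M_p, you can get back to the head of the _Rep struct.
To stress it once more: the author drew the variables in two columns, left and right, and that is deliberate.
The left column lives in the string object (on the stack for a local variable), and so does the pointer _M_p! The pointer points into heap memory, right at the end of the _Rep struct!
You'll notice this thing indexes element -1. What the heck~
Here the pointer is cast to a pointer to _Rep, so index -1 steps back exactly one _Rep and lands on the header that sits just before the characters.
The pointer returned here points to the part of the heap block, right after _Rep, that holds the string's characters.
This trick is very much like one I ran into before.
Only here it is used on a class rather than a struct, and the last accessible position is reached through a separate pointer.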
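Note that struct _Rep derives from _Rep_base.
After that we can see the string class's capacity interface, capacity().
Here the STL authors wrote two interfaces that do the same thing, probably for programmers' convenience. size() and length() mean the same: both return the current length of the string, not counting the null character, which is consistent with C's strlen().
Let's write a small demo to test it: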
    #include <iostream>
    #include <string>

    using namespace std;

    int main()
    {
        string str("hello world");

        cout << "capacity" << str.capacity() << endl;
        cout << "size " << str.size() << endl;
        cout << "length " << str.length() << endl;
        cout << "max_size" << str.max_size() << endl;

        return 0;
    }
You'll see the declarations of the various constructors:

    /**
     *  @brief  Construct an empty string using allocator @a a.
     */
    explicit
    basic_string(const _Alloc& __a);

    /**
     *  @brief  Construct string with copy of value of @a str.
     *  @param  __str  Source string.
     */
    basic_string(const basic_string& __str);

    /**
     *  @brief  Construct string as copy of a substring.
     *  @param  __str  Source string.
     *  @param  __pos  Index of first character to copy from.
     *  @param  __n  Number of characters to copy (default remainder).
     */
    basic_string(const basic_string& __str, size_type __pos,
                 size_type __n = npos);

    /**
     *  @brief  Construct string as copy of a substring.
     *  @param  __str  Source string.
     *  @param  __pos  Index of first character to copy from.
     *  @param  __n  Number of characters to copy.
     *  @param  __a  Allocator to use.
     */
    basic_string(const basic_string& __str, size_type __pos,
                 size_type __n, const _Alloc& __a);

    /**
     *  @brief  Construct string initialized by a character %array.
     *  @param  __s  Source character %array.
     *  @param  __n  Number of characters to copy.
     *  @param  __a  Allocator to use (default is default allocator).
     *
     *  NB: @a __s must have at least @a __n characters, '\\0'
     *  has no special meaning.
     */
    basic_string(const _CharT* __s, size_type __n,
                 const _Alloc& __a = _Alloc());

    /**
     *  @brief  Construct string as copy of a C string.
     *  @param  __s  Source C string.
     *  @param  __a  Allocator to use (default is default allocator).
     */
    basic_string(const _CharT* __s, const _Alloc& __a = _Alloc());

    /**
     *  @brief  Construct string as multiple characters.
     *  @param  __n  Number of characters.
     *  @param  __c  Character to use.
     *  @param  __a  Allocator to use (default is default allocator).
     */

Worth mentioning: the way we most often initialize a string is to hand it a C-style string as the argument. What actually gets called then is this one:

    basic_string(const _CharT* __s, const _Alloc& __a = _Alloc());

Above there are also constructors that take a reference to another string as the initializing argument.
We also see the clear() and empty() interfaces.
clear() erases the string so that it becomes empty, while empty() checks whether the string is empty.

    #include <iostream>
    #include <string>

    using namespace std;

    int main()
    {
        string str("hello");

        cout << "capacity" << str.capacity() << endl;
        cout << "size " << str.size() << endl;
        cout << "length " << str.length() << endl;
        cout << "max_size" << str.max_size() << endl;

        if (str.empty())
        {
            cout << "empty" << endl;
        }
        else
        {
            cout << "Not empty" << endl;
        }

        str.clear();

        if (str.empty())
        {
            cout << "empty" << endl;
        }
        else
        {
            cout << "Not empty" << endl;
        }

        return 0;
    }
Implementation of the destructor:

    /**
     *  @brief  Destroy the string instance.
     */
    ~basic_string() _GLIBCXX_NOEXCEPT
    { _M_rep()->_M_dispose(this->get_allocator()); }
Overloads of the assignment operator:

    /**
     *  @brief  Assign the value of @a str to this string.
     *  @param  __str  Source string.
     */
    basic_string&
    operator=(const basic_string& __str)
    { return this->assign(__str); }

    /**
     *  @brief  Copy contents of @a s into this string.
     *  @param  __s  Source null-terminated string.
     */
    basic_string&
    operator=(const _CharT* __s)
    { return this->assign(__s); }

    /**
     *  @brief  Set value to string of length 1.
     *  @param  __c  Source character.
     *
     *  Assigning to a character makes this string length 1 and
     *  (*this)[0] == @a c.
     */
    basic_string&
    operator=(_CharT __c)
    {
        this->assign(1, __c);
        return *this;
    }

We find assign() calls everywhere.
So let's dig into the implementation of assign.
    /**
     *  @brief  Set value to a substring of a string.
     *  @param  __str  The string to use.
     *  @param  __pos  Index of the first character of str.
     *  @param  __n  Number of characters to use.
     *  @return  Reference to this string.
     *  @throw  std::out_of_range if @a pos is not a valid index.
     *
     *  This function sets this string to the substring of @a __str
     *  consisting of @a __n characters at @a __pos.  If @a __n is
     *  is larger than the number of available characters in @a
     *  __str, the remainder of @a __str is used.
     */
    basic_string&
    assign(const basic_string& __str, size_type __pos, size_type __n)
    { return this->assign(__str._M_data()
                          + __str._M_check(__pos, "basic_string::assign"),
                          __str._M_limit(__pos, __n)); }

    /**
     *  @brief  Set value to a C substring.
     *  @param  __s  The C string to use.
     *  @param  __n  Number of characters to use.
     *  @return  Reference to this string.
     *
     *  This function sets the value of this string to the first @a __n
     *  characters of @a __s.  If @a __n is is larger than the number of
     *  available characters in @a __s, the remainder of @a __s is used.
     */
    basic_string&
    assign(const _CharT* __s, size_type __n);

    /**
     *  @brief  Set value to contents of a C string.
     *  @param  __s  The C string to use.
     *  @return  Reference to this string.
     *
     *  This function sets the value of this string to the value of @a __s.
     *  The data is copied, so there is no dependence on @a __s once the
     *  function returns.
     */
    basic_string&
    assign(const _CharT* __s)
    {
        __glibcxx_requires_string(__s);
        return this->assign(__s, traits_type::length(__s));
    }

    /**
     *  @brief  Set value to multiple characters.
     *  @param  __n  Length of the resulting string.
     *  @param  __c  The character to use.
     *  @return  Reference to this string.
     *
     *  This function sets the value of this string to @a __n copies of
     *  character @a __c.
     */
    basic_string&
    assign(size_type __n, _CharT __c)
    { return _M_replace_aux(size_type(0), this->size(), __n, __c); }

    /**
     *  @brief  Set value to a range of characters.
     *  @param  __first  Iterator referencing the first character to append.
     *  @param  __last  Iterator marking the end of the range.
     *  @return  Reference to this string.
     *
     *  Sets value of string to characters in the range [__first,__last).
     */
    template<class _InputIterator>
      basic_string&
      assign(_InputIterator __first, _InputIterator __last)
      { return this->replace(_M_ibegin(), _M_iend(), __first, __last); }
The string that the temporary __tmp points to becomes the new value of _M_p (set by calling _M_data).
Usually we use operator[ ] to index particular characters in a string. So how is [ ] implemented here?
string str("hello");
cout << str[1] << endl;
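See below: we already know that _M_data() is a private call that returns the address where the string is actually stored.
It returns a pointer, and subscripting a pointer is built into the compiler, just like in C~
Note that this kind of subscripting does not check for out-of-bounds access at run time. So, just as in C, it needs to be used with care.
The C++ STL gives us a safe way to access the elements of a string: at
Yes, you read that right, it's at, haha.
It simply adds one more check: if the index you try to access is not less than size(), it throws std::out_of_range.
As long as the check passes, the element is accessed by subscript as usual. So!!! I recommend using at to access string elements instead of using [ ] directly!!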
If you are using C++11,
there are also a few newly added interfaces you can use.
append has several overloads... below is one of them. The others follow the same idea.
First it checks whether the capacity is still enough (and whether the storage is shared); if not, it calls reserve to get memory, and then it copies the data pointed to by the pointer that str's _M_data() returns onto the tail of the string that append was called on.

    template<typename _CharT, typename _Traits, typename _Alloc>
      basic_string<_CharT, _Traits, _Alloc>&
      basic_string<_CharT, _Traits, _Alloc>::
      append(const basic_string& __str)
      {
        const size_type __size = __str.size();
        if (__size)
          {
            const size_type __len = __size + this->size();
            if (__len > this->capacity() || _M_rep()->_M_is_shared())
              this->reserve(__len);
            _M_copy(_M_data() + this->size(), __str._M_data(), __size);
            _M_rep()->_M_set_length_and_sharable(__len);
          }
        return *this;
      }
Below is the implementation of push_back. The idea is much the same as append above, except that there is only a single character this time. The strategy is almost identical.

    /**
     *  @brief  Append a single character.
     *  @param  __c  Character to append.
     */
    void
    push_back(_CharT __c)
    {
        const size_type __len = 1 + this->size();
        if (__len > this->capacity() || _M_rep()->_M_is_shared())
            this->reserve(__len);
        traits_type::assign(_M_data()[this->size()], __c);
        _M_rep()->_M_set_length_and_sharable(__len);
    }
The STL also overloads the += operator for the string class:
str += "I am EOF\n";
cout << str << endl;
As follows: it is essentially built on append and push_back.
Look at this line of code:
cout << str.insert(3, "42") << endl;
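There is a handy insert interface for inserting into a string.
Below is part of the implementation of insert (and there are all kinds of overloads!! Damn ( ‵o′)凸)
Likewise, the implementation of insert is in bits/basic_string.tcc.
Whether the insert position __pos is valid, i.e. not out of range, is checked by _M_check.
_M_check_length checks whether the resulting string would exceed max_size(); if it would, it throws std::length_error right away.
If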
The string class also provides all kinds of find() lookup methods.
    /**
     *  @brief  Find position of a C substring.
     *  @param  __s  C string to locate.
     *  @param  __pos  Index of character to search from.
     *  @param  __n  Number of characters from @a s to search for.
     *  @return  Index of start of first occurrence.
     *
     *  Starting from @a __pos, searches forward for the first @a
     *  __n characters in @a __s within this string.  If found,
     *  returns the index where it begins.  If not found, returns
     *  npos.
     */
    size_type
    find(const _CharT* __s, size_type __pos, size_type __n) const;

    /**
     *  @brief  Find position of a string.
     *  @param  __str  String to locate.
     *  @param  __pos  Index of character to search from (default 0).
     *  @return  Index of start of first occurrence.
     *
     *  Starting from @a __pos, searches forward for value of @a __str within
     *  this string.  If found, returns the index where it begins.  If not
     *  found, returns npos.
     */
    size_type
    find(const basic_string& __str, size_type __pos = 0) const
      _GLIBCXX_NOEXCEPT
    { return this->find(__str.data(), __pos, __str.size()); }

    /**
     *  @brief  Find position of a C string.
     *  @param  __s  C string to locate.
     *  @param  __pos  Index of character to search from (default 0).
     *  @return  Index of start of first occurrence.
     *
     *  Starting from @a __pos, searches forward for the value of @a
     *  __s within this string.  If found, returns the index where
     *  it begins.  If not found, returns npos.
     */
    size_type
    find(const _CharT* __s, size_type __pos = 0) const
    {
        __glibcxx_requires_string(__s);
        return this->find(__s, __pos, traits_type::length(__s));
    }

    /**
     *  @brief  Find position of a character.
     *  @param  __c  Character to locate.
     *  @param  __pos  Index of character to search from (default 0).
     *  @return  Index of first occurrence.
     *
     *  Starting from @a __pos, searches forward for @a __c within
     *  this string.  If found, returns the index where it was
     *  found.  If not found, returns npos.
     */
    size_type
    find(_CharT __c, size_type __pos = 0) const _GLIBCXX_NOEXCEPT;

    /**
     *  @brief  Find last position of a string.
     *  @param  __str  String to locate.
     *  @param  __pos  Index of character to search back from (default end).
     *  @return  Index of start of last occurrence.
     *
     *  Starting from @a __pos, searches backward for value of @a
     *  __str within this string.  If found, returns the index where
     *  it begins.  If not found, returns npos.
     */
    size_type
    rfind(const basic_string& __str, size_type __pos = npos) const
      _GLIBCXX_NOEXCEPT
    { return this->rfind(__str.data(), __pos, __str.size()); }

    /**
     *  @brief  Find last position of a C substring.
     *  @param  __s  C string to locate.
     *  @param  __pos  Index of character to search back from.
     *  @param  __n  Number of characters from s to search for.
     *  @return  Index of start of last occurrence.
     *
     *  Starting from @a __pos, searches backward for the first @a
     *  __n characters in @a __s within this string.  If found,
     *  returns the index where it begins.  If not found, returns
     *  npos.
     */
    size_type
    rfind(const _CharT* __s, size_type __pos, size_type __n) const;

    /**
     *  @brief  Find last position of a C string.
     *  @param  __s  C string to locate.
     *  @param  __pos  Index of character to start search at (default end).
     *  @return  Index of start of last occurrence.
     *
     *  Starting from @a __pos, searches backward for the value of
     *  @a __s within this string.  If found, returns the index
     *  where it begins.  If not found, returns npos.
     */
    size_type
    rfind(const _CharT* __s, size_type __pos = npos) const
    {
        __glibcxx_requires_string(__s);
        return this->rfind(__s, __pos, traits_type::length(__s));
    }

    /**
     *  @brief  Find last position of a character.
     *  @param  __c  Character to locate.
     *  @param  __pos  Index of character to search back from (default end).
     *  @return  Index of last occurrence.
     *
     *  Starting from @a __pos, searches backward for @a __c within
     *  this string.  If found, returns the index where it was
     *  found.  If not found, returns npos.
     */
    size_type
    rfind(_CharT __c, size_type __pos = npos) const _GLIBCXX_NOEXCEPT;

    /**
     *  @brief  Find position of a character of string.
     *  @param  __str  String containing characters to locate.
     *  @param  __pos  Index of character to search from (default 0).
     *  @return  Index of first occurrence.
     *
     *  Starting from @a __pos, searches forward for one of the
     *  characters of @a __str within this string.  If found,
     *  returns the index where it was found.  If not found, returns
     *  npos.
     */
    size_type
    find_first_of(const basic_string& __str, size_type __pos = 0) const
      _GLIBCXX_NOEXCEPT
    { return this->find_first_of(__str.data(), __pos, __str.size()); }

    /**
     *  @brief  Find position of a character of C substring.
     *  @param  __s  String containing characters to locate.
     *  @param  __pos  Index of character to search from.
     *  @param  __n  Number of characters from s to search for.
     *  @return  Index of first occurrence.
     *
     *  Starting from @a __pos, searches forward for one of the
     *  first @a __n characters of @a __s within this string.  If
     *  found, returns the index where it was found.  If not found,
     *  returns npos.
     */
    size_type
    find_first_of(const _CharT* __s, size_type __pos, size_type __n) const;

    /**
     *  @brief  Find position of a character of C string.
     *  @param  __s  String containing characters to locate.
     *  @param  __pos  Index of character to search from (default 0).
     *  @return  Index of first occurrence.
     *
     *  Starting from @a __pos, searches forward for one of the
     *  characters of @a __s within this string.  If found, returns
     *  the index where it was found.  If not found, returns npos.
     */
    size_type
    find_first_of(const _CharT* __s, size_type __pos = 0) const
    {
        __glibcxx_requires_string(__s);
        return this->find_first_of(__s, __pos, traits_type::length(__s));
    }

    /**
     *  @brief  Find position of a character.
     *  @param  __c  Character to locate.
     *  @param  __pos  Index of character to search from (default 0).
     *  @return  Index of first occurrence.
     *
     *  Starting from @a __pos, searches forward for the character
     *  @a __c within this string.  If found, returns the index
     *  where it was found.  If not found, returns npos.
     *
     *  Note: equivalent to find(__c, __pos).
     */
    size_type
    find_first_of(_CharT __c, size_type __pos = 0) const _GLIBCXX_NOEXCEPT
    { return this->find(__c, __pos); }

    /**
     *  @brief  Find last position of a character of string.
     *  @param  __str  String containing characters to locate.
     *  @param  __pos  Index of character to search back from (default end).
     *  @return  Index of last occurrence.
     *
     *  Starting from @a __pos, searches backward for one of the
     *  characters of @a __str within this string.  If found,
     *  returns the index where it was found.  If not found, returns
     *  npos.
     */
    size_type
    find_last_of(const basic_string& __str, size_type __pos = npos) const
      _GLIBCXX_NOEXCEPT
    { return this->find_last_of(__str.data(), __pos, __str.size()); }

    /**
     *  @brief  Find last position of a character of C substring.
     *  @param  __s  C string containing characters to locate.
     *  @param  __pos  Index of character to search back from.
     *  @param  __n  Number of characters from s to search for.
     *  @return  Index of last occurrence.
     *
     *  Starting from @a __pos, searches backward for one of the
     *  first @a __n characters of @a __s within this string.  If
     *  found, returns the index where it was found.  If not found,
     *  returns npos.
     */
    size_type
    find_last_of(const _CharT* __s, size_type __pos, size_type __n) const;

    /**
     *  @brief  Find last position of a character of C string.
     *  @param  __s  C string containing characters to locate.
     *  @param  __pos  Index of character to search back from (default end).
     *  @return  Index of last occurrence.
     *
     *  Starting from @a __pos, searches backward for one of the
     *  characters of @a __s within this string.  If found, returns
     *  the index where it was found.  If not found, returns npos.
     */
    size_type
    find_last_of(const _CharT* __s, size_type __pos = npos) const
    {
        __glibcxx_requires_string(__s);
        return this->find_last_of(__s, __pos, traits_type::length(__s));
    }

    /**
     *  @brief  Find last position of a character.
     *  @param  __c  Character to locate.
     *  @param  __pos  Index of character to search back from (default end).
     *  @return  Index of last occurrence.
     *
     *  Starting from @a __pos, searches backward for @a __c within
     *  this string.  If found, returns the index where it was
     *  found.  If not found, returns npos.
     *
     *  Note: equivalent to rfind(__c, __pos).
     */
    size_type
    find_last_of(_CharT __c, size_type __pos = npos) const _GLIBCXX_NOEXCEPT
    { return this->rfind(__c, __pos); }

    /**
     *  @brief  Find position of a character not in string.
     *  @param  __str  String containing characters to avoid.
     *  @param  __pos  Index of character to search from (default 0).
     *  @return  Index of first occurrence.
     *
     *  Starting from @a __pos, searches forward for a character not contained
     *  in @a __str within this string.  If found, returns the index where it
     *  was found.  If not found, returns npos.
     */
    size_type
    find_first_not_of(const basic_string& __str, size_type __pos = 0) const
      _GLIBCXX_NOEXCEPT
    { return this->find_first_not_of(__str.data(), __pos, __str.size()); }

    /**
     *  @brief  Find position of a character not in C substring.
     *  @param  __s  C string containing characters to avoid.
     *  @param  __pos  Index of character to search from.
     *  @param  __n  Number of characters from __s to consider.
     *  @return  Index of first occurrence.
     *
     *  Starting from @a __pos, searches forward for a character not
     *  contained in the first @a __n characters of @a __s within
     *  this string.  If found, returns the index where it was
     *  found.  If not found, returns npos.
     */
    size_type
    find_first_not_of(const _CharT* __s, size_type __pos,
                      size_type __n) const;

    /**
     *  @brief  Find position of a character not in C string.
     *  @param  __s  C string containing characters to avoid.
     *  @param  __pos  Index of character to search from (default 0).
     *  @return  Index of first occurrence.
     *
     *  Starting from @a __pos, searches forward for a character not
     *  contained in @a __s within this string.  If found, returns
     *  the index where it was found.  If not found, returns npos.
     */
    size_type
    find_first_not_of(const _CharT* __s, size_type __pos = 0) const
    {
        __glibcxx_requires_string(__s);
        return this->find_first_not_of(__s, __pos, traits_type::length(__s));
    }

    /**
     *  @brief  Find position of a different character.
     *  @param  __c  Character to avoid.
     *  @param  __pos  Index of character to search from (default 0).
     *  @return  Index of first occurrence.
     *
     *  Starting from @a __pos, searches forward for a character
     *  other than @a __c within this string.  If found, returns the
     *  index where it was found.  If not found, returns npos.
     */
    size_type
    find_first_not_of(_CharT __c, size_type __pos = 0) const
      _GLIBCXX_NOEXCEPT;

    /**
     *  @brief  Find last position of a character not in string.
     *  @param  __str  String containing characters to avoid.
     *  @param  __pos  Index of character to search back from (default end).
     *  @return  Index of last occurrence.
     *
     *  Starting from @a __pos, searches backward for a character
     *  not contained in @a __str within this string.  If found,
     *  returns the index where it was found.  If not found, returns
     *  npos.
     */
    size_type
    find_last_not_of(const basic_string& __str, size_type __pos = npos) const
      _GLIBCXX_NOEXCEPT
    { return this->find_last_not_of(__str.data(), __pos, __str.size()); }

    /**
     *  @brief  Find last position of a character not in C substring.
     *  @param  __s  C string containing characters to avoid.
     *  @param  __pos  Index of character to search back from.
     *  @param  __n  Number of characters from s to consider.
     *  @return  Index of last occurrence.
     *
     *  Starting from @a __pos, searches backward for a character not
     *  contained in the first @a __n characters of @a __s within this string.
     *  If found, returns the index where it was found.  If not found,
     *  returns npos.
     */
    size_type
    find_last_not_of(const _CharT* __s, size_type __pos,
                     size_type __n) const;

    /**
     *  @brief  Find last position of a character not in C string.
     *  @param  __s  C string containing characters to avoid.
     *  @param  __pos  Index of character to search back from (default end).
     *  @return  Index of last occurrence.
     *
     *  Starting from @a __pos, searches backward for a character
     *  not contained in @a __s within this string.  If found,
     *  returns the index where it was found.  If not found, returns
     *  npos.
     */
    size_type
    find_last_not_of(const _CharT* __s, size_type __pos = npos) const
    {
        __glibcxx_requires_string(__s);
        return this->find_last_not_of(__s, __pos, traits_type::length(__s));
    }

    /**
     *  @brief  Find last position of a different character.
     *  @param  __c  Character to avoid.
     *  @param  __pos  Index of character to search back from (default end).
     *  @return  Index of last occurrence.
     *
     *  Starting from @a __pos, searches backward for a character other than
     *  @a __c within this string.  If found, returns the index where it was
     *  found.  If not found, returns npos.
     */
    size_type
    find_last_not_of(_CharT __c, size_type __pos = npos) const
      _GLIBCXX_NOEXCEPT;
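All kinds of overloads, there's bound to be one you like~\(≧▽≦)/~ la la la~
Below is a small demo that calls find. The search starts at character offset 7, and the third argument 1 means that only the first 1 character of "e" is used as the pattern (it is the pattern length, not the number of matches to find). If nothing matches between that offset and the end of the string, find returns npos; it does not keep reading past the string's storage.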
    #include <iostream>
    #include <string>

    using namespace std;

    int main()
    {
        string str("hello world e");

        cout << str.find("e", 7, 1) << endl;

        return 0;
    }
This prints 12. The starting position deliberately skips the first e (index 1), so what we are after is the second e, and find returns its index.
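At first I was completely lost, but it slowly got better. It's just a matter of putting in the time~
Use global to locate where the relevant names are defined, and then finding the definitions becomes much easier. ctags feels completely useless for C++ definitions...
Also, the string class is really implemented as basic_string, and that is concentrated in bits/basic_string.h, bits/stringfwd.h and bits/basic_string.tcc. The parts related to constructing the class are implemented in the header file, while the string algorithms, such as find, compare and so on, are implemented in the tcc file.
Any suggestions, criticism and guidance are welcome! (Of course I know notes like these are usually written for oneself... haha, they're a bit messy. Still, I hope anyone who reads this and is interested will join in discussing how the STL is implemented.)
One more thought: part of why C++ is hard may be that many people are impatient (I'm impatient and lazy too, QAQ). Much of the time we only know how to call things without knowing how they are implemented. Something built like a castle in the air wobbles, and once it wobbles you're in trouble... and it gets hard to pin down what bizarre bug the code has actually hit.
Chen Hao (陈皓) said that 90% of the pitfalls belong to C and C++ only adds another 10%. Treat C++ as a fierce beast. Let's encourage each other~
Updated at 5 p.m. on 2015.03.30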