红黑树的实现与讨论
红黑树是比较常用的二叉查找树,也是比较难实现的查找树。他的实质是把2-3-4树(即4阶的B树)化为二叉树的表示形式,具有较好的性能。一般算法书(包括CLRS的算法导论)讲述red-black tree都讲得很头晕,特别是各种不同情况的讨论。最好看看红黑树的作者之一sedgewick的pdf( 参见此处)。 另外,cnblogs的abatei兄也把原理讲得很清楚( 点击这里)。由于red-black tree的复杂性,sedgewick在08年重新审视了红黑树的实现, 提出了 left-leaning red-black tree . 大大简化了red-black tree的实现难度,代码非常简洁清晰。不过外国某人实现了LLRB,插入和删除的效率大概比标准红黑树慢一倍左右,搜索效率相当。
自上而下的递归版本的代码量少很多,而且很清晰,但是效率不及自下以上的迭代实现。
为了进一步节省空间,可以把color信息合并到parent指针中。这是由于指针以4字节对齐,最低两位必然为0,颜色值只有红,黑两种状态,于是可以利用最低的1bit保存颜色信息,只需要定义一些 位运算的宏,便可以节省了4byte/结点的空间 。
#define
rb_parent(r) ((struct rb_node *)((r)->rb_parent_color & ~3))
#define rb_color(r) ((r)->rb_parent_color & 1)
#define rb_is_red(r) (!rb_color(r))
#define rb_is_black(r) rb_color(r)
#define rb_set_red(r) do { (r)->rb_parent_color &= ~1; } while (0)
#define rb_set_black(r) do { (r)->rb_parent_color |= 1; } while (0)
#define rb_color(r) ((r)->rb_parent_color & 1)
#define rb_is_red(r) (!rb_color(r))
#define rb_is_black(r) rb_color(r)
#define rb_set_red(r) do { (r)->rb_parent_color &= ~1; } while (0)
#define rb_set_black(r) do { (r)->rb_parent_color |= 1; } while (0)
自己根据linux kernel中的rbtree.c(非递归),做了一层薄薄的C++模板封装,并对比了不同实现的效率。(本实现中没有把color合并到指针中)
(我的rbtree源代码下载)
测试环境:
Intel celeron 2.4GHz / 1GB 内存 / winXP
编译环境: VC++ 6.0 和 Dev c++ 4.9.9
结果:
测试数据分别是50000和5000个随机样本, 时间为所有样本操作总和,单位为ms
my rbtree | gcc STL map | VC6 STL map | 线性搜索 | |
insert | 106.2 / 5.11 | 173.6 / 8.17 | 402.7 / 40.16 | 0 |
search | 73.9 / 1.69 | 89.7 / 3.28 | 226.9 / 17.8 | 3936.8 / 32.8 |
delete | 131.1 / 5.07 | 177.1 / 9.56 | 579.9 / 47.9 |
对于5000个以内的样本,搜索5000个元素总共才用了几毫秒,已经可以满足一般key查找应用了。
由于rbtree的实现复杂性,AVL树和Treaps的实际效率并不比red-black tree差,但是实现要简单多了。( 参见此处)
在本测试例子中,key是一个范围内的整数,用hash table会快得到,这样不太公平,因此就没测了。而且hash的数据是无序的,不支持范围查找。
PS: VC6的PJ版STL实现可以扔进垃圾桶了。STL的实现还是用SGI版本的比较好。
以下是benchmark的代码
1
struct
SampleType
{
2 char ch[8];
3} ;
4
5 struct SampleType testvalue;
6
7 void benchmark_once( int nkeys)
8 {
9 int i;
10 int* key_set[3]; //key_set[0] for insert, 1 for search, 2 for erase
11 RbTree<int, SampleType> my_rbt;
12 std::map<int, SampleType> stl_map;
13 std::map<int, SampleType>::iterator stl_iter;
14
15 //generate test sequence
16 for (i=0; i<3; i++) {
17 key_set[i] = new int[nkeys];
18 if (key_set[i] == NULL) exit(0);
19 }
20 for (i=0; i<nkeys; i++) {
21 key_set[0][i] = key_set[1][i] = key_set[2][i] = i;
22 }
23
24 srand(time(NULL));
25 random_shuffle(key_set[0], nkeys); //insert
26 random_shuffle(key_set[1], nkeys); //search
27 random_shuffle(key_set[2], nkeys); //erase
28
29 //begin test
30
31 LARGE_INTEGER litmp;
32 LONGLONG qbegin, qend;
33 double df_freq, df_time;
34 QueryPerformanceFrequency(&litmp);
35 df_freq = (double)litmp.QuadPart;
36
37 //my rbtree ////////////////////////////
38
39 //insert benchmark
40 QueryPerformanceCounter(&litmp);
41 qbegin = litmp.QuadPart;
42 for (i=0; i<nkeys; i++) {
43 my_rbt.insert(key_set[0][i], testvalue);
44 }
45
46 QueryPerformanceCounter(&litmp);
47 qend = litmp.QuadPart;
48 df_time = (double)(qend - qbegin) / df_freq;
49 printf("my rbtree insert: %lf ms\n", df_time*1000);
50
51 //search benchmark
52 QueryPerformanceCounter(&litmp);
53 qbegin = litmp.QuadPart;
54
55 for (i=0; i<nkeys; i++) {
56 my_rbt.search(key_set[1][i]);
57 }
58 QueryPerformanceCounter(&litmp);
59 qend = litmp.QuadPart;
60 df_time = (double)(qend - qbegin) / df_freq;
61 printf("my rbtree search: %lf ms\n", df_time*1000);
62
63 //erase benchmark
64 QueryPerformanceCounter(&litmp);
65 qbegin = litmp.QuadPart;
66
67 for (i=0; i<nkeys; i++) {
68 my_rbt.erase(key_set[2][i]);
69 }
70 QueryPerformanceCounter(&litmp);
71 qend = litmp.QuadPart;
72 df_time = (double)(qend - qbegin) / df_freq;
73 printf("my rbtree erase: %lf ms\n", df_time*1000);
74
75 //VC++6 version STL ////////////////////////////
76 //insert test
77 QueryPerformanceCounter(&litmp);
78 qbegin = litmp.QuadPart;
79 for (i=0; i<nkeys; i++) {
80 stl_map.insert(std::make_pair(key_set[0][i], testvalue));
81 }
82
83 QueryPerformanceCounter(&litmp);
84 qend = litmp.QuadPart;
85 df_time = (double)(qend - qbegin) / df_freq;
86 printf("stl_map insert: %lf ms\n", df_time*1000);
87
88 //search test
89 QueryPerformanceCounter(&litmp);
90 qbegin = litmp.QuadPart;
91 for (i=0; i<nkeys; i++) {
92 stl_map.find(key_set[1][i]);
93 }
94
95 QueryPerformanceCounter(&litmp);
96 qend = litmp.QuadPart;
97 df_time = (double)(qend - qbegin) / df_freq;
98 printf("stl_map search: %lf ms\n", df_time*1000);
99
100 //erase test
101 QueryPerformanceCounter(&litmp);
102 qbegin = litmp.QuadPart;
103
104 for (i=0; i<nkeys; i++) {
105 stl_map.erase(key_set[2][i]);
106 }
107 QueryPerformanceCounter(&litmp);
108 qend = litmp.QuadPart;
109 df_time = (double)(qend - qbegin) / df_freq;
110 printf("stl_map erase: %lf ms\n", df_time*1000);
111
112 //clear up sequence
113 for (i=0; i<3; i++) {
114 delete[] key_set[i];
115 }
116}
2 char ch[8];
3} ;
4
5 struct SampleType testvalue;
6
7 void benchmark_once( int nkeys)
8 {
9 int i;
10 int* key_set[3]; //key_set[0] for insert, 1 for search, 2 for erase
11 RbTree<int, SampleType> my_rbt;
12 std::map<int, SampleType> stl_map;
13 std::map<int, SampleType>::iterator stl_iter;
14
15 //generate test sequence
16 for (i=0; i<3; i++) {
17 key_set[i] = new int[nkeys];
18 if (key_set[i] == NULL) exit(0);
19 }
20 for (i=0; i<nkeys; i++) {
21 key_set[0][i] = key_set[1][i] = key_set[2][i] = i;
22 }
23
24 srand(time(NULL));
25 random_shuffle(key_set[0], nkeys); //insert
26 random_shuffle(key_set[1], nkeys); //search
27 random_shuffle(key_set[2], nkeys); //erase
28
29 //begin test
30
31 LARGE_INTEGER litmp;
32 LONGLONG qbegin, qend;
33 double df_freq, df_time;
34 QueryPerformanceFrequency(&litmp);
35 df_freq = (double)litmp.QuadPart;
36
37 //my rbtree ////////////////////////////
38
39 //insert benchmark
40 QueryPerformanceCounter(&litmp);
41 qbegin = litmp.QuadPart;
42 for (i=0; i<nkeys; i++) {
43 my_rbt.insert(key_set[0][i], testvalue);
44 }
45
46 QueryPerformanceCounter(&litmp);
47 qend = litmp.QuadPart;
48 df_time = (double)(qend - qbegin) / df_freq;
49 printf("my rbtree insert: %lf ms\n", df_time*1000);
50
51 //search benchmark
52 QueryPerformanceCounter(&litmp);
53 qbegin = litmp.QuadPart;
54
55 for (i=0; i<nkeys; i++) {
56 my_rbt.search(key_set[1][i]);
57 }
58 QueryPerformanceCounter(&litmp);
59 qend = litmp.QuadPart;
60 df_time = (double)(qend - qbegin) / df_freq;
61 printf("my rbtree search: %lf ms\n", df_time*1000);
62
63 //erase benchmark
64 QueryPerformanceCounter(&litmp);
65 qbegin = litmp.QuadPart;
66
67 for (i=0; i<nkeys; i++) {
68 my_rbt.erase(key_set[2][i]);
69 }
70 QueryPerformanceCounter(&litmp);
71 qend = litmp.QuadPart;
72 df_time = (double)(qend - qbegin) / df_freq;
73 printf("my rbtree erase: %lf ms\n", df_time*1000);
74
75 //VC++6 version STL ////////////////////////////
76 //insert test
77 QueryPerformanceCounter(&litmp);
78 qbegin = litmp.QuadPart;
79 for (i=0; i<nkeys; i++) {
80 stl_map.insert(std::make_pair(key_set[0][i], testvalue));
81 }
82
83 QueryPerformanceCounter(&litmp);
84 qend = litmp.QuadPart;
85 df_time = (double)(qend - qbegin) / df_freq;
86 printf("stl_map insert: %lf ms\n", df_time*1000);
87
88 //search test
89 QueryPerformanceCounter(&litmp);
90 qbegin = litmp.QuadPart;
91 for (i=0; i<nkeys; i++) {
92 stl_map.find(key_set[1][i]);
93 }
94
95 QueryPerformanceCounter(&litmp);
96 qend = litmp.QuadPart;
97 df_time = (double)(qend - qbegin) / df_freq;
98 printf("stl_map search: %lf ms\n", df_time*1000);
99
100 //erase test
101 QueryPerformanceCounter(&litmp);
102 qbegin = litmp.QuadPart;
103
104 for (i=0; i<nkeys; i++) {
105 stl_map.erase(key_set[2][i]);
106 }
107 QueryPerformanceCounter(&litmp);
108 qend = litmp.QuadPart;
109 df_time = (double)(qend - qbegin) / df_freq;
110 printf("stl_map erase: %lf ms\n", df_time*1000);
111
112 //clear up sequence
113 for (i=0; i<3; i++) {
114 delete[] key_set[i];
115 }
116}