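Trees are one of the most important data structures in computing; organizing data as a tree can make storing and reading it more efficient.
There are many kinds of trees: binary trees, complete binary trees, red-black trees, B-trees, B+ trees, full binary trees, binary search trees, balanced binary trees, AVL trees, heaps, and so on.
In a binary tree every node has at most two children, distinguished as the left child and the right child.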
If every level of a binary tree holds the maximum number of nodes, the tree is called a full binary tree. A full binary tree of depth n has this many nodes:
$2^n - 1$
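A binary tree is called a complete binary tree when leaves appear only on the last two levels, the leaves of the last level are packed to the left, and the leaves of the second-to-last level are contiguous on the right.
Because each node of a binary tree has at most two children, a tree holding 1023 nodes needs at least 10 levels, and a deeper tree means more comparisons per lookup.
If a binary tree is used to organize data on disk, with every node stored on disk, then each comparison is followed by another disk seek to reach the next node, which is extremely slow.
This motivates a data structure that reduces the height: the multiway tree.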
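The effect of the fan-out on height can be checked with a few lines of C. This is a minimal sketch, assuming every node has `fanout` children and stores `fanout - 1` keys, so a full tree of h levels stores fanout^h - 1 keys; the name `min_levels` is made up for this illustration.

```c
/* Smallest number of levels needed to store `keys` keys when every node
 * has `fanout` children and holds (fanout - 1) keys.
 * A full tree of h levels stores fanout^h - 1 keys. */
int min_levels(unsigned long long keys, unsigned fanout)
{
    int levels = 0;
    unsigned long long capacity = 0;   /* keys stored by a full tree of `levels` levels */
    while (capacity < keys) {
        capacity = capacity * fanout + (fanout - 1);
        levels++;
    }
    return levels;
}
```

With these assumptions, `min_levels(1023, 2)` is 10, matching the binary-tree figure above, while `min_levels(1023, 32)` is 2: the same keys fit in two levels of a 32-way tree.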
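Multiway tree vs. B-tree: "multiway" only says that a node may have more than two branches, whereas a B-tree additionally requires all leaf nodes to be on the same level.
B+ tree: internal nodes store no data and serve only as an index; all data is stored in the leaf nodes.
In a B-tree, by contrast, every node stores data.
B+ trees are better suited to on-disk indexes than B-trees: because internal nodes hold no data, more keys fit in each node, so the fan-out is larger and the tree is shallower.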
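A rough calculation shows how much the fan-out differs. The sketch below assumes a fixed-size page with no header, fixed-size keys, values and child pointers; the function names and the sizes in the usage note are hypothetical and only serve the illustration.

```c
#include <stddef.h>

/* Keys that fit in one B-tree node when every key carries its value inline:
 * k keys, k values and k + 1 child pointers must fit in `page` bytes. */
size_t btree_keys_per_page(size_t page, size_t key, size_t val, size_t ptr)
{
    return (page - ptr) / (key + val + ptr);
}

/* Keys that fit in one B+ tree internal node:
 * k keys and k + 1 child pointers, no values. */
size_t bplus_inner_keys_per_page(size_t page, size_t key, size_t ptr)
{
    return (page - ptr) / (key + ptr);
}
```

For a 4096-byte page with 8-byte keys, 8-byte pointers and 112-byte values, the first function gives 31 keys per node and the second gives 255, so under these assumptions a B+ tree internal node fans out roughly eight times wider.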
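An order-M B-tree T satisfies the following conditions:
(1) Every node has at most M subtrees.
(2) The root has at least two subtrees, unless it is itself a leaf.
(3) Every internal node other than the root has at least ceil(M/2) subtrees.
(4) All leaf nodes are on the same level.
(5) An internal node with K subtrees holds K-1 keys, sorted in ascending order.
(6) The number of keys n in a non-root node satisfies ceil(M/2)-1 <= n <= M-1.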
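Condition (6) can be written as a small predicate. This is a sketch only: `ceil_half` and `key_count_ok` are names invented here, and the root is assumed to be exempt from the lower bound but to hold at least one key (an empty tree is not modelled).

```c
/* ceil(M / 2) for a positive order M */
int ceil_half(int M)
{
    return (M + 1) / 2;
}

/* Returns 1 if a node of an order-M B-tree may hold n keys, 0 otherwise.
 * Assumption: the root only needs one key; other nodes need ceil(M/2) - 1. */
int key_count_ok(int M, int n, int is_root)
{
    int min_keys = is_root ? 1 : ceil_half(M) - 1;
    int max_keys = M - 1;
    return n >= min_keys && n <= max_keys;
}
```

For M = 6 a non-root node may hold between 2 and 5 keys, so `key_count_ok(6, 1, 0)` is 0 while `key_count_ok(6, 1, 1)` is 1.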
#include <stdio.h>
#include <stdlib.h>
typedef int KEY_VALUE;
typedef struct _btree_node {
//int keys[2 * SUB_M - 1]; // keys, at most M-1
KEY_VALUE *keys; // keys
// void *value; // associated data
struct _btree_node **childrens; // child pointers, at most M
int num; // number of keys currently stored
int leaf; // 1 if this node is a leaf
}btree_node;
typedef struct _btree {
btree_node *root;
int t; // minimum degree: t = M/2 for an order-M tree
}btree;
// Allocate a node with room for 2t-1 keys and 2t children
btree_node *btree_create_node(int t,int leaf)
{
btree_node *node = (btree_node *)calloc(1, sizeof(btree_node));
if (node == NULL)
return NULL;
node->keys = (KEY_VALUE *)calloc(1, (2 * t - 1)*sizeof(KEY_VALUE));
if (node->keys == NULL)
{
free(node);
return NULL;
}
node->childrens = (btree_node **)calloc(1, (2 * t)*sizeof(btree_node*));
if (node->childrens == NULL)
{
free(node->keys);
free(node);
return NULL;
}
node->leaf = leaf;
node->num = 0;
return node;
}
// Initialize the tree with an empty leaf as root
void btree_create(btree *T, int t) {
T->t = t;
btree_node *x = btree_create_node(t, 1);
T->root = x;
}
void btree_destroy_node(btree_node *node)
{
if (node == NULL)
return;
if (node->childrens != NULL)
free(node->childrens);
if (node->keys != NULL)
free(node->keys);
free(node);
}
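Suppose T is an order-M B-tree. When the number of keys in a node reaches M, the node must be split: the key at index M/2 moves up into the parent, keys 0 ~ (M/2-1) form one subtree, and keys (M/2+1) ~ (M-1) form another. If the parent's key count thereby reaches M, the parent is split as well. The code that follows splits preemptively: a child that already holds 2t-1 = M-1 keys is split before the insertion descends into it, so the key at index t-1 moves up and each half keeps t-1 keys.
[Figure: node split]
[Figure: root split, M=6]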
// Split the full child x->childrens[idx]; x itself must not be full
void btree_split_child(btree *T, btree_node *x, int idx)
{
int t = T->t;
btree_node *y = x->childrens[idx];
btree_node *z = btree_create_node(t,y->leaf);
z->num = t - 1;
int i = 0;
for (i = 0; i < t - 1; i++)
z->keys[i] = y->keys[t + i];
if (y->leaf == 0) // internal node: move the upper t children as well
{
for (i = 0; i < t; i++)
z->childrens[i] = y->childrens[t + i];
}
y->num = t-1;
// shift x's children right and link z directly after y
for (i = x->num; i >= idx + 1; i--)
{
x->childrens[i + 1] = x->childrens[i];
}
x->childrens[idx + 1] = z;
// shift x's keys right and move y's median key up into x
for (i = x->num-1; i >= idx; i--)
{
x->keys[i + 1] = x->keys[i];
}
x->keys[idx] = y->keys[t-1];
x->num += 1;
}
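The key movement in a split can be isolated from the pointer handling. The sketch below works on plain arrays and assumes the preemptive scheme with minimum degree t, where a full node holds 2t-1 keys; `split_keys` is a hypothetical helper, not part of the listing.

```c
#include <string.h>

/* Split the 2t-1 sorted keys of a full node into a left half, a median
 * and a right half. left and right must each have room for t-1 keys. */
void split_keys(const int *keys, int t, int *left, int *median, int *right)
{
    memcpy(left, keys, (size_t)(t - 1) * sizeof(int));       /* keys[0 .. t-2] stay in the old node */
    *median = keys[t - 1];                                   /* keys[t-1] moves up to the parent    */
    memcpy(right, keys + t, (size_t)(t - 1) * sizeof(int));  /* keys[t .. 2t-2] go to the new node  */
}
```

With t = 3 and keys {10, 20, 30, 40, 50}, the left half is {10, 20}, the median 30 moves up, and the right half is {40, 50}.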
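Split first, then insert. A new key is always added to a leaf; on the way down, every full node is split before the insertion descends into it. When the root itself is full it is split too, which is how the tree grows in height.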
// Insert k into the subtree rooted at x; x must not be full
void btree_insert_nonfull(btree *T, btree_node *x, KEY_VALUE k) {
int i = x->num - 1;
if (x->leaf == 1) {
while (i >= 0 && x->keys[i] > k) {
x->keys[i + 1] = x->keys[i];
i--;
}
x->keys[i + 1] = k;
x->num += 1;
}
else {
while (i >= 0 && x->keys[i] > k) i--;
if (x->childrens[i + 1]->num == (2 * (T->t)) - 1) {
btree_split_child(T, x, i + 1);
if (k > x->keys[i + 1]) i++;
}
btree_insert_nonfull(T, x->childrens[i + 1], k);
}
}
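// Insert key; a full root is split first, which adds one level to the tree
void btree_insert(btree *T, KEY_VALUE key) {
btree_node *r = T->root;
if (r == NULL) { // the tree was emptied by deletions: start again with a fresh leaf
r = btree_create_node(T->t, 1);
if (r == NULL) return;
T->root = r;
}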
if (r->num == 2 * T->t - 1) {
btree_node *node = btree_create_node(T->t, 0);
T->root = node;
node->childrens[0] = r;
btree_split_child(T, node, 0);
int i = 0;
if (node->keys[0] < key) i++;
btree_insert_nonfull(T, node->childrens[i], key);
}
else {
btree_insert_nonfull(T, r, key);
}
}
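One way to check a sequence of insertions is an in-order walk, which should return the keys in ascending order. The following is a minimal sketch that repeats the node layout used above so that it compiles on its own; `btree_collect` is a hypothetical helper that does not appear in the original code.

```c
#include <stddef.h>

typedef int KEY_VALUE;
typedef struct _btree_node {
    KEY_VALUE *keys;                 /* keys */
    struct _btree_node **childrens;  /* child pointers */
    int num;                         /* number of keys currently stored */
    int leaf;                        /* 1 if this node is a leaf */
} btree_node;

/* Append the keys of the subtree rooted at node to out[] in ascending
 * order, starting at index pos. Returns the next free index. */
int btree_collect(const btree_node *node, KEY_VALUE *out, int pos)
{
    int i;
    if (node == NULL)
        return pos;
    for (i = 0; i < node->num; i++) {
        if (!node->leaf)                       /* subtree to the left of keys[i] */
            pos = btree_collect(node->childrens[i], out, pos);
        out[pos++] = node->keys[i];
    }
    if (!node->leaf)                           /* rightmost subtree */
        pos = btree_collect(node->childrens[node->num], out, pos);
    return pos;
}
```

The caller supplies an `out` array large enough for every key in the tree and can then compare neighbouring entries to confirm the ordering.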
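Suppose T is an order-M B-tree. When the child subtree that holds the key to delete has only the minimum number of keys, ceil(M/2)-1, and no adjacent sibling can lend one, it must be merged with a sibling.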
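/************************* merge *****************************/
// Merge x->childrens[idx], the separator x->keys[idx] and x->childrens[idx+1]; both children must hold t-1 keys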
void btree_merge(btree *T, btree_node *x, int idx)
{
btree_node *left = x->childrens[idx];
btree_node *right = x->childrens[idx + 1];
int i = 0;
// pull the separator down, then append right's keys
left->keys[T->t-1] = x->keys[idx];
for (i = 0; i < T->t-1; i++)
{
left->keys[T->t + i] = right->keys[i];
}
// internal node: move right's children as well
if (!left->leaf) {
for (i = 0; i < T->t; i++) {
left->childrens[T->t + i] = right->childrens[i];
}
}
left->num += T->t;
btree_destroy_node(right);
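// close the gap left in x by the separator and the right child
for (i = idx + 1; i < x->num; i++)
{
x->keys[i - 1] = x->keys[i];
x->childrens[i] = x->childrens[i + 1];
}
x->childrens[x->num] = NULL; // clear the vacated last slot
x->num -= 1;
if (x->num == 0) { // only the root can run out of keys: the tree loses one level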
T->root = left;
btree_destroy_node(x);
}
}
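Seen on plain arrays, a merge is the inverse of a split. The sketch below assumes minimum degree t with both siblings holding exactly t-1 keys; `merge_keys` is a hypothetical helper that ignores child pointers.

```c
#include <string.h>

/* Merge left (t-1 keys), the parent's separator and right (t-1 keys)
 * into left, which must have room for 2t-1 keys. Returns the new key count. */
int merge_keys(int *left, int sep, const int *right, int t)
{
    left[t - 1] = sep;                                        /* separator comes down  */
    memcpy(left + t, right, (size_t)(t - 1) * sizeof(int));   /* right's keys follow   */
    return 2 * t - 1;
}
```

With t = 3, left = {1, 2}, separator 3 and right = {4, 5}, the merged node holds {1, 2, 3, 4, 5}.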
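Search time complexity of a B-tree: $\log_n(m/n)$, where n is the number of branches per node and m, which the formula leaves undefined, is presumably the total number of keys.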
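The listing in this article has no lookup routine, so here is a minimal sketch of one, assuming the same node layout as above (repeated so the block compiles on its own). The name `btree_search` and its signature are assumptions made for this illustration. Each loop iteration visits one level, which is where the logarithmic cost comes from.

```c
#include <stddef.h>

typedef int KEY_VALUE;
typedef struct _btree_node {
    KEY_VALUE *keys;                 /* keys */
    struct _btree_node **childrens;  /* child pointers */
    int num;                         /* number of keys currently stored */
    int leaf;                        /* 1 if this node is a leaf */
} btree_node;

/* Returns the node that contains key, or NULL when the key is absent.
 * If pos is not NULL it receives the key's index inside that node. */
btree_node *btree_search(btree_node *node, KEY_VALUE key, int *pos)
{
    while (node != NULL) {
        int i = 0;
        while (i < node->num && key > node->keys[i])  /* linear scan inside the node */
            i++;
        if (i < node->num && key == node->keys[i]) {
            if (pos != NULL)
                *pos = i;
            return node;
        }
        if (node->leaf)                               /* nowhere left to descend */
            return NULL;
        node = node->childrens[i];                    /* keys[i-1] < key < keys[i] */
    }
    return NULL;
}
```

The inner scan is linear for brevity; a binary search over the node's keys would be a drop-in alternative for large nodes.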
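Borrowing: if a child has only M/2-1 keys, it must borrow a key from a sibling before the deletion descends into it, **so that it does not run short of keys.** If no sibling can lend one, merge instead.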
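Borrowing is a rotation through the parent: the separator moves down into the thin child and the sibling's nearest key moves up to replace it. The sketch below shows only the key movement for the left-sibling case on plain arrays; `borrow_from_left` is a hypothetical helper and child pointers are left out.

```c
/* Rotate one key from the left sibling through the parent into child.
 * left holds *left_n keys; child holds *child_n keys and has room for one more. */
void borrow_from_left(int *left, int *left_n, int *parent_key,
                      int *child, int *child_n)
{
    int i;
    for (i = *child_n; i > 0; i--)     /* make room at the front of child */
        child[i] = child[i - 1];
    child[0] = *parent_key;            /* separator moves down            */
    (*child_n)++;
    *parent_key = left[*left_n - 1];   /* left's largest key moves up     */
    (*left_n)--;
}
```

Starting from left = {1, 2, 3}, separator 5 and child = {7, 8}, one call leaves left = {1, 2}, separator 3 and child = {5, 7, 8}, so the ordering across the three nodes is preserved.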
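Deletion flow: if the B-tree is in a state where the key can be removed directly, remove it; otherwise merge or borrow first to bring the tree into such a state, then remove it. Below, t = M/2 is the minimum degree stored in T->t.
(1) The key is in a leaf node: delete it directly.
(2) The key is in an internal node and its left child holds at least t keys.
Borrow from the left subtree first (the key is replaced by its predecessor), then delete.
(3) The key is in an internal node and its right child holds at least t keys.
Borrow from the right subtree first (the key is replaced by its successor), then delete.
(4) Both the left and the right child hold t-1 keys.
Merge first, then delete.
(5) The key lies further down, the child to descend into holds only t-1 keys, and an adjacent sibling holds at least t keys.
Deleting the key directly (S in the original figure) would break the B-tree property that the key count satisfies ceil(M/2)-1 <= n <= M-1, so borrow (or, when no sibling can lend, merge) first, then delete.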
// Delete key from the subtree rooted at node, fixing up thin children on the way down
void btree_delete_key(btree *T, btree_node *node, KEY_VALUE key) {
if (node == NULL) return ;
int idx = 0, i;
while (idx < node->num && key > node->keys[idx]) {
idx ++;
}
if (idx < node->num && key == node->keys[idx]) {
if (node->leaf) {
for (i = idx;i < node->num-1;i ++) {
node->keys[i] = node->keys[i+1];
}
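node->keys[node->num - 1] = 0;
node->num--;
if (node->num == 0) { // only a root leaf can become empty
btree_destroy_node(node); // also releases the keys and childrens arrays
T->root = NULL;
}
return ;
} else if (node->childrens[idx]->num >= T->t) {
// replace key with its predecessor, the largest key of the left subtree
btree_node *left = node->childrens[idx];
btree_node *p = left;
while (!p->leaf) p = p->childrens[p->num];
KEY_VALUE pred = p->keys[p->num - 1];
node->keys[idx] = pred;
btree_delete_key(T, left, pred);
} else if (node->childrens[idx+1]->num >= T->t) {
// replace key with its successor, the smallest key of the right subtree
btree_node *right = node->childrens[idx+1];
btree_node *s = right;
while (!s->leaf) s = s->childrens[0];
KEY_VALUE succ = s->keys[0];
node->keys[idx] = succ;
btree_delete_key(T, right, succ);
} else {
// btree_merge may free node when it is the root, so save the child first
btree_node *left = node->childrens[idx];
btree_merge(T, node, idx);
btree_delete_key(T, left, key);
}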
} else {
btree_node *child = node->childrens[idx];
if (child == NULL) {
printf("Cannot del key = %d\n", key);
return ;
}
if (child->num == T->t - 1) {
btree_node *left = NULL;
btree_node *right = NULL;
if (idx - 1 >= 0)
left = node->childrens[idx-1];
if (idx + 1 <= node->num)
right = node->childrens[idx+1];
if ((left && left->num >= T->t) ||
(right && right->num >= T->t)) {
int richR = 0;
if (right) richR = 1;
if (left && right) richR = (right->num > left->num) ? 1 : 0;
if (right && right->num >= T->t && richR) { //borrow from next
child->keys[child->num] = node->keys[idx];
child->childrens[child->num+1] = right->childrens[0];
child->num ++;
node->keys[idx] = right->keys[0];
for (i = 0;i < right->num - 1;i ++) {
right->keys[i] = right->keys[i+1];
right->childrens[i] = right->childrens[i+1];
}
right->keys[right->num-1] = 0;
right->childrens[right->num-1] = right->childrens[right->num];
right->childrens[right->num] = NULL;
right->num --;
} else { //borrow from prev
for (i = child->num;i > 0;i --) {
child->keys[i] = child->keys[i-1];
child->childrens[i+1] = child->childrens[i];
}
child->childrens[1] = child->childrens[0];
child->childrens[0] = left->childrens[left->num];
child->keys[0] = node->keys[idx-1];
child->num ++;
node->keys[idx-1] = left->keys[left->num-1];
left->keys[left->num-1] = 0;
left->childrens[left->num] = NULL;
left->num --;
}
} else if ((!left || (left->num == T->t - 1))
&& (!right || (right->num == T->t - 1))) {
if (left && left->num == T->t - 1) {
btree_merge(T, node, idx-1);
child = left;
} else if (right && right->num == T->t - 1) {
btree_merge(T, node, idx);
}
}
}
btree_delete_key(T, child, key);
}
}
int btree_delete(btree *T, KEY_VALUE key) {
if (!T->root) return -1;
btree_delete_key(T, T->root, key);
return 0;
}
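// Binary search in node->keys[low..high]: returns the index of the first key >= key
// (high + 1 if every key is smaller), or -1 for an invalid range
int btree_bin_search(btree_node *node, int low, int high, KEY_VALUE key) {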
int mid;
if (low > high || low < 0 || high < 0) {
return -1;
}
while (low <= high) {
mid = (low + high) / 2;
if (key > node->keys[mid]) {
low = mid + 1;
} else {
high = mid - 1;
}
}
return low;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SUB_M 3 // M=6, and SUB_M=M/2
typedef int KEY_VALUE;
typedef struct _btree_node {
//int keys[2 * SUB_M - 1]; // keys, at most M-1
KEY_VALUE *keys; // keys
// void *value; // associated data
struct _btree_node **childrens; // child pointers, at most M
int num; // number of keys currently stored
int leaf; // 1 if this node is a leaf
}btree_node;
typedef struct _btree {
btree_node *root;
int t; // minimum degree: t = M/2 for an order-M tree
}btree;
btree_node *btree_create_node(int t,int leaf)
{
btree_node *node = (btree_node *)calloc(1, sizeof(btree_node));
if (node == NULL)
return NULL;
node->keys = (KEY_VALUE *)calloc(1, (2 * t - 1)*sizeof(KEY_VALUE));
if (node->keys == NULL)
{
free(node);
return NULL;
}
node->childrens = (btree_node **)calloc(1, (2 * t)*sizeof(btree_node*));
if (node->childrens == NULL)
{
free(node->keys);
free(node);
return NULL;
}
node->leaf = leaf;
node->num = 0;
return node;
}
void btree_destroy_node(btree_node *node)
{
if (node == NULL)
return;
if (node->childrens != NULL)
free(node->childrens);
if (node->keys != NULL)
free(node->keys);
free(node);
}
/********************** split ************************/
// Split the full child x->childrens[idx]; x itself must not be full
void btree_split_child(btree *T, btree_node *x, int idx)
{
int t = T->t;
btree_node *y = x->childrens[idx];
btree_node *z = btree_create_node(t,y->leaf);
z->num = t - 1;
int i = 0;
for (i = 0; i < t - 1; i++)
z->keys[i] = y->keys[t + i];
if (y->leaf == 0) // internal node: move the upper t children as well
{
for (i = 0; i < t; i++)
z->childrens[i] = y->childrens[t + i];
}
y->num = t-1;
// shift x's children right and link z directly after y
for (i = x->num; i >= idx + 1; i--)
{
x->childrens[i + 1] = x->childrens[i];
}
x->childrens[idx + 1] = z;
// shift x's keys right and move y's median key up into x
for (i = x->num-1; i >= idx; i--)
{
x->keys[i + 1] = x->keys[i];
}
x->keys[idx] = y->keys[t-1];
x->num += 1;
}
/************************* split end *****************************/
// Initialize the tree with an empty leaf as root
void btree_create(btree *T, int t) {
T->t = t;
btree_node *x = btree_create_node(t, 1);
T->root = x;
}
void btree_insert_nonfull(btree *T, btree_node *x, KEY_VALUE k) {
int i = x->num - 1;
if (x->leaf == 1) {
while (i >= 0 && x->keys[i] > k) {
x->keys[i + 1] = x->keys[i];
i--;
}
x->keys[i + 1] = k;
x->num += 1;
}
else {
while (i >= 0 && x->keys[i] > k) i--;
if (x->childrens[i + 1]->num == (2 * (T->t)) - 1) {
btree_split_child(T, x, i + 1);
if (k > x->keys[i + 1]) i++;
}
btree_insert_nonfull(T, x->childrens[i + 1], k);
}
}
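// Insert key; a full root is split first, which adds one level to the tree
void btree_insert(btree *T, KEY_VALUE key) {
btree_node *r = T->root;
if (r == NULL) { // the tree was emptied by deletions: start again with a fresh leaf
r = btree_create_node(T->t, 1);
if (r == NULL) return;
T->root = r;
}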
if (r->num == 2 * T->t - 1) {
btree_node *node = btree_create_node(T->t, 0);
T->root = node;
node->childrens[0] = r;
btree_split_child(T, node, 0);
int i = 0;
if (node->keys[0] < key) i++;
btree_insert_nonfull(T, node->childrens[i], key);
}
else {
btree_insert_nonfull(T, r, key);
}
}
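/************************* merge *****************************/
// Merge x->childrens[idx], the separator x->keys[idx] and x->childrens[idx+1]; both children must hold t-1 keys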
void btree_merge(btree *T, btree_node *x, int idx)
{
btree_node *left = x->childrens[idx];
btree_node *right = x->childrens[idx + 1];
int i = 0;
// pull the separator down, then append right's keys
left->keys[T->t-1] = x->keys[idx];
for (i = 0; i < T->t-1; i++)
{
left->keys[T->t + i] = right->keys[i];
}
// internal node: move right's children as well
if (!left->leaf) {
for (i = 0; i < T->t; i++) {
left->childrens[T->t + i] = right->childrens[i];
}
}
left->num += T->t;
btree_destroy_node(right);
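// close the gap left in x by the separator and the right child
for (i = idx + 1; i < x->num; i++)
{
x->keys[i - 1] = x->keys[i];
x->childrens[i] = x->childrens[i + 1];
}
x->childrens[x->num] = NULL; // clear the vacated last slot
x->num -= 1;
if (x->num == 0) { // only the root can run out of keys: the tree loses one level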
T->root = left;
btree_destroy_node(x);
}
}
void btree_delete_key(btree *T, btree_node *node, KEY_VALUE key) {
if (node == NULL) return;
int idx = 0, i;
while (idx < node->num && key > node->keys[idx]) {
idx++;
}
if (idx < node->num && key == node->keys[idx]) {
if (node->leaf) {
for (i = idx; i < node->num - 1; i++) {
node->keys[i] = node->keys[i + 1];
}
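node->keys[node->num - 1] = 0;
node->num--;
if (node->num == 0) { // only a root leaf can become empty
btree_destroy_node(node); // also releases the keys and childrens arrays
T->root = NULL;
}
return;
}
else if (node->childrens[idx]->num >= T->t) {
// replace key with its predecessor, the largest key of the left subtree
btree_node *left = node->childrens[idx];
btree_node *p = left;
while (!p->leaf) p = p->childrens[p->num];
KEY_VALUE pred = p->keys[p->num - 1];
node->keys[idx] = pred;
btree_delete_key(T, left, pred);
}
else if (node->childrens[idx + 1]->num >= T->t) {
// replace key with its successor, the smallest key of the right subtree
btree_node *right = node->childrens[idx + 1];
btree_node *s = right;
while (!s->leaf) s = s->childrens[0];
KEY_VALUE succ = s->keys[0];
node->keys[idx] = succ;
btree_delete_key(T, right, succ);
}
else {
// btree_merge may free node when it is the root, so save the child first
btree_node *left = node->childrens[idx];
btree_merge(T, node, idx);
btree_delete_key(T, left, key);
}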
}
else {
btree_node *child = node->childrens[idx];
if (child == NULL) {
printf("Cannot del key = %d\n", key);
return;
}
if (child->num == T->t - 1) {
btree_node *left = NULL;
btree_node *right = NULL;
if (idx - 1 >= 0)
left = node->childrens[idx - 1];
if (idx + 1 <= node->num)
right = node->childrens[idx + 1];
if ((left && left->num >= T->t) ||
(right && right->num >= T->t)) {
int richR = 0;
if (right) richR = 1;
if (left && right) richR = (right->num > left->num) ? 1 : 0;
if (right && right->num >= T->t && richR) { //borrow from next
child->keys[child->num] = node->keys[idx];
child->childrens[child->num + 1] = right->childrens[0];
child->num++;
node->keys[idx] = right->keys[0];
for (i = 0; i < right->num - 1; i++) {
right->keys[i] = right->keys[i + 1];
right->childrens[i] = right->childrens[i + 1];
}
right->keys[right->num - 1] = 0;
right->childrens[right->num - 1] = right->childrens[right->num];
right->childrens[right->num] = NULL;
right->num--;
}
else { //borrow from prev
for (i = child->num; i > 0; i--) {
child->keys[i] = child->keys[i - 1];
child->childrens[i + 1] = child->childrens[i];
}
child->childrens[1] = child->childrens[0];
child->childrens[0] = left->childrens[left->num];
child->keys[0] = node->keys[idx - 1];
child->num++;
node->keys[idx - 1] = left->keys[left->num - 1];
left->keys[left->num - 1] = 0;
left->childrens[left->num] = NULL;
left->num--;
}
}
else if ((!left || (left->num == T->t - 1))
&& (!right || (right->num == T->t - 1))) {
if (left && left->num == T->t - 1) {
btree_merge(T, node, idx - 1);
child = left;
}
else if (right && right->num == T->t - 1) {
btree_merge(T, node, idx);
}
}
}
btree_delete_key(T, child, key);
}
}
int btree_delete(btree *T, KEY_VALUE key) {
if (!T->root) return -1;
btree_delete_key(T, T->root, key);
return 0;
}
/****************** test ************************/
void btree_print(btree *T, btree_node *node, int layer)
{
btree_node* p = node;
int i;
if (p) {
printf("\nlayer = %d keynum = %d is_leaf = %d\n", layer, p->num, p->leaf);
for (i = 0; i < node->num; i++)
printf("%c ", p->keys[i]);
printf("\n");
layer++;
for (i = 0; i <= p->num; i++)
if (p->childrens[i])
btree_print(T, p->childrens[i], layer);
}
else printf("the tree is empty\n");
}
int main() {
btree T = { 0 };
btree_create(&T, SUB_M);
srand(48);
int i = 0;
char key[30] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
for (i = 0; i < 26; i++) {
//key[i] = rand() % 1000;
printf("%c ", key[i]);
btree_insert(&T, key[i]);
}
btree_print(&T, T.root, 0);
for (i = 0; i < 26; i++) {
printf("\n---------------------------------\n");
btree_delete(&T, key[25 - i]);
//btree_traverse(T.root);
btree_print(&T, T.root, 0);
}
return 0;
}
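A B-tree is a kind of multiway tree, but not every multiway tree is a B-tree; the main purpose of a B-tree is to reduce the tree height. The difference between a B-tree and a B+ tree is that every node of a B-tree stores data, whereas the internal nodes of a B+ tree store no data and act only as an index, with the data kept in the leaf nodes. This makes the B+ tree the better fit for on-disk indexes.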
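An order-M B-tree satisfies:
(1) every node has at most M subtrees;
(2) the root has at least two subtrees, unless it is itself a leaf;
(3) every internal node other than the root has at least ceil(M/2) subtrees;
(4) all leaf nodes are on the same level;
(5) an internal node with K subtrees holds K-1 keys, sorted in ascending order;
(6) the number of keys n in a non-root node satisfies ceil(M/2)-1 <= n <= M-1.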